After OpenClaw went public earlier this year, the next question I kept seeing in the community was: “Does multi-agent really make a difference? Isn’t splitting things up harder to manage?”

Below is a practical multi-agent architecture: Opus handles strategy, Sonnet handles the repetitive grind, and Codex writes code. I will cover the division logic, shared memory, scheduled collaboration, and the mistakes I ran into.

Why One Is Not Enough

At first, I only had one agent and asked it to do everything: write articles, edit code, run schedules, scan news, all inside the same Claude Code session.

After running that way for a while, the problems started showing up. It wrote articles well, but occasionally introduced bugs while editing code. After editing code, it came back to writing and forgot the tone settings from earlier.

One real incident: I asked Opus to change one time field in cron/jobs.json. It changed the field, but also broke the quote format on another JSON key. Every scheduled job failed that night.
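One way to guard against this kind of breakage is to never hand-edit the JSON text at all. Below is a minimal sketch, assuming the field is a top-level key of a JSON object (the real layout of cron/jobs.json is not shown here, so treat the key name and structure as placeholders). The helper `update_json_field` is hypothetical, not part of OpenClaw.

```python
import json
import os
import tempfile
from pathlib import Path


def update_json_field(path: Path, key: str, value) -> None:
    """Change one top-level key in a JSON file without editing its raw text."""
    # Parsing first means an already-broken file fails loudly here.
    data = json.loads(path.read_text(encoding="utf-8"))
    if key not in data:
        raise KeyError(f"{key!r} not found in {path}")
    data[key] = value

    # Serialize from the parsed structure, so quoting is always valid JSON.
    new_text = json.dumps(data, indent=2, ensure_ascii=False) + "\n"
    json.loads(new_text)  # round-trip check before anything touches disk

    # Write to a temp file in the same directory, then swap it in atomically.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(new_text)
    os.replace(tmp_name, path)
```

Note the trade-off: re-serializing may change indentation or spacing, so this is not a character-for-character edit. What it does ensure is that the file on disk is always parseable.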

When Codex handled the same task, it changed only that field and did not touch a single extra character. But when I asked Codex to write social copy, the result read like a README.

Sonnet is fast and cheap for batch tasks, but when I let it decide an article title, it picked the one with the highest SEO score and the tone was completely wrong.

When one agent does everything, quality drops during role switching. After splitting the work, each agent only does what it is most reliable at, and the error rate drops noticeably.

[Illustration: Penchan surrounded by a few little robot friends, realizing one agent is not enough]

The Three-Agent Division of Labor

Opus: Strategy Brain

Opus is the core decision-maker in the whole system.

It is responsible for:

  • Content strategy and article writing
  • Memory system management (deciding what to remember and what to forget)
  • Cross-agent coordination (deciding who should handle each task)
  • Quality review (Opus reviews the code Codex writes and the results Sonnet produces)

There is one hard rule: Opus does not edit code directly. Opus can occasionally break things when it edits code, but it is very good at code review. Let it read, judge, and review. Keep its hands off the keyboard and give implementation to Codex.

Opus sessions are usually the longest because strategy conversations need a lot of context. It also reads the most memory files: project files, MEMORY.md, and the day’s journal together come to two or three times what the other agents read.

Sonnet: Fast Execution

Sonnet handles work that does not require judgment. The standard is simple: if you can write an SOP and have an intern follow it, give it to Sonnet.

Its task list:

  • Capturing screenshots from videos and converting files
  • Batch formatting data
  • Template-based social post drafts
  • Fixed-format data cleanup

Sonnet’s advantage is speed and low token cost. For the price of one Opus task, you can run several Sonnet tasks. For high-volume work that does not require creativity, the cost difference is obvious.

A mistake I hit: I gave Sonnet a task that required judgment, and it very diligently executed the wrong direction. Fixing it later took more time. My stable rule is this: if you need to think before deciding how to do it, it is not a Sonnet task.

Codex: Code Specialist

All code-related work goes through Codex: feature development, bug fixes, test writing, and refactors.

I pin it to the latest ChatGPT version. Older Codex versions gave me unstable code quality, a lesson I learned the hard way.

Codex is precise, but it does not proactively think through ambiguity. Give it a clear spec and the code quality is high. Give it a vague request and it may choose an implementation you did not want. So Opus writes the requirements clearly first, then hands them off to Codex.

Workflow: Opus does technical planning and writes the spec → Codex implements → Opus or Sonnet reviews the code. This three-step flow brought the bug rate down.

[Illustration: Penchan and four little robot friends with different personalities working on separate tasks]

How Shared Memory Is Set Up

All the agents read the same .openclaw/ directory. No extra integration is needed. MEMORY.md is the shared index, and brain.md is the shared working memory.

The main thing is to write each agent’s write scope clearly in AGENTS.md:

Agent  | Can read     | Can write
Opus   | Everything   | Memory files, strategy files, brain.md
Sonnet | Task-related | Task output files
Codex  | Code + spec  | Code files

Why separate them? If two agents edit brain.md at the same time, they can produce conflicting versions, and merging those by hand can easily take an hour. The stable approach is to let only Opus write brain.md; the other agents read it but never write to it.
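AGENTS.md states the rule in prose, but the same scopes can also be checked mechanically before a write happens. Here is a minimal sketch of that idea. The glob patterns and directory names (`memory/`, `strategy/`, `output/`, `src/`, `tests/`) are assumptions for illustration; substitute your own layout.

```python
from fnmatch import fnmatchcase

# Hypothetical write scopes mirroring the table above.
# The patterns are placeholders, not OpenClaw's real layout.
WRITE_SCOPES = {
    "opus": ["MEMORY.md", "brain.md", "memory/*", "strategy/*"],
    "sonnet": ["output/*"],
    "codex": ["src/*", "tests/*"],
}


def can_write(agent: str, relative_path: str) -> bool:
    """Return True if the agent's write scope covers the given path."""
    patterns = WRITE_SCOPES.get(agent, [])  # unknown agents get no scope
    # fnmatchcase's "*" also matches "/", so "src/*" covers nested files.
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


print(can_write("opus", "brain.md"))    # True
print(can_write("sonnet", "brain.md"))  # False
```

A check like this only helps if something calls it before each write, for example a wrapper script around the agent's file operations. It does not replace the written rule in AGENTS.md.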

[Illustration: Penchan placing a shared memory notebook on the desk while little robots take turns reading it]

Scheduled Collaboration

Scheduled tasks also have a division of labor.

Using daily social content as an example:

  1. 6:00 a.m.: Opus reads the news summary and decides today’s social topic
  2. 7:00 a.m.: Sonnet uses the topic Opus selected and generates a draft from a template
  3. 8:00 a.m.: Opus reviews the draft and adjusts tone and content
  4. Scheduled publish time: Buffer schedules the social post

This process runs automatically every day. I only step in occasionally during review; most of the time Opus can judge it on its own.

Dependencies between schedules are controlled by time gaps in cron. The 7:00 task assumes the 6:00 result has already been written. If the 6:00 task is delayed or fails, the 7:00 task will also have problems.
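Because the time gap is only an assumption, the downstream task can verify it before doing any work. The sketch below checks that the upstream output file exists, is non-empty, and was modified today. The function name `upstream_ready` and the idea of one output file per task are assumptions, not something OpenClaw provides.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional


def upstream_ready(output_path: Path, today: Optional[date] = None) -> bool:
    """True if the upstream output exists, is non-empty, and was written today."""
    today = today or date.today()
    if not output_path.is_file():
        return False
    stat = output_path.stat()
    if stat.st_size == 0:
        return False  # an empty file usually means the upstream run died midway
    # Compare the file's modification date (local time) with today's date.
    modified = datetime.fromtimestamp(stat.st_mtime).date()
    return modified == today
```

With this in place, the 7:00 task can call `upstream_ready(...)` on the 6:00 task's output and exit early with a clear message, instead of drafting a post from yesterday's topic.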

You can add a health-check schedule that specifically checks whether the previous task completed normally. This prevents the awkward situation where a schedule breaks overnight and you only notice the next day.
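A health check of this kind can be quite small. The sketch below assumes each job records the time of its last successful run somewhere you can load into a dict. How that record is stored is up to you, and the job names are made up for the example.

```python
from datetime import datetime, timedelta


def find_stale_jobs(expected_jobs, last_success, now, max_age):
    """Return jobs that never succeeded or whose last success is older than max_age."""
    stale = []
    for name in expected_jobs:
        finished = last_success.get(name)
        # A job with no record at all counts as stale too.
        if finished is None or now - finished > max_age:
            stale.append(name)
    return stale


# Hypothetical run log: job name -> time of last successful run.
runs = {
    "pick_topic": datetime(2024, 1, 2, 6, 2),
    "draft_post": datetime(2024, 1, 1, 7, 1),
}
now = datetime(2024, 1, 2, 9, 0)
jobs = ["pick_topic", "draft_post", "review_draft"]
print(find_stale_jobs(jobs, runs, now, timedelta(hours=24)))
# ['draft_post', 'review_draft']
```

Passing the list of expected jobs explicitly matters: a job that has never run leaves no record, so checking only the recorded entries would miss it.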

[Illustration: Penchan collaborating with little robots in a warm studio, following the rhythm of a clock]

Pitfalls I Hit

Pitfall 1: Letting Opus Edit Code Directly

Opus is very good at code review, but when it edits directly, it occasionally breaks format or logic. It either overthinks or keeps cycling through revisions. Since I separated the roles, code quality has been noticeably more stable. (My read: Opus is capable of writing code, but when it has to make decisions and write code at the same time, the error rate goes up.)

Pitfall 2: Shared Memory Without Write Permissions

Two agents wrote to the same file at the same time and created conflicts. The fix is the write-permission table above.

Pitfall 3: Sonnet Made Decisions It Should Not Have Made

I let Sonnet decide an article title. It picked the title with the highest SEO score, but the tone was completely wrong. Since then, anything requiring taste or judgment goes to Opus.

Pitfall 4: Handoff Files Were Too Scattered

At first, every project had its own handoff file. That meant Opus had to read five or six places just to know what new tasks existed. After consolidating everything into one projects/handoff.md, the flow became much smoother.
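To make the single-file approach concrete, here is a minimal sketch of writing to and reading from one handoff file. The line format `- [ ] agent: task` is an assumption chosen for the example, not the actual format of projects/handoff.md.

```python
from pathlib import Path


def add_task(handoff: Path, agent: str, task: str) -> None:
    """Append one open task for the given agent to the shared handoff file."""
    handoff.parent.mkdir(parents=True, exist_ok=True)
    with handoff.open("a", encoding="utf-8") as f:
        f.write(f"- [ ] {agent}: {task}\n")


def pending_tasks(handoff: Path, agent: str) -> list:
    """Return the open tasks addressed to the given agent, in file order."""
    if not handoff.exists():
        return []
    prefix = f"- [ ] {agent}: "
    lines = handoff.read_text(encoding="utf-8").splitlines()
    # Lines marked done ("- [x] ...") or addressed to other agents are skipped.
    return [line[len(prefix):] for line in lines if line.startswith(prefix)]
```

With one file and one line format, each agent needs a single read to find its work, which is the point of the consolidation.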

[Illustration: Penchan and little robots standing beside messy tools, looking back and organizing the pitfalls they hit]

How to Start from One Agent

My recommended expansion route:

Step 1: Opus + Codex. This is the most noticeable split. Keep strategy and code in their own lanes. This step alone gives a clear quality boost.

Step 2: Add Sonnet. Take the mechanical tasks sitting on Opus’s plate and throw them to Sonnet. It saves tokens and time.

Each time you add an agent, run it for two weeks and make sure it is stable before adding the next one. Turning everything on in a rush mostly creates system problems you then have to spend time fixing.

[Illustration: Penchan holding a little robot friend's hand while walking down an evening path]

Penchan’s Take

My main stack is three agents on OpenClaw: Opus / Sonnet / Codex. I use it every day to manage two brand accounts, multiple publishing channels, and more than a dozen scheduled tasks. The biggest thing I learned in practice is that Opus is much steadier at code review than hands-on coding. Pairing it with Codex for concrete implementation makes the division very clear. OpenClaw is not a tool everyone should pick up. Make sure you actually have repetitive automation tasks before adopting it, otherwise it is easy to do a lot of setup work for no real gain.

FAQ

Q: Why do I need multiple agents? Isn’t one enough?

One can work, but quality is unstable. The model that is good at writing articles and the model that is good at writing code are strong in very different ways. When each agent only does what it is best at, the overall quality gets much better.

Q: Does multi-agent setup get expensive?

It depends on how you assign work. Opus is the most expensive, but I only use it for strategy and writing. Mechanical tasks go to Sonnet, which is much cheaper. In practice, most monthly cost goes into Opus writing work; Sonnet is not a large share.

Q: How do agents communicate with each other?

Through the file system. Agents do not chat with each other in real time. They pass information through shared memory files. Opus writes a handoff file; Codex reads it and knows what to do. Simple, but effective.

Q: Won’t several agents running at the same time conflict with each other?

They can. The fix is clear write permissions: each agent can only write the file scope it owns. Opus writes strategy and memory, Codex only touches code, and two agents should not edit the same file at the same time.

Q: Can I use only 2 agents?

Yes. I recommend starting with Opus + Codex: one handles strategy, one handles code. Once that is stable, then consider adding Sonnet.

Q: How do I know which agent should handle a task?

Rule of thumb: tasks needing judgment and creativity go to Opus, purely mechanical execution goes to Sonnet, and anything code-related goes to Codex. If a task is fuzzy, send it to Opus first and let it decide whether to delegate.
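The rule of thumb can be written down as a small routing function. This is only a sketch: the `Task` fields are hypothetical flags that a person (or Opus) would set when filing the task, and the order of the checks is my reading of the rule, with code-related work going to Codex before the judgment check.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    touches_code: bool = False
    needs_judgment: bool = False
    is_fuzzy: bool = False


def route(task: Task) -> str:
    """Pick an agent name for a task, following the rule of thumb above."""
    if task.is_fuzzy:
        return "opus"    # unclear tasks go to Opus first; it may delegate later
    if task.touches_code:
        return "codex"   # anything code-related
    if task.needs_judgment:
        return "opus"    # taste, strategy, creative decisions
    return "sonnet"      # purely mechanical execution


print(route(Task("convert 40 screenshots to webp")))              # sonnet
print(route(Task("pick this week's title", needs_judgment=True)))  # opus
print(route(Task("fix the failing date test", touches_code=True))) # codex
```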


— Penchan