Multi-Agent Systems: When One AI Is Not Enough

Posted on: June 29, 2026

Welcome, Developer 👋

If you followed along with Parts 1 and 2 of this series, you’ve got a working MCP agent with persistent SQLite memory. It manages tasks, it remembers things across restarts, it’s genuinely useful.

And then you try to ask it to do something complex.

Research a topic, summarize the findings, write a report, send it. Or fetch data from three different APIs, normalize it, run an analysis, return a structured result. The agent starts working. It gets partway through. Then it either hits its context limit, loses track of which step it’s on, or produces something that would have been fine on its own but doesn’t fit the rest of the output.

I’ve been there, and tweaking the prompt doesn’t fix it. This is an architecture problem, not a prompting problem. The fix is a multi-agent system.


Why Single Agents Hit a Ceiling

A single agent has one context window, one thread of reasoning, one set of tools. For simple stuff, that’s fine. For anything with real complexity, it starts breaking down pretty quickly.

Here’s what usually gives first:

Context window exhaustion. Reading 10 documents, reasoning across all of them, and producing structured output is a lot to ask from a single context. The agent either starts truncating or starts hallucinating what it can’t fit anymore.

Sequential bottlenecks. Every step runs in the same agent, so nothing happens in parallel. A research task you could split across three agents and finish in 30 seconds takes 90 seconds when it all runs in series.

Lack of specialization. An agent asked to be a researcher, a writer, and a code reviewer at the same time ends up being average at all three. Specialists beat generalists on narrow tasks, and that’s true for AI agents too. A focused system prompt and a small tool set produce better results than one giant prompt trying to cover everything.

Error propagation. One step fails and the whole thing falls over. No isolation, no smart retry, no way to pick up from where things actually went wrong.

Multi-agent architecture deals with all of this. But it brings its own set of problems, and it’s worth knowing what those are before you start building.


What a Multi-Agent System Actually Is

A multi-agent system is a group of AI agents that coordinate to get something done. Each agent has a role, a set of tools, and a clear scope. An orchestrator agent (also called the “router” or the “planner”, depending on who you ask) coordinates the others.

The mental model that clicked for me: think of it like a small engineering team.

You’ve got a tech lead (the orchestrator) who understands the full picture and breaks it into pieces. You’ve got specialists (subagents) who own specific parts: one does research, one handles writing, one does code, one does QA. The tech lead doesn’t write all the code. They assign it, review the output, and pull the pieces together.

Here’s the orchestrator in its simplest form. Note that it doesn’t execute any of the work itself. It only produces a plan:

// orchestrator.ts
// Uses the official Anthropic SDK. Install with: npm install @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";
 
// The client reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();
 
// The orchestrator's only job is to turn a high-level goal into a plan:
// which subtasks exist, and which specialist agent owns each one.
async function planWork(goal: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 2048,
    // A tight system prompt is what makes this agent a "planner" and nothing else.
    // Specialization starts here, in the role definition.
    system: `You are an orchestrator agent. Your job is to:
1. Break the user's goal into discrete subtasks
2. Assign each subtask to the right specialist agent
3. Return the plan only. Do NOT attempt the work yourself.
 
Available agents:
- researcher: finds and summarizes information
- writer: produces structured written output
- reviewer: checks output for accuracy and completeness
 
Respond ONLY with JSON in this shape, no prose:
{ "steps": [{ "agent": "researcher", "task": "..." }] }`,
    messages: [{ role: "user", content: goal }],
  });
 
  // A message can contain several content blocks (text, tool_use, etc.).
  // For a plain text plan we expect the first block to be text.
  const block = response.content[0];
  if (block.type !== "text") {
    throw new Error(`Expected a text block, got: ${block.type}`);
  }
  return block.text;
}

The orchestrator plans, delegates, and integrates. It doesn’t do the work. That separation is the whole point, and it’s what lets the system scale beyond what one agent can hold in its head at once.
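The plan comes back as a raw string, and a model can wrap JSON in prose or name an agent that doesn’t exist, so it’s worth validating before anything runs. Here’s a minimal sketch of that check. The names parsePlan, Plan, PlanStep, and KNOWN_AGENTS are my own for illustration, not part of the SDK:

```typescript
// Hypothetical types mirroring the JSON shape the system prompt asks for.
type AgentName = "researcher" | "writer" | "reviewer";

interface PlanStep {
  agent: AgentName;
  task: string;
}

interface Plan {
  steps: PlanStep[];
}

const KNOWN_AGENTS: readonly string[] = ["researcher", "writer", "reviewer"];

// Parses and validates the orchestrator's raw text. Throws with a specific
// message so the caller can retry the planning call or surface the problem.
function parsePlan(raw: string): Plan {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Plan is not valid JSON");
  }

  if (typeof data !== "object" || data === null) {
    throw new Error("Plan must be a JSON object");
  }
  const steps = (data as { steps?: unknown }).steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error("Plan must contain a non-empty steps array");
  }

  return {
    steps: steps.map((step: unknown, i: number): PlanStep => {
      if (typeof step !== "object" || step === null) {
        throw new Error(`Step ${i} is not an object`);
      }
      const { agent, task } = step as { agent?: unknown; task?: unknown };
      if (typeof agent !== "string" || !KNOWN_AGENTS.includes(agent)) {
        throw new Error(`Step ${i} names an unknown agent`);
      }
      if (typeof task !== "string" || task.trim() === "") {
        throw new Error(`Step ${i} has an empty task`);
      }
      return { agent: agent as AgentName, task };
    }),
  };
}
```

Failing loudly here is a deliberate choice: a bad plan caught at this point costs one planning call, while a bad plan caught three agents later costs all of them.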

For everything below, assume a small helper that runs a single specialist agent and returns its text output. It wraps the same messages.create call, just with a role-specific system prompt:

// Runs one specialist agent (researcher, writer, reviewer, ...) and returns its text.
// Each agent name maps to its own system prompt and, in a real system, its own tool set.
async function runAgent(agent: string, task: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 4096,
    system: SYSTEM_PROMPTS[agent], // a lookup of role -> system prompt
    messages: [{ role: "user", content: task }],
  });
 
  const block = response.content[0];
  if (block.type !== "text") {
    throw new Error(
      `Agent "${agent}" returned a non-text block: ${block.type}`,
    );
  }
  return block.text;
}
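The helper above leans on a SYSTEM_PROMPTS lookup that I haven’t shown. One possible shape is below. The prompt wording is purely illustrative, and getSystemPrompt is a hypothetical wrapper I’d add so an unknown role fails loudly instead of sending an undefined system prompt:

```typescript
// Hypothetical prompt table. The wording of each prompt is illustrative only.
const SYSTEM_PROMPTS: Record<string, string> = {
  researcher:
    "You are a research agent. Find and summarize information. Say where each claim came from.",
  writer:
    "You are a writing agent. Turn the material you are given into structured written output.",
  reviewer:
    "You are a review agent. Check the text you are given for accuracy and completeness.",
};

// Looks up a role's prompt and throws on an unknown role, so a typo in an
// agent name is caught before any API call is made.
function getSystemPrompt(agent: string): string {
  if (!Object.prototype.hasOwnProperty.call(SYSTEM_PROMPTS, agent)) {
    throw new Error(`No system prompt registered for agent "${agent}"`);
  }
  return SYSTEM_PROMPTS[agent] as string;
}
```

Keeping the prompts in one table also makes the specialization visible: each role is a few sentences, not one giant prompt.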

The Three Core Patterns

There are three patterns that cover most of what you’ll actually build. Learn these and you can handle almost any workflow.

1. Sequential Pipeline

Agents run one after another. The output of each one becomes the input to the next. This is the simplest pattern and honestly the right place to start.

User Input → Researcher → Writer → Reviewer → Final Output

Use this when the steps have strict ordering. Research has to finish before writing makes sense. Writing has to finish before review makes sense.

async function sequentialPipeline(topic: string): Promise<string> {
  // Each step blocks on the previous one because each depends on its output.
  const research = await runAgent(
    "researcher",
    `Research this topic: ${topic}`,
  );
  const draft = await runAgent(
    "writer",
    `Write a report from this research:\n\n${research}`,
  );
  const final = await runAgent(
    "reviewer",
    `Review and improve this draft:\n\n${draft}`,
  );
 
  return final;
}

The catch is that it’s as slow as the sum of all its steps. If one agent fails, the whole pipeline goes down. Add retry logic at each step or you’ll spend a lot of time babysitting half-finished runs.
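Here’s a rough sketch of what per-step retry could look like, using exponential backoff. withRetry is a hypothetical helper of mine, and the default attempt count and delay are placeholders you’d tune for your own rate limits:

```typescript
// Hypothetical helper: retries an async step with exponential backoff.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  step: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await step();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // With the defaults this waits 500ms, then 1000ms, doubling each time.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }

  throw lastError;
}
```

In the pipeline, each stage would be wrapped on its own, for example withRetry(() => runAgent("researcher", prompt)), so a transient failure in the writer doesn’t force the research to run again.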

2. Parallel Fan-Out

The orchestrator fires off multiple agents at the same time and waits for all of them. Use this when the subtasks don’t depend on each other.

                 → Agent A (topic 1) →
User Input → Orchestrator → Agent B (topic 2) → Aggregator → Output
                 → Agent C (topic 3) →

The naive version uses Promise.all, but be careful: it rejects as soon as any single promise rejects. One agent throws and you lose every result, including the ones that already succeeded. For independent work, that’s usually the wrong trade-off.

Promise.allSettled is the safer default. It waits for every promise to finish and tells you, per agent, whether it succeeded or failed:

async function parallelFanOut(topics: string[]): Promise<string[]> {
  // Kick off all agents at once. They run concurrently, not in series.
  const settled = await Promise.allSettled(
    topics.map((topic) =>
      runAgent("researcher", `Research this topic: ${topic}`),
    ),
  );
 
  // Keep the successes. The type guard narrows the union so `.value` is typed as string.
  const successes = settled
    .filter(
      (r): r is PromiseFulfilledResult<string> => r.status === "fulfilled",
    )
    .map((r) => r.value);
 
  // Surface the failures instead of swallowing them. In production, log these.
  const failures = settled.filter((r) => r.status === "rejected");
  if (failures.length > 0) {
    console.warn(`${failures.length}/${topics.length} research agents failed`);
  }
 
  return successes;
}

This is where multi-agent systems start pulling ahead on speed. Three research tasks that each take 20 seconds finish in about 20 seconds total, not 60. The aggregator step (here, just keeping the successes) is where you’d merge results before handing them to the next stage.
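One thing the version above loses is which topic each result belongs to. A slightly richer aggregator, sketched below, keeps that link. It relies on Promise.allSettled returning results in input order. TopicReport and aggregateByTopic are hypothetical names, and this assumes topics are unique:

```typescript
interface TopicReport {
  succeeded: Record<string, string>; // topic -> research text
  failed: Record<string, string>; // topic -> error message
}

// Pairs each settled result with the topic that produced it.
// Promise.allSettled preserves input order, so settled[i] belongs to topics[i].
function aggregateByTopic(
  topics: string[],
  settled: PromiseSettledResult<string>[],
): TopicReport {
  const report: TopicReport = { succeeded: {}, failed: {} };

  settled.forEach((result, i) => {
    const topic = topics[i];
    if (topic === undefined) return; // more results than topics: ignore extras

    if (result.status === "fulfilled") {
      report.succeeded[topic] = result.value;
    } else {
      const reason: unknown = result.reason;
      report.failed[topic] =
        reason instanceof Error ? reason.message : String(reason);
    }
  });

  return report;
}
```

With the failures keyed by topic, the next stage can decide whether to retry just those topics or tell the writer which sections are missing.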

3. Hierarchical with Feedback Loops

The most powerful pattern, and also the most complex. A reviewer agent checks the output, and if it isn’t good enough, the work goes back for another attempt with the feedback attached.

Orchestrator → Worker → Output → Reviewer → [pass: done | fail: retry with feedback]

async function withFeedbackLoop(task: string, maxRetries = 3): Promise<string> {
  let attempt = 0;
  let output = "";
 
  // The retry cap is not optional. Without it, a worker and reviewer that never
  // agree will loop forever and burn tokens. Always bound the loop.
  while (attempt < maxRetries) {
    output = await runAgent("writer", task);
 
    // The reviewer is a gate. It decides whether the work moves forward.
    const review = await runAgent(
      "reviewer",
      `Reply with PASS or FAIL on the first line, then explain:\n\n${output}`,
    );
 
    // Tolerate leading whitespace and casing differences in the verdict line.
    if (review.trim().toUpperCase().startsWith("PASS")) {
      return output;
    }
 
    // Feed the rejection back in so the next attempt is informed, not blind.
    task = `${task}\n\nYour previous attempt was rejected.\nReviewer feedback:\n${review}\n\nAddress these issues and try again.`;
    attempt++;
  }
 
  // Hit the cap without a PASS. Return the best effort and let the caller decide.
  // Some systems escalate to a human here instead.
  return output;
}

This is the pattern I reach for when quality actually matters and I can’t afford to just ship the first thing the agent produces. The reviewer acts as a gate. Work doesn’t move forward until it passes, or until you run out of retries and fall back to a best-effort result or a human.
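The gate is only as reliable as the way you read the reviewer’s verdict. A prefix check on the first line is easy to fool: a reply that opens with “PASSABLE” or “**PASS**” will be read wrong one way or the other. Here’s a stricter sketch. parseVerdict and Verdict are hypothetical names, and treating anything ambiguous as a failure is my assumption about the safer default:

```typescript
interface Verdict {
  passed: boolean;
  feedback: string; // everything after the first line
}

// Reads the reviewer's reply. Only an unambiguous PASS as the first word of
// the first line counts as a pass; everything else is treated as a failure.
function parseVerdict(review: string): Verdict {
  const [firstLine = "", ...rest] = review.trim().split(/\r?\n/);
  const firstWord = firstLine.trim().split(/\s+/)[0] ?? "";

  // Strip markdown emphasis and punctuation, e.g. "**PASS**" or "PASS."
  const token = firstWord.replace(/[^A-Za-z]/g, "").toUpperCase();

  return {
    passed: token === "PASS",
    feedback: rest.join("\n").trim(),
  };
}
```

Failing closed costs you an extra attempt now and then. Failing open ships work the reviewer never actually approved.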


The Coordination Problems No One Warns You About

Here’s the part most posts skip. Multi-agent systems look great in demos. In production, you run into coordination problems that have nothing to do with how smart your agents are.

State Sharing

Agents don’t share memory by default. If Agent A figures something out, Agent B has no idea unless you explicitly pass it over. Each runAgent call is a fresh request with no knowledge of the others.

A shared context object that travels between agents solves this. Treat it like application state: pass it in, return a new copy with the updates, never mutate it in place:

interface AgentContext {
  goal: string; // the original objective, carried through every step
  findings: Record<string, string>; // what each agent has produced so far, keyed by agent
  errors: string[]; // non-fatal problems worth carrying forward
  completedSteps: string[]; // audit trail of which agents have run
}
 
async function runWithContext(
  agentName: string,
  task: string,
  context: AgentContext,
): Promise<{ result: string; updatedContext: AgentContext }> {
  // Give the agent what previous agents discovered, so it doesn't start from zero.
  const prompt = `Context so far:\n${JSON.stringify(context.findings, null, 2)}\n\nYour task: ${task}`;
 
  const result = await runAgent(agentName, prompt);
 
  // Return a NEW context object instead of mutating the old one.
  // Immutable updates keep state predictable when agents run concurrently.
  return {
    result,
    updatedContext: {
      ...context,
      findings: { ...context.findings, [agentName]: result },
      completedSteps: [...context.completedSteps, agentName],
    },
  };
}

The immutability matters more than it looks. The moment you have agents running in parallel and mutating a shared object, you get races that are painful to reproduce. New copy every time sidesteps the whole class of bug.
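Immutable updates leave one question open: when three agents run in parallel, you get three updated copies back and need to fold them into one. Here’s a minimal sketch of that merge. mergeContexts is a hypothetical helper, and it assumes every branch started from the same base context and only appended to it:

```typescript
interface AgentContext {
  goal: string;
  findings: Record<string, string>;
  errors: string[];
  completedSteps: string[];
}

// Folds the contexts returned by parallel agents back into a single context.
// Assumes each branch was derived from `base` by appending, never by removing.
function mergeContexts(
  base: AgentContext,
  branches: AgentContext[],
): AgentContext {
  const baseErrorCount = base.errors.length;
  const baseStepCount = base.completedSteps.length;

  return branches.reduce<AgentContext>(
    (merged, branch) => ({
      goal: merged.goal,
      // On a key collision the later branch wins, so parallel agents
      // should write under distinct keys.
      findings: { ...merged.findings, ...branch.findings },
      // Take only what each branch added on top of the shared base.
      errors: [...merged.errors, ...branch.errors.slice(baseErrorCount)],
      completedSteps: [
        ...merged.completedSteps,
        ...branch.completedSteps.slice(baseStepCount),
      ],
    }),
    base,
  );
}
```

The collision comment matters in practice. If three researchers all store their result under the key "researcher", two of the three results silently disappear, so a key like "researcher:topic" is the safer habit for fan-out.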

Infinite Loops

Feedback loops are great until a worker and a reviewer can’t agree on what “good” means and they just keep bouncing work back and forth.

You saw the guard already: cap the retries, track the attempt count, and have a fallback for when you hit the limit. The fallback is the part people forget. Returning a best-effort result or escalating to a human is almost always better than looping until the request times out.
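To make the fallback harder to forget, one option is to put it in the return type, so the caller has to handle the “ran out of attempts” case explicitly. The sketch below is a variation on the loop, not a replacement for it. The producer and reviewer are passed in as functions (my assumption, so the loop can run without API calls), and LoopOutcome and boundedReviewLoop are hypothetical names:

```typescript
type LoopOutcome =
  | { status: "passed"; output: string; attempts: number }
  | { status: "exhausted"; output: string; attempts: number; lastFeedback: string };

// Hypothetical function types standing in for the writer and reviewer agents.
type Produce = (prompt: string) => Promise<string>;
type Review = (output: string) => Promise<{ passed: boolean; feedback: string }>;

async function boundedReviewLoop(
  task: string,
  produce: Produce,
  review: Review,
  maxAttempts = 3,
): Promise<LoopOutcome> {
  let prompt = task;
  let output = "";
  let lastFeedback = "";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    output = await produce(prompt);
    const verdict = await review(output);
    if (verdict.passed) {
      return { status: "passed", output, attempts: attempt };
    }

    lastFeedback = verdict.feedback;
    // Rebuild from the ORIGINAL task so the prompt carries only the latest
    // feedback instead of growing with every rejected attempt.
    prompt = `${task}\n\nReviewer feedback on your previous attempt:\n${lastFeedback}`;
  }

  // Cap reached without a pass: hand back the best effort, clearly labeled.
  return { status: "exhausted", output, attempts: maxAttempts, lastFeedback };
}
```

A caller that switches on status can’t quietly treat an exhausted result as approved work, which is exactly the mistake a plain string return makes easy.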

Tool Conflicts

When multiple agents share the same tools (a database, an API, a file system), you get race conditions. Two agents writing to the same resource at the same time will produce results you didn’t expect and probably can’t reproduce.

The straightforward fix is isolation: give each agent its own scoped access. Two agents need the database? Give each one a separate connection scoped to its slice of the data, ideally with permissions narrow enough that one agent physically can’t touch another’s rows. Don’t share a single mutable resource across concurrent agents and hope for the best.
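As a toy illustration of scoped access, here’s a sketch where each agent gets a view of a shared store that can only read and write keys under its own prefix. An in-memory Map stands in for the real database, and KeyValueStore and scopedStore are hypothetical names. In a real system the same idea would live in database roles or per-agent credentials, not in application code:

```typescript
// Hypothetical minimal interface standing in for a database or file system.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Returns a view of `backing` that is confined to one agent's key prefix.
// An agent holding this view has no way to address another agent's keys.
function scopedStore(
  backing: Map<string, string>,
  agentName: string,
): KeyValueStore {
  if (agentName === "" || agentName.includes(":")) {
    // ":" is the separator, so allowing it in names would let scopes overlap.
    throw new Error(`Invalid agent name for scoping: "${agentName}"`);
  }
  const prefix = `${agentName}:`;

  return {
    get: (key) => backing.get(prefix + key),
    set: (key, value) => {
      backing.set(prefix + key, value);
    },
  };
}
```

Two agents can now both write a key called "summary" without stepping on each other, because one lands at "researcher:summary" and the other at "writer:summary".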

Cost and Latency Accumulation

Every agent call is an API call that costs tokens and time. Three agents running in parallel with up to three attempts each means up to nine calls for a single user request, and eighteen if every attempt also goes through a reviewer. At scale that adds up fast, and not in a good way.

Track token usage per agent and per workflow from the start. Set hard limits. Know what your p99 latency looks like when agents start hitting retry limits, because that’s the number your users actually feel. These are operational concerns, not implementation details, and skipping them early means debugging production incidents later.
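A hard limit can be as simple as a counter you check before every call. Here’s a minimal sketch. TokenBudget and CallUsage are hypothetical names, and the limit is whatever number you pick. The Messages API response reports token counts in its usage field (input_tokens and output_tokens), which is where the numbers passed to record would come from:

```typescript
interface CallUsage {
  inputTokens: number;
  outputTokens: number;
}

// Tracks token spend per agent and enforces one hard limit per workflow.
class TokenBudget {
  private readonly limit: number;
  private readonly perAgent = new Map<string, number>();
  private total = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Call before each agent request. Throws once the budget is spent, so the
  // workflow stops instead of quietly running up the bill.
  assertAvailable(): void {
    if (this.total >= this.limit) {
      throw new Error(`Token budget exhausted: ${this.total}/${this.limit}`);
    }
  }

  // Call after each agent request with the usage that call reported.
  record(agent: string, usage: CallUsage): void {
    const tokens = usage.inputTokens + usage.outputTokens;
    this.total += tokens;
    this.perAgent.set(agent, (this.perAgent.get(agent) ?? 0) + tokens);
  }

  totalFor(agent: string): number {
    return this.perAgent.get(agent) ?? 0;
  }

  get used(): number {
    return this.total;
  }
}
```

Note that this checks before a call, not during one, so a single large response can still overshoot the limit. For a tighter bound you’d also cap max_tokens on each request.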


A2A: A Real Protocol for Agents Talking to Each Other

Everything above wires agents together with plain function calls inside one codebase. That works when you own all the agents. It stops working the moment you need to talk to an agent built by another team, on another framework, running on another server.

That’s the problem A2A (Agent-to-Agent) solves. It started at Google in April 2025 and is now governed by the Linux Foundation, reaching a stable v1.0 in 2026 with adoption across Microsoft, AWS, Salesforce, and others. The useful way to think about it: MCP connects an agent to its tools; A2A connects an agent to other agents. They’re complementary, and most serious agentic systems end up using both.

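The protocol runs over HTTPS using JSON-RPC 2.0, and it’s built around a few core concepts. The two you meet first are the Agent Card, which describes what an agent can do and where to reach it, and the Task, the unit of work a client sends and then tracks until it finishes.

An Agent Card is just JSON, so it’s easy to read: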

{
  "name": "research-agent",
  "description": "Researches topics and returns structured summaries",
  "url": "https://agents.example.com/research",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "summarize-topic",
      "description": "Given a topic, return a sourced summary",
      "inputModes": ["text"],
      "outputModes": ["text"]
    }
  ]
}
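Because the Card is plain data, choosing which remote agent to call can be an ordinary lookup. The sketch below types only the fields shown in the example above (it is not the full A2A schema) and picks the first agent that advertises a given skill. AgentCard, AgentSkill, and findAgentForSkill are hypothetical names of mine, not SDK types:

```typescript
// Covers only the fields in the example card, not the complete A2A schema.
interface AgentSkill {
  id: string;
  description: string;
  inputModes: string[];
  outputModes: string[];
}

interface AgentCard {
  name: string;
  description: string;
  url: string;
  version: string;
  capabilities: { streaming?: boolean };
  skills: AgentSkill[];
}

// Returns the first card that offers `skillId` and accepts `inputMode`,
// or undefined when no known agent can take the work.
function findAgentForSkill(
  cards: AgentCard[],
  skillId: string,
  inputMode = "text",
): AgentCard | undefined {
  return cards.find((card) =>
    card.skills.some(
      (skill) => skill.id === skillId && skill.inputModes.includes(inputMode),
    ),
  );
}
```

In practice you’d fetch the cards through one of the official SDKs and let it handle the transport. This only shows the selection step that sits on top.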

The flow is: your client agent fetches a remote agent’s Card from its well-known URL, authenticates using the scheme the Card declares, then sends a Task and tracks it until it reaches a terminal state. You don’t hand-roll any of this. The official SDKs (Python, JavaScript, Java, Go, .NET) handle the transport, and native support ships in frameworks like LangGraph, CrewAI, and Semantic Kernel.

The practical takeaway: for agents inside one service, direct function calls like the patterns above are simpler and you should prefer them. Reach for A2A when agents need to cross a boundary (a different team, a different framework, a different deployment) and you want a standard contract instead of bespoke glue code for every integration.


What to Build Next

If you’ve been following the series, here’s where I’d go from here:

  1. Start with the sequential pipeline. Take the MCP agent from Part 2 and wire up a second agent that takes its output as input. Get comfortable passing context before you touch parallelism.

  2. Add parallel fan-out for independent tasks. Find a workflow where multiple topics need research at the same time. Use Promise.allSettled, keep the successes, and log the failures instead of swallowing them.

  3. Add a reviewer with a feedback loop. Pick your most quality-sensitive output and add a second agent whose only job is to reject work that doesn’t pass. Watching those two interact teaches you a lot about prompt design.

  4. Instrument from day one. Log agent name, task, token usage, latency, and success or failure on every call. You’ll need this data sooner than you think, and it’s much harder to add after something breaks.
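For the instrumentation step, a thin wrapper around each agent call is enough to get started. Here’s a sketch. instrumented and AgentLogEntry are hypothetical names, and logging JSON lines to the console is only a placeholder sink. Token usage isn’t captured here because the wrapped function only returns its result; you’d add it once your agent helper exposes the usage numbers:

```typescript
interface AgentLogEntry {
  agent: string;
  task: string;
  latencyMs: number;
  ok: boolean;
  error?: string;
}

// Wraps one agent call and emits exactly one log entry for it, whether the
// call succeeds or throws. Errors are logged and then rethrown unchanged.
async function instrumented<T>(
  agent: string,
  task: string,
  run: () => Promise<T>,
  log: (entry: AgentLogEntry) => void = (entry) =>
    console.log(JSON.stringify(entry)),
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await run();
    log({ agent, task, latencyMs: Date.now() - startedAt, ok: true });
    return result;
  } catch (error) {
    log({
      agent,
      task,
      latencyMs: Date.now() - startedAt,
      ok: false,
      error: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}
```

A call site would look like instrumented("writer", task, () => runAgent("writer", task)), which keeps the logging out of the agent logic itself.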

Multi-agent systems are harder to debug than single agents because failures are distributed. When something goes wrong, you don’t always know which agent caused it or at which step. Good logging is what keeps that from becoming a nightmare.


Conclusion

Single agents are the right starting point. Simpler to build, easier to debug, good enough for most things. But there’s a ceiling, and once you hit it, no amount of prompt tuning is going to fix it.

Multi-agent systems push that ceiling way up. The same principles that make distributed systems work, namely specialization, parallelism, isolation, and clear interfaces, apply here too. That’s not a coincidence. A multi-agent system is a distributed system, and the moment you treat it like one, the hard parts get a lot more predictable.

The patterns aren’t that hard. The coordination problems are real but manageable if you plan for failure from the start and don’t skip the logging.

Start with the sequential pipeline. Add parallelism when you need the speed. Add feedback loops when quality is non-negotiable. Reach for A2A when your agents need to cross a boundary. And keep your context object explicit, immutable, and always in scope.

See you in the next post, Developer. Stay focused!