
How can we achieve best practices for constructing multi-agent AI systems?

Lately, I've been hearing a lot about multi-agent AI systems. As someone who is always thinking about not just using these services but building them myself, I've been keen to know how to construct high-performance AI agents. Last week, Anthropic published an article titled "How we built our multi-agent research system" (1), which describes their construction method in detail. So today, using this article as a reference, I'd like to explore the best practices for creating multi-agent AI systems with all of you. Let's get started!

 



1. Why do we need so many agents?

ChatGPT, which debuted at the end of November 2022, was a single model. Since then, several services using generative AI have appeared, but initially, most of them used a single AI. So why have we recently seen a rise in methods that connect multiple generative AIs to operate as a single system? I believe it's because it has become clear that there are limits to what a single generative AI can accomplish when faced with complex tasks. It has gradually become apparent that by connecting and integrating several agents, even complex tasks can be handled. This trend has become particularly noticeable in conjunction with the performance improvements of standalone generative AI models like Gemini 1.5 Pro and OpenAI's o3.

 

2. What kind of agent structure should we build?

The Anthropic article included a wonderful chart that I'd love to reference. The key lies with the "Lead agent" and the "sub-agents" placed beneath it.

Here is Anthropic's explanation: "The multi-agent architecture in action: user queries flow through a lead agent that creates specialized subagents to search for different aspects in parallel." While the chart shows three sub-agents, more can of course be spawned to handle more complex tasks.
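
To make the pattern concrete, here is a minimal sketch of that lead-agent/sub-agent flow in Python. This is not Anthropic's implementation; call_llm is a hypothetical stand-in for whatever LLM API you use, and the planning step is reduced to a single decomposition prompt.

```python
# A minimal sketch of the lead-agent / sub-agent pattern (not Anthropic's code).
# call_llm() is a hypothetical placeholder for any LLM API client.
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g., an HTTP request to a model endpoint)."""
    return f"[model response to: {prompt[:60]}...]"

def lead_agent(user_query: str) -> str:
    # 1. The lead agent decomposes the query into subtasks for sub-agents.
    plan = call_llm(
        "Decompose the following research question into 3 focused subtasks, "
        "one per line:\n" + user_query
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Sub-agents work on their subtasks in parallel.
    with ThreadPoolExecutor(max_workers=len(subtasks) or 1) as pool:
        findings = list(pool.map(
            lambda task: call_llm(f"You are a research sub-agent. Subtask: {task}"),
            subtasks,
        ))

    # 3. The lead agent synthesizes the sub-agents' findings into one answer.
    return call_llm(
        "Synthesize these findings into a single report:\n" + "\n\n".join(findings)
    )

if __name__ == "__main__":
    print(lead_agent("What caused the recent semiconductor shortage?"))
```

In a real system, the number of sub-agents, their tools, and their task descriptions would all come from the lead agent's own reasoning rather than a fixed template.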

 

3. How do you coordinate many agents?

I've described the move to multi-agent AI as if it's all upside, but making it work requires numerous AI agents to behave as expected. Getting the desired response from even a single generative AI can be quite a challenge, so is it really possible to control multiple, simultaneously operating AI agents so that they meet our expectations? The key seems to lie in the prompt. In fact, the Anthropic article contains many very helpful methods for prompt creation. Here, I'd like to introduce two representative examples; for the rest, I highly recommend reading the original article yourself.

"Teach the orchestrator how to delegate. In our system, the lead agent decomposes queries into subtasks and describes them to subagents. Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries. Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information.

"Guide the thinking process. Extended thinking mode, which leads Claude to output additional tokens in a visible thinking process, can serve as a controllable scratchpad. The lead agent uses thinking to plan its approach, assessing which tools fit the task, determining query complexity and subagent count, and defining each subagent’s role.

In a nutshell, I think it comes down to "describing things meticulously." Apparently, simple and short instructions like "Research the semiconductor shortage" did not work well, so it seems necessary to write prompts for multi-agent AI as meticulously as possible. I'm going to work on writing better prompts from now on.
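
To illustrate what "meticulous" means in practice, here is a hypothetical delegation prompt template covering the four elements Anthropic lists (objective, output format, tool and source guidance, task boundaries). The field names, the web_search tool, and the wording are my own inventions, not Anthropic's.

```python
# Hypothetical delegation prompt template for a sub-agent (my own wording, for illustration).
SUBTASK_PROMPT = """You are a research sub-agent working on one part of a larger question.

Objective:
{objective}

Output format:
Return a bulleted list of findings, each with a one-line source note.

Tools and sources:
Use the web_search tool only; prefer primary sources published after {year}.

Task boundaries:
Cover only {scope}. Do NOT research adjacent topics; other sub-agents handle them.
"""

prompt = SUBTASK_PROMPT.format(
    objective="Identify the main supply-side causes of the semiconductor shortage.",
    year=2020,
    scope="manufacturing capacity and raw-material constraints",
)
print(prompt)
```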

 

What did you think? It appears that various techniques are necessary to make multi-agent AI systems operate as intended. As the performance of generative AI improves, the required orchestration techniques will also change. I intend to stay up to date and keep incorporating the latest cutting-edge techniques. That's all for today. Stay tuned!



Toshi Stats Co., Ltd. provides a wide range of AI-related services. Please see here for more details!

Copyright © 2025 Toshifumi Kuga. All rights reserved.

(1) "How we built our multi-agent research system", Anthropic, June 13, 2025









Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The Cutting Edge of Prompt Engineering: A Look at a Silicon Valley Startup

Hello everyone. How often do you find yourselves writing prompts? I imagine more and more of you are writing them daily and conversing with generative AI. So today, we're going to look at the state of cutting-edge prompt engineering, using a case study from a Silicon Valley startup. Let's get started.

 

1. "Parahelp," a Customer Support AI Startup

There's a startup in Silicon Valley called "Parahelp" that provides AI-powered customer support. Impressively, they have publicly shared some of their internally developed prompt know-how (1). In the hyper-competitive world of AI startups, I want to thank the Parahelp management team for generously sharing their valuable knowledge to help those who come after them. The details are in the link below for you to review, but my key takeaway from their know-how is this: "The time spent writing the prompt itself isn't long, but what's crucial is dedicating time to the continuous process of executing, evaluating, and improving that prompt."

When we write prompts in a chat, we often want an immediate answer and tend to aim for "100% quality on the first try." However, it seems the style in cutting-edge prompt engineering is to meticulously refine a prompt through numerous revisions. For an AI startup to earn its clients' trust, this expertise is essential and may very well be the source of its competitive advantage. I believe "iteration" is the key for prompts as well.
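
A minimal sketch of that execute-evaluate-improve loop, under the assumption of a placeholder call_llm function and a toy keyword-based score, might look like this:

```python
# Sketch of an execute -> evaluate -> improve loop for a prompt (illustrative only).
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[model output for: {prompt[:40]}...]"

def score(output: str, expected_keywords: list[str]) -> float:
    """Toy evaluation: fraction of expected keywords that appear in the output."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in output.lower())
    return hits / len(expected_keywords)

prompt = "Summarize the customer's issue and propose a concrete next step."
test_case = {"input": "My order #123 arrived damaged.", "keywords": ["refund", "replacement"]}

for round_number in range(3):
    output = call_llm(prompt + "\n\nCustomer message:\n" + test_case["input"])
    quality = score(output, test_case["keywords"])
    print(f"round {round_number}: score = {quality:.2f}")
    if quality >= 0.9:
        break
    # Revise the prompt (by hand or with the model's help) and evaluate again.
    prompt = call_llm("Improve this prompt so the reply names a concrete remedy:\n" + prompt)
```

Real evaluation pipelines use much richer test sets and scoring methods, but the loop structure is the point: run, measure, revise, repeat.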

 

2. Prompts That Look Like a Computer Program

Let's take a look at a portion of the published prompt. This is a prompt for an AI agent to behave as a manager, and even this is only about half of the full version.

(Figure: the structure of Parahelp's published manager prompt)

Here is my analysis of the prompt above:

  • Assigning a persona (in this case, the role of a manager)

  • Describing tasks clearly and specifically

  • Listing detailed, numbered instructions

  • Providing important points as context

  • Defining the output format

I felt it adheres to the fundamental structure of a good prompt. Perhaps because it has been forged in the fierce competition of Silicon Valley, it is written with incredible precision. There's still more to it, so if you're interested, please view it from the link. It's written in even finer detail, and with its heavy use of XML tags, you could almost mistake it for a computer program. Incredible!
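
As a rough, invented illustration of that structure (persona, task, numbered instructions, context, output format) expressed with XML tags, a manager-style prompt might look like the sketch below. It is modeled on the elements listed above, not copied from Parahelp's actual prompt.

```python
# Invented example of an XML-structured manager prompt (not Parahelp's actual text).
MANAGER_PROMPT = """
<role>
You are a customer-support manager agent. You review the draft replies written by
worker agents before they are sent to the customer.
</role>

<task>
Decide whether the draft reply resolves the customer's issue and complies with policy.
</task>

<instructions>
1. Read the customer's message and the draft reply.
2. Check the draft against the policy notes in <context>.
3. If the draft is acceptable, approve it; otherwise list the required changes.
</instructions>

<context>
- Refunds above $100 require human approval.
- Never promise delivery dates.
</context>

<output_format>
Return JSON: {"approved": true or false, "required_changes": ["..."]}
</output_format>
"""
```

The XML tags don't do anything magical; they simply give the model unambiguous boundaries between the persona, the task, the rules, and the expected output.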

 

3. The Future of Prompt Engineering

I imagine that committing this much time and cost to prompt engineering is a high hurdle for the average business person. After learning the basics of prompt writing, many people struggle with what the next step should be.

One tip is to take a prompt you've written and feed it back to the generative AI with the task, "Please improve this prompt." This is called a "meta-prompt." Of course, the challenges of how to give instructions and how to evaluate the results still remain. At Toshi Stats, we plan to explore meta-prompts further.
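
As a minimal sketch, a meta-prompt can be as simple as wrapping your existing prompt in an improvement request; call_llm below is a hypothetical placeholder for whatever model API you use.

```python
# A minimal meta-prompt: ask the model to improve an existing prompt (illustrative).
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "[improved prompt text]"

original_prompt = "Summarize this quarterly report for the sales team."

meta_prompt = (
    "Please improve this prompt. Make the audience, desired length, and output format "
    "explicit, and keep the original intent:\n\n" + original_prompt
)

improved_prompt = call_llm(meta_prompt)
print(improved_prompt)
```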

 

So, what did you think? Even the simple term "prompt" has a lot of depth, doesn't it? As generative AI continues to evolve and methods for building multi-agent AI systems advance, I believe prompt engineering itself will also continue to evolve. It's definitely something to keep an eye on. I plan to provide an update on this topic in the near future.

That's all for today. Stay tuned!

 

ToshiStats Co., Ltd. offers various AI-related services. Please check them out here!

 

Copyright © 2025 Toshifumi Kuga. All rights reserved.

(1) "Prompt design at Parahelp", Parahelp, May 28, 2025

 






Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.





Google DeepMind's new prompt engineering technique, "Many-Shot In-Context Learning," is amazing!

I recently came across an interesting research paper, "Many-Shot In-Context Learning" (1), by Google DeepMind, and I'd like to share a brief overview. Although it's a highly technical paper, it offers valuable insights that we can apply to our own prompt writing. Let's dive in.

 

1. Utilizing Context Effectively

When you write prompts for language models or generative AI like ChatGPT, you probably type in a short query, much as you would in a search engine, such as "What is the capital of Japan?" However, generative AI can handle much larger amounts of information. For example, as shown in the chart below, you can load a PDF document and then write a prompt like "Summarize this," and the AI will output a summary of the PDF's content. Think of a prompt as an "instruction to the generative AI"; the additional information you provide alongside it is called the context.
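
In practice, "prompt plus context" often just means concatenating the instruction and the document text into one request. Here is a minimal sketch, with call_llm as a hypothetical stand-in for any model API and a short string standing in for the extracted PDF text.

```python
# Sketch: sending an instruction plus supporting context in one request (illustrative).
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "[summary of the provided document]"

# In practice this would be text extracted from a PDF; a short string stands in here.
document = (
    "Q3 revenue grew 12% year over year, driven by the new subscription tier. "
    "Churn rose slightly in the enterprise segment."
)

prompt = "Summarize the following document in three bullet points."
response = call_llm(prompt + "\n\n--- context ---\n" + document)
print(response)
```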

 


2. What's Needed to Use Generative AI in a Business Setting

Now that we have a basic understanding of how to use generative AI, let's consider what's needed to use it in a company or business setting. Obviously, when you represent your company and interact with customers, you don't express personal opinions or feelings; you wouldn't say, "I personally don't think this new product will sell." Companies have established rules and manuals that employees must follow, and employees normally cannot violate them. Therefore, to use generative AI in a company, it must output answers that comply with each company's rules and manuals, not just generic answers.

So how do you convey these rules to the generative AI? One way is to input the rules and manuals directly, along with the prompt, as shown in the chart above. Many recent generative AI models have "context windows" of 100,000 tokens or more. The context window is the amount of information the model can take in and produce at once; 100,000 tokens is roughly 70,000 English words, so you can input a considerable amount of rules and manuals. Some models, like Google's Gemini 1.5 Pro, accept up to 2 million tokens, enough for about 3,000 pages of English manuals. That's amazing. These large context windows are sometimes called "long context windows."
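
A quick back-of-the-envelope check, using the rough ratio implied above (about 0.7 English words per token; the exact figure depends on the tokenizer and the text), shows how many pages of manuals fit into these windows:

```python
# Rough estimate of whether a company manual fits in a model's context window.
# The words-per-token ratio (~0.7 for English) is a rule of thumb, not exact.
WORDS_PER_TOKEN = 0.7

def words_that_fit(context_window_tokens: int) -> int:
    return int(context_window_tokens * WORDS_PER_TOKEN)

for model, window in [("100k-token model", 100_000), ("Gemini 1.5 Pro (2M tokens)", 2_000_000)]:
    words = words_that_fit(window)
    pages = words // 500  # assuming roughly 500 words per printed page
    print(f"{model}: ~{words:,} words, roughly {pages:,} pages of manuals")
```

These rough numbers line up with the figures quoted above: around 70,000 words for a 100,000-token window, and on the order of 3,000 pages for 2 million tokens.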

 


3. Many-Shot In-Context Learning

"Many-Shot In-Context Learning" is a technique that utilizes these "long context windows" even more effectively. You may have heard of a similar term, "Few-Shot Learning." "Few-Shot Learning" is a method where you first provide the generative AI with a few "question and answer pairs" as examples and then ask the question you want to know. For instance, you might give examples like "The capital of the United States is Washington, D.C." and "The capital of China is Beijing," and then ask the AI, "What is the capital of Japan?" "Many-Shot In-Context Learning" increases the number of these "question and answer pairs" to 10-10,000. This is said to improve accuracy. The graph below shows that in machine translation and summarization tasks, increasing the number of examples to 500-1,000 improves accuracy. 2 to the power of 10 is 1024. The idea is to put as many examples as possible into the "long context window" since it can easily handle them.

(Figure: the relationship between accuracy and the number of examples in machine translation and summarization tasks)
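
The mechanics themselves are simple: you build the prompt from many question-answer pairs and append the real question at the end. A toy sketch (the examples and call_llm are placeholders for illustration):

```python
# Building a many-shot prompt from question-answer example pairs (toy illustration).
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "Tokyo"

examples = [
    ("What is the capital of the United States?", "Washington, D.C."),
    ("What is the capital of China?", "Beijing"),
    # ...in many-shot in-context learning this list would hold hundreds or thousands
    # of pairs, limited only by the model's context window.
]

shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
prompt = shots + "\n\nQ: What is the capital of Japan?\nA:"
print(call_llm(prompt))
```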

 


What do you think? If simply increasing the number of examples improves accuracy, it might be worth trying. For those who say, "I can't create that many examples myself," the paper also suggests a method for generating synthetic examples with an LLM. If you're interested, please check out the paper. And if it's only about 10 examples, you could probably create them yourself. I'll give it a try and post an update here if I get good results. That's all for today. Stay tuned!

 






1) "Many-Shot In-Context Learning", Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle, Google DeepMind, 22 May 2024,  https://arxiv.org/abs/2404.11018



Copyright © 2024 Toshifumi Kuga. All rights reserved.




Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.