Since GPT-5 arrived, many practitioners have been saying that "the writing style is different from GPT-4o and earlier" and that "its performance as an agent is on another level." Here, we build on the key points of OpenAI's "GPT-5 Prompt Guide" (1) and organize, from a practical perspective, how to write prompts that reliably reproduce the behavior you want. The following three points are key:

GPT-5 acts very proactively as an AI agent.
Self-reflection and guiding principles.
Instruction following with "surgical precision."

Let's delve into each of these.

1. GPT-5 acts very proactively as an AI agent.

GPT-5's enhanced capabilities in tool-calling, understanding long contexts, and planning allow it to proceed autonomously even with ambiguous tasks. Whether you "harness" or "suppress" this capability depends on how you design the agent's "eagerness."

1-1. Controlling Eagerness with Prompts

To suppress eagerness, intentionally limit the depth of exploration and explicitly set caps on parallel searches or additional tool calls. This is effective in situations where processing time and cost are priorities, or when requirements are clear and exploration needs to be minimized.

To enhance eagerness, explicitly state rules for persistence, such as "Do not end the turn until the problem is fully resolved" and "Even with uncertainty, proceed with the best possible plan." This is suitable for long-duration tasks where you want the agent to see them through to completion with minimal check-ins with the user.

Practical Snippet (To suppress eagerness):

<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>

Practical Snippet (To encourage eagerness):

<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>
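To switch between the two snippets above per task, I find it handy to keep them as templates and attach exactly one of them to the system prompt. Here is a minimal sketch in Python. The function name build_system_prompt and the "low"/"high" labels are my own assumptions for illustration, not something defined in the guide.

```python
# Block texts are copied from the two snippets above.
# The tool-call cap is a template field so it can be tuned per task.
CONTEXT_GATHERING = """<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of {max_tool_calls} tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>"""

PERSISTENCE = """<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>"""


def build_system_prompt(base: str, eagerness: str, max_tool_calls: int = 2) -> str:
    """Append exactly one eagerness block to a base system prompt.

    eagerness is a hypothetical label: "low" suppresses exploration,
    "high" encourages persistence.
    """
    if eagerness == "low":
        if max_tool_calls < 0:
            raise ValueError("max_tool_calls must be non-negative")
        block = CONTEXT_GATHERING.format(max_tool_calls=max_tool_calls)
    elif eagerness == "high":
        block = PERSISTENCE
    else:
        raise ValueError(f"unknown eagerness: {eagerness!r}")
    return f"{base.strip()}\n\n{block}"


if __name__ == "__main__":
    print(build_system_prompt("You are a research assistant.", "low"))
```

Attaching only one block at a time keeps the two sets of instructions from contradicting each other in the same prompt.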

1-2. Visualize with a "Tool Preamble"

When the agent produces a long rollout during execution, having it provide brief updates (restating the objective, outlining the plan, noting progress, and confirming completion) makes it easier for the user to follow along and creates a better user experience.

Recommended Snippet:

<tool_preambles>
First, restate the user's goal in a single sentence. Follow with a bulleted list of the planned steps.
During execution, add concise progress logs sequentially.
Finally, provide a summary that clearly distinguishes between the "Plan" and the "Actual Results."
</tool_preambles>

2. Self-reflection and Guiding Principles

GPT-5 excels at "internally refining" the quality of its output through self-reflection. However, if the criteria for judging quality are not established beforehand, this reflection can become unproductive. This is where guiding principles and a private rubric are effective.

2-1. Provide a "Self-Grading Scorecard" with a Private Rubric

For zero-to-one generation tasks (e.g., creating a new web app, drafting specifications), have the model internally create a scorecard with 5-7 evaluation criteria. Then, have it repeatedly rewrite and re-evaluate its output based on these criteria.

Rubric Generation Snippet:

<self_reflection>
Define the conditions that a world-class deliverable should meet across 5-7 categories (e.g., UI quality, readability, robustness, extensibility, accessibility, accountability). Score your own proposal against these criteria, identify shortcomings, and redesign. The rubric itself should not be shown to the user.
</self_reflection>

2-2. Reduce Inconsistency with Guiding Principles

For ongoing development or modifying existing code, first provide the project's conventions by clearly stating its design principles, directory structure, and UI standards. This ensures that the model's suggested improvements and changes integrate naturally with the existing culture.

Guiding Principles Snippet (Example):

<guiding_principles>
Clarity and Reusability: Keep components small and reusable. Group them and avoid duplication.
Consistency: Unify tokens, typography, and spacing.
Simplicity: Avoid unnecessary complexity in styling and logic.
</guiding_principles>

2-3. Separately Control Verbosity and Reasoning Effort

GPT-5 lets you control verbosity (the length of the final answer) and reasoning_effort (the depth of thought) independently. On top of these settings, the prompt can apply context-specific overrides, such as "be concise in prose, but provide detailed explanations in code." The guide introduces a practical example of prompt tuning by Cursor, which is worth checking out. A useful tip for fast mode (minimal reasoning) is to ask the model to begin its answer with a brief summary of its thinking or plan, which helps compensate for the reduced reasoning.
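To make the independence of the two settings concrete, here is a small sketch that only builds the request parameters and does not call any API. The nested keys ("reasoning" with "effort", "text" with "verbosity") and the allowed values reflect my understanding of OpenAI's Responses API at the time of writing, so please treat them as assumptions and check the official API reference before relying on them. The helper name build_request is hypothetical.

```python
from typing import Any, Dict

# Assumed allowed values; verify against the current API reference.
ALLOWED_EFFORT = {"minimal", "low", "medium", "high"}
ALLOWED_VERBOSITY = {"low", "medium", "high"}


def build_request(
    prompt: str,
    reasoning_effort: str = "medium",
    verbosity: str = "medium",
    model: str = "gpt-5",
) -> Dict[str, Any]:
    """Build request parameters with the two knobs set independently."""
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": reasoning_effort},  # depth of thought
        "text": {"verbosity": verbosity},           # length of the final answer
    }


if __name__ == "__main__":
    # Deep thinking, short answer: the two settings do not constrain each other.
    print(build_request("Summarize this design doc.", "high", "low"))
```

If the assumptions above hold, the resulting dictionary could be passed to the official Python SDK as keyword arguments, for example client.responses.create(**request).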

3. GPT-5's Instruction Following has "Surgical Precision"

GPT-5 is extremely sensitive to the accuracy and consistency of instructions. Contradictory requests or ambiguous prompts waste reasoning resources and degrade output quality. Therefore, it is crucial to "structure" your instruction hierarchy to prevent contradictions before they occur.

3-1. Design to Avoid Contradictions

Take the example of a healthcare administrator scheduling a patient appointment based on symptoms. "Exceptions," such as altering preceding steps only in emergencies, must be clearly stated so they do not conflict with standard procedures.

Bad Example: The instructions "Do not schedule without consent" and "First, automatically secure the fastest same-day slot" coexist, so the model cannot satisfy both.
Correct Example: "Always check the profile" and "In an emergency, immediately direct to 911" can coexist as long as the emergency exception is declared first, so it clearly overrides the standard procedure.
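One way to keep the exception ahead of the standard procedure, even when many people edit the prompt, is to store the rules as data and always render the exceptions first. The following is only a sketch under my own assumptions: the Rule class, the is_exception flag, and the section labels are hypothetical and do not come from the guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    text: str
    is_exception: bool = False  # True if the rule overrides the standard procedure


def render_rules(rules: List[Rule]) -> str:
    """Render the rules so that exceptions are always declared first."""
    exceptions = [r for r in rules if r.is_exception]
    standard = [r for r in rules if not r.is_exception]
    lines: List[str] = []
    if exceptions:
        lines.append("Exceptions (these override the standard procedure):")
        lines.extend(f"- {r.text}" for r in exceptions)
    lines.append("Standard procedure (applies unless an exception above matches):")
    lines.extend(f"- {r.text}" for r in standard)
    return "\n".join(lines)


if __name__ == "__main__":
    rules = [
        Rule("Always check the profile."),
        Rule("In an emergency, immediately direct to 911.", is_exception=True),
    ]
    print(render_rules(rules))
```

Note that this only fixes the ordering. It does not detect contradictions such as the bad example above, so a manual review of the rules is still needed.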

OpenAI offers the following warning:

We understand that the process of building prompts is an iterative one, and that many prompts are living documents, constantly being updated by different stakeholders. But that’s why it is even more important to thoroughly review for instructions that are phrased improperly. We have already seen multiple early users discover ambiguities and contradictions within their core prompt libraries when they did such a review. Removing them dramatically streamlined and improved GPT-5's performance. We encourage you to test your prompts with our Prompt Optimizer tool to identify these kinds of issues.

How was that? In this article, we explored key points for prompt design from OpenAI's GPT-5 Prompt Guide (1). GPT-5 is a "partner in practice," combining powerful autonomy with precise instruction following. Try incorporating the points discussed today into your prompts and take your AI agents to the next level. That's all for today. Stay tuned!

1) GPT-5 prompting_guide, OpenAI, August 7, 2025

You can enjoy our video news ToshiStats-AI from this link, too!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Lately, I've been hearing a lot about multi-agent AI systems. As someone who is always thinking about not just using these services but building them myself, I've been keen to know how to construct high-performance AI agents. Last week, Anthropic published an article titled "How we built our multi-agent research system" (1), which describes their construction method in detail. So today, using this article as a reference, I'd like to explore the best practices for creating multi-agent AI systems with all of you. Let's get started!

1. Why do we need so many agents?

ChatGPT, which debuted at the end of November 2022, was a single model. Since then, several services using generative AI have appeared, but initially, most of them used a single AI. So why have we recently seen a rise in methods that connect multiple generative AIs to operate as a single system? I believe it's because it has become clear that there are limits to what a single generative AI can accomplish when faced with complex tasks. It has gradually become apparent that by connecting and integrating several agents, even complex tasks can be handled. This trend has become particularly noticeable in conjunction with the performance improvements of standalone generative AI models like Gemini 1.5 Pro and OpenAI's o3.

2. What kind of agent structure should we build?

The Anthropic article included a wonderful chart that I'd love to reference. The key lies with the "Lead agent" and the "sub-agents" placed beneath it.

Here is Anthropic's explanation: "The multi-agent architecture in action: user queries flow through a lead agent that creates specialized subagents to search for different aspects in parallel." While the chart shows three sub-agents, more may naturally be needed to handle more complex tasks.
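To get a feel for this structure, here is a toy sketch of a lead agent fanning out to sub-agents in parallel. It is only an illustration under my own assumptions, not Anthropic's implementation: run_subagent is a stand-in that returns a fixed string where a real system would call a model with tools, and the aspects are passed in by hand instead of being decided by the lead agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_subagent(subtask: str) -> str:
    """Stand-in for a real sub-agent; a real one would call a model with tools."""
    return f"[findings for: {subtask}]"


def lead_agent(query: str, aspects: List[str]) -> str:
    """Split the query into one subtask per aspect and run the sub-agents in parallel."""
    if not aspects:
        raise ValueError("at least one aspect is required")
    subtasks = [f"{query} ({aspect})" for aspect in aspects]
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # pool.map returns results in the same order as the subtasks.
        results = list(pool.map(run_subagent, subtasks))
    return "\n".join(results)


if __name__ == "__main__":
    print(lead_agent("semiconductor shortage", ["causes", "impact", "outlook"]))
```

Threads are a reasonable fit here because real sub-agents would spend most of their time waiting on network calls.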

3. How do you coordinate many agents?

I've described the move to multi-agent AI as if it's all upside, but it requires numerous AI agents to function as expected. Getting a desired response from a single generative AI can be quite a challenge, so is it even possible to control multiple, simultaneously operating AI agents to meet our expectations? The key seems to lie in the "prompt." In fact, the Anthropic article contains many very helpful techniques for writing prompts. Here, I'd like to introduce two representative examples. For the rest, I highly recommend reading the original article for yourself.

"Teach the orchestrator how to delegate. In our system, the lead agent decomposes queries into subtasks and describes them to subagents. Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries. Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information."

"Guide the thinking process. Extended thinking mode, which leads Claude to output additional tokens in a visible thinking process, can serve as a controllable scratchpad. The lead agent uses thinking to plan its approach, assessing which tools fit the task, determining query complexity and subagent count, and defining each subagent's role."

In a nutshell, I think it comes down to "describing things meticulously." Apparently, simple and short instructions like "Research the semiconductor shortage" did not work well, so it seems necessary to write prompts for multi-agent AI as meticulously as possible. I'm going to work on writing better prompts from now on.
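To make "describing things meticulously" a habit, the four items from the quote above (an objective, an output format, guidance on tools and sources, and task boundaries) can be made required fields of a task description. This is just a sketch: the SubagentTask class and its field names are my own, and the example values are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SubagentTask:
    objective: str
    output_format: str
    tools_and_sources: str
    boundaries: str

    def to_prompt(self) -> str:
        """Render the task description, refusing to render if any field is blank."""
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing field: {f.name}")
        return (
            f"Objective: {self.objective}\n"
            f"Output format: {self.output_format}\n"
            f"Tools and sources: {self.tools_and_sources}\n"
            f"Task boundaries: {self.boundaries}"
        )


if __name__ == "__main__":
    # Hypothetical values, only to show the level of detail.
    task = SubagentTask(
        objective="Identify the main causes of the semiconductor shortage.",
        output_format="A bulleted list of causes, each with one supporting source.",
        tools_and_sources="Use web search; prefer industry reports and official statistics.",
        boundaries="Cover causes only; do not analyze impacts or forecasts.",
    )
    print(task.to_prompt())
```

Compared with a one-line instruction like "Research the semiconductor shortage," the blank-field check forces the lead agent (or the human writing the prompt) to fill in every item before delegating.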

What did you think? It appears that various techniques are necessary to make multi-agent AI systems operate as intended. As the performance of generative AI improves in the future, the required orchestration techniques will also change. I want to continue to stay updated and incorporate the latest cutting-edge technologies. That's all for today. Stay tuned!

ToshiStats Co., Ltd. provides a wide range of AI-related services. Please see here for more details!

1) "How we built our multi-agent research system", Anthropic, June 13, 2025

Let's Explore the Best Practices for Crafting GPT-5 Prompts!