
Let's Explore the Best Practices for Crafting GPT-5 Prompts!

We are already hearing from many in the field that with the arrival of GPT-5, "the writing style is different from GPT-4o and earlier" and "its performance as an agent is on another level." Here, we will build upon the key points from OpenAI's "GPT-5 Prompting Guide" (1) and organize, from a practical perspective, how to write prompts that reliably reproduce the behaviors you want. The following three points are key:

  1. GPT-5 acts very proactively as an AI agent.

  2. Self-reflection and guiding principles.

  3. Instruction following with "surgical precision."

Let's delve into each of these.


1. GPT-5 acts very proactively as an AI agent.

GPT-5's enhanced capabilities in tool-calling, understanding long contexts, and planning allow it to proceed autonomously even with ambiguous tasks. Whether you "harness" or "suppress" this capability depends on how you design the agent's "eagerness".


1-1. Controlling Eagerness with Prompts

To suppress eagerness, intentionally limit the depth of exploration and explicitly set caps on parallel searches or additional tool calls. This is effective in situations where processing time and cost are priorities, or when requirements are clear and exploration needs to be minimized.

To enhance eagerness, explicitly state rules for persistence, such as "Do not end the turn until the problem is fully resolved" and "Even with uncertainty, proceed with the best possible plan." This is suitable for long-running tasks that you want the agent to see through to completion with minimal check-ins with the user.

Practical Snippet (To suppress eagerness):

<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>

Practical Snippet (To encourage eagerness):

<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>
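
As a concrete illustration, here is a minimal sketch of how such a snippet can be passed to GPT-5 as system-level instructions, assuming the OpenAI Python SDK and its Responses API (the model name, task text, and effort setting are placeholders to adapt to your own use case):

from openai import OpenAI

client = OpenAI()

# Pick the snippet that matches the eagerness you want for this task.
low_eagerness_rules = """<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Budget: A maximum of 2 tool calls.
</context_gathering>"""

response = client.responses.create(
    model="gpt-5",
    instructions=low_eagerness_rules,   # system-level behavior rules
    reasoning={"effort": "low"},        # pair a small search budget with lower effort
    input="Find the latest stable version of Python and report it in one line.",
)
print(response.output_text)

Swapping in the <persistence> snippet (and a higher reasoning effort) flips the same call toward long-running, autonomous behavior.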

1-2. Visualize with a "Tool Preamble"

When the agent outputs a long rollout during execution, having it first provide a brief summary—explaining the objective, outlining the plan, noting progress, and confirming completion—makes it easier for the user to follow along and creates a better user experience.

Recommended Snippet:

<tool_preambles>
First, restate the user's goal in a single sentence. Follow with a bulleted list of the planned steps.
During execution, add concise progress logs sequentially.
Finally, provide a summary that clearly distinguishes between the "Plan" and the "Actual Results."
</tool_preambles>
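
With this preamble in place, the start of a run might look like the following (an illustrative mock-up, not actual GPT-5 output):

Goal: Migrate the project's tests from unittest to pytest.
Plan:
  • Inventory existing test files and fixtures.
  • Convert assertions and setup/teardown methods.
  • Run the suite and fix any regressions.
Progress: Inventory complete (12 files). Converting assertions...
Summary: Plan vs. Actual Results: all three steps completed; one flaky test was additionally quarantined.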
 
 

2. Self-reflection and Guiding Principles

GPT-5 excels at "internally refining" the quality of its output through self-reflection. However, if the criteria for judging quality are not established beforehand, this reflection can become unproductive. This is where guiding principles and a private rubric are effective.


2-1. Provide a "Self-Grading Scorecard" with a Private Rubric

For zero-to-one generation tasks (e.g., creating a new web app, drafting specifications), have the model internally create a scorecard with 5-7 evaluation criteria. Then, have it repeatedly rewrite and re-evaluate its output based on these criteria.

Rubric Generation Snippet:

<self_reflection>
Define the conditions that a world-class deliverable should meet across 5-7 categories (e.g., UI quality, readability, robustness, extensibility, accessibility, accountability). Score your own proposal against these criteria, identify shortcomings, and redesign. The rubric itself should not be shown to the user.
</self_reflection>

2-2. Reduce Inconsistency with Guiding Principles

For ongoing development or modifying existing code, first provide the project's conventions by clearly stating its design principles, directory structure, and UI standards. This ensures that the model's suggested improvements and changes integrate naturally with the existing culture.

Guiding Principles Snippet (Example):

<guiding_principles>
Clarity and Reusability: Keep components small and reusable. Group them and avoid duplication.
Consistency: Unify tokens, typography, and spacing.
Simplicity: Avoid unnecessary complexity in styling and logic.
</guiding_principles>

2-3. Separately Control Verbosity and Reasoning Effort

GPT-5 lets you control verbosity (the length of the final answer) and reasoning_effort (the depth of thought) independently. This allows for context-specific overrides, such as "be concise in prose, but provide detailed explanations in code." The guide introduces a practical example of prompt tuning by Cursor, which is worth checking out. A useful tip for fast mode (minimal reasoning effort) is to require a brief summary of the model's thinking or plan at the start of the answer, which helps anchor the rest of its process.
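
For example, the two knobs can be set independently per request. The following is a minimal sketch assuming the OpenAI Python SDK; the parameter shapes (reasoning={"effort": ...} and text={"verbosity": ...}) follow the GPT-5 Responses API, and the prompt text is a placeholder:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},  # depth of thought: "fast mode"
    text={"verbosity": "low"},        # length of the final answer
    # Context-specific override: terse prose, but detailed code.
    instructions=(
        "Answer concisely. When you output code, comment it thoroughly. "
        "Begin your answer with a one-line summary of your plan."
    ),
    input="Write a Python function that deduplicates a list while preserving order.",
)
print(response.output_text)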

 
 


3. GPT-5's Instruction Following has "Surgical Precision"

GPT-5 is extremely sensitive to the accuracy and consistency of instructions. Contradictory requests or ambiguous prompts waste reasoning resources and degrade output quality. Therefore, it is crucial to "structure" your instruction hierarchy to prevent contradictions before they occur.



3-1. Design to Avoid Contradictions

Take the example of a healthcare administrator scheduling a patient appointment based on symptoms. Exceptions, such as rules that alter the standard steps only in emergencies, must be stated clearly so they do not conflict with the standard procedure.

  • Bad Example: The instructions "Do not schedule without consent" and "First, automatically secure the fastest same-day slot" coexist.

  • Correct Example: When "Always check the profile" and "In an emergency, immediately direct to 911" coexist, the exception rule is declared first, as in the snippet below.
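
One hypothetical way to write this hierarchy in the same snippet style as above (the wording is illustrative; adapt it to your own domain and safety requirements):

<instruction_hierarchy>
Exception (highest priority): If symptoms suggest an emergency, direct the patient to call 911 immediately and skip every step below.
Standard procedure: Always check the patient's profile first, then propose available slots, and schedule only after explicit patient consent.
</instruction_hierarchy>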

OpenAI offers the following warning:

We understand that the process of building prompts is an iterative one, and that many prompts are living documents, constantly being updated by different stakeholders. But that’s why it is even more important to thoroughly review for instructions that are phrased improperly. We have already seen multiple early users discover ambiguities and contradictions within their core prompt libraries when they did such a review. Removing them dramatically streamlined and improved GPT-5's performance. We encourage you to test your prompts with our Prompt Optimizer tool to identify these kinds of issues.

 
 

How was that? In this article, we explored key points for prompt design from OpenAI's GPT-5 Prompting Guide (1). GPT-5 is a "partner in practice," combining powerful autonomy with precise instruction following. Try incorporating the points discussed today into your prompts and take your AI agents to the next level. That's all for today. Stay tuned!

 
 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) "GPT-5 Prompting Guide", OpenAI, August 7, 2025

You can enjoy our video news ToshiStats-AI from this link, too!

 

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

"REST MEETS REACT" is a new prompt-engineering method using synthetic data. It holds immense potential for enhancing AI without relying on human-generated data

Happy New Year! Thank you for your continued support. Right at the start of the year, Google DeepMind has announced an advanced new prompt-engineering method. It is a paper titled "REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT" (1). It incorporates fine-tuning with synthetic data, which looks promising! Let's get started.

 

1. Prompt Structure

This prompt is designed with a web Q&A system in mind, one that answers complex questions. The structure is as follows:

The blue part in the figure above represents the flow of the agent described in the prompt, which aims to answer complex questions using web search. In the latter half, "Relevance self-check" and "Grounding self-check" are functions the agent uses to check its own answers. For a detailed explanation of the entire flow, please refer to the paper.
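
As a rough illustration of that flow, here is a hedged sketch in Python; every function is a hypothetical stub standing in for an LLM or search call, not an API from the paper's codebase:

def web_search(query: str) -> str:
    # Stub: stands in for a real web-search tool call.
    return f"[results for: {query}]"

def plan_next_search(question: str, evidence: list) -> str | None:
    # Stub: decide whether more evidence is needed; here, search once.
    return question if not evidence else None

def generate_answer(question: str, evidence: list) -> str:
    # Stub: stands in for the LLM producing an answer from evidence.
    return f"Answer to '{question}' based on {len(evidence)} source(s)."

def self_check(kind: str, question: str, answer: str) -> bool:
    # Stub: the "Relevance" and "Grounding" self-checks from the prompt.
    return True

def answer_question(question: str, max_steps: int = 5) -> str:
    evidence = []
    for _ in range(max_steps):
        query = plan_next_search(question, evidence)
        if query is None:
            break
        evidence.append(web_search(query))
    answer = generate_answer(question, evidence)
    for kind in ("relevance", "grounding"):
        if not self_check(kind, question, answer):
            answer = generate_answer(question, evidence)  # one retry per check
    return answer

print(answer_question("Which paper introduced the ReAct prompting method?"))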

 

2. "Reward Model" - The Key to Success

Now, let's explain the core part of self-improvement. In a nutshell, it is about "creating new high-quality data and fine-tuning the model with it." This function consists of three parts:

  • Grow: Start with a model capable of running the search agent; the paper uses Google's PaLM 2-L model for this purpose. Trajectories are collected from a selected set of 2,000 public questions. "Trajectory," though perhaps an unfamiliar term, refers to the reasoning process and is commonly used in reinforcement learning.

  • Improve: Convert trajectories into data for fine-tuning, using the Reward model to select only high-quality data. No external data, such as labels, are used.

  • Fine-tuning: Fine-tune a new model of the same size with this new data, ensuring it performs better than the original.

This process is then repeated, with the improved model generating the next round of data. As a result, accuracy improves using only the original questions, without adding any external data. The accuracy of the Reward model's ranking is therefore crucial. In this paper, the Reward model is constructed as a set of prompts. Let's look more closely at these prompts; only the initial part is shown below.

  • The goal of this rating is to filter out bad actions so that they'll be excluded from the fine-tuning dataset.

  • Overall, we want the agent to produce relevant and grounded answers with minimal steps. Anything deviating from this goal is considered bad.

  • If any element (thoughts, comments, etc.) is empty, then it's automatically bad.

"Filter out" indicates a method of discarding items that don't meet the standards and adopting only the high-quality data that remains. Please see the paper (p19) for details.

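Putting the three parts together, the self-improvement loop looks roughly like the following sketch; all names here are hypothetical stand-ins (the real pipeline uses PaLM 2-L and a prompt-based reward model, as described above), and the scores are random stubs:

import random

def run_search_agent(model: str, question: str) -> dict:
    # Grow: run the agent and record its trajectory (reasoning trace).
    return {"question": question, "trace": f"{model} trace", "empty": False}

def reward_model_score(trajectory: dict) -> float:
    # Improve: rate a trajectory; in the paper this is a set of prompts.
    # Rule quoted above: empty thoughts/comments are automatically bad.
    if trajectory["empty"]:
        return 0.0
    return random.random()  # stub score

def fine_tune(model: str, data: list) -> str:
    # Fine-tune: train a same-size model on the filtered trajectories.
    return f"{model}-ft"

model = "base-model"
questions = [f"question {i}" for i in range(2000)]  # ~2,000 public questions

for round_number in range(3):  # the loop repeats with each improved model
    trajectories = [run_search_agent(model, q) for q in questions]
    # "Filter out": discard low-rated trajectories, keep only high quality.
    good = [t for t in trajectories if reward_model_score(t) >= 0.8]
    model = fine_tune(model, good)
    print(f"round {round_number}: {model}, kept {len(good)} trajectories")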
 




3. Improve Accuracy with Synthetic Data

In late 2023, this and other papers were published on using a reward model to create high-quality synthetic data for fine-tuning and accuracy improvement, and vigorous research yielding further results can be expected in 2024. In the LLM field especially, collecting high-quality training data is becoming increasingly difficult, and fine-tuning with synthetic data is anticipated as a solution.


 


How was it? Improving model accuracy with synthetic data promises to be a very effective development method for startups like ours, which cannot collect vast amounts of data on their own. Our blog will continue to follow synthetic data and other technological innovations, so stay tuned. Wishing you a great year!






1) "REST MEETS REACT: SELF-IMPROVEMENT FOR MULTI-STEP REASONING LLM AGENT", Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, and Sanjiv Kumar; Google Research, Google DeepMind, Google; December 15, 2023, https://arxiv.org/abs/2312.10003





Copyright © 2023 Toshifumi Kuga. All rights reserved




