
Let's Explore the Best Practices for Crafting GPT-5 Prompts!

We are already hearing from many in the field that, with the arrival of GPT-5, "the writing style is different from GPT-4o and earlier" and "its performance as an agent is on another level." Here, we build on the key points from OpenAI's "GPT-5 Prompt Guide (1)" and organize, from a practical perspective, how to write prompts that reliably reproduce the behaviors you want. The following three points are key:

  1. GPT-5 acts very proactively as an AI agent.

  2. Self-reflection and guiding principles.

  3. Instruction following with "surgical precision."

Let's delve into each of these.


1. GPT-5 acts very proactively as an AI agent.

GPT-5's enhanced capabilities in tool-calling, long-context understanding, and planning allow it to proceed autonomously even on ambiguous tasks. Whether you harness or suppress this capability depends on how you design the agent's "eagerness".


1-1. Controlling Eagerness with Prompts

To suppress eagerness, intentionally limit the depth of exploration and explicitly set caps on parallel searches or additional tool calls. This is effective in situations where processing time and cost are priorities, or when requirements are clear and exploration needs to be minimized.

To enhance eagerness, explicitly state rules for persistence, such as "Do not end the turn until the problem is fully resolved" and "Even with uncertainty, proceed with the best possible plan." This is suitable for long-duration tasks where you want the agent to see them through to completion with minimal check-ins with the user.

Practical Snippet (To suppress eagerness):

<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>

Practical Snippet (To encourage eagerness):

<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>

1-2. Visualize with a "Tool Preamble"

When the agent outputs a long rollout during execution, having it explain the objective up front, outline the plan, note progress along the way, and confirm completion at the end makes it easier for the user to follow along and creates a better user experience.

Recommended Snippet:

<tool_preambles>
First, restate the user's goal in a single sentence. Follow with a bulleted list of the planned steps.
During execution, add concise progress logs sequentially.
Finally, provide a summary that clearly distinguishes between the "Plan" and the "Actual Results."
</tool_preambles>
 
 

2. Self-reflection and Guiding Principles

GPT-5 excels at "internally refining" the quality of its output through self-reflection. However, if the criteria for judging quality are not established beforehand, this reflection can become unproductive. This is where guiding principles and a private rubric are effective.


2-1. Provide a "Self-Grading Scorecard" with a Private Rubric

For zero-to-one generation tasks (e.g., creating a new web app, drafting specifications), have the model internally create a scorecard with 5-7 evaluation criteria. Then, have it repeatedly rewrite and re-evaluate its output based on these criteria.

Rubric Generation Snippet:

<self_reflection>
Define the conditions that a world-class deliverable should meet across 5-7 categories (e.g., UI quality, readability, robustness, extensibility, accessibility, accountability). Score your own proposal against these criteria, identify shortcomings, and redesign. The rubric itself should not be shown to the user.
</self_reflection>

2-2. Reduce Inconsistency with Guiding Principles

For ongoing development or modifying existing code, first provide the project's conventions by clearly stating its design principles, directory structure, and UI standards. This ensures that the model's suggested improvements and changes integrate naturally with the existing culture.

Guiding Principles Snippet (Example):

<guiding_principles>
Clarity and Reusability: Keep components small and reusable. Group them and avoid duplication.
Consistency: Unify tokens, typography, and spacing.
Simplicity: Avoid unnecessary complexity in styling and logic.
</guiding_principles>

2-3. Separately Control Verbosity and Reasoning Effort

GPT-5 can control its verbosity (the length of the final answer) and its reasoning_effort (the depth of thought) independently. This allows for context-specific overrides, such as "be concise in prose, but provide detailed explanations in code." The guide introduces a practical example of prompt tuning by Cursor, which is worth checking out. A useful tip for fast mode (minimal reasoning) is to require a brief summary of its thinking or plan at the start of the answer, which helps compensate for the reduced reasoning depth.
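As a minimal sketch (assuming the OpenAI Python SDK's Responses API; the parameter names below follow the GPT-5 launch documentation and may change), setting the two controls independently looks like this:

# Sketch: control answer length and thinking depth separately for GPT-5.
# Assumption: the Responses API accepts reasoning.effort and text.verbosity.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},  # depth of thought: minimal / low / medium / high
    text={"verbosity": "low"},        # length of the final answer: low / medium / high
    input="Explain the trade-offs of adding a message queue between two services. "
          "Be concise in prose, but show code examples in full.",
)

print(response.output_text)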

 
 


3. GPT-5's Instruction Following has "Surgical Precision"

GPT-5 is extremely sensitive to the accuracy and consistency of instructions. Contradictory requests or ambiguous prompts waste reasoning resources and degrade output quality. Therefore, it is crucial to "structure" your instruction hierarchy to prevent contradictions before they occur.



3-1. Design to Avoid Contradictions

Take the example of a healthcare administrator scheduling a patient appointment based on symptoms. Exceptions, such as deviating from the standard steps only in an emergency, must be stated clearly so they do not conflict with the standard procedure.

  • Bad Example: The instructions "Do not schedule without consent" and "First, automatically secure the fastest same-day slot" coexist.

  • Correct Example: When "Always check the profile" and "In an emergency, immediately direct to 911" coexist, the exception rule is declared first, as in the snippet below.
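An instruction hierarchy along these lines (our own illustrative example, not taken verbatim from the guide) keeps the exception and the standard flow from colliding:

<scheduling_rules>
Exception (highest priority): If the patient reports emergency symptoms, direct them to call 911 immediately and skip all scheduling steps below.
Standard procedure: Always look up the patient's profile before proposing appointment slots.
Consent: Never finalize an appointment without the patient's explicit confirmation.
</scheduling_rules>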

OpenAI offers the following warning:

We understand that the process of building prompts is an iterative one, and that many prompts are living documents, constantly being updated by different stakeholders. But that’s why it is even more important to thoroughly review for instructions that are phrased improperly. We have already seen multiple early users discover ambiguities and contradictions within their core prompt libraries when they did such a review. Removing them dramatically streamlined and improved GPT-5's performance. We encourage you to test your prompts with our Prompt Optimizer tool to identify these kinds of issues.

 
 

How was that? In this article, we explored key points for prompt design from OpenAI's GPT-5 Prompt Guide (1). GPT-5 is a "partner in practice," combining powerful autonomy with precise instruction following. Try incorporating the points discussed today into your prompts and take your AI agents to the next level. That's all for today. Stay tuned!

 
 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) "GPT-5 Prompting Guide," OpenAI, August 7, 2025

You can enjoy our video news ToshiStats-AI from this link, too!

 

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Prompt Optimization: The Secret to Building Better AI Agents?

The instructions that humans write for generative AI are called "prompts." There are many books and blogs out there that offer guidance on how to write them. Many of you have probably tried, and it's surprisingly difficult, isn't it? While no programming language is required, you have to go through a lot of trial and error to get the output you want from a generative AI. This process can be quite time-consuming, isn't well-systematized, and you often have to start from scratch for each new task.

So, this time, we'd like to experiment with "what happens if we have a generative AI write the prompts for us?" Let's get started.

 


1. Prompt Optimization

In 2023, Google DeepMind released a research paper titled "LARGE LANGUAGE MODELS AS OPTIMIZERS"(1).

This paper explored the use of LLMs to optimize prompts, and the approach seems to have worked well for several tasks. A human writes the initial prompt, and subsequent improvements are delegated to an LLM (the optimizer). An LLM can also judge whether each result is successful (the evaluator), which means the approach can be applied even without labeled data providing the correct answers. This is very helpful, as tasks involving generative AI often lack labeled data. Below is a flowchart of this process, which is effectively the automation of prompt engineering, commonly referred to as "prompt optimization." The specific method we adopted for this experiment is called OPRO (Optimization by PROmpting).


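To make the loop concrete, here is a minimal Python sketch of an OPRO-style optimizer. It is our own illustration, not DeepMind's code; call_llm is a hypothetical wrapper around whichever model API you use, and the scoring here uses labeled examples, as in our experiment below.

from typing import Callable

def opro_optimize(
    call_llm: Callable[[str], str],    # hypothetical wrapper: prompt in, model text out
    meta_prompt: str,                  # instructions telling the LLM how to improve prompts
    examples: list[tuple[str, str]],   # (complaint_text, true_label) pairs
    seed_prompt: str,
    steps: int = 20,
) -> tuple[str, float]:
    # Evaluator: score a candidate prompt by classification accuracy on the examples.
    def accuracy(prompt: str) -> float:
        hits = sum(
            1 for text, label in examples
            if label.lower() in call_llm(f"{prompt}\n\nComplaint: {text}").lower()
        )
        return hits / len(examples)

    # Optimizer: the LLM proposes a new prompt from the meta-prompt plus the scored history.
    history = [(seed_prompt, accuracy(seed_prompt))]
    for _ in range(steps):
        scored = "\n".join(f"score={s:.2f}: {p}" for p, s in history)
        candidate = call_llm(f"{meta_prompt}\n\nPrevious prompts and their scores:\n{scored}")
        history.append((candidate, accuracy(candidate)))

    return max(history, key=lambda pair: pair[1])  # best prompt and its accuracy
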
2. Experiment with a Customer Complaint Classification Task

Similar to our blog post on July 26th, we set up a task to predict which financial product a bank customer's complaint is about: the LLM selects one of the following six financial products. We used gemini-2.5-flash for this experiment, with a sample of 100 customer complaints; a minimal classification call is sketched after the list below.

  • Mortgage

  • Checking or savings account

  • Student loan

  • Money transfer, virtual currency, or money service

  • Bank account or service

  • Consumer Loan
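For reference, a minimal classification call might look like the sketch below (assuming the google-genai Python SDK; the prompt wording is ours, not the exact prompt used in the experiment):

from google import genai

PRODUCTS = [
    "Mortgage",
    "Checking or savings account",
    "Student loan",
    "Money transfer, virtual currency, or money service",
    "Bank account or service",
    "Consumer Loan",
]

def classify_complaint(client: genai.Client, complaint: str) -> str:
    # Ask the model to pick exactly one product label for the complaint.
    prompt = (
        "Classify the following bank customer complaint into exactly one of these "
        f"financial products: {', '.join(PRODUCTS)}.\n\n"
        f"Complaint: {complaint}\n\n"
        "Answer with the product name only."
    )
    response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
    return response.text.strip()

# Example usage:
# client = genai.Client(api_key="YOUR_API_KEY")
# print(classify_complaint(client, "My mortgage payment was applied to the wrong loan."))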

In this experiment, the LLM handled the prompt generation, but a meta-prompt was necessary to further improve the resulting prompts. I wrote the meta-prompt as follows. Essentially, it tells the LLM to "please further improve the resulting prompt."

We had the LLM generate 20 prompts, and the results are shown below. The final number for each prompt is its accuracy: an accuracy of 0.8 means 80 out of 100 cases were classified correctly. Since this dataset comes with labels, calculating the accuracy was easy.

We adopted the second prompt from the list, which had the best accuracy of 0.89 in this experiment. When we ported this prompt to our regular experimental environment and ran it, the accuracy exceeded 0.9, as shown below. We've done this task many times before, but this is the first time we've surpassed 0.9 accuracy. That's amazing!


3. What Does the Future of Prompt Engineering Look Like?

As you can see, it seems possible to optimize prompts by leveraging the power of generative AI. Of course, when considering cost and time, the results might not always be worth the effort. Nevertheless, I feel there's a strong need for prompt automation. Researchers worldwide are currently exploring various methods, so many things that aren't possible now will likely become possible in the near future. Prompt engineering techniques will continue to evolve, and I'm looking forward to these technological developments and plan to try out various methods myself.

 

So, what did you think? The ability of an AI agent to fully utilize the power of generative AI and improve itself without human intervention is called "recursive self-improvement." At ToshiStats, we will continue to provide the latest updates on this topic. Please look forward to it. Stay tuned!

 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) "Large Language Models as Optimizers," Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen, Google DeepMind, 2023

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Prompt Engineering Mastery: The Fast Track

Since the debut of ChatGPT at the end of November 2022, the way we give instructions to computers has completely changed. Previously, programming languages like Python were necessary, but with ChatGPT, it's now possible to give instructions using the "natural languages" we use every day, such as English and Japanese. These natural language instructions are called "prompts." It has been about two and a half years since prompts came into use, and many people are likely experimenting with various prompts daily. As this is a new technology, systematically learning it can be challenging. However, Google has released a free white paper (1) of over 60 pages on the topic, so let's explore it for some hints. Let's begin!

 

1. Grasping the Basic Concepts

We often see simple prompt guides like "The Top 20 Prompts You Need to Know." However, it's impossible to effectively interact with a generative AI, which holds a vast amount of knowledge, with just about 20 prompts. While it may seem like a shortcut, memorizing a recommended list of 20 prompts each time is laborious and inefficient. Various studies are being conducted on how to write prompts, and the theoretical background is being investigated. While it's difficult for the average person to grasp everything, Google's white paper summarizes it concisely as follows:

  • Zero-shot prompting

  • Few-shot prompting

  • System prompting

  • Role prompting

  • Contextual prompting

  • Step-back prompting

  • Chain of thought

  • Self-consistency

  • Tree of thoughts

For example, the second method, "Few-shot prompting," is a technique to elicit more accurate answers from a generative AI by providing it with specific examples in "question and answer pairs." The other methods also have their own theoretical backgrounds and wide ranges of application. Rather than rote memorization, it's important to first understand the concepts and then apply them. I cannot explain them all here, so I encourage you to read the original document. I recommend taking your time to learn them one by one.
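As a concrete illustration of few-shot prompting, a prompt might look like the toy example below (our own example, not taken from the white paper):

Classify the sentiment of each review as Positive or Negative.
Review: "The battery lasts all day and the screen is gorgeous." -> Positive
Review: "It stopped working after two weeks." -> Negative
Review: "Setup was quick and the support team was helpful." ->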

 

2. Memorize Useful Words

That said, taking the first step to actually write a prompt can be quite daunting. Google has provided a list of recommended verbs, which I'd like to introduce here. Choosing from these verbs to craft your prompts might help you create good ones, so it's worth a try.

Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse (especially for sentences and data grammatically), Pick, Predict, Provide, Rank, Recommend, Return, Retrieve (information, etc.), Rewrite, Select, Show, Sort, Summarize, Translate, Write

When you're unsure what to write, these verbs might give you a hint. This list includes many that I frequently use myself.

 

3. Finding Hints from Actual Examples

When you actually try out prompts, you'll find that some cases work well while others don't. The white paper summarizes these into 15 Best Practices. Here, I'll introduce an example from page 56.

Be specific about the output

Be specific about the desired output. A concise instruction might not guide the LLM enough or could be too generic. Providing specific details in the prompt (through system or context prompting) can help the model to focus on what’s relevant, improving the overall accuracy.

Examples:

DO:

Generate a 3 paragraph blog post about the top 5 video game consoles. The blog post should be informative and engaging, and it should be written in a conversational style.

DO NOT:

Generate a blog post about video game consoles.

Indeed, we tend to write simple prompts like the bad example. However, if we can add a bit more information and write like the good example, the information we receive will be better tailored to our needs. Just knowing this can change how you write prompts from now on. This white paper is full of such examples, so I highly recommend you read it for yourself.

 

How was that? I hope this serves as a reference for your prompt learning journey. Prompt engineering is still in its infancy, making it a great time to start learning. Let's conclude with a message from Google: "You don’t need to be a data scientist or a machine learning engineer – everyone can write a prompt. (1)"

Stay tuned!




Copyright © 2025 Toshifumi Kuga. All rights reserved

1) "Prompt Engineering," Google, February 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.