AI agent

A Sweet Strategy: Selling Cakes in Wealthy Residential Areas!

Have you ever thought about starting a cake shop? As a cake lover myself, I often find myself wondering, "What kind of cake would be perfect?" However, developing a concrete business strategy is a real challenge. That's why this time, I'd like to conduct a case study with the support of an "AI marketing-agency." Let's get started.


1. Selling Cakes in an Upscale Kansai Neighborhood

The business scenario I've prepared for this case is a simple one:

Goal: To sell premium fruit cakes in the Kansai region.

  • Cake Features: Premium shortcakes featuring strawberries, peaches, and muscat grapes.

  • Target Audience: Women in their 20s to 40s living in upscale residential areas.

  • Stores: 3 cafes near Yamate Dentetsu Ashiya Station, 1 cafe near Kaigan Dentetsu Ashiya Station.

  • Direct Sales Outlet: 1 store inside the Yamate Dentetsu Ashiya Station premises.

  • Branding: The brand's primary color will be blue, with the website and logo also unified in blue.

  • Current Plan: In the process of planning a sales promotion for the autumn season.

From here, what kind of concrete business strategy can we derive? First, I'll input the business scenario into the AI marketing-agency.

The first thing it does is automatically generate 10 cool domain names.

It's hard to choose, but for now, I'll proceed with branding using "PremiumAshiyaCake.com".

 

2. A Practical Business Strategy

Now, let's ask the AI marketing-agency to formulate a business strategy for selling our premium fruit cakes in Kansai. When prompted to input the necessary information, I re-entered the business scenario, and the following business strategy was generated in about two minutes. Amazing!

It's a long document, over five pages, so I can't share it all, but here is the "Core of the Marketing Strategy."

  • Overall Approach: Direct Response that Inspires Aspiration

    • We will build an aspirational, luxury brand image through beautiful content, and then convert that desire into immediate store visits using precisely targeted calls-to-action (CTAs).

  • Core Message and Positioning:

    • Positioning Statement: For the discerning women of Kansai, Premium Ashiya Cake is the patisserie that transforms a moment into a cherished memory with its exquisitely crafted seasonal shortcakes.

    • Tagline / Core Message: "Premium Ashiya Cake: An exquisite moment, crafted for you."

  • Key Pillars of the Strategy:

    • Visual Elegance and a "Blue" Signature: All visuals must be of professional, magazine-quality. The brand color "blue" will be used as a sophisticated accent in styling—such as on blue ribbons, parts of the tableware, or as background elements—to create a recognizable and unique visual signature.

    • Hyper-local Exclusivity: Marketing efforts will be geographically and demographically laser-focused on the target audience residing in Ashiya and its surrounding affluent areas. This creates an "in-the-know" allure for locals.

    • Seasonal Storytelling: Treat each season's campaign as a major event. We will build a narrative around the star ingredients, such as Shine Muscat grapes from a specific partner farm, to build anticipation and justify the premium price point.

This is wonderfully practical content. The keywords I provided—"blue," "Ashiya," and "muscat"—have been skillfully integrated into the strategy.

 

3. The Logo is Excellent, Too—This is Usable!

Because I specified in the initial business scenario that I wanted to "unify the color scheme based on blue," it created this cool logo for me. It really looks like something I could use right away. Google's image generation AI, Imagen 3.0, is used here. The quality of this AI is always highly rated, so it's no surprise that the logo generated this time is also of outstanding quality.

 

So, what did you think of the AI marketing-agency? The business strategy is professional, and it's amazing how it automatically created the domain names and logo with such excellent results. Although I couldn't introduce it this time, it also includes a website creation feature. It's surprising that a tool this capable is available for free. A development kit called "Google ADK" is provided as open source, and the AI marketing-agency from this article can be downloaded and used for free as a sample (1). If you can use Python, I think you'll get the hang of it with a little practice. The only operational cost is the usage fee for Google Gemini 2.5 Pro, so the cost-effectiveness is outstanding. I encourage you all to give it a try.

Please note that this story is a work of fiction and does not represent anything that actually exists. That's all for today, stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!

1) Marketing Agency, Google, May 2025



Copyright © 2025 Toshifumi Kuga. All rights reserved.

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

How to Turn GPT-5 into a Pro Marketing Analyst with AI Agents!

A while back, I introduced a guide to prompting GPT-5, but it can be quite a challenge to write a perfect prompt from scratch. Not to worry! You can actually have GPT-5 write prompts for GPT-5. Pretty cool, right? Let's take a look at how.

 

1. Using GPT-5 to Do a Marketer's Job

I have some global sales data for stickers (1). Based on this data, I want to develop a sales strategy.

                 Global Sticker Sales Records

In a typical company, a data scientist would analyze the data, and a marketing manager would then create an action plan based on the results. We're going to see if we can get GPT-5 to handle this entire process. Of course, this requires a good prompt, but what kind of prompt is best? This is where it gets tricky. The principle I always adhere to is this: "Data analysis is a means, not an end." There are many data analysis methods, so the same data can be analyzed in various ways. However, what we really want is a sales strategy that boosts revenue. With this in mind, let's reconsider what makes a good prompt.

It's a bit of a puzzle, but I've managed to draft a preliminary version.

 

2. Using Metaprompting to Improve the Prompt with GPT-5

Now, let's have GPT-5 improve the prompt I quickly drafted. The image below shows the process. The first red box is my draft prompt.

                    Metaprompt

The second red box explicitly states the principle: "Perform data analysis with the goal of creating a marketing strategy." When you provide the data and run this prompt, GPT-5 produces the very detailed improvement suggestions you see below. I ran this process twice to get a better result.

                   Final Prompt
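As a rough illustration of the idea, the sketch below assembles a metaprompt as plain text. The function name `build_metaprompt`, the wording of the template, and the sample draft are all assumptions made for this example; they are not the prompts shown in the figures, and nothing is sent to a model.

```python
def build_metaprompt(draft_prompt: str, principle: str) -> str:
    """Wrap a draft prompt in a request asking the model to improve it."""
    return (
        "You are an expert prompt engineer.\n"
        "Improve the draft prompt below so that it better serves the stated principle.\n"
        "Explain what you would add, remove, or rephrase, then output the revised prompt.\n\n"
        f"Principle: {principle}\n\n"
        "<draft_prompt>\n"
        f"{draft_prompt.strip()}\n"
        "</draft_prompt>"
    )


# Hypothetical draft and principle, used only to show the shape of the text.
draft = "Analyze the attached sticker sales data and propose a sales strategy."
principle = "Perform data analysis with the goal of creating a marketing strategy."
metaprompt = build_metaprompt(draft, principle)
print(metaprompt)
```

Running the improved prompt through the same function a second time corresponds to the "ran it twice" step described above.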

 

3. The Result: GPT-5 Generates a Marketing Strategy!

Running the final prompt took about a minute and produced the following output. The detailed analysis and resulting insights are directly connected to marketing actions, staying true to our initial principle. It's fantastic.

The output is concise and perfect for busy executives. Creating this content on my own would likely take an entire day, but with GPT-5, the whole process, including the time I spent drafting the initial prompt, takes only about 30 minutes. This really shows how powerful GPT-5 is.

 

What do you think? This time, we explored a method for getting GPT-5 to improve its own prompts. This technique is called Metaprompting, and it's described in the OpenAI GPT-5 Prompting Guide (2).

I encourage you to try Metaprompting starting today and take your AI agent to the next level. That's all for now! Stay tuned!

 




1) Forecasting Sticker Sales, Kaggle, January 1, 2025

2) GPT-5 prompting_guide, OpenAI, August 7, 2025



Let's Explore the Best Practices for Crafting GPT-5 Prompts!

We are already hearing from many in the field that with the arrival of GPT-5, "the writing style is different from GPT-4o and earlier" and "its performance as an agent is on another level." Here, we will build upon the key points from OpenAI's "GPT-5 Prompt Guide (1)" and organize, from a practical perspective, "how to write prompts to stably reproduce desired behaviors." The following three points are key:

  1. GPT-5 acts very proactively as an AI agent.

  2. Self-reflection and guiding principles.

  3. Instruction following with "surgical precision."

Let's delve into each of these.

 




 

1. GPT-5 acts very proactively as an AI agent.

GPT-5's enhanced capabilities in tool-calling, understanding long contexts, and planning allow it to proceed autonomously even with ambiguous tasks. Whether you "harness" or "suppress" this capability depends on how you design the agent's "eagerness."


1-1. Controlling Eagerness with Prompts

To suppress eagerness, intentionally limit the depth of exploration and explicitly set caps on parallel searches or additional tool calls. This is effective in situations where processing time and cost are priorities, or when requirements are clear and exploration needs to be minimized.

To enhance eagerness, explicitly state rules for persistence, such as "Do not end the turn until the problem is fully resolved" and "Even with uncertainty, proceed with the best possible plan." This is suitable for long-duration tasks where you want the agent to see them through to completion with minimal check-ins with the user.

Practical Snippet (To suppress eagerness):

<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>

Practical Snippet (To encourage eagerness):

<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>
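The two snippets pull in opposite directions, so in practice you would include only one of them in a given prompt. Here is a minimal sketch of that choice in Python. The helper names `tagged_block` and `build_system_prompt` and the "research assistant" role line are assumptions for illustration; the instruction lines are shortened from the snippets above.

```python
def tagged_block(tag: str, lines: list[str]) -> str:
    """Render instruction lines inside an XML-style tag."""
    body = "\n".join(lines)
    return f"<{tag}>\n{body}\n</{tag}>"


def build_system_prompt(eager: bool) -> str:
    """Pick one eagerness policy so the two blocks never coexist."""
    if eager:
        block = tagged_block("persistence", [
            "Do not end the turn until the problem is completely resolved.",
            "Reason through uncertainty and continue with the best possible plan.",
        ])
    else:
        block = tagged_block("context_gathering", [
            "Goal: Reach a conclusion quickly with minimal information gathering.",
            "Budget: A maximum of 2 tool calls.",
        ])
    # The role line is a placeholder, not part of the guide's snippets.
    return "You are a research assistant.\n\n" + block


print(build_system_prompt(eager=False))
```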

1-2. Visualize with a "Tool Preamble"

When the agent outputs a long rollout during execution, having it first provide a brief summary—explaining the objective, outlining the plan, noting progress, and confirming completion—makes it easier for the user to follow along and creates a better user experience.

Recommended Snippet:

<tool_preambles>
First, restate the user's goal in a single sentence. Follow with a bulleted list of the planned steps.
During execution, add concise progress logs sequentially.
Finally, provide a summary that clearly distinguishes between the "Plan" and the "Actual Results."
</tool_preambles>
 
 

2. Self-reflection and Guiding Principles

GPT-5 excels at "internally refining" the quality of its output through self-reflection. However, if the criteria for judging quality are not established beforehand, this reflection can become unproductive. This is where guiding principles and a private rubric are effective.


2-1. Provide a "Self-Grading Scorecard" with a Private Rubric

For zero-to-one generation tasks (e.g., creating a new web app, drafting specifications), have the model internally create a scorecard with 5-7 evaluation criteria. Then, have it repeatedly rewrite and re-evaluate its output based on these criteria.

Rubric Generation Snippet:

<self_reflection>
Define the conditions that a world-class deliverable should meet across 5-7 categories (e.g., UI quality, readability, robustness, extensibility, accessibility, accountability). Score your own proposal against these criteria, identify shortcomings, and redesign. The rubric itself should not be shown to the user.
</self_reflection>

2-2. Reduce Inconsistency with Guiding Principles

For ongoing development or modifying existing code, first provide the project's conventions by clearly stating its design principles, directory structure, and UI standards. This ensures that the model's suggested improvements and changes integrate naturally with the existing culture.

Guiding Principles Snippet (Example):

<guiding_principles>
Clarity and Reusability: Keep components small and reusable. Group them and avoid duplication.
Consistency: Unify tokens, typography, and spacing.
Simplicity: Avoid unnecessary complexity in styling and logic.
</guiding_principles>

2-3. Separately Control Verbosity and Reasoning Effort

GPT-5 can control its verbosity (the length of the final answer) and its reasoning_effort (the depth of thought) independently. This allows for context-specific overrides, such as "be concise in prose, but provide detailed explanations in code." The guide introduces a practical example of prompt tuning by Cursor, which is worth checking out. A useful tip for fast mode (minimal reasoning) is to require a brief summary of its thinking or plan at the beginning to assist its process.
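To show what "independently" means, the sketch below builds the arguments for a request without sending anything. The field layout (`reasoning.effort` and `text.verbosity`) and the allowed values reflect my understanding of the OpenAI Responses API and should be checked against the current API reference; the helper `build_request` is hypothetical.

```python
# Assumed value sets; verify against the official API reference.
ALLOWED_EFFORT = {"minimal", "low", "medium", "high"}
ALLOWED_VERBOSITY = {"low", "medium", "high"}


def build_request(prompt: str, effort: str, verbosity: str) -> dict:
    """Return keyword arguments for a GPT-5 request (nothing is sent)."""
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"unknown reasoning effort: {effort}")
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"unknown verbosity: {verbosity}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # depth of thought
        "text": {"verbosity": verbosity},  # length of the final answer
    }


request = build_request(
    "Plan briefly first, then answer. Be concise in prose, detailed in code.",
    effort="minimal",
    verbosity="low",
)
print(request)
```

The prompt text carries the context-specific override ("concise in prose, detailed in code"), while the two parameters set the global defaults.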

 
 


3. GPT-5's Instruction Following has "Surgical Precision"

GPT-5 is extremely sensitive to the accuracy and consistency of instructions. Contradictory requests or ambiguous prompts waste reasoning resources and degrade output quality. Therefore, it is crucial to "structure" your instruction hierarchy to prevent contradictions before they occur.



3-1. Design to Avoid Contradictions

Take the example of a healthcare administrator scheduling a patient appointment based on symptoms. "Exceptions," such as altering preceding steps only in emergencies, must be clearly stated so they do not conflict with standard procedures.

  • Bad Example: The instructions "Do not schedule without consent" and "First, automatically secure the fastest same-day slot" coexist.

  • Correct Example: When "Always check the profile" and "In an emergency, immediately direct to 911" coexist, the exception rule is declared first.

OpenAI offers the following warning:

We understand that the process of building prompts is an iterative one, and that many prompts are living documents, constantly being updated by different stakeholders. But that’s why it is even more important to thoroughly review for instructions that are phrased improperly. We have already seen multiple early users discover ambiguities and contradictions within their core prompt libraries when they did such a review. Removing them dramatically streamlined and improved GPT-5's performance. We encourage you to test your prompts with our Prompt Optimizer tool to identify these kinds of issues.

 
 

How was that? In this article, we explored key points for prompt design from OpenAI's GPT-5 Prompt Guide (1). GPT-5 is a "partner in practice," combining powerful autonomy with precise instruction following. Try incorporating the points discussed today into your prompts and take your AI agents to the next level. That's all for today. Stay tuned!

 
 


1) GPT-5 prompting_guide, OpenAI, August 7, 2025


Prompt Optimization: The Secret to Building Better AI Agents?

The instructions that humans write for generative AI are called "prompts." There are many books and blogs out there that offer guidance on how to write them. Many of you have probably tried, and it's surprisingly difficult, isn't it? While no programming language is required, you have to go through a lot of trial and error to get the output you want from a generative AI. This process can be quite time-consuming, isn't well-systematized, and you often have to start from scratch for each new task.

So, this time, we'd like to experiment with "what happens if we have a generative AI write the prompts for us?" Let's get started.

 


1. Prompt Optimization

In 2023, Google DeepMind released a research paper titled "Large Language Models as Optimizers" (1).

This paper explored the use of LLMs to optimize prompts, and it seems to have worked well for several tasks. While a human writes the initial prompt, subsequent improvements are delegated to the LLM (the optimizer). The LLM is also responsible for judging whether the result was successful or not (the evaluator), meaning this approach can be applied even without labeled data that provides the correct answers. This is very helpful, as tasks involving generative AI often lack labeled data. Below is a flowchart of this process, which is effectively the automation of prompt engineering. This is professionally referred to as "prompt optimization." The specific method we adopted for this experiment is called OPRO (Optimization by PROmpting).
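The loop described above can be sketched in a few lines of Python. This is only an outline under strong assumptions: `propose_prompt` and `score_prompt` are toy stand-ins for the optimizer LLM and the evaluator, the score is a keyword count rather than a real accuracy, and the actual OPRO method feeds the optimizer a meta-prompt containing past prompts and their scores.

```python
import random


def propose_prompt(history, rng):
    """Stand-in for the optimizer LLM: extend the best prompt so far."""
    best_prompt, _ = max(history, key=lambda pair: pair[1])
    hints = ["Think step by step.", "Answer with one category only.", "Quote the key phrase."]
    return best_prompt + " " + rng.choice(hints)


def score_prompt(prompt):
    """Stand-in for the evaluator: a toy score, NOT a real accuracy."""
    keywords = ["step by step", "one category", "key phrase"]
    return sum(k in prompt for k in keywords) / len(keywords)


def optimize(initial_prompt, steps=20, seed=0):
    """Propose, score, and record prompts; return the best (prompt, score)."""
    rng = random.Random(seed)
    history = [(initial_prompt, score_prompt(initial_prompt))]
    for _ in range(steps):
        candidate = propose_prompt(history, rng)
        history.append((candidate, score_prompt(candidate)))
    return max(history, key=lambda pair: pair[1])


best_prompt, best_score = optimize("Classify the complaint.")
print(best_score, best_prompt)
```

In a real run, both stand-ins would be replaced by LLM calls, and the history of scored prompts is what lets the optimizer improve on its earlier attempts.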






2. Experiment with a Customer Complaint Classification Task

Similar to our blog post on July 26th, we set up a task to predict which financial product a bank's customer complaint is about. We used an LLM to solve a classification task where it selects one of the following six financial products. We used gemini-2.5-flash for this experiment, with a sample size of 100 customer complaints.

  • Mortgage

  • Checking or savings account

  • Student loan

  • Money transfer, virtual currency, or money service

  • Bank account or service

  • Consumer Loan

In this experiment, the LLM handled the prompt generation, but a meta-prompt was necessary to further improve the resulting prompts. I wrote the meta-prompt as follows. Essentially, it tells the LLM to "please further improve the resulting prompt."

We had the LLM generate 20 prompts, and the results are shown below. The final number is the accuracy. An accuracy of 0.8 means 80 out of 100 cases were correct. Since this dataset came with labels, calculating the accuracy was easy.
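To make the accuracy figure concrete, here is a minimal sketch of the calculation. The data is invented for illustration (100 samples, 80 correct) and is not the experiment's data.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Toy data: 100 labels, of which the last 20 predictions are wrong.
labels = ["Mortgage"] * 100
predictions = ["Mortgage"] * 80 + ["Student loan"] * 20
print(accuracy(predictions, labels))  # 0.8
```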

We adopted the second prompt from the list, which had the best accuracy of 0.89 in this experiment. When we ported this prompt to our regular experimental environment and ran it, the accuracy exceeded 0.9, as shown below. We've done this task many times before, but this is the first time we've surpassed 0.9 accuracy. That's amazing!

 






3. What Does the Future of Prompt Engineering Look Like?

As you can see, it seems possible to optimize prompts by leveraging the power of generative AI. Of course, when considering cost and time, the results might not always be worth the effort. Nevertheless, I feel there's a strong need for prompt automation. Researchers worldwide are currently exploring various methods, so many things that aren't possible now will likely become possible in the near future. Prompt engineering techniques will continue to evolve, and I'm looking forward to these technological developments and plan to try out various methods myself.

 

So, what did you think? The ability of an AI agent to fully utilize the power of generative AI and improve itself without human intervention is called "recursive self-improvement." At ToshiStats, we will continue to provide the latest updates on this topic. Please look forward to it. Stay tuned!

 


1) Large Language Models as Optimizers, Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen, Google DeepMind


I tried creating and implementing an AI app with no-code on Google AI Studio, and it was amazing!

Google has been rapidly releasing generative AI and related products recently, with Google AI Studio (1) particularly standing out as a developer platform. It integrates the latest image and video generation AI, truly embodying a multimodal platform. What's more, it's free up to a certain limit, making it a powerful ally for startups like ours. So, let's actually create an AI application with this platform!


1. Google AI Studio Portal

Below is the Google AI Studio portal. It has so many features that an AI beginner might get confused without prior knowledge. I suppose that's why it's a developer-oriented platform. By clicking the button in the red box, you'll be taken to a site where you can create an application simply by writing a prompt.

Google AI Studio

Here's the prompt I used this time.

"As a 'Complaint Categorization Agent,' you are an expert at understanding which product a customer is complaining about. You can select only one product from the complaint. Comprehensively analyze the provided complaint and classify it into one of the following categories:

  • Mortgage

  • Checking or savings account

  • Student loan

  • Money transfer, virtual currency, or money service

  • Bank account or service

  • Consumer Loan

Your output should be only one of the above categories. All samples must be classified into one of these classes. Results for all samples are required. Create a GUI that adds the ability to input a CSV file of customer complaints and generate a graph showing the distribution of customer complaint classes. Add features to the GUI to add labeled data independently of the customer complaint CSV file, calculate and display accuracy, and display a confusion matrix of the results."

Just by typing this prompt into the box and running it, the application described below is created. I didn't write any code at all, in Python or otherwise. It's amazing!



2. Tackling a Real Classification Task with the Created App

After two or three attempts, the final application I built is shown below. It handles the task of classifying bank customer complaints by financial product. This time, I've set it to six types of financial products, but generative AI can achieve high accuracy even without prior training, so it's possible to classify many more classes if desired.

Input Screen

We import customer complaints via a CSV file. This time, I'll use 100 complaints. Furthermore, if ground truth data is available, I've added functionality to output accuracy and a confusion matrix. Below are the actual classification results. The distribution of the six financial products is displayed. It seems this customer complaint data primarily concerns mortgages.

Class Distribution

Here's the all-important classification accuracy. This time, we reached 83% without any prior training. It's incredible!

Classification accuracy

The confusion matrix, often used in classification tasks, can also be displayed. This not only provides a numerical accuracy but also shows where classification errors frequently occur, making it easier to set guidelines for improving accuracy and enabling more effective improvements.

Confusion Matrix
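For readers curious about what the app computes here, the following is a small sketch of a confusion matrix in plain Python. It is not the code generated by Google AI Studio; the function name and the five toy samples are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(labels, predictions, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(labels, predictions))
    return [[counts[(t, p)] for p in classes] for t in classes]


classes = ["Mortgage", "Student loan", "Consumer Loan"]
labels = ["Mortgage", "Mortgage", "Student loan", "Consumer Loan", "Consumer Loan"]
predictions = ["Mortgage", "Consumer Loan", "Student loan", "Consumer Loan", "Mortgage"]

matrix = confusion_matrix(labels, predictions, classes)
for name, row in zip(classes, matrix):
    print(f"{name:15s} {row}")
```

Off-diagonal cells show which pairs of classes get mixed up; in this toy data, "Mortgage" and "Consumer Loan" are confused with each other once in each direction.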

 

3. Agent Evaluation

What I realized when creating this app was that once an evaluation metric is available, discussions about subsequent improvements become much more productive. Trying just a few samples won't give you a good grasp of the generative AI's behavior. Preparing at least 10, and ideally 100 or more, samples with corresponding ground truth data, and having the AI app output evaluation metrics, makes it possible to propose effective accuracy improvements. This theme is called "Agent evaluation," and I believe it will become essential for building practical AI applications in the future.

 

What do you think? Despite not doing any programming at all this time, I was able to create such an amazing AI application. Google AI Studio integrates perfectly with Google Cloud, allowing you to deploy your app to the cloud with a single button and use it worldwide. At ToshiStats, we will continue to challenge ourselves by building various AI applications. Stay tuned!

 


1) Google AI Studio


The Cutting Edge of Prompt Engineering: A Look at a Silicon Valley Startup

Hello everyone. How often do you find yourselves writing prompts? I imagine more and more of you are writing them daily and conversing with generative AI. So today, we're going to look at the state of cutting-edge prompt engineering, using a case study from a Silicon Valley startup. Let's get started.

 

1. "Parahelp," a Customer Support AI Startup

There's a startup in Silicon Valley called "Parahelp" that provides AI-powered customer support. Impressively, they have publicly shared some of their internally developed prompt know-how (1). In the hyper-competitive world of AI startups, I want to thank the Parahelp management team for generously sharing their valuable knowledge to help those who come after them. The details are in the link below for you to review, but my key takeaway from their know-how is this: "The time spent writing the prompt itself isn't long, but what's crucial is dedicating time to the continuous process of executing, evaluating, and improving that prompt."

When we write prompts in a chat, we often want an immediate answer and tend to aim for "100% quality on the first try." However, it seems the style in cutting-edge prompt engineering is to meticulously refine a prompt through numerous revisions. For an AI startup to earn its clients' trust, this expertise is essential and may very well be the source of its competitive advantage. I believe "iteration" is the key for prompts as well.

 

2. Prompts That Look Like a Computer Program

Let's take a look at a portion of the published prompt. This is a prompt for an AI agent to behave as a manager, and even this is only about half of the full version.

structures of prompts

Here is my analysis of the prompt above:

  • Assigning a persona (in this case, the role of a manager)

  • Describing tasks clearly and specifically

  • Listing detailed, numbered instructions

  • Providing important points as context

  • Defining the output format

I felt it adheres to the fundamental structure of a good prompt. Perhaps because it has been forged in the fierce competition of Silicon Valley, it is written with incredible precision. There's still more to it, so if you're interested, please view it from the link. It's written in even finer detail, and with its heavy use of XML tags, you could almost mistake it for a computer program. Incredible!

 

3. The Future of Prompt Engineering

I imagine that committing this much time and cost to prompt engineering is a high hurdle for the average business person. After learning the basics of prompt writing, many people struggle with what the next step should be.

One tip is to take a prompt you've written and feed it back to the generative AI with the task, "Please improve this prompt." This is called a "meta-prompt." Of course, the challenges of how to give instructions and how to evaluate the results still remain. At ToshiStats, we plan to explore meta-prompts further.

 

So, what did you think? Even the simple term "prompt" has a lot of depth, doesn't it? As generative AI continues to evolve, or as methods for creating multi-AI agents advance, I believe prompt engineering itself will also continue to evolve. It's definitely something to keep an eye on. I plan to provide an update on this topic in the near future.

That's all for today. Stay tuned!

 

ToshiStats Co., Ltd. offers various AI-related services. Please check them out here!

 


1) Prompt design at Parahelp, Parahelp, May 28, 2025

 











Google DeepMind Announces "AlphaEvolve," Hinting at an Intelligence Explosion!

Google DeepMind has unveiled a new research paper today, introducing "AlphaEvolve" (1), a coding agent that leverages evolutionary computation. It's already garnering significant attention due to its broad applicability and proven successes, such as discovering more efficient methods for matrix calculations in mathematics and improving efficiency in Google's data centers. Let's dive a little deeper into what makes it so remarkable.

 

1. LLMs Empowered with Evolutionary Computation

In a nutshell, "AlphaEvolve" can be described as an "agent that leverages LLMs to the fullest to evolve code." To briefly touch upon "evolutionary computation," it is a family of algorithms that mimic biological evolution to improve a system, reproducing genetic crossover and mutation on a computer. Traditionally, the functions responsible for this, called "operators," had to be designed by humans. "AlphaEvolve" automates the creation of operators with the support of LLMs, enabling more efficient code generation. That sounds incredibly powerful! While evolutionary computation itself isn't new, with practical applications dating back to the 2000s, its combination with LLMs appears to have unlocked new capabilities. The red box in the diagram below indicates where evolutionary computation is applied.
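For readers new to evolutionary computation, here is a minimal sketch of the classic, hand-written version on a toy problem (maximizing the number of ones in a bit string). It is not AlphaEvolve: the `crossover` and `mutate` operators here are fixed by a human, which is exactly the part the article says is delegated to LLMs. All names and parameter values are assumptions for illustration.

```python
import random


def fitness(bits):
    """Toy objective: count the ones in the bit string."""
    return sum(bits)


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def evolve(length=20, population_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)]
    for _ in range(generations):
        # Keep the fitter half as parents, then refill with mutated children.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

Because the fitter half always survives unchanged, the best score in the population never decreases from one generation to the next.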

 

2. Continued Evolution with Meta-Prompts

I'm particularly intrigued by the "prompt_sampler" mentioned above because this is where "meta-prompts" are executed. The paper explains, "Meta prompt evolution: instructions and context suggested by the LLM itself in an additional prompt-generation step, co-evolved in a separate database analogous to the solution programs." It seems that prompts are also evolving! The diagram below also shows that accuracy decreases when meta-prompt evolution is not applied compared to when it is.

This is incredible! With an algorithm like this, I'd certainly want to apply it to my own tasks.
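The quote from the paper doesn't spell out the mechanism, so the following is only my own toy sketch of the general idea: keep the prompts in their own database, and credit each prompt with the improvement its children achieve. Everything here is hypothetical. `stub_llm` and `stub_meta_llm` are stand-ins for real LLM calls, the "program" is just a number, and an "instruction" is reduced to a step size. It is not AlphaEvolve's implementation.

```python
import random


def evaluate(program):
    """Toy evaluator: the 'program' is a number and the best score is at 42."""
    return -abs(program - 42.0)


def stub_llm(parent, instruction, rng):
    """Stand-in for an LLM edit: the instruction only sets how bold the edit is."""
    return parent + rng.uniform(-instruction["step"], instruction["step"])


def stub_meta_llm(instruction, rng):
    """Stand-in for the prompt-generation step: propose a variant instruction."""
    new_step = max(0.01, instruction["step"] * rng.choice([0.5, 2.0]))
    return {"step": new_step, "reward": 0.0}


def run(iterations=300, seed=0):
    rng = random.Random(seed)
    programs = [0.0]                           # solution database
    prompts = [{"step": 1.0, "reward": 0.0}]   # separate prompt database
    for i in range(iterations):
        parent = max(programs, key=evaluate)
        # Prompt sampler: usually pick the best-rewarded instruction, sometimes explore.
        if rng.random() < 0.2:
            instruction = rng.choice(prompts)
        else:
            instruction = max(prompts, key=lambda p: p["reward"])
        child = stub_llm(parent, instruction, rng)
        gain = evaluate(child) - evaluate(parent)
        if gain > 0:
            programs.append(child)
            instruction["reward"] += gain      # credit the instruction that helped
        if i % 10 == 9:                        # periodic meta-prompt step
            survivors = sorted(prompts, key=lambda p: p["reward"], reverse=True)[:7]
            prompts = survivors + [stub_meta_llm(instruction, rng)]
    return max(programs, key=evaluate), prompts


best_program, prompt_db = run()
print(round(best_program, 2), len(prompt_db))
```

The design choice worth noticing is simply that the prompts are selected and replaced based on results, in the same way the candidate programs are.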

 

3. Have We Taken a Step Closer to an Intelligence Explosion?

Approximately a year ago, researcher Leopold Aschenbrenner published a paper (2) predicting that computers would surpass human performance by 2030 as a result of an intelligence explosion. The graph below illustrates this projection. This latest "AlphaEvolve" can be seen as having acquired the ability to improve its own performance, which might just be a step closer to an intelligence explosion. It's hard to imagine the outcome of countless AI agents like this, each evolving independently, but it certainly feels like something monumental is on the horizon. After all, computers operate 24 hours a day, 365 days a year, so once they acquire self-improvement capabilities, their pace of evolution is likely to accelerate. Aschenbrenner refers to this as "recursive self-improvement" (p. 47).

 



What are your thoughts? The idea of AI surpassing humans can be a bit challenging to grasp intuitively, but just thinking about what AI agents might be like around 2027 is incredibly exciting. I'll be sure to provide updates if a sequel to "AlphaEvolve" is released in the future. That's all for now. Stay tuned!

 


1) AlphaEvolve: A coding agent for scientific and algorithmic discovery, Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli and Matej Balog, Google DeepMind, 16 May 2025

2) SITUATIONAL AWARENESS: The Decade Ahead, Leopold Aschenbrenner, June 2024


 


Copyright © 2025 Toshifumi Kuga. All rights reserved.


We Built a Customer Complaint Classification Agent with Google's New AI Agent Framework "ADK"

On April 9th, Google released a new AI agent framework called "ADK" (Agent Development Kit). It's an excellent framework that incorporates the latest multi-agent technology while also being user-friendly, allowing implementation in about 100 lines of code. At Toshi Stats, we decided to immediately try creating a customer complaint classification agent using ADK.

 

1. Customer Complaint Classification Task

Banks receive various complaints from customers. We want to classify these complaints based on which financial product they concern. Specifically, this is a 6-class classification task where we choose one from the following six financial products. Random guessing would yield an accuracy of only about 17% (1 in 6).

Financial products to classify

 

2. Implementation with ADK

Now, let's move on to the ADK implementation. We'll defer to the official documentation for file structure and other details, and instead show how to write the AI agent below. The "instruction" part is particularly important; writing this carefully improves accuracy. This is what's known as a "prompt". In this case, we've specifically instructed it to select only one from the six financial products. Other parts are largely unchanged from what's described in tutorials, etc. It has a simple structure, and I believe it's not difficult once you get used to it.

AI agent implementation with ADK
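For readers who cannot see the screenshot, here is a minimal sketch of what such an agent definition could look like. Please treat it as an assumption-laden illustration rather than our exact code: only "student loan" and "mortgage" are confirmed by the text, so the other four labels in `PRODUCTS` are hypothetical, as are the agent name, the model id string, and the helper `normalize_label`. The `Agent` class and its `name`, `model`, `description` and `instruction` parameters follow the ADK quickstart.

```python
# Hypothetical label set: only "mortgage" and "student loan" appear in the article text.
PRODUCTS = [
    "mortgage",
    "student loan",
    "credit card",
    "bank account",
    "consumer loan",
    "money transfer",
]

# The "instruction" is the prompt; it tells the model to pick exactly one label.
INSTRUCTION = (
    "You are a classifier for bank customer complaints. "
    "Read the complaint and answer with exactly one of the following "
    "financial products, and nothing else: " + ", ".join(PRODUCTS) + "."
)


def build_agent():
    """Create the ADK agent (needs `pip install google-adk` and API credentials)."""
    from google.adk.agents import Agent  # imported lazily so this file loads without ADK

    return Agent(
        name="complaint_classifier",             # hypothetical agent name
        model="gemini-2.5-flash-preview-04-17",  # assumed model id
        description="Classifies bank customer complaints by financial product.",
        instruction=INSTRUCTION,
    )


def normalize_label(raw_answer):
    """Map the model's free-text answer onto one of the six labels, or None."""
    cleaned = raw_answer.strip().strip(".\"'").lower()
    return cleaned if cleaned in PRODUCTS else None


print(normalize_label(" Mortgage. "))
```

In an ADK project, `agent.py` would typically expose the result as `root_agent = build_agent()` so that the built-in tooling can find it.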

 

3. Accuracy Verification

We created six classification examples and had the AI agent provide answers. In the first example, I believe it answered "student loan" based on the word "graduation." It's quite smart! In the second example, it presumably answered "mortgage" based on the phrase "prime location." ADK has a built-in UI like the one shown below, which is very convenient for testing immediately after implementation.

ADK user interface

The generative AI model used this time, Google's "gemini-2.5-flash-preview-04-17," is highly capable. When tasked with a 6-class classification problem using 100 actual customer complaints received by a bank, it typically achieves an accuracy of over 80%. For simple examples like the ones above, it wouldn't be surprising if it achieved 100% accuracy.
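As a small aside on how such an accuracy figure is computed and compared with chance, here is a minimal sketch. The helper names `chance_accuracy` and `accuracy` are my own, and the four-example data set is made up purely for illustration; it is not the 100-complaint evaluation mentioned above.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over `num_classes` labels."""
    return 1.0 / num_classes


def accuracy(predictions, gold_labels):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for p, g in zip(predictions, gold_labels) if p == g)
    return correct / len(gold_labels)


# Made-up toy data for illustration only; these are not the article's results.
gold = ["mortgage", "student loan", "mortgage", "student loan"]
pred = ["mortgage", "student loan", "student loan", "student loan"]

print(f"accuracy: {accuracy(pred, gold):.2f}")
print(f"chance for 6 classes: {chance_accuracy(6):.3f}")
```

Comparing the measured accuracy against the chance level is what makes a figure like "over 80%" meaningful for a 6-class task.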

 

So, what did you think? This was our first time covering ADK, but I feel it will become popular due to its high performance and ease of use. Combined with A2A (2), which was announced by Google around the same time, I believe use cases will continue to increase. We're excited to see what comes next! At Toshi Stats, we will continue to build even more advanced AI agents with ADK. Stay tuned!

 



1) Agent Development Kit, Google, April 9th, 2025

2) Agent2Agent, Google, April 9th, 2025

 


