Toshifumi Kuga

November 28, 2025

Nano Banana Pro, Gemini 3, generative ai, image generation

Game Changer: How Nano Banana Pro is Redefining Digital Marketing!

Toshifumi Kuga

November 28, 2025

Nano Banana Pro, Gemini 3, generative ai, image generation

Just fresh off the heels of last week's new model release, Google has debuted yet another new image generation model: Nano Banana Pro (Gemini 3 Pro Image). Rumors on the street say it boasts incredible performance. So, let's dive in and test it out to see its potential capabilities.

1. The Latest Tokyo Fashion Trends

Fashion evolves with every season, and keeping up with the trends can be a challenge. However, the internet is overflowing with the latest style information. I figured that by feeding this real-time data into generative AI, we could generate images of models wearing the styles currently in vogue. Let's give it a try. Below is the original image of the model. She is wearing an outfit typical of Japanese autumn.

I fed this original image and the prompt "Perform Google Search for current Tokyo fashion trends for 20s lady and apply that style to the model in the attached photo. 4 images are needed." into Nano Banana Pro.

The same model appears in all four images, maintaining consistency. Furthermore, the latest fashion trends have been incorporated thanks to Google Search. This is wonderful. Nano Banana Pro's Grounding feature using Google Search is excellent. As the model updates in the future, we can expect the accuracy of capturing trendy fashion to improve even further.

2. Creating a Signature Cafe Menu

Next, I want to devise a set menu featuring shortcake and coffee for opening a cafe in Ashiya, a high-end residential area in Japan. For this one too, I prepared a prompt to generate the image after researching currently popular cakes using Google Search.

"I am opening a cafe in Ashiya, Japan, featuring a fruit shortcake and coffee set as the signature dish. Use Google Search to identify current cake trends in Ashiya City. Then, create a high-quality menu image for this set that includes a description and price in English, incorporating the local trends."

I generated the following Japanese and English versions of the menu.

Both the Japanese and English text are perfect. I think this is a huge leap forward, especially since AI image generation has struggled to correctly render local languages like Japanese until now. I’m sure it will work well with other local languages too. It looks like Nano Banana Pro will be able to perform globally, regardless of language.

3. 3D Visualization of Loss Functions

Raising the abstraction level a bit, I want to execute a 3D visualization of a loss function—a topic often discussed when building targeting models for marketing—and clearly explain the concept of the gradient descent method. Nano Banana Pro can understand even theoretical and highly abstract phenomena like loss functions and map them in 3D. Below is the result. You can see at a glance how the parameters get stuck in a local minimum and cannot reach the point where the loss function is at its global minimum. Amazing.

How was it? Even from these few experiments, the excellence of Nano Banana Pro is clear. I have a hunch that Nano Banana Pro is going to change the very methods of digital marketing. I felt particularly strong potential in the Grounding feature using Google Search. I plan to cover Nano Banana Pro again in the near future.

That’s all for today. Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1) Introducing Nano Banana Pro, Google, 20 Nov 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

October 15, 2025

Andrew Ng, AI agent, generative ai, Agentic AI

Your Guide to AI Agents: Insights from Andrew Ng's Latest Course

Toshifumi Kuga

October 15, 2025

Andrew Ng, AI agent, generative ai, Agentic AI

A new online course called "Agentic AI" (1) has been released by DeepLearning AI. The creator is Andrew Ng, an adjunct professor at Stanford University, who is also famous for his past machine learning-related courses. For me, this is the first course I've taken from him since the Deep Learning Specialization in 2018. I've just completed it, and I'd like to share my thoughts and a recommendation.

1. Course Overview

The course is divided into five modules, each consisting of 5-7 short videos (about 5-10 minutes each), a quiz, and coding tasks using jupyter notebook. By passing each assignment, you are ultimately awarded a certificate of completion. The level is listed as intermediate; while a basic knowledge of Python is necessary, I believe that even those without specialized knowledge in AI can progress through the material and naturally come to understand it. The main topics are as follows:

Reflection: AI critiques its own work and iterates to improve quality—like code review, but automated.

Tool Use: Connect AI to databases, APIs, and external services so it can actually perform actions, not just generate text.

Planning: Break complex tasks into executable steps that AI can follow and adapt when things don’t go as expected.

Multi-Agent: Coordinate multiple specialized AI systems to handle different parts of a complex workflow.

Created by Andrew Ng, who teaches at Stanford while concurrently doing practical consulting work, I found the course to have a wonderful balance between theory and practice.

2. Reflection and Tool Use

The second and third modules are critical technologies for the future realization of AGI. In particular, "Reflection," where an AI improves itself, is also known as Recursive Self Improvement and is a field being researched worldwide. This module introduces a method that allows even non-experts to incorporate reflection functionality, which I am very eager to try implementing. Additionally, using tools allows a generative AI to incorporate information that is difficult to acquire on its own, thereby enhancing the AI agent's capabilities. Furthermore, this information can be applied to the "Reflection" process, promising a synergistic effect. I'm also keen to implement this and see what kind of information can be integrated.

3. Error Analysis

As Andrew Ng states, this fourth module is, in my opinion, the most important and valuable content in the course. Generative AI is excellent, but it is not perfect. There is still a considerable possibility that it will produce incorrect answers. Therefore, to raise its accuracy to a practical level, the course emphasizes the importance of adopting a strategy that quickly identifies the parts of the overall process with the lowest performance and allocates resources to improving those areas. I can certainly see how for a complex AI agent that may contain numerous sub-agents, identifying and prioritizing the reinforcement of its weaknesses is incredibly important in practical applications.

So, what did you think? With a flood of AI-related news every day, many people are likely wondering, "How should I proceed with my AI projects from now on?" I believe this course provides a valuable perspective for thinking in the medium to long term. While it is a paid course, it is not as expensive as university tuition, and I highly recommend trying it. Incidentally, because I studied intensively, I was able to receive my certificate in about three days. It's certainly possible for a business professional to complete it over a long weekend.

Well, that's all for today. Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1) Agentic AI, Andrew Ng, DeepLearning AI, Oct 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

September 14, 2025

AI agent, ADK, generative ai, marketing

A Sweet Strategy: Selling Cakes in Wealthy Residential Areas !

Toshifumi Kuga

September 14, 2025

AI agent, ADK, generative ai, marketing

Has everyone ever thought about starting a cake shop? As a cake lover myself, I often find myself wondering, "What kind of cake would be perfect?" However, developing a concrete business strategy is a real challenge. That's why this time, I'd like to conduct a case study with the support of an "AI marketing-agency." Let's get started.

1. Selling Cakes in an Upscale Kansai Neighborhood

The business scenario I've prepared for this case is a simple one:

Goal: To sell premium fruit cakes in the Kansai region.

Cake Features: Premium shortcakes featuring strawberries, peaches, and muscat grapes.
Target Audience: Women in their 20s to 40s living in upscale residential areas.
Stores: 3 cafes near Yamate Dentetsu Ashiya Station, 1 cafe near Kaigan Dentetsu Ashiya Station.
Direct Sales Outlet: 1 store inside the Yamate Dentetsu Ashiya Station premises.
Branding: The brand's primary color will be blue, with the website and logo also unified in blue.
Current Plan: In the process of planning a sales promotion for the autumn season.

From here, what kind of concrete business strategy can we derive? First, I'll input the business scenario into the AI marketing-agency.

The first thing it does is automatically generate 10 cool domain names.

It's hard to choose, but for now, I'll proceed with branding using "PremiumAshiyaCake.com".

2. A Practical Business Strategy

Now, let's ask the AI marketing-agency to formulate a business strategy for selling our premium fruit cakes in Kansai. When prompted to input the necessary information, I re-entered the business scenario, and the following business strategy was generated in about two minutes. Amazing!

It's a long document, over five pages, so I can't share it all, but here is the "Core of the Marketing Strategy."

Overall Approach: Direct Response that Inspires Aspiration
- We will build an aspirational, luxury brand image through beautiful content, and then convert that desire into immediate store visits using precisely targeted calls-to-action (CTAs).
Core Message and Positioning:
- Positioning Statement: For the discerning women of Kansai, Premium Ashiya Cake is the patisserie that transforms a moment into a cherished memory with its exquisitely crafted seasonal shortcakes.
- Tagline / Core Message: "Premium Ashiya Cake: An exquisite moment, crafted for you."
Key Pillars of the Strategy:
- Visual Elegance and a "Blue" Signature: All visuals must be of professional, magazine-quality. The brand color "blue" will be used as a sophisticated accent in styling—such as on blue ribbons, parts of the tableware, or as background elements—to create a recognizable and unique visual signature.
- Hyper-local Exclusivity: Marketing efforts will be geographically and demographically laser-focused on the target audience residing in Ashiya and its surrounding affluent areas. This creates an "in-the-know" allure for locals.
- Seasonal Storytelling: Treat each season's campaign as a major event. We will build a narrative around the star ingredients, such as Shine Muscat grapes from a specific partner farm, to build anticipation and justify the premium price point.

This is wonderfully practical content. The keywords I provided—"blue," "Ashiya," and "muscat"—have been skillfully integrated into the strategy.

3. The Logo is Excellent, Too—This is Usable!

Because I specified in the initial business scenario that I wanted to "unify the color scheme based on blue," it created this cool logo for me. It really looks like something I could use right away. Google's image generation AI, Imagen 3.0, is used here. The quality of this AI is always highly rated, so it's no surprise that the logo generated this time is also of outstanding quality.

So, what did you think of the AI marketing-agency? The business strategy is professional, and it's amazing how it automatically created the domain names and logo with such excellent results. Although I couldn't introduce it this time, it also includes a website creation feature. It's surprising that a tool this high-performance is actually available for free. A development kit called "Google ADK" is provided as open-source, and the AI marketing-agency from this article can be downloaded and used for free as Sample (1). For those who can use Python, I think you'll get the hang of it with a little practice. The operational costs are also limited to the usage fees for Google Gemini 2.5 Pro, so the cost-effectiveness is outstanding. I encourage you all to give it a try.

Please note that this story is a work of fiction and does not represent anything that actually exists. That's all for today, stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1) Marketing Agency, Google, May 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

September 8, 2025

GPT-5, generative ai, marketing, sales forecasting

Unlocking Sales Forecasts: Can GPT-5 Reveal the Most Important Data?

Toshifumi Kuga

September 8, 2025

GPT-5, generative ai, marketing, sales forecasting

Have you ever found yourself in marketing, wanting to predict sales and gathering a ton of data? For example, let's say you have sticker sales data (1) like the set below. The num_sold column represents the number of units sold. This is actually a large dataset with over 200,000 entries. So, among these data columns (which we call "features"), which one is the most important for predicting sales? They all seem important, and it's impossible to check all 200,000 records one by one. So, let's try asking the generative AI, GPT-5.

1. Asking GPT-5 with a Prompt

To identify the important features for a prediction, you first have to create a predictive model. This is a task that data scientists perform all the time. However, they usually create these models by coding in Python, which can be a high barrier for the average business person. So, isn't there an easier way? Yes, and this is where prompts come in handy. If you can give instructions to GPT-5 with a prompt, no coding is necessary. Here is the prompt I created for this task.

Key points of the prompt:

Use HistGradientBoostingRegressor from sklearn.
Evaluate the error using mean_absolute_percentage_error.
Split the data into train-data and test-data at an 80:20 ratio.
Display the top 10 feature importances with their original variable names.
Print the results as numerical output.

By getting the top 10 feature importances, we can understand which data column is the most significant. I won't explain the predictive model itself this time, so for those who want to dive deeper, please refer to a machine learning textbook.

2. The Code Actually Being Executed

Based on the prompt above, GPT-5 generated the following Python code on its own. It might look complicated to non-specialists, but rest assured, we don't have to touch Python at all. However, we can review this code to see how the calculation is being done, so it's by no means a black box. I believe this transparency is very important when using GPT-5 in a business context.

GPT-5's code for building the prediction model

3. "Product" Was the Most Important!

Ultimately, we got the following result.

Feature Importance Ranking

A higher "importance" value in the table above means the feature is more significant. This analysis revealed that "product" was overwhelmingly important. It seems that thinking about "what is selling" is essential. This is followed by "store" and "country". This suggests that considering "in what kind of store" and "in which country" is also crucial.

So, what did you think? This time, we instructed GPT-5 with a prompt to calculate which features are most important for predicting sales. It's true that you might run into errors along the way that GPT-5 has to correct itself, so I felt that having some basic knowledge of machine learning is beneficial. However, we were able to get the result without the user having to write any Python, which means marketing professionals can start trying this out today. I hope you can use the method we introduced today in your own marketing work. That's all for now. Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1)Forecasting Sticker Sales, kaggle, January 1,2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

August 24, 2025

AI agent, GPT5, generative ai, promt, prompt engineering

Let's Explore the Best Practices for Crafting GPT-5 Prompts!

Toshifumi Kuga

August 24, 2025

AI agent, GPT5, generative ai, promt, prompt engineering

We are already hearing from many in the field that with the arrival of GPT-5, "the writing style is different from GPT-4o and earlier" and "its performance as an agent is on another level." Here, we will build upon the key points from OpenAI's "GPT-5 Prompt Guide (1)" and organize, from a practical perspective, "how to write prompts to stably reproduce desired behaviors." The following three keywords are key:

GPT-5 acts very proactively as an AI agent.
Self-reflection and guiding principles.
Instruction following with "surgical precision."

Let's delve into each of these.

1. GPT-5 acts very proactively as an AI agent.

GPT-5's enhanced capabilities in tool-calling, understanding long contexts, and planning allow it to proceed autonomously even with ambiguous tasks. Whether you "harness" or "suppress" this capability depends on how you design the agent's "eagerness").

1-1. Controlling Eagerness with Prompts

To suppress eagerness, intentionally limit the depth of exploration and explicitly set caps on parallel searches or additional tool calls. This is effective in situations where processing time and cost are priorities, or when requirements are clear and exploration needs to be minimized.

To enhance eagerness, explicitly state rules for persistence, such as "Do not end the turn until the problem is fully resolved" and "Even with uncertainty, proceed with the best possible plan." This is suitable for long-duration tasks where you want the agent to see them through to completion with minimal check-ins with the user.

Practical Snippet (To suppress eagerness):

<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>

Practical Snippet (To encourage eagerness):

<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>

1-2. Visualize with a "Tool Preamble"

When the agent outputs a long rollout during execution, having it first provide a brief summary—explaining the objective, outlining the plan, noting progress, and confirming completion—makes it easier for the user to follow along and creates a better user experience.

Recommended Snippet:

<tool_preambles>
First, restate the user's goal in a single sentence. Follow with a bulleted list of the planned steps.
During execution, add concise progress logs sequentially.
Finally, provide a summary that clearly distinguishes between the "Plan" and the "Actual Results."
</tool_preambles>

2. Self-reflection and Guiding Principles

GPT-5 excels at "internally refining" the quality of its output through self-reflection. However, if the criteria for judging quality are not established beforehand, this reflection can become unproductive. This is where guiding principles and a private rubric are effective.

2-1. Provide a "Self-Grading Scorecard" with a Private Rubric

For zero-to-one generation tasks (e.g., creating a new web app, drafting specifications), have the model internally create a scorecard with 5-7 evaluation criteria. Then, have it repeatedly rewrite and re-evaluate its output based on these criteria.

Rubric Generation Snippet:

<self_reflection>
Define the conditions that a world-class deliverable should meet across 5-7 categories (e.g., UI quality, readability, robustness, extensibility, accessibility, accountability). Score your own proposal against these criteria, identify shortcomings, and redesign. The rubric itself should not be shown to the user.
</self_reflection>

2-2. Reduce Inconsistency with Guiding Principles

For ongoing development or modifying existing code, first provide the project's conventions by clearly stating its design principles, directory structure, and UI standards. This ensures that the model's suggested improvements and changes integrate naturally with the existing culture.

Guiding Principles Snippet (Example):

<guiding_principles>
Clarity and Reusability: Keep components small and reusable. Group them and avoid duplication.
Consistency: Unify tokens, typography, and spacing.
Simplicity: Avoid unnecessary complexity in styling and logic.
</guiding_principles>

2-3. Separately Control Verbosity and Reasoning Effort

GPT-5 can control its verbosity (the length of the final answer) and its reasoning_effort (the depth of thought) independently. This allows for context-specific overrides, such as "be concise in prose, but provide detailed explanations in code." The guide introduces a practical example of prompt tuning by Cursor, which is worth checking out. A useful tip for fast mode (minimal reasoning) is to require a brief summary of its thinking or plan at the beginning to assist its process.

3. GPT-5's Instruction Following has "Surgical Precision"

GPT-5 is extremely sensitive to the accuracy and consistency of instructions. Contradictory requests or ambiguous prompts waste reasoning resources and degrade output quality. Therefore, it is crucial to "structure" your instruction hierarchy to prevent contradictions before they occur.

3-1. Design to Avoid Contradictions

Take the example of a healthcare administrator scheduling a patient appointment based on symptoms. "Exceptions," such as altering preceding steps only in emergencies, must be clearly stated so they do not conflict with standard procedures.

Bad Example: The instructions "Do not schedule without consent" and "First, automatically secure the fastest same-day slot" coexist.
Correct Example: When "Always check the profile" and "In an emergency, immediately direct to 911" coexist, the exception rule is declared first.

OpenAI offers the following warning:

We understand that the process of building prompts is an iterative one, and that many prompts are living documents, constantly being updated by different stakeholders. But that’s why it is even more important to thoroughly review for instructions that are phrased improperly. We have already seen multiple early users discover ambiguities and contradictions within their core prompt libraries when they did such a review. Removing them dramatically streamlined and improved GPT-5's performance. We encourage you to test your prompts with our Prompt Optimizer tool to identify these kinds of issues.

How was that? In this article, we explored key points for prompt design from OpenAI's GPT-5 Prompt Guide (1). GPT-5 is a "partner in practice," combining powerful autonomy with precise instruction following. Try incorporating the points discussed today into your prompts and take your AI agents to the next level. That's all for today. Stay tuned!

1) GPT-5 prompting_guide, OpenAI, August 7, 2025

You can enjoy our video news ToshiStats-AI from this link, too!

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

August 17, 2025

GPT5, OpenAI, generative ai

Unexpected Weakness Revealed! What Happened When I Tried Image Analysis with the New "GPT-5" Generative AI from OpenAI

Toshifumi Kuga

August 17, 2025

GPT5, OpenAI, generative ai

OpenAI's New Generative AI "GPT-5" Has Arrived. I Tried Image Analysis and Discovered a Surprising Weakness! (1)

The long-awaited new generative AI, "GPT-5," has been released by OpenAI. I believe its multimodal capabilities have also improved, so I decided to upload a few images and run some simple tests. Let's get started.

1.The car is stopped, but why is it stopped?

The image shows a Mazda passenger car on display inside a train station (Hiroshima Station). This is just an exhibit car, but I thought GPT-5 could answer if it understood the background. It seems to have correctly recognized that this is an indoor space and not a public road. The answer was correct.

2.How many minutes until departure?

This is a common scenario when traveling. I asked how many minutes until the train I was planning to board, "Nozomi 104," would depart. The key was whether GPT-5 could understand that the large displayed time was the current time. This time, it also worked out well.

3.Which way should I go for car number 4?

This is another common travel situation. At a Shinkansen platform at Tokyo Station, I wanted to go to car number 4, and I asked which way to go, left or right, based on the sign above. The result was correct.

4. I want to go to Shin-Osaka Station. How many trains can I take?

The last one is a difficult question. This is a Shinkansen information board at Tokyo Station, and it shows 16 trains in total. When I asked, "I want to go to Shin-Osaka Station," it replied with 8 trains. This is the number of trains with Shin-Osaka as the destination, which is a bit of a simplistic answer. For example, a Shinkansen bound for Hakata also stops at Shin-Osaka. It seems that GPT-5, in its default mode, didn't think that far ahead.

To redeem itself, I switched to "Thinking" mode and tried one more time. As expected, it considered the intermediate stops and answered 14 trains, excluding the trains bound for Nagoya. That's the correct answer.

So, what do you think? Overall, the performance is excellent. GPT-5 is said to use a "real-time router" that defaults to "Auto" and automatically switches to "Thinking" for difficult tasks. However, since it's just been released, this switching might not always work perfectly. As the examples above show, although "Thinking" mode was appropriate in some cases, it didn't activate automatically. Therefore, if you feel something is "a little off," I recommend switching to "Thinking" mode. I hope it will become more stable over time. I look forward to covering GPT-5 again in the future. Stay tuned!

1) GPT-5 System Card., OpenAI, August 7, 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

August 10, 2025

Prompt Optimization, AI agent, AGI, prompt engineering, Google DeepMind, generative ai

Prompt Optimization: The Secret to Building Better AI Agents?

Toshifumi Kuga

August 10, 2025

Prompt Optimization, AI agent, AGI, prompt engineering, Google DeepMind, generative ai

The instructions that humans write for generative AI are called "prompts." There are many books and blogs out there that offer guidance on how to write them. Many of you have probably tried, and it's surprisingly difficult, isn't it? While no programming language is required, you have to go through a lot of trial and error to get the output you want from a generative AI. This process can be quite time-consuming, isn't well-systematized, and you often have to start from scratch for each new task.

So, this time, we'd like to experiment with "what happens if we have a generative AI write the prompts for us?" Let's get started.

1. Prompt Optimization

In 2023, Google DeepMind released a research paper titled "LARGE LANGUAGE MODELS AS OPTIMIZERS"(1).

This paper explored the use of LLMs to optimize prompts, and it seems to have worked well for several tasks. While a human writes the initial prompt, subsequent improvements are delegated to the LLM (the optimizer). The LLM is also responsible for judging whether the result was successful or not (the evaluator), meaning this approach can be applied even without labeled data that provides the correct answers. This is very helpful, as tasks involving generative AI often lack labeled data. Below is a flowchart of this process, which is effectively the automation of prompt engineering. This is professionally referred to as "prompt optimization." The specific method we adopted for this experiment is called OPRO (Optimization by PROmpting).

2. Experiment with a Customer Complaint Classification Task

Similar to our blog post on July 26th, we set up a task to predict which financial product a bank's customer complaint is about. We used an LLM to solve a classification task where it selects one of the following six financial products. We used gemini-2.5-flash for this experiment, with a sample size of 100 customer complaints.

Mortgage
Checking or savings account
Student loan
Money transfer, virtual currency, or money service
Bank account or service
Consumer Loan

In this experiment, the LLM handled the prompt generation, but a meta-prompt was necessary to further improve the resulting prompts. I wrote the meta-prompt as follows. Essentially, it tells the LLM to "please further improve the resulting prompt."

We had the LLM generate 20 prompts, and the results are shown below. The final number is the accuracy. An accuracy of 0.8 means 80 out of 100 cases were correct. Since this data came with labeled data, calculating the accuracy was easy.

We adopted the second prompt from the list, which had the best accuracy of 0.89 in this experiment. When we ported this prompt to our regular experimental environment and ran it, the accuracy exceeded 0.9, as shown below. We've done this task many times before, but this is the first time we've surpassed 0.9 accuracy. That's amazing!

3. What Does the Future of Prompt Engineering Look Like?

As you can see, it seems possible to optimize prompts by leveraging the power of generative AI. Of course, when considering cost and time, the results might not always be worth the effort. Nevertheless, I feel there's a strong need for prompt automation. Researchers worldwide are currently exploring various methods, so many things that aren't possible now will likely become possible in the near future. Prompt engineering techniques will continue to evolve, and I'm looking forward to these technological developments and plan to try out various methods myself.

So, what did you think? The ability of an AI agent to fully utilize the power of generative AI and improve itself without human intervention is called "Recursive-self-improvement." At ToshiStats, we will continue to provide the latest updates on this topic. Please look forward to it. Stay tuned!

1) LARGE LANGUAGE MODELS AS OPTIMIZERS Chengrun Yang Xuezhi Wang Yifeng Lu Hanxiao Liu Quoc V. Le Denny Zhou Xinyun Chen , Google DeepMind

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

August 3, 2025

spatial understanding, Google AI Studio, generative ai, gemini25flash

Back to object detection after a break! Generative AI shows no signs of slowing down

Toshifumi Kuga

August 3, 2025

spatial understanding, Google AI Studio, generative ai, gemini25flash

It's remarkable to see the rapid progress of generative AI. Recently, the improvement in multimodal capabilities, which process information like images and videos in addition to natural language, has been outstanding. This is sometimes referred to as AI's "spatial understanding." Let's briefly experiment with what kind of information generative AI can extract from images to check the performance of the current Gemini 2.5-flash model.

1. Google AI Studio

I'll be using the familiar generative AI development platform, Google AI Studio (1), again. I've prepared a no-code app for spatial understanding. It can display the number of identified objects and their coordinates. For example, for "hands," it shows them like this. It accurately identifies two hands.

2. Generative AI Understands the Meaning of Words and Can Identify Objects

So, what about a task that requires understanding the positional relationship between a flower and a hand, such as "a hand holding a flower"? The result is a successful identification.

Conversely, what about a task like "a hand not holding a flower"? The result is also a successful identification. This is impressive; it identified it with no problem.

Next, can it identify an object based solely on its positional relationship? Let's ask it to identify "what's on the hamburg." It easily answered "fried egg." While this generative AI, Gemini, has been touted for its high-performance image processing since its debut in December 2023, I'm honestly surprised it can do this much.

3. Can It Identify Station Names from a Sign?

Let's try a slightly more difficult task. This is a section of a subway station sign in Kuala Lumpur, the capital of Malaysia. Let's see if it can identify the three stations between Ampang Park and Chan Sow Lin from this image of the sign.

The result was that it accurately identified the three stations. This is a task that requires it to not only read the text in the image correctly but also understand the positional relationship of the stations. It accomplished this without any difficulty. I have nothing more to say; it's amazing!

What do you think? I'm sure many of you are surprised by the high level of spatial understanding. Generative AI is still in its early stages, so its performance will continue to improve, and accordingly, its practical applications will expand. It's something to look forward to. Also, I created this AI app on Google AI Studio without writing any code. Google AI Studio is very user-friendly and high-performing. I encourage you all to try it. Toshi Stats will continue to challenge itself to build various AI apps. Please stay tuned!

1) Google AI Studio

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

July 26, 2025

Google AI Studio, AI agent, generative ai

I tried creating and implementing an AI app with no-code on Google AI Studio, and it was amazing!

Toshifumi Kuga

July 26, 2025

Google AI Studio, AI agent, generative ai

Google has been rapidly releasing generative AI and related products recently, with Google AI Studio (1) particularly standing out as a developer platform. It integrates the latest image and video generation AI, truly embodying a multimodal platform. What's more, it's free up to a certain limit, making it a powerful ally for startups like ours. So, let's actually create an AI application with this platform!

1. Google AI Studio Portal

Below is the Google AI Studio portal. It has so many features that an AI beginner might get confused without prior knowledge. I suppose that's why it's a developer-oriented platform. By clicking the button in the red box, you'll be taken to a site where you can create an application simply by writing a prompt.

Here's the prompt I used this time.

"As a 'Complaint Categorization Agent,' you are an expert at understanding which product a customer is complaining about. You can select only one product from the complaint. Comprehensively analyze the provided complaint and classify it into one of the following categories:

Mortgage
Checking or savings account
Student loan
Money transfer, virtual currency, or money service
Bank account or service
Consumer Loan

Your output should be only one of the above categories. All samples must be classified into one of these classes. Results for all samples are required. Create a GUI that adds the ability to input a CSV file of customer complaints and generate a graph showing the distribution of customer complaint classes. Add features to the GUI to add labeled data independently of the customer complaint CSV file, calculate and display accuracy, and display a confusion matrix of the results."

Just by typing this prompt into the box and running it, the application described below is created. I didn't use any coding like Python at all. It's amazing!

2. Tackling a Real Classification Task with the Created App

After two or three attempts, the final application I built is shown below. It handles the task of classifying bank customer complaints by financial product. This time, I've set it to six types of financial products, but generative AI can achieve high accuracy even without prior training, so it's possible to classify many more classes if desired.

We import customer complaints via a CSV file. This time, I'll use 100 complaints. Furthermore, if ground truth data is available, I've added functionality to output accuracy and a confusion matrix. Below are the actual classification results. The distribution of the six financial products is displayed. It seems this customer complaint data primarily concerns mortgages.

Here's the crucial classification accuracy. This time, we achieved over 80% accuracy, at 83%, without any prior training. It's incredible!

The confusion matrix, often used in classification tasks, can also be displayed. This not only provides a numerical accuracy but also shows where classification errors frequently occur, making it easier to set guidelines for improving accuracy and enabling more effective improvements.

Confusion Matrix

3. Agent Evaluation

What I realized when creating this app was that if some evaluation metric is available, the quality of discussions for subsequent improvements deepens. Trying with just a few samples won't give a good grasp of the generative AI's behavior. Ideally, preparing at least 10, and ideally 100 or more, samples with corresponding ground truth data, and having the AI app output evaluation metrics, would enable effective accuracy improvement suggestions. This theme is called "Agent evaluation," and I believe it will become essential for building practical AI applications in the future.

What do you think? Despite not doing any programming at all this time, I was able to create such an amazing AI application. Google AI Studio integrates perfectly with Google Cloud, allowing you to deploy your app to the cloud with a single button and use it worldwide. Toshi Stats will continue to challenge ourselves by building various AI applications. Stay tuned!

1) Google AI Studio

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

July 14, 2025

generative ai, Google DeepMind, prompt

Prompt Engineering Mastery: The Fast Track

Toshifumi Kuga

July 14, 2025

generative ai, Google DeepMind, prompt

Since the debut of ChatGPT at the end of November 2022, the way we give instructions to computers has completely changed. Previously, programming languages like Python were necessary, but with ChatGPT, it's now possible to give instructions using the "natural languages" we use every day, such as English and Japanese. These natural language instructions are called "prompts." It has been about two and a half years since prompts came into use, and many people are likely experimenting with various prompts daily. As this is a new technology, systematically learning it can be challenging. However, Google has released a free white paper (1) of over 60 pages on the topic, so let's explore it for some hints. Let's begin!

1. Grasping the Basic Concepts

We often see simple prompt guides like "The Top 20 Prompts You Need to Know." However, it's impossible to effectively interact with a generative AI, which holds a vast amount of knowledge, with just about 20 prompts. While it may seem like a shortcut, memorizing a recommended list of 20 prompts each time is laborious and inefficient. Various studies are being conducted on how to write prompts, and the theoretical background is being investigated. While it's difficult for the average person to grasp everything, Google's white paper summarizes it concisely as follows:

Zero-shot prompting
Few-shot prompting
System prompting
Role prompting
Contextual prompting
Step-back prompting
Chain of thought
Self-consistency
Tree of thoughts

For example, the second method, "Few-shot prompting," is a technique to elicit more accurate answers from a generative AI by providing it with specific examples in "question and answer pairs." The other methods also have their own theoretical backgrounds and wide ranges of application. Rather than rote memorization, it's important to first understand the concepts and then apply them. I cannot explain them all here, so I encourage you to read the original document. I recommend taking your time to learn them one by one.

2. Memorize Useful Words

That said, taking the first step to actually write a prompt can be quite daunting. Google has provided a list of recommended verbs, which I'd like to introduce here. Choosing from these verbs to craft your prompts might help you create good ones, so it's worth a try.

Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse (especially for sentences and data grammatically), Pick, Predict, Provide, Rank, Recommend, Return, Retrieve (information, etc.), Rewrite, Select, Show, Sort, Summarize, Translate, Write

When you're unsure what to write, these verbs might give you a hint. This list includes many that I frequently use myself.

3. Finding Hints from Actual Examples

When you actually try out prompts, you'll find that some cases work well while others don't. The white paper summarizes these into 15 Best Practices. Here, I'll introduce an example from page 56.

Be specific about the output

Be specific about the desired output. A concise instruction might not guide the LLM enough

or could be too generic. Providing specific details in the prompt (through system or context

prompting) can help the model to focus on what’s relevant, improving the overall accuracy.

Examples:

DO:

Generate a 3 paragraph blog post about the top 5 video game consoles.

The blog post should be informative and engaging, and it should be

written in a conversational style.

DO NOT:

Generate a blog post about video game consoles.

Indeed, we tend to write simple prompts like the bad example. However, if we can add a bit more information and write like the good example, the information we receive will be better tailored to our needs. Just knowing this can change how you write prompts from now on. This white paper is full of such examples, so I highly recommend you read it for yourself.

How was that? I hope this serves as a reference for your prompt learning journey. Prompt engineering is still in its infancy, making it a great time to start learning. Let's conclude with a message from Google: "You don’t need to be a data scientist or a machine learning engineer – everyone can write a prompt. (1)"

Stay tuned!

1) , "Prompt Engineering”, Google, Feb 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

June 23, 2025

multi AI agent, AI, prompt, generative ai

How can we achieve best practices for constructing multi-agent AI systems?

Toshifumi Kuga

June 23, 2025

multi AI agent, AI, prompt, generative ai

Lately, I've been hearing a lot about multi-agent AI systems. As someone who is always thinking about not just using these services but building them myself, I've been keen to know how to construct high-performance AI agents. Last week, Anthropic published an article titled, "How we built our multi-agent research system(1)," which describes their construction method in detail. So today, using this article as a reference, I'd like to explore the best practices for creating multi-agent AI systems with all of you. Let's get started!

1. Why do we need so many agents?

ChatGPT, which debuted at the end of November 2022, was a single model. Since then, several services using generative AI have appeared, but initially, most of them used a single AI. So why have we recently seen a rise in methods that connect multiple generative AIs to operate as a single system? I believe it's because it has become clear that there are limits to what a single generative AI can accomplish when faced with complex tasks. It has gradually become apparent that by connecting and integrating several agents, even complex tasks can be handled. This trend has become particularly noticeable in conjunction with the performance improvements of standalone generative AI models like Gemini 1.5 Pro and OpenAI's o3.

2. What kind of agent structure should we build?

The Anthropic article included a wonderful chart that I'd love to reference. The key lies with the "Lead agent" and the "sub-agents" placed beneath it.

Here is Anthropic's explanation: "The multi-agent architecture in action: user queries flow through a lead agent that creates specialized subagents to search for different aspects in parallel" . While the chart shows three sub-agents, it's a matter of course that more may be needed to handle more complex tasks.

3. How do you coordinate many agents?

I've described the move to multi-agent AI as if it's all upside, but it requires numerous AI agents to function as expected. Getting a desired response from a single generative AI can be quite a challenge, so is it even possible to control multiple, simultaneously operating AI agents to meet our expectations? The key seems to lie in the "prompt." In fact, the Anthropic article contains countless, very helpful methods for prompt creation. Here, I'd like to introduce two representative examples. For the rest, I highly recommend reading the original article for yourself.

"Teach the orchestrator how to delegate. In our system, the lead agent decomposes queries into subtasks and describes them to subagents. Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries. Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information.

"Guide the thinking process. Extended thinking mode, which leads Claude to output additional tokens in a visible thinking process, can serve as a controllable scratchpad. The lead agent uses thinking to plan its approach, assessing which tools fit the task, determining query complexity and subagent count, and defining each subagent’s role.

In a nutshell, I think it comes down to "describing things meticulously." Apparently, simple and short instructions like "Research the semiconductor shortage" did not work well, so it seems necessary to write prompts for multi-agent AI as meticulously as possible. I'm going to work on writing better prompts from now on.

What did you think? It appears that various techniques are necessary to make multi-agent AI systems operate as intended. As the performance of generative AI improves in the future, the required orchestration techniques will also change. I want to continue to stay updated and incorporate the latest cutting-edge technologies. That's all for today. Stay tuned!

Toshi Stats Co., Ltd. provides a wide range of AI-related services. Please see here for more details!

1) , "How we built our multi-agent research system”, Anthropic, June 13, 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

June 8, 2025

white collar, generative ai, recursiveself-improvement, Anthropic

What Will White-Collar Jobs Be Like in 2030? What Should We Do Now?

Toshifumi Kuga

June 8, 2025

white collar, generative ai, recursiveself-improvement, Anthropic

As many of you may know, Dario Amodei has issued a warning to people. Roughly speaking, he stated, "The demand for entry-level jobs, such as those performed by new graduates, will be cut in half. This will become a reality within the next one to five years." This is shocking news, and the fact that it came from the CEO of a company actually developing generative AI has made it a global topic of discussion. In this article, I would like to delve deeper into this matter.

1. Dario Amodei's Warning

He is the co-founder and CEO of Anthropic, a U.S. company developing generative AI. He holds a Ph.D. in Physics from Princeton University, and from what I've seen, he strikes me more as a researcher than a business executive. I've been following his statements for the past two years, and I remember them being relatively conservative. I thought they were consistent with his researcher-like nature. However, this time he stated, "We are not keeping up with the pace of AI evolution," and "Unemployment rates will be 10% to 20%" (1), which shocked the world. I don't recall similar warnings from other frontier model development companies like OpenAI or Google DeepMind. This is why his latest statement garnered so much attention.

2. Current Performance of Generative AI

Currently, generative AI indeed possesses sufficient ability to handle entry-level tasks. As I mentioned before, Google Gemma 3, an open-source generative AI, achieved an accuracy of around 80% without any specific tuning for a 6-class classification task of bank customer complaints. Typically, relatively simple tasks like "Which product does this complaint relate to?" are assigned to new employees, and they learn the ropes through these assignments. However, with generative AI's performance reaching this level, management will undoubtedly lose the incentive to assign tasks to new employees at a cost. It's not yet clear whether the impact will be as significant as half of entry-level jobs disappearing, but given that even free generative AI can achieve around 80% accuracy today, a considerable impact is inevitable.

3. So, What Should We Do?

There is a division of opinion among experts regarding when AGI (Artificial General Intelligence), with capabilities equivalent to human experts, will appear. The most common estimate seems to be around 2030, but honestly, it's not clear. If so, we have about five years. In any case, we need to adapt our skills to the advent of AGI. Past computers could not be instructed or managed without a computer language. However, with the emergence of ChatGPT in November 2022, generative AI can now be instructed using natural language—"prompts." However, prompting is not a simple matter. It's an extremely delicate process of finely controlling the behavior of generative AI to precisely fit one's needs. Therefore, it's not uncommon to write prompts exceeding 20 to 30 lines. While I cannot delve into the detailed techniques here, it is certainly a skill that requires logical prompt writing. Even though prompts can be written in English or Japanese, acquiring this skill requires time and individual training. Given that open-source and free generative AIs are rapidly improving in performance, it is imperative for us, as users, to learn "prompting," the method of controlling them, regardless of our position or industry.

What do you think? It's good that Dario Amodei's warning has sparked more active discussion. As I mentioned in my previous blog post, generative AI is on the verge of implementing recursive self-improvement, gaining the ability for computers to improve themselves. The evolution of generative AI will accelerate further in the future. I believe the time has come to thoroughly learn prompting and prepare for the emergence of AGI. Discussions about AI and employment will continue globally. ToshiStats will keep you updated. Stay tuned!

ToshiStats Co., Ltd. offers various AI-related services. Please check them out here!

1) AI company's CEO issues warning about mass unemployment, CNN, May 30, 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

May 13, 2025

LLM, generative ai, ADK, agent

We Built a Customer Complaint Classification Agent with Google's New AI Agent Framework "ADK"

Toshifumi Kuga

May 13, 2025

LLM, generative ai, ADK, agent

On April 9th, Google released a new AI agent framework called "ADK" (Agent Development Kit). It's an excellent framework that incorporates the latest multi-agent technology while also being user-friendly, allowing implementation in about 100 lines of code. At Toshi Stats, we decided to immediately try creating a customer complaint classification agent using ADK.

1. Customer Complaint Classification Task

Banks receive various complaints from customers. We want to classify these complaints based on which financial product they concern. Specifically, this is a 6-class classification task where we choose one from the following six financial products. Random guessing would yield an accuracy below 20%.

2. Implementation with ADK

Now, let's move on to the ADK implementation. We'll defer to the official documentation for file structure and other details, and instead show how to write the AI agent below. The "instruction" part is particularly important; writing this carefully improves accuracy. This is what's known as a "prompt". In this case, we've specifically instructed it to select only one from the six financial products. Other parts are largely unchanged from what's described in tutorials, etc. It has a simple structure, and I believe it's not difficult once you get used to it.

3. Accuracy Verification

We created six classification examples and had the AI agent provide answers. In the first example, I believe it answered "student loan" based on the word "graduation." It's quite smart! Also, in the second example, it's presumed to have answered "mortgage " based on the phrase "prime location." ADK has a built-in UI like the one shown below, which is very convenient for testing immediately after implementation.

The generative AI model used this time, Google's "gemini-2.5-flash-04-17," is highly capable. When tasked with a 6-class classification problem using 100 actual customer complaints received by a bank, it typically achieves an accuracy of over 80%. For simple examples like the ones above, it wouldn't be surprising if it achieved 100% accuracy.

So, what did you think? This was our first time covering ADK, but I feel it will become popular due to its high performance and ease of use. Combined with A2A(2), which was announced by Google around the same time, I believe use cases will continue to increase. We're excited to see what comes next! At Toshi Stats, we will continue to build even more advanced AI agents with ADK. Stay tuned!

1) Agent Development Kit, Google, April 9th, 2025
2) Agent2Agent. Google, April 9th, 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

April 12, 2025

gemma3, generative ai, Google DeepMind, ollama

Running Google's Generative AI 'Gemma 3' on a MacBook Air M4 is Impressive!

Toshifumi Kuga

April 12, 2025

gemma3, generative ai, Google DeepMind, ollama

Gemma 3 (1) has been released by Google. While open-source generative AI seemed to be somewhat lagging behind Chinese competitors, it looks like a model capable of competing has finally arrived. Of course, its performance is excellent, but its efficiency, allowing implementation even on a single GPU, is also a key appeal. So, this time, we got our hands on the latest M4 chip-equipped MacBook Air 13 (10-core GPU, 24GB unified memory, 512GB storage) to actually run it locally and check its accuracy and computation speed. Let's get started right away.

1. Data Used in the Experiment

Customer complaints submitted to US banks are publicly available (2). We prepared 10,000 of these data points and had Gemma 3 predict, "What specific financial product is this complaint about?". Specifically, this is a 6-class classification task, choosing one from the following six financial products. The numbers listed above in the image description are used as the labels.

2. Hardware and Software Used

We prepared the latest model of the MacBook Air 13. To implement Gemma 3 locally, we used Ollama (3). This software is often used for implementing generative AI on PCs; it lacks a UI, but is consequently lightweight and easy to use. Additionally, to enable easy swapping of the generative AI with different models in the future, we built the classification process using LangChain (4). The generative AI model used this time was Gemma3-12B-it, downloaded via Ollama.

3. Confusion Matrix Showing Results

We ran the classification on 10,000 samples. Although the model was used straight out-of-the-box without fine-tuning, it achieved a sufficient accuracy of 0.7558. Despite the considerable sample size, the computation time was about 14 hours, manageable within a day. The latest M4 chip truly is powerful. Looking at the confusion matrix, it seems distinguishing between "Bank account or service" and "Checking or savings account" was challenging.

Conclusion

So, what did you think? While I've tried various generative AIs in the past, this was my first time experimenting with 10,000 samples. The classification accuracy was good, and above all, not having to worry about costs is one of the advantages of running generative AI locally. Also, while the analysis data used this time is public, some tasks involve confidential information that cannot be uploaded to the cloud. In such cases, the analysis method presented here becomes a valid solution. I highly encourage everyone to give it a try. We plan to conduct more experiments using various generative AIs, so please look forward to them. Stay tuned!

1) gemma3 https://blog.google/technology/developers/gemma-3/

2) Consumer Complaint Database https://www.consumerfinance.gov/data-research/consumer-complaints/

3) Ollama https://ollama.com/

4) LangChain https://www.langchain.com/

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

February 4, 2025

DeepSeek-R1, artificial intelligence, generative ai, AGI, GRPO

DeepSeek-R1's Impact and the Future of Generative AI

Toshifumi Kuga

February 4, 2025

DeepSeek-R1, artificial intelligence, generative ai, AGI, GRPO

Hello, DeepSeek-R1, released on January 20th (1), has sparked excitement among AI professionals and investors worldwide. I believe it's had an impact comparable to that of ChatGPT's emergence. Here, I'd like to consider why it has garnered so much global attention.

1. What Was New?

DeepSeek-R1's performance is remarkable. It stands shoulder-to-shoulder with OpenAI's o1 model, a veteran in inference models. Below is a comparison of performance across various benchmarks, where DeepSeek-R1 rivals the o1 model. The fact that a newcomer model suddenly matched OpenAI, the frontrunner in generative AI, is undoubtedly why the world is so astonished.

Performance comparison across various benchmarks

While DeepSeek-R1 appeared suddenly like a comet, there were several technical breakthroughs. Among the most significant is a training method called "GRPO." DeepSeek-R1 uses reinforcement learning to acquire advanced reasoning abilities in mathematics and coding. This is similar to existing generative AI. Reinforcement learning is a powerful training technique that doesn't require so-called "correct answer data," but it's a complex and resource-intensive approach. DeepSeek adopted a method that requires only one model instead of the usual two. This is "GRPO." Here is an overview, with PPO in the upper section representing a common technique used in existing models, and GRPO in the lower section being a new method.

In comparison, GRPO lacks the Value model present in PPO, and has only a Policy model. This means that only one model is needed instead of two. Since the model here refers to a massive generative AI, being able to complete training with only one model has a massive impact on resource saving. The fact that DeepSeek-R1, developed by a Chinese company unable to use the latest GPUs due to US semiconductor export restrictions, achieved such remarkable results might be related to this. For more details on the technical aspects, please refer to the research paper (2). The research paper (3) first introduced GRPO.

2. Why Did It Attract Global Attention?

DeepSeek-R1 was released as an open-weight model, available for anyone to download and use. Additionally, the entire training method, including GRPO, was published in detail in research papers. Until now, most generative AI models, with a few exceptions, could only be accessed via APIs and not downloaded. Furthermore, how they were trained was rarely disclosed, making them black boxes. In this context, the release of DeepSeek-R1, a cutting-edge model, in a usable form for AI researchers worldwide had a profound impact. Even if a model is called amazing, if the inner workings are unknown, neither criticism nor improvement suggestions can be made. With DeepSeek-R1, I feel that the open-source community can, for the first time, participate in the development of the most advanced generative AI models.

3. What Will Become of Generative AI in the Future?

AI developers around the world are already starting to adopt methods like GRPO in the development of state-of-the-art models. DeepSeek-R1 has proven that it's possible without incurring enormous costs. I'm currently focusing on a public project called "Open-R1" (4), which plans to disclose not only the training data but also the code, which was not revealed with DeepSeek-R1, and I believe this is revolutionary.

Of course, it is expected that such projects will start worldwide, and I am looking forward to that. It's exciting!

How was it? The landscape surrounding generative AI has changed in an instant. New generative AI models will continue to be created. It's really hard to take your eyes off of it. I will continue to deliver further news. Stay tuned!

1) DeepSeek-R1 Release, Jan 22, 2025
2) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, DeepSeek-AI, Jan 22, 2025
3) DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, DeepSeek-AI, Apr 27,2025
4) Open-R1: a fully open reproduction of DeepSeek-R1, Hugging Face, Jan 28, 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

January 28, 2025

agent, artificial intelligence, generative ai, smolagents, data analysis

Marketing AI agents for customer targeting in telemarketing can also be easily implemented using the new library "smolagents." This looks promising!

Toshifumi Kuga

January 28, 2025

agent, artificial intelligence, generative ai, smolagents, data analysis

1. Marketing AI Agent

To efficiently reach potential customers, it's necessary to target customers who are likely to purchase your products or services. Marketing activities directed at customers without needs are often wasteful and unsuccessful. However, identifying which customers to focus on from a large customer list beforehand is a challenging task. To meet the expectation of easily targeting customers without complex analysis, provided you have customer-related data at hand, we have implemented a marketing AI agent this time. Anyone with basic Python knowledge should be able to implement it without much difficulty. The secret to this lies in the latest framework "smolagents" (1), which we introduced previously. Please refer to the official documentation for details.

2. Agent Predicting Potential Customers for Deposit-Taking Telemarketing

Let's actually build an AI agent. The theme is "Predicting potential customers for deposit-taking telemarketing with an AI agent using smolagents." As before, by providing data, we want the AI agent itself to internally code using Python and automatically display "the top 10 customers most likely to be successfully reached by telemarketing."

While the coding method should be referenced from the official documentation, here we will present what kind of prompt to write to make the AI agent predict potential customers for deposit-taking telemarketing. The key point, as before, is to instruct it to "use sklearn's HistGradientBoostingClassifier for data analysis." This is a gradient boosting library, highly regarded for its accuracy and ease of use.

Furthermore, as a question (instruction), we specifically add the instruction to calculate "the purchase probability of the 10 customers most likely to be successful." The input to the AI agent is in the form of "prompt + question."

Then, the AI agent automatically generates Python code like the following. The AI agent does this work instead of a human. And as a result, "the top 10 customers most likely to be successfully marketed to" are presented. Customers with a purchase probability close to 100%! Amazing!

"Top 10 customers most likely to be successfully marketed to"

In this way, the user only needs to instruct "tell me the top 10 customers most likely to be successful," and the AI agent writes the code to calculate the purchase probability for each customer. This method can also be applied to various other things. I'm looking forward to future developments.

3. Future Expectations for Marketing AI Agents

As before, we implemented it with "smolagents" this time as well. It's easy to implement, and although the behavior isn't perfect, it's reasonably stable, so we plan to actively use it in 2025 to develop various AI agents. The code from this time has been published as a notebook (2). Also, the data used this time is relatively simple demo data with over 40,000 samples, but given the opportunity, I would like to try how the AI agent behaves with larger and more complex data. With more data, the possibilities will increase accordingly, so we can expect even more. Please look forward to the next AI agent article. Stay tuned!

1) Introducing smolagents, a simple library to build agents, Aymeric Roucher, Merve Noyan, Thomas Wolf, Hugging Face, Dec 31,2024
2) https://github.com/TOSHISTATS/AI-agent-for-Marketing_20250125/blob/main/AI_agent_for_Marketing_20250125.ipynb

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

January 23, 2025

Hugging Face, smolagents, generative ai, LLM, agent, artificial intelligence

I tried using the new AI agent framework "smolagents". The code is simple and easy to use, and I recommend it for AI agent beginners!

Toshifumi Kuga

January 23, 2025

Hugging Face, smolagents, generative ai, LLM, agent, artificial intelligence

At the end of last year, a new AI agent framework called "smolagents" was released from Hugging Face (1). The code is simple and easy to use, and it even supports multi-agents. This time, I actually created a data analysis AI agent and tried various things. I hope it will be helpful.

1. Features of "smolagents"
The newly released "smolagents" has features that existing frameworks do not have. 1) First, it has a simple structure. You can execute an AI agent by writing 3 to 5 lines of code. It's perfect for those who want to start with AI agents. 2) Also, since it was released by Hugging Face, there are already a huge number of open-source models on the Hub. You can easily call and use them. Of course, it also supports proprietary models such as GPT4o, so you can use it for both open and closed models. 3) Finally, when you execute an agent, python code is generated and acted upon. Therefore, you can use the assets of the vast Python ecosystem, which is very convenient. Especially for those who specialize in data analysis like me, it is a perfect framework because you can use Python libraries such as sklearn.

2. An Agent for Predicting Credit Card Defaults

Now, let's actually build an AI agent. The theme is "AI agent by smolagent predicts credit card defaults". Normally, when building a default prediction model, you would code using machine learning libraries such as sklearn, but this time, I want to give it data and have the AI agent itself code internally using Python and automatically display the default probabilities of the first 10 customers.

For how to write the code, please refer to the official documentation , but here I would like to present what kind of prompts I actually wrote to make the AI agent predict defaults. The point is to specifically instruct it to "use sklearn's HistGradientBoostingClassifier for data analysis". This library is highly evaluated for creating machine learning models with high accuracy and ease of use. This is domain knowledge of data analysis, but by including that knowledge in the prompt, we expect to obtain higher accuracy.

Furthermore, as a question, I will add an instruction to specifically calculate "the default probability of 10 customers". The AI agent is input in the form of "prompt + question".

Then, the AI agent automatically generated the following Python code. Normally, this is what I would write myself, but the AI agent does it for me. And as a result, the default probabilities for 10 people are also shown. Amazing!

In this way, the user only needs to instruct "use sklearn to calculate the default probability", and the AI agent writes the code to calculate the default probability for each customer. And you will be able to make default predictions for each customer. I tried it with default prediction this time, but I think it can be covered to the probability in any business, such as marketing, customer churn and human resources. I'm looking forward to future developments.

3. Impressions after using "smolagents" for the first time

Until now, I used LangGraph to implement AI agents. I liked it because I could make various detailed settings, but it was necessary to code each of state, tool, node, edge, etc., and I felt that the hurdle was high for beginners to start with. After implementing it with "smolagents" this time, I found that if I coded according to the template, it would run by writing a few lines, so anyone could start. Of course, it fully meets the needs of AI developers, so I plan to actively use it in 2025 to develop various AI agents. I have published the code this time in a notebook (2). Please look forward to the next AI agent article. Stay tuned!

(1) Introducing smolagents, a simple library to build agents, Aymeric Roucher, Merve Noyan, Thomas Wolf, Hugging Face, Dec 31,2024
(2) AI-agent-to-predict-default-of-credit-card-with-smolagent_20250121

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

September 23, 2024

AGI, LLM, OpenAI o1, generative ai, Chain of Thought

OpenAI o1-preview: A Breakthrough in Generative AI, Introducing a Novel Paradigm

Toshifumi Kuga

September 23, 2024

AGI, LLM, OpenAI o1, generative ai, Chain of Thought

Last week, we introduced OpenAI o1(1). Despite still being in preview, it boasts high performance, and as evidenced by the leaderboard below(2), it seems to be widely regarded as having overwhelming performance, especially in mathematics and coding. In this article, we'd like to explore why OpenAI o1 demonstrates higher accuracy compared to existing generative AI models like GPT4.

Scores of various generative AIs in the mathematics field

1. Chain of Thought is Key

The star of the OpenAI o1 model is "Chain of Thought." This refers to "a series of intermediate reasoning steps," which was previously considered an important element of prompts created by users for existing generative AI models. By incorporating "Chain of Thought" into prompts, users enabled generative AI to engage in deeper and broader thinking before providing an answer, thereby improving accuracy. "Chain of Thought" became known to the public through a 2022 research paper(3). Please refer to it for details.

2. OpenAI o1 Can Generate Chain of Thought Independently

OpenAI o1 can generate Chain of Thought internally on its own. Users don't need to devise Chain of Thought themselves; it's generated automatically. This is why it achieved high accuracy in mathematics and coding. Unfortunately, OpenAI seems to have decided not to disclose the Chain of Thought itself. Users can only see a summary of it. If you're like most users, you're probably thinking, "I'd really like to see that!" We hope that OpenAI will change its policy and release it in the future.

3. Creating a Reward Model

OpenAI has released very little information about what we'll discuss from here on. Please note that the following is based on speculation drawn from previously published research papers and information shared by OpenAI researchers. To enable generative AI to automatically generate task-specific Chain of Thought for practical use, we must evaluate whether the generated Chain of Thought is actually correct. This is where the Reward model comes into play. A 2023 research paper(4) from OpenAI provides a detailed explanation of how to train a Reward model, so let's look to it for clues.

The data for training the Reward model takes the form of Chain of Thought, as shown below. This research paper limits the tasks to mathematics. Since it's challenging for humans to manually create Chain of Thought for each task, they are automatically generated using GPT4. This is called the Generator. Humans then label each step of the Chain of Thought generated by the Generator on a three-point scale (correct, incorrect, or neither). This completes the training data. In the example below, you can see that each step is assigned a three-point label. It must have been quite a task for humans to label a large amount of such data.

4. Training Generative AI Through Reinforcement Learning

Once the Reward model is complete, we can train the generative AI using reinforcement learning. As a result, the generative AI can generate the correct Chain of Thought for the task. We, the users, actually run OpenAI o1 and benefit from the generated Chain of Thought. Unfortunately, OpenAI has not disclosed the specific method for training OpenAI o1 using reinforcement learning. Since this directly affects accuracy, it's unlikely to be released in the future. However, researchers worldwide are working on this issue and have published several promising results. As this is a technology that will support the future development of generative AI, we would like to revisit it in a future article.

5. A New Paradigm for Generative AI

OpenAI's website includes the following statement and chart:

“Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.”

Until now, there has been much discussion about how increasing the computational cost (time and number of parameters) in pre-training improves the accuracy of generative AI, but there hasn't been much in-depth discussion about the relationship between inference computational cost and accuracy. However, it has now become clear that by generating Chain of Thought itself and then providing an answer, generative AI can answer tasks requiring complex logical reasoning with high accuracy, albeit with significantly increased inference computational cost. The chart on the right above shows that accuracy improves as the computational cost at inference time increases. We believe this is a groundbreaking development. Therefore, it will be important to consider both training and inference computational costs for generative AI in the future. This marks the dawn of a new paradigm for generative AI.

OpenAI o1 has not only improved accuracy but has also brought about a new paradigm for the development of generative AI. We look forward to seeing how future generative AIs will follow in its footsteps. That's all for today. Stay tuned!

1) Introducing OpenAI o1, OpenAI, Sep 12, 2024
2) LMSYS Chatbot Arena Leader board
3) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Google Research, Brain Team, Jan 2023
4) Let’s Verify Step by Step, Hunter Lightman , Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe, OpenAI, May 2023

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

September 16, 2024

AGI, LLM, generative ai, o1, OpenAI

OpenAI's "o1-preview" Arrives: Is This the Next Leap Towards Artificial General Intelligence?!

Toshifumi Kuga

September 16, 2024

AGI, LLM, generative ai, o1, OpenAI

On September 12, 2024, OpenAI released its new generative AI model "o1" (pronounced "oh-one"), which had been the subject of much speculation. I had the opportunity to try it out, and here are my initial impressions.

1. Model Overview

As a new generative AI model, o1 has various features, but the key points are as follows:

Specialized for scientific, coding, and mathematical reasoning.
Available in two versions: OpenAI o1 and OpenAI o1-mini.
Currently in preview with limited functionality and performance.
Not a successor to GPT-4.
OpenAI o1 has a limited usage of 30 requests per week.
Price: OpenAI o1 is about six times more expensive than GPT-4o.

For more details, please refer to the official website (1).

Compared to GPT-4o, o1-preview demonstrates superior performance in coding, data analysis, and mathematics, as shown below. It seems likely that o1 will excel in fields where existing generative AI has struggled to achieve satisfactory accuracy. However, because it utilizes Chain of Thought reasoning to arrive at answers, it can take a considerable amount of time to respond, making it unsuitable for tasks requiring real-time answers.

GPT-4o vs. o1-preview: Task Performance Comparison

2. Challenging o1 with Game24

Let's test the capabilities of o1-preview. A common example of a task that generative AI struggles with is Game24.

This is a simple mathematical puzzle with the following rules:

Use the four given numbers and basic arithmetic operations (addition, subtraction, multiplication, division).
Create a mathematical expression that results in 24.
Each of the four given numbers can be used only once.

Example: 13, 10, 9, 4 → (10 - 4) × (13 - 9)

When attempting this with o1-preview, it produced the following result. It successfully solved the puzzle! The response took about 15 seconds, likely due to internal trial-and-error processes.

When trying the same with GPT-4o:

GPT-4o fails to provide a correct answer. This highlights o1's superiority in tasks that require strong logical reasoning.

3. The Impact on the Future of Generative AI

o1's newfound capabilities are attributed to its incorporation of Chain of Thought reasoning, enabling it to generate task-specific chains of thought and produce more reliable correct answers. However, the Chain of Thought process, which demonstrates how the correct answer is derived, is not revealed to the user. This is somewhat disappointing, as users typically want to understand not only the correct answer but also "why" that answer was reached. Therefore, it's understandable that some may perceive it as a black box. We hope that the open-source development community will further research this aspect and share their findings with the world. With excellent open-source generative AI models like Llama and Gemma currently available, we believe that user verification of Chain of Thought will become possible in the near future.

Conclusion

o1-preview seems to have been received with a level of excitement not seen since the release of GPT-4 in March 2023. In the next installment, I plan to explore the technology behind this impressive generative AI, based on external speculation. That's all for today. Stay tuned!

1) Introducing OpenAI o1, OpenAI, Sep 12, 2024

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

August 23, 2024

AlphaGo, Google DeepMind, generative ai, AlphaProof

The Future of Generative AI: Predicting the Next Generation Based on Google DeepMind's Math Olympiad Breakthrough

Toshifumi Kuga

August 23, 2024

AlphaGo, Google DeepMind, generative ai, AlphaProof

Generative AI has a reputation for struggling with math, often making mistakes even with simple elementary-level arithmetic. However, Google DeepMind recently announced that their AI achieved a score equivalent to a silver medal in the International Mathematical Olympiad (IMO)(1). Based on this article, let's delve into predicting the future of next-generation generative AI.

1. How Did AI Solve Complex Math Problems?

The achievement is impressive:

“Today, we present AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of our geometry-solving system. Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time.”

This is an amazing score, just shy of a gold medal. We'll focus on AlphaProof, the reasoning system, out of the two models.

AlphaProof is explained as follows:

“AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go.”

In simple terms, while there is abundant data available for math problems written in natural language, generative AI tends to make plausible yet incorrect statements (hallucinations), making it difficult to utilize effectively. Therefore, Google utilized its generative AI, Gemini, to translate math problems into the formal language Lean. This formal representation was then fed into AlphaZero, known for its long-term planning and reasoning capabilities, for computation. The chart below provides a clear illustration.

AlphaZero has already proven its reasoning prowess in board games like Go. This achievement demonstrates the successful application of its capabilities to the realm of mathematics. Remarkable!

2. Implications from AlphaZero

Let's briefly revisit AlphaZero, which made a reappearance. It is a groundbreaking AI that combines RL (Reinforcement Learning) and MCTS (Monte Carlo Tree Search). The initial model gained fame in March 2016 as the first AI to defeat a top professional Go player. It's important to emphasize that AlphaZero achieved superhuman ability without relying on human-created data; it trained itself using self-generated data. Upon hearing this for the first time, many might wonder, "How is that even possible?" AlphaZero accomplishes this through self-play, generating massive amounts of training data by playing against itself. Refer to the research paper(2) for more details. For context, consider AlphaGo as the initial version of AlphaZero.

3. The Fusion of Current Generative AI and AlphaGo

Interestingly, Demis Hassabis, CEO of Google DeepMind, recently hinted at the future of their generative AI(3). The key takeaways are:

“Gemini” is a natively multimodal model.
It can understand various aspects of the world, including language, images, videos, and audio.
Current models are incapable of long-term planning and problem-solving.
DeepMind possesses expertise in this field through AlphaGo.
The next-generation model will be an agent that fuses Gemini and AlphaGo.

It's plausible to view the project that secured a silver medal in the Math Olympiad as a step towards overcoming the limitations of generative AI in "long-term planning." However, one might question, "How exactly will this fusion work?" A prominent long-form paper (4) in June of this year provides clues.

A look back at AlphaGo—the first AI system that beat the world champions at Go, decades before it was thought possible—is useful here

• In step 1, AlphaGo was trained by imitation learning on expert human Go games. This gave it a foundation.

• In step 2, AlphaGo played millions of games against itself. This let it become superhuman at Go:

remember the famous move 37 in the game against Lee Sedol, an extremely unusual but brilliant move a human would never have played. Developing the equivalent of step 2 for LLMs is a key research problem for overcoming the data wall (and, moreover, will ultimately be the key to surpassing human-level intelligence).

AlphaGo eventually transitioned to self-play, generating its own training data and eliminating the need for human input. This is a remarkable achievement achieved through the combination of "Reinforcement Learning and MCTS." The future of next-generation AI hinges on how generative AI can be trained using this mechanism.

Conclusion:

The ability to execute long-term plans opens up a plethora of possibilities. Imagine AI formulating long-term investment strategies or serving as legal advisors in court, excelling in tasks that demand prolonged reasoning and debate. The world is undoubtedly on the verge of transformation, and the future is incredibly exciting.

That's all for today. Stay tuned!

1) AI achieves silver-medal standard solving International Mathematical Olympiad problems, Google DeepMind, 25 JULY 2024
2)Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Google DeepMind, 5 DEC 2017
3)Unreasonably Effective AI with Demis Hassabis, Google DeepMind, 14 AUG 2024 (around 18:00)
4) SITUATIONAL AWARENESS　p28, The Decade Ahead, Leopold　Aschenbrenner, June 2024　

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.