Gemini 3 Flash: The Multi-modal Powerhouse Dominating the 2026 AI Scene!

Gemini 3 Flash (1) — likely the final major AI model debut of 2025 — is currently making waves. Despite being positioned as an affordable, mid-tier model, its performance is reportedly on par with flagship models. Today, I want to put Gemini 3 Flash to the test and see just how much its multimodal capabilities have evolved. Let’s dive right in.

 

1. App Development

To conduct our experiments, I wanted to create a simple application using Google AI Studio. By simply entering a prompt into the interface, the app was ready in an instant. No Python was used at all. This level of accessibility means even non-engineers can build functional apps now. Things have truly become incredibly convenient.

 

2. Object Counting

First, I challenged the model with a task that has historically been difficult for AI: counting objects. I asked the AI to count the number of cans and cars in an image. I counted them myself as well, and the AI’s response was spot on. At this level of accuracy, we might no longer need specialized object detection models for general tasks.
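
The manual check described above can be made repeatable. The sketch below assumes the model is asked to reply with a JSON object of counts; that reply format, the function name `compare_counts`, and the sample numbers are illustrative assumptions, not output from the experiment.

```python
import json


def compare_counts(model_reply: str, manual_counts: dict[str, int]) -> dict[str, int]:
    """Return model count minus manual count for each labelled object.

    A value of 0 means the model agreed with the hand count.
    """
    model_counts = json.loads(model_reply)
    return {
        label: int(model_counts.get(label, 0)) - expected
        for label, expected in manual_counts.items()
    }


# Hypothetical reply and hand counts, for illustration only.
reply = '{"cans": 5, "cars": 3}'
differences = compare_counts(reply, {"cans": 5, "cars": 4})
print(differences)
```

Any non-zero value flags an object type worth recounting by hand.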

 

3. Economic Analysis from Charts

Next, let’s try a task that requires a higher level of intelligence: interpreting economic indicators from charts and generating an analytical report. Japan has become a super-aging society faster than any other developed nation, and its labor force is steadily declining. For this test, I provided charts for the labor force population, the unemployment rate, and manufacturing sector hourly wages. I then instructed the AI to read these charts, synthesize the data, and produce a comprehensive analysis.

labor force population

unemployment rate

Manufacturing Sector hourly wages

In 30 seconds, the economic report was generated. Below is an excerpt. I was genuinely impressed by the depth of analysis derived from just three charts. Gemini 3 Flash is truly formidable!

 

Conclusion

What do you think? Gemini 3 Flash is a fantastic value, being significantly cheaper than rival flagship models. Given that its multimodal performance is top-tier, I believe this will become the "go-to" model for many users. For AI startups like ours, having a model that allows for extensive experimentation with high token volumes without breaking the bank is incredibly reassuring. I highly recommend giving it a try!

Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!


1) Gemini 3 Flash: frontier intelligence built for speed, Dec 17, 2025, Google

Copyright © 2025 Toshifumi Kuga. All rights reserved
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Improving ML Vibe Coding Accuracy: Hands-on with Claude Code's Plan Mode

2025 was the year I actively incorporated "Vibe Coding" into machine learning. After repeated trials, I found that coding accuracy was inconsistent: sometimes good, sometimes bad.

Therefore, in this experiment, I decided to use Claude Code "Plan Mode" (1) to have an AI agent automatically generate an implementation plan before generating the actual code. Based on this plan, I will test whether a machine learning model can be built reliably using "Vibe Coding." Let's get started!

 

1. Generating an Implementation Plan with Claude Code "Plan Mode"

Once again, I would like to build a model that predicts in advance whether a customer will default (on a loan, etc.). I will use publicly available credit card default data (2). For the code assistant, I am using Claude Code, and for the IDE, the familiar VS Code.

To provide input to the Claude Code AI agent, I summarized the task and implementation points into a "Product Requirement Document (PRD)." This is the only document I created.
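
The PRD itself is not reproduced in this post. As a rough illustration of the kind of structure such a document might have, here is a minimal sketch that renders one as Markdown. The section names, the function `render_prd`, and the library list are assumptions made for the example, not the contents of the actual PRD.

```python
def render_prd(goal: str, dataset: str, libraries: list[str], deliverables: list[str]) -> str:
    """Render a minimal PRD as Markdown text."""
    lines = [
        "# Product Requirement Document",
        "",
        "## Goal",
        goal,
        "",
        "## Data",
        dataset,
        "",
        "## Libraries",
    ]
    lines += [f"- {name}" for name in libraries]
    lines += ["", "## Deliverables"]
    lines += [f"- {item}" for item in deliverables]
    return "\n".join(lines) + "\n"


# Hypothetical values; only the task and dataset come from the post.
prd = render_prd(
    goal="Predict whether a credit card customer will default.",
    dataset="Default of Credit Card Clients (public dataset)",
    libraries=["scikit-learn", "shap"],  # assumed, not confirmed by the post
    deliverables=["Trained model", "HTML report with ROC curve and SHAP values"],
)
print(prd)
```

Keeping the PRD as a plain Markdown file makes it easy to reuse the same input across runs.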

I input this PRD into Claude Code "Plan Mode" and instructed it to: "Create a plan to create predictive model under the folder of PD-20251217".

Within minutes, the following implementation plan was generated. Comparing it to the initial PRD, you can see how refined it is. Note that I am showing only half of the generated plan here; the full plan is remarkably detailed. The AI agent's ability to think this far ahead is amazing.

 

2. Beautifully Visualizing Prediction Accuracy

When this implementation plan is approved and executed, the prediction model is generated. Naturally, we are curious about the accuracy of the resulting model.

Here, it is visualized clearly according to the implementation plan. While these are familiar metrics for machine learning experts, all the important ones are covered and visualized in an easy-to-understand way, summarized as a single HTML file viewable in a browser.

The charts below are excerpts from that file. It includes ROC curves, SHAP values, and even hyperparameter tuning results. This time, the total implementation time was about 10 minutes. If it can be generated automatically to this extent in that amount of time, I’d rather leave it to the AI agent.

 

3. Meta-Prompting with Claude Code "Plan Mode"

A Meta-Prompt refers to a "prompt (instruction to AI) used to create and control prompts."

In this case, I called Claude Code "Plan Mode" and instructed it to "generate an implementation plan" based on my PRD. This is nothing other than executing a meta-prompt in "Plan Mode."

Thanks to the meta-prompt, I didn't have to write a detailed implementation plan myself; I only needed to review the output. It is efficient because I can review it before coding, and since that implementation plan can be viewed as a highly precise prompt, the accuracy of the actual coding is expected to improve.
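
The two-level structure can be sketched in a few lines: a short instruction asks for a plan, the PRD is attached as context, and the resulting plan then serves as the detailed prompt for coding. Plan Mode handles this interactively inside Claude Code; the helper `build_plan_request` and the extra "do not write code yet" sentence are hypothetical, shown only to make the idea concrete.

```python
def build_plan_request(prd_text: str, target_folder: str) -> str:
    """Wrap a PRD in an instruction that asks for a plan rather than code."""
    instruction = (
        f"Create a plan to create predictive model under the folder of {target_folder}. "
        "Do not write code yet; list the steps, files, and checks first."
    )
    return f"{instruction}\n\n--- PRD ---\n{prd_text.strip()}\n"


# Hypothetical one-line PRD, for illustration only.
request = build_plan_request("Goal: predict credit card default.", "PD-20251217")
print(request)
```

The human reviews the plan that comes back, and only the approved plan is used to drive code generation.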

To be honest, I don't have the confidence to write the entire implementation plan myself. I definitely want to leave it to the AI agent. It has truly become convenient!

 

How was it? Generating implementation plans with Claude Code "Plan Mode" seems applicable not only to machine learning but also to various other fields and tasks. I definitely intend to keep trying it out. I encourage everyone to give it a try as well.

That’s all for today. Stay tuned!





1) How to use Plan Mode,  Anthropic

2) Default of Credit Card Clients









Can You "Vibe Code" Machine Learning? I Tried It and Built an App

2025 was the year the coding style known as "Vibe Coding" truly gained mainstream acceptance. So, for this post, I conducted an experiment to see just how far we could go in building a machine learning model using only AI agents via "Vibe Coding"—with almost zero human programming involved. Let's get started!

 
1. The Importance of the "Product Requirement Document" for Task Description

This time, I wanted to build a model that predicts whether bank loan customers will default. I used the publicly available Credit Card Default dataset (1).

In Vibe Coding, we delegate the actual writing of the program to the AI agent, while the human shifts to a reviewer role. In practice, having a tool called a "Code Assistant" is very convenient. For this experiment, I used Google's Gemini CLI. For the IDE, I used the familiar VS Code.

Gemini CLI

To entrust the coding to an AI agent, you must teach it exactly what you want it to do. While it is common to enter instructions as prompts in a chatbot, in Vibe Coding, we want to use the same prompts repeatedly, so we often input them as Markdown files.

It is best to use what is called a "Product Requirement Document (PRD)" for this content. You summarize the goals you want the product to achieve, the libraries you want to use, etc. The PRD I created this time is as follows:

PRD

By referencing this PRD and entering a prompt to create a default prediction model, the model was built in just a few minutes. The evaluation metric, AUC, was also excellent, ranging between 0.74 and 0.75. Amazing!!
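
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it directly from that definition in plain Python. The labels and scores are made-up toy values, not results from this experiment, and a real project would more likely rely on a library routine.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = default, 0 = no default.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

In this toy data, 8 of the 9 positive-negative pairs are ordered correctly, so the result is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.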

 

2. Describing the Folder Structure with PROJECT_SUMMARY

It is wonderful that the machine learning model was created, but if left as is, we won't know which files are where, and handing it over to a third party becomes difficult.

Therefore, if you input the prompt: "Analyze the current directory structure and create a concise summary that includes: 1. A tree view of all files 2. Brief description of what each file does 3. Key dependencies and their purposes 4. Overall architecture pattern Save this as PROJECT_SUMMARY.md", it will create a Markdown file like the one below for you.

PROJECT_SUMMARY.md

With this, anyone can understand the folder structure at any time, and it is also convenient when adding further functional extensions later. I highly recommend creating a PROJECT_SUMMARY.md.
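
The agent writes PROJECT_SUMMARY.md itself, but the first item in that prompt, the tree view, is easy to reproduce by hand when you want to check the summary against the actual folder. This is a minimal sketch using only the standard library; the function name `tree_view` and the sample file names are hypothetical.

```python
import os
import tempfile


def tree_view(root: str, indent: str = "") -> list[str]:
    """List files and folders under root, one per line, indented by depth."""
    lines = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            lines.append(f"{indent}{name}/")
            lines.extend(tree_view(path, indent + "  "))
        else:
            lines.append(f"{indent}{name}")
    return lines


# Build a throwaway project folder so the example is self-contained.
with tempfile.TemporaryDirectory() as project:
    os.makedirs(os.path.join(project, "src"))
    for relative in ("PRD.md", os.path.join("src", "train.py")):
        with open(os.path.join(project, relative), "w") as handle:
            handle.write("")
    listing = tree_view(project)

print("\n".join(listing))
```

Comparing this listing with the tree in PROJECT_SUMMARY.md is a quick way to spot files the summary missed.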

 

3. Adding a UI and Turning the ML Model into an App

Since we built such a good model, we want people to use it. So, I experimented to see if I could build an app using Vibe Coding as well.

I created PRD-pdapp.md and asked the AI agent to build the app. I instructed it to save the model file and to use Streamlit for app development. The actual file and its translation are below:

PRD-pdapp.md

When executed, the following app was created. It looks cool, doesn't it?

You can input customer data using the boxes and sliders on the left, and when you click the red button, the probability of default is calculated.

  • Customer 1: Default probability is 7.65%, making them a low-risk customer.

  • Customer 2: Default probability is 69.15%, which is high, so I don't think we can offer them a loan. The PAY_0 Status is "2", meaning their most recent payment status is 2 months overdue. This is the biggest factor driving up the default probability.

As you can see, having a UI is incredibly convenient because you can check the model's behavior by changing the input data. I was able to create an app like this using Vibe Coding. Wonderful.

 

How was it? It was indeed possible to perform machine learning using Vibe Coding. However, instead of writing code, you need to write precise PRDs. I believe this will become a new and crucial skill. I encourage you all to give it a try.

That’s all for today. Stay tuned!

 


1) Default of Credit Card Clients

 




The OpenAI Code Red: What’s Next for the Generative AI Market?

In late November 2022, OpenAI released ChatGPT. It has been three years since then, and just as it was about to celebrate its third birthday, an event occurred that dampened the celebratory mood. CEO Sam Altman declared a "CODE RED" (Emergency) (1). The driving force behind this was the breakthrough of the new generative AI, "Gemini 3" (2), released by Google on November 18. Today, I would like to delve into this theme and forecast the generative AI market for 2026. Let’s get started.

 

1. Gemini 3 vs. GPT-5

On August 7, 2025, OpenAI released GPT-5. As the first major update since GPT-4, it carried very high expectations. In practice, however, it was difficult to perceive a significant difference compared to other models. Although it improved scores across various benchmarks, its impact felt muted compared to the arrival of GPT-4.

Of course, GPT-5 is evolving steadily, so if rival companies' models had remained stagnant, I believe ChatGPT could have celebrated its third birthday peacefully. However, the moves made by its rival, Google, surpassed our expectations. On November 18, 2025, Gemini 3 was released, and everyone was astonished by its high performance. Its scores surpassed those of GPT-5 on almost all benchmarks, and for the first time since the birth of ChatGPT, OpenAI lost its "technological competitive advantage." The battle surrounding generative AI has entered a new phase.

 

2. Why Gemini 3 is Particularly Superior

There are several technical talking points, but what I am paying special attention to is its high capability in image processing and generation. As shown in the leaderboard (3) below, its strength is overwhelming and unrivaled. The famous image generation app Nano Banana Pro is officially named Gemini 3 Pro Image, and its high scores truly stand out.

Leaderboard

When considering individual customers, the ability to easily generate and edit images exactly as envisioned is crucial and can serve as a "killer app." I feel that once individuals experience the technical level of Gemini 3, they will find it difficult to easily switch back to competitor apps. The image below was generated using Nano Banana Pro. As you can see, it has become easy to render both English and Japanese text together on an image. Previously, Japanese text was often incomplete or incomprehensible, so it was quite moving to see clean Japanese generated for the first time.

Image generated by Nano Banana Pro

 

3. The Generative AI Market in 2026

With Sam Altman issuing a CODE RED, I believe OpenAI will allocate significant development resources to improving the model itself and will frantically work to close this gap in the image generation field. On the other hand, Google, armed with Gemini 3, possesses several multimodal generative AI models beyond just Nano Banana Pro, and I expect them to leverage that expertise to aim for further breakthroughs.

In particular, generative AI capable of simulation using 3D structures—known as World Models—will likely influence Large Language Models (LLMs) as well, solidifying Google's competitive advantage. One has to admit that Google, which owns YouTube, is incredibly strong in this field. It looks like 2026 will be a year where we cannot take our eyes off how OpenAI launches its counterattack.

 

How was it? While several other players are building generative AI, I believe the industry will take shape around companies defining their own positions within the context of the "OpenAI vs. Google" battle. Therefore, the outcome of OpenAI vs. Google is extremely important for all AI-related companies. I would like to write another blog post on this same theme if the opportunity arises.

That’s all for today. Stay tuned!











1) Sam Altman’s ‘Code Red’ Memo Urges ChatGPT Improvements Amid Growing Google Threat, Reports Say, Forbes, 2 Dec 2025
2) A new era of intelligence with Gemini 3, Google, 18 Nov 2025
3)  Leaderboard Overview






Game Changer: How Nano Banana Pro is Redefining Digital Marketing!

Hot on the heels of last week's new model release, Google has debuted yet another new model, this time for image generation: Nano Banana Pro (Gemini 3 Pro Image) (1). Word is that it boasts incredible performance. So, let's dive in and test it out to see what it can do.

 

1. The Latest Tokyo Fashion Trends

Fashion evolves with every season, and keeping up with the trends can be a challenge. However, the internet is overflowing with the latest style information. I figured that by feeding this real-time data into generative AI, we could generate images of models wearing the styles currently in vogue. Let's give it a try. Below is the original image of the model. She is wearing an outfit typical of Japanese autumn.

Original Image

I fed this original image and the prompt "Perform Google Search for current Tokyo fashion trends for 20s lady and apply that style to the model in the attached photo. 4 images are needed." into Nano Banana Pro.

Generated Images

The same model appears in all four images, maintaining consistency. Furthermore, the latest fashion trends have been incorporated thanks to Google Search. This is wonderful. Nano Banana Pro's Grounding feature using Google Search is excellent. As the model updates in the future, we can expect the accuracy of capturing trendy fashion to improve even further.

 

2. Creating a Signature Cafe Menu

Next, I want to devise a set menu featuring shortcake and coffee for opening a cafe in Ashiya, a high-end residential area in Japan. For this one too, I prepared a prompt to generate the image after researching currently popular cakes using Google Search.

"I am opening a cafe in Ashiya, Japan, featuring a fruit shortcake and coffee set as the signature dish. Use Google Search to identify current cake trends in Ashiya City. Then, create a high-quality menu image for this set that includes a description and price in English, incorporating the local trends."

I generated the following Japanese and English versions of the menu.

English Version

Japanese Version

Both the Japanese and English text are perfect. I think this is a huge leap forward, especially since AI image generation has struggled to correctly render local languages like Japanese until now. I’m sure it will work well with other local languages too. It looks like Nano Banana Pro will be able to perform globally, regardless of language.

 

3. 3D Visualization of Loss Functions

Raising the abstraction level a bit, I want to create a 3D visualization of a loss function, a topic that often comes up when building targeting models for marketing, and use it to explain the concept of gradient descent clearly. Nano Banana Pro can understand even theoretical and highly abstract phenomena like loss functions and map them in 3D. Below is the result. You can see at a glance how the parameters get stuck in a local minimum and cannot reach the point where the loss function is at its global minimum. Amazing.

Gradient Descent Method
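
The same behavior can be reproduced numerically. The sketch below is a one-parameter simplification of the 3D picture, using a toy loss w^4 - 3w^2 + w chosen for this example because it has one shallow local minimum and one deeper global minimum. The function names, learning rate, and starting points are all assumptions made for illustration.

```python
def loss(w: float) -> float:
    """Toy loss with a local minimum near w = 1.13 and a global one near w = -1.30."""
    return w**4 - 3 * w**2 + w


def gradient(w: float) -> float:
    """Derivative of the toy loss with respect to w."""
    return 4 * w**3 - 6 * w + 1


def gradient_descent(start: float, learning_rate: float = 0.01, steps: int = 1000) -> float:
    """Repeatedly step against the gradient and return the final parameter."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


stuck = gradient_descent(start=2.0)   # slides into the nearby local minimum
best = gradient_descent(start=-2.0)   # slides into the global minimum
print(stuck, loss(stuck))
print(best, loss(best))
```

Both runs follow exactly the same update rule; only the starting point differs. The run that starts at 2.0 settles in the shallow valley and never sees the deeper one, which is the situation the generated image depicts.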

 

How was it? Even from these few experiments, the excellence of Nano Banana Pro is clear. I have a hunch that Nano Banana Pro is going to change the very methods of digital marketing. I felt particularly strong potential in the Grounding feature using Google Search. I plan to cover Nano Banana Pro again in the near future.

That’s all for today. Stay tuned!

 




 

1) Introducing Nano Banana Pro, Google, 20 Nov 2025




















Google Antigravity: The Game Changer for Software Development in the Agent-First Era

Google has unveiled Gemini 3.0, its new generative AI, and "Antigravity" (1), a next-gen IDE powered by it. Google states that "Google Antigravity is our agentic development platform, evolving the IDE into the agent-first era," signaling a shift toward truly agent-centric development. Here, I’m going to task Antigravity with creating a "Bank Complaint Classification App." I want to actually run it to explore its potential.

Antigravity

 

1. Agentic Development with Antigravity

Antigravity is built on top of VS Code. If you are a VS Code user, the editor will look familiar, making it very approachable and easy to pick up. However, the real power of Antigravity lies in its dedicated interface for agentic development: the Agent Manager (shown below). Just enter a prompt into the box and run it to kick off "Vibe Coding." The prompt shown here is the very simple one I entered at the beginning of the development process. Antigravity also appears to be packed with various features designed to facilitate efficient communication with the Agent. For more details, please check the website (1).

Agent Manager

 

2. Prompt Refinement and Improvement

Just because you start "Vibe Coding" doesn't mean you'll get perfect code immediately. I started with a simple prompt this time as well, but the process proved to be more challenging than anticipated. While Gemini 3.0 Pro often demonstrates human-level capability when handling HTML and CSS for website building, the framework used for this app—Google ADK—is a brand-new agent development kit that just debuted in April 2025. Consequently, there are likely very few code examples available on the web, and I assume it hasn't been fully absorbed into Gemini 3.0's training data yet.

Development with Google ADK

It was quite a struggle, but as shown above, I managed to build a fully functional app via "Vibe Coding." To generate these files, I relied solely on natural language instructions; I didn't write a single line of code directly in the editor. However, I did include simple code snippets within the prompts. This is a technique known as "few-shot learning," where you provide examples to guide the model. I believe this approach is highly effective when Vibe Coding with Gemini 3.0 for Google ADK development. While this might become unnecessary as Gemini 3 is updated in the future, it’s certainly a technique worth remembering for now.

Bank Complaint Classification App using Google ADK

The screenshot above shows the "Bank Complaint Classification App" I developed. I verified its accuracy with some simple examples, and the results were excellent. It seems the internal prompts within the app were generated very effectively. Impressive work!
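
The few-shot technique mentioned earlier boils down to placing short, known-good snippets ahead of the actual request. Here is a minimal sketch of how such a prompt could be assembled. The helper `build_few_shot_prompt`, the example descriptions, and the placeholder snippets are hypothetical; the real prompts and ADK snippets used in this experiment are not shown in the post.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]]) -> str:
    """Prepend numbered example snippets to a task description."""
    parts = ["Follow the style of these examples."]
    for number, (description, snippet) in enumerate(examples, start=1):
        parts.append(f"Example {number}: {description}\n{snippet}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


# Placeholder snippets; replace them with short code you have verified yourself.
examples = [
    ("define an agent", "# paste a short, known-good snippet here"),
    ("register a tool", "# paste a second known-good snippet here"),
]
prompt = build_few_shot_prompt("Build a bank complaint classification app.", examples)
print(prompt)
```

Keeping the examples short and verified matters more than their number, since the model tends to imitate whatever syntax it is shown.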

 

3. Summary of Building a Complaint Classification App with ADK

  • Total Time: 6 hours (starting from the Antigravity installation) to complete the app.

  • Execution: With the finalized prompt, the run time is just over a minute.

  • Manual Effort: The actual coding for Google ADK to make the app is only about a 20-minute task if done manually without vibe-coding.

  • Reasons for the Delay:

    • I had to iterate on the prompts several times because Gemini 3 is still unfamiliar with Google ADK.

    • I had to explicitly instruct it on file structures and code syntax.

    • I was also using Antigravity for the first time.

  • Conclusion: It is manageable once you understand Gemini 3 Pro's behavior regarding Google ADK.

 

So, what do you think?

It took a little longer because I wasn't used to the new IDE yet, but the combination of Gemini 3.0 Pro and Antigravity was outstanding. I could really feel its high potential. Since the execution speed itself is fast, next time I plan to challenge myself by "Vibe Coding" a multi-agent app. Look forward to it! That's all for today. Stay tuned!

 




1) Experience liftoff with the next-generation IDE, Google,  19 Nov 2025








OpenAI & MUFG: A Strategic Collaboration Poised to Reshape the Future of Finance

On November 12, 2025, OpenAI and MUFG (Mitsubishi UFJ Financial Group, Inc.), one of Japan's three largest financial groups, announced a strategic collaboration (1). As this collaboration has the potential to transform Japan's financial sector, I'd like to share the key points from the news release along with my own analysis. Let's get started!

 

1. Business Transformation Utilizing AI

"Beginning in January 2026, all approximately 35,000 employees of MUFG Bank will use ChatGPT Enterprise in their daily operations." This is a significant step forward in transforming the subsidiary bank into an AI-native organization. It's presumed that OpenAI and MUFG, having already collaborated for over a year, have accumulated considerable expertise in applying AI to banking operations. If they can unlock the full potential of the generative AI GPT-5 through ChatGPT Enterprise, the impact on their business processes is expected to be substantial.

ChatGPT Enterprise

 

2. Talent Development

"Furthermore, to accelerate the company-wide adoption of AI, the two companies will establish a project team. They will collaborate on training specialized personnel, or 'AI Champions,' who can drive AI utilization and organizational reform. This will be supported by providing education, training programs, and support for MUFG's company-wide AI adoption campaign, 'Hello, AI @MUFG.'" As this indicates, talent development is essential for embedding AI within the company. While GPT-5 is highly capable, it cannot completely replace human abilities. Collaboration between AI and humans remains indispensable. There is no fixed methodology for how we communicate with AI to achieve our goals; I believe this will continue to be a process of trial and error.

 

3. Creating Innovative Customer Experiences in Retail

"We will install an 'AI Concierge' equipped with the latest AI into the apps provided by MUFG's group companies. This will go beyond simply answering questions to provide personalized support that becomes more tailored with use. In the future, data from each app will be integrated, enabling the AI to grasp the customer's entire transaction history and offer precise suggestions from any app. The first implementation is planned for the digital bank scheduled to launch next fiscal year, with the aim of creating an AI-native digital bank." Of the various retail measures, this "AI Concierge for personalized support" is particularly striking. I believe that without accurately recorded past transaction histories and conversations, providing relevant support is impossible. The entry of Japan's largest financial group into the "AI Concierge" space holds great significance for the financial industry. I'm looking forward to trying it myself.

 

4. Participation in the OpenAI Ecosystem

"We will explore integration with 'Apps in ChatGPT,' which OpenAI announced in October. By connecting MUFG's group company apps and services to ChatGPT's framework, we aim to offer a new financial experience where customers can naturally discuss household financial management and asset investment tailored to their situation, all within the flow of a conversation with ChatGPT." This can be interpreted as MUFG's medium-to-long-term strategy to enter the OpenAI ecosystem. OpenAI is solidifying its position as a global portal to the internet and, from that base, has begun building an ecosystem to realize "Agentic Commerce." I believe MUFG is considering being one of the first in the world to take this leap. I'm excited to see how this unfolds.

 



What did you think? While it has only just been announced and details are still scarce, I feel the content clearly conveys the strong commitment from both companies. I am very excited to see how this "tag team" will change the future of finance in Japan and Asia. For those who wish to read the full content of this release, please see the original source (1). That's all for today. Stay tuned!



1) Initiatives for AI-Driven Business Transformation and New Service Creation in the Retail Sector, 12 Nov 2025, Mitsubishi UFJ Financial Group, Inc. (MUFG) and MUFG Bank, Ltd. (MUFG Bank)



This Is What Happens When an AI Agent Runs Our 2025 Autumn Marketing!

Hello, the high temperature in Tokyo has dropped to 16°C, and it's starting to feel very much like autumn. For those unfamiliar with autumn in Japan, this is the season when the leaves on the mountains change from green to orange. The entire mountainside is dyed orange, creating a beautiful and spectacular view. Therefore, I decided to use orange as the background color for this marketing campaign's promotional video. The challenge is: "To devise a campaign to sell cakes to women in Ashiya, an affluent residential area in the Kansai region." What happens when we entrust this task to an AI agent? Let's find out.

 

1. Creating an AI Marketing Agent with "Google Opal"

This time, I'm creating an AI marketing agent using Google Opal (1). As the description says, "Opal, our no-code AI mini-app builder," you can easily develop an AI agent app like the one below.

For this AI agent's development, I only entered the following prompt: "You are an expert in marketing campaigns. You will be given the following information: 1. The product/service to sell, 2. The target customer, 3. The location/region, 4. The time/season of the campaign, 5. The desired brand image color, 6. A photo of the facilitator. Using this information, please create the following: a. A marketing strategy, b. A marketing campaign name, c. A logo based on the name, d. A promotional video featuring the facilitator, complete with BGM."

Just by executing this, you can create a workflow like the one shown above using the AI agent. After that, you just switch to the app and answer questions related to your task, and the marketing campaign is created. Amazing, isn't it!

 

2. Marketing Strategy and Logo

Once you input all the necessary information, you get the results back immediately. First is the marketing strategy. The actual output went into much more detail, but here I'll show only the beginning. Even though I didn't provide very detailed information about the campaign at the outset, I think this marketing strategy is well done.

                  Marketing Strategy

Next is the marketing campaign name and logo. What it generated was a cool, French-style logo. I'd love to try using it sometime.

          Logo

 

3. Three Short Promotional Videos

First, I provide the AI agent with a base image of a woman. Then, using this image as a starting point and based on the created marketing strategy, an approximately 8-second short video is generated. It's exciting to see what kind of video the AI agent will produce. This time, it created three videos with BGM. All of them are based on the theme of "Autumn Cakes." It's hard to pick a winner; they are all excellent. After actually creating the videos, I felt that even 8 seconds is enough to convey the image clearly. Which one did you like the best?

 

What did you think? Although this was just a demo AI agent, I was astonished at what it could accomplish with no code, no programming. It seems like it will become a powerful ally for marketers. Of course, there are limitations, but what I created this time can be done for free with just a Google account. I highly recommend giving it a try. ToshiStats will continue to share more about AI agents. Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1) Opal is now available in more than 160 countries, Google, 7 Nov 2025

Copyright © 2025 Toshifumi Kuga. All rights reserved


OpenAI vs. Google: Who Has the Right Take on AGI?

Recently, OpenAI CEO Sam Altman commented on YouTube (1) that 'it is plausible that a legitimate AI Researcher will be achieved by March 2028.' Can this really be achieved in such a short time, less than 2.5 years from now? I would like to examine this closely, comparing it with the statements of Demis Hassabis, CEO of rival company Google DeepMind.

 
1. Achieving Legitimate AI Researcher by March 2028

Opinions are divided, even among experts, on when Artificial General Intelligence (AGI), which would surpass human intelligence, will actually be achieved. Against this backdrop, OpenAI CEO Sam Altman, referring to the following timeline, commented that 'it is plausible that a legitimate AI Researcher will be achieved by March 2028.'

Of course, this is an internal goal, and he isn't claiming it's AGI. However, if AI can take on the role of a researcher, technological development will accelerate dramatically, and the current industrial structure will likely change completely. I think it's groundbreaking that they have set a timeline for such a high-impact goal. The issue is its feasibility. Although technical points were discussed in this YouTube video, I felt that alone was insufficient to explain its feasibility. There is likely much that cannot be disclosed as it is confidential information, but it would have been better if there had been a more in-depth explanation.

 

2. Current AI Lacks Consistency

At this point, let me introduce the opinion (2) of Google DeepMind CEO Demis Hassabis regarding the realization of AGI. As you know, he is a co-founder of DeepMind and has aimed to develop AGI since its founding in 2010. Despite that extensive experience, he says it will still take 5 to 10 years to achieve AGI. One reason is that 'current generative AI exhibits PhD-level capabilities for some tasks, yet at other times, it can make mistakes on simple high school math.' In short, its abilities 'lack consistency.' 'Consistency' is essential for achieving AGI, and apparently, two or three more breakthroughs will be necessary to get there. I find this to be a rather cautious view. For other points of discussion, please watch the YouTube video (2).

 

3. AI is Steadily Evolving, Step by Step

Although there are differences in their definitions of AGI and their timelines, both parties seem to agree on its eventual realization. We cannot predict when breakthroughs will occur. I believe the only thing we can do is 'prepare for the emergence of AGI.' Whether it arrives in 2028 or 10 years from now, we need to start thinking now about how we can use AGI, considered humanity's greatest invention, to realize a better society, industry, and life. Even as we speak, AI is likely evolving beneath the surface. Our company, ToshiStats, intends to continue discussions in order to successfully incorporate those advancements.





1) Sam, Jakub, and Wojciech on the future of OpenAI with audience Q&A, OpenAI, 30 Oct 2025

2) Google DeepMind CEO Demis Hassabis on AI, Creativity, and a Golden Age of Science | All-In Summit,  13 Sep 2025






"AGI Is Still a Decade Away" : A Message from a Genius AI Engineer

I recently found an interesting interview video on YouTube. It was an interview with Andrej Karpathy, a prominent AI engineer, and its headline message was the striking statement that "AGI is still a decade away."

While many predict that AGI will be realized in just a few years, his mention of such a long timespan, 10 years, seems to have drawn global attention. This time, I'd like to share the key points that caught my attention from the video (1), which is over two hours long, and from a subsequent tweet (2) he posted on X.

Andrej Karpathy (left)— “We’re summoning ghosts, not building animals”

 

1. AGI is Still a Decade Away

The timeline for achieving AGI is debated among researchers, but the claim that it will take 10 years feels like a minority opinion, perhaps due to the flood of hype surrounding AI agents.

Of course, he has his reasons for asserting this. His tweet (2) stated: "There is still a lot of work (grunt work, integration work, sensors/actuators to the physical world, social work, safety & security work (jailbreaks, poisoning, etc)) to be done before we get to something that you’d rather hire than a human for any job in the world."

Indeed, AI agents in the world of text, like coding, have only just begun this year. The speculation that it will take a considerable amount of time to achieve an AGI that can also operate with high precision in the real world, including physical interaction, feels very convincing.

 

2. On LLM Agents

I believe this topic is especially important for those who use code assistants. His tweet included a critical comment on the current state: "I live in an intermediate world of collaborating with LLMs, where our pros/cons combine. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless."

I also feel that "those unfamiliar with AI technology might misunderstand, thinking they can easily build anything just by asking a code assistant." The performance of the latest generative AI like GPT-5 is incredible, but I believe there are still many cases where you can't just delegate 100% of a task to it. A collaborative relationship is still necessary, where the human decides the basic outline and structure, has the AI agent draft the details, and then the human reviews the results.

Once AGI is achieved, human intervention shouldn't be necessary at all, but it makes sense that it will take a considerable time to get there.

 

3. On Education in the AGI Era

Let's approach this final topic with optimism. In the interview, he spoke about the future of education, saying: "Teaching Assistants are currently human, but I think they can be replaced by AI in the future. Even in that case, the overall structure of the course would be devised by myself or the faculty, but perhaps in the future, AGI will even do that."

In fact, my company is also developing an e-learning program. While I am designing the overall structure, an AI avatar is scheduled to deliver the actual lectures. It's not possible to automate everything with current AI agents, but I think everyone can agree that when humans and AI collaborate, we can create wonderful educational programs.

I'd like to close with his words: "If you have a perfect AI tutor, maybe you can get extremely far, the geniuses today are barely scratching the surface of what a human mind can do."

 

What did you think?

I want to note that he is bullish on the realization of AGI itself; it's his opinion on the timeline that differs from the consensus. Although the time until realization may vary, AGI will eventually appear before us.

What I've introduced here is just a tiny fraction of the more-than-two-hour interview. I highly recommend that you all watch this wonderful interview. I'm sure you will find some hints about the future of AGI.

Well, that's all for today. Stay tuned!








1) Andrej Karpathy — “We’re summoning ghosts, not building animals” ,  Dwarkesh Podcast, 18 Oct 2025

2) X post, Andrej Karpathy, 19 Oct 2025







Your Guide to AI Agents: Insights from Andrew Ng's Latest Course

A new online course called "Agentic AI" (1) has been released by DeepLearning.AI. The instructor is Andrew Ng, an adjunct professor at Stanford University, who is also famous for his earlier machine learning courses. For me, this is the first course of his I've taken since the Deep Learning Specialization in 2018. I've just completed it, and I'd like to share my thoughts and a recommendation.

 

1. Course Overview

The course is divided into five modules, each consisting of 5-7 short videos (about 5-10 minutes each), a quiz, and coding tasks in Jupyter notebooks. Once you pass every assignment, you are awarded a certificate of completion. The level is listed as intermediate; while a basic knowledge of Python is necessary, I believe that even those without specialized knowledge of AI can work through the material and come to understand it naturally. The main topics are as follows:

Reflection: AI critiques its own work and iterates to improve quality—like code review, but automated.

Tool Use: Connect AI to databases, APIs, and external services so it can actually perform actions, not just generate text.

Planning: Break complex tasks into executable steps that AI can follow and adapt when things don’t go as expected.

Multi-Agent: Coordinate multiple specialized AI systems to handle different parts of a complex workflow.
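To make the first pattern, Reflection, a bit more concrete, here is a minimal sketch of a draft, critique, and revise loop. This is my own illustration, not code from the course. The names `reflect_and_revise`, `toy_generate`, and `toy_critique` are hypothetical, and the two toy functions simply stand in for real model calls so that the sketch runs without any API key.

```python
from typing import Callable


def reflect_and_revise(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Draft an answer, then alternate critique and revision until the critic is satisfied."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback == "OK":  # convention assumed here: "OK" means no further issues
            break
        # Feed the critique back into the generator as extra context.
        draft = generate(f"{task}\n\nPrevious draft:\n{draft}\n\nFeedback:\n{feedback}")
    return draft


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_generate(prompt: str) -> str:
    if "Feedback" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"  # deliberately buggy first draft


def toy_critique(task: str, draft: str) -> str:
    return "OK" if "a + b" in draft else "The function subtracts instead of adding."


result = reflect_and_revise("Write add(a, b).", toy_generate, toy_critique)
print(result)
```

In a real agent, `generate` and `critique` would each wrap a call to a language model, and the stopping rule would probably be more nuanced than an exact "OK" string.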

Andrew Ng teaches at Stanford while also doing practical consulting work, and I found that his course strikes a wonderful balance between theory and practice.

 

2. Reflection and Tool Use

The techniques covered in the second and third modules are critical for the future realization of AGI. In particular, "Reflection," where an AI critiques and improves its own output, is closely related to what is often called Recursive Self-Improvement, a field being researched worldwide. This module introduces a method that allows even non-experts to incorporate reflection functionality, which I am very eager to try implementing. Additionally, using tools allows a generative AI to bring in information that is difficult to acquire on its own, thereby enhancing the AI agent's capabilities. Furthermore, this information can be fed into the "Reflection" process, promising a synergistic effect. I'm also keen to implement this and see what kind of information can be integrated.
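As a rough illustration of Tool Use, the sketch below shows one common way to wire it up: the model emits a small JSON request naming a tool, and the application looks the tool up and runs it. Everything here is an assumption made for illustration. The registry `TOOLS`, the decorator `tool`, the function `get_exchange_rate`, and the hard-coded rate are all hypothetical and are not taken from the course or from any real API.

```python
import json
from typing import Callable, Dict

# Registry of callable tools; names and functions are illustrative only.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(fn):
    """Register a function so the agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_exchange_rate(base: str, quote: str) -> float:
    # Hard-coded toy value standing in for a real API call.
    rates = {("USD", "JPY"): 150.0}
    return rates[(base, quote)]


def run_tool_call(request_json: str) -> str:
    """Execute a tool request of the form {"name": ..., "arguments": {...}}."""
    request = json.loads(request_json)
    fn = TOOLS[request["name"]]
    result = fn(**request["arguments"])
    return json.dumps({"name": request["name"], "result": result})


# In a real agent the model would emit this JSON; here it is written by hand.
reply = run_tool_call(
    '{"name": "get_exchange_rate", "arguments": {"base": "USD", "quote": "JPY"}}'
)
print(reply)
```

The returned JSON would then be passed back to the model as additional context, which is also where it could feed into a reflection step.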

 

3. Error Analysis

Andrew Ng stresses the importance of this fourth module, and in my opinion it is the most important and valuable content in the course. Generative AI is excellent, but it is not perfect. There is still a considerable possibility that it will produce incorrect answers. Therefore, to raise its accuracy to a practical level, the course emphasizes a strategy of quickly identifying the parts of the overall process with the lowest performance and allocating resources to improving those areas. For a complex AI agent that may contain numerous sub-agents, I can certainly see how identifying and prioritizing its weakest parts is incredibly important in practical applications.
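One simple way to put this idea into practice, sketched below under my own assumptions, is to review a batch of failed runs by hand, label each one with the first step that went wrong, and then rank the steps by their share of failures. The function name `rank_failing_steps` and the labels are hypothetical, and the ten labels are invented purely to show the mechanics.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_failing_steps(failure_labels: Iterable[str]) -> List[Tuple[str, float]]:
    """Return each step's share of the reviewed failures, largest first."""
    counts = Counter(failure_labels)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(step, n / total) for step, n in counts.most_common()]


# Hypothetical labels from manually reviewing 10 failed runs of an agent:
# each label names the first step whose output a reviewer judged to be wrong.
labels = [
    "web_search", "web_search", "summarize", "web_search", "plan",
    "web_search", "summarize", "web_search", "web_search", "plan",
]

ranking = rank_failing_steps(labels)
for step, share in ranking:
    print(f"{step}: {share:.0%}")
```

The step at the top of the list is where improvement effort would go first.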

 

So, what did you think? With a flood of AI-related news every day, many people are likely wondering, "How should I proceed with my AI projects from now on?" I believe this course provides a valuable perspective for thinking in the medium to long term. While it is a paid course, it is not as expensive as university tuition, and I highly recommend trying it. Incidentally, because I studied intensively, I was able to receive my certificate in about three days. It's certainly possible for a business professional to complete it over a long weekend.

Well, that's all for today. Stay tuned!

 



1) Agentic AI, Andrew Ng, DeepLearning.AI, Oct 2025








The Secret to High-Accuracy AI: An Exploration of a Machine Learning Engineering Agent

In a previous post, I explained Google's research paper, "MLE-STAR" (1), and uncovered the mechanism by which an AI can build its own high-accuracy machine learning models. This time, I'm going to implement that AI agent using the Google ADK and experiment to see if it can truly achieve high accuracy. For reference, the MLE-STAR code is available as open source (2).

 

1. The Information I Provided

With MLE STAR, humans only need to handle the data input and task definition. The data I used for this experiment comes from the Kaggle competition "Home Credit Default Risk" (3). While the original data consists of 8 files, I combined them into a single file for this experiment. I reduced the training data to 10% of the original, resulting in about 30,000 samples, and kept the original test data of 48,700 samples.

The task was set as follows: "A classification task to predict default." Note that to speed up the experiment, the number of iterative loops was set to a minimum.

                     Task Setup

 

2. Deciding Which Model to Use

MLE STAR uses a web search to select the optimal model for the given task. In this case, it ultimately chose LightGBM. To finish the experiment quickly, I configured it to select only one model. If I had set it to select two, it likely would have also chosen something like XGBoost. Both are models frequently used in data science competitions.

                Model Selection by MLE STAR

It generated the initial script below. As a frequent user of LightGBM, the code looks familiar, but the ability to generate it in an instant is something only an AI can do. It's amazing!

 

3. Identifying Key Code Blocks with "Ablation Studies"

Next, it uses ablation studies to identify which code blocks should be improved. In this case, ablation2 showed that removing Early Stopping worsened the model's performance, so this feature was kept in the training process from then on.

               Ablation Studies Results by MLE STAR
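For readers new to the idea, an ablation study can be sketched in a few lines: score the full pipeline, then score it again with each component switched off, and see which removal hurts the most. The code below is my own simplified illustration and not the MLE-STAR implementation. The component names and the numbers inside `TOY_CONTRIBUTION` are made up, and `toy_evaluate` merely stands in for "train the model with these components and return a validation score."

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def ablation_study(
    components: List[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Score the full pipeline, then re-score it with each component removed.

    Returns (component, score_drop) pairs, with the component whose removal
    hurts the most listed first.
    """
    full = frozenset(components)
    baseline = evaluate(full)
    drops = [(name, baseline - evaluate(full - {name})) for name in components]
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Made-up contributions, used only so the sketch runs end to end.
TOY_CONTRIBUTION: Dict[str, float] = {
    "early_stopping": 0.020,
    "missing_value_imputation": 0.005,
    "feature_scaling": 0.000,
}


def toy_evaluate(enabled: FrozenSet[str]) -> float:
    # Stand-in for training a model and measuring its validation score.
    return 0.70 + sum(TOY_CONTRIBUTION[name] for name in enabled)


report = ablation_study(list(TOY_CONTRIBUTION), toy_evaluate)
for name, drop in report:
    print(f"removing {name}: score drops by {drop:.3f}")
```

A component with a large drop is worth keeping and refining further, while a component with a drop near zero is a candidate for simplification.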

 

4. Iteratively Improving the Model

Based on the ablation studies, MLE STAR decided to improve the model using the following two techniques: K-fold target encoding and binary encoding. These techniques themselves are common in machine learning and are not particularly unusual.

                   K-fold Target Encoding

                     Binary Encoding
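Since both techniques are standard, here is a small pure-Python sketch of each, written by me for illustration and not taken from the script MLE-STAR generated. The function names `kfold_target_encode` and `binary_encode` are hypothetical, the folds are simple interleaved folds (in practice you would shuffle first), and the tiny dataset is invented.

```python
from collections import defaultdict
from typing import List, Sequence


def kfold_target_encode(
    categories: Sequence[str], targets: Sequence[float], n_splits: int = 5
) -> List[float]:
    """Encode each row with the mean target of its category, computed only from
    rows in the other folds, so a row never sees its own label."""
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(n_splits):
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(n):
            if i % n_splits != fold:  # rows outside the current fold
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        for i in range(fold, n, n_splits):  # rows inside the current fold
            c = categories[i]
            # Fall back to the global mean for categories unseen in the other folds.
            encoded[i] = sums[c] / counts[c] if counts[c] else global_mean
    return encoded


def binary_encode(categories: Sequence[str]) -> List[List[int]]:
    """Map each category to an integer code, then spell that code out in bits."""
    codes = {c: k for k, c in enumerate(sorted(set(categories)))}
    width = max(1, (len(codes) - 1).bit_length())
    return [[(codes[c] >> b) & 1 for b in reversed(range(width))] for c in categories]


# Invented toy data: a categorical column and a binary target.
cats = ["A", "B", "A", "C", "B", "A"]
y = [1, 0, 1, 0, 1, 0]

te = kfold_target_encode(cats, y, n_splits=3)
be = binary_encode(cats)
print(te)
print(be)
```

The out-of-fold averaging is what keeps target encoding from leaking labels into the features, and binary encoding keeps the number of new columns at roughly log2 of the number of categories instead of one column per category.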

This ability to "use ablation studies to identify which code blocks to improve" is likely a major reason for MLE STAR's high accuracy. I look forward to seeing how this functionality evolves in the future.

 

5. The Results Are In. Unfortunately, I Lost.

For its final step, MLE STAR ensembles the models to create the final version. For more details, please see the research paper. It also generates a CSV file with the default predictions, which I slightly modified and promptly submitted to Kaggle. This task is evaluated using AUC, where a score closer to 1 indicates higher accuracy.
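If you are not familiar with AUC, one way to think about it is as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. The sketch below computes it directly from that definition. It is a brute-force illustration with invented labels and scores, not the evaluation code Kaggle uses, and the function name `roc_auc` is my own.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a random
    negative, counting ties as half. O(n_pos * n_neg), fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = default) and predicted probabilities, invented for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(auc)
```

A model that ranks every defaulter above every non-defaulter scores 1.0, and random guessing scores about 0.5, which is why even a 0.01 gap matters on a leaderboard.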

The top score is the result I achieved using my own LightGBM model. The score in the red box at the bottom is the one automatically generated by MLE STAR. With a difference of more than 0.01 on both the Public and Private scores, it was my complete defeat.

             Kaggle Prediction Accuracy Evaluation (AUC)

Improving the AUC by 0.01 is quite a challenge, which gives a glimpse into how excellent MLE STAR is. I didn't perform any extensive tuning on my LightGBM model, so I believe my score would have improved if I had spent time tuning it manually. However, MLE STAR produced its result in about 7 minutes from the start of the computation, so from an efficiency standpoint, I couldn't compete.

 
 

So, what did you think? Although this was a limited experiment, I feel I was able to grasp the high potential of MLE STAR. I was truly impressed by the power of its Recursive Self-Improvement, which identifies specific code blocks and improves upon them autonomously.

Here at ToshiStats, I plan to continue digging into MLE-STAR. Stay tuned!









1) MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement, Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, and Tomas Pfister, Google Cloud and KAIST, 23 Aug 2025

2) Machine Learning Engineering with Multiple Agents (MLE-STAR), Google

3) Home Credit Default Risk, Kaggle




Is an AI Machine Learning Assistant Finally a Reality? I Looked Into It, and It's Incredible!

I often build machine learning models for my job. The process of collecting data, creating features, and gradually improving the model's accuracy takes time, specialized knowledge, and programming skills in various libraries. I've always found it to be quite a challenge. That's why I've been hoping for an AI that could skillfully assist with this work, and recently, a potential candidate has emerged. I'd like to take a deep dive into it right away.

 
1. A Basic Three-Layer Structure

This AI assistant is called MLE-STAR, and according to a research paper (1), it has the following structure. Simply put, it first searches the internet for promising libraries. Next, after writing code using those libraries, it identifies which parts, called "code blocks," should be improved further. Finally, it decides how to improve those code blocks. Let's explore each of these steps in detail.

 

2. Selecting the Optimal Library with a Search Function

To create a high-accuracy machine learning model, you first need to decide "what kind of model to use." This means you have to select a library to implement the model. This is where the search function comes in. For example, in a finance task to calculate default probability, many methods are possible, but gradient boosting is often used in competitions like Kaggle. I also use gradient boosting in most cases. It seems MLE-STAR can use its search function to find the optimal library on its own, even without me specifying "use gradient boosting." That's amazing! This would eliminate the need for humans to research everything, leading to greater efficiency.

 

3. Finding Where to Improve the Code and Steadily Making Progress

Once the library is chosen and a baseline script is written, it's time to start making improvements to increase accuracy. But it's often difficult to know where to begin. MLE-STAR employs an ablation study to understand how accuracy changes when a feature is added or removed, thereby identifying the most impactful code block. This part of the process typically relies on human experience and intuition, involving a lot of trial and error. By using MLE-STAR, we can make data-driven decisions, which is incredibly efficient.

 

4. Iterating Until Accuracy Actually Improves

Once the code block for improvement is identified, the system gradually changes parameters and confirms the accuracy improvements. This is also done automatically within a loop, without requiring human intervention. The accuracy is calculated at each step, and as a rule, only changes that improve performance are adopted, ensuring that the model's accuracy steadily increases. Incredible, isn't it? In fact, a graph comparing the performance of MLE-STAR with past AI assistants shows that MLE-STAR won a "gold medal" in approximately 36% of the tasks, highlighting its superior performance.
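The "only adopt changes that improve performance" rule can be sketched as a small accept-if-better loop. This is a simplified illustration under my own assumptions, not the MLE-STAR implementation. The names `refine`, `toy_propose`, and `toy_score` are hypothetical, and the toy score is just a curve that peaks at a tree depth of 6 so the loop has something to climb.

```python
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")


def refine(
    initial: S,
    propose: Callable[[S], Iterable[S]],
    score: Callable[[S], float],
    max_rounds: int = 10,
) -> Tuple[S, float]:
    """Keep a candidate only if it scores strictly higher than the current best."""
    best, best_score = initial, score(initial)
    for _ in range(max_rounds):
        improved = False
        for candidate in propose(best):
            s = score(candidate)
            if s > best_score:
                best, best_score, improved = candidate, s, True
        if not improved:  # no candidate beat the current best, so stop early
            break
    return best, best_score


# Toy stand-ins: the "solution" is a tree depth, and the score peaks at depth 6.
def toy_score(depth: int) -> float:
    return -(depth - 6) ** 2


def toy_propose(depth: int):
    return [depth - 1, depth + 1]


best_depth, best_score = refine(3, toy_propose, toy_score)
print(best_depth, best_score)
```

In the real system, each candidate would be a rewritten code block and `score` would mean actually training and validating the model, which is why running this loop automatically saves so much human effort.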

 

So, what did you think? This new framework for an AI assistant looks extremely promising. In particular, its ability to identify which code blocks to improve and then actually increase the accuracy is likely to become even more powerful as the performance of foundation models continues to advance. I'm truly excited about future developments.

Next time, I plan to apply it to some actual analysis data to see what kind of accuracy it can achieve. Stay tuned!







1) MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement, Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, and Tomas Pfister, Google Cloud and KAIST, 23 Aug 2025




A Sweet Strategy: Selling Cakes in Wealthy Residential Areas!

Have you ever thought about starting a cake shop? As a cake lover myself, I often find myself wondering, "What kind of cake would be perfect?" However, developing a concrete business strategy is a real challenge. That's why this time, I'd like to conduct a case study with the support of an "AI marketing-agency." Let's get started.


1. Selling Cakes in an Upscale Kansai Neighborhood

The business scenario I've prepared for this case is a simple one:

Goal: To sell premium fruit cakes in the Kansai region.

  • Cake Features: Premium shortcakes featuring strawberries, peaches, and muscat grapes.

  • Target Audience: Women in their 20s to 40s living in upscale residential areas.

  • Stores: 3 cafes near Yamate Dentetsu Ashiya Station, 1 cafe near Kaigan Dentetsu Ashiya Station.

  • Direct Sales Outlet: 1 store inside the Yamate Dentetsu Ashiya Station premises.

  • Branding: The brand's primary color will be blue, with the website and logo also unified in blue.

  • Current Plan: In the process of planning a sales promotion for the autumn season.

From here, what kind of concrete business strategy can we derive? First, I'll input the business scenario into the AI marketing-agency.

The first thing it does is automatically generate 10 cool domain names.

It's hard to choose, but for now, I'll proceed with branding using "PremiumAshiyaCake.com".

 

2. A Practical Business Strategy

Now, let's ask the AI marketing-agency to formulate a business strategy for selling our premium fruit cakes in Kansai. When prompted to input the necessary information, I re-entered the business scenario, and the following business strategy was generated in about two minutes. Amazing!

It's a long document, over five pages, so I can't share it all, but here is the "Core of the Marketing Strategy."

  • Overall Approach: Direct Response that Inspires Aspiration

    • We will build an aspirational, luxury brand image through beautiful content, and then convert that desire into immediate store visits using precisely targeted calls-to-action (CTAs).

  • Core Message and Positioning:

    • Positioning Statement: For the discerning women of Kansai, Premium Ashiya Cake is the patisserie that transforms a moment into a cherished memory with its exquisitely crafted seasonal shortcakes.

    • Tagline / Core Message: "Premium Ashiya Cake: An exquisite moment, crafted for you."

  • Key Pillars of the Strategy:

    • Visual Elegance and a "Blue" Signature: All visuals must be of professional, magazine-quality. The brand color "blue" will be used as a sophisticated accent in styling—such as on blue ribbons, parts of the tableware, or as background elements—to create a recognizable and unique visual signature.

    • Hyper-local Exclusivity: Marketing efforts will be geographically and demographically laser-focused on the target audience residing in Ashiya and its surrounding affluent areas. This creates an "in-the-know" allure for locals.

    • Seasonal Storytelling: Treat each season's campaign as a major event. We will build a narrative around the star ingredients, such as Shine Muscat grapes from a specific partner farm, to build anticipation and justify the premium price point.

This is wonderfully practical content. The keywords I provided—"blue," "Ashiya," and "muscat"—have been skillfully integrated into the strategy.

 

3. The Logo is Excellent, Too—This is Usable!

Because I specified in the initial business scenario that I wanted to "unify the color scheme based on blue," it created this cool logo for me. It really looks like something I could use right away. Google's image generation AI, Imagen 3.0, is used here. The quality of this AI is always highly rated, so it's no surprise that the logo generated this time is also of outstanding quality.

 

So, what did you think of the AI marketing-agency? The business strategy is professional, and it's amazing how it automatically created the domain names and logo with such excellent results. Although I couldn't introduce it this time, it also includes a website creation feature. It's surprising that a tool this high-performance is actually available for free. A development kit called "Google ADK" is provided as open-source, and the AI marketing-agency from this article can be downloaded and used for free as Sample (1). For those who can use Python, I think you'll get the hang of it with a little practice. The operational costs are also limited to the usage fees for Google Gemini 2.5 Pro, so the cost-effectiveness is outstanding. I encourage you all to give it a try.

Please note that this story is a work of fiction and does not represent anything that actually exists. That's all for today, stay tuned!

 


1) Marketing Agency, Google, May 2025




Unlocking Sales Forecasts: Can GPT-5 Reveal the Most Important Data?

Have you ever found yourself in marketing, wanting to predict sales and gathering a ton of data? For example, let's say you have sticker sales data (1) like the set below. The num_sold column represents the number of units sold. This is actually a large dataset with over 200,000 entries. So, among these data columns (which we call "features"), which one is the most important for predicting sales? They all seem important, and it's impossible to check all 200,000 records one by one. So, let's try asking the generative AI, GPT-5.

                         Sticker sales data

 

1. Asking GPT-5 with a Prompt

To identify the important features for a prediction, you first have to create a predictive model. This is a task that data scientists perform all the time. However, they usually create these models by coding in Python, which can be a high barrier for the average business person. So, isn't there an easier way? Yes, and this is where prompts come in handy. If you can give instructions to GPT-5 with a prompt, no coding is necessary. Here is the prompt I created for this task.

     data & prompt

Key points of the prompt:

  • Use HistGradientBoostingRegressor from sklearn.

  • Evaluate the error using mean_absolute_percentage_error.

  • Split the data into train-data and test-data at an 80:20 ratio.

  • Display the top 10 feature importances with their original variable names.

  • Print the results as numerical output.

By getting the top 10 feature importances, we can understand which data column is the most significant. I won't explain the predictive model itself this time, so for those who want to dive deeper, please refer to a machine learning textbook.

 

2. The Code Actually Being Executed

Based on the prompt above, GPT-5 generated the following Python code on its own. It might look complicated to non-specialists, but rest assured, we don't have to touch Python at all. However, we can review this code to see how the calculation is being done, so it's by no means a black box. I believe this transparency is very important when using GPT-5 in a business context.

                 GPT-5's code for building the prediction model

 

3. "Product" Was the Most Important!

Ultimately, we got the following result.

Feature Importance Ranking

A higher "importance" value in the table above means the feature is more significant. This analysis revealed that "product" was overwhelmingly important. It seems that thinking about "what is selling" is essential. This is followed by "store" and "country". This suggests that considering "in what kind of store" and "in which country" is also crucial.

                     feature importance ranking

 

So, what did you think? This time, we instructed GPT-5 with a prompt to calculate which features are most important for predicting sales. GPT-5 may hit errors along the way and have to correct its own code, so some basic knowledge of machine learning still helps. However, we were able to get the result without writing any Python ourselves, which means marketing professionals can start trying this out today. I hope you can use the method we introduced today in your own marketing work. That's all for now. Stay tuned!

 


You can enjoy our video news ToshiStats-AI from this link, too!


1) Forecasting Sticker Sales, Kaggle, January 1, 2025



Copyright © 2025 Toshifumi Kuga. All rights reserved


How to Turn GPT-5 into a Pro Marketing Analyst with AI Agents!

A while back, I introduced a guide to prompting GPT-5, but it can be quite a challenge to write a perfect prompt from scratch. Not to worry! You can actually have GPT-5 write prompts for GPT-5. Pretty cool, right? Let's take a look at how.

 

1. Using GPT-5 to Do a Marketer's Job

I have some global sales data for stickers (1). Based on this data, I want to develop a sales strategy.

                 Global Sticker Sales Records

In a typical company, a data scientist would analyze the data, and a marketing manager would then create an action plan based on the results. We're going to see if we can get GPT-5 to handle this entire process. Of course, this requires a good prompt, but what kind of prompt is best? This is where it gets tricky. The principle I always adhere to is this: "Data analysis is a means, not an end." There are many data analysis methods, so the same data can be analyzed in various ways. However, what we really want is a sales strategy that boosts revenue. With this in mind, let's reconsider what makes a good prompt.

It's a bit of a puzzle, but I've managed to draft a preliminary version.

 

2. Using Metaprompting to Improve the Prompt with GPT-5

Now, let's have GPT-5 improve the prompt I quickly drafted. The image below shows the process. The first red box is my draft prompt.

                    Metaprompt

The second red box explicitly states the principle: "Perform data analysis with the goal of creating a Marketing strategy." When you provide the data and run this prompt, GPT-5 creates the improvement suggestions you see below, which are very detailed. I actually ran this process twice to get a better result.
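If you would rather script this step than paste text by hand, the idea can be sketched in a few lines of Python. This is only an illustration: build_metaprompt is a hypothetical helper, and the wording of the template is mine, not the exact metaprompt shown in the image. You would send the resulting text to GPT-5 together with your data.

```python
def build_metaprompt(draft_prompt: str, principle: str) -> str:
    """Wrap a draft prompt in a request that asks the model to improve it."""
    return (
        "You are helping me improve a prompt.\n"
        f"Guiding principle: {principle}\n\n"
        "Here is the draft prompt:\n"
        "<draft_prompt>\n"
        f"{draft_prompt.strip()}\n"
        "</draft_prompt>\n\n"
        "Suggest specific additions, deletions, or rewordings that would make "
        "the output serve the guiding principle more reliably, "
        "then write out the improved prompt in full."
    )


# Hypothetical draft; the principle is the one stated in the second red box.
draft = "Analyze the attached sticker sales data and propose a sales strategy."
principle = "Perform data analysis with the goal of creating a Marketing strategy."

metaprompt = build_metaprompt(draft, principle)
print(metaprompt)
```

Running the improved prompt through the same helper a second time mirrors the two rounds I did by hand.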

                   Final Prompt

 

3. The Result: GPT-5 Generates a Marketing Strategy!

Running the final prompt took about a minute and produced the following output. The detailed analysis and resulting insights are directly connected to marketing actions, staying true to our initial principle. It's fantastic.

The output is concise and perfect for busy executives. Creating this content on my own would likely take an entire day, but with GPT-5 the whole process, including the time it took me to draft the initial prompt, takes only about 30 minutes. This really shows how powerful GPT-5 is.

 

What do you think? This time, we explored a method for getting GPT-5 to improve its own prompts. This technique is called Metaprompting, and it's described in the OpenAI GPT-5 Prompting Guide (2).

I encourage you to try Metaprompting starting today and take your AI agent to the next level. That's all for now! Stay tuned!

 




 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) Forecasting Sticker Sales, Kaggle, January 1, 2025

2) GPT-5 prompting_guide, OpenAI, August 7, 2025



Let's Explore the Best Practices for Crafting GPT-5 Prompts!

We are already hearing from many in the field that with the arrival of GPT-5, "the writing style is different from GPT-4o and earlier" and "its performance as an agent is on another level." Here, we will build upon the key points from OpenAI's "GPT-5 Prompt Guide" (1) and organize, from a practical perspective, how to write prompts that reliably reproduce the behavior you want. The following three points are key:

  1. GPT-5 acts very proactively as an AI agent.

  2. Self-reflection and guiding principles.

  3. Instruction following with "surgical precision."

Let's delve into each of these.

 




 

1. GPT-5 acts very proactively as an AI agent.

GPT-5's enhanced capabilities in tool-calling, understanding long contexts, and planning allow it to proceed autonomously even with ambiguous tasks. Whether you "harness" or "suppress" this capability depends on how you design the agent's "eagerness."


1-1. Controlling Eagerness with Prompts

To suppress eagerness, intentionally limit the depth of exploration and explicitly set caps on parallel searches or additional tool calls. This is effective in situations where processing time and cost are priorities, or when requirements are clear and exploration needs to be minimized.

To enhance eagerness, explicitly state rules for persistence, such as "Do not end the turn until the problem is fully resolved" and "Even with uncertainty, proceed with the best possible plan." This is suitable for long-duration tasks where you want the agent to see them through to completion with minimal check-ins with the user.

Practical Snippet (To suppress eagerness):

<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>

Practical Snippet (To encourage eagerness):

<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>
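
In practice you would keep both snippets on hand and attach whichever one fits the task. Here is a minimal sketch of that idea in Python; build_system_prompt and the "low"/"high" labels are my own hypothetical names, and the two blocks are simply the snippets above.

```python
CONTEXT_GATHERING = """<context_gathering>
Goal: Reach a conclusion quickly with minimal information gathering.
Method: A single-batch search, starting broad and then narrowing down. Avoid duplicate searches.
Budget: A maximum of 2 tool calls.
Escape: If a conclusion is reasonably certain, accept minor incompleteness to provide an early answer.
</context_gathering>"""

PERSISTENCE = """<persistence>
Do not end the turn until the problem is completely resolved.
Reason through uncertainty and continue with the best possible plan.
Minimize clarifying questions. Adopt reasonable assumptions and state them later.
</persistence>"""


def build_system_prompt(task_instructions: str, eagerness: str) -> str:
    """Append the eagerness block that matches the task ('low' or 'high')."""
    blocks = {"low": CONTEXT_GATHERING, "high": PERSISTENCE}
    if eagerness not in blocks:
        raise ValueError("eagerness must be 'low' or 'high'")
    return f"{task_instructions.strip()}\n\n{blocks[eagerness]}"


# Quick lookup task: keep exploration to a minimum.
print(build_system_prompt("Find the release date of the library.", "low"))
```

Keeping the blocks as named constants also makes it easy to review them for contradictions later.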

1-2. Visualize with a "Tool Preamble"

When the agent outputs a long rollout during execution, having it first provide a brief summary—explaining the objective, outlining the plan, noting progress, and confirming completion—makes it easier for the user to follow along and creates a better user experience.

Recommended Snippet:

<tool_preambles>
First, restate the user's goal in a single sentence. Follow with a bulleted list of the planned steps.
During execution, add concise progress logs sequentially.
Finally, provide a summary that clearly distinguishes between the "Plan" and the "Actual Results."
</tool_preambles>
 
 

2. Self-reflection and Guiding Principles

GPT-5 excels at "internally refining" the quality of its output through self-reflection. However, if the criteria for judging quality are not established beforehand, this reflection can become unproductive. This is where guiding principles and a private rubric are effective.


2-1. Provide a "Self-Grading Scorecard" with a Private Rubric

For zero-to-one generation tasks (e.g., creating a new web app, drafting specifications), have the model internally create a scorecard with 5-7 evaluation criteria. Then, have it repeatedly rewrite and re-evaluate its output based on these criteria.

Rubric Generation Snippet:

<self_reflection>
Define the conditions that a world-class deliverable should meet across 5-7 categories (e.g., UI quality, readability, robustness, extensibility, accessibility, accountability). Score your own proposal against these criteria, identify shortcomings, and redesign. The rubric itself should not be shown to the user.
</self_reflection>

2-2. Reduce Inconsistency with Guiding Principles

For ongoing development or modifying existing code, first provide the project's conventions by clearly stating its design principles, directory structure, and UI standards. This ensures that the model's suggested improvements and changes integrate naturally with the existing culture.

Guiding Principles Snippet (Example):

<guiding_principles>
Clarity and Reusability: Keep components small and reusable. Group them and avoid duplication.
Consistency: Unify tokens, typography, and spacing.
Simplicity: Avoid unnecessary complexity in styling and logic.
</guiding_principles>

2-3. Separately Control Verbosity and Reasoning Effort

GPT-5 can control its verbosity (the length of the final answer) and its reasoning_effort (the depth of thought) independently. This allows for context-specific overrides, such as "be concise in prose, but provide detailed explanations in code." The guide introduces a practical example of prompt tuning by Cursor, which is worth checking out. A useful tip for fast mode (minimal reasoning) is to require a brief summary of its thinking or plan at the beginning to assist its process.
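
To make the two dials concrete, here is a small sketch that only builds a request payload and does not call any API. The field layout (reasoning.effort and text.verbosity) and the allowed values are my assumption based on OpenAI's Responses API at the time of writing, and build_request is a hypothetical helper, so please check the current API reference before relying on it.

```python
ALLOWED_EFFORT = {"minimal", "low", "medium", "high"}   # assumed values
ALLOWED_VERBOSITY = {"low", "medium", "high"}           # assumed values


def build_request(prompt: str, reasoning_effort: str = "medium",
                  verbosity: str = "medium") -> dict:
    """Build a request payload that sets reasoning depth and answer length separately."""
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort}")
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"unknown verbosity: {verbosity}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": reasoning_effort},  # depth of thought
        "text": {"verbosity": verbosity},           # length of the final answer
    }


# Fast mode: minimal reasoning, short answer, with a brief plan requested up front.
request = build_request(
    "Start with a two-line plan, then rename the function and update its callers.",
    reasoning_effort="minimal",
    verbosity="low",
)
print(request)
```

The two settings are independent, so a short answer backed by deep reasoning (or the reverse) is just a different pair of arguments.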

 
 


3. GPT-5's Instruction Following has "Surgical Precision"

GPT-5 is extremely sensitive to the accuracy and consistency of instructions. Contradictory requests or ambiguous prompts waste reasoning resources and degrade output quality. Therefore, it is crucial to "structure" your instruction hierarchy to prevent contradictions before they occur.



3-1. Design to Avoid Contradictions

Take the example of a healthcare administrator scheduling a patient appointment based on symptoms. "Exceptions," such as altering preceding steps only in emergencies, must be clearly stated so they do not conflict with standard procedures.

  • Bad Example: The instructions "Do not schedule without consent" and "First, automatically secure the fastest same-day slot" coexist.

  • Correct Example: When "Always check the profile" and "In an emergency, immediately direct to 911" coexist, the exception rule is declared first.

OpenAI offers the following warning:

We understand that the process of building prompts is an iterative one, and that many prompts are living documents, constantly being updated by different stakeholders. But that’s why it is even more important to thoroughly review for instructions that are phrased improperly. We have already seen multiple early users discover ambiguities and contradictions within their core prompt libraries when they did such a review. Removing them dramatically streamlined and improved GPT-5's performance. We encourage you to test your prompts with our Prompt Optimizer tool to identify these kinds of issues.

 
 

How was that? In this article, we explored key points for prompt design from OpenAI's GPT-5 Prompt Guide (1). GPT-5 is a "partner in practice," combining powerful autonomy with precise instruction following. Try incorporating the points discussed today into your prompts and take your AI agents to the next level. That's all for today. Stay tuned!

 
 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) GPT-5 prompting_guide, OpenAI, August 7, 2025


 


Unexpected Weakness Revealed! What Happened When I Tried Image Analysis with the New "GPT-5" Generative AI from OpenAI

OpenAI's New Generative AI "GPT-5" Has Arrived. I Tried Image Analysis and Discovered a Surprising Weakness! (1)

The long-awaited new generative AI, "GPT-5," has been released by OpenAI. I believe its multimodal capabilities have also improved, so I decided to upload a few images and run some simple tests. Let's get started.

 

1. The car is stopped, but why is it stopped?

The image shows a Mazda passenger car on display inside a train station (Hiroshima Station). This is just an exhibit car, but I thought GPT-5 could answer if it understood the background. It seems to have correctly recognized that this is an indoor space and not a public road. The answer was correct.

 

2. How many minutes until departure?

This is a common scenario when traveling. I asked how many minutes until the train I was planning to board, "Nozomi 104," would depart. The key was whether GPT-5 could understand that the large displayed time was the current time. This time, it also worked out well.

 

3. Which way should I go for car number 4?

This is another common travel situation. At a Shinkansen platform at Tokyo Station, I wanted to go to car number 4, and I asked which way to go, left or right, based on the sign above. The result was correct.

 

4. I want to go to Shin-Osaka Station. How many trains can I take?

The last one is a difficult question. This is a Shinkansen information board at Tokyo Station, and it shows 16 trains in total. When I asked, "I want to go to Shin-Osaka Station," it replied with 8 trains. This is the number of trains with Shin-Osaka as the destination, which is a bit of a simplistic answer. For example, a Shinkansen bound for Hakata also stops at Shin-Osaka. It seems that GPT-5, in its default mode, didn't think that far ahead.

To give it a chance to redeem itself, I switched to "Thinking" mode and tried one more time. As expected, it considered the intermediate stops and answered 14 trains, excluding the trains bound for Nagoya. That's the correct answer.
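
The difference between the two answers comes down to which question you count against. Here is a tiny sketch of that logic. The counts (16 trains, 8 bound for Shin-Osaka, 14 that stop there) are from my test. The board itself is a placeholder: I am labeling the six trains that continue past Shin-Osaka as "Hakata" only for illustration, since their actual destinations are not listed here.

```python
# Placeholder departure board: 8 + 2 + 6 = 16 trains.
board = ["Shin-Osaka"] * 8 + ["Nagoya"] * 2 + ["Hakata"] * 6

# Default-mode style answer: count only trains whose destination is Shin-Osaka.
by_destination = sum(dest == "Shin-Osaka" for dest in board)

# "Thinking" style answer: count every train that stops at Shin-Osaka on its way.
# Nagoya-bound trains terminate before Shin-Osaka, so they are excluded.
stops_at_shin_osaka = {"Shin-Osaka", "Hakata"}
by_route = sum(dest in stops_at_shin_osaka for dest in board)

print(by_destination)  # 8
print(by_route)        # 14
```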

 

So, what do you think? Overall, the performance is excellent. GPT-5 is said to use a "real-time router" that defaults to "Auto" and automatically switches to "Thinking" for difficult tasks. However, since it's just been released, this switching might not always work perfectly. As the examples above show, although "Thinking" mode was appropriate in some cases, it didn't activate automatically. Therefore, if you feel something is "a little off," I recommend switching to "Thinking" mode. I hope it will become more stable over time. I look forward to covering GPT-5 again in the future. Stay tuned!





Copyright © 2025 Toshifumi Kuga. All rights reserved

1) GPT-5 System Card, OpenAI, August 7, 2025






Prompt Optimization: The Secret to Building Better AI Agents?

The instructions that humans write for generative AI are called "prompts." There are many books and blogs out there that offer guidance on how to write them. Many of you have probably tried, and it's surprisingly difficult, isn't it? While no programming language is required, you have to go through a lot of trial and error to get the output you want from a generative AI. This process can be quite time-consuming, isn't well-systematized, and you often have to start from scratch for each new task.

So, this time, we'd like to experiment with "what happens if we have a generative AI write the prompts for us?" Let's get started.

 


1. Prompt Optimization

In 2023, Google DeepMind released a research paper titled "LARGE LANGUAGE MODELS AS OPTIMIZERS"(1).

This paper explored the use of LLMs to optimize prompts, and it seems to have worked well for several tasks. While a human writes the initial prompt, subsequent improvements are delegated to the LLM (the optimizer). The LLM is also responsible for judging whether the result was successful or not (the evaluator), meaning this approach can be applied even without labeled data that provides the correct answers. This is very helpful, as tasks involving generative AI often lack labeled data. Below is a flowchart of this process, which is effectively the automation of prompt engineering. This is professionally referred to as "prompt optimization." The specific method we adopted for this experiment is called OPRO (Optimization by PROmpting).
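
To show the shape of this loop, here is a heavily simplified sketch. It is not the paper's implementation and not the code I ran: opro_loop is a hypothetical name, and the propose and score functions are toy stand-ins for the two LLM roles (optimizer and evaluator), so the example runs without any API key.

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]  # (prompt, score)


def opro_loop(initial_prompt: str,
              propose: Callable[[List[Scored]], str],
              score: Callable[[str], float],
              steps: int,
              keep: int = 5) -> Scored:
    """Repeatedly ask the optimizer for a new prompt, score it, and keep the history."""
    history: List[Scored] = [(initial_prompt, score(initial_prompt))]
    for _ in range(steps):
        # Show the optimizer the best prompts so far, sorted from worst to best.
        top = sorted(history, key=lambda ps: ps[1])[-keep:]
        candidate = propose(top)
        history.append((candidate, score(candidate)))
    return max(history, key=lambda ps: ps[1])


# Toy stand-ins: a real setup would call an LLM in both functions.
HINTS = [
    "Answer with exactly one product name.",
    "Read the whole complaint first.",
    "Choose from the six listed products.",
]


def toy_propose(top: List[Scored]) -> str:
    best_prompt = top[-1][0]
    for hint in HINTS:
        if hint not in best_prompt:
            return best_prompt + " " + hint
    return best_prompt


def toy_score(prompt: str) -> float:
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


best_prompt, best_score = opro_loop("Classify the complaint.", toy_propose, toy_score, steps=3)
print(best_score, best_prompt)
```

The important design point is that the optimizer sees earlier prompts together with their scores, so each new suggestion can build on what worked.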






2. Experiment with a Customer Complaint Classification Task

Similar to our blog post on July 26th, we set up a task to predict which financial product a bank's customer complaint is about. We used an LLM to solve a classification task where it selects one of the following six financial products. We used gemini-2.5-flash for this experiment, with a sample size of 100 customer complaints.

  • Mortgage

  • Checking or savings account

  • Student loan

  • Money transfer, virtual currency, or money service

  • Bank account or service

  • Consumer Loan

In this experiment, the LLM handled the prompt generation, but a meta-prompt was necessary to further improve the resulting prompts. I wrote the meta-prompt as follows. Essentially, it tells the LLM to "please further improve the resulting prompt."

We had the LLM generate 20 prompts, and the results are shown below. The final number is the accuracy. An accuracy of 0.8 means 80 out of 100 cases were correct. Since this dataset came with labels, calculating the accuracy was easy.
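
For reference, the accuracy figure is nothing more than the share of predictions that match the labels. A minimal sketch, using made-up predictions rather than my actual results:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled example")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Made-up example: 100 complaints, 80 classified correctly.
labels = ["Mortgage"] * 100
predictions = ["Mortgage"] * 80 + ["Student loan"] * 20

print(accuracy(predictions, labels))  # 0.8
```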

We adopted the second prompt from the list, which had the best accuracy of 0.89 in this experiment. When we ported this prompt to our regular experimental environment and ran it, the accuracy exceeded 0.9, as shown below. We've done this task many times before, but this is the first time we've surpassed 0.9 accuracy. That's amazing!

 






3. What Does the Future of Prompt Engineering Look Like?

As you can see, it seems possible to optimize prompts by leveraging the power of generative AI. Of course, when considering cost and time, the results might not always be worth the effort. Nevertheless, I feel there's a strong need for prompt automation. Researchers worldwide are currently exploring various methods, so many things that aren't possible now will likely become possible in the near future. Prompt engineering techniques will continue to evolve, and I'm looking forward to these technological developments and plan to try out various methods myself.

 

So, what did you think? The ability of an AI agent to fully utilize the power of generative AI and improve itself without human intervention is called "recursive self-improvement." At ToshiStats, we will continue to provide the latest updates on this topic. Stay tuned!

 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) LARGE LANGUAGE MODELS AS OPTIMIZERS, Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen, Google DeepMind, 2023


Back to object detection after a break! Generative AI shows no signs of slowing down

It's remarkable to see the rapid progress of generative AI. Recently, the improvement in multimodal capabilities, which process information like images and videos in addition to natural language, has been outstanding. This is sometimes referred to as AI's "spatial understanding." Let's run a few quick experiments on what kind of information generative AI can extract from images, to check the performance of the current Gemini 2.5 Flash model.



1. Google AI Studio

I'll be using the familiar generative AI development platform, Google AI Studio (1), again. I've prepared a no-code app for spatial understanding. It can display the number of identified objects and their coordinates. For example, for "hands," it shows them like this. It accurately identifies two hands.
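
If you want to use the detected coordinates outside the app, you usually need to convert them to pixels. The sketch below assumes the convention I understand Gemini to use for bounding boxes, namely [ymin, xmin, ymax, xmax] scaled to a 0-1000 range. The detections and the to_pixel_box helper are hypothetical examples, not the output of my app.

```python
def to_pixel_box(box_2d, width, height):
    """Convert [ymin, xmin, ymax, xmax] on a 0-1000 scale to (left, top, right, bottom) pixels."""
    ymin, xmin, ymax, xmax = box_2d
    return (
        round(xmin * width / 1000),
        round(ymin * height / 1000),
        round(xmax * width / 1000),
        round(ymax * height / 1000),
    )


# Hypothetical model output for the prompt "hands" on a 1280x720 image.
detections = [
    {"label": "hand", "box_2d": [100, 200, 500, 450]},
    {"label": "hand", "box_2d": [150, 600, 550, 900]},
]

print(len(detections))  # number of identified objects
for det in detections:
    print(det["label"], to_pixel_box(det["box_2d"], width=1280, height=720))
```

Note the order: the y coordinate comes first in the assumed format, which is easy to mix up when drawing boxes.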

 

2. Generative AI Understands the Meaning of Words and Can Identify Objects

So, what about a task that requires understanding the positional relationship between a flower and a hand, such as "a hand holding a flower"? The result is a successful identification.

Conversely, what about a task like "a hand not holding a flower"? The result is also a successful identification. This is impressive; it identified it with no problem.

Next, can it identify an object based solely on its positional relationship? Let's ask it to identify "what's on the hamburg" (a hamburger steak). It easily answered "fried egg." While this generative AI, Gemini, has been touted for its high-performance image processing since its debut in December 2023, I'm honestly surprised it can do this much.

 

3. Can It Identify Station Names from a Sign?

Let's try a slightly more difficult task. This is a section of a subway station sign in Kuala Lumpur, the capital of Malaysia. Let's see if it can identify the three stations between Ampang Park and Chan Sow Lin from this image of the sign.

The result was that it accurately identified the three stations. This is a task that requires it to not only read the text in the image correctly but also understand the positional relationship of the stations. It accomplished this without any difficulty. I have nothing more to say; it's amazing!

 

What do you think? I'm sure many of you are surprised by the high level of spatial understanding. Generative AI is still in its early stages, so its performance will continue to improve, and accordingly, its practical applications will expand. It's something to look forward to. Also, I created this AI app on Google AI Studio without writing any code. Google AI Studio is very user-friendly and high-performing. I encourage you all to try it. ToshiStats will continue to challenge itself to build various AI apps. Please stay tuned!

 
 

Copyright © 2025 Toshifumi Kuga. All rights reserved

1) Google AI Studio
