
Logic-Powered Agents: How LLMs' Evolution in Math Is Shaping the Future of AGI

On June 3rd, Google released a new research paper (1). It reports that difficult mathematical proofs were solved by combining the LLM Gemini 3.1-pro with a mathematical proof language called LEAN (2). In this post, I would like to delve into the paper and consider what developments we can expect from this new AI agent beyond the framework of mathematics.

 

1. The Synergistic Effect of the LLM's Flexibility and LEAN's Strictness

The paper features an AI agent called LEAP. It uses only Gemini 3.1-pro as the LLM, with no task-specific fine-tuning; the model is used straight out of the box. Even so, it is reported to demonstrate outstanding exploration capabilities in mathematical proofs. Since it requires no additional training, it can be put to use immediately, which is very convenient in practice. Furthermore, because LEAN is used alongside the LLM, a proof whose logic is broken by hallucination fails at compile time and is automatically rejected.

         LEAP (LLM-in-Lean Environment Agentic Prover)

Since LLM responses fluctuate probabilistically, humans normally have to verify them in detail whenever an argument must be rigorous. Introducing LEAN automates this verification, which is very reassuring. This fantastic result seems to come from combining the flexibility of the LLM with the strictness of LEAN. Let's take a closer look.
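
To make the rejection mechanism concrete, here is a minimal sketch in Lean 4 that uses only the core library. It is not taken from the paper, and the theorem names are illustrative assumptions. The first statement is true, so the checker accepts it. The second, left commented out, is false, so the proof attempt shown for it does not compile.

```lean
-- A true statement: the checker accepts this proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false statement: uncommenting the line below makes compilation fail,
-- which is how a hallucinated "proof" gets rejected automatically.
-- theorem bogus_example (a : Nat) : a + 1 = a := rfl
```

The point is that acceptance does not depend on how convincing the text looks; the statement either type-checks or it does not.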

 

2. The Structure and Accuracy of LEAP

Here is the structure of LEAP. The figure on the left is the roadmap for the theorem to be proved in LEAN. Technically, it forms a DAG (Directed Acyclic Graph). A complex mathematical proof is not completed in a single attempt; it progresses through repeated exchanges between the LLM and LEAN. The key is the section in the red frame, where the LLM writes an INFORMAL BLUEPRINT in natural language and converts it into a FORMAL SKETCH in LEAN. A two-tier review then follows: LEAN verifies that the new proof step contains no contradictions, and the LLM reviews whether the approach is genuinely effective. In other words, the LLM acts as the pilot in the search for a proof. Considering that it uses Gemini 3.1-pro as is, its potential is truly surprising.

                LEAP workflow
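
As a rough illustration of why the DAG structure matters, the sketch below orders a small dependency graph so that every lemma comes before the results that rely on it. The lemma names and the graph itself are hypothetical, not taken from the paper, and LEAP's actual scheduling may well differ. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical proof roadmap: each key maps to the lemmas it depends on.
proof_dag = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_a": {"lemma_c"},
    "lemma_b": {"lemma_c"},
    "lemma_c": set(),
}


def proof_order(dag):
    """Return the nodes in an order where dependencies always come first."""
    return list(TopologicalSorter(dag).static_order())


order = proof_order(proof_dag)
print(order)
```

Because the graph is acyclic, such an order always exists; a cycle would mean a lemma indirectly depends on itself, and `TopologicalSorter` raises `CycleError` in that case.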

Now, let's look at the results of applying LEAP to an actual task. It tackled the notoriously difficult Putnam 2025. Putnam 2025 contains twelve undergraduate-level problems from the 86th William Lowell Putnam Mathematical Competition, a highly challenging North American mathematics competition.

Looking at the DAG, you can see how the proof actually progresses. In this example of Putnam 2025 Problem A6, you can see layers upon layers of connected proofs. It's certainly a difficult problem. The green indicates the parts that have already been proven.

            DAG example for Putnam 2025 Problem A6

As shown below, LEAP solved every problem correctly. An overwhelming result.

                Results on Putnam 2025

While the original Gemini 3.1-pro on its own could not score at all, combining it with LEAN unlocked tremendous capability. I think this is truly a breakthrough.

 

3. Beyond Mathematical Proofs into Various Fields

So far, we have looked at tasks in mathematical proof. Since LEAP can construct such rigorous logic, I feel it would be a waste to confine it to mathematics. In particular, applications to economics, which is directly linked to business practice, have immense scope and depth, and I believe they can expand the areas where LLMs are useful. Economics is generally described with mathematical formulas, so I think it has a high affinity with LEAP. A paper (3) applying LEAN to economics has already been published, so if you are interested, please do take a look.

 

What did you think? I believe the combination of LLMs and LEAN will be expanded and improved in various ways in the future. It might be stepping closer to AGI. It's very exciting.

At ToshiStats, we plan to take on tasks in the field of economics going forward. Stay tuned!

 

You can enjoy our video news “ToshiStats AI Weekly Review” from this link, too!

1) LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks, 3 Jun 2026, Google
2) LEAN
3) We Can't Agree to Disagree, Formally: Aumann's Theorem and Assumption Accounting in Lean, May 27, 2026, Ruize Chen, Ben Eltschig, Ken Ono, Jujian Zhang (Axiom Math), Scott Duke Kominers (Harvard University; a16z crypto)

Copyright © 2026 ToshiStats Co., Ltd. All rights reserved.

Notice: This is for educational purposes only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.

Is Google Omni One Step Closer to AGI? Testing It in a 10-Second Video

The other day, Google held its annual developer conference, Google I/O, where they announced "Gemini Omni," a new multimodal generative AI. Google has championed AGI (Artificial General Intelligence) since its inception, viewing multimodal AI as an essential requirement to achieve it. In this article, we will use "Gemini Omni" to examine just how much closer we have come to AGI.

 

1. What Kind of AI is "Gemini Omni"?

First, let's look at the explanation released by Google (1).

"We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create. Omni is our new model that can create anything from any input — starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge. You can also easily edit your videos through conversation. Gemini Omni Flash is a model that can create anything from any input – starting with video."

In short, it can be described as "a generative AI that can take any form of information as input and output it in any format." It appears that "Omni" understands 3D spatial information, visual elements, and physical laws—such as objects falling downward—which are difficult to grasp through text alone. This is truly a massive leap forward toward AGI.

The Omni Flash model that debuted this time is limited to video output. However, in line with the "any-to-any" concept, the next version can be expected to support output in all formats. It is something to look forward to.

 

2. The Task: Singing to a Given Theme

So, how capable is Omni Flash in practice? Can it successfully integrate various forms of information? Can it maintain consistency in its output? To test this, we will use the image below, add a prompt, and see if it can sing emotionally based on a specific theme. She is Leia, an instructor at ToshiStats Co. She is a familiar face on YouTube, but this time she is participating in our experiment.

             Leia, Instructor at ToshiStats Co., Ltd.

For this experiment, we prepared the following prompt:

"She is singing 'Kita-wing' in English. It is 80s Japanese pop. This must be 1. An urban and bittersweet melody, 2. about emotion of an independent, mature woman for love, 3. provide courage for action, 4. A movie-like scenery born from a 'midnight flight', 5. A deep, plaintive, and vibrating long vibration. 6, This scene is needed 'An airplane gliding through the midnight sky above the glowing metropolis.'."

We entered this prompt along with the image above. We believe this makes the singing theme reasonably clear. In particular, we want to focus on how well it can express emotional nuances, such as item 1: "An urban and bittersweet melody."

While you can listen to Leia’s actual singing later on YouTube, let's walk through the analysis first. Although the original Leia had a bright smile, the singing Leia looks somewhat sorrowful.

When it transitions to a close-up, those emotions become very clear.

We specified in the prompt to incorporate a "midnight flight" scene. It has indeed been inserted effectively. In the actual video, the airplane moves slowly.

Her physical expressions and body language look natural as she conveys emotion. It is impressive.

Actually, the video ended right at the climax. Ah, what a pity. I wanted to hear more. The maximum generation time for the current Omni Flash is 10 seconds, so it cannot be helped. Let's look forward to an extended generation time in the next version update.

Please take a moment to listen to Leia's song. Both English and Japanese versions are available. The English version is nearly perfect, but the Japanese version has a few parts where the pronunciation is slightly unclear. This is an area for improvement.

 

3. The Roadmap to AGI

In this test, Omni Flash consistently generated quite difficult emotional expressions. It understood the meaning and context—keeping her original clothing unchanged while swapping out only the background to match the theme—to create the video. Its adherence to the prompt was also excellent. While the short generation time remains a bottleneck, the content itself deserves high praise.

It is highly probable that Google will use Omni Flash as a starting point to accelerate its development toward AGI. The AI industry is currently suffering from a shortage of GPU supplies, and Google has become one of the few actively speaking out about AGI. Ultimately, being able to develop and produce their own computing resources, such as TPUs, gives them an overwhelming advantage. Demis Hassabis, CEO of Google DeepMind, who is leading the development of Omni Flash, has stated that AGI is "just a few years away" (2).

 

What did you think? Through this experiment, we confirmed the latent potential of the new multimodal generative AI "Omni" and discussed its possibilities for achieving AGI. Here at ToshiStats, we will continue to explore various ideas under the theme of "Road to AGI." Stay tuned!

 




1) Introducing Gemini Omni, Google
2) A new era of discovery: AI and the frontiers of science with Demis Hassabis, May 22, 2026, Google for Developers


"Agentic Commerce and Agentic Payments: The Next Game Changers for the Financial Industry?"

On May 7, 2026, Mitsubishi UFJ Financial Group, Inc. (hereinafter MUFG) and Google announced a strategic partnership in the retail sector. They stated that they will collaborate to create new financial services and customer experiences in Japan's retail finance industry. That MUFG, one of Japan's largest financial conglomerates, has teamed up with Google, often regarded as the strongest of the AI giants, is highly significant. Of the several key points in the announcement, this article focuses on AI agents.

 

1. Agentic Commerce and Agentic Payments

First, let's take a closer look at the section of MUFG's release that covers AI agents (1).

Content of the Partnership (1)

Next-Generation Financial Experiences Supported by AI Agents, spanning from Purchases and Payments to Financial Transaction Decision-Making: Initiatives Toward Autonomous Finance, including Agentic Commerce / Agentic Payments

  • We will collaborate with an eye toward early domestic realization in the fields of "Agentic Commerce" and "Agentic Payments," where AI agents autonomously support a continuous series of processes from product selection and purchasing to payment execution.

  • Google Cloud plans to leverage its expertise in AI and cloud infrastructure to provide MUFG with cloud and AI technologies, as well as technical advice and development support for these initiatives.

  • Through this partnership, MUFG aims to build a next-generation payment infrastructure on Google Cloud to realize Agentic Commerce / Agentic Payments, striving to establish a new standard for purchasing and payments in the AI agent era in Japan.

  • Furthermore, by having AI agents that cooperate at a high level on this same platform support decision-making processes in daily purchases, payments, and various procedures, we aim to realize a new form of finance (autonomous finance) that gently guides customers without burdening them, while respecting their intentions.

  • In addition to digital channels, we will integrate physical touchpoints such as branch offices and remote consultations. By having AI agents understand and support situations across channels, we will provide a consistent sense of security and convenience, while achieving continuous support tailored to each individual customer throughout their daily lives and life events.

As shown above, this is a highly ambitious strategy. In particular, the phrase "MUFG aims to build a next-generation payment infrastructure on Google Cloud to realize Agentic Commerce / Agentic Payments, striving to establish a new standard for purchasing and payments in the AI agent era in Japan" reads like a declaration that they intend to leave domestic competitors far behind. The following chart is a conceptual diagram of Agentic Commerce / Agentic Payments (1). Next, let's think about why MUFG chose Google.

           Conceptual Diagram of Agentic Commerce / Agentic Payments

 

2. Google’s AI Agent Protocol Suite is One of the Strongest in the World

The primary reason MUFG chose Google is presumably that the suite of AI agent protocols spearheaded by Google is one of the strongest in the world, leaving few alternatives. Starting with the release of the Agent Development Kit (ADK, 2) in April 2025, Google has successively released AI agent protocols (communication standards) such as A2A, AP2, and UCP, expanding its partner network and leading the industry in standardization (3). In particular, the Agent Payments Protocol (AP2) is specialized for payments, which the financial industry must have coveted. Each of these is currently evolving as open-source software, but the fact that Google is driving them is nevertheless crucial. The following guide explains AI agent protocols well, and I highly recommend reading it (3).

Developer’s Guide to AI Agent Protocols (3)

 

3. Potential for Development from the Japanese Market to the Global Market

Future developments might be easier to understand when looked at from Google’s perspective. Google knows all too well how much of a competitive advantage can be gained by securing a de facto standard in software. A prime example of this is Android, the operating system for mobile devices. Companies that want to manufacture mobile devices typically adopt Android. This is because the Android ecosystem is fully established, and even if a company were to build a proprietary system from scratch now, no partner would willingly adopt a brand-new OS. Many of you probably use mobile devices that run on Android. Through this ecosystem, Google is always able to maintain a competitive advantage in mobile devices. If they can establish a position like Android's in purchasing and payments for the AI agent era, it will bring massive benefits. Although this partnership concerns the Japanese retail market, if it succeeds in the Japanese market, expanding it as-is into the global market would be easy. This is because, inherently, there are no national borders for AI agent protocol suites. We cannot take our eyes off future developments.

 

What do you think? It feels like a harbinger of AI agents entering the payment market in earnest. I am very much looking forward to seeing how future financial services will change.

At ToshiStats, we will continue to think about the evolution of AI agent protocols and financial services. Stay tuned!



1) Strategic Partnership between MUFG and Google in the Retail Sector, May 7, 2026, MUFG
2) Agent Development Kit (ADK)
3) Developer’s Guide to AI Agent Protocols, March 18, 2026, Google






The Race for AI Supremacy: Will Google Come Out on Top?

The AI market is a battlefield where diverse players like OpenAI, Anthropic, NVIDIA, Alibaba, and Tencent are engaged in fierce competition. Today, I want to focus on Google and delve into whether they can truly seize hegemony in the AI market in the near future.

 

1. Google’s Secret Weapon: The 8th Generation TPU

Google recently announced its 8th generation TPU (1). The most significant feature of this generation is the separation into independent chips for training and inference. What particularly caught my attention is the remarkable improvement in inference speed. As highlighted in the red frame, the computation speed has increased approximately tenfold compared to the previous generation. While I found myself wondering, "Can it really get this much faster in just one year?", I am eager to try it out as soon as possible. It is expected to debut later this year.

                TPU Performance Comparison

With TPU inference becoming this fast, we might see the same generative AI models produce results significantly quicker when running on TPUs. Currently, among public clouds, only Google Cloud offers the TPU option, which is likely to further boost Google Cloud's competitive edge.

 

2. Massive Investment in Anthropic

Currently, the most popular frontier model in the AI market is Claude, developed by Anthropic. It is exceptionally strong, particularly in the B2B sector. Recently, Google reportedly committed to a massive investment in Anthropic (up to $40 billion, albeit with conditions) (2). From the perspective of frontier model development, Google and Anthropic are competitors. On the other hand, Anthropic is a major customer for Google Cloud.

Therefore, this massive investment holds significant strategic weight. If the likelihood of Claude’s training and inference being performed on TPUs increases, so does the potential for Google to generate revenue from it. This can be viewed as a form of risk diversification for Google. While it would be ideal if Google’s own frontier model, Gemini, maintained a dominant market share, rivals are constantly launching high-performance models. Practically speaking, it is a rational risk-hedging strategy to have even competing models run on TPUs—thereby collecting Google Cloud usage fees—or to aim for capital gains through equity stakes in those invested companies. In any case, we must keep a close eye on the collaboration between Google and Anthropic.

 

3. Google DeepMind’s Technical Prowess and Google’s Product Ecosystem

One cannot discuss Google’s AI without mentioning Gemini. Developed by Google DeepMind, this frontier model is natively multimodal and has made headlines for its high performance with every new release. The current model is Gemini 3, and there is anticipation that a next-generation model might be announced at Google I/O, the annual event starting on May 19, 2026. It’s very exciting.

However, Gemini is not the only generative AI from Google DeepMind. Boasting one of the most diverse arrays of models among all AI labs, their portfolio includes image and video generation models, as well as world models like Genie 3 (3).

Furthermore, Google possesses the vast amount of data required for model training. Google already operates various products globally, and the data gathered from them is immense; YouTube alone is a clear example. Compared to many AI labs that must build their user bases from scratch, Google has an overwhelming advantage. The combination of "Google DeepMind’s technical prowess + data obtained from various products" is unparalleled.

 

What do you think? Today, we took a deep dive into Google. With powerful technology spanning not just AI model development but various other fields, Google’s strength feels overwhelming. They will likely continue to lead the AI market. Conversely, they are so strong that one might even worry about when they might run afoul of antitrust laws. What are your thoughts?

ToshiStats will continue to cover Google in the future. Stay tuned!

 


1) Our eighth generation TPUs: two chips for the agentic era, Google, Apr 23, 2026
2) Google to invest up to $40B in Anthropic in cash and compute, TechCrunch, April 24, 2026
3) Genie 3: A new frontier for world models, Google, August 5, 2025





Can You "Vibe Code" Machine Learning? I Tried It and Built an App

2025 was the year the coding style known as "Vibe Coding" truly gained mainstream acceptance. So, for this post, I conducted an experiment to see just how far we could go in building a machine learning model using only AI agents via "Vibe Coding"—with almost zero human programming involved. Let's get started!

 
1. The Importance of the "Product Requirement Document" for Task Description

This time, I wanted to build a model that predicts whether bank loan customers will default. I used the publicly available Credit Card Default dataset (1).

In Vibe Coding, we delegate the actual writing of the program to the AI agent, while the human shifts to a reviewer role. In practice, having a tool called a "Code Assistant" is very convenient. For this experiment, I used Google's Gemini CLI. For the IDE, I used the familiar VS Code.

Gemini CLI

To entrust the coding to an AI agent, you must teach it exactly what you want it to do. While it is common to enter instructions as prompts in a chatbot, in Vibe Coding, we want to use the same prompts repeatedly, so we often input them as Markdown files.

It is best to use what is called a "Product Requirement Document (PRD)" for this content. You summarize the goals you want the product to achieve, the libraries you want to use, etc. The PRD I created this time is as follows:

PRD

By referencing this PRD and entering a prompt to create a default prediction model, the model was built in just a few minutes. The evaluation metric, AUC, was also excellent, ranging between 0.74 and 0.75. Amazing!!
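
For readers unfamiliar with the metric, AUC can be read as the probability that the model scores a randomly chosen defaulter higher than a randomly chosen non-defaulter. The sketch below computes it directly from that definition in plain Python. The function name and the toy labels and scores are illustrative assumptions, not the data or code from this experiment; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for default, 0 for no default.
    scores: predicted default probabilities. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): two non-defaulters, two defaulters.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 pairs are ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the 0.74 to 0.75 reported above.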

 

2. Describing the Folder Structure with PROJECT_SUMMARY

It is wonderful that the machine learning model was created, but if left as is, we won't know which files are where, and handing it over to a third party becomes difficult.

Therefore, if you input the prompt: "Analyze the current directory structure and create a concise summary that includes: 1. A tree view of all files 2. Brief description of what each file does 3. Key dependencies and their purposes 4. Overall architecture pattern Save this as PROJECT_SUMMARY.md", it will create a Markdown file like the one below for you.

PROJECT_SUMMARY.md

With this, anyone can understand the folder structure at any time, and it is also convenient when adding further functional extensions later. I highly recommend creating a PROJECT_SUMMARY.md.
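
The first item in that prompt, the tree view, is easy to picture. As a minimal sketch of the kind of output it asks for, the plain-Python snippet below draws a directory tree. The function name and the file names in the demo are hypothetical, and the agent's own output format may differ.

```python
import os
import tempfile


def tree_lines(root, prefix=""):
    """Return the lines of a tree view for everything under root."""
    entries = sorted(os.listdir(root))
    lines = []
    for i, name in enumerate(entries):
        is_last = i == len(entries) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + name)
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # Indent children; keep the vertical bar while siblings remain.
            child_prefix = prefix + ("    " if is_last else "│   ")
            lines.extend(tree_lines(path, child_prefix))
    return lines


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "src"))
    for rel in ("PRD.md", os.path.join("src", "train.py")):
        open(os.path.join(demo_root, rel), "w").close()
    demo_tree = tree_lines(demo_root)

print("\n".join(demo_tree))
```

The descriptions of each file, the dependencies, and the architecture pattern are the parts that genuinely benefit from the AI agent, since they require reading the code rather than just listing it.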

 

3. Adding a UI and Turning the ML Model into an App

Since we built such a good model, we want people to use it. So, I experimented to see if I could build an app using Vibe Coding as well.

I created PRD-pdapp.md and asked the AI agent to build the app. I instructed it to save the model file and to use Streamlit for app development. The actual file and its translation are below:

PRD-pdapp.md

When executed, the following app was created. It looks cool, doesn't it?

You can input customer data using the boxes and sliders on the left, and when you click the red button, the probability of default is calculated.

  • Customer 1: Default probability is 7.65%, making them a low-risk customer.

  • Customer 2: Default probability is 69.15%, which is high, so I don't think we can offer them a loan. The PAY_0 Status is "2", meaning their most recent payment status is 2 months overdue. This is the biggest factor driving up the default probability.

As you can see, having a UI is incredibly convenient because you can check the model's behavior by changing the input data. I was able to create an app like this using Vibe Coding. Wonderful.

 

How was it? It was indeed possible to do machine learning with Vibe Coding. However, instead of writing code, you need to write precise PRDs. I believe this will become a new and crucial skill. I encourage you all to give it a try.

That’s all for today. Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!

1) Default of Credit Card Clients

 



Copyright © 2025 Toshifumi Kuga. All rights reserved.
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

The OpenAI Code Red: What’s Next for the Generative AI Market?

In late November 2022, OpenAI released ChatGPT. It has been three years since then, and just as it was about to celebrate its third birthday, an event occurred that dampened the celebratory mood. CEO Sam Altman declared a "CODE RED" (Emergency) (1). The driving force behind this was the breakthrough of the new generative AI, "Gemini 3" (2), released by Google on November 18. Today, I would like to delve into this theme and forecast the generative AI market for 2026. Let’s get started.

 

1. Gemini 3 vs. GPT-5

On August 7, 2025, OpenAI released GPT-5. Because it was the first major update since GPT-4, expectations were very high. In reality, however, it was difficult to perceive a significant difference from other models. Although it improved scores across various benchmarks, its impact felt somewhat muted compared with the arrival of GPT-4.

Of course, GPT-5 is evolving steadily, so if rival models had remained stagnant, I believe ChatGPT could have celebrated its third birthday peacefully. However, the moves made by its rival, Google, surpassed our expectations. On November 18, 2025, Gemini 3 was released, and everyone was astonished by its high performance. Its scores on almost all benchmarks surpassed those of GPT-5, and for the first time since the birth of ChatGPT, OpenAI lost its "technological competitive advantage." The battle surrounding generative AI has entered a new phase.

 

2. Why Gemini 3 is Particularly Superior

There are several technical talking points, but what I am paying special attention to is its high capability in image processing and generation. As shown in the leaderboard (3) below, its strength is overwhelming. The famous image generation model Nano Banana Pro is officially named Gemini 3 Pro Image, and its high scores truly stand out.

                        Leaderboard

For individual customers, the ability to easily generate and edit images exactly as envisioned is crucial and can serve as a "killer app." I feel that once individuals experience the technical level of Gemini 3, they will find it hard to switch back to competitor apps. The image below was generated using Nano Banana Pro. As you can see, it has become easy to render both English and Japanese text together on an image. Previously, Japanese text was often incomplete or incomprehensible, so it was quite moving to see clean Japanese generated for the first time.

                   Image generated by Nano Banana Pro

 

3. The Generative AI Market in 2026

With Sam Altman issuing a CODE RED, I believe OpenAI will allocate significant development resources to improving the model itself and will frantically work to close this gap in the image generation field. On the other hand, Google, armed with Gemini 3, possesses several multimodal generative AI models beyond just Nano Banana Pro, and I expect them to leverage that expertise to aim for further breakthroughs.

In particular, generative AI capable of simulation using 3D structures—known as World Models—will likely influence Large Language Models (LLMs) as well, solidifying Google's competitive advantage. One has to admit that Google, which owns YouTube, is incredibly strong in this field. It looks like 2026 will be a year where we cannot take our eyes off how OpenAI launches its counterattack.

 

How was it? While several other players are creating generative AI, I believe the industry will be shaped by companies defining their own positions in the context of the "OpenAI vs. Google" battle. Therefore, the outcome of OpenAI vs. Google is extremely important for all AI-related companies. I would like to write another blog post on this same theme if the opportunity arises.

That’s all for today. Stay tuned!











1) Sam Altman’s ‘Code Red’ Memo Urges ChatGPT Improvements Amid Growing Google Threat, Reports Say, Forbes, 2 Dec 2025
2) A new era of intelligence with Gemini 3, Google, 18 Nov 2025
3) Leaderboard Overview






Google Antigravity: The Game Changer for Software Development in the Agent-First Era

Google has unveiled Gemini 3, its new generative AI model, and "Antigravity" (1), a next-generation IDE powered by it. Google states that "Google Antigravity is our agentic development platform, evolving the IDE into the agent-first era," signaling a shift toward truly agent-centric development. Here, I task Antigravity with creating a "Bank Complaint Classification App" and actually run it to explore its potential.

                   Antigravity

 

1. Agentic Development with Antigravity

Antigravity is built on top of VS Code. If you are a VS Code user, the editor will look familiar, making it very approachable and easy to pick up. However, the real power of Antigravity lies in its dedicated interface for agentic development: the Agent Manager (shown below). Just enter a prompt into the box and run it to kick off "Vibe Coding." The prompt shown here is the very simple one I entered at the beginning of the development process. Antigravity also appears to be packed with various features designed to facilitate efficient communication with the Agent. For more details, please check the website (1).

                         Agent Manager

 

2. Prompt Refinement and Improvement

Just because you start "Vibe Coding" doesn't mean you'll get perfect code immediately. I started with a simple prompt this time as well, but the process proved more challenging than anticipated. While Gemini 3 Pro often demonstrates human-level capability when handling HTML and CSS for website building, the framework used for this app, Google ADK, is a brand-new agent development kit that debuted in April 2025. Consequently, there are likely very few code examples available on the web, and I assume it hasn't been fully absorbed into Gemini 3's training data yet.

               Development with Google ADK

It was quite a struggle, but as shown above, I managed to build a fully functional app via "Vibe Coding." To generate these files, I relied solely on natural language instructions; I didn't write a single line of code directly in the editor. However, I did include simple code snippets within the prompts. This is a technique known as "few-shot prompting" (also called few-shot learning), where you provide examples in the prompt to guide the model. I believe this approach is highly effective when Vibe Coding with Gemini 3 for Google ADK development. While it might become unnecessary as Gemini 3 is updated in the future, it's certainly a technique worth remembering for now.
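To make the idea of putting code snippets into the prompt concrete, here is a minimal sketch in Python. It is not the prompt I actually used: the function name `build_few_shot_prompt` and the placeholder snippets are assumptions for illustration only, and you would replace the placeholders with short, known-good code taken from the framework's documentation.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]]) -> str:
    """Combine a task description with (description, snippet) example pairs."""
    parts = [task.strip(), "", "Follow the style of these examples:"]
    for i, (description, snippet) in enumerate(examples, start=1):
        parts.append(f"\nExample {i}: {description}")
        parts.append(snippet.strip())
    parts.append("\nNow generate the complete files for the task above.")
    return "\n".join(parts)


# Placeholder snippets (hypothetical): swap in real, working code you trust.
examples = [
    ("how an agent is defined", "<paste a short, working agent definition here>"),
    ("how the project files are laid out", "<paste the expected file tree here>"),
]

prompt = build_few_shot_prompt(
    "Create a bank complaint classification app with Google ADK.", examples
)
print(prompt)
```

The point of the sketch is simply that the examples travel inside the prompt text itself, so the model can imitate syntax it may not have seen during training.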

Bank Complaint Classification App using Google ADK

The screenshot above shows the "Bank Complaint Classification App" I developed. I verified its accuracy with some simple examples, and the results were excellent. It seems the internal prompts within the app were generated very effectively. Impressive work!
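For readers who want to repeat this kind of spot check, the sketch below shows one simple way to measure accuracy on a handful of labeled examples. Everything in it is hypothetical: the category names, the sample complaints, and the keyword-based `keyword_classifier`, which is only a stand-in so the code runs on its own. In practice you would pass in a function that calls your actual app.

```python
from typing import Callable


def accuracy(classify: Callable[[str], str], labeled: list[tuple[str, str]]) -> float:
    """Return the fraction of (text, expected_label) pairs classified correctly."""
    correct = sum(1 for text, expected in labeled if classify(text) == expected)
    return correct / len(labeled)


def keyword_classifier(text: str) -> str:
    """Stand-in classifier using keyword rules (not the app's real logic)."""
    lowered = text.lower()
    if "card" in lowered:
        return "credit_card"
    if "loan" in lowered or "mortgage" in lowered:
        return "loan"
    return "other"


# Hypothetical labeled examples, invented for this sketch.
labeled_examples = [
    ("My credit card was charged twice.", "credit_card"),
    ("The mortgage payment was applied late.", "loan"),
    ("The branch staff were rude to me.", "other"),
    ("I was denied a loan without explanation.", "loan"),
]

score = accuracy(keyword_classifier, labeled_examples)
print(f"accuracy: {score:.2f}")
```

A few examples like these are enough for a quick sanity check, but they say little about how the app behaves on real, messy complaints.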

 

3. Summary of Building a Complaint Classification App with ADK

  • Total Time: 6 hours (starting from the Antigravity installation) to complete the app.

  • Execution: With the finalized prompt, the run time is just over a minute.

  • Manual Effort: Writing the Google ADK code for this app by hand, without Vibe Coding, would take only about 20 minutes.

  • Reasons for the Delay:

    • I had to iterate on the prompts several times because Gemini 3 is still unfamiliar with Google ADK.

    • I had to explicitly instruct it on file structures and code syntax.

    • I was also using Antigravity for the first time.

  • Conclusion: It is manageable once you understand Gemini 3 Pro's behavior regarding Google ADK.

 

So, what do you think?

It took a little longer because I wasn't used to the new IDE yet, but the combination of Gemini 3 Pro and Antigravity was outstanding. I could really feel its high potential. Since the execution speed itself is fast, next time I plan to try "Vibe Coding" a multi-agent app. That's all for today. Stay tuned!

 




1) Experience liftoff with the next-generation IDE, Google, 19 Nov 2025








This Is What Happens When an AI Agent Runs Our 2025 Autumn Marketing!

Hello, the high temperature in Tokyo has dropped to 16°C, and it's starting to feel very much like autumn. For those unfamiliar with autumn in Japan, this is the season when the leaves on the mountains change from green to orange. The entire mountainside is dyed orange, creating a beautiful and spectacular view. Therefore, I decided to use orange as the background color for this marketing campaign's promotional video. The challenge is: "To devise a campaign to sell cakes to women in Ashiya, an affluent residential area in the Kansai region." What happens when we entrust this task to an AI agent? Let's find out.

 

1. Creating an AI Marketing Agent with "Google Opal"

This time, I'm creating an AI marketing agent using Google Opal (1). Google describes it as "Opal, our no-code AI mini-app builder," and it lets you easily develop an AI agent app like the one below.

For this AI agent's development, I only entered the following prompt: "You are an expert in marketing campaigns. You will be given the following information: 1. The product/service to sell, 2. The target customer, 3. The location/region, 4. The time/season of the campaign, 5. The desired brand image color, 6. A photo of the facilitator. Using this information, please create the following: a. A marketing strategy, b. A marketing campaign name, c. A logo based on the name, d. A promotional video featuring the facilitator, complete with BGM."

Just by executing this prompt, you can create a workflow like the one shown above using the AI agent. After that, you just switch to the app and answer questions related to your task, and the marketing campaign is created. Amazing, isn't it?
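Opal itself requires no code, but it can help to see the six inputs from the prompt written down as a structured brief. The following Python sketch is only an illustration of the information the app asks for, not a description of how Opal works internally. The class name `CampaignBrief`, its field names, and the file name `facilitator.png` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CampaignBrief:
    """The six inputs named in the prompt, as one structured record."""
    product: str
    target_customer: str
    region: str
    season: str
    brand_color: str
    facilitator_photo: str  # path or URL; a hypothetical placeholder here

    def to_prompt_inputs(self) -> str:
        """Render the brief as a numbered list matching the prompt's order."""
        lines = [
            f"1. Product/service to sell: {self.product}",
            f"2. Target customer: {self.target_customer}",
            f"3. Location/region: {self.region}",
            f"4. Time/season of the campaign: {self.season}",
            f"5. Desired brand image color: {self.brand_color}",
            f"6. Photo of the facilitator: {self.facilitator_photo}",
        ]
        return "\n".join(lines)


# Values taken from the challenge in this post; the photo name is made up.
brief = CampaignBrief(
    product="cakes",
    target_customer="women",
    region="Ashiya, Kansai",
    season="autumn",
    brand_color="orange",
    facilitator_photo="facilitator.png",
)
text = brief.to_prompt_inputs()
print(text)
```

Writing the brief out this way makes it easy to see at a glance whether any of the six inputs is missing before you run the app.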

 

2. Marketing Strategy and Logo

Once you input all the necessary information, you get the results back immediately. First is the marketing strategy. The actual output continued with a more detailed discussion, but here I'll show only the beginning. Even though I didn't input very detailed information about the campaign at the initial stage, I think this marketing strategy is well done.

                  Marketing Strategy

Next is the marketing campaign name and logo. What it generated was a cool, French-style logo. I'd love to try using it sometime.

          Logo

 

3. Three Short Promotional Videos

First, I provide the AI agent with a base image of a woman. Then, using this image as a starting point and based on the created marketing strategy, an approximately 8-second short video is generated. It's exciting to see what kind of video the AI agent will produce. This time, it created three videos with BGM. All of them are based on the theme of "Autumn Cakes." It's hard to pick a winner; they are all excellent. After actually creating the videos, I felt that even 8 seconds is enough to convey the image clearly. Which one did you like the best?

 

What did you think? Although this was just a demo AI agent, I was astonished at what it could accomplish with no code, no programming. It seems like it will become a powerful ally for marketers. Of course, there are limitations, but what I created this time can be done for free with just a Google account. I highly recommend giving it a try. ToshiStats will continue to share more about AI agents. Stay tuned!


1) Opal is now available in more than 160 countries, Google, 7 Nov 2025
