The Secret Sauce for Mastering Agentic Coding!

Since the beginning of this year, "agentic coding," where AI agents handle the coding for us, has been everywhere. Now that we no longer write programs ourselves and instead focus on giving instructions to AI agents via prompts, many people likely find themselves wondering, "What exactly should I learn to write good prompts?" Today, I'd like to explore this question using an experiment conducted at ETH Zurich as our guide.

 

1. Overview of the Experiment

The reference for this discussion is the paper "Computer Science Achievement and Writing Skills Predict Vibe Coding Proficiency" (1). The researchers gathered 100 students, who first took tests measuring their writing skills, computer science achievement, and general cognitive abilities. I've summarized these three foundational skills below.

        Three Foundational Skills

Afterward, to measure their "agentic coding" proficiency, the participants reviewed a sample application, drafted prompts for an LLM-based agent, tested the generated application, and then further refined it. The final applications were evaluated by human graders.

         Measuring "Agentic Coding" Proficiency

This process reveals the relationship between the three foundational skills and agentic coding proficiency.

 

2. As Expected, Computer Science Skills Mattered

As the results below show, computer science skills were most strongly correlated with agentic coding proficiency, showing a correlation coefficient of 0.39. Writing skills also showed a significant correlation, with a coefficient of 0.29. Here is a summary of the results.

        Skills Correlated with Agentic Coding Proficiency

Now, some of you might find this a bit puzzling. Computer science skills are primarily centered around programming, whereas in agentic coding, humans don't actually write code directly. So, why did computer science skills show such a high correlation? The research paper explains it as follows:

"It may have contributed through problem decomposition or mental models of control flow and state."

It's certainly true that people hone these kinds of abilities through the practice of programming. If that's the case, it makes perfect sense that individuals with strong computer science skills would perform well, even in natural language-driven agentic coding.

 

3. How Those with No Programming Experience Can Become Excellent Agentic Coders

Based on our discussion so far, I'd like to explore the question of how people with no programming experience can become excellent agentic coders. As agentic coding becomes more widespread, the incentive to learn traditional programming may inevitably fade. However, the following skills remain absolutely essential for mastering agentic coding:

  • The ability to decompose tasks

  • The ability to understand system flows

  • The ability to expand your vocabulary and accurately define requirements in writing

For those without programming experience, deliberately studying these specific points alongside your regular prompt-writing practice will likely accelerate your improvement. It's something you can start today. I highly recommend it!

 

What do you think? While we focused on "agentic coding" today, the insights we've gained go far beyond just "coding"—they can be seen as universal skills for unlocking the true potential of AI agents. As AI agents become integrated into various fields in the future, these skills will essentially become mandatory subjects for all of us. Here at ToshiStats, we will continue to discuss the collaboration between business professionals and AI agents. Stay tuned!

 

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!


1) Computer Science Achievement and Writing Skills Predict Vibe Coding Proficiency, Sverrir Thorgeirsson, Theo B. Weidmann, Zhendong Su. 14 Mar 2026


Notice: This is for educational purpose only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.

The End of Traditional Research: How "autoresearch" is Changing Everything

"It would be wonderful to have a system where you could give instructions to an AI agent before going to bed, and while you sleep, the AI agent executes the program so that a finished product is ready by the time you wake up in the morning." This is not a story about the future. It is an application called "autoresearch" (1) released on March 6, 2026, and anyone can use it for free. Let’s take a look right away.

 

1. What is "autoresearch"?

This is a project by the renowned AI researcher Andrej Karpathy. On his GitHub it is described as "AI agents running research on single-GPU nanochat training automatically": he has created AI agents that automatically train nanochat (2), the small yet high-performance large language model (LLM) he developed. Normally he tunes nanochat by hand, so this is a very ambitious project to automate that process with "autoresearch." He reports that, although the project has only just begun, it is already working very well. For details, please see his post on X (3).

 

2. Simple is Best

When you hear about automating the training of a large language model, you might imagine a very complex system, but there are only three basic files:

  • program.md: the only file a human writes directly. In natural language (English, Japanese, etc.), you describe what kind of research team you want to form by launching multiple AI agents and what you want them to do. No programming is required.

  • train.py: the AI agent that receives those instructions autonomously writes code here to improve the accuracy of nanochat.

  • prepare.py: never updated during training. It serves as the fixed foundation for the experiment, so it remains the same until the end.

It is a very simple structure. I highly recommend checking Andrej Karpathy's GitHub for the contents of each file; it is very informative. I have summarized the overview briefly below.

This is the autoresearch repository, adapted for the Mac, that I ran this time. You can see the three files introduced above. The file structure is extremely simple, and I believe anyone can handle it.

 

3. Running on a MacBook Air

Now, let's run it on my MacBook Air. This Mac, purchased exactly one year ago, is equipped with an M4 chip and 24GB of RAM. Claude Code serves as the development environment once again; it is on duty at our company almost every day.

Claude Code

When I asked Claude Code to draw a diagram of the process, it produced the one below. It is simple and easy to understand. Second from the right, "MLX Train 5m" means a 5-minute training session is repeated many times, about 12 runs per hour. On the far right, "Evaluate val_bpb" means "evaluate the metric val_bpb (validation bits per byte) and check whether the value is steadily decreasing." If the value decreases, accuracy is improving; if not, that session is discarded and training continues from the previous state. Let this run while you sleep, and you can conduct on the order of 100 experiments in a single night.

autoresearch Training Process
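The loop in the diagram can be sketched in a few lines of Python. This is my own simplified abstraction, not Karpathy's actual code: each entry stands in for the val_bpb measured after one 5-minute MLX training session, and a session is kept only if the metric improves.

```python
def autoresearch_loop(initial_bpb, session_results):
    """Greedy accept/reject loop over short training sessions.

    Each entry in session_results stands in for the val_bpb (validation
    bits per byte) measured after one 5-minute training run. Lower is
    better: a session is kept only if the metric decreases; otherwise
    it is discarded and the next run restarts from the previous best.
    """
    best = initial_bpb
    history = [best]
    for bpb in session_results:
        if bpb < best:
            best = bpb          # accept: continue from this checkpoint
        history.append(best)    # reject: previous state is retained
    return best, history
```

Run overnight with ~100 sessions, only the improving runs survive, which is why the evaluation metric can only move down over time.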

Andrej Karpathy describes this design as follows: "Self-contained. No external dependencies beyond PyTorch and a few small packages. No distributed training, no complex configs. One GPU, one file, one metric."

Since I wanted to confirm if it would work properly this time, I ran the loop only three times. As seen below, the evaluation metric did indeed decrease, showing that the training progressed smoothly. During this time, I gave no instructions at all. It’s amazing. It truly is "autoresearch"!

Trends in Evaluation Metric Values

 

What did you think? Andrej Karpathy stated on X (3):

“All LLM frontier labs will do this.”

“any metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.”

You, too, might be able to create your own AI lab on a Mac. What a wonderful thing. At ToshiStats, we will continue to conduct experiments incorporating cutting-edge technology. Stay tuned!


1) autoresearch, Andrej Karpathy, March 6, 2026
2) nanochat, Andrej Karpathy, Oct 13, 2025
3) https://x.com/karpathy/status/2031135152349524125


Many-Shot In-Context Learning: The Game Changer of the Long-Context AI Era

Recently, OpenAI released its newest AI model, GPT-5.4 (1). While much of the praise has focused on its overall performance, I want to highlight its context window length: the context window is the amount of information a generative AI can process in a single pass. GPT-5.4 now supports 1M (one million) tokens. With its rival Opus 4.6 also at 1M, and Google Gemini having reached 1M two years ago, all frontier models from the "Big Three" now offer 1M-token context windows. We can officially say that AI has entered the Long-Context Era.

How will this impact the development of AI agents? Let’s explore.

 

1. What is Many-Shot In-Context Learning?

When you ask ChatGPT, "What is the capital of Japan?" and it replies, "Tokyo," that question or instruction is called a prompt. However, you can input much more than just a short prompt.

For example, if you provide examples first—such as "Where was the World Expo held in Japan?" followed by "Osaka"—and then ask your actual question, the accuracy is known to improve. This technique is called In-Context Learning. When the number of examples exceeds roughly 10 and you provide a massive amount of data, it is referred to as Many-Shot In-Context Learning. Here is a brief summary.

In-Context Learning
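To make the idea concrete, here is a minimal sketch (my own illustration, not from any paper or SDK) of how example pairs are simply written into the prompt ahead of the real question:

```python
def build_in_context_prompt(examples, question):
    """Write (question, answer) example pairs into the prompt before the
    real question; with more than ~10 pairs this becomes many-shot
    in-context learning. No model weights are updated."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # the model completes this line
    return "\n".join(lines)

prompt = build_in_context_prompt(
    [("Where was the World Expo held in Japan?", "Osaka")],
    "What is the capital of Japan?",
)
```

The whole technique lives in the prompt string: more examples simply mean more lines, which is exactly why long context windows matter here.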

 

2. Challenging a 20-Class Classification Task Using Bank Complaint Data

To measure the effectiveness of Many-Shot In-Context Learning, I decided to tackle a difficult 20-class classification task using bank complaint data (2). This dataset contains an "issue" column describing why a complaint occurred. The goal is to read the "text" column and select the correct cause from 20 possible categories. For this, I used Gemini 3.1 Flash-Lite (3).

     Banking complaints dataset

Rather than using a simple prompt like "Please classify this," I asked the AI itself to "create the optimal prompt," resulting in a highly detailed set of instructions—what you might call a "Prompt Powered by AI."

prompt powered by AI

I first attempted this using Zero-shot (providing no examples), even with this enhanced prompt. Unfortunately, the accuracy was only 46%. Since it gets it wrong more than half the time, it isn't yet viable for practical business use.

Zero-Shot accuracy

 

3. Executing Many-Shot In-Context Learning with 1,000 Samples

Next, I implemented Many-Shot In-Context Learning by providing 1,000 examples alongside the prompt. While the underlying process remains the same as the Zero-shot approach, the volume of information is massive. The following are the first five examples.

Many-Shot samples
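Assembling the context is mechanical. Here is a rough sketch of how labeled records could be formatted into many-shot examples; the "text" and "issue" field names follow the dataset, while the exact prompt layout is my own illustration:

```python
def format_many_shot_context(records, categories, n_shots=1000):
    """Turn labeled complaints into many-shot examples: list the candidate
    categories once, then append up to n_shots (text, issue) pairs.
    With a 1M-token window, a thousand examples fit comfortably."""
    header = "Choose one issue from:\n" + "\n".join(f"- {c}" for c in categories)
    shots = [
        f"Complaint: {r['text']}\nIssue: {r['issue']}"
        for r in records[:n_shots]
    ]
    return header + "\n\n" + "\n\n".join(shots)
```

The resulting string is prepended to the classification question for each test complaint, so the per-item prompt is huge but identical in structure to the Zero-shot one.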

The results were dramatic: accuracy jumped to 70%. This clearly demonstrates the sheer power of the "Many-Shot" approach.

Many-Shot accuracy

However, with a 30% error rate, there is still room for improvement. I had an AI Agent analyze why the errors occurred and generate a report. The insights gained from this analysis are highly valuable for further refinement.

Root cause analysis

 

Conclusion

There are several ways to improve the accuracy of generative AI, but as 1M-token context windows become the standard, Many-Shot In-Context Learning is set to become a major focal point. At ToshiStats, we plan to continue evolving this methodology.

Stay tuned!


 

1) Introducing GPT-5.4, OpenAI, March 5, 2026
2) Consumer Complaint Database
3) Gemini 3.1 Flash-Lite: Built for intelligence at scale, Google, Mar 03, 2026

Copyright © 2026 ToshiStats Co., Ltd. All rights reserved.


Which AI Model Should You Use Daily? Why Gemini 3.1 Flash-Lite is the Top Choice!

I’ve been using Opus 4.6 for coding lately, but I've realized that the costs can really add up when running it via API. This led me to think that for tasks where absolute peak precision isn't the only priority, a more budget-friendly model would be a better fit. Right on cue, Google announced the gemini-3.1-flash-lite-preview—a model built for speed and affordability (1). I decided to put it to the test immediately.

 

1. The Perfect Balance of Speed, Cost, and Performance

The Flash-Lite series is the most affordable tier in the Gemini lineup. It’s likely the engine behind many of Google’s own internal services. Speed, in particular, seems to be its standout feature.

When compared to its rivals, the processing speed is remarkably fast. Its cost-efficiency is equally impressive: at $0.25 per 1 million input tokens, it is poised to be a powerhouse for tasks involving massive amounts of data. For a startup like ours, this is incredibly encouraging.

               Comparison with Rival AI Models

Affordability hasn't come at the expense of performance, however. As shown in the Leaderboard (2), it boasts a score exceeding 1430. Given that the top-tier frontier models are currently competing around the 1500 mark, a score of 1430 for a lightweight model is truly outstanding.

                 Leaderboard Standings

 

2. Performance Evaluation: Banking Complaint Classification

To see what it can really do, I tested the model on a banking complaint classification task. Using this dataset (3), I provided the model with customer complaints from the "text" column and asked it to select the most relevant category from six financial products listed in the "Product" column. I ran this test on 100 samples to see how accurately it could categorize each complaint.

                 Banking Complaint Data

Here is the detailed prompt I used.

The Prompt
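As a sketch of the evaluation harness: the scoring loop below is my own, and the commented-out section shows roughly how each complaint might be sent to the model via the google-genai client. Treat the client usage, the model name, and PROMPT_TEMPLATE as assumptions for illustration.

```python
def accuracy(predict, labeled_samples):
    """Score a classifier over (text, expected_label) pairs."""
    hits = sum(1 for text, label in labeled_samples if predict(text) == label)
    return hits / len(labeled_samples)

# A real predictor would wrap the model API, roughly (hypothetical sketch):
#
#   from google import genai
#   client = genai.Client()
#   def predict(text):
#       resp = client.models.generate_content(
#           model="gemini-3.1-flash-lite-preview",
#           contents=PROMPT_TEMPLATE.format(complaint=text))
#       return resp.text.strip()
```

Running such a loop over 100 labeled complaints and comparing predicted versus actual categories is all that is needed to produce the accuracy figure below.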

The results were fantastic, achieving a 92% accuracy rate. The entire process finished in about 60 seconds, demonstrating its high-speed processing capabilities. I’ve attempted this specific task several times in the past, but this is the first time a model has exceeded 90% accuracy without any fine-tuning. Truly impressive!

Task Accuracy Results

 

3. A High-Speed Model You Can Use Without Budget Anxiety

For the past few months, I’ve relied on Opus 4.6 for its sheer coding power. While its performance is top-notch, the costs are substantial. When you want to run various experiments where success isn't guaranteed, the budget can become a significant hurdle.

That’s where gemini-3.1-flash-lite-preview shines. Its balance of performance and cost makes it easy to iterate and experiment freely. It’s the perfect "partner" for development, and I plan to integrate it into my workflow even more moving forward.

 

What do you think? It looks like Google will continue to roll out new AI models one after another. We might even see some open-source models soon, so it's definitely something to keep an eye on. Here at ToshiStats, we’ll keep testing and integrating various AI models into our workflow. Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!

1) Gemini 3.1 Flash-Lite: Built for intelligence at scale,  Google,  Mar 03, 2026
2) Arena
3) Consumer Complaint Database


The Rise of the AI Strategist: Can AI Agents Master Corporate Strategy?

Did you know that Claude Code, the coding assistant exploding in popularity worldwide, has an Agent Teams feature (1) that lets you run AI agents as a team? The idea is to run multiple AI agents simultaneously, each with its own purpose, achieving performance that a single agent couldn't deliver. This time, we'd like to test whether Agent Teams can be used to develop corporate strategy. Let's get started!

 

1. Implementing Five Forces Analysis with Agent Teams

There's a well-known framework in competitive strategy called Five Forces Analysis (2). This time, we'd like to apply it to the Japanese digital payment market and explore the possibility of market entry. We'll analyze from the following five perspectives, setting up an AI agent for each one.

                  Five Forces Analysis

We entered the following prompt into Claude Code, which you're all familiar with by now. There's nothing particularly difficult about it, and of course no programming is required. However, if this is your first time using Agent Teams, don't forget that you'll need to configure it first (1).

                    Claude Code

The multi-agent system we'll build looks like the following. A total of seven AI agents run, but the key point is the loop between Agent 6 and Agent 7. After Agent 6 drafts a report summarizing the research findings, Agent 7, positioned independently, verifies that report. The report isn't complete until Agent 7 approves it. Quite rigorous, isn't it?

                Strategic Analysis Multi-Agent System

 

2. The Report Creation Process

Now let's follow the report creation process on the actual screen. As you can see below, seven AI agents have indeed been configured. You can also see that the crucial verification loop has been created.

                    Seven AI Agents

First, Phase 1. The five research AI agents begin by pulling information from the web. They gather information about the Japanese digital payment market from the five perspectives of Five Forces Analysis. Each AI agent operates independently and processes in parallel, making it very efficient.

Work has progressed, and it appears four of the research tasks are complete. The competitive landscape from each perspective is documented as well. Just a little more to go.

The research by all five AI agents is complete, and we move into Phase 2: creating the integrated report. I'm excited to see what kind of report it will be.

Then we enter the most important phase—Phase 3: the verification loop. Here, the goals are: 1) fact-checking through search, 2) identifying logical inconsistencies, and 3) identifying hallucinations, all aimed at improving the quality of the integrated report.

It appears eight errors were identified and corrected.

The report is finally complete. As shown below, there are six types of reports. We compiled all six into a single PDF file, and it spans 60 pages of content. Impressive, isn't it?

 

3. Structure of the Generated Analysis Report

The structure of the consolidated report is as follows. It's written in accordance with the Five Forces Analysis framework.

Structure of the Analysis Report

We can't present everything here, but the summary in Chapter 1 looks like the following—I think it's very clearly organized. Please note that this summary is for educational purposes only and should not be directly applied to business decisions or the like.

              Notice: This is for educational purposes only

 

So, what did you think? We carried out corporate strategy development using Five Forces Analysis, and the AI agents produced an excellent report. While further verification is needed, it could serve as a starting point for discussion. Note that Agent Teams is currently experimental, so its specifications may change going forward (1). At ToshiStats, we'll continue applying multi-agent systems across various fields. Stay tuned!

 


1) Orchestrate teams of Claude Code sessions, Anthropic
2) Porter's five forces analysis, Wikipedia


Predicting Loan Payback through "Agent Skills": The New Standard for Enterprise AI

The most common complaint about AI agents in business? "The output isn't what I wanted." In a corporate setting, consistency is everything; without pre-defined formats, users get lost. Instead of just teaching everyone to prompt better, why not embed that expertise into the organization itself? By providing standardized prompts upfront, users get reliable results from day one. The secret to this is "Agent Skills" (1). Let's see how it works!

 

1. What are Agent Skills?

Announced as "skills" by the AI giant Anthropic in October 2025, Agent Skills have since been adopted by almost every major AI company. They have become the de facto standard for providing domain-specific knowledge to generative AI. According to Anthropic:

“Agent Skills are modular capabilities that extend Claude's functionality. Each Skill packages instructions, metadata, and optional resources (scripts, templates) that Claude uses automatically when relevant.”

The beauty of defined Agent Skills is their portability—once created, they can be used across different platforms.

 

2. Creating Agent Skills

Now, let's dive right in. I’m going to create an 'Agent Skill' using Claude Cowork. I uploaded the PRD (Product Requirements Document) I typically use for building prediction models and input the following prompt.

           Claude Cowork

Since Claude Cowork has a built-in skill creator, it automatically generates an Agent Skills folder containing a skill.md file. This skill.md stores the most fundamental information for the Agent Skill, and its header always includes the following content. AI agents like Claude Code are designed to read this section first.

         skill.md 1

For tasks related to predictive modeling, the agent reads the specific implementation logic defined in the skill (which, in this case, spans about 240 lines) before moving to the coding phase.

           skill.md 2

 

3. Building a Prediction Model via Agent Skills

Next, I utilized Claude Code for agentic coding. As shown below, the "skills" we just created are active and recognized by the environment.

Claude Code

Because the detailed modeling process is already governed by the Agent Skill, my manual prompt can be as simple as: "Please create a prediction model." For this project, I used data from the Kaggle "Predicting Loan Payback" competition (2), where the goal is to predict whether a borrower will repay their loan. The entire implementation was completed in about two hours with almost no manual corrections. The stability of Opus 4.6 (3) is truly remarkable!

The model achieved an AUC of 0.92435 on the Kaggle leaderboard, a score well within the range of practical, production-ready application.

Kaggle leaderboard

One secret behind this high accuracy was the creation of new features based on ratios. By analyzing feature importance, we ensured only the most impactful variables were included in the final model.

new features based on ratios
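As an illustration of the ratio-feature idea (a minimal sketch of my own; the column names are hypothetical, not the competition's actual schema):

```python
from itertools import combinations

def add_ratio_features(row, numeric_cols):
    """Derive a ratio feature for every pair of numeric columns.
    Ratios such as loan_amount / annual_income often expose repayment
    capacity that the raw columns hide."""
    out = dict(row)
    for a, b in combinations(numeric_cols, 2):
        out[f"{a}_per_{b}"] = out[a] / out[b] if out[b] else 0.0
    return out

row = add_ratio_features(
    {"loan_amount": 12000, "annual_income": 60000, "monthly_debt": 500},
    ["loan_amount", "annual_income", "monthly_debt"],
)
```

Generating every pairwise ratio and then pruning by feature importance, as described above, keeps only the ratios that actually carry signal.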

 

4. Testing the Resulting Model

Let’s look at the model built via Agent Skills in action. First, we calculate the probability of repayment for an individual customer. In this example, the probability exceeds 96%, resulting in a "Success" (likely to repay) classification based on a 50% threshold. This threshold is, of course, adjustable depending on the specific business objectives.

prediction for an individual customer
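The decision rule described above is just a threshold comparison. A sketch (the label strings follow the article; the function itself is my own illustration):

```python
def classify_repayment(probability, threshold=0.50):
    """Map a predicted repayment probability to a business decision.
    The 50% default is adjustable: a risk-averse lender might demand
    a higher bar before labeling a customer "Success"."""
    return "Success" if probability >= threshold else "Fail"
```

For the customer above, a 96% probability clears the 50% bar; raising the threshold to, say, 97% would flip the same customer to "Fail", which is exactly the lever a business can tune.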

To avoid the "black box" problem, I use SHAP analysis to explain why a customer received a specific score. As seen in the graph, the length of the red arrows indicates the contribution of each feature. Here, employment_status was the most significant factor driving the "Success" prediction. This transparency is crucial for corporate accountability.

SHAP analysis for a customer
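SHAP's key property is additivity: the base value plus the per-feature contributions reconstructs the model's output for that customer. A stylized sketch of that bookkeeping (the numbers and feature values here are illustrative, not from the actual model):

```python
def explain_prediction(base_value, shap_values):
    """SHAP values are additive: model output = base value + sum of
    per-feature contributions. Sorting by absolute value surfaces the
    features that pushed this prediction hardest."""
    output = base_value + sum(shap_values.values())
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return output, ranked

output, ranked = explain_prediction(
    0.20,
    {"employment_status": 0.55, "credit_score": 0.15, "loan_amount": -0.05},
)
```

In a real pipeline these contributions would come from a SHAP explainer applied to the trained model; the ranking step is what produces the "longest red arrow" reading in the plot.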

 

We can also apply SHAP to the entire dataset. Again, employment_status emerges as the top contributor, carrying a high degree of contribution across the whole customer base.

SHAP analysis for all customers

Furthermore, SHAP allows us to visualize the non-linear relationship between specific features and repayment probability. For example, with credit_score, the probability doesn't just rise linearly. The data shows that the probability remains flat until a score of 550, starts to rise at 600, and accelerates significantly after 700. This level of granular insight is what makes SHAP so valuable.

  Feature-wise SHAP Analysis

 

By using Agent Skills, you can embed entire libraries of domain knowledge directly into your AI’s workflow. These skills are reusable, portable, and—in my opinion—will soon be a requirement for any business using AI agents.

I look forward to seeing how Agent Skills continue to permeate the corporate world and what innovations they will trigger. ToshiStats will continue to lead the way in this space.

Stay tuned!

 


1) Agent Skills
2) Predicting Loan Payback, Yao Yan, Walter Reade, Elizabeth Park. Kaggle, 2025
3) Introducing Claude Opus 4.6, Anthropic, Feb 5, 2026

Copyright © 2026 Toshifumi Kuga. All rights reserved.

From Zero to Production: How Opus 4.6 Agentic Coding Revolutionizes Insurance Analytics

In the ever-evolving landscape of InsurTech, cross-selling is a veritable goldmine. Using Opus 4.6 and agentic coding, I have constructed a sophisticated "Insurance Cross-Sell Prediction Model" pipeline, covering everything from memory-optimized data loading to complex feature engineering. Let's dive in!

 

1. Agentic Coding with Opus 4.6

Unlike traditional coding, Agentic Coding with Opus 4.6 (1) allows the AI to function as an autonomous engineer. It goes beyond writing snippets; it manages directory structures, ensures memory efficiency for datasets of 11.5 million rows, and completes a production-ready Streamlit dashboard.

In this process, my role was simply to write the "Product Requirement Document (PRD)", a natural-language document (Japanese or English) defining what I wanted to build. No Python knowledge was required on my part. Putting Claude Code into plan mode automatically generates an implementation blueprint, letting me verify the coding logic before Opus 4.6 executes it. I monitored the progress, but I never had to write a single line of code myself. Truly remarkable.

 

2. Project Overview

This project features a robust ecosystem designed for real-world application:

  • LightGBM + Optuna: Automated hyperparameter optimization to maximize AUC.

  • 50 Ratio-Based Features: Generation of 50 unique indicators to capture hidden customer behavior patterns.

  • Explainability via SHAP: Implementation of SHAP values to visualize why a specific customer is likely to purchase.
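To get a feel for the first bullet, here is a deliberately simplified stand-in for what Optuna automates. Optuna adds adaptive samplers and pruning on top of this; the sketch below is plain random search, and the search-space bounds are invented for illustration:

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameters uniformly from the search space and keep
    the best-scoring trial -- the naive baseline that libraries like
    Optuna improve on with adaptive samplers and pruning."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for "cross-validated AUC of a LightGBM model";
# in the real pipeline the objective trains a model per trial.
best_params, best_score = random_search(
    lambda p: -(p["learning_rate"] - 0.1) ** 2,
    {"learning_rate": (0.01, 0.3)},
)
```

In the actual project the objective function trains LightGBM with the trial's parameters and returns the validation AUC, so the search loop maximizes AUC directly.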

The data was sourced from a Kaggle competition regarding automobile insurance cross-selling (2).

Kaggle competition regarding automobile insurance cross-selling

Performance Results: When evaluating the model built via Opus 4.6 Agentic Coding on the Kaggle leaderboard, it achieved a high score of AUC = 0.88343. This level of accuracy is more than sufficient for practical business use.

Kaggle leaderboard

 

3. Key Features of the Implementation

The model provides two primary functions: individual customer prediction and total customer portfolio analysis.

Individual Prediction

We set the threshold for a "successful" cross-sell at a probability of 35% or higher. Below is an example of a customer predicted to be a successful cross-sell target. To avoid the "Black Box" problem, we use SHAP values to show the contribution of each feature. The larger the SHAP value, the higher its contribution to the positive prediction. This allows staff to understand the concrete reasoning behind the AI's decision.

customer predicted to succeed

feature contribution

Conversely, for customers predicted to fail (probability below 35%), the SHAP values indicate which factors are pulling the probability down.

customer predicted to fail

feature contribution

Customer portfolio Analysis

We can also analyze the "Cross-Sell Success Rate" across an entire customer portfolio. In this demo, we imported a CSV of 30,000 customers. With the threshold set at 35%, the model identified 3,708 potential targets. By adjusting the threshold, marketing teams can narrow or broaden their focus for specific campaigns. The dashboard also displays the overall probability distribution across the entire dataset.

probability distribution

 

4. Business Impact

This high-precision model provides sales representatives with a prioritized "Hot Lead" list. Thanks to the Streamlit-based GUI, non-technical staff can execute batch predictions and verify the reasoning via SHAP instantly. This is the definition of Data-Driven Marketing.

 

Conclusion

The synergy between Opus 4.6 and human expertise is redefining the speed of machine learning development and implementation. The potential is, quite frankly, staggering. At TOSHI STATS, we will continue to explore innovations in this field.

Stay tuned!

 

1) Introducing Claude Opus 4.6, Anthropic, Feb 5, 2026
2) Binary Classification of Insurance Cross Selling, Walter Reade and Ashley Chow, Kaggle

You can enjoy our video news ToshiStats-AI from this link, too!

Copyright © 2026 Toshifumi Kuga. All rights reserved.
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Mind-Blowing Performance: Building a Bank Churn Prediction Model using Claude Opus 4.6

Earlier in 2026, the AI giant Anthropic announced Opus 4.6(1), the latest update to its frontier model series. Today, I want to share my experience using Claude Code to build a bank customer churn prediction model to see just how far this new version can go. Let’s dive in.

 

1. The Ultimate Coding Model

Opus 4.6 is Anthropic’s new masterpiece, outperforming Opus 4.5 across various benchmarks. Its coding capabilities, in particular, are often rated as the best in the industry, and it feels like it’s now a giant leap ahead of the competition.

 

2. Developing a Churn Prediction Model via "Agentic Coding"

I decided to pair Claude Code with Opus 4.6 to develop a prediction model using "agentic coding"—a method where the AI agent handles the entire Python implementation without human intervention.

The task: Bank Customer Churn Prediction. Losing customers is costly and hurts brand loyalty. A predictive model allows us to identify "at-risk" customers and take proactive retention measures before they leave. For this experiment, I used a dataset from a well-known Kaggle competition.

The Workflow

  1. PRD Creation: I wrote a detailed Product Requirement Document (PRD) outlining my goals.

  2. Autonomous Execution: I ran Claude Code in plan mode. It drafted the implementation strategy, and once I gave the green light, it proceeded to code the entire system.

  3. Minimal Intervention: While Claude Code occasionally asked for permissions, I simply hit "yes" every time. It was effectively 100% AI-driven development.


The Resulting GUI

The final application is a sleek tool where you can select a Customer ID to see their specific churn probability. It clearly distinguishes between "Loyal" and "At-Risk" customers.

                Example: Predicted Non-Churner

                Example: Predicted Churner

  • Individual Prediction: Instant probability scores for specific users.

  • Batch Prediction: For a bird's-eye view, you can upload a CSV of your entire database (approx. 110,000 customers).

  • Dynamic Thresholding: You can set a churn threshold. For example, at a 50% threshold, 31.2% of the customers are flagged as likely to leave.

By raising the threshold to 90%, the list narrows down to the most critical 8.3% of the customer base. This makes it incredibly easy to target high-stakes marketing campaigns or retention offers.
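The thresholding behaviour can be sketched like this. The scores below are synthetic, so the flagged shares will differ from the 31.2% and 8.3% reported above.

```python
# Minimal sketch of dynamic thresholding: raising the churn threshold
# shrinks the flagged list to the highest-risk customers. The scores are
# synthetic stand-ins for real model output.
import random

random.seed(42)
churn_scores = [random.random() for _ in range(110_000)]  # approx. dataset size

def flagged_share(scores, threshold):
    """Fraction of customers whose churn probability meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

for t in (0.5, 0.7, 0.9):
    print(f"threshold {t:.0%}: {flagged_share(churn_scores, t):.1%} flagged")
```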

Efficiency Note: The entire process—from data acquisition to a fully functional predictive model—took only about 90 minutes. Not having to write a single line of Python manually is a massive productivity boost.

To enable even deeper analysis, I’ve also included a CSV export feature. Those proficient in Python can leverage this file to conduct their own custom evaluations as needed.

 

3. Glimpsing the Latent Potential of Opus 4.6

As expected, Opus 4.6 completed the end-to-end development process without a single error. When I attempted the same task with Opus 4.5, I had to tell the AI agent to correct a calculation method because I hadn't been specific enough in my pipeline description. This time? Zero rework. The performance improvement is tangible.

 

Opus 4.6 is set to become an indispensable partner in machine learning development. While this isn't a "full" generational leap (like a version 5.0), the refinement is world-class. Rumor has it that Opus 5 is already deep in development at Anthropic and might debut in late 2026. I can’t wait to see what kind of evolution that brings.

Stay tuned!

 

1) Introducing Claude Opus 4.6, Anthropic, Feb 5, 2026
2) Binary Classification with a Bank Churn Dataset, Kaggle, Jan 2, 2024


Copyright © 2026 Toshifumi Kuga. All rights reserved.



AGI in 2 Years or 5 Years? — Survival Strategies for 2030

In January 2026, several interviews with CEOs of top AI labs were released. One particularly fascinating encounter was the face-to-face interview (1) between Anthropic CEO Dario Amodei and Google DeepMind CEO Demis Hassabis. I have summarized my thoughts on what their comments imply. I hope you find this insightful!

 

1. Will AGI Arrive Within 2 Years?

Dario seems to hold a more accelerated timeline for the realization of AGI. While prefixing his thoughts with "It is difficult to predict exactly when it will happen," he pointed to the reality within his own company: "There are already engineers at Anthropic who say they no longer write code themselves. In the next 6 to 12 months, AI might handle the majority of code development. I feel that loop is closing rapidly." He argued that AI development is hitting a flywheel effect, particularly noting that progress in coding and research is so remarkable that AI intelligence will surpass public expectations within a few short years.

A prime example is Claude Code, released by Anthropic last year. This revolutionary product is currently taking the software development world by storm. It is no exaggeration to say that the common refrain "I don’t code manually anymore" is a direct result of this tool. In fact, I recently used it to tackle a past Kaggle competition; I achieved an AUC of 0.79 with zero manual coding, which absolutely stunned me (3).

 

2. AGI is Still 5 Years Away

On the other hand, Demis maintains his characteristically cautious stance. He often remarks that there is a "50% chance of achieving AGI in five years." His reasoning is grounded in the current limitations of AI: "Today’s AI isn't yet consistently superior to humans across all fields. A model might show incredible performance in one area but make elementary mistakes in another. This inconsistency means we haven't reached AGI yet." He believes two or three more major breakthroughs are required, which explains his longer timeline compared to Dario.

Unlike Anthropic, which is heavily optimized for coding and language, Google is focusing on a broader spectrum. One such focus is World Models—simulations of the physical spaces we inhabit. In these models, physics like gravity are reproduced, allowing the AI to better understand the "real" world. Genie 3 (2) is their latest version in this category. While it has only been released in the US so far, I am eagerly anticipating its global rollout. The "breakthroughs" Demis mentions likely lie at the end of this developmental path.

 

3. Are We Prepared for AGI?

While their timelines differ, Dario and Demis agree on one fundamental point: AGI—which will surpass human capabilities in every field—is not far off. Exactly ten years ago, in March 2016, DeepMind’s AlphaGo defeated the world’s top Go professional. Since then, no human has been able to beat AI in the game of Go. Soon, we may reach a point where humans can no longer outperform AI in any field. What we are seeing in the world of coding today is the precursor to that shift.

It is a world that is difficult to visualize. Industrial structures will be upended, and the very role of "human work" will change. It is hard to say that we are currently prepared for this reality. In 2026, we must begin a serious global dialogue on how to adapt. I look forward to engaging in these discussions with people around the world.

I highly recommend watching the full interview with Dario and Demis. These two individuals hold the keys to our collective future. That’s all for today. Stay tuned!

 

1) The Day After AGI | World Economic Forum Annual Meeting 2026, World Economic Forum, Jan 21, 2026
2) Genie 3, Google DeepMind, Jan 29, 2026
3) Is agentic coding viable for Kaggle competitions?, January 16, 2026




Copyright © 2026 Toshifumi Kuga. All rights reserved.

Is agentic coding viable for Kaggle competitions?

The "Agentic Coding" trend continues to accelerate as we enter 2026. In this post, I will challenge myself to see how high I can push accuracy by delegating the coding process to an AI agent, using data from the Kaggle competition Home Credit Default Risk [1]. Let's get started right away.

 

1. Combining Claude Code and Opus 4.5

I will be using Opus 4.5, a generative AI renowned for its coding capabilities. Additionally, I will use Claude Code as my coding assistant, as shown below. While I enter instructions into the prompt box, I do not write any Python code myself.

You can see the words "plan mode" at the bottom of the screen. In this mode, Claude Code formulates an implementation plan based on my instructions. I simply review it, and if everything looks good, I authorize the execution.

Let's look at the actual instructions I issued. It is quite long for a "prompt," spanning about two A4 pages. The beginning of the implementation instructions is shown below. I wrote it in great detail. I'd like you to pay special attention to the final instruction regarding the creation of 50 new features using ratio calculations.

              Part of the Product Requirement Document

Below is a portion of the implementation plan formulated by the AI agent. It details the method for creating new features via ratio calculations. Although I only specified the quantity of features, the plan shows that it selected features likely to be relevant to loan defaults before calculating the ratios.

The AI agent utilized its own domain knowledge to make these selections; they were certainly not chosen at random. This demonstrates the high-level judgment capabilities unique to AI agents.

              New feature creation plan by the AI Agent

            Part of the new features actually created by the AI Agent

 

2. Achieving an AUC of 0.79

By adopting LightGBM as the machine learning library, using the newly created features, and performing hyperparameter tuning, I was able to achieve an AUC of 0.79063, as shown below.

Reaching this level without writing a single line of Python code myself marks this experiment as a success. The data used to build the machine learning model consisted of seven different CSV files. These had to be merged correctly, and the AI agent handled this task seamlessly. Truly impressive!

                 Evaluation results on Kaggle
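The multi-file merge step can be illustrated with pandas. The table and key names follow the Home Credit competition layout, but the miniature data here is made up.

```python
# Illustrative sketch of merging competition tables: aggregate a child
# table to one row per customer, then left-join onto the main table.
# SK_ID_CURR is the customer key; the rows are toy data.
import pandas as pd

application = pd.DataFrame(
    {"SK_ID_CURR": [1, 2, 3], "AMT_INCOME_TOTAL": [9000, 12000, 15000]}
)
bureau = pd.DataFrame(
    {"SK_ID_CURR": [1, 1, 2], "AMT_CREDIT_SUM": [500, 700, 300]}
)

# One row per customer, then left-join so every applicant is kept
bureau_agg = bureau.groupby("SK_ID_CURR", as_index=False)["AMT_CREDIT_SUM"].sum()
merged = application.merge(bureau_agg, on="SK_ID_CURR", how="left")
print(merged)
```

The real pipeline repeats this aggregate-then-join pattern across all seven files; customers absent from a child table simply get NaN, which the model can handle.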

 

3. Will AI Agents Handle Future Machine Learning Model Development?

While the computation time depends on the number of features created, it generally took between 1 to 4 hours. I ran the process several times, and the calculation never stopped due to syntax errors. The AI agent likely corrected any errors itself before proceeding to the next calculation step.

Therefore, once the initial implementation plan is approved, the results are generated without any further human intervention. This could be revolutionary. You simply input what you want to achieve via a PRD (Product Requirement Document), the AI agent creates an implementation plan, and once you approve it, you just wait for the results. The potential for multiplying productivity several times over is certainly there.

 

How was it? I was personally astonished by the high potential of the "Claude Code and Opus 4.5" combination. With a little ingenuity, it seems capable of even more.

This story is just beginning. Opus 4.5 will likely be upgraded to Opus 5 within the year. I am already looking forward to seeing what AI agents will be capable of then.

That’s all for today. Stay tuned!




1) Home Credit Default Risk, Kaggle

Copyright © 2026 Toshifumi Kuga. All rights reserved.

"Claude Code + Opus 4.5" Arrives as the 2026 Game Changer!

2026 has officially begun! The AI community is already abuzz with talk of "agentic coding" using Claude Code + Opus 4.5. I decided to build an actual application myself to test the potential of this combination. Let's dive in.

 

1. Claude Code + Opus 4.5

These are Anthropic's coding assistant and frontier model, respectively, both renowned for their strength in coding tasks. I imagine many will use them integrated into an IDE such as VS Code, as shown below. You can see that the selected model is Opus 4.5. Also, notice the "plan mode" indicator at the bottom.

                   Claude Code

Here, a data scientist inputs a prompt detailing exactly what they want to develop. The system then enters "plan mode" and generates an implementation plan like the following. The actual output is quite long, but here is the summary:

                   Implementation Plan

The goal this time is to create an application that combines machine learning and Generative AI, as described above. Once you agree to this implementation plan, the actual coding begins.

 

2. Completion of the AI App with GUI

In this completed app, you can input customer data via the screen below to calculate the probability of default, which can then be used to assess loan eligibility.

The first customer shows low risk, so a loan appears feasible.

                    Input Screen

                   Default Probability 1

‍                 ‍Default Probability 2

For the second customer, as highlighted in the red frame, the payment status shows a 2-month delay. The probability of default skyrockets to 65.54%. This is a no-go for a loan.

 

3. Validating Model Accuracy on a Separate Screen

This screen displays the metrics for the constructed prediction model, allowing you to gauge its accuracy. While figures like AUC are bread and butter for experts, they might be a bit difficult for general business users to grasp.

To address this, I decided to include natural language explanations. By leveraging Generative AI, implementing multilingual support is relatively straightforward.

Switching the setting changes the text from English to Japanese. Of course, support for other languages could be added with further development.
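One minimal way to implement the language switch is a prompt template fed to the generative model. The wording below is an illustrative assumption, not the app's actual prompt.

```python
# Sketch of the multilingual-explanation idea: build a prompt for the
# generative model from a metric value and a target language.
# The template text is a hypothetical example.
TEMPLATE = (
    "Explain to a non-technical business user, in {language}, what an AUC of "
    "{auc:.2f} means for the reliability of this default-prediction model."
)

def build_explanation_prompt(auc, language="English"):
    """Fill the template; the result would be sent to the LLM."""
    return TEMPLATE.format(language=language, auc=auc)

print(build_explanation_prompt(0.79, "Japanese"))
```

Swapping the `language` argument is all the UI's language toggle needs to do; the LLM handles the rest.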

While I used Opus 4.5 during the development phase, this application uses an open-source Generative AI model internally. This allows it to function completely disconnected from the internet—making it ideal even for enterprises with strict security requirements.

 

So, what are your thoughts?

An application with this rich feature set and a high-precision machine learning model was completed entirely without manual coding. I didn't write a single line of code this time.

Opus 4.5 was truly impressive; the process never stalled due to syntax errors or similar issues. I can genuinely feel that the accuracy is on a completely different level compared to just six months ago. Moving forward, it seems likely that "agentic coding" will become the standard starting point for creating new machine learning models and GenAI apps. It feels like PoC-level projects could now be knocked out in a matter of days.

I’m looking forward to building many more things. That’s all for today.

Stay tuned!

 


Copyright © 2026 Toshifumi Kuga. All rights reserved.

What Awaits Us in 2026? Bold Predictions for AI Agents & Machine Learning

Happy New Year!

As we finally step into 2026, I am sure many of you are keenly interested in how AI agents will develop this year. Therefore, I would like to make some bold predictions by raising three key points, while also considering their connection to machine learning. Let's get started.

 

1. A Dramatic Leap in Multimodal Performance

I believe the high precision of the image generation AI "Nano Banana Pro (1)," released by Google on November 20, 2025, likely stunned not just AI researchers but the general public as well. Its ability to thoroughly grasp the meaning of a prompt and faithfully reproduce it in an image is magnificent, possessing a capability that could be described as "Text-to-Infographics."

Furthermore, its multilingual capabilities have improved significantly, allowing it to perfectly generate Japanese neon signs like this: "明けましておめでとう 2026" (Happy New Year 2026)


This model is not a simple image generation AI; it is built on top of the Gemini 3 Pro frontier model with added image generation capabilities. That is why the AI can deeply understand the user's prompt and generate images that align with their intent. Google also possesses AI models like Genie 3(2) that perform simulations using video, leading the industry with multimodal models. We certainly cannot take our eyes off their movements in 2026.

 

2. The Explosive Popularity of "Agentic Coding"

Currently, coding by AI agents—"Agentic Coding"—has become a massive global movement. However, for complex code, it is not yet 100% perfect, and human review is still necessary. Additionally, humans still need to create the Product Requirement Document (PRD), which serves as the blueprint for implementation.

I have built several default prediction models used in the financial industry, and I always feel that development is more efficient when the human side first creates a precise PRD. By doing so, we can largely entrust the actual coding to the AI agent. Here is an example of a default prediction model.

However, the speed of evolution for frontier models is tremendous. In the latter half of 2026, we expect updates like Gemini 4, GPT-6, and Claude 5, and frankly, it is difficult to even imagine what capabilities AI agents will acquire as a result.

Alongside the progress of these models, the toolsets known as "code assistants" are also likely to significantly improve their capabilities. Tools like Claude Code, Gemini CLI, Cursor, and Codex have become indispensable for programmers today, but in 2026, these code assistants will likely play an active role in fields closer to business, such as machine learning and economic analysis.

At this point, calling them "code assistants" might be off the mark; a broader name like "Thinking Machine for Business" might be more appropriate. The day when those who don't know how to code can master these tools may be close at hand. It is very exciting.

 

3. AI Agents and Governance

As mentioned above, it is predicted that in 2026, AI agents will increasingly permeate large organizations such as corporations and governments. However, there is one thing we must be careful about here.

The behavior of AI agents changes probabilistically. This means that different outputs can be produced for the same input, which is vastly different from current systems. Furthermore, if an AI agent possesses the ability for Recursive Self-Improvement (updating and improving itself), the agent will change over time and in response to its environment.

In 2026, we must begin discussions on governance: how do we structure organizational processes and achieve our goals using AI agents that behave unlike any previous system? This is a very difficult theme, but I believe it is unavoidable if humanity is to securely capture the benefits and gains from AI agents. I previously established corporate governance structures in the financial industry, and I hope to contribute, even a little, based on that experience.

 

What did you think? It looks like AI evolution will accelerate even further in 2026. I hope we can all enjoy it together. I look forward to another great year with you all.

 


1) Introducing Nano Banana Pro, Google, Nov 20, 2025
2) Genie 3: A new frontier for world models, Jack Parker-Holder and Shlomi Fruchter, Google DeepMind, August 5, 2025

Copyright © 2026 Toshifumi Kuga. All rights reserved.


Gemini 3 Flash: The Multi-modal Powerhouse Dominating the 2026 AI Scene!

Gemini 3 Flash (1) — likely the final major AI model debut of 2025 — is currently making waves. Despite being positioned as an affordable, mid-tier model, its performance is reportedly on par with flagship models. Today, I want to put Gemini 3 Flash to the test and see just how much its multimodal capabilities have evolved. Let’s dive right in.

 

1. App Development

To conduct our experiments, I wanted to create a simple application using Google AI Studio. By simply entering a prompt into the interface, the app was ready in an instant. No Python was used at all. This level of accessibility means even non-engineers can build functional apps now. Things have truly become incredibly convenient.

 

2. Object Counting

First, I challenged the model with a task that has historically been difficult for AI: counting objects. I asked the AI to count the number of cans and cars in an image. I counted them myself as well, and the AI’s response was spot on. At this level of accuracy, we might no longer need specialized object detection models for general tasks.

 

3. Economic Analysis from Charts

Next, let’s try a task that requires a higher level of intelligence: interpreting economic indicators from charts and generating an analytical report. Japan has entered a super-aging society faster than any other developed nation, and the labor force is steadily declining. For this test, I provided charts for the labor force population, unemployment rate, and Manufacturing Sector hourly wages. I then instructed the AI to read these charts, synthesize the data, and produce a comprehensive analysis.

labor force population

unemployment rate

                Manufacturing Sector hourly wages

In 30 seconds, the economic report was generated. Below is an excerpt. I was genuinely impressed by the depth of analysis derived from just three charts. Gemini 3 Flash is truly formidable!

 

Conclusion

What do you think? Gemini 3 Flash is a fantastic value, being significantly cheaper than rival flagship models. Given that its multimodal performance is top-tier, I believe this will become the "go-to" model for many users. For AI startups like ours, having a model that allows for extensive experimentation with high token volumes without breaking the bank is incredibly reassuring. I highly recommend giving it a try!

Stay tuned!

 



1) Gemini 3 Flash: frontier intelligence built for speed, Dec 17, 2025, Google

Copyright © 2025 Toshifumi Kuga. All rights reserved.

Improving ML Vibe Coding Accuracy: Hands-on with Claude Code's Plan Mode

2025 was a year where I actively incorporated "Vibe Coding" into machine learning. After repeated trials, I encountered situations where coding accuracy was inconsistent—sometimes good, sometimes bad.

Therefore, in this experiment, I decided to use Claude Code "Plan Mode" (1) to automatically generate an implementation plan via an AI agent before generating the actual code. Based on this plan, I will attempt to see if a machine learning model can be built stably using "Vibe Coding." Let's get started!

 

1. Generating an Implementation Plan with Claude Code "Plan Mode"

Once again, I would like to build a model that predicts in advance whether a customer will default (on a loan, etc.). I will use publicly available credit card default data (2). For the code assistant, I am using Claude Code, and for the IDE, the familiar VS Code.

To provide input to the Claude Code AI agent, I summarized the task and implementation points into a "Product Requirement Document (PRD)." This is the only document I created.

I input this PRD into Claude Code "Plan Mode" and instructed it to: "Create a plan to create predictive model under the folder of PD-20251217".

Within minutes, the following implementation plan was generated. Comparing it to the initial PRD, you can see how refined it is. Note that I am only showing half of the actual plan generated here—a truly detailed plan was created. I can only say that the ability of the AI agent to envision this far is amazing.

 

2. Beautifully Visualizing Prediction Accuracy

When this implementation plan is approved and executed, the prediction model is generated. Naturally, we are curious about the accuracy of the resulting model.

Here, it is visualized clearly according to the implementation plan. While these are familiar metrics for machine learning experts, all the important ones are covered and visualized in an easy-to-understand way, summarized as a single HTML file viewable in a browser.

The charts below are excerpts from that file. It includes ROC curves, SHAP values, and even hyperparameter tuning results. This time, the total implementation time was about 10 minutes. If it can be generated automatically to this extent in that amount of time, I’d rather leave it to the AI agent.
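A bare-bones version of "summarize metrics as a single HTML file" can be done with the standard library alone. The real report generated by the agent of course also embeds the ROC, SHAP, and tuning charts; this sketch only shows the packaging idea.

```python
# Stdlib-only sketch of the single-HTML-file report: render a metrics
# dict as a small table and save it where any browser can open it.
import os
import tempfile
from pathlib import Path

def write_report(metrics, path):
    """Render a metrics dict as an HTML table and save it to `path`."""
    rows = "".join(f"<tr><td>{k}</td><td>{v}</td></tr>" for k, v in metrics.items())
    html = (
        "<html><head><title>Model Evaluation</title></head><body>"
        "<h1>Prediction Model Metrics</h1>"
        f"<table border='1'>{rows}</table>"
        "</body></html>"
    )
    Path(path).write_text(html, encoding="utf-8")
    return html

# Hypothetical metric values for illustration
out = os.path.join(tempfile.gettempdir(), "model_report.html")
report = write_report({"AUC": 0.78, "Accuracy": 0.82, "F1": 0.55}, out)
```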

 

3. Meta-Prompting with Claude Code "Plan Mode"

A Meta-Prompt refers to a "prompt (instruction to AI) used to create and control prompts."

In this case, I called Claude Code "Plan Mode" and instructed it to "generate an implementation plan" based on my PRD. This is nothing other than executing a meta-prompt in "Plan Mode."

Thanks to the meta-prompt, I didn't have to write a detailed implementation plan myself; I only needed to review the output. It is efficient because I can review it before coding, and since that implementation plan can be viewed as a highly precise prompt, the accuracy of the actual coding is expected to improve.

To be honest, I don't have the confidence to write the entire implementation plan myself. I definitely want to leave it to the AI agent. It has truly become convenient!

 

How was it? Generating implementation plans with Claude Code "Plan Mode" seems applicable not only to machine learning but also to various other fields and tasks. I definitely intend to continue trying it out in the future. I encourage everyone to give it a challenge as well.

That’s all for today. Stay tuned!





1) How to use Plan Mode, Anthropic

2) Default of Credit Card Clients
Copyright © 2025 Toshifumi Kuga. All rights reserved.

Can You "Vibe Code" Machine Learning? I Tried It and Built an App

2025 was the year the coding style known as "Vibe Coding" truly gained mainstream acceptance. So, for this post, I conducted an experiment to see just how far we could go in building a machine learning model using only AI agents via "Vibe Coding"—with almost zero human programming involved. Let's get started!

 
1. The Importance of the "Product Requirement Document" for Task Description

This time, I wanted to build a model that predicts whether bank loan customers will default. I used the publicly available Credit Card Default dataset (1).

In Vibe Coding, we delegate the actual writing of the program to the AI agent, while the human shifts to a reviewer role. In practice, having a tool called a "Code Assistant" is very convenient. For this experiment, I used Google's Gemini CLI. For the IDE, I used the familiar VS Code.

Gemini CLI

To entrust the coding to an AI agent, you must teach it exactly what you want it to do. While it is common to enter instructions as prompts in a chatbot, in Vibe Coding, we want to use the same prompts repeatedly, so we often input them as Markdown files.

It is best to use what is called a "Product Requirement Document (PRD)" for this content. You summarize the goals you want the product to achieve, the libraries you want to use, etc. The PRD I created this time is as follows:

PRD
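The full PRD is shown above as an image. For reference, a minimal text skeleton of that kind of PRD (the headings and wording below are my own illustration, not the actual file) might look like:

```markdown
# PRD: Credit Card Default Prediction Model

## Goal
Build a binary classifier that predicts whether a credit card customer
will default next month, using the "Default of Credit Card Clients" dataset.

## Requirements
- Language: Python 3
- Evaluation metric: AUC, reported on a held-out test set

## Deliverables
- A training script and a saved model file
- A short report of the evaluation results
```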

With this PRD as a reference, I entered a prompt to create a default prediction model, and the model was built in just a few minutes. The evaluation metric, AUC, was also excellent, ranging between 0.74 and 0.75. Amazing!
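For readers less familiar with the metric: AUC is the probability that the model scores a randomly chosen defaulter above a randomly chosen non-defaulter. A minimal pure-Python sketch of the computation (my own illustration, not the agent's generated code):

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for default, 0 for no default; scores: model outputs.
    Ties count as half a correct pair. O(n_pos * n_neg), fine for demos.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    pairs = len(pos) * len(neg)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / pairs

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 means the scores are no better than chance, so 0.74–0.75 indicates genuine ranking power on this dataset.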

 

2. Describing the Folder Structure with PROJECT_SUMMARY

It is wonderful that the machine learning model was created, but left as is, no one knows which files live where, which makes handing the project over to a third party difficult.

Therefore, if you input the prompt: "Analyze the current directory structure and create a concise summary that includes: 1. A tree view of all files 2. Brief description of what each file does 3. Key dependencies and their purposes 4. Overall architecture pattern Save this as PROJECT_SUMMARY.md", it will create a Markdown file like the one below for you.

PROJECT_SUMMARY.md

With this, anyone can understand the folder structure at any time, and it is also convenient when adding further functional extensions later. I highly recommend creating a PROJECT_SUMMARY.md.
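If you want the tree view from step 1 of that prompt without an agent, a small script can produce a comparable listing. This is a rough sketch of my own, not what Gemini CLI generates:

```python
import os
import tempfile

def tree(root, prefix=""):
    """Yield lines of an ASCII tree for the directory `root`."""
    entries = sorted(os.listdir(root))
    for i, name in enumerate(entries):
        last = (i == len(entries) - 1)
        yield prefix + ("└── " if last else "├── ") + name
        path = os.path.join(root, name)
        if os.path.isdir(path):
            yield from tree(path, prefix + ("    " if last else "│   "))

# Demo on a throwaway directory so the example runs anywhere
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "src"))
open(os.path.join(demo, "train.py"), "w").close()
open(os.path.join(demo, "src", "model.py"), "w").close()
lines = list(tree(demo))
print("\n".join(lines))
```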

 

3. Adding a UI and Turning the ML Model into an App

Since we built such a good model, we want people to use it. So, I experimented to see if I could build an app using Vibe Coding as well.

I created PRD-pdapp.md and asked the AI agent to build the app. I instructed it to save the model file and to use Streamlit for app development. The actual file and its translation are below:

PRD-pdapp.md

When executed, the following app was created. It looks cool, doesn't it?

You can input customer data using the boxes and sliders on the left, and when you click the red button, the probability of default is calculated.

  • Customer 1: Default probability is 7.65%, making them a low-risk customer.

  • Customer 2: Default probability is 69.15%, which is high, so I don't think we can offer them a loan. The PAY_0 Status is "2", meaning their most recent payment status is 2 months overdue. This is the biggest factor driving up the default probability.

As you can see, having a UI is incredibly convenient because you can check the model's behavior by changing the input data. I was able to create an app like this using Vibe Coding. Wonderful.
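The probabilities in the app come from the trained model, but the mechanics behind such a score can be illustrated with a toy logistic function. The coefficients below are invented for illustration; they are not the app's fitted weights:

```python
import math

def default_probability(pay_0, bias=-2.5, w_pay_0=1.2):
    """Toy logistic score: default probability rises with repayment delay.

    pay_0: most recent repayment status in months overdue, as in the
    Credit Card Default dataset. bias and w_pay_0 are illustrative only.
    """
    z = bias + w_pay_0 * pay_0
    return 1.0 / (1.0 + math.exp(-z))

for status in (0, 2):
    print(f"PAY_0={status}: {default_probability(status):.2%}")
```

This mirrors the behavior observed above: moving PAY_0 from "paid on time" to "2 months overdue" sharply raises the predicted default probability.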

 

How was it? It was indeed possible to perform machine learning using Vibe Coding. However, instead of programming code, you need to create precise PRDs. I believe this will become a new and crucial skill. I encourage you all to give it a try.

That’s all for today. Stay tuned!

 

You can enjoy our video news ToshiStats-AI from this link, too!

1) Default of Credit Card Clients

 




The OpenAI Code Red: What’s Next for the Generative AI Market?

In late November 2022, OpenAI released ChatGPT. It has been three years since then, and just as it was about to celebrate its third birthday, an event occurred that dampened the celebratory mood. CEO Sam Altman declared a "CODE RED" (Emergency) (1). The driving force behind this was the breakthrough of the new generative AI, "Gemini 3" (2), released by Google on November 18. Today, I would like to delve into this theme and forecast the generative AI market for 2026. Let’s get started.

 

1. Gemini 3 vs. GPT-5

On August 6, 2025, OpenAI released GPT-5. As the first major update since GPT-4, it carried very high expectations. In reality, however, it was hard to perceive a significant difference from rival models: although it posted improved scores across various benchmarks, its impact felt muted compared with the arrival of GPT-4.

Of course, GPT-5 is evolving steadily, so if rival companies' models had stagnated, I believe OpenAI could have celebrated its third birthday peacefully. However, the moves made by its rival, Google, surpassed expectations. On November 18, 2025, Gemini 3 was released, and everyone was astonished by its performance. Its scores on almost all benchmarks surpassed those of GPT-5, and for the first time since the birth of ChatGPT, OpenAI lost its technological competitive advantage. The battle over generative AI has entered a new phase.

 

2. Why Gemini 3 is Particularly Superior

There are several technical talking points, but what I am paying special attention to is its capability in image processing and generation. As the leaderboard (3) below shows, its strength is overwhelming and unrivaled. The famous image generation app Nano Banana Pro is officially named Gemini 3 Pro Image, and its high scores truly stand out.

                        Leaderboard

When considering individual customers, the ability to easily generate and edit images exactly as envisioned is crucial and can serve as a "killer app." I feel that once individuals experience the technical level of Gemini 3, they will find it difficult to easily switch back to competitor apps. The image below was generated using Nano Banana Pro. As you can see, it has become easy to render both English and Japanese text together on an image. Previously, Japanese text was often incomplete or incomprehensible, so it was quite moving to see clean Japanese generated for the first time.

                   Image generated by Nano Banana Pro

 

3. The Generative AI Market in 2026

With Sam Altman issuing a CODE RED, I believe OpenAI will allocate significant development resources to improving the model itself and will frantically work to close this gap in the image generation field. On the other hand, Google, armed with Gemini 3, possesses several multimodal generative AI models beyond just Nano Banana Pro, and I expect them to leverage that expertise to aim for further breakthroughs.

In particular, generative AI capable of simulation using 3D structures—known as World Models—will likely influence Large Language Models (LLMs) as well, solidifying Google's competitive advantage. One has to admit that Google, which owns YouTube, is incredibly strong in this field. It looks like 2026 will be a year where we cannot take our eyes off how OpenAI launches its counterattack.

 

How was it? While there are several other players creating generative AI, I believe the industry style will involve companies defining their own positions within the context of the "OpenAI vs. Google" battle. Therefore, the outcome of OpenAI vs. Google is extremely important for all AI-related companies. I would like to write another blog post on this same theme if the opportunity arises.

That’s all for today. Stay tuned!











1) Sam Altman’s ‘Code Red’ Memo Urges ChatGPT Improvements Amid Growing Google Threat, Reports Say, Forbes, 2 Dec 2025
2) A new era of intelligence with Gemini 3, Google, 18 Nov 2025
3) Leaderboard Overview






Game Changer: How Nano Banana Pro is Redefining Digital Marketing!

Hot on the heels of last week's model release, Google has debuted yet another new image generation model: Nano Banana Pro (Gemini 3 Pro Image) (1). Word on the street is that it boasts incredible performance. So, let's dive in and test it to see what it can do.

 

1. The Latest Tokyo Fashion Trends

Fashion evolves with every season, and keeping up with the trends can be a challenge. However, the internet is overflowing with the latest style information. I figured that by feeding this real-time data into generative AI, we could generate images of models wearing the styles currently in vogue. Let's give it a try. Below is the original image of the model. She is wearing an outfit typical of Japanese autumn.

Original Image

I fed this original image and the prompt "Perform Google Search for current Tokyo fashion trends for 20s lady and apply that style to the model in the attached photo. 4 images are needed." into Nano Banana Pro.

Generated Images

The same model appears in all four images, maintaining consistency. Furthermore, the latest fashion trends have been incorporated thanks to Google Search. This is wonderful. Nano Banana Pro's Grounding feature using Google Search is excellent. As the model updates in the future, we can expect the accuracy of capturing trendy fashion to improve even further.

 

2. Creating a Signature Cafe Menu

Next, for a cafe I imagine opening in Ashiya, a high-end residential area in Japan, I want to devise a set menu featuring shortcake and coffee. For this one too, I prepared a prompt that researches currently popular cakes via Google Search before generating the image.

"I am opening a cafe in Ashiya, Japan, featuring a fruit shortcake and coffee set as the signature dish. Use Google Search to identify current cake trends in Ashiya City. Then, create a high-quality menu image for this set that includes a description and price in English, incorporating the local trends."

I generated the following Japanese and English versions of the menu.

English Version

Japanese Version

Both the Japanese and English text are perfect. I think this is a huge leap forward, especially since AI image generation has struggled to correctly render local languages like Japanese until now. I’m sure it will work well with other local languages too. It looks like Nano Banana Pro will be able to perform globally, regardless of language.

 

3. 3D Visualization of Loss Functions

Raising the abstraction level a bit, I want to execute a 3D visualization of a loss function—a topic often discussed when building targeting models for marketing—and clearly explain the concept of the gradient descent method. Nano Banana Pro can understand even theoretical and highly abstract phenomena like loss functions and map them in 3D. Below is the result. You can see at a glance how the parameters get stuck in a local minimum and cannot reach the point where the loss function is at its global minimum. Amazing.

Gradient Descent Method
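The phenomenon the visualization depicts can also be reproduced numerically. Below is a 1-D sketch of my own (a made-up loss function, nothing generated by Nano Banana Pro): plain gradient descent started in the wrong basin settles into the local minimum and never reaches the global one.

```python
def loss(x):
    # A 1-D loss with a local minimum near x ≈ 0.96
    # and a deeper global minimum near x ≈ -1.04
    return x**4 - 2 * x**2 + 0.3 * x

def grad(x):
    return 4 * x**3 - 4 * x + 0.3

def descend(x, lr=0.05, steps=500):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

stuck = descend(0.8)    # starts in the right-hand basin
best = descend(-0.8)    # starts in the left-hand basin
print(f"from  0.8 -> x={stuck:.3f}, loss={loss(stuck):.3f}")
print(f"from -0.8 -> x={best:.3f}, loss={loss(best):.3f}")
```

Both runs converge, but only the second finds the lower loss; the first is trapped exactly as the image shows.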

 

How was it? Even from these few experiments, the excellence of Nano Banana Pro is clear. I have a hunch that Nano Banana Pro is going to change the very methods of digital marketing. I felt particularly strong potential in the Grounding feature using Google Search. I plan to cover Nano Banana Pro again in the near future.

That’s all for today. Stay tuned!

 




 

1) Introducing Nano Banana Pro, Google, 20 Nov 2025




















Google Antigravity: The Game Changer for Software Development in the Agent-First Era

Google has unveiled Gemini 3.0, its new generative AI, and "Antigravity" (1), a next-gen IDE powered by it. Google states that "Google Antigravity is our agentic development platform, evolving the IDE into the agent-first era," signaling a shift toward truly agent-centric development. Here, I’m going to task Antigravity with creating a "Bank Complaint Classification App." I want to actually run it to explore its potential.

                   Antigravity

 

1. Agentic Development with Antigravity

Antigravity is built on top of VS Code. If you are a VS Code user, the editor will look familiar, making it very approachable and easy to pick up. However, the real power of Antigravity lies in its dedicated interface for agentic development: the Agent Manager (shown below). Just enter a prompt into the box and run it to kick off "Vibe Coding." The prompt shown here is the very simple one I entered at the beginning of the development process. Antigravity also appears to be packed with various features designed to facilitate efficient communication with the Agent. For more details, please check the website (1).

                         Agent Manager

 

2. Prompt Refinement and Improvement

Just because you start "Vibe Coding" doesn't mean you'll get perfect code immediately. I started with a simple prompt this time as well, but the process proved to be more challenging than anticipated. While Gemini 3.0 Pro often demonstrates human-level capability when handling HTML and CSS for website building, the framework used for this app—Google ADK—is a brand-new agent development kit that just debuted in April 2025. Consequently, there are likely very few code examples available on the web, and I assume it hasn't been fully absorbed into Gemini 3.0's training data yet.

               Development with Google ADK

It was quite a struggle, but as shown above, I managed to build a fully functional app via "Vibe Coding." To generate these files, I relied solely on natural language instructions; I didn't write a single line of code directly in the editor. However, I did include simple code snippets within the prompts. This is a technique known as "few-shot learning," where you provide examples to guide the model. I believe this approach is highly effective when Vibe Coding with Gemini 3.0 for Google ADK development. While this might become unnecessary as Gemini 3 is updated in the future, it’s certainly a technique worth remembering for now.
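As a sketch of the few-shot technique (my own illustration, not Antigravity's interface): embedding a worked snippet inside the prompt gives the model a concrete pattern to imitate.

```python
def build_few_shot_prompt(task, examples):
    """Assemble a prompt that embeds code snippets as few-shot examples."""
    parts = [task, "", "Follow the style of these examples:"]
    for i, snippet in enumerate(examples, 1):
        parts += ["", f"Example {i}:", "```python", snippet, "```"]
    return "\n".join(parts)

# Hypothetical ADK-style snippet, for illustration only
adk_example = 'agent = Agent(name="classifier", model="gemini-3-pro")'
prompt = build_few_shot_prompt(
    "Build a bank complaint classification app with Google ADK.",
    [adk_example],
)
print(prompt)
```

The point is simply that the example code travels with the instruction, so the model sees the file structure and syntax you expect rather than guessing.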

Bank Complaint Classification App using Google ADK

The screenshot above shows the "Bank Complaint Classification App" I developed. I verified its accuracy with some simple examples, and the results were excellent. It seems the internal prompts within the app were generated very effectively. Impressive work!

 

3. Summary of Building a Complaint Classification App with ADK

  • Total Time: 6 hours (starting from the Antigravity installation) to complete the app.

  • Execution: With the finalized prompt, the run time is just over a minute.

  • Manual Effort: Writing the Google ADK code for this app by hand, without Vibe Coding, would be only about a 20-minute task.

  • Reasons for the Delay:

    • I had to iterate on the prompts several times because Gemini 3 is still unfamiliar with Google ADK.

    • I had to explicitly instruct it on file structures and code syntax.

    • I was also using Antigravity for the first time.

  • Conclusion: It is manageable once you understand Gemini 3 Pro's behavior regarding Google ADK.

 

So, what do you think?

It took a little longer because I wasn't used to the new IDE yet, but the combination of Gemini 3.0 Pro and Antigravity was outstanding. I could really feel its high potential. Since the execution speed itself is fast, next time I plan to challenge myself by "Vibe Coding" a multi-agent app. Look forward to it! That's all for today. Stay tuned!

 




1) Experience liftoff with the next-generation IDE, Google, 19 Nov 2025








OpenAI & MUFG: A Strategic Collaboration Poised to Reshape the Future of Finance

On November 12, 2025, OpenAI and MUFG (Mitsubishi UFJ Financial Group, Inc.), one of Japan's three largest financial groups, announced a strategic collaboration (1). As this content has the potential to transform Japan's financial sector, I'd like to share the key points from the news release along with my own analysis. Let's get started!

 

1. Business Transformation Utilizing AI

"Beginning in January 2026, all approximately 35,000 employees of MUFG Bank will use ChatGPT Enterprise in their daily operations." This is a significant step forward in transforming the subsidiary bank into an AI-native organization. It's presumed that OpenAI and MUFG, having already collaborated for over a year, have accumulated considerable expertise in applying AI to banking operations. If they can unlock the full potential of the generative AI GPT-5 through ChatGPT Enterprise, the impact on their business processes is expected to be substantial.

                   ChatGPT Enterprise

 

2. Talent Development

"Furthermore, to accelerate the company-wide adoption of AI, the two companies will establish a project team. They will collaborate on training specialized personnel, or 'AI Champions,' who can drive AI utilization and organizational reform. This will be supported by providing education, training programs, and support for MUFG's company-wide AI adoption campaign, 'Hello, AI @MUFG.'" As this indicates, talent development is essential for embedding AI within the company. While GPT-5 is highly capable, it cannot completely replace human abilities. Collaboration between AI and humans remains indispensable. There is no fixed methodology for how we communicate with AI to achieve our goals; I believe this will continue to be a process of trial and error.

 

3. Creating Innovative Customer Experiences in Retail

"We will install an 'AI Concierge' equipped with the latest AI into the apps provided by MUFG's group companies. This will go beyond simply answering questions to provide personalized support that becomes more tailored with use. In the future, data from each app will be integrated, enabling the AI to grasp the customer's entire transaction history and offer precise suggestions from any app. The first implementation is planned for the digital bank scheduled to launch next fiscal year, with the aim of creating an AI-native digital bank." Of the various retail measures, this "AI Concierge for personalized support" is particularly striking. I believe that without accurately recorded past transaction histories and conversations, providing relevant support is impossible. The entry of Japan's largest financial group into the "AI Concierge" space holds great significance for the financial industry. I'm looking forward to trying it myself.

 

4. Participation in the OpenAI Ecosystem

"We will explore integration with 'Apps in ChatGPT,' which OpenAI announced in October. By connecting MUFG's group company apps and services to ChatGPT's framework, we aim to offer a new financial experience where customers can naturally discuss household financial management and asset investment tailored to their situation, all within the flow of a conversation with ChatGPT." This can be interpreted as MUFG's medium-to-long-term strategy to enter the OpenAI ecosystem. OpenAI is solidifying its position as a global portal to the internet and, from that base, has begun building an ecosystem to realize "Agentic Commerce." I believe MUFG is considering being one of the first in the world to take this leap. I'm excited to see how this unfolds.

 



What did you think? While it has only just been announced and details are still scarce, I feel the content clearly conveys the strong commitment from both companies. I am very excited to see how this "tag team" will change the future of finance in Japan and Asia. For those who wish to read the full content of this release, please see the original source (1). That's all for today. Stay tuned!



1) Initiatives for AI-Driven Business Transformation and New Service Creation in the Retail Sector, Mitsubishi UFJ Financial Group, Inc. (MUFG) and MUFG Bank, Ltd., 12 Nov 2025



This Is What Happens When an AI Agent Runs Our 2025 Autumn Marketing!

Hello! The daily high in Tokyo has dropped to 16°C, and it's starting to feel very much like autumn. For those unfamiliar with autumn in Japan, this is the season when the leaves on the mountains turn from green to orange. Entire mountainsides are dyed orange, creating a beautiful, spectacular view. That is why I chose orange as the background color for this marketing campaign's promotional video. The challenge: devise a campaign to sell cakes to women in Ashiya, an affluent residential area in the Kansai region. What happens when we entrust this task to an AI agent? Let's find out.

 

1. Creating an AI Marketing Agent with "Google Opal"

This time, I'm creating an AI marketing agent using Google Opal (1). As Google's description puts it, Opal is "our no-code AI mini-app builder," and it lets you easily develop an AI agent app like the one below.

For this AI agent's development, I only entered the following prompt: "You are an expert in marketing campaigns. You will be given the following information: 1. The product/service to sell, 2. The target customer, 3. The location/region, 4. The time/season of the campaign, 5. The desired brand image color, 6. A photo of the facilitator. Using this information, please create the following: a. A marketing strategy, b. A marketing campaign name, c. A logo based on the name, d. A promotional video featuring the facilitator, complete with BGM."

Just by executing this, the AI agent builds a workflow like the one shown above. After that, you simply switch to the app and answer a few questions about your task, and the marketing campaign is created. Amazing, isn't it?
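Conceptually, the generated workflow is a chain of steps in which each output feeds the next. Opal itself is no-code, so the sketch below is purely my own illustration of that shape; the step names and the sample campaign name are invented:

```python
def run_pipeline(brief, steps):
    """Run named steps in order, letting each read earlier results."""
    artifacts = {"brief": brief}
    for name, step in steps:
        artifacts[name] = step(artifacts)
    return artifacts

steps = [
    ("strategy", lambda a: f"Strategy for selling {a['brief']['product']} in {a['brief']['region']}"),
    ("campaign_name", lambda a: "Automne Gourmand"),  # invented example name
    ("logo", lambda a: f"Logo for '{a['campaign_name']}'"),
]
result = run_pipeline({"product": "cakes", "region": "Ashiya"}, steps)
print(result["strategy"])
```

Because each step only reads the shared artifacts, adding a new stage (say, a promotional video step) is just another entry in the list.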

 

2. Marketing Strategy and Logo

Once you input all the necessary information, you get the results back immediately. First is the marketing strategy. In reality, the output continued in far more detail; I'll introduce just the beginning here. Even though I didn't input very detailed information at the initial stage, I think this marketing strategy is well done.

                  Marketing Strategy

Next is the marketing campaign name and logo. What it generated was a cool, French-style logo. I'd love to try using it sometime.

          Logo

 

3. Three Short Promotional Videos

First, I provide the AI agent with a base image of a woman. Then, using this image as a starting point and based on the created marketing strategy, an approximately 8-second short video is generated. It's exciting to see what kind of video the AI agent will produce. This time, it created three videos with BGM. All of them are based on the theme of "Autumn Cakes." It's hard to pick a winner; they are all excellent. After actually creating the videos, I felt that even 8 seconds is enough to convey the image clearly. Which one did you like the best?

 

What did you think? Although this was just a demo AI agent, I was astonished at what it could accomplish with no code, no programming. It seems like it will become a powerful ally for marketers. Of course, there are limitations, but what I created this time can be done for free with just a Google account. I highly recommend giving it a try. ToshiStats will continue to share more about AI agents. Stay tuned!


1) Opal is now available in more than 160 countries, Google, 7 Nov 2025
