Toshifumi Kuga

April 25, 2026

Auto Mode, agentic coding, Anthropic, claude code

Opus 4.7’s Auto Mode: The Secret Weapon for Boosting Productivity

Anthropic has released its frontier generative AI model, Opus 4.7. This update comes just over two months after the release of Opus 4.6, highlighting the accelerating pace of technological progress. In this article, I take a close look at "Auto Mode," the remarkable new feature added alongside Opus 4.7, by using it to build a machine learning model for credit default prediction.

1. What is Auto Mode?

Boris Cherny, the creator of Claude Code, Anthropic's agentic coding environment, described "Auto Mode" as follows (1):

Auto mode = no more permission prompts
In the past, you either had to babysit the model while it did these sorts of long tasks, or use --dangerously-skip-permissions. We recently rolled out auto mode as a safer alternative. In this mode, permission prompts are routed to a model-based classifier to decide whether the command is safe to run. If it's safe, it's auto-approved.

In short, this feature reduces the frequency of "Please approve" requests that appear during long agentic coding sessions, thereby boosting productivity. For someone like me, who handles dozens of these approval requests daily, this is a very welcome addition.

You can verify the "Auto Mode" status via the indicator at the bottom left of the Claude Code interface.

When you first enable it, a notice will appear; I recommend giving it a thorough read.

2. Building a Default Prediction Model with Auto Mode

I used Claude Code’s "Auto Mode" to actually build a default prediction model. For this project, I used data from the Kaggle competition Home Credit Default Risk (2).

First, I created an implementation plan using Plan Mode. Through dialogue with Claude Code, a structured plan was established.

At this stage, Claude Code asks, "Would you like to use Auto Mode?" and answering "Yes" initiates the process.

The Implementation Process: I watched to see how many approval requests would appear before completion.

After approximately 90 minutes, the system announced, "Finished." Remarkably, not a single approval request was triggered. This makes the work significantly easier and the implementation process much more enjoyable.

Accuracy Validation: I checked the evaluation metric on Kaggle. The result was an AUC = 0.79632. This is my personal best for a single model without using ensembles. It ranks within the top 4.2% of the competition. Achieving this score without any manual intervention after the initial planning phase is truly astonishing.
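Kaggle scores this competition on the area under the ROC curve (AUC). As a minimal sketch of what that number measures (not Kaggle's own scoring code), AUC is the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter, with ties counted as half. It can be computed from ranks via the Mann-Whitney U statistic, assuming labels are coded 0/1:

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks.

    Assumes labels are 0 (no default) and 1 (default)."""
    n = len(y_score)
    if n != len(y_true):
        raise ValueError("y_true and y_score must have the same length")
    # Sort indices by score so we can assign 1-based ranks.
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    # U = (sum of positive ranks) - (smallest possible sum of positive ranks)
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    # Dividing U by the number of positive/negative pairs gives the AUC.
    return u / (n_pos * n_neg)
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) returns 0.75: of the four positive/negative pairs, three are ranked correctly.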

3. Auto Mode and Productivity in Data Analysis

While Auto Mode makes implementation effortless, its true power lies elsewhere. Because the frequency of approval requests has decreased so dramatically, it is now feasible to work with parallel computing—building multiple models simultaneously.

Whether in Kaggle competitions or practical business scenarios, we are often required to improve accuracy within a limited timeframe. If parallel computing becomes this easy, increasing productivity by 5x to 10x is no longer just a dream. It is a challenge well worth taking.

Conclusion

Auto Mode has simplified parallel computing and opened a new path toward enhanced productivity. At ToshiStats, we will continue to explore case studies using Auto Mode.

Stay tuned!

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) Boris Cherny, Anthropic, post on X: https://x.com/bcherny/status/2044847848035156457
2) Home Credit Default Risk, Kaggle

Notice: This is for educational purposes only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.

Toshifumi Kuga

April 3, 2026

harness, agentic coding, claude code, Anthropic

Navigating the Evolution of Generative AI: Insights from Anthropic

Every week, a variety of generative AI updates are released, and it feels as though this pace will only continue to accelerate. On the other hand, many people may be feeling lost, wondering how exactly they should navigate these changes. Therefore, in this post, I would like to explore some hints from Anthropic's technical blog (1).

1. Experiments at Anthropic

Mr. Prithvi Rajasekaran from the Labs team has provided a detailed report on several implementation experiments.

The experiments consisted of three projects: front-end design development, full-stack 2D retro game development, and Digital Audio Workstation (DAW) development. This time, I would like to focus specifically on the full-stack 2D retro game development. Across these development runs, they observed cases where long-running agentic coding failed. A common factor was that the AI tended to overrate its own incomplete implementations, judging them good enough when they were actually still unfinished. They concluded that unless this was fixed, long-running agentic coding could not deliver satisfactory results.

2. The Key Technology for Success

To address this, they introduced a "harness" built around a pair of agents: a Generator and an Evaluator. The design was reportedly inspired by Generative Adversarial Networks (GANs), a technique well known in image generation. For more details, please see below. In short, the model does not evaluate its own work.

A loop was established between the Generator and the Evaluator, in which flawed implementations were subjected to rigorous criticism. Naturally, this took much longer, and costs rose roughly 20-fold. However, the quality gains outweighed even that increase, so the return on investment was clearly worth it.

**Performance Comparison: Single Agent vs. Full Harness**
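To make the Generator/Evaluator idea concrete, here is a toy sketch of the loop. It is our own illustration, not Anthropic's harness: the hypothetical generator and evaluator functions stand in for two separate model calls, and the "spec" is just a list of required features. The key property is that only the Evaluator decides when the work is finished.

```python
from dataclasses import dataclass


@dataclass
class Review:
    passed: bool
    missing: list[str]  # spec items the Evaluator could not verify


def generator(spec: list[str], built: set[str], feedback: list[str]) -> set[str]:
    """Hypothetical stand-in for the Generator agent: each round it implements
    at most two items, starting with the ones the Evaluator flagged."""
    queue = feedback + [item for item in spec if item not in feedback]
    todo = [item for item in queue if item not in built]
    return built | set(todo[:2])


def evaluator(spec: list[str], built: set[str]) -> Review:
    """Hypothetical stand-in for the Evaluator agent: it checks every spec item
    itself instead of trusting the Generator's own verdict."""
    missing = [item for item in spec if item not in built]
    return Review(passed=not missing, missing=missing)


def harness(spec: list[str], max_rounds: int = 10) -> tuple[set[str], int, bool]:
    """Alternate Generator and Evaluator until the Evaluator passes the work."""
    built: set[str] = set()
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        built = generator(spec, built, feedback)
        review = evaluator(spec, built)
        if review.passed:
            return built, round_no, True
        feedback = review.missing  # the criticism goes back to the Generator
    return built, max_rounds, False
```

For example, with a five-item spec and a Generator that completes two items per round, harness() passes on round 3; with max_rounds=2 it stops with one item still missing and passed=False.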

3. Gains from the Update from Opus 4.5 to 4.6

While the AI engineers were continuing to refine the harness, an update for the generative AI model, Opus, was released, moving the version from 4.5 to 4.6. The performance improvement in Opus 4.6 was remarkable, and as a result, part of the harness that had been necessary for Opus 4.5 became redundant. This allowed the implementation to become simpler. Fantastic! Please see the chart below for details. In the V2 harness, a portion of V1 has indeed been removed.

Based on this experience, the blog describes the following lessons:

“the better the models get, the more space there is to develop harnesses that can achieve complex tasks beyond what the model can do at baseline.”

“From this work, my conviction is that the space of interesting harness combinations doesn't shrink as models improve. Instead, it moves, and the interesting work for AI engineers is to keep finding the next novel combination.”

In other words, I believe this means: "As the capabilities of generative AI improve, the number of things that can be solved by a standalone baseline model increases, making parts of existing harnesses unnecessary. However, as the capability of the baseline model rises, tasks that were previously unreachable become solvable by improving the harness design." If the things we can do with new generative AI models continue to increase, our opportunities for harness design will also grow, and it looks like we will be kept quite busy.

What did you think? As the capabilities of generative AI rise, it is expected that new harness designs will be required to push those capabilities to their limits. It seems there will be plenty to do, at least until AGI is realized. ToshiStats will continue to feature harness designs, which are the key to improving the accuracy of AI agents. Stay tuned!

1) Harness design for long-running application development, Engineering at Anthropic, Mar 24, 2026

Toshifumi Kuga

March 29, 2026

agentic coding, AI agent, artificial intelligence

The Secret Sauce for Mastering Agentic Coding !

Since the beginning of this year, "agentic coding", where AI agents handle the coding, has been a hot topic everywhere. Now that we no longer write programs ourselves and instead focus entirely on giving instructions to AI agents via prompts, many people are probably wondering, "What exactly should I learn to write good prompts?" So today, I'd like to explore this topic using an experiment conducted at ETH Zurich as our guide.

1. Overview of the Experiment

The reference for this discussion is the paper titled "Computer Science Achievement and Writing Skills Predict Vibe Coding Proficiency" (1). The researchers recruited 100 students, who first took tests measuring their writing skills, computer science achievement, and general cognitive abilities. I've summarized these three foundational skills below.

Afterward, to measure their "agentic coding" proficiency, the participants reviewed a sample application, drafted prompts for an LLM-based agent, tested the generated application, and then further refined it. The final applications were evaluated by human graders.

This process reveals the relationship between the three foundational skills and agentic coding proficiency.

2. As Expected, Computer Science Skills Mattered

As the results below show, computer science achievement had the strongest correlation with agentic coding proficiency, with a correlation coefficient of 0.39. Writing skills also showed a significant correlation, with a coefficient of 0.29. Here is a summary of the results.

Skills Correlated with Agentic Coding Proficiency
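For readers who want to see what these coefficients mean, here is a minimal sketch of Pearson's correlation coefficient, assuming that is the statistic the paper reports. Python 3.10+ also ships statistics.correlation for the same job. Squaring r gives the share of variance a simple linear fit explains: r = 0.39 corresponds to roughly 15% (0.39² ≈ 0.15) and r = 0.29 to roughly 8%, a meaningful but far from deterministic link.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance and spreads are left unnormalised: the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)
```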

Now, some of you might find this a bit puzzling. Computer science skills are primarily centered around programming, whereas in agentic coding, humans don't actually write code directly. So why did computer science skills show the strongest correlation? The research paper explains it as follows:

"It may have contributed through problem decomposition or mental models of control flow and state."

It's certainly true that people hone these kinds of abilities through the practice of programming. If that's the case, it makes perfect sense that individuals with strong computer science skills would perform well, even in natural language-driven agentic coding.

3. How Those with No Programming Experience Can Become Excellent Agentic Coders

Based on our discussion so far, I'd like to explore a new approach to "how people with no programming experience can become excellent agentic coders." As agentic coding becomes more widespread, the incentive to learn traditional programming may inevitably fade. However, the following skills are still absolutely essential for mastering agentic coding:

The ability to decompose tasks
The ability to understand system flows
The ability to expand your vocabulary and accurately define requirements in writing

For those without programming experience, deliberately focusing on and studying these specific points alongside your regular prompt writing practice will likely accelerate your improvement. This is something you can start doing right away today. I highly recommend it!

What do you think? While we focused on "agentic coding" today, the insights we've gained go far beyond just "coding"—they can be seen as universal skills for unlocking the true potential of AI agents. As AI agents become integrated into various fields in the future, these skills will essentially become mandatory subjects for all of us. Here at ToshiStats, we will continue to discuss the collaboration between business professionals and AI agents. Stay tuned!

1) Computer Science Achievement and Writing Skills Predict Vibe Coding Proficiency, Sverrir Thorgeirsson, Theo B. Weidmann, Zhendong Su. 14 Mar 2026

Toshifumi Kuga

February 20, 2026

Agent Skills, agentic coding, claude code, Opus4.6, Loan Payback

Predicting Loan Payback through "Agent Skills": The New Standard for Enterprise AI

The most common complaint about AI agents in business? "The output isn't what I wanted." In a corporate setting, consistency is everything; without predefined formats, users get lost. Instead of just teaching everyone to prompt better, why not embed that expertise into the organization itself? By providing standardized prompts upfront, users get consistent results from day one. The key to this is "Agent Skills" (1). Let’s see how it works!

1. What are Agent Skills?

Announced as "skills" by the AI giant Anthropic in October 2025, Agent Skills have since been adopted by almost every major AI company. They have become the de facto standard for providing domain-specific knowledge to generative AI. According to Anthropic:

“Agent Skills are modular capabilities that extend Claude's functionality. Each Skill packages instructions, metadata, and optional resources (scripts, templates) that Claude uses automatically when relevant.”

The beauty of defined Agent Skills is their portability—once created, they can be used across different platforms.

2. Creating Agent Skills

Now, let's dive right in. I’m going to create an 'Agent Skill' using Claude Cowork. I uploaded the PRD (Product Requirements Document) I typically use for building prediction models and input the following prompt.

Since Claude Cowork has a built-in skill creator, it automatically generates an Agent Skills folder containing a SKILL.md file. SKILL.md stores the most fundamental information for the Agent Skill, and its header (the YAML frontmatter) always includes the skill's name and description. AI agents like Claude Code are designed to read this section first.

For tasks related to predictive modeling, the agent reads the specific implementation logic defined in the skill (which, in this case, spans about 240 lines) before moving to the coding phase.
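As a minimal sketch of why the header can be read on its own, the snippet below splits a SKILL.md into its YAML frontmatter and its Markdown body. As we understand Anthropic's documentation, agents preload only the name and description and open the full body when a task looks relevant. Assumptions: the skill text is hypothetical (not the one Claude Cowork generated here), and the parser handles only flat key: value lines, so a real tool should use a proper YAML parser.

```python
# Hypothetical SKILL.md content, for illustration only.
SKILL_MD = """---
name: loan-model-builder
description: Builds and validates binary-classification models on tabular data. Use when the user asks for a prediction model.
---
# Loan model builder

1. Profile the data and check the target balance.
2. Engineer ratio features, then train and validate the model.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md into (frontmatter, body).

    Only flat `key: value` lines are parsed; this is not a full YAML parser."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    # Find the closing '---' of the frontmatter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter block is not closed")
    header: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            header[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return header, body
```

An agent following this pattern would keep only header["name"] and header["description"] in context until a prediction-model task arrives, then read the body.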

3. Building a Prediction Model via Agent Skills

Next, I utilized Claude Code for agentic coding. As shown below, the "skills" we just created are active and recognized by the environment.

Because the detailed modeling process is already governed by the Agent Skill, my manual prompt can be as simple as: "Please create a prediction model." For this project, I used data from the Kaggle "Predicting Loan Payback" competition (2), where the goal is to predict whether a borrower will repay their loan. The entire implementation was completed in about two hours with almost no manual corrections. The stability of Opus 4.6 (3) is truly remarkable!

The model achieved an AUC of 0.92435 on the Kaggle leaderboard—a score that is well within the range of practical, production-ready application.

One secret behind this high accuracy was the creation of new features based on ratios. By analyzing feature importance, we ensured only the most impactful variables were included in the final model.
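The post does not spell out the exact selection rule, so the following is only one plausible sketch: rank features by importance (for example, a gradient-boosting model's gain importance) and keep the smallest set that covers a chosen share of the total. The 95% coverage cutoff is a hypothetical choice.

```python
def select_by_importance(importance: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Keep the highest-importance features until their cumulative share
    of total importance reaches `coverage` (a hypothetical cutoff)."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive number")
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    selected: list[str] = []
    cumulative = 0.0
    for name, score in ranked:
        if cumulative >= coverage * total:
            break  # the features kept so far already cover enough importance
        selected.append(name)
        cumulative += score
    return selected
```

With hypothetical importances of 40, 30, 20, 8 and 2 (out of 100), the first four features are kept and the one contributing 2% is dropped.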

4. Testing the Resulting Model

Let’s look at the model built via Agent Skills in action. First, we calculate the probability of repayment for an individual customer. In this example, the probability exceeds 96%, resulting in a "Success" (likely to repay) classification based on a 50% threshold. This threshold is, of course, adjustable depending on the specific business objectives.

To avoid the "black box" problem, I use SHAP analysis to explain why a customer received a specific score. As seen in the graph, the length of the red arrows indicates the contribution of each feature. Here, employment_status was the most significant factor driving the "Success" prediction. This transparency is crucial for corporate accountability.
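To make the arrows concrete: SHAP values are Shapley values, so for each customer they add up exactly to the gap between the model's output and a baseline output. For a LightGBM classifier, these contributions are typically reported on the log-odds scale rather than as probabilities. The brute-force sketch below enumerates every feature coalition, which is exponential in the number of features and only practical for toy models; the shap library uses much faster tree-specific algorithms. Assumptions: the two-feature model is hypothetical, and "absent" features are set to a single baseline value rather than averaged over a background dataset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by enumerating all coalitions (exponential cost).

    Features outside a coalition take their baseline value, a simplification
    of SHAP's averaging over a background dataset."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model output (e.g. log-odds) with an interaction term."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[1]
```

With x = [1, 1] and a baseline of [0, 0], the interaction term is split evenly, giving contributions of 0.6 and 0.4, which sum to toy_model(x) - toy_model(baseline) = 1.0.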

We can also apply SHAP to the entire dataset. Again, employment_status emerges as the top contributor, which shows that its influence is not limited to one customer but holds across the entire customer base.
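Dataset-wide importance of this kind is commonly summarised as the mean absolute SHAP value per feature across all customers; taking absolute values keeps positive and negative contributions from cancelling out. A minimal sketch, assuming per-customer SHAP values are already available as dictionaries (the numbers in any example input are hypothetical):

```python
def global_importance(shap_rows: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Rank features by mean absolute SHAP value across all customers."""
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    totals: dict[str, float] = {}
    for row in shap_rows:
        for feature, value in row.items():
            # Absolute values stop positive and negative pushes from cancelling.
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n = len(shap_rows)
    means = [(feature, total / n) for feature, total in totals.items()]
    return sorted(means, key=lambda item: item[1], reverse=True)
```

For two hypothetical customers with employment_status contributions of 1.2 and -0.8 and credit_score contributions of -0.4 and 0.6, employment_status ranks first with a mean of 1.0 against 0.5.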

Furthermore, SHAP allows us to visualize the non-linear relationship between specific features and repayment probability. For example, with credit_score, the probability doesn't just rise linearly. The data shows that the probability remains flat until a score of 550, starts to rise at 600, and accelerates significantly after 700. This level of granular insight is what makes SHAP so valuable.

By using Agent Skills, you can embed entire libraries of domain knowledge directly into your AI’s workflow. These skills are reusable, portable, and—in my opinion—will soon be a requirement for any business using AI agents.

I look forward to seeing how Agent Skills continue to permeate the corporate world and what innovations they will trigger. TOSHI STATS Co. will continue to lead the way in this space.

Stay tuned!

You can enjoy our video news ToshiStats-AI from this link, too!

1) Agent Skills
2) Predicting Loan Payback, Yao Yan, Walter Reade, Elizabeth Park. Kaggle, 2025
3) Introducing Claude Opus 4.6, Anthropic, Feb 5, 2026

Copyright © 2026 Toshifumi Kuga. All rights reserved.
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Toshifumi Kuga

February 15, 2026

Insurance Cross-Sell, agentic coding, claude code, Opus4.6

From Zero to Production: How Opus 4.6 Agentic Coding Revolutionizes Insurance Analytics

In the ever-evolving landscape of InsurTech, cross-selling is a goldmine. Using Opus 4.6 and Agentic Coding, I built a complete implementation pipeline for an "Insurance Cross-Sell Prediction Model," covering everything from memory-optimized data loading to complex feature engineering. Let’s dive in!

1. Agentic Coding with Opus 4.6

Unlike traditional coding, Agentic Coding with Opus 4.6 (1) allows the AI to function as an autonomous engineer. It goes beyond writing snippets: it manages the directory structure, keeps memory usage efficient on a dataset of 11.5 million rows, and delivers a production-ready Streamlit dashboard.
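One common way to keep an 11.5-million-row table in memory is to store each integer column in the narrowest type that holds its range; pandas offers this through pd.to_numeric(..., downcast="integer"). We do not know which exact technique the agent used, so the stdlib sketch below only shows the arithmetic behind the idea, with illustrative column values.

```python
from typing import Iterable

# Signed integer types from narrowest to widest, with their value ranges.
INT_TYPES = [
    ("int8", -(2**7), 2**7 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("int64", -(2**63), 2**63 - 1),
]
BYTES_PER_VALUE = {"int8": 1, "int16": 2, "int32": 4, "int64": 8}


def smallest_int_type(values: Iterable[int]) -> str:
    """Return the narrowest signed integer type that can hold every value."""
    values = list(values)
    low, high = min(values), max(values)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= low and high <= type_max:
            return name
    raise OverflowError("values do not fit in 64 bits")


def column_bytes(n_rows: int, type_name: str) -> int:
    """Raw storage for one column, ignoring per-array overhead."""
    return n_rows * BYTES_PER_VALUE[type_name]
```

An age-like column spanning 20 to 85 fits in int8, which cuts that column from 92,000,000 bytes (int64) to 11,500,000 bytes across 11.5 million rows.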

In this process, my role was simply to write the "Product Requirement Document (PRD)", a document in natural language (Japanese or English) defining what I wanted to build. No Python knowledge was required on my part. When Claude Code is in plan mode, it automatically generates an implementation blueprint, allowing me to verify the coding logic before Opus 4.6 executes it. While I monitored the progress, I never had to write a single line of code myself. Truly remarkable.

2. Project Overview

This project features a robust ecosystem designed for real-world application:

LightGBM + Optuna: Automated hyperparameter optimization to maximize AUC.
50 Ratio-Based Features: Generation of 50 unique indicators to capture hidden customer behavior patterns.
Explainability via SHAP: Implementation of SHAP values to visualize why a specific customer is likely to purchase.

The data was sourced from a Kaggle competition regarding automobile insurance cross-selling (2).

Performance Results: When evaluating the model built via Opus 4.6 Agentic Coding on the Kaggle leaderboard, it achieved a high score of AUC = 0.88343. This level of accuracy is more than sufficient for practical business use.

3. Key Features of the Implementation

The model provides two primary functions: individual customer prediction and total customer portfolio analysis.

Individual Prediction

We set the threshold for a "successful" cross-sell at a probability of 35% or higher. Below is an example of a customer predicted to be a successful cross-sell target. To avoid the "Black Box" problem, we use SHAP values to show the contribution of each feature. The larger the SHAP value, the higher its contribution to the positive prediction. This allows staff to understand the concrete reasoning behind the AI's decision.

Conversely, for customers predicted to fail (probability below 35%), the SHAP values indicate which factors are pulling the probability down.

Customer Portfolio Analysis

We can also analyze the "Cross-Sell Success Rate" across an entire customer portfolio. In this demo, we imported a CSV of 30,000 customers. With the threshold set at 35%, the model identified 3,708 potential targets. By adjusting the threshold, marketing teams can narrow or broaden their focus for specific campaigns. The dashboard also displays the overall probability distribution across the entire dataset.
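The threshold logic can be sketched in a few lines: select every customer whose predicted probability is at or above the cutoff, rank them into a hot-lead list, and bin all probabilities to show the overall distribution. This is a minimal sketch, not the dashboard's actual code; any probabilities you feed it here are placeholders, not model output.

```python
def hot_leads(probabilities: list[float], threshold: float = 0.35) -> list[int]:
    """Indices of customers at or above the threshold, highest probability first."""
    targets = [i for i, p in enumerate(probabilities) if p >= threshold]
    return sorted(targets, key=lambda i: probabilities[i], reverse=True)


def probability_histogram(probabilities: list[float], n_bins: int = 10) -> list[int]:
    """Count predictions in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in probabilities:
        # p == 1.0 would map to n_bins, so clamp it into the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts
```

For instance, hot_leads([0.05, 0.2, 0.35, 0.6, 0.9]) returns [4, 3, 2] at the default 0.35 cutoff and only [4, 3] at 0.5, which is the narrowing effect described above.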

4. Business Impact

This high-precision model provides sales representatives with a prioritized "Hot Lead" list. Thanks to the Streamlit-based GUI, non-technical staff can execute batch predictions and verify the reasoning via SHAP instantly. This is the definition of Data-Driven Marketing.

Conclusion

The synergy between Opus 4.6 and human expertise is redefining the speed of machine learning development and implementation. The potential is, quite frankly, staggering. At TOSHI STATS, we will continue to explore innovations in this field.

Stay tuned!

1) Introducing Claude Opus 4.6, Anthropic, Feb 5, 2026
2) Binary Classification of Insurance Cross Selling, Walter Reade and Ashley Chow, Kaggle

Toshifumi Kuga

February 1, 2026

Genie 3, agentic coding, claude code

AGI in 2 Years or 5 Years? — Survival Strategies for 2030

In January 2026, several interviews with CEOs of top AI labs were released. One particularly fascinating encounter was the face-to-face interview (1) between Anthropic CEO Dario Amodei and Google DeepMind CEO Demis Hassabis. I have summarized my thoughts on what their comments imply. I hope you find this insightful!

1. Will AGI Arrive Within 2 Years?

Dario appears to hold a shorter timeline for the realization of AGI. While prefixing his thoughts with "It is difficult to predict exactly when it will happen," he pointed to the reality within his own company: "There are already engineers at Anthropic who say they no longer write code themselves. In the next 6 to 12 months, AI might handle the majority of code development. I feel that loop is closing rapidly." He argued that AI development is entering a flywheel effect, particularly noting that progress in coding and research is so remarkable that AI intelligence will surpass public expectations within a few short years.

A prime example is Claude Code, released by Anthropic last year. This revolutionary product is currently taking the software development world by storm. It is no exaggeration to say that the common refrain "I don’t code manually anymore" is a direct result of this tool. In fact, I recently used it to tackle a past Kaggle competition; I achieved an AUC of 0.79 with zero manual coding, which absolutely stunned me (3).

2. AGI is Still 5 Years Away

On the other hand, Demis maintains his characteristically cautious stance. He often remarks that there is a "50% chance of achieving AGI in five years." His reasoning is grounded in the current limitations of AI: "Today’s AI isn't yet consistently superior to humans across all fields. A model might show incredible performance in one area but make elementary mistakes in another. This inconsistency means we haven't reached AGI yet." He believes two or three more major breakthroughs are required, which explains his longer timeline compared to Dario.

Unlike Anthropic, which is heavily optimized for coding and language, Google is focusing on a broader spectrum. One such focus is World Models—simulations of the physical spaces we inhabit. In these models, physics like gravity are reproduced, allowing the AI to better understand the "real" world. Genie 3 (2) is their latest version in this category. While it has only been released in the US so far, I am eagerly anticipating its global rollout. The "breakthroughs" Demis mentions likely lie at the end of this developmental path.

3. Are We Prepared for AGI?

While their timelines differ, Dario and Demis agree on one fundamental point: AGI, which will surpass human capabilities in every field, is not far off. Almost exactly ten years ago, in March 2016, DeepMind’s AlphaGo defeated Lee Sedol, one of the world’s top Go professionals. Since then, top Go AIs have consistently outplayed the best human players. Soon, we may reach a point where humans can no longer outperform AI in any field. What we are seeing in the world of coding today is the precursor to that shift.

It is a world that is difficult to visualize. Industrial structures will be upended, and the very role of "human work" will change. It is hard to say that we are currently prepared for this reality. In 2026, we must begin a serious global dialogue on how to adapt. I look forward to engaging in these discussions with people around the world.

I highly recommend watching the full interview with Dario and Demis. These two individuals hold the keys to our collective future. That’s all for today. Stay tuned!

1) The Day After AGI | World Economic Forum Annual Meeting 2026, World Economic Forum, Jan 21, 2026
2) Genie 3, Google DeepMind, Jan 29, 2026
3) Is agentic coding viable for Kaggle competitions?, January 16, 2026

Toshifumi Kuga

January 16, 2026

Agentic AI, claude code, Opus 4.5, agentic coding

Is agentic coding viable for Kaggle competitions?

The "Agentic Coding" trend continues to accelerate as we enter 2026. In this post, I will challenge myself to see how high I can push accuracy by delegating the coding process to an AI agent, using data from the Kaggle competition Home Credit Default Risk (1). Let's get started right away.

1. Combining Claude Code and Opus 4.5

I will be using Opus 4.5, a generative AI renowned for its coding capabilities. Additionally, I will use Claude Code as my coding assistant, as shown below. While I enter instructions into the prompt box, I do not write any Python code myself.

You can see the words "plan mode" at the bottom of the screen. In this mode, Claude Code formulates an implementation plan based on my instructions. I simply review it, and if everything looks good, I authorize the execution.

Let's look at the actual instructions I issued. It is quite long for a "prompt," spanning about two A4 pages. The beginning of the implementation instructions is shown below. I wrote it in great detail. I'd like you to pay special attention to the final instruction regarding the creation of 50 new features using ratio calculations.

Part of the Product Requirement Document

Below is a portion of the implementation plan formulated by the AI agent. It details the method for creating new features via ratio calculations. Although I only specified the quantity of features, the plan shows that it selected features likely to be relevant to loan defaults before calculating the ratios.

The AI agent used its own domain knowledge to make these selections; they were certainly not chosen at random. This demonstrates the high level of judgment that AI agents are capable of.

New feature creation plan by the AI Agent

Part of the new features actually created by the AI Agent
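Ratio features of this kind are easy to express in code. The sketch below uses real column names from the Home Credit application table (AMT_CREDIT, AMT_INCOME_TOTAL, AMT_ANNUITY, AMT_GOODS_PRICE), but the specific ratios and the derived feature names are our own illustrative choices, not the agent's actual list. Missing values and zero denominators yield None (NaN in a pandas pipeline, which LightGBM can treat as missing) instead of raising an error.

```python
# Hypothetical ratio definitions: derived name -> (numerator, denominator).
RATIO_SPECS = {
    "CREDIT_TO_INCOME": ("AMT_CREDIT", "AMT_INCOME_TOTAL"),
    "ANNUITY_TO_INCOME": ("AMT_ANNUITY", "AMT_INCOME_TOTAL"),
    "PAYMENT_RATE": ("AMT_ANNUITY", "AMT_CREDIT"),
    "CREDIT_TO_GOODS": ("AMT_CREDIT", "AMT_GOODS_PRICE"),
}


def safe_ratio(numerator, denominator):
    """Divide, returning None when either side is missing or the denominator is 0."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def add_ratio_features(row: dict, specs: dict = RATIO_SPECS) -> dict:
    """Return a copy of one applicant row with the ratio features appended."""
    out = dict(row)
    for name, (num_col, den_col) in specs.items():
        out[name] = safe_ratio(row.get(num_col), row.get(den_col))
    return out
```

For an applicant with AMT_CREDIT = 500,000 and AMT_INCOME_TOTAL = 200,000, CREDIT_TO_INCOME comes out as 2.5.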

2. Achieving an AUC of 0.79

By adopting LightGBM as the machine learning library, using the newly created features, and performing hyperparameter tuning, I was able to achieve an AUC of 0.79063, as shown below.

Reaching this level without writing a single line of Python code myself marks this experiment as a success. The data used to build the machine learning model consisted of seven different CSV files. These had to be merged correctly, and the AI agent handled this task seamlessly. Truly impressive!
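A key detail when merging these files is that supplementary tables such as bureau.csv hold many rows per applicant, so joining them directly would duplicate applicants. A minimal sketch, assuming a simple count/sum/mean/max aggregation (our choice, not necessarily what the agent did): aggregate each child table by SK_ID_CURR first, then left-join the summary onto the main application table so every applicant keeps exactly one row.

```python
from collections import defaultdict


def aggregate_child(rows: list[dict], key: str, column: str) -> dict:
    """Collapse a one-to-many table to one summary per key (non-missing values only)."""
    groups: dict = defaultdict(list)
    for row in rows:
        if row.get(column) is not None:
            groups[row[key]].append(row[column])
    return {
        k: {"count": len(v), "sum": sum(v), "mean": sum(v) / len(v), "max": max(v)}
        for k, v in groups.items()
    }


def left_join(parents: list[dict], summaries: dict, key: str, prefix: str) -> list[dict]:
    """Attach summaries to each parent row; parents without children still keep one row."""
    merged = []
    for parent in parents:
        row = dict(parent)
        stats = summaries.get(parent[key])
        row[f"{prefix}_count"] = stats["count"] if stats else 0
        for stat in ("sum", "mean", "max"):
            row[f"{prefix}_{stat}"] = stats[stat] if stats else None
        merged.append(row)
    return merged
```

The "BUREAU" prefix in a call such as left_join(application, aggregate_child(bureau, "SK_ID_CURR", "AMT_CREDIT_SUM"), "SK_ID_CURR", "BUREAU") is a hypothetical naming choice; the same aggregate-then-join pattern would repeat for each supplementary file.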

3. Will AI Agents Handle Future Machine Learning Model Development?

While the computation time depends on the number of features created, it generally took between 1 and 4 hours. I ran the process several times, and the calculation never stopped due to syntax errors. The AI agent likely corrected any errors itself before proceeding to the next calculation step.

Therefore, once the initial implementation plan is approved, the results are generated without any further human intervention. This could be revolutionary. You simply input what you want to achieve via a PRD (Product Requirement Document), the AI agent creates an implementation plan, and once you approve it, you just wait for the results. The potential for multiplying productivity several times over is certainly there.

How was it? I was personally astonished by the high potential of the "Claude Code and Opus 4.5" combination. With a little ingenuity, it seems capable of even more.

This story is just beginning. Opus 4.5 will likely be upgraded to Opus 5 within the year. I am already looking forward to seeing what AI agents will be capable of then.

That’s all for today. Stay tuned!

1) Home Credit Default Risk, Kaggle
