
Opus 4.7’s Auto Mode: The Secret Weapon for Boosting Productivity

Anthropic has released its latest frontier generative AI model, Opus 4.7. This update comes just over two months after the release of Opus 4.6, highlighting the accelerating pace of technological progress. In this article, I will dive deep into the remarkable new feature added alongside Opus 4.7, "Auto Mode," by using it to build a machine learning model for credit default prediction.

 

1. What is Auto Mode?

Boris Cherny, the creator of Claude Code, Anthropic's agentic coding environment, commented on "Auto Mode" as follows (1):

Auto mode = no more permission prompts

In the past, you either had to babysit the model while it did these sorts of long tasks, or use --dangerously-skip-permissions. We recently rolled out auto mode as a safer alternative. In this mode, permission prompts are routed to a model-based classifier to decide whether the command is safe to run. If it's safe, it's auto-approved.

In short, this feature reduces the frequency of "Please approve" requests that appear during long agentic coding sessions, thereby boosting productivity. For someone like me, who handles dozens of these approval requests daily, this is a very welcome addition.
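To make the idea concrete, here is a toy illustration of the routing described in the quote: instead of prompting the user for every command, a classifier decides whether the command can be auto-approved. The keyword lists and function names below are my own stand-ins for illustration, not Anthropic's actual model-based classifier.

```python
# Toy sketch of Auto Mode's approval routing. A real implementation uses a
# model-based classifier; here a simple rule check stands in for it.
SAFE_PREFIXES = ("ls", "cat ", "git status", "git diff", "pytest", "python ")
DANGEROUS_TOKENS = ("rm -rf", "sudo", "mkfs", "curl | sh")

def auto_approve(command: str) -> bool:
    """Decide whether a command may run without a human prompt."""
    if any(tok in command for tok in DANGEROUS_TOKENS):
        return False  # destructive commands always escalate to the human
    return command.startswith(SAFE_PREFIXES)

def run_with_auto_mode(command: str) -> str:
    if auto_approve(command):
        return f"auto-approved: {command}"
    return f"awaiting human approval: {command}"
```

Anything the classifier is unsure about still falls back to a human prompt, which is why Anthropic describes this as a safer alternative to skipping permissions entirely.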

You can verify the "Auto Mode" status via the indicator at the bottom left of the Claude Code interface.

Auto Mode

When you first enable it, a notice will appear; I recommend giving it a thorough read.

notice of Auto Mode

 

2. Building a Default Prediction Model with Auto Mode

I used Claude Code’s "Auto Mode" to actually build a default prediction model. For this project, I used data from the Home Credit Default Risk competition (2) on Kaggle.

First, I created an implementation plan using Plan Mode. Through dialogue with Claude Code, a structured plan was established.

Implementation Plan

At this stage, Claude Code asks, "Would you like to use Auto Mode?" and answering "Yes" initiates the process.

Approval Request

The Implementation Process: I watched to see how many approval requests would appear before completion.

Implementation using Auto Mode

After approximately 90 minutes, Claude Code announced that it was finished. Remarkably, not a single approval request had been triggered. This made the work significantly easier and the implementation process much more enjoyable.

Completion Notice

Accuracy Validation: I checked the evaluation metric on Kaggle. The result was an AUC of 0.79632. This is my personal best for a single model without ensembles, and it ranks within the top 4.2% of the competition. Achieving this score without any manual intervention after the initial planning phase is truly astonishing.

Evaluation Metric
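As a refresher on the metric reported above: AUC is the probability that a randomly chosen positive case (a default) receives a higher score than a randomly chosen negative one. The model itself was built by Claude Code and is not shown in the post, so the sketch below only illustrates how the metric is computed, on toy labels and scores of my own invention.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney statistic; tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = default) and model scores, purely for illustration:
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.5, 0.4, 0.6, 0.8]
print(auc(labels, scores))  # 8/9, about 0.889
```

An AUC of 0.5 corresponds to random scoring and 1.0 to a perfect ranking, which puts the 0.79632 above in useful territory for credit scoring.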

 

3. Auto Mode and Productivity in Data Analysis

While Auto Mode makes implementation effortless, its true power lies elsewhere. Because the frequency of approval requests has decreased so dramatically, it is now feasible to work with parallel computing—building multiple models simultaneously.

Whether in Kaggle competitions or practical business scenarios, we are often required to improve accuracy within a limited timeframe. If parallel computing becomes this easy, increasing productivity by 5x to 10x is no longer just a dream. It is a challenge well worth taking.
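The "multiple models simultaneously" idea can be sketched in a few lines. Everything here is hypothetical: `train_model` is a placeholder for a real training job (or a Claude Code session), and the configs and scores are invented for illustration.

```python
# Hedged sketch: launch several model-building jobs in parallel and keep
# the best result. ThreadPoolExecutor stands in for parallel sessions.
from concurrent.futures import ThreadPoolExecutor

def train_model(config: dict) -> dict:
    # Placeholder: a real job would fit a model; here we just echo a score.
    return {"name": config["name"], "auc": 0.75 + 0.01 * config["depth"]}

configs = [{"name": f"gbm_depth_{d}", "depth": d} for d in (3, 5, 7)]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_model, configs))

best = max(results, key=lambda r: r["auc"])
```

The point is organizational rather than technical: once no human approval sits in the loop, fanning out to several candidate models costs little extra attention.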

 

Conclusion

Auto Mode has simplified parallel computing and opened a new path toward enhanced productivity. At ToshiStats, we will continue to explore case studies using Auto Mode.

Stay tuned!

 

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) https://x.com/bcherny/status/2044847848035156457, Boris Cherny, Anthropic
2) Home Credit Default Risk, Kaggle

Notice: This is for educational purpose only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.

Revolutionizing Enterprise AI: The Power of Claude Managed Agents

Anthropic, a leader in generative AI, has announced "Claude Managed Agents," an AI agent hosting service. This service appears to offer significant advantages for enterprises utilizing AI agents, so let’s dive deeper into what it’s all about.

 

1. What is "Claude Managed Agents"?

First, what exactly is "Claude Managed Agents"? Let’s look at a quote from Anthropic's technical blog (1):

Harnesses encode assumptions that go stale as models improve. Managed Agents—our hosted service for long-horizon agent work—is built around interfaces that stay stable as harnesses change.

It seems "Claude Managed Agents" refers to an AI agent infrastructure designed for stable, long-term operation, even as underlying models are updated. A key concept here—which is also the title of their blog post—is "Decoupling the brain from the hands."

The solution we arrived at was to decouple what we thought of as the “brain” (Claude and its harness) from both the “hands” (sandboxes and tools that perform actions) and the “session”

Because the functions are separated, if one part of the system fails, you only need to fix that part to recover quickly. This certainly looks promising.

Decoupling the brain from the hands
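The decoupling idea can be expressed as a small interface contract: the "brain" plans, the "hands" execute, and they communicate only through a stable interface so that either side can be swapped out independently. All names below are illustrative, not Anthropic's actual API.

```python
# Hedged sketch of "decoupling the brain from the hands" as an interface.
from typing import Protocol

class Hands(Protocol):
    """The small, stable interface between brain and hands."""
    def execute(self, action: str) -> str: ...

class EchoSandbox:
    """Toy 'hands': pretends to run an action inside a sandbox."""
    def execute(self, action: str) -> str:
        return f"ran: {action}"

class Brain:
    """Toy 'brain': plans steps and delegates execution through the interface."""
    def __init__(self, hands: Hands):
        self.hands = hands

    def solve(self, task: str) -> list:
        plan = [f"step {i} of {task}" for i in (1, 2)]
        return [self.hands.execute(step) for step in plan]

agent = Brain(EchoSandbox())
```

Because `Brain` depends only on the `Hands` protocol, upgrading the model or replacing the sandbox touches just one side of the boundary, which is the recovery property the blog describes.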

 

2. Creating a Customer Complaint Classification Agent with "Claude Managed Agents"

Descriptions alone don't quite capture the experience, so let’s try running "Claude Managed Agents" ourselves. First, we enter a prompt into the box on the bottom left.

Claude Managed Agents Console

For this test, we will create an agent to classify bank customer complaints. I have instructed it to select one of six financial products. Immediately, a configuration file is generated as shown below. Next, we create the agent.

Prompt Input and Configuration File
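As a hedged sketch, the classification instruction behind such an agent might look like the following. Only "Student loan" and "Mortgage" appear in the post; the other four product labels, the prompt wording, and the function names are my own illustration, not the configuration the console actually generated.

```python
# Illustrative prompt construction and reply validation for a six-class
# complaint classifier. Labels beyond the two named in the post are invented.
PRODUCTS = [
    "Student loan", "Mortgage", "Credit card",
    "Personal loan", "Checking account", "Money transfer",
]

def build_prompt(complaint: str) -> str:
    labels = "\n".join(f"- {p}" for p in PRODUCTS)
    return (
        "Classify the following bank customer complaint into exactly one product.\n"
        f"Products:\n{labels}\n"
        "Answer with the product name only.\n\n"
        f"Complaint: {complaint}"
    )

def parse_label(reply: str) -> str:
    """Validate the agent's reply against the allowed labels."""
    reply = reply.strip()
    if reply not in PRODUCTS:
        raise ValueError(f"unexpected label: {reply!r}")
    return reply
```

Constraining the reply to a fixed label set and validating it on the way back is what makes a free-form model usable as a classifier.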

The agent is now created. Next, we set up the environment.

                Environment Configuration

The environment is ready. Now, we start a session.

Start Session

The session has begun.

Ready

The preparation was finished in no time. There is nothing technically difficult about this; it’s just a matter of clicking buttons. Let's test it out immediately. I'll enter a bank customer complaint as follows:

Bank Customer Complaint Input

The result came back as "Student loan." Correct!

Now, let’s try one more.

It came back as "Mortgage". Correct!

It’s working perfectly. All I did was provide a prompt instructing the AI agent on what to do. The rest was handled almost automatically by "Claude Managed Agents." This is impressive.

 

3. Easy Enterprise Scaling: The Rakuten Success Story

Now, let's look at an example of a Japanese company that used "Claude Managed Agents" to scale its AI agents: Rakuten, the e-commerce giant. By switching from in-house infrastructure development to "Claude Managed Agents," they succeeded in deploying AI agents across the company with overwhelming speed.

“Deployed Claude Managed Agents across product, sales, marketing, finance within one week” (2)

It is particularly notable that business-side staff, not just engineers, are actively involved. It truly sounds like a company-wide initiative. Wonderful! I look forward to seeing more Japanese companies follow this lead.

"Claude Managed Agents" Success Story: Rakuten

 

How was that? Between the rapid development enabled by "Claude Managed Agents" and the reduced maintenance burden associated with updating frontier models, this feels like a paradigm shift in enterprise AI. While concerns about vendor lock-in remain, for companies that prioritize speed above all else, "Claude Managed Agents" appears to be an ideal service.

ToshiStats will continue to cover AI agent development in the corporate world. Stay tuned!

 
 


1) Scaling Managed Agents: Decoupling the brain from the hands, Anthropic
2) Rakuten accelerates development with Claude Code, Anthropic


Navigating the Evolution of Generative AI: Insights from Anthropic

Every week, a variety of generative AI updates are released, and it feels as though this pace will only continue to accelerate. On the other hand, many people may be feeling lost, wondering how exactly they should navigate these changes. Therefore, in this post, I would like to explore some hints from Anthropic's technical blog (1).

 

1. Experiments at Anthropic

Mr. Prithvi Rajasekaran from the Labs team has provided a detailed report on several implementation experiments.

The experiments consisted of three projects: front-end design development, full-stack 2D retro game development, and Digital Audio Workstation (DAW) development. This time, I would like to focus specifically on the full-stack 2D retro game development. Across the various development runs, they observed cases where long-running agentic coding failed. A common factor was that the AI often overrated incomplete implementations, judging them to be at a sufficient level when they were in fact still unfinished. They concluded that unless this was fixed, satisfactory results in long-running agentic coding would remain out of reach.

 

2. The Key Technology for Success

To address this, a "harness" design was introduced, consisting of a Generator paired with an Evaluator. This was reportedly inspired by Generative Adversarial Networks (GANs), a technique well known in image generation. In short, the model does not evaluate its own work. For more details, please see below.

New Harness Design

A loop was established between the Generator and the Evaluator, in which flawed implementations were subjected to rigorous criticism. Naturally, this took a significant amount of time, and costs jumped roughly 20-fold. However, the quality improved by even more than the added cost would suggest; the return on investment was clearly sufficient.

Performance Comparison: Single Agent vs. Full Harness
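The Generator-Evaluator loop can be sketched as follows. In the blog both roles are model calls; here they are toy stand-ins of my own design, where each round of criticism "fixes" one more issue, purely to show the control flow.

```python
# Hedged sketch of a Generator-Evaluator harness loop: generate, criticize,
# feed the criticism back, and stop only when the Evaluator accepts.
from typing import Optional

def generate(feedback: list) -> str:
    # Toy generator: each accumulated criticism counts as one applied fix.
    return f"implementation with {len(feedback)} fixes"

def evaluate(candidate: str, required_fixes: int) -> Optional[str]:
    """Return a criticism, or None once the work passes."""
    fixes_done = int(candidate.split()[2])
    return None if fixes_done >= required_fixes else f"issue {fixes_done + 1} remains"

def harness_loop(required_fixes: int, max_rounds: int = 10) -> str:
    feedback = []
    for _ in range(max_rounds):
        candidate = generate(feedback)
        criticism = evaluate(candidate, required_fixes)
        if criticism is None:
            return candidate  # the Evaluator accepted the work
        feedback.append(criticism)
    raise RuntimeError("did not converge within max_rounds")
```

The cost multiplier in the post comes directly from this structure: every extra round is another full generate-and-evaluate pass, which is why quality and cost both rise together.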

3. Gains from the Update from Opus 4.5 to 4.6

While the AI engineers were continuing to refine the harness, the Opus model was updated from version 4.5 to 4.6. The performance improvement in Opus 4.6 was remarkable; as a result, part of the harness that had been necessary for Opus 4.5 became redundant, and the implementation could be simplified. Fantastic! Please see the chart below for details: in the V2 harness, a portion of V1 has indeed been removed.

Harness Design with Opus 4.6

Based on this experience, the blog describes the following lessons:

“the better the models get, the more space there is to develop harnesses that can achieve complex tasks beyond what the model can do at baseline.”

“From this work, my conviction is that the space of interesting harness combinations doesn't shrink as models improve. Instead, it moves, and the interesting work for AI engineers is to keep finding the next novel combination.”

In other words, I believe this means: "As the capabilities of generative AI improve, the number of things that can be solved by a standalone baseline model increases, making parts of existing harnesses unnecessary. However, as the capability of the baseline model rises, tasks that were previously unreachable become solvable by improving the harness design." If the things we can do with new generative AI models continue to increase, our opportunities for harness design will also grow, and it looks like we will be kept quite busy.

 

What did you think? As the capabilities of generative AI rise, it is expected that new harness designs will be required to push those capabilities to their limits. It seems there will be plenty to do, at least until AGI is realized. ToshiStats will continue to feature harness designs, which are the key to improving the accuracy of AI agents. Stay tuned!

 
 

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) Harness design for long-running application development, Engineering at Anthropic, Mar 24, 2026


What Will White-Collar Jobs Be Like in 2030? What Should We Do Now?

As many of you may know, Dario Amodei has issued a warning to people. Roughly speaking, he stated, "The demand for entry-level jobs, such as those performed by new graduates, will be cut in half. This will become a reality within the next one to five years." This is shocking news, and the fact that it came from the CEO of a company actually developing generative AI has made it a global topic of discussion. In this article, I would like to delve deeper into this matter.

 

1. Dario Amodei's Warning

He is the co-founder and CEO of Anthropic, a U.S. company developing generative AI. He holds a Ph.D. in Physics from Princeton University, and from what I've seen, he strikes me more as a researcher than a business executive. I've been following his statements for the past two years, and I remember them being relatively conservative. I thought they were consistent with his researcher-like nature. However, this time he stated, "We are not keeping up with the pace of AI evolution," and "Unemployment rates will be 10% to 20%" (1), which shocked the world. I don't recall similar warnings from other frontier model development companies like OpenAI or Google DeepMind. This is why his latest statement garnered so much attention.

 

2. Current Performance of Generative AI

Currently, generative AI indeed possesses sufficient ability to handle entry-level tasks. As I mentioned before, Google Gemma 3, an open-source generative AI, achieved an accuracy of around 80% without any specific tuning for a 6-class classification task of bank customer complaints. Typically, relatively simple tasks like "Which product does this complaint relate to?" are assigned to new employees, and they learn the ropes through these assignments. However, with generative AI's performance reaching this level, management will undoubtedly lose the incentive to assign tasks to new employees at a cost. It's not yet clear whether the impact will be as significant as half of entry-level jobs disappearing, but given that even free generative AI can achieve around 80% accuracy today, a considerable impact is inevitable.
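The "around 80%" figure above is plain classification accuracy: the share of complaints assigned the correct product. A minimal evaluation loop, with toy predictions and labels of my own invention standing in for the model's actual outputs:

```python
# Hedged sketch: computing accuracy for a multi-class complaint classifier.
def accuracy(predictions: list, truths: list) -> float:
    assert len(predictions) == len(truths), "lists must align"
    hits = sum(p == t for p, t in zip(predictions, truths))
    return hits / len(truths)

# Toy data: 4 of 5 predictions match the true product label.
preds  = ["Mortgage", "Student loan", "Credit card", "Mortgage", "Credit card"]
truths = ["Mortgage", "Student loan", "Credit card", "Credit card", "Credit card"]
print(accuracy(preds, truths))  # 0.8
```

For a 6-class task, random guessing would score about 17%, which is what makes an untuned 80% notable.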

 

3. So, What Should We Do?

There is a division of opinion among experts regarding when AGI (Artificial General Intelligence), with capabilities equivalent to human experts, will appear. The most common estimate seems to be around 2030, but honestly, it is not clear. If so, we have about five years. In any case, we need to adapt our skills to the advent of AGI.

Past computers could not be instructed or managed without a programming language. With the emergence of ChatGPT in November 2022, however, generative AI can now be instructed in natural language, through "prompts." Prompting is not a simple matter, though. It is an extremely delicate process of finely controlling the behavior of generative AI to fit one's needs precisely, and it is not uncommon to write prompts exceeding 20 to 30 lines. While I cannot delve into the detailed techniques here, it is certainly a skill that requires logical writing, and acquiring it takes time and individual training, even though prompts can be written in English or Japanese. Given that open-source and free generative AI models are rapidly improving, it is imperative for us, as users, to learn "prompting," the method of controlling them, regardless of our position or industry.
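One practical habit behind those 20-to-30-line prompts is assembling them from named sections rather than writing them freehand. The sketch below is my own illustration of that idea; the section names and wording are hypothetical, not a prescribed format.

```python
# Hedged sketch: building a long, structured prompt from labeled sections.
SECTIONS = {
    "role": "You are an assistant that classifies bank customer complaints.",
    "constraints": [
        "Choose exactly one product label.",
        "Do not explain your reasoning in the answer.",
        "If the complaint is ambiguous, choose the most specific product.",
    ],
    "output_format": "Reply with the product name only, on a single line.",
}

def assemble_prompt(sections: dict) -> str:
    lines = [sections["role"], "", "Constraints:"]
    lines += [f"- {c}" for c in sections["constraints"]]
    lines += ["", sections["output_format"]]
    return "\n".join(lines)
```

Keeping role, constraints, and output format as separate pieces makes a long prompt easier to revise one requirement at a time, which is exactly the kind of logical writing the paragraph above describes.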

 

What do you think? It's good that Dario Amodei's warning has sparked more active discussion. As I mentioned in my previous blog post, generative AI is on the verge of implementing recursive self-improvement, gaining the ability for computers to improve themselves. The evolution of generative AI will accelerate further in the future. I believe the time has come to thoroughly learn prompting and prepare for the emergence of AGI. Discussions about AI and employment will continue globally. ToshiStats will keep you updated. Stay tuned!

 
 

ToshiStats Co., Ltd. offers various AI-related services. Please check them out here!



Copyright © 2025 Toshifumi Kuga. All rights reserved

1) AI company's CEO issues warning about mass unemployment, CNN, May 30, 2025

 

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.