Toshifumi Kuga

May 31, 2026

Opus 4.8, Anthropic, claude code, investment memo

A Game-Changer for Financial Analysts: How Opus 4.8 Redefines Financial Research!

Anthropic has announced Claude Opus 4.8, the latest update to its generative AI model. The update came less than 40 days after the previous one, which was something of a surprise and may indicate that the company's internal development efficiency has increased significantly. In this article, I use Claude Code together with Opus 4.8 to analyze US financial statements and create an investment memo.

1. Opus 4.8: The Most Powerful Model at Present

As always, when a new generative AI model is released, I compare its performance with existing models. The introduction page for Opus 4.8 (1) features the comparison table shown below. It is reported to have outperformed existing models in almost all areas. While strong coding capability is a tradition for the Opus series, what caught my attention was its exceptional strength in knowledge work. As indicated by the red box, it has achieved excellent results in two benchmarks that measure knowledge work capabilities.

Opus 4.8 Performance Comparison

Therefore, in this article, I would like to verify the potential of Opus 4.8 regarding knowledge work.

2. Taking on the Creation of an Investment Memo

This time, I will attempt to create an investment memo for Google using Form 10-K, the annual report filed with the US SEC (Google's filing is made by its parent company, Alphabet Inc.). An investment memo is an internal document prepared so that investors can make a final in-house decision (approval) on whether or not to invest in a specific company. Normally, financial analysts draw on their expertise to write it from source materials. This time, I would like to try automating that process.

First, I used the plan mode of Claude Code to formulate an implementation plan. I created a detailed plan this time as well. The following shows the initial part of it, but the actual plan continues further.

After reviewing the created implementation plan and confirming there were no issues, I switched Claude Code to auto mode and actually started coding. This time, the implementation was completed all at once in about 30 minutes without stopping midway. Once I gave the green light, there was no human intervention required. It was a moment where I caught a glimpse of the true capability of Opus 4.8.

Normally, you would need a "prompt" that defines and instructs how to write each section of the investment memo, but I did not need to write it myself. Here too, Opus 4.8 automatically generated the "prompts" for me. The following is an example of this, and it is well-written without missing any key points. It is truly amazing.

Generated Prompt Example
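To give a feel for what "one prompt per memo section" can look like, here is a minimal sketch in Python. It is not the prompt set that Opus 4.8 generated for me. The section keys, the guidance text, and the function names are all hypothetical, chosen only to mirror the sections discussed in this article (overview, investment theme, competitive advantage analysis).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoSection:
    key: str       # short identifier used as a dictionary key
    title: str     # heading shown in the memo
    guidance: str  # how this section should be written


# Hypothetical section list; a real memo would likely have more sections.
SECTIONS = (
    MemoSection("overview", "Overview",
                "Summarize the business and the recommendation in a few sentences."),
    MemoSection("investment_theme", "Investment Theme",
                "State the core reason to invest or not, concisely."),
    MemoSection("competitive_advantage", "Competitive Advantage Analysis",
                "Explain what protects the company's position and how durable it is."),
)


def build_section_prompt(section, company, filing_excerpt, language="English"):
    """Assemble the prompt text for one memo section."""
    return (
        f"You are drafting the '{section.title}' section of an investment memo on {company}.\n"
        f"Write in {language}.\n"
        f"Guidance: {section.guidance}\n"
        "Use only facts found in the Form 10-K excerpt below, and say so when the excerpt is silent.\n"
        "--- FORM 10-K EXCERPT ---\n"
        f"{filing_excerpt.strip()}\n"
        "--- END EXCERPT ---"
    )


def build_all_prompts(company, filing_excerpt, languages=("English", "Japanese")):
    """Return one prompt per (section, language) pair."""
    return {
        (section.key, language): build_section_prompt(section, company, filing_excerpt, language)
        for section in SECTIONS
        for language in languages
    }


if __name__ == "__main__":
    prompts = build_all_prompts("Google", "<text extracted from the Form 10-K>")
    print(prompts[("competitive_advantage", "English")])
```

Keeping the guidance in one small table like this is what makes the "just rewrite the prompts" workflow practical: a domain expert can deepen a single section without touching the rest of the pipeline.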

3. Reviewing the Investment Memo

In this experiment, I had the investment memo created in both English and Japanese and output as PDF files. Let's take a look at the content right away. The opening section summarizes the overview beautifully, as shown below. It looks very sophisticated.

Investment memo by Claude Code with Opus 4.8

It also summarizes the investment theme concisely as follows.

The investment memo this time exceeds 10 pages in total, so I cannot introduce the full text here, but I would like to look specifically at the section on competitive advantage analysis.

I think it is very well summarized. If the process can be automated to this extent, humans only need to review it, which will dramatically increase work efficiency. Furthermore, if you desire a deeper analysis leveraging domain knowledge, you can simply rewrite the "prompts." This means you can proceed based on existing work, allowing for smooth and efficient collaboration between humans and generative AI. It is wonderful. By the way, please understand that these texts were created for educational purposes and cannot be used for making investment decisions.

What did you think? I took on the creation of an investment memo using Claude Code and Opus 4.8, and the results exceeded my expectations. I believe the performance of Opus 4.8 in knowledge work was outstanding. However, I would like to emphasize that a final review by a human is absolutely necessary. It is important to bear in mind that hallucinations can still occur. Moving forward, cooperation between generative AI and humans will continue to be essential.

At ToshiStats, we plan to take on various tasks using Opus 4.8. Stay tuned!

You can enjoy our video news “ToshiStats AI Weekly Review” from this link, too!

1) Introducing Claude Opus 4.8, May 28, 2026, Anthropic PBC

Notice: This is for educational purposes only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.

Toshifumi Kuga

May 2, 2026

Google, TPU, Anthropic, Google DeepMind

The Race for AI Supremacy: Will Google Come Out on Top?

The AI market is a battlefield where diverse players like OpenAI, Anthropic, NVIDIA, Alibaba, and Tencent are engaged in fierce competition. Today, I want to focus on Google and delve into whether they can truly seize hegemony in the AI market in the near future.

1. Google’s Secret Weapon: The 8th Generation TPU

Google recently announced its 8th generation TPU (1). The most significant feature of this generation is the separation into independent chips for training and inference. What particularly caught my attention is the remarkable improvement in inference speed. As highlighted in the red frame, the computation speed has increased approximately tenfold compared to the previous generation. While I found myself wondering, "Can it really get this much faster in just one year?", I am eager to try it out as soon as possible. It is expected to debut later this year.

With TPU inference becoming this fast, we might see the same generative AI models produce results significantly quicker when running on TPUs. Currently, among public clouds, only Google Cloud offers the TPU option, which is likely to further boost Google Cloud's competitive edge.

2. Massive Investment in Anthropic

Currently, the most popular frontier model in the AI market is Claude, developed by Anthropic. It is exceptionally strong, particularly in the B2B sector. Recently, Google reportedly committed to a massive investment in Anthropic (up to $40 billion, albeit with conditions) (2). From the perspective of frontier model development, Google and Anthropic are competitors. On the other hand, Anthropic is a major customer for Google Cloud.

Therefore, this massive investment holds significant strategic weight. If the likelihood of Claude’s training and inference being performed on TPUs increases, so does the potential for Google to generate revenue from it. This can be viewed as a form of risk diversification for Google. While it would be ideal if Google’s own frontier model, Gemini, maintained a dominant market share, rivals are constantly launching high-performance models. Practically speaking, it is a rational risk-hedging strategy to have even competing models run on TPUs—thereby collecting Google Cloud usage fees—or to aim for capital gains through equity stakes in those invested companies. In any case, we must keep a close eye on the collaboration between Google and Anthropic.

3. Google DeepMind’s Technical Prowess and Google’s Product Ecosystem

One cannot discuss Google’s AI without mentioning Gemini. Developed by Google DeepMind, this frontier model is natively multimodal and has made headlines for its high performance with every new release. The current model is Gemini 3, and there is anticipation that a next-generation model might be announced at Google I/O, the annual event starting on May 19, 2026. It’s very exciting.

However, Gemini is not the only generative AI from Google DeepMind. Boasting one of the most diverse arrays of models among all AI labs, their portfolio includes image and video generation models, as well as world models like Genie 3 (3).

Furthermore, Google possesses a vast amount of data required for model generation. Google already operates various products globally, and the data harvested from them is immense—YouTube alone is a clear example. Compared to many AI labs that must build their user bases from scratch, Google has an overwhelming advantage. The combination of "Google DeepMind’s technical prowess + data obtained from various products" is unparalleled.

What do you think? Today, we took a deep dive into Google. With powerful technology spanning not just AI model development but various other fields, Google’s strength feels overwhelming. They will likely continue to lead the AI market. Conversely, they are so strong that one might even worry about when they might run afoul of antitrust laws. What are your thoughts?

ToshiStats will continue to cover Google in the future. Stay tuned!

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) Our eighth generation TPUs: two chips for the agentic era, Google, Apr 23, 2026
2) Google to invest up to $40B in Anthropic in cash and compute, TechCrunch, April 24, 2026
3) Genie 3: A new frontier for world models, Google, August 5, 2025

Toshifumi Kuga

April 25, 2026

Auto Mode, agentic coding, Anthropic, claude code

Opus 4.7’s Auto Mode: The Secret Weapon for Boosting Productivity

Anthropic has released the frontier generative AI model, Opus 4.7. This update comes just over two months after the release of Opus 4.6, highlighting the accelerating pace of technological progress. In this article, I will dive deep into the remarkable new feature added alongside Opus 4.7, "Auto Mode," by utilizing it to build a machine learning model for credit default prediction.

1. What is Auto Mode?

Boris Cherny, the developer of Claude Code, an agentic coding environment, commented on "Auto Mode" as follows (1):

Auto mode = no more permission prompts
In the past, you either had to babysit the model while it did these sorts of long tasks, or use --dangerously-skip-permissions. We recently rolled out auto mode as a safer alternative. In this mode, permission prompts are routed to a model-based classifier to decide whether the command is safe to run. If it's safe, it's auto-approved.

In short, this feature reduces the frequency of "Please approve" requests that appear during long agentic coding sessions, thereby boosting productivity. For someone like me, who handles dozens of these approval requests daily, this is a very welcome addition.

You can verify the "Auto Mode" status via the indicator at the bottom left of the Claude Code interface.

When you first enable it, a notice will appear; I recommend giving it a thorough read.

2. Building a Default Prediction Model with Auto Mode

I used Claude Code’s "Auto Mode" to actually build a default prediction model. For this project, I used data from the Home Credit Default Risk competition (2) on Kaggle.

First, I created an implementation plan using Plan Mode. Through dialogue with Claude Code, a structured plan was established.

At this stage, Claude Code asks, "Would you like to use Auto Mode?" and answering "Yes" initiates the process.

The Implementation Process: I watched to see how many approval requests would appear before completion.

After approximately 90 minutes, the system announced, "Finished." Remarkably, not a single approval request was triggered. This makes the work significantly easier and the implementation process much more enjoyable.

Accuracy Validation: I checked the evaluation metric on Kaggle. The result was an AUC = 0.79632. This is my personal best for a single model without using ensembles. It ranks within the top 4.2% of the competition. Achieving this score without any manual intervention after the initial planning phase is truly astonishing.
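For readers who have not worked with this metric, AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. The following is a minimal sketch of that definition in plain Python, using the rank-sum formula with ties counted as half. It is only an illustration of the metric, not the code Claude Code wrote and not Kaggle's scoring implementation; the function name and the tiny example data are my own.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    labels: iterable of 0/1 values (1 = default)
    scores: iterable of predicted scores, higher = more likely to default
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the group of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data for illustration only.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why small differences in the third decimal place matter on a leaderboard.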

3. Auto Mode and Productivity in Data Analysis

While Auto Mode makes implementation effortless, its true power lies elsewhere. Because the frequency of approval requests has decreased so dramatically, it is now feasible to work with parallel computing—building multiple models simultaneously.

Whether in Kaggle competitions or practical business scenarios, we are often required to improve accuracy within a limited timeframe. If parallel computing becomes this easy, increasing productivity by 5x to 10x is no longer just a dream. It is a challenge well worth taking.
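As a rough picture of what "building multiple models simultaneously" means in code, here is a minimal sketch that runs several experiment configurations concurrently and keeps the best one. Everything in it is an assumption for illustration: `run_experiment` is a placeholder with a made-up scoring formula standing in for a real training run, and the configuration names are hypothetical. In practice each experiment might be a separate Claude Code session or a separate process rather than a thread.

```python
from concurrent.futures import ThreadPoolExecutor


def run_experiment(config):
    """Placeholder for one training run.

    Returns (name, validation_score). The score below is a made-up stand-in,
    not a real model evaluation.
    """
    score = 1.0 - abs(config["learning_rate"] - 0.05) - 0.001 * abs(config["depth"] - 6)
    return config["name"], score


def run_in_parallel(configs, max_workers=4):
    """Run all experiments concurrently and return (best_result, all_results)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as the input configs.
        results = list(pool.map(run_experiment, configs))
    best = max(results, key=lambda result: result[1])
    return best, results


if __name__ == "__main__":
    configs = [
        {"name": "shallow", "learning_rate": 0.01, "depth": 4},
        {"name": "baseline", "learning_rate": 0.05, "depth": 6},
        {"name": "deep", "learning_rate": 0.10, "depth": 8},
    ]
    best, results = run_in_parallel(configs)
    print("best:", best[0])
```

The point of the sketch is the shape of the workflow: once no run is waiting on a human to click "approve," the wall-clock time is set by the slowest experiment rather than by the sum of all of them.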

Conclusion

Auto Mode has simplified parallel computing and opened a new path toward enhanced productivity. At ToshiStats, we will continue to explore case studies using Auto Mode.

Stay tuned!

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) https://x.com/bcherny/status/2044847848035156457, Boris Cherny, Anthropic
2) Home Credit Default Risk, Kaggle

Toshifumi Kuga

April 18, 2026

Claude Managed Agents, AI agent, Anthropic, claude code

Revolutionizing Enterprise AI: The Power of Claude Managed Agents

Anthropic, a leader in generative AI, has announced "Claude Managed Agents," an AI agent hosting service. This service appears to offer significant advantages for enterprises utilizing AI agents, so let’s dive deeper into what it’s all about.

1. What is "Claude Managed Agents"?

First, what exactly is "Claude Managed Agents"? Let’s look at a quote from Anthropic's technical blog (1):

Harnesses encode assumptions that go stale as models improve. Managed Agents—our hosted service for long-horizon agent work—is built around interfaces that stay stable as harnesses change.

It seems "Claude Managed Agents" refers to an AI agent infrastructure designed for stable, long-term operation, even as underlying models are updated. A key concept here—which is also the title of their blog post—is "Decoupling the brain from the hands."

The solution we arrived at was to decouple what we thought of as the “brain” (Claude and its harness) from both the “hands” (sandboxes and tools that perform actions) and the “session”

Because the functions are separated, if the system stops, you only need to fix the specific affected part to achieve a quick recovery. This certainly looks promising.

2. Creating a Customer Complaint Classification Agent with "Claude Managed Agents"

Descriptions alone don't quite capture the experience, so let’s try running "Claude Managed Agents" ourselves. First, we enter a prompt into the box on the bottom left.

For this test, we will create an agent to classify bank customer complaints. I have instructed it to select one of six financial products. Immediately, a configuration file is generated as shown below. Next, we create the agent.

The agent is now created. Next, we set up the environment.

The environment is ready. Now, we start a session.

The session has begun.

The preparation was finished in no time. There is nothing technically difficult about this; it’s just a matter of clicking buttons. Let's test it out immediately. I'll enter a bank customer complaint as follows:

The result came back as "Student loan." Correct!

Now, let’s try one more.

It came back as "Mortgage". Correct!

It’s working perfectly. All I did was provide a prompt instructing the AI agent on what to do. The rest was handled almost automatically by "Claude Managed Agents." This is impressive.
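If you wanted to wire such an agent into a downstream system, you would typically pin the list of allowed labels and check the agent's reply against it. The sketch below shows one way to do that in Python. It does not call Claude Managed Agents or any Anthropic API. Only "Student loan" and "Mortgage" come from my test above; the other four product names, and both function names, are assumptions added for illustration.

```python
# "Student loan" and "Mortgage" appear in the test above; the remaining four
# labels are hypothetical placeholders.
PRODUCTS = (
    "Student loan",
    "Mortgage",
    "Credit card",
    "Vehicle loan",
    "Checking or savings account",
    "Debt collection",
)


def build_instruction(products=PRODUCTS):
    """Build the instruction text that asks for exactly one product label."""
    options = "\n".join(f"- {product}" for product in products)
    return (
        "Classify the bank customer complaint into exactly one of the following products.\n"
        f"{options}\n"
        "Reply with the product name only."
    )


def parse_label(reply, products=PRODUCTS):
    """Map a free-text reply to one of the allowed labels, or None if it does not match."""
    cleaned = reply.strip().strip("\"'.").casefold()
    for product in products:
        if cleaned == product.casefold():
            return product
    return None


if __name__ == "__main__":
    print(build_instruction())
    print(parse_label(' "Student loan." '))
    print(parse_label("Insurance"))
```

Returning `None` for anything outside the list gives the calling system a clear signal to retry or to send the complaint to a human, rather than silently accepting an unexpected label.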

3. Easy Enterprise Scaling: The Rakuten Success Story

Now, let's look at an example of a Japanese company that used "Claude Managed Agents" to scale its AI agents: Rakuten, the e-commerce giant. By switching from in-house infrastructure development to "Claude Managed Agents," they succeeded in deploying AI agents across the company with overwhelming speed.

“Deployed Claude Managed Agents across product, sales, marketing, finance within one week” (2)

It is particularly notable that business-side staff, not just engineers, are actively involved. It truly sounds like a company-wide initiative. Wonderful! I look forward to seeing more Japanese companies follow this lead.

"Claude Managed Agents" Success Story: Rakuten

How was that? Between the rapid development enabled by "Claude Managed Agents" and the reduced maintenance burden associated with updating frontier models, this feels like a paradigm shift in enterprise AI. While concerns about vendor lock-in remain, for companies that prioritize speed above all else, "Claude Managed Agents" appears to be an ideal service.

ToshiStats will continue to cover AI agent development in the corporate world. Stay tuned!

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) Scaling Managed Agents: Decoupling the brain from the hands, Anthropic
2) Rakuten accelerates development with Claude Code, Anthropic

Toshifumi Kuga

April 3, 2026

harness, agentic coding, claude code, Anthropic

Navigating the Evolution of Generative AI: Insights from Anthropic

Every week, a variety of generative AI updates are released, and it feels as though this pace will only continue to accelerate. On the other hand, many people may be feeling lost, wondering how exactly they should navigate these changes. Therefore, in this post, I would like to explore some hints from Anthropic's technical blog (1).

1. Experiments at Anthropic

Mr. Prithvi Rajasekaran from the Labs team has provided a detailed report on several implementation experiments.

The experiments consisted of three projects: front-end design development, full-stack 2D retro game development, and Digital Audio Workstation (DAW) development. This time, I would like to focus specifically on the full-stack 2D retro game development. Through various development and implementation processes, they observed cases where long-running agentic coding failed. A common factor was that the AI often overestimated incomplete implementations, judging them to be at a sufficient level when they were actually still unfinished. They believed that unless this was improved, it would be impossible to achieve satisfactory results in long-running agentic coding.

2. The Key Technology for Success

To address this, a "harness" design consisting of a pair of a Generator and an Evaluator was introduced. This was reportedly inspired by a technology well-known in image generation called Generative Adversarial Networks (GANs). For more details, please see below. In short, the model does not evaluate its own work.

A loop was established between the Generator and the Evaluator, in which flawed implementations were subjected to rigorous criticism. Naturally, this took a significant amount of time, and costs rose roughly twentyfold. However, the quality improved by more than the added cost would suggest, so the return on investment was clearly sufficient.

Performance Comparison: Single Agent vs. Full Harness
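The Generator and Evaluator loop described above can be sketched in a few lines of Python. This is my own simplified illustration of the idea that the model does not grade its own work, not the harness from Anthropic's blog. The function names, the feature list, and the toy generator and evaluator are all hypothetical stand-ins for real model calls.

```python
from typing import Any, Callable, Optional, Tuple


def run_harness(
    task: str,
    generate: Callable[[str, Optional[Any], Optional[str]], Any],
    evaluate: Callable[[str, Any], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Any, int, bool]:
    """Alternate between a generator and a separate evaluator.

    Returns (final_draft, rounds_used, passed).
    """
    draft = None
    feedback = None
    for round_no in range(1, max_rounds + 1):
        # The generator sees its previous draft and the evaluator's criticism.
        draft = generate(task, draft, feedback)
        # A separate evaluator decides whether the work is really finished.
        passed, feedback = evaluate(task, draft)
        if passed:
            return draft, round_no, True
    return draft, max_rounds, False


# Toy stand-ins so the loop can run without any model.
REQUIRED_FEATURES = ("title screen", "player movement", "scoring")


def toy_generate(task, previous, feedback):
    """Start with one feature, then add whatever the evaluator reported missing."""
    if previous is None:
        return ["title screen"]
    return previous + [feedback]


def toy_evaluate(task, draft):
    """Fail the draft and name the first missing feature, if any."""
    missing = [feature for feature in REQUIRED_FEATURES if feature not in draft]
    if missing:
        return False, missing[0]
    return True, ""


if __name__ == "__main__":
    draft, rounds, passed = run_harness("2D retro game", toy_generate, toy_evaluate)
    print(draft, rounds, passed)
```

Each extra round is another pair of model calls, which is where the additional time and cost come from. The `max_rounds` cap is the simplest way to keep that cost bounded.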

3. Gains from the Update from Opus 4.5 to 4.6

While the AI engineers were continuing to refine the harness, an update for the generative AI model, Opus, was released, moving the version from 4.5 to 4.6. The performance improvement in Opus 4.6 was remarkable, and as a result, part of the harness that had been necessary for Opus 4.5 became redundant. This allowed the implementation to become simpler. Fantastic! Please see the chart below for details. In the V2 harness, a portion of V1 has indeed been removed.

Based on this experience, the blog describes the following lessons:

“the better the models get, the more space there is to develop harnesses that can achieve complex tasks beyond what the model can do at baseline.”

“From this work, my conviction is that the space of interesting harness combinations doesn't shrink as models improve. Instead, it moves, and the interesting work for AI engineers is to keep finding the next novel combination.”

In other words, I believe this means: "As the capabilities of generative AI improve, the number of things that can be solved by a standalone baseline model increases, making parts of existing harnesses unnecessary. However, as the capability of the baseline model rises, tasks that were previously unreachable become solvable by improving the harness design." If the things we can do with new generative AI models continue to increase, our opportunities for harness design will also grow, and it looks like we will be kept quite busy.

What did you think? As the capabilities of generative AI rise, it is expected that new harness designs will be required to push those capabilities to their limits. It seems there will be plenty to do, at least until AGI is realized. ToshiStats will continue to feature harness designs, which are the key to improving the accuracy of AI agents. Stay tuned!

You can enjoy our video news ToshiStats AI Weekly Review from this link, too!

1) Harness design for long-running application development, Engineering at Anthropic. Mar 24, 2026

Toshifumi Kuga

June 8, 2025

white collar, generative ai, recursive self-improvement, Anthropic

What Will White-Collar Jobs Be Like in 2030? What Should We Do Now?

As many of you may know, Dario Amodei has issued a warning to people. Roughly speaking, he stated, "The demand for entry-level jobs, such as those performed by new graduates, will be cut in half. This will become a reality within the next one to five years." This is shocking news, and the fact that it came from the CEO of a company actually developing generative AI has made it a global topic of discussion. In this article, I would like to delve deeper into this matter.

1. Dario Amodei's Warning

He is the co-founder and CEO of Anthropic, a U.S. company developing generative AI. He holds a Ph.D. in Physics from Princeton University, and from what I've seen, he strikes me more as a researcher than a business executive. I've been following his statements for the past two years, and I remember them being relatively conservative. I thought they were consistent with his researcher-like nature. However, this time he stated, "We are not keeping up with the pace of AI evolution," and "Unemployment rates will be 10% to 20%" (1), which shocked the world. I don't recall similar warnings from other frontier model development companies like OpenAI or Google DeepMind. This is why his latest statement garnered so much attention.

2. Current Performance of Generative AI

Currently, generative AI indeed possesses sufficient ability to handle entry-level tasks. As I mentioned before, Google Gemma 3, an open-source generative AI, achieved an accuracy of around 80% without any specific tuning for a 6-class classification task of bank customer complaints. Typically, relatively simple tasks like "Which product does this complaint relate to?" are assigned to new employees, and they learn the ropes through these assignments. However, with generative AI's performance reaching this level, management will undoubtedly lose the incentive to assign tasks to new employees at a cost. It's not yet clear whether the impact will be as significant as half of entry-level jobs disappearing, but given that even free generative AI can achieve around 80% accuracy today, a considerable impact is inevitable.

3. So, What Should We Do?

Experts are divided on when AGI (Artificial General Intelligence), with capabilities equivalent to human experts, will appear. The most common estimate seems to be around 2030, but honestly, it is not clear. If that estimate holds, we have about five years. In any case, we need to adapt our skills to the advent of AGI. In the past, computers could not be instructed or managed without a programming language. Since the emergence of ChatGPT in November 2022, however, generative AI can be instructed in natural language through "prompts." Prompting is not a simple matter, though. It is an extremely delicate process of finely controlling the behavior of generative AI to fit one's needs precisely, so it is not uncommon to write prompts exceeding 20 to 30 lines. While I cannot delve into the detailed techniques here, writing prompts logically is certainly a skill. Even though prompts can be written in English or Japanese, acquiring this skill requires time and individual training. Given that open-source and free generative AIs are rapidly improving in performance, it is imperative for us, as users, to learn "prompting," the method of controlling them, regardless of our position or industry.

What do you think? It's good that Dario Amodei's warning has sparked more active discussion. As I mentioned in my previous blog post, generative AI is on the verge of implementing recursive self-improvement, gaining the ability for computers to improve themselves. The evolution of generative AI will accelerate further in the future. I believe the time has come to thoroughly learn prompting and prepare for the emergence of AGI. Discussions about AI and employment will continue globally. ToshiStats will keep you updated. Stay tuned!

ToshiStats Co., Ltd. offers various AI-related services. Please check them out here!

1) AI company's CEO issues warning about mass unemployment, CNN, May 30, 2025

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.