
Unlocking Recursive Self-Improvement via Meta-Harness

Discussion has recently been building around how much an AI agent's performance can be improved by optimizing what information it is given and when. In this post, drawing on a recent research paper, we will explore the possibility of "Recursive Self-Improvement of AI Agents," in which agents improve their own performance. Let’s dive in.

 

1. Meta-Harness: A New Methodology for Harness Construction

A paper (1) from Stanford University has introduced a novel approach to building harnesses that significantly boosts accuracy. I believe its two major features are as follows:

  • Full access to past information

  • Adoption of Claude Code

The paper defines a "harness" as follows:

The performance of large language model (LLM) systems depends not only on model weights, but also on their harness: the code that determines what information to store, retrieve, and present to the model.

Simply put, a harness is the mechanism surrounding the generative AI model that controls what data reaches it in order to maximize its performance. For an AI agent to build this harness itself, it appears to need the widest possible access to data.

Running the loop shown below makes "Recursive Self-Improvement" possible: the agent learns from past failures to improve itself.

                   Meta-Harness
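To make the loop concrete, here is a minimal sketch in Python. It is my own illustration, not code from the paper. The names Attempt, improve_harness, toy_propose and toy_evaluate are all hypothetical, and the toy proposer is a trivial stand-in for a real coding agent. The one point it tries to show is that the proposer receives the entire uncompressed history (code, score and raw log of every past attempt) at every step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """One past candidate: its source, its score, and its raw execution log."""
    harness_code: str
    score: float
    log: str


def improve_harness(
    propose: Callable[[List[Attempt]], str],
    evaluate: Callable[[str], Tuple[float, str]],
    seed_harness: str,
    iterations: int,
) -> Tuple[Attempt, List[Attempt]]:
    """Propose, evaluate, record, repeat. Nothing in the history is summarized."""
    score, log = evaluate(seed_harness)
    history = [Attempt(seed_harness, score, log)]
    for _ in range(iterations):
        candidate = propose(history)      # proposer sees every past attempt
        score, log = evaluate(candidate)  # run the candidate harness
        history.append(Attempt(candidate, score, log))
    return max(history, key=lambda a: a.score), history


# Toy stand-ins (assumptions for illustration only): a "harness" is the
# string "k=<int>", and the score peaks at k=7.
def toy_evaluate(harness_code: str) -> Tuple[float, str]:
    k = int(harness_code.split("=")[1])
    score = float(-((k - 7) ** 2))
    return score, f"ran with k={k}, score={score}"


def toy_propose(history: List[Attempt]) -> str:
    # A real proposer would read the code and logs; this one only nudges
    # the best setting seen so far.
    best = max(history, key=lambda a: a.score)
    k = int(best.harness_code.split("=")[1])
    return f"k={k + 1}"


best, history = improve_harness(toy_propose, toy_evaluate, "k=0", iterations=10)
print(best.harness_code, best.score, len(history))
```

In a real setting, propose would be a call to a coding agent and evaluate would run a benchmark. Both are left as plain callables here so the loop itself stays visible.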

 

2. Full Access to Information: The Secret to Improved Accuracy

Various methods for constructing harnesses existed before, but humans had to summarize or compress large amounts of information in some form. As a result, critical information was often lost along the way, creating a bottleneck when aiming for higher accuracy.

"Meta-Harness" addresses this by granting the proposer access to all past logs and files. Because the agent can see everything, with nothing hidden or compressed away, the bottleneck disappears. As a result, it achieved excellent performance, placing on the Pareto frontier as shown below.

                   Pareto Frontier

This graph illustrates the relationship between additional information (context) and accuracy. The closer a point is to the top-left, the higher the accuracy achieved with less information, which signifies superior performance.
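For readers who want to check which points sit on such a frontier, here is a small sketch. The data points and the function name pareto_frontier are made up for illustration and are not taken from the paper. A point is kept only if no other point reaches at least the same accuracy with the same or less context.

```python
from typing import List, Tuple

# (name, context_tokens, accuracy): fewer tokens and higher accuracy are better.
Point = Tuple[str, int, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by increasing context size."""
    # Sort by cost ascending; among equal costs, the most accurate comes first.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier: List[Point] = []
    best_accuracy = float("-inf")
    for point in ordered:
        # A point survives only if it beats every cheaper (or equal-cost) point.
        if point[2] > best_accuracy:
            frontier.append(point)
            best_accuracy = point[2]
    return frontier


# Hypothetical measurements, for illustration only.
candidates = [
    ("A", 1000, 0.60),
    ("B", 4000, 0.72),
    ("C", 4000, 0.65),   # same cost as B but less accurate
    ("D", 9000, 0.70),   # more context than B, yet less accurate
    ("E", 12000, 0.80),
]

frontier_names = [name for name, _, _ in pareto_frontier(candidates)]
print(frontier_names)
```

Here C and D drop out because B is at least as cheap and more accurate than both, which is exactly what "closer to the top-left" means in the graph.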

 

3. The Emergence of Claude Code

The proposer plays a central role in "Meta-Harness." Let’s look at the details through pseudo-code, where P represents the proposer. Looking at the section outlined in red, we can see that a new harness is being created by the proposer.

                   Pseudo-code

In this context, the proposer is Claude Code. In other words, the new harness is produced by drawing on the capabilities of Claude Code. Claude Code is already being used in many fields, and here it appears once again in a leading role, which is truly impressive. This suggests that future AI research will increasingly be driven by AI agents like Claude Code. We are truly at the cutting edge of the era.

 

Conclusion

As we have seen, providing Claude Code with maximum information access enables the construction of high-performance harnesses. Of course, detailed tuning is necessary, so I highly recommend reading the full paper.

At ToshiStats, we will continue to cover harness design, which is the key to improving AI agent accuracy. Stay tuned!

 


1) Meta-Harness: End-to-End Optimization of Model Harnesses, Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, Chelsea Finn, Mar 30, 2026

 

Notice: This is for educational purpose only. ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the report, the codes and the software.

 

Google DeepMind Announces "AlphaEvolve," Hinting at an Intelligence Explosion!

Google DeepMind has unveiled a new research paper today, introducing "AlphaEvolve" (1), a coding agent that leverages evolutionary computation. It's already garnering significant attention for its broad applicability and proven successes, such as discovering more efficient methods for matrix multiplication and improving efficiency in Google's data centers. Let's dive a little deeper into what makes it so remarkable.

 

1. LLMs Empowered with Evolutionary Computation

In a nutshell, "AlphaEvolve" can be described as an "agent that leverages LLMs to the fullest to evolve code." To briefly touch upon "evolutionary computation," it's a family of algorithms that improve a system by mimicking biological evolution, replicating genetic crossover and mutation on a computer. Traditionally, the function responsible for this, called an "Operator," had to be set by humans. "AlphaEvolve" automates the creation of Operators with the support of LLMs, enabling more efficient code generation. That sounds incredibly powerful! Evolutionary computation itself isn't new, with practical applications dating back decades, but its combination with LLMs appears to have unlocked new capabilities. The red box in the diagram below indicates where evolutionary computation is applied.
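To show what "Operators set by humans" looks like in practice, here is a minimal sketch of classic evolutionary computation on a toy problem (maximizing the number of 1s in a bit string). This is not AlphaEvolve's code. The functions mutate and crossover are the hand-written Operators, and all names and parameter values are my own assumptions. As I read the paper, AlphaEvolve's difference is that an LLM takes over the role these two functions play here.

```python
import random
from typing import List, Tuple

Genome = List[int]


def fitness(genome: Genome) -> int:
    """Toy objective: count the 1s in the bit string."""
    return sum(genome)


def mutate(genome: Genome, rng: random.Random, rate: float = 0.05) -> Genome:
    """Hand-written mutation Operator: flip each bit with a small probability."""
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Hand-written crossover Operator: splice two parents at a random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(pop_size: int = 20, genome_len: int = 32,
           generations: int = 40, seed: int = 0) -> Tuple[int, Genome]:
    """Return (best fitness in the initial population, best final genome)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in population)
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Variation: refill the population with mutated crossovers.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print(initial_best, fitness(best))
```

Because the fitter half always survives, the best fitness can never go down from one generation to the next. How fast it goes up depends entirely on how well the two Operators suit the problem, which is the part that used to require human design.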

 

2. Continued Evolution with Meta-Prompts

I'm particularly intrigued by the "prompt_sampler" mentioned above because this is where "meta-prompts" are executed. The paper explains, "Meta prompt evolution: instructions and context suggested by the LLM itself in an additional prompt-generation step, co-evolved in a separate database analogous to the solution programs." It seems that the prompts are evolving too! The diagram below also shows that accuracy is lower without meta-prompt evolution than with it.

This is incredible! With an algorithm like this, I'd certainly want to apply it to my own tasks.
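Here is how I picture the co-evolution of prompts and programs, as a rough sketch. It is my own simplification, not AlphaEvolve's implementation. Every function name is hypothetical, and the two "LLM" functions are toy stand-ins that manipulate short strings. The idea it illustrates is that prompts live in their own database and each prompt is credited with the score of the program it helped produce.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Scored:
    text: str
    score: float


def co_evolve(
    rewrite_prompt: Callable[[str], str],
    generate_program: Callable[[str, str], str],
    evaluate: Callable[[str], float],
    seed_prompt: str,
    seed_program: str,
    steps: int,
) -> Tuple[List[Scored], List[Scored]]:
    """Keep two databases: one for programs, one for the prompts behind them."""
    seed_score = evaluate(seed_program)
    programs = [Scored(seed_program, seed_score)]
    prompts = [Scored(seed_prompt, seed_score)]
    for _ in range(steps):
        best_prompt = max(prompts, key=lambda p: p.score)
        parent = max(programs, key=lambda p: p.score)
        # Extra prompt-generation step: the "LLM" suggests a new instruction.
        new_prompt = rewrite_prompt(best_prompt.text)
        child = generate_program(new_prompt, parent.text)
        child_score = evaluate(child)
        programs.append(Scored(child, child_score))
        # The prompt is credited with the score of the program it produced.
        prompts.append(Scored(new_prompt, child_score))
    return programs, prompts


# Toy stand-ins (assumptions): a program is an integer written as a string,
# a prompt is "step=<int>", and the target value is 10.
def toy_evaluate(program: str) -> float:
    return float(-abs(int(program) - 10))


def toy_generate(prompt: str, parent: str) -> str:
    step = int(prompt.split("=")[1])
    return str(int(parent) + step)


def toy_rewrite(prompt: str) -> str:
    step = int(prompt.split("=")[1])
    return f"step={step * 2}"


programs, prompts = co_evolve(toy_rewrite, toy_generate, toy_evaluate,
                              seed_prompt="step=1", seed_program="0", steps=3)
print([p.text for p in programs], [p.text for p in prompts])
```

A real system would sample from both databases rather than always taking the best entry, but the two-database structure is the point here.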

 

3. Have We Taken a Step Closer to an Intelligence Explosion?

Approximately a year ago, researcher Leopold Aschenbrenner published an essay (2) predicting that computers would surpass human performance by 2030 as a result of an intelligence explosion. The graph below illustrates this projection. "AlphaEvolve" can be seen as having acquired the ability to improve its own performance, so it might just be a step closer to an intelligence explosion. It's hard to imagine the outcome of countless AI agents like this, each evolving independently, but it certainly feels like something monumental is on the horizon. After all, computers operate 24 hours a day, 365 days a year, so once they acquire self-improvement capabilities, their pace of evolution is likely to accelerate. Aschenbrenner refers to this as "recursive self-improvement" (p47).

 



What are your thoughts? The idea of AI surpassing humans can be a bit challenging to grasp intuitively, but just thinking about what AI agents might be like around 2027 is incredibly exciting. I'll be sure to provide updates if a sequel to "AlphaEvolve" is released in the future. That's all for now. Stay tuned!

 


1) AlphaEvolve: A coding agent for scientific and algorithmic discovery, Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, Matej Balog, Google DeepMind, May 16, 2025

2) Situational Awareness: The Decade Ahead, Leopold Aschenbrenner, June 2024


 


Copyright © 2025 Toshifumi Kuga. All rights reserved
