Toshifumi Kuga

June 1, 2025

recursive self-improvement, agent, AI, Google DeepMind, AlphaEvolve

Google DeepMind Announces "AlphaEvolve," Hinting at an Intelligence Explosion!

Google DeepMind has unveiled a new research paper today, introducing "AlphaEvolve" (1), a coding agent that leverages evolutionary computation. It's already garnering significant attention due to its broad applicability and proven successes, such as discovering more efficient methods for matrix calculations in mathematics and improving efficiency in Google's data centers. Let's dive a little deeper into what makes it so remarkable.

1. LLMs Empowered with Evolutionary Computation

In a nutshell, "AlphaEvolve" can be described as an "agent that leverages LLMs to the fullest to evolve code." To briefly touch upon "evolutionary computation," it's an algorithm that mimics the process of evolution in humans and living organisms to improve systems, replicating genetic crossover and mutation on a computer. Traditionally, the function responsible for this, called an "Operator," had to be set by humans. "AlphaEvolve" automates the creation of Operators with the support of LLMs, enabling more efficient code generation. That sounds incredibly powerful! While evolutionary computation itself isn't new, with practical applications dating back to the 2000s, its combination with LLMs appears to have unlocked new capabilities. The red box in the diagram below indicates where evolutionary computation is applied.
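To make "crossover" and "mutation" concrete, here is a minimal sketch of classic evolutionary computation with hand-written operators. It is not AlphaEvolve's method: the toy task (maximizing the number of 1s in a bit string), the function names, and all parameter values are assumptions chosen for illustration. In AlphaEvolve, as described above, the LLM takes over the role that the hand-coded `crossover` and `mutate` functions play here.

```python
import random


def fitness(bits):
    """Toy objective: count the 1s in the bit string."""
    return sum(bits)


def crossover(a, b, rng):
    """Single-point crossover: head of parent a, tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(pop_size=20, length=32, generations=40, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # keep the fitter half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        # Elitism: the current best individual survives unchanged.
        population = [population[0]] + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best fitness per generation:", history)
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next. Everything else about how quickly it improves depends on the operators, which is exactly the part that had to be designed by hand.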

2. Continued Evolution with Meta-Prompts

I'm particularly intrigued by the "prompt_sampler" mentioned above because this is where "meta-prompts" are executed. The paper explains, "Meta prompt evolution: instructions and context suggested by the LLM itself in an additional prompt-generation step, co-evolved in a separate database analogous to the solution programs." It seems that prompts are also evolving! The diagram below also shows that accuracy decreases when meta-prompt evolution is not applied compared to when it is.

This is incredible! With an algorithm like this, I'd certainly want to apply it to my own tasks.

3. Have We Taken a Step Closer to an Intelligence Explosion?

Approximately a year ago, researcher Leopold Aschenbrenner published a paper (2) predicting that computers would surpass human performance by 2030 as a result of an intelligence explosion. The graph below illustrates this projection. The latest "AlphaEvolve" can be seen as having acquired the ability to improve its own performance, which might be a step closer to an intelligence explosion. It's hard to imagine the outcome of countless AI agents like this, each evolving independently, but it certainly feels like something monumental is on the horizon. After all, computers operate 24 hours a day, 365 days a year, so once they acquire self-improvement capabilities, their pace of evolution is likely to accelerate. Aschenbrenner refers to this as "recursive self-improvement" (p47).

What are your thoughts? The idea of AI surpassing humans can be a bit challenging to grasp intuitively, but just thinking about what AI agents might be like around 2027 is incredibly exciting. I'll be sure to provide updates if a sequel to "AlphaEvolve" is released in the future. That's all for now. Stay tuned!

1) AlphaEvolve: A coding agent for scientific and algorithmic discovery, Alexander Novikov*, Ngân Vũ*, Marvin Eisenberger*, Emilien Dupont*, Po-Sen Huang*, Adam Zsolt Wagner*, Sergey Shirobokov*, Borislav Kozlovskii*, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli and Matej Balog*, Google DeepMind, 16 May, 2025

2) SITUATIONAL AWARENESS: The Decade Ahead, Leopold Aschenbrenner, June 2024

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes, the software and the contents.

Toshifumi Kuga

May 13, 2025

LLM, generative ai, ADK, agent

We Built a Customer Complaint Classification Agent with Google's New AI Agent Framework "ADK"

On April 9th, Google released a new AI agent framework called "ADK" (Agent Development Kit). It's an excellent framework that incorporates the latest multi-agent technology while also being user-friendly, allowing implementation in about 100 lines of code. At Toshi Stats, we decided to immediately try creating a customer complaint classification agent using ADK.

1. Customer Complaint Classification Task

Banks receive various complaints from customers. We want to classify these complaints based on which financial product they concern. Specifically, this is a 6-class classification task where we choose one of the following six financial products. Random guessing would yield an accuracy of only about 17% (1 in 6).

2. Implementation with ADK

Now, let's move on to the ADK implementation. We'll defer to the official documentation for file structure and other details, and instead show how to write the AI agent below. The "instruction" part is particularly important; writing this carefully improves accuracy. This is what's known as a "prompt". In this case, we've specifically instructed it to select only one from the six financial products. Other parts are largely unchanged from what's described in tutorials, etc. It has a simple structure, and I believe it's not difficult once you get used to it.
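As a rough sketch of what such an agent definition can look like, the snippet below builds the "instruction" text and passes it to ADK's `Agent` class with the `name`, `model`, `description`, and `instruction` parameters used in the ADK quickstart. Several things here are assumptions: only "mortgage" and "student loan" are named in this article, so the other four labels are placeholders; the prompt wording is hypothetical; and the model string is copied from this article and may need to be replaced with a model ID available to you. The import is guarded so the prompt-building part runs even without `google-adk` installed.

```python
# Placeholder label set: only "mortgage" and "student loan" come from the
# article; the remaining four names are hypothetical stand-ins.
PRODUCTS = [
    "mortgage",
    "student loan",
    "credit card",
    "bank account",
    "money transfer",
    "consumer loan",
]

# Model identifier as written in this article; adjust to an available model ID.
MODEL_NAME = "gemini-2.5-flash-04-17"


def build_instruction(products):
    """Build a prompt that forces a single choice from the given products."""
    options = ", ".join(products)
    return (
        "You classify customer complaints received by a bank. "
        f"Choose exactly one financial product from this list: {options}. "
        "Answer with the product name only."
    )


INSTRUCTION = build_instruction(PRODUCTS)

try:
    # Requires `pip install google-adk`.
    from google.adk.agents import Agent
except ImportError:
    Agent = None

root_agent = None
if Agent is not None:
    root_agent = Agent(
        name="complaint_classifier",
        model=MODEL_NAME,
        description="Classifies bank complaints by financial product.",
        instruction=INSTRUCTION,
    )

print(INSTRUCTION)
```

Keeping the label list in one place and generating the instruction from it makes it harder for the prompt and the evaluation labels to drift apart.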

3. Accuracy Verification

We created six classification examples and had the AI agent provide answers. In the first example, I believe it answered "student loan" based on the word "graduation." It's quite smart! In the second example, it presumably answered "mortgage" based on the phrase "prime location." ADK has a built-in UI like the one shown below, which is very convenient for testing immediately after implementation.

The generative AI model used this time, Google's "gemini-2.5-flash-04-17," is highly capable. When tasked with a 6-class classification problem using 100 actual customer complaints received by a bank, it typically achieves an accuracy of over 80%. For simple examples like the ones above, it wouldn't be surprising if it achieved 100% accuracy.
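The accuracy figure can be put next to the random-guess baseline with a few lines of Python. This is only a sketch of the bookkeeping: the function names are my own, and the tiny label lists are made-up placeholders, not the 100 real complaints mentioned above.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def random_baseline(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1 / num_classes


# Hypothetical labels, for illustration only.
actual = ["student loan", "mortgage", "mortgage", "student loan", "mortgage"]
predicted = ["student loan", "mortgage", "student loan", "student loan", "mortgage"]

print(f"agent accuracy:  {accuracy(predicted, actual):.0%}")
print(f"random baseline: {random_baseline(6):.1%}")
```

With six classes the baseline is 1/6, so an accuracy above 80% is far beyond what chance would produce.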

So, what did you think? This was our first time covering ADK, but I feel it will become popular due to its high performance and ease of use. Combined with A2A (2), which Google announced around the same time, I believe use cases will continue to increase. We're excited to see what comes next! At Toshi Stats, we will continue to build even more advanced AI agents with ADK. Stay tuned!

1) Agent Development Kit, Google, April 9th, 2025
2) Agent2Agent, Google, April 9th, 2025

Toshifumi Kuga

January 28, 2025

agent, artificial intelligence, generative ai, smolagents, data analysis

Marketing AI agents for customer targeting in telemarketing can also be easily implemented using the new library "smolagents." This looks promising!

1. Marketing AI Agent

To efficiently reach potential customers, it's necessary to target customers who are likely to purchase your products or services. Marketing activities directed at customers without needs are often wasteful and unsuccessful. However, identifying in advance which customers to focus on from a large customer list is a challenging task. Many people would like to target customers easily, without complex analysis, as long as they have customer-related data at hand. To meet that expectation, we have implemented a marketing AI agent this time. Anyone with basic Python knowledge should be able to implement it without much difficulty. The secret lies in the latest framework "smolagents" (1), which we introduced previously. Please refer to the official documentation for details.

2. Agent Predicting Potential Customers for Deposit-Taking Telemarketing

Let's actually build an AI agent. The theme is "Predicting potential customers for deposit-taking telemarketing with an AI agent using smolagents." As before, by providing data, we want the AI agent itself to internally code using Python and automatically display "the top 10 customers most likely to be successfully reached by telemarketing."

While the coding method should be taken from the official documentation, here we present the kind of prompt to write so that the AI agent predicts potential customers for deposit-taking telemarketing. The key point, as before, is to instruct it to "use sklearn's HistGradientBoostingClassifier for data analysis." This is scikit-learn's histogram-based gradient boosting classifier, highly regarded for its accuracy and ease of use.

Furthermore, as a question (instruction), we specifically add the instruction to calculate "the purchase probability of the 10 customers most likely to be successful." The input to the AI agent is in the form of "prompt + question."

Then, the AI agent automatically generates Python code like the following. The AI agent does this work instead of a human. And as a result, "the top 10 customers most likely to be successfully marketed to" are presented. Customers with a purchase probability close to 100%! Amazing!

"Top 10 customers most likely to be successfully marketed to"

In this way, the user only needs to instruct "tell me the top 10 customers most likely to be successful," and the AI agent writes the code to calculate the purchase probability for each customer. This method can also be applied to various other things. I'm looking forward to future developments.

3. Future Expectations for Marketing AI Agents

As before, we implemented it with "smolagents" this time as well. It's easy to implement, and although the behavior isn't perfect, it's reasonably stable, so we plan to actively use it in 2025 to develop various AI agents. The code from this article has been published as a notebook (2). The data used this time is relatively simple demo data with over 40,000 samples, but given the opportunity, I would like to see how the AI agent behaves with larger and more complex data. With more data, the possibilities will increase accordingly, so we can expect even more. Please look forward to the next AI agent article. Stay tuned!

1) Introducing smolagents, a simple library to build agents, Aymeric Roucher, Merve Noyan, Thomas Wolf, Hugging Face, Dec 31, 2024
2) https://github.com/TOSHISTATS/AI-agent-for-Marketing_20250125/blob/main/AI_agent_for_Marketing_20250125.ipynb

Toshifumi Kuga

January 23, 2025

Hugging Face, smolagents, generative ai, LLM, agent, artificial intelligence

I tried using the new AI agent framework "smolagents". The code is simple and easy to use, and I recommend it for AI agent beginners!

At the end of last year, Hugging Face released a new AI agent framework called "smolagents" (1). The code is simple and easy to use, and it even supports multi-agents. This time, I actually created a data analysis AI agent and tried various things. I hope it will be helpful.

1. Features of "smolagents"
The newly released "smolagents" has features that existing frameworks do not have. 1) First, it has a simple structure. You can run an AI agent by writing 3 to 5 lines of code. It's perfect for those who want to start with AI agents. 2) Also, since it was released by Hugging Face, you can easily call and use the huge number of open-source models already on the Hub. Of course, it also supports proprietary models such as GPT-4o, so you can use it with both open and closed models. 3) Finally, when you run an agent, Python code is generated and executed. Therefore, you can draw on the vast Python ecosystem, which is very convenient. Especially for those who specialize in data analysis like me, it is a perfect framework because you can use Python libraries such as sklearn.

2. An Agent for Predicting Credit Card Defaults

Now, let's actually build an AI agent. The theme is "An AI agent built with smolagents predicts credit card defaults". Normally, when building a default prediction model, you would code using machine learning libraries such as sklearn, but this time, I want to give it data and have the AI agent itself code internally using Python and automatically display the default probabilities of the first 10 customers.

For how to write the code, please refer to the official documentation, but here I would like to present what kind of prompts I actually wrote to make the AI agent predict defaults. The point is to specifically instruct it to "use sklearn's HistGradientBoostingClassifier for data analysis". This scikit-learn classifier is highly regarded for building accurate machine learning models with little effort. This is domain knowledge of data analysis, but by including that knowledge in the prompt, we expect to obtain higher accuracy.

Furthermore, as a question, I add an instruction to specifically calculate "the default probability of 10 customers". The input to the AI agent takes the form of "prompt + question".
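The "prompt + question" structure can be sketched as follows. The wording of both strings is hypothetical (the actual prompt is in the notebook (2)), and the agent call assumes the smolagents `CodeAgent` class with its `additional_authorized_imports` parameter and a Hub-backed model class, which was named `HfApiModel` in early releases and `InferenceClientModel` in later ones. `run_agent` is not called here because it needs `pip install smolagents` plus model access.

```python
# Hypothetical wording; only the quoted sklearn instruction comes from the article.
PROMPT = (
    "You are a data analyst working on a credit card dataset. "
    "Use sklearn's HistGradientBoostingClassifier for data analysis."
)
QUESTION = "Calculate the default probability of the first 10 customers."


def build_task(prompt, question):
    """Combine the standing prompt and the specific question into one task."""
    return f"{prompt.strip()}\n\n{question.strip()}"


def run_agent(task):
    """Run the task with a smolagents CodeAgent (needs smolagents and model access)."""
    from smolagents import CodeAgent

    try:
        from smolagents import InferenceClientModel as HubModel  # newer releases
    except ImportError:
        from smolagents import HfApiModel as HubModel  # early 2025 releases

    agent = CodeAgent(
        tools=[],
        model=HubModel(),
        # Let the generated code import the data analysis stack.
        additional_authorized_imports=["pandas", "numpy", "sklearn"],
    )
    return agent.run(task)


task = build_task(PROMPT, QUESTION)
print(task)
```

Separating the reusable prompt from the question means the same domain knowledge can be paired with different questions without rewriting it.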

Then, the AI agent automatically generated the following Python code. Normally, this is what I would write myself, but the AI agent does it for me. And as a result, the default probabilities for 10 people are also shown. Amazing!

In this way, the user only needs to instruct "use sklearn to calculate the default probability", and the AI agent writes the code to calculate the default probability for each customer. I tried it with default prediction this time, but I think the same approach can be applied to estimating probabilities in any business area, such as marketing, customer churn, and human resources. I'm looking forward to future developments.

3. Impressions after using "smolagents" for the first time

Until now, I used LangGraph to implement AI agents. I liked it because I could make various detailed settings, but it was necessary to code each of state, tool, node, edge, etc., and I felt that the hurdle was high for beginners to start with. After implementing it with "smolagents" this time, I found that if I coded according to the template, it would run by writing a few lines, so anyone could start. Of course, it fully meets the needs of AI developers, so I plan to actively use it in 2025 to develop various AI agents. I have published the code this time in a notebook (2). Please look forward to the next AI agent article. Stay tuned!

(1) Introducing smolagents, a simple library to build agents, Aymeric Roucher, Merve Noyan, Thomas Wolf, Hugging Face, Dec 31, 2024
(2) AI-agent-to-predict-default-of-credit-card-with-smolagent_20250121

Toshifumi Kuga

March 25, 2024

agent, AI, artificial intelligence, Claude3, generative ai

I tried the new generative AI model "Claude3 Haiku". Fast, smart, and low-priced. I want to use it as an AI agent!

On March 14th, "Claude3 Haiku" (1), the lightest model among the Claude3 generative AIs, was released and became available for use in web applications and APIs. I'm usually drawn to the highest-performing models, but this time I'd like to focus on the lightest one. Recently, algorithms that execute repetitive calculations like AI Agents have become more common. I want to use high-end models like GPT4, but they are very costly to run. So I was looking for a low-cost, high-performance model, and "Claude3 Haiku" is perfect as it costs 1/60th of the high-end model "Claude3 Opus" while still delivering excellent performance. I'd like to try it out here right away. The details of each model are as follows.
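To make the 1/60 cost ratio concrete for an agent that calls the model repeatedly, here is a small cost calculation. The per-token prices are the launch list prices as I recall them (USD per million tokens: 0.25 input and 1.25 output for Haiku, 15 input and 75 output for Opus) and should be treated as assumptions to verify against current pricing. The workload of 50 calls with 2,000 input and 500 output tokens each is made up for illustration.

```python
# USD per million tokens as (input, output). Assumed launch list prices;
# check the provider's current pricing before relying on them.
PRICES = {
    "claude-3-haiku": (0.25, 1.25),
    "claude-3-opus": (15.00, 75.00),
}


def run_cost(model, input_tokens, output_tokens, calls=1):
    """Total USD cost of `calls` requests with the given token counts each."""
    input_price, output_price = PRICES[model]
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return calls * per_call


# Hypothetical agent workload: 50 calls, 2,000 input and 500 output tokens each.
haiku = run_cost("claude-3-haiku", 2000, 500, calls=50)
opus = run_cost("claude-3-opus", 2000, 500, calls=50)

print(f"Haiku: ${haiku:.4f}")
print(f"Opus:  ${opus:.4f}")
print(f"Opus / Haiku: {opus / haiku:.0f}x")
```

Because both the input and output prices differ by the same factor under these assumed prices, the ratio stays the same regardless of how the tokens are split between input and output.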

1. First, let's test the text

I checked if "Claude3 Haiku" knows about Hiroshima-style okonomiyaki, a hyper-local Japanese food. I used to live in Hiroshima, so I know it well, and I think this answer is generally good. The Japanese is clean, so it passes for now.

Next, I asked about transportation from Tokyo to Osaka. Unfortunately, there was one clear mistake. The travel time by bus is stated as "about 4 hours and 30 minutes," but in reality, it takes around 8 hours. This is a hallucination.

Then I asked about the "Five Forces," a framework for analyzing market competitiveness. It analyzed the automotive industry, and the analysis incorporates recent examples, such as the threat of electric vehicles as substitutes, making it a good enough starting point for discussion. However, the fact that it's not in a table format is a drawback.

2. Next, let's analyze images.

First, I asked about the number of smartphones, but unfortunately, it got it wrong. It may not be good at counting.

This is a photo of the Atomic Bomb Dome in Hiroshima. It answered this perfectly. It seems to understand famous Japanese buildings.

This is a photo of a streetcar running in Hiroshima City. I think it captures it pretty well overall. However, the streetcars don't run solely for tourists, so the explanation may be somewhat incomplete.

This is a flight information board at Haneda Airport. It perfectly understands the detailed information. Excellent.

Counting the number of cars in a parking lot is a difficult task for generative AI. This time it answered 60 cars, but there are actually 48. That's a bit disappointing, because with slightly better accuracy it would reach a practical level.

3. Impressions of using "Claude3 Haiku".

Honestly, the performance was unbelievable for a general-use AI. The Japanese is natural and clean. The fact that it can incorporate and analyze images in the first place is groundbreaking. Multimodality has arrived in general-use AI. The calculation speed is also fast, and I think it will be applied to applications that require real-time responses. And the cost is low. This allows for plenty of interesting experiments. It's a savior for startups with tight cost constraints! I want to continue doing interesting experiments using "Claude3 Haiku". Stay tuned!

(1) Claude 3 Haiku: our fastest model yet, Anthropic, March 14, 2024