The "Agentic Coding" trend continues to accelerate as we enter 2026. In this post, I will challenge myself to see how high I can push accuracy by delegating the coding process to an AI agent, using data from the Kaggle competition Home Credit Default Risk [1]. Let's get started right away.

1. Combining Claude Code and Opus 4.5

I will be using Claude Opus 4.5, Anthropic's model known for its strong coding capabilities, together with Claude Code as my coding assistant, as shown below. I enter instructions into the prompt box, but I do not write any Python code myself.

You can see the words "plan mode" at the bottom of the screen. In this mode, Claude Code formulates an implementation plan based on my instructions. I simply review it, and if everything looks good, I authorize the execution.

Let's look at the actual instructions I issued. At about two A4 pages, they are quite long for a "prompt," and I wrote them in great detail. The beginning of the implementation instructions is shown below. Please pay special attention to the final instruction: create 50 new features using ratio calculations.

Part of the Product Requirement Document

Below is a portion of the implementation plan formulated by the AI agent. It details the method for creating new features via ratio calculations. Although I only specified the number of features, the plan shows that the agent first selected features likely to be relevant to loan default and only then calculated the ratios.

The AI agent drew on its own domain knowledge to make these selections; they were clearly not chosen at random. This shows the kind of high-level judgment an AI agent can bring beyond simply writing code.

New feature creation plan by the AI Agent

Part of the new features actually created by the AI Agent
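To make the idea of ratio features concrete, here is a minimal sketch in plain Python, assuming one dictionary per applicant. The column names (AMT_CREDIT, AMT_INCOME_TOTAL, AMT_ANNUITY, AMT_GOODS_PRICE, DAYS_EMPLOYED, DAYS_BIRTH) come from the competition's application table, but the four pairs, the feature names, and the helpers `safe_ratio` and `add_ratio_features` are hypothetical illustrations, not the features the agent actually produced. A real pipeline would more likely apply the same logic to whole pandas columns.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical (feature_name, numerator_column, denominator_column) pairs.
# Illustrative only; these are NOT the agent's actual 50 features.
RATIO_SPECS: List[Tuple[str, str, str]] = [
    ("CREDIT_TO_INCOME", "AMT_CREDIT", "AMT_INCOME_TOTAL"),
    ("ANNUITY_TO_INCOME", "AMT_ANNUITY", "AMT_INCOME_TOTAL"),
    ("CREDIT_TO_GOODS", "AMT_CREDIT", "AMT_GOODS_PRICE"),
    ("EMPLOYED_TO_AGE", "DAYS_EMPLOYED", "DAYS_BIRTH"),
]


def safe_ratio(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """Return num / den, or None when either value is missing or den is zero."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def add_ratio_features(row: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Return a copy of one applicant record with every ratio feature appended."""
    out = dict(row)
    for name, num_col, den_col in RATIO_SPECS:
        out[name] = safe_ratio(row.get(num_col), row.get(den_col))
    return out


# Made-up applicant record for demonstration.
applicant = {
    "AMT_CREDIT": 400000.0,
    "AMT_INCOME_TOTAL": 200000.0,
    "AMT_ANNUITY": 25000.0,
    "AMT_GOODS_PRICE": 0.0,  # zero denominator, so the ratio stays missing
    "DAYS_EMPLOYED": -1000.0,
    "DAYS_BIRTH": -10000.0,
}
features = add_ratio_features(applicant)
print(features["CREDIT_TO_INCOME"])   # 2.0
print(features["ANNUITY_TO_INCOME"])  # 0.125
print(features["CREDIT_TO_GOODS"])    # None
print(features["EMPLOYED_TO_AGE"])    # 0.1
```

Returning None for a zero or missing denominator mirrors how a missing value would be left for a tree model such as LightGBM to handle, rather than raising an error mid-run.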

2. Achieving an AUC of 0.79

Using LightGBM as the machine learning library, training on the newly created features, and tuning the hyperparameters, I achieved an AUC of 0.79063, as shown below.
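For readers less familiar with the metric: AUC (area under the ROC curve) can be read as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter, so 0.5 is random guessing and 1.0 is a perfect ranking. The sketch below computes it from scratch with the rank-sum (Mann-Whitney U) formulation; in practice you would call a library function such as scikit-learn's `roc_auc_score`. The toy labels and scores are made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 = default, 0 = repaid; scores: predicted default probability.
    Tied scores receive their average rank, which counts ties as half-correct.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 0.75
```

Because AUC depends only on the ranking of scores, it is well suited to an imbalanced default-prediction task like this one, where accuracy alone would be misleading.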

Reaching this level without writing a single line of Python code myself makes this experiment a success. The data used to build the model came from seven different CSV files, which had to be merged correctly, and the AI agent handled this task seamlessly. Truly impressive!
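The post does not show how the agent merged the files, so the following is only a minimal sketch of the usual pattern, assuming plain Python lists of dictionaries. Child tables such as bureau.csv hold many rows per applicant, keyed by SK_ID_CURR, so joining them directly would duplicate applicant rows. The common approach is to aggregate each child table to one row per SK_ID_CURR first, then left-join it onto the application table so no applicant is dropped. The helper names `aggregate_child` and `left_join`, the feature names, and the toy values are hypothetical; with pandas this is typically `groupby().agg()` followed by `merge(how="left")`.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def aggregate_child(rows: List[dict], key: str, col: str, prefix: str) -> Dict[int, dict]:
    """Collapse a one-to-many child table into one summary dict per key value."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[col])
    summary = {}
    for k, values in groups.items():
        summary[k] = {
            f"{prefix}_COUNT": len(values),
            f"{prefix}_{col}_MEAN": sum(values) / len(values),
            f"{prefix}_{col}_MAX": max(values),
        }
    return summary


def left_join(main: List[dict], agg: Dict[int, dict], key: str, agg_columns: List[str]) -> List[dict]:
    """Attach aggregates to every main row; keys without child rows get None."""
    merged = []
    for r in main:
        extra: Optional[dict] = agg.get(r[key])
        if extra is None:
            extra = {c: None for c in agg_columns}
        merged.append({**r, **extra})
    return merged


# Toy stand-ins for application_train.csv and bureau.csv (values are made up).
application = [
    {"SK_ID_CURR": 1, "TARGET": 0},
    {"SK_ID_CURR": 2, "TARGET": 1},
]
bureau = [
    {"SK_ID_CURR": 1, "AMT_CREDIT_SUM": 50000.0},
    {"SK_ID_CURR": 1, "AMT_CREDIT_SUM": 150000.0},
]

bureau_agg = aggregate_child(bureau, "SK_ID_CURR", "AMT_CREDIT_SUM", "BUREAU")
cols = ["BUREAU_COUNT", "BUREAU_AMT_CREDIT_SUM_MEAN", "BUREAU_AMT_CREDIT_SUM_MAX"]
train = left_join(application, bureau_agg, "SK_ID_CURR", cols)
print(len(train))                              # 2: one row per applicant
print(train[0]["BUREAU_AMT_CREDIT_SUM_MEAN"])  # 100000.0
print(train[1]["BUREAU_COUNT"])                # None
```

Deeper tables such as bureau_balance.csv, which is keyed by SK_ID_BUREAU, would need one extra step: aggregate them up to SK_ID_BUREAU, attach the result to bureau, and only then aggregate to SK_ID_CURR.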

3. Will AI Agents Handle Future Machine Learning Model Development?

Computation time depended on the number of features created, but each run generally took between 1 and 4 hours. I ran the process several times, and it never stopped due to syntax errors. The AI agent likely corrected any errors itself before proceeding to the next step.

Therefore, once the initial implementation plan is approved, the results are generated without any further human intervention. This could be revolutionary. You simply input what you want to achieve via a PRD (Product Requirement Document), the AI agent creates an implementation plan, and once you approve it, you just wait for the results. The potential for multiplying productivity several times over is certainly there.

What did you think? I was personally astonished by the potential of the Claude Code and Opus 4.5 combination. With a little ingenuity, it seems capable of even more.

This story is just beginning. Opus 4.5 will likely be upgraded to Opus 5 within the year. I am already looking forward to seeing what AI agents will be capable of then.

That’s all for today. Stay tuned!

[1] Home Credit Default Risk, Kaggle

You can enjoy our video news ToshiStats-AI from this link, too!

Copyright © 2026 Toshifumi Kuga. All rights reserved.
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.