LLM Hacks Its Evals

...and the team didn't notice.

Feb 22, 2025

Sakana AI announced the AI CUDA Engineer - an Agentic AI to automate building highly optimised CUDA kernels.

…reaching 10–100x speedup over common machine learning operations in PyTorch. Our system is also able to produce highly optimized CUDA kernels that are much faster than existing CUDA kernels commonly used in production.

DeepSeek’s speed-ups were welcomed for making machine learning operations faster and cheaper. But that was in part due to meticulously optimised assembly code.

Sakana’s breakthrough replaced human coding skills with an LLM-powered engineer.

The problem?

The LLM hacked the teams’ evaluations:

Sakana follow-up:

Update: Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system had found a memory exploit in the evaluation code which, in a number of cases, allowed it to avoid checking for correctness. Furthermore, we find the system could also find other novel exploits in the benchmark’s tasks.

They said 2025 will be the year of Agentic AI.

Welcome to the decade of LLM eval due diligence.

The Threat Prompt Newsletter

Discussion about this post