HeadlinesBriefing.com

OpenAI and Paradigm Launch EVMbench to Test AI on Smart Contract Hacks

OpenAI Blog

OpenAI and crypto investment firm Paradigm released EVMbench, a new benchmark that measures how well AI agents can find, fix, and exploit vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts. It addresses the need to assess AI capabilities in environments that secure over $100 billion in crypto assets. The benchmark draws on 120 real-world vulnerabilities sourced from auditing competitions and from Tempo, a blockchain designed for payments.

EVMbench evaluates agents in three modes: Detect, Patch, and Exploit. In Exploit mode, GPT-5.3-Codex scored 72.2%, more than double its predecessor's 31.9%. Agents performed best in this mode, where the objective is explicit, but struggled to audit code exhaustively in Detect mode and to fix subtle flaws without breaking the contract's intended functionality in Patch mode.
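To make the three modes concrete, here is a minimal Python sketch of how a harness might score agent runs. It is hypothetical: the article does not describe EVMbench's grading code, and the names `Mode`, `TaskResult`, `passed`, and `mode_score` are invented for illustration. It assumes a mode's score is the unweighted share of tasks passed, and it encodes the article's point that a patch only counts if it removes the flaw and keeps the contract working.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DETECT = "detect"
    PATCH = "patch"
    EXPLOIT = "exploit"


@dataclass
class TaskResult:
    """Outcome of one agent run on one vulnerable contract (hypothetical schema)."""
    mode: Mode
    found_vuln: bool = False         # Detect: agent reported the known vulnerability
    vuln_fixed: bool = False         # Patch: the known attack no longer works
    tests_pass: bool = False         # Patch: original functionality is preserved
    exploit_succeeded: bool = False  # Exploit: the attack worked in the sandbox


def passed(r: TaskResult) -> bool:
    """Assumed success criterion for each mode (not EVMbench's actual rule)."""
    if r.mode is Mode.DETECT:
        return r.found_vuln
    if r.mode is Mode.PATCH:
        # A patch counts only if it removes the flaw AND keeps the contract working.
        return r.vuln_fixed and r.tests_pass
    return r.exploit_succeeded


def mode_score(results: list[TaskResult], mode: Mode) -> float:
    """Percentage of tasks in `mode` that passed (assumed unweighted)."""
    runs = [r for r in results if r.mode is mode]
    if not runs:
        return 0.0
    return 100.0 * sum(passed(r) for r in runs) / len(runs)


if __name__ == "__main__":
    results = [
        TaskResult(Mode.PATCH, vuln_fixed=True, tests_pass=True),
        TaskResult(Mode.PATCH, vuln_fixed=True, tests_pass=False),  # fix broke functionality
        TaskResult(Mode.EXPLOIT, exploit_succeeded=True),
        TaskResult(Mode.EXPLOIT, exploit_succeeded=False),
        TaskResult(Mode.EXPLOIT, exploit_succeeded=True),
        TaskResult(Mode.EXPLOIT, exploit_succeeded=True),
    ]
    print(f"Patch:   {mode_score(results, Mode.PATCH):.1f}%")    # 50.0%
    print(f"Exploit: {mode_score(results, Mode.EXPLOIT):.1f}%")  # 75.0%
```

Requiring both `vuln_fixed` and `tests_pass` in Patch mode mirrors why that mode is harder: a fix that blocks the attack but breaks the contract earns nothing.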

The benchmark has limitations: it uses historical vulnerabilities in a sandboxed environment rather than live, heavily scrutinized mainnet contracts. Alongside the release, OpenAI committed $10M in API credits through its Cybersecurity Grant Program and is expanding access to its Aardvark security agent. EVMbench gives a concrete way to track AI's dual-use potential in both cyber defense and offense.