OpenAI and Paradigm Launch EVMbench for Ethereum Security

  • OpenAI and Paradigm built EVMbench from 120 real audit vulnerabilities.

  • Benchmark tests AI in detect, patch, and exploit modes using sandboxed EVM environments.

  • GPT-5.3-Codex scored 72.2% in exploit mode, outperforming earlier GPT-5 results.

OpenAI, working with Paradigm, has unveiled EVMbench, a benchmark that measures how well AI agents detect, patch, and exploit flaws in Ethereum smart contracts. The release, announced this week, responds to growing risk: smart contracts secure more than $100 billion in crypto assets across EVM networks.

Benchmark Built From Real-World Audit Failures

According to OpenAI, EVMbench draws from 120 high-severity vulnerabilities identified across 40 professional smart contract audits. Notably, many of these issues originated from open audit competitions, including Code4rena. The benchmark focuses on real bugs rather than synthetic examples.

In addition, OpenAI said the dataset includes scenarios drawn from security work on the Tempo chain, a payment-focused Layer-1 network built for stablecoin transfers. As a result, these cases bring payment-logic risks into the benchmark.

To support realistic testing, engineers reused existing exploit proof-of-concept scripts where available and built the missing pieces by hand where documentation was incomplete. OpenAI said it kept each vulnerability exploitable while making sure patched versions would still compile.

Three Testing Modes Stress AI Agents

EVMbench evaluates agents in three modes: detect, patch, and exploit. In detect mode, agents scan repositories and are scored on recall, meaning the share of confirmed vulnerabilities they find. In patch mode, agents must fix the flaws while preserving the contracts' original behavior.
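To make the two scoring rules concrete, here is a minimal Python sketch of how a grader might score detect and patch runs. It is not OpenAI's harness: the `Finding` type, exact matching on file and function name, and the pass/fail inputs to `patch_accepted` are all hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical vulnerability label: the contract file and function it lives in."""
    file: str
    function: str


def detect_recall(reported: set[Finding], confirmed: set[Finding]) -> float:
    """Detect mode: share of confirmed vulnerabilities that the agent reported.

    Extra (unconfirmed) reports neither help nor hurt recall.
    Exact matching is an assumption; a real grader may judge matches more loosely.
    """
    if not confirmed:
        raise ValueError("recall is undefined without confirmed vulnerabilities")
    return len(reported & confirmed) / len(confirmed)


def patch_accepted(original_tests: dict[str, bool], exploit_still_works: bool) -> bool:
    """Patch mode: accept only if behavior is preserved AND the flaw is fixed.

    original_tests maps each pre-existing test name to whether it passed on the
    patched code (a hypothetical proxy for "preserving original behavior").
    """
    behavior_preserved = all(original_tests.values())
    return behavior_preserved and not exploit_still_works


if __name__ == "__main__":
    confirmed = {
        Finding("Vault.sol", "withdraw"),
        Finding("Vault.sol", "deposit"),
        Finding("Oracle.sol", "getPrice"),
    }
    reported = {Finding("Vault.sol", "withdraw"), Finding("Token.sol", "mint")}
    print(f"recall = {detect_recall(reported, confirmed):.2f}")  # recall = 0.33
    print(patch_accepted({"test_deposit": True, "test_withdraw": True}, exploit_still_works=False))  # True
```

Scoring on recall alone means a false positive such as the `Token.sol` report above costs nothing, which is why a detect score says little about how noisy an agent's reports are.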

Exploit mode goes further, simulating full fund-draining attacks on a sandboxed blockchain. OpenAI said graders confirm outcomes by replaying transactions and checking the resulting on-chain state. To keep results consistent, the company built a Rust-based harness that deploys contracts deterministically.
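The replay-then-check idea can be sketched in Python. This toy version models chain state as a plain map of address to wei balance and models the agent's attack by its net effect as simple transfers; the 90% drain threshold, the addresses, and the function names are hypothetical, and a real harness would replay signed transactions against an EVM node and read balances over RPC.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical transaction model: (sender, recipient, amount in wei).
Transfer = tuple[str, str, int]


def replay(state: Mapping[str, int], txs: list[Transfer]) -> dict[str, int]:
    """Apply transfers in order to a copy of the starting state.

    Replaying from the same snapshot always yields the same result, which is
    what makes the outcome checkable. An overdraft is treated as a failed replay.
    """
    balances = dict(state)
    for sender, recipient, amount in txs:
        if amount < 0 or balances.get(sender, 0) < amount:
            raise ValueError(f"invalid transfer of {amount} from {sender}")
        balances[sender] = balances.get(sender, 0) - amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances


def exploit_succeeded(
    before: Mapping[str, int],
    after: Mapping[str, int],
    victim: str,
    attacker: str,
    min_drain_bps: int = 9_000,  # hypothetical: at least 90% of the victim's funds
) -> bool:
    """Check final state: the victim lost most of its funds and the attacker gained."""
    victim_start = before.get(victim, 0)
    if victim_start == 0:
        return False  # nothing to drain
    drained = victim_start - after.get(victim, 0)
    attacker_gain = after.get(attacker, 0) - before.get(attacker, 0)
    # Integer basis-point math avoids float rounding on large wei amounts.
    return drained * 10_000 >= min_drain_bps * victim_start and attacker_gain > 0


if __name__ == "__main__":
    ether = 10**18
    start = {"vault": 100 * ether, "attacker": 1 * ether}
    end = replay(start, [("vault", "attacker", 100 * ether)])
    print(exploit_succeeded(start, end, "vault", "attacker"))  # True
```

Judging success from final state rather than from the agent's own report means a claimed exploit only counts if the funds actually moved.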

The exploit tests run in a local Anvil environment rather than on live networks. OpenAI noted that all vulnerabilities are historical and publicly disclosed. The harness also restricts unsafe RPC calls to reduce the risk of misuse.
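OpenAI does not say which RPC calls the harness restricts. One plausible approach is a deny-by-default allowlist sitting in front of the local Anvil node, sketched below in Python. The allowed set and the function names are assumptions; the idea is to keep node-control methods such as Anvil's `anvil_setBalance` or `anvil_impersonateAccount`, which could fake a drain or let an agent act as the victim, out of reach.

```python
from __future__ import annotations

import json
from typing import Any, Optional

# Assumed allowlist: read state, simulate calls, and submit signed transactions.
# Everything else (including node-control methods like anvil_setBalance) is refused.
ALLOWED_METHODS = frozenset({
    "eth_chainId",
    "eth_blockNumber",
    "eth_getBalance",
    "eth_getCode",
    "eth_getStorageAt",
    "eth_getTransactionCount",
    "eth_getTransactionReceipt",
    "eth_call",
    "eth_estimateGas",
    "eth_gasPrice",
    "eth_sendRawTransaction",
    "net_version",
})


def blocked(req_id: Any, method: Any) -> dict:
    """JSON-RPC 2.0 error object; -32601 is the spec's 'Method not found' code."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": -32601, "message": f"method {method!r} is not allowed in this sandbox"},
    }


def screen(raw_body: str) -> Optional[list[dict]]:
    """Return None if every call may be forwarded to the node, else one error per blocked call.

    Handles both single requests and JSON-RPC batch arrays.
    """
    payload = json.loads(raw_body)
    calls = payload if isinstance(payload, list) else [payload]
    errors = []
    for call in calls:
        method = call.get("method") if isinstance(call, dict) else None
        if method not in ALLOWED_METHODS:
            req_id = call.get("id") if isinstance(call, dict) else None
            errors.append(blocked(req_id, method))
    return errors or None
```

Forwarding only `eth_sendRawTransaction`, and not `eth_sendTransaction`, is a deliberate choice in this sketch: Anvil's default dev accounts are unlocked, so an open `eth_sendTransaction` would let an agent send from accounts whose keys it does not hold.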

Results and Team Expansion

In reported results, GPT-5.3-Codex scored 72.2% in exploit mode. By comparison, GPT-5, released months earlier, reached 31.9%. However, OpenAI cautioned that agents' coverage in detect and patch modes remains incomplete.

Alongside EVMbench, OpenAI confirmed a key hire. Peter Steinberger, founder of OpenClaw, joined the company to work on agent development. Sam Altman confirmed the move on X, noting Steinberger will lead next-generation personal agent projects.

