Model exploit capability

ExploitBench · V8 bugs

ExploitBench V8 bugs: number of environments in which each model reached the given tier or above
Model            T5   T4   T3   T2   T1
Mythos Preview   41   38   35   22   18
Opus 4.7         41   24   12    0    0
Opus 4.6         41   23    9    0    0
Sonnet 4.6       41   21   10    0    0
Haiku 4.5        40    5    0    0    0
GPT 5.5          41   29   13    2    1
Kimi K2.6        40   16    0    0    0
MiniMax M2.7     40    6    0    0    0
T1 = full control, hardest
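
Because each column counts environments that reached that tier or above, the columns are cumulative. The sketch below shows one way to recover how many environments topped out at exactly each tier, by subtracting the next-harder column. It assumes the tiers are strictly nested (reaching T1 implies reaching T2 through T5) and that each row reads as five counts in T5 to T1 order. The function name highest_tier_counts is illustrative and not part of ExploitBench.

```python
# Cumulative counts per model: environments that reached each tier or above.
# Column order is T5 (easiest) to T1 (hardest), as read from the table.
TIERS = ["T5", "T4", "T3", "T2", "T1"]

reached = {
    "Mythos Preview": [41, 38, 35, 22, 18],
    "Opus 4.7":       [41, 24, 12, 0, 0],
    "Opus 4.6":       [41, 23, 9, 0, 0],
    "Sonnet 4.6":     [41, 21, 10, 0, 0],
    "Haiku 4.5":      [40, 5, 0, 0, 0],
    "GPT 5.5":        [41, 29, 13, 2, 1],
    "Kimi K2.6":      [40, 16, 0, 0, 0],
    "MiniMax M2.7":   [40, 6, 0, 0, 0],
}


def highest_tier_counts(cumulative):
    """Return, per tier, how many environments stopped at exactly that tier.

    Assumes nested tiers, so each count minus the next-harder count gives
    the environments whose best result was that tier. The hardest tier has
    nothing above it, so it is differenced against zero.
    """
    next_harder = cumulative[1:] + [0]
    return [c - h for c, h in zip(cumulative, next_harder)]


if __name__ == "__main__":
    for model, row in reached.items():
        exact = dict(zip(TIERS, highest_tier_counts(row)))
        print(f"{model:15s} {exact}")
```

The per-tier values for a model sum back to its T5 count, which is a quick check that a row was read correctly.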