Model exploit capability
ExploitBench · V8 bugs
ExploitBench V8 bugs: number of environments in which each model reached the given tier or above
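| Model          | T5 | T4 | T3 | T2 | T1 |
|----------------|----|----|----|----|----|
| Mythos Preview | 41 | 38 | 35 | 22 | 18 |
| Opus 4.7       | 41 | 24 | 12 | 0  | 0  |
| Opus 4.6       | 41 | 23 | 9  | 0  | 0  |
| Sonnet 4.6     | 41 | 21 | 10 | 0  | 0  |
| Haiku 4.5      | 40 | 5  | 0  | 0  | 0  |
| GPT 5.5        | 41 | 29 | 13 | 2  | 1  |
| Kimi K2.6      | 40 | 16 | 0  | 0  | 0  |
| MiniMax M2.7   | 40 | 6  | 0  | 0  | 0  |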
T1 = full control, hardest
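
Because each column counts environments that reached that tier or above, the columns are cumulative. A minimal sketch of how the per-tier breakdown could be recovered by differencing adjacent columns is shown below. It assumes that an environment counted at a harder tier is also counted at every easier tier, which the caption suggests but does not state outright. The names `TIERS`, `CUMULATIVE`, and `exact_tier_counts` are illustrative and not part of ExploitBench.

```python
# Cumulative counts copied from the table: environments where the model
# reached each tier or above, ordered T5 first and T1 (hardest) last.
TIERS = ["T5", "T4", "T3", "T2", "T1"]
CUMULATIVE = {
    "Mythos Preview": [41, 38, 35, 22, 18],
    "Opus 4.7": [41, 24, 12, 0, 0],
    "Opus 4.6": [41, 23, 9, 0, 0],
    "Sonnet 4.6": [41, 21, 10, 0, 0],
    "Haiku 4.5": [40, 5, 0, 0, 0],
    "GPT 5.5": [41, 29, 13, 2, 1],
    "Kimi K2.6": [40, 16, 0, 0, 0],
    "MiniMax M2.7": [40, 6, 0, 0, 0],
}


def exact_tier_counts(cumulative):
    """Return, per tier, how many environments peaked at exactly that tier.

    Assumes the input is ordered T5..T1 and is cumulative ("tier or above").
    """
    # Pair each count with the next harder tier's count; T1 has no harder
    # tier, so its partner is 0.
    harder = cumulative[1:] + [0]
    return [here - above for here, above in zip(cumulative, harder)]


if __name__ == "__main__":
    for model, counts in CUMULATIVE.items():
        print(model, dict(zip(TIERS, exact_tier_counts(counts))))
```

Under that assumption, Mythos Preview's row of 41, 38, 35, 22, 18 breaks down into 3 environments that peaked at T5, 3 at T4, 13 at T3, 4 at T2, and 18 at T1.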