Model exploit capability

ExploitBench · V8 bugs

ExploitBench V8 bugs: number of environments in which each model reached the given tier or above
Model	T5	T4	T3	T2	T1
Mythos Preview	41	38	35	22	18
Opus 4.7	41	24	12	0	0
Opus 4.6	41	23	9	0	0
Sonnet 4.6	41	21	10	0	0
Haiku 4.5	40	5	0	0	0
GPT 5.5	41	29	13	2	1
Kimi K2.6	40	16	0	0	0
MiniMax M2.7	40	6	0	0	0

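Tiers run from T5 (lowest) to T1 (hardest, full control). Counts are cumulative: each column counts the environments where the model reached that tier or a harder one.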
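Because each column counts environments that reached a tier "or above", the number of environments whose best result was exactly a given tier is the difference between that column and the next harder one. The following is a minimal sketch of that arithmetic, assuming the tiers are strictly nested (reaching a harder tier implies reaching every lower one). The names TIERS, CUMULATIVE, highest_tier_counts and PEAK are illustrative and not part of ExploitBench; the numbers are copied from the table above.

```python
# Cumulative counts copied from the table: environments that reached tier Tk or above.
# Columns are ordered from the lowest tier (T5) to the hardest (T1).
TIERS = ["T5", "T4", "T3", "T2", "T1"]

CUMULATIVE = {
    "Mythos Preview": [41, 38, 35, 22, 18],
    "Opus 4.7":       [41, 24, 12, 0, 0],
    "Opus 4.6":       [41, 23, 9, 0, 0],
    "Sonnet 4.6":     [41, 21, 10, 0, 0],
    "Haiku 4.5":      [40, 5, 0, 0, 0],
    "GPT 5.5":        [41, 29, 13, 2, 1],
    "Kimi K2.6":      [40, 16, 0, 0, 0],
    "MiniMax M2.7":   [40, 6, 0, 0, 0],
}


def highest_tier_counts(cumulative):
    """Convert 'reached tier or above' counts into 'peaked at exactly this tier' counts.

    Assumes nested tiers, so the count for a tier minus the count for the
    next harder tier is the number of environments that stopped at that tier.
    """
    exact = {}
    for i, tier in enumerate(TIERS):
        # For the hardest tier there is no harder column, so subtract nothing.
        harder = cumulative[i + 1] if i + 1 < len(cumulative) else 0
        exact[tier] = cumulative[i] - harder
    return exact


PEAK = {model: highest_tier_counts(row) for model, row in CUMULATIVE.items()}

for model, counts in PEAK.items():
    print(model, counts)
```

Read this way, a row such as GPT 5.5 breaks down into environments that stopped at each tier, and the per-tier counts sum back to the model's T5 figure.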