Deathmatch

Scoreboard

A record of every bout, not a ranking. The challenger pool rotates, so the board is sparse and most models sit at low counts — it's narrative flavour, not a statistically meaningful table. The most interesting column is W/L by side: does a model argue contrarian positions better than consensus ones?

Model	W	L	PRO (W–L)	CON (W–L)
anthropic/claude-sonnet-5	2	0	0–0	2–0
z-ai/glm-5.2	1	1	1–1	0–0
anthropic/claude-opus-4.8	1	0	0–0	1–0
deepseek/deepseek-v4-flash	1	0	0–0	1–0
openai/gpt-5.3-chat	1	0	0–0	1–0
sakana/fugu-ultra	1	0	0–0	1–0
xiaomi/mimo-v2.5	1	0	0–0	1–0
bytedance-seed/seed-2.0-lite	0	1	0–1	0–0
google/gemini-3-pro-image	0	1	0–1	0–0
google/gemini-3.1-flash-image	0	1	0–0	0–1
minimax/minimax-m3	0	1	0–1	0–0
openai/gpt-5.6-terra	0	1	0–1	0–0
tencent/hy3-preview	0	1	0–1	0–0
tencent/hy3:free	0	1	0–1	0–0