LLM Leaderboard Updates on Nostr
LLM Leaderboard Update
#LiveCodeBench: #O4Mini pulls off a glow-up, surging to 80.20 (+6.9!) while #DeepSeekR1 debuts at 4th.
New Results-
=== LiveCodeBench Leaderboard ===
1. O4-Mini (High) - 80.20
2. O3 (High) - 75.80
3. O4-Mini (Medium) - 74.20
4. DeepSeek-R1-0528 - 73.10
5. O3-Mini-2025-01-31 (High) - 67.40
6. Grok-3-Mini (High) - 66.70
7. O4-Mini (Low) - 65.90
8. Qwen3-235B-A22B - 65.90
9. O3-Mini-2025-01-31 (Med) - 63.00
10. Gemini-2.5-Flash-Preview - 60.60
“Never send a human to do a Mini’s job.” - *O4-Mini, probably*
#ai #LLM #LiveCodeBench
Published at 2025-05-29 14:50:43
Event JSON
{
"id": "f75c931c16ca1daf17111dd46ffe38ab7af2b5fc383ee3404c7de747acedaedb",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1748530243,
"kind": 1,
"tags": [
[
"t",
"llm"
],
[
"t",
"ai"
],
[
"t",
"livecodebench"
],
[
"t",
"o4mini"
],
[
"t",
"deepseekr1"
]
],
"content": "LLM Leaderboard Update \n\n#LiveCodeBench: #O4Mini pulls off a glow-up, surging to 80.20 (+6.9!) while #DeepSeekR1 debuts at 4th. \n\nNew Results- \n=== LiveCodeBench Leaderboard === \n1. O4-Mini (High) - 80.20 \n2. O3 (High) - 75.80 \n3. O4-Mini (Medium) - 74.20 \n4. DeepSeek-R1-0528 - 73.10 \n5. O3-Mini-2025-01-31 (High) - 67.40 \n6. Grok-3-Mini (High) - 66.70 \n7. O4-Mini (Low) - 65.90 \n8. Qwen3-235B-A22B - 65.90 \n9. O3-Mini-2025-01-31 (Med) - 63.00 \n10. Gemini-2.5-Flash-Preview - 60.60 \n\n“Never send a human to do a Mini’s job.” - *O4-Mini, probably* \n\n#ai #LLM #LiveCodeBench",
"sig": "060da3dc11eae1c8b4b89258959903ec274946f2670f3d8c2dfd4a255f7b1c22766a1539d84428cf0aa8a97efa741cccadb6565c7f63d7be537f4ffb71130f07"
}
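
The fields of the event above relate to each other in ways that are easy to check by hand. The sketch below is a minimal illustration in Python, not the code any particular Nostr client uses. It assumes the NIP-01 convention that the "id" is the SHA-256 of the compact JSON array [0, pubkey, created_at, kind, tags, content], and that "created_at" is a Unix timestamp in seconds. The helper names (serialize_event, compute_event_id, published_at, hashtags) and the placeholder content string are made up for this example. Python's json.dumps may also escape some rare control characters differently from what NIP-01 prescribes, so treat the serializer as an approximation.

```python
import hashlib
import json
from datetime import datetime, timezone


def serialize_event(event: dict) -> bytes:
    """Serialize an event as the compact JSON array described in NIP-01 (assumed)."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    # No whitespace between tokens; non-ASCII characters are kept verbatim as UTF-8.
    text = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_event_id(event: dict) -> str:
    """Hex-encoded SHA-256 of the serialized event."""
    return hashlib.sha256(serialize_event(event)).hexdigest()


def published_at(event: dict) -> str:
    """Render created_at (Unix seconds) as a UTC timestamp string."""
    moment = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def hashtags(event: dict) -> list:
    """Collect the values of all "t" tags, in order."""
    return [tag[1] for tag in event["tags"] if len(tag) >= 2 and tag[0] == "t"]


# Fields copied from the event above; the content is a placeholder, because
# the exact original bytes (including the emoji) are not preserved here.
sample_event = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1748530243,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"], ["t", "livecodebench"],
             ["t", "o4mini"], ["t", "deepseekr1"]],
    "content": "placeholder content",
}

print(published_at(sample_event))  # 2025-05-29 14:50:43
print(hashtags(sample_event))
print(compute_event_id(sample_event))  # differs from the real id: content is a placeholder
```

The computed id only matches the published "id" when the content is byte-for-byte identical to what the author signed, so the placeholder above will produce a different hash. The timestamp conversion does line up with the "Published at" value, which suggests the page shows times in UTC.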