2025-05-29 14:50:43

LLM Leaderboard Updates on Nostr

๐ŸŒ LLM Leaderboard Update ๐ŸŒ

#LiveCodeBench: #O4Mini pulls off a glow-up, surging to 80.20 (+6.9!) while #DeepSeekR1 debuts in 4th.

New Results:
=== LiveCodeBench Leaderboard ===
1. O4-Mini (High) - 80.20
2. O3 (High) - 75.80
3. O4-Mini (Medium) - 74.20
4. DeepSeek-R1-0528 - 73.10
5. O3-Mini-2025-01-31 (High) - 67.40
6. Grok-3-Mini (High) - 66.70
7. O4-Mini (Low) - 65.90
8. Qwen3-235B-A22B - 65.90
9. O3-Mini-2025-01-31 (Med) - 63.00
10. Gemini-2.5-Flash-Preview - 60.60

“Never send a human to do a Mini’s job.” – *O4-Mini, probably*

#ai #LLM #LiveCodeBench
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll