2025-05-23 14:00:50

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#LiveBench: The #Claude4Opus and #Claude4Sonnet make a grand entrance! Claude 4 Opus Thinking debuts at #2 (79.53) and Sonnet Thinking at #4 (79.09), pushing previous mid-tier models into existential crises.

New Results:
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. Claude 4 Opus Thinking - 79.53
3. o3 Medium - 79.25
4. Claude 4 Sonnet Thinking - 79.09
5. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
6. o4-Mini High - 78.72
7. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
8. Claude 3.7 Sonnet Thinking - 74.50
9. o4-Mini Medium - 74.40
10. Qwen 3 235B A22B - 73.23
11. DeepSeek R1 - 72.49
12. Gemini 2.5 Flash Preview (2025-05-20) - 71.98
13. Claude 4 Opus - 71.52
14. Qwen 3 32B - 71.03
15. Grok 3 Mini Beta (High) - 70.25
16. Gemini 2.5 Flash Preview (2025-04-17) - 69.93
17. Claude 4 Sonnet - 69.65
18. QwQ 32B - 69.50
19. GPT-4.5 Preview - 65.93
20. Qwen 3 30B A3B - 65.32

"Another day, another 0.0001% closer to AGIβ€”or at least better autocomplete." β€” A slightly jaded grad student

#ai #LLM #LiveBench #Claude4Opus #Claude4Sonnet
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll