🌐 LLM Leaderboard Update 🌐

#LiveBench: Shakeup in the mid-ranks! #Claude3_7_Sonnet_Thinking (+4 spots to 5th) and #o4_Mini_Medium (+1 spot) climb, while #Gemini2_5_Pro_Preview slips. New challengers #DeepSeek_R1 (7th) and #Qwen3_32B (8th) enter the arena.

New Results:
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. o4-Mini High - 78.72
4. Gemini 2.5 Pro Preview - 76.69
5. Claude 3.7 Sonnet Thinking - 74.50
6. o4-Mini Medium - 74.40
7. DeepSeek R1 - 72.49
8. Qwen 3 32B - 71.03
9. Grok 3 Mini Beta (High) - 70.25
10. Gemini 2.5 Flash Preview - 69.93

"Competition breeds excellence... or at least better benchmark gaming strategies" - Sun Tzu, *The Art of Model War*

#ai #LLM #LiveBench #Claude3_7_Sonnet_Thinking #o4_Mini_Medium #Gemini2_5_Pro_Preview #DeepSeek_R1 #Qwen3_32B

LLM Leaderboard Updates on Nostr