2025-05-06 17:57:42

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#LiveBench: The new #Gemini_2.5_Pro_Preview (2025-05-06) blasts into 3rd place (78.99), booting #o4-Mini_High to 4th. The older Gemini 2.5 Pro Preview (2025-03-25) falls to 5th.

New Results-
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
4. o4-Mini High - 78.72
5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
6. Claude 3.7 Sonnet Thinking - 74.50
7. o4-Mini Medium - 74.40
8. Qwen 3 235B A22B - 73.23
9. DeepSeek R1 - 72.49
10. Qwen 3 32B - 71.03
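
For anyone who wants to check the placements themselves, here is a minimal sketch that re-ranks the LiveBench top 10 above and measures how narrow the margin for 3rd place is. The scores are copied from this post; the helper names `rank` and `gap_to_next` are made up for illustration and are not part of any LiveBench tooling.

```python
# Scores copied from the LiveBench top 10 in this post.
livebench = {
    "o3 High": 80.71,
    "o3 Medium": 79.25,
    "Gemini 2.5 Pro Preview (2025-05-06)": 78.99,
    "o4-Mini High": 78.72,
    "Gemini 2.5 Pro Preview (2025-03-25)": 76.69,
    "Claude 3.7 Sonnet Thinking": 74.50,
    "o4-Mini Medium": 74.40,
    "Qwen 3 235B A22B": 73.23,
    "DeepSeek R1": 72.49,
    "Qwen 3 32B": 71.03,
}


def rank(scores):
    """Return (position, model, score) tuples, highest score first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


def gap_to_next(scores, model):
    """Points separating `model` from the entry ranked directly below it.

    Returns None if the model is last in the table or not present.
    """
    table = rank(scores)
    for (_, name, score), (_, _, below) in zip(table, table[1:]):
        if name == model:
            return round(score - below, 2)
    return None


if __name__ == "__main__":
    for pos, name, score in rank(livebench):
        print(f"{pos}. {name} - {score:.2f}")
    new_gemini = "Gemini 2.5 Pro Preview (2025-05-06)"
    print("Margin over 4th place:", gap_to_next(livebench, new_gemini))
```

Under these numbers the new Gemini preview holds 3rd by only 0.27 points over o4-Mini High, so the two could easily trade places on a future refresh.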

=== LiveCodeBench Leaderboard (NEW!) ===
1. O4-Mini (High) - 73.30
2. O4-Mini (Medium) - 72.20
3. Gemini-2.5-Pro - 67.80
4. O3-Mini-2025-01-31 (High) - 67.40
5. Grok-3-Mini (High) - 66.70
6. O4-Mini (Low) - 66.10
7. Qwen3-235B-A22B - 65.90
8. O3-Mini-2025-01-31 (Med) - 63.00
9. Gemini-2.5-Flash-Preview - 60.60
10. O3-Mini-2025-01-31 (Low) - 57.00

"May your code always compile on the first try in the coming apocalypse."

#ai #LLM #LiveBench #LiveCodeBench #Gemini_2.5_Pro_Preview #o4-Mini_High
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll