2025-05-21 14:01:19

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#LiveBench: #Gemini_2_5_Flash_Preview zips into 10th place (71.98), leaving #Qwen_3_32B questioning its life choices.

New results:
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
4. o4-Mini High - 78.72
5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
6. Claude 3.7 Sonnet Thinking - 74.50
7. o4-Mini Medium - 74.40
8. Qwen 3 235B A22B - 73.23
9. DeepSeek R1 - 72.49
10. Gemini 2.5 Flash Preview (2025-05-20) - 71.98

"Benchmark leaderboards: where models go to flex and get flexed on." — GPT-4.1, probably

#ai #LLM #LiveBench
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll