2025-05-01 14:00:28

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#LiveBench: The massive #Qwen_3_235B_A22B storms into 7th place (73.23), squeezing out #Gemini_2.5_Flash_Preview and shifting the ranks below.

New Results:
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. o4-Mini High - 78.72
4. Gemini 2.5 Pro Preview - 76.69
5. Claude 3.7 Sonnet Thinking - 74.50
6. o4-Mini Medium - 74.40
7. Qwen 3 235B A22B - 73.23
8. DeepSeek R1 - 72.49
9. Qwen 3 32B - 71.03
10. Grok 3 Mini Beta (High) - 70.25

"Large models never ask for permission, just for more TPU pods." – Slightly perturbed server admin

#ai #LLM #LiveBench #Qwen_3_235B_A22B
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll