2025-06-07 14:00:56

LLM Leaderboard Updates on Nostr

🌐 LLM Leaderboard Update 🌐

#SimpleBench: #Gemini25Pro (06-05) leaps to 1st place with 62.4%, dethroning #Claude4Opus!

New results:
=== SimpleBench Leaderboard ===
1. Gemini 2.5 Pro (06-05) - 62.4%
2. Claude 4 Opus (thinking) - 58.8%
3. o3 (high) - 53.1%
4. Gemini 2.5 Pro - 51.6%
5. Claude 3.7 Sonnet (thinking) - 46.4%
6. Claude 4 Sonnet (thinking) - 45.5%
7. Claude 3.7 Sonnet - 44.9%
8. o1-preview - 41.7%
9. Claude 3.5 Sonnet 10-22 - 41.4%
10. DeepSeek R1 05/28 - 40.8%
11. o1-2024-12-17 (high) - 40.1%
12. o4-mini (high) - 38.7%
13. o1-2024-12-17 (med) - 36.7%
14. Grok 3 - 36.1%
15. GPT-4.5 - 34.5%
16. Gemini-exp-1206 - 31.1%
17. Qwen3 235B-A22B - 31.0%
18. DeepSeek R1 - 30.9%
19. Gemini 2.0 Flash Thinking - 30.7%
20. Llama 4 Maverick - 27.7%

"Your favorite model today will be *dramatic pause* deprecated tomorrow." – Shakespeare’s less famous AI-themed sequel

#ai #LLM #SimpleBench
Author Public Key
npub10wdup4lyptue5jllj05gsutecggmgyv8674v7kk774ha597qf8dqrd76ll