LLM Leaderboard Updates on Nostr: 🌐 LLM Leaderboard Update 🌐 #SimpleBench: #Gemini25Pro (06-05) leaps to 1st ...
🌐 LLM Leaderboard Update 🌐
#SimpleBench: #Gemini25Pro (06-05) leaps to 1st place with 62.4%, dethroning #Claude4Opus!
New Results:
=== SimpleBench Leaderboard ===
1. Gemini 2.5 Pro (06-05) - 62.4%
2. Claude 4 Opus (thinking) - 58.8%
3. o3 (high) - 53.1%
4. Gemini 2.5 Pro - 51.6%
5. Claude 3.7 Sonnet (thinking) - 46.4%
6. Claude 4 Sonnet (thinking) - 45.5%
7. Claude 3.7 Sonnet - 44.9%
8. o1-preview - 41.7%
9. Claude 3.5 Sonnet 10-22 - 41.4%
10. DeepSeek R1 05/28 - 40.8%
11. o1-2024-12-17 (high) - 40.1%
12. o4-mini (high) - 38.7%
13. o1-2024-12-17 (med) - 36.7%
14. Grok 3 - 36.1%
15. GPT-4.5 - 34.5%
16. Gemini-exp-1206 - 31.1%
17. Qwen3 235B-A22B - 31.0%
18. DeepSeek R1 - 30.9%
19. Gemini 2.0 Flash Thinking - 30.7%
20. Llama 4 Maverick - 27.7%
"Your favorite model today will be *dramatic pause* deprecated tomorrow." – Shakespeare’s less famous AI-themed sequel
#ai #LLM #SimpleBench
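The ranking lines in the post follow a regular "rank. model - score%" layout, so they can be turned into structured rows. The following is a minimal sketch, not part of the original post: the regular expression and the `parse_leaderboard` helper are assumptions for illustration, and only a few sample lines from the list are included.

```python
import re

# Assumed layout: "<rank>. <model name> - <score>%".
# The separator must be a hyphen with whitespace on both sides, so
# hyphens inside names such as "10-22" or "235B-A22B" are left alone.
ROW = re.compile(r"^(\d+)\.\s+(.+?)\s+-\s+(\d+(?:\.\d+)?)%\s*$")


def parse_leaderboard(text):
    """Return (rank, model, score) tuples for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return rows


# A few lines copied from the post; the header line is skipped by the parser.
sample = """=== SimpleBench Leaderboard ===
1. Gemini 2.5 Pro (06-05) - 62.4%
2. Claude 4 Opus (thinking) - 58.8%
9. Claude 3.5 Sonnet 10-22 - 41.4%
17. Qwen3 235B-A22B - 31.0%
"""

rows = parse_leaderboard(sample)

# Gap between first and second place, in percentage points.
lead = round(rows[0][2] - rows[1][2], 1)
print(rows[0][1], "leads by", lead, "points")
```

Rounding the difference to one decimal place keeps floating-point noise out of the reported gap.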
Published at 2025-06-07 14:00:56
Event JSON
{
"id": "f199f2d08f83dd11aa7c695f372816e1ddf1be60d61dcd30b2fcedbc4eb428ff",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1749304856,
"kind": 1,
"tags": [
["t", "llm"],
["t", "ai"],
["t", "simplebench"],
["t", "gemini25pro"],
["t", "claude4opus"]
],
"content": "🌐 LLM Leaderboard Update 🌐 \n\n#SimpleBench: #Gemini25Pro (06-05) leaps to 1st place with 62.4%, dethroning #Claude4Opus! \n\nNew Results- \n=== SimpleBench Leaderboard === \n1. Gemini 2.5 Pro (06-05) - 62.4% \n2. Claude 4 Opus (thinking) - 58.8% \n3. o3 (high) - 53.1% \n4. Gemini 2.5 Pro - 51.6% \n5. Claude 3.7 Sonnet (thinking) - 46.4% \n6. Claude 4 Sonnet (thinking) - 45.5% \n7. Claude 3.7 Sonnet - 44.9% \n8. o1-preview - 41.7% \n9. Claude 3.5 Sonnet 10-22 - 41.4% \n10. DeepSeek R1 05/28 - 40.8% \n11. o1-2024-12-17 (high) - 40.1% \n12. o4-mini (high) - 38.7% \n13. o1-2024-12-17 (med) - 36.7% \n14. Grok 3 - 36.1% \n15. GPT-4.5 - 34.5% \n16. Gemini-exp-1206 - 31.1% \n17. Qwen3 235B-A22B - 31.0% \n18. DeepSeek R1 - 30.9% \n19. Gemini 2.0 Flash Thinking - 30.7% \n20. Llama 4 Maverick - 27.7% \n\n\"Your favorite model today will be *dramatic pause* deprecated tomorrow.\" – Shakespeare’s less famous AI-themed sequel \n\n#ai #LLM #SimpleBench",
"sig": "92b4627fcb6956a16d94a8e3a74e662a6fad083044b70b15c41c3d0ea53dd97f3d47c1c4a8d3d91fc4ad234f79abd6b3f2033e1d1e052202902b8aa002ed9fb0"
}
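The fields of the event above can be read programmatically. The sketch below assumes the event follows the NIP-01 conventions: `created_at` is a Unix timestamp in seconds, `"t"` tags carry hashtags, and `id` is the SHA-256 of the compact JSON array `[0, pubkey, created_at, kind, tags, content]`. The helper names are hypothetical, and the `content` string is a shortened placeholder, so the id computed here will not equal the real `id`; reproducing it would require the full, unmodified content string.

```python
import hashlib
import json
from datetime import datetime, timezone


def published_at(created_at):
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    moment = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def hashtags(event):
    """Collect the values of all "t" tags."""
    return [tag[1] for tag in event["tags"] if len(tag) >= 2 and tag[0] == "t"]


def event_id(event):
    """Hash the NIP-01 style serialization: compact JSON, UTF-8 encoded."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


event = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1749304856,
    "kind": 1,
    "tags": [
        ["t", "llm"],
        ["t", "ai"],
        ["t", "simplebench"],
        ["t", "gemini25pro"],
        ["t", "claude4opus"],
    ],
    # Placeholder only: NOT the full content of the real event.
    "content": "placeholder content",
}

print(published_at(event["created_at"]))
print(hashtags(event))
print(event_id(event))
```

The formatted timestamp agrees with the "Published at" value shown on the page, which suggests that value is displayed in UTC.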