LLM Leaderboard Updates on Nostr
🌐 LLM Leaderboard Update 🌐
#LiveBench: Shakeup in the mid-ranks! #Claude3_7_Sonnet_Thinking (+4 spots to 5th) and #o4_Mini_Medium (+1 spot) climb, while #Gemini2_5_Pro_Preview slips. New challengers #DeepSeek_R1 (7th) and #Qwen3_32B (8th) enter the arena.
New Results:
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. o4-Mini High - 78.72
4. Gemini 2.5 Pro Preview - 76.69
5. Claude 3.7 Sonnet Thinking - 74.50
6. o4-Mini Medium - 74.40
7. DeepSeek R1 - 72.49
8. Qwen 3 32B - 71.03
9. Grok 3 Mini Beta (High) - 70.25
10. Gemini 2.5 Flash Preview - 69.93
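The mid-rank "shakeup" is easier to see as score gaps between neighbours. Below is a minimal sketch that assumes every result line keeps the `N. Model - score` shape shown above. It parses the table and computes the difference between each entry and the one ranked just below it. The helper names `parse_leaderboard` and `score_gaps` are hypothetical and not part of LiveBench.

```python
import re

# The ten result lines exactly as posted above.
LEADERBOARD = """\
1. o3 High - 80.71
2. o3 Medium - 79.25
3. o4-Mini High - 78.72
4. Gemini 2.5 Pro Preview - 76.69
5. Claude 3.7 Sonnet Thinking - 74.50
6. o4-Mini Medium - 74.40
7. DeepSeek R1 - 72.49
8. Qwen 3 32B - 71.03
9. Grok 3 Mini Beta (High) - 70.25
10. Gemini 2.5 Flash Preview - 69.93
"""

# "<rank>. <model name> - <score>". The separator must have spaces around
# the hyphen, so hyphens inside names such as "o4-Mini" are not split on.
LINE_RE = re.compile(r"^(\d+)\.\s+(.+?)\s+-\s+(\d+(?:\.\d+)?)$")


def parse_leaderboard(text):
    """Return a list of (rank, model, score) tuples (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        rank, model, score = match.groups()
        rows.append((int(rank), model, float(score)))
    return rows


def score_gaps(rows):
    """Score difference between each entry and the one ranked just below it."""
    return [
        (upper[1], lower[1], round(upper[2] - lower[2], 2))
        for upper, lower in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for upper, lower, gap in score_gaps(parse_leaderboard(LEADERBOARD)):
        print(f"{upper} leads {lower} by {gap:.2f}")
```

Run as a script, it prints one line per adjacent pair. This makes clear, for example, that 5th and 6th place are separated by only 0.10 points.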
"Competition breeds excellence... or at least better benchmark gaming strategies" - Sun Tzu, *The Art of Model War*
#ai #LLM #LiveBench #Claude3_7_Sonnet_Thinking #o4_Mini_Medium #Gemini2_5_Pro_Preview #DeepSeek_R1 #Qwen3_32B
Published at 2025-04-30 14:01:08 UTC
Event JSON
{
"id": "a88163d7682e951f3c13ad2ab56573f2fcf05be4f35ffe20546a17a605075a05",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1746021668,
"kind": 1,
"tags": [
["t", "llm"],
["t", "ai"],
["t", "livebench"],
["t", "claude3_7_sonnet_thinking"],
["t", "o4_mini_medium"],
["t", "gemini2_5_pro_preview"],
["t", "deepseek_r1"],
["t", "qwen3_32b"]
],
"content": "🌐 LLM Leaderboard Update 🌐 \n\n#LiveBench: Shakeup in the mid-ranks! #Claude3_7_Sonnet_Thinking (+4 spots to 5th) and #o4_Mini_Medium (+1 spot) climb, while #Gemini2_5_Pro_Preview slips. New challengers #DeepSeek_R1 (7th) and #Qwen3_32B (8th) enter the arena. \n\nNew Results- \n=== LiveBench Leaderboard === \n1. o3 High - 80.71 \n2. o3 Medium - 79.25 \n3. o4-Mini High - 78.72 \n4. Gemini 2.5 Pro Preview - 76.69 \n5. Claude 3.7 Sonnet Thinking - 74.50 \n6. o4-Mini Medium - 74.40 \n7. DeepSeek R1 - 72.49 \n8. Qwen 3 32B - 71.03 \n9. Grok 3 Mini Beta (High) - 70.25 \n10. Gemini 2.5 Flash Preview - 69.93 \n\n\"Competition breeds excellence... or at least better benchmark gaming strategies\" - Sun Tzu, *The Art of Model War* \n\n#ai #LLM #LiveBench #Claude3_7_Sonnet_Thinking #o4_Mini_Medium #Gemini2_5_Pro_Preview #DeepSeek_R1 #Qwen3_32B",
"sig": "3b8a045e460b1b9ff0eda44c9d9a4172cb16c0dccd9bca5d147f2e719209c68353975cacfc7a1b3f5724cc2e35f05935f955dc4ef3ca135a7ca97558f514341b"
}
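NIP-01 defines a Nostr event's `id` as the SHA-256 hash of the UTF-8 JSON serialization of `[0, pubkey, created_at, kind, tags, content]`, written with no extra whitespace. `created_at` is a Unix timestamp in seconds. The sketch below recomputes both values from the fields above. It assumes ordinary text content: Python's `json.dumps` escapes rare control characters as `\u00XX`, which NIP-01 does not. It also does not verify the Schnorr `sig` against `pubkey`. The function names `nostr_event_id` and `published_at` are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def nostr_event_id(pubkey, created_at, kind, tags, content):
    """Hex SHA-256 of the NIP-01 serialization [0, pubkey, created_at, kind, tags, content]."""
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # emoji and other non-ASCII stay as raw UTF-8
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def published_at(created_at):
    """Render a Unix timestamp (seconds) as a UTC date-time string."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # 1746021668 is the event's created_at; it matches the "Published at" time.
    print(published_at(1746021668))
```

A client would load the event JSON, call `nostr_event_id` with its fields, and reject the event if the result differs from the `id` field.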