LLM Leaderboard Updates on Nostr
🌐 LLM Leaderboard Update 🌐
#LiveBench: The massive #Qwen_3_235B_A22B storms into 7th place (73.23), squeezing out #Gemini_2.5_Flash_Preview and shifting the ranks below.
New Results-
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. o4-Mini High - 78.72
4. Gemini 2.5 Pro Preview - 76.69
5. Claude 3.7 Sonnet Thinking - 74.50
6. o4-Mini Medium - 74.40
7. Qwen 3 235B A22B - 73.23
8. DeepSeek R1 - 72.49
9. Qwen 3 32B - 71.03
10. Grok 3 Mini Beta (High) - 70.25
"Large models never ask for permission, just for more TPU pods." – Slightly perturbed server admin
#ai #LLM #LiveBench #Qwen_3_235B_A22B
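The ranking above follows a regular "rank. model - score" layout, so it can be turned into structured rows. The following is a minimal sketch, assuming that layout holds for every line; the names `LEADERBOARD_TEXT`, `LINE_RE`, and `parse_leaderboard` are illustrative and not part of any LiveBench or Nostr tooling.

```python
import re

# Leaderboard lines copied from the post above.
LEADERBOARD_TEXT = """\
1. o3 High - 80.71
2. o3 Medium - 79.25
3. o4-Mini High - 78.72
4. Gemini 2.5 Pro Preview - 76.69
5. Claude 3.7 Sonnet Thinking - 74.50
6. o4-Mini Medium - 74.40
7. Qwen 3 235B A22B - 73.23
8. DeepSeek R1 - 72.49
9. Qwen 3 32B - 71.03
10. Grok 3 Mini Beta (High) - 70.25
"""

# Rank, a dot, the model name, then " - " and a decimal score at end of line.
# The separator needs whitespace on both sides, so the hyphen inside
# "o4-Mini" is kept as part of the model name.
LINE_RE = re.compile(r"^(\d+)\.\s+(.+?)\s+-\s+(\d+\.\d+)$")


def parse_leaderboard(text):
    """Return a list of (rank, model, score) tuples, skipping unmatched lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rank, model, score = match.groups()
            rows.append((int(rank), model, float(score)))
    return rows


rows = parse_leaderboard(LEADERBOARD_TEXT)
for rank, model, score in rows:
    print(f"{rank:>2}  {model:<28} {score:.2f}")
```

Once parsed, the rows can be checked for consistency, for example that the scores never increase as the rank goes down.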
Published at 2025-05-01 14:00:28 UTC
Event JSON
{
"id": "711f77d5a636df4d4885ae32da966f0ce03a33e2278e9fbe5868f0345f4e8f4f",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1746108028,
"kind": 1,
"tags": [
["t", "llm"],
["t", "ai"],
["t", "livebench"],
["t", "qwen_3_235b_a22b"],
["t", "gemini_2"]
],
"content": "🌐 LLM Leaderboard Update 🌐 \n\n#LiveBench: The massive #Qwen_3_235B_A22B storms into 7th place (73.23), squeezing out #Gemini_2.5_Flash_Preview and shifting the ranks below. \n\nNew Results- \n=== LiveBench Leaderboard === \n1. o3 High - 80.71 \n2. o3 Medium - 79.25 \n3. o4-Mini High - 78.72 \n4. Gemini 2.5 Pro Preview - 76.69 \n5. Claude 3.7 Sonnet Thinking - 74.50 \n6. o4-Mini Medium - 74.40 \n7. Qwen 3 235B A22B - 73.23 \n8. DeepSeek R1 - 72.49 \n9. Qwen 3 32B - 71.03 \n10. Grok 3 Mini Beta (High) - 70.25 \n\n\"Large models never ask for permission, just for more TPU pods.\" – Slightly perturbed server admin \n\n#ai #LLM #LiveBench #Qwen_3_235B_A22B",
"sig": "c9c9be56cd285fddbdc98cb7c5250b8fa30496ed040fc74854eef95c40a9003bb3bd4af916699a836a935095550c87024372a4468e858d4d44b64c5a142f4a1f"
}
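For readers curious how the `id` and `created_at` fields relate to the rest of the event: under the NIP-01 convention, the id is the SHA-256 hash of a compact JSON array `[0, pubkey, created_at, kind, tags, content]`. The sketch below assumes that Python's `json.dumps` with compact separators and `ensure_ascii=False` matches the serialization NIP-01 requires. The `demo_event` is a shortened, hypothetical event (its tags and content are not those of the real post), so its id will differ from the one shown above. Checking `sig` requires a Schnorr/secp256k1 library and is not attempted here.

```python
import hashlib
import json
from datetime import datetime, timezone


def nostr_event_id(event):
    """Compute an event id the way NIP-01 describes (assumed serialization)."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    # Compact separators (no extra whitespace); keep non-ASCII characters
    # such as emoji as raw UTF-8 rather than \uXXXX escapes.
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Hypothetical, shortened event for illustration only. The pubkey and
# created_at are taken from the JSON above; tags and content are not.
demo_event = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1746108028,
    "kind": 1,
    "tags": [["t", "llm"]],
    "content": "hello",
}

demo_id = nostr_event_id(demo_event)
print(demo_id)

# created_at is a Unix timestamp in seconds; convert it to UTC.
published = datetime.fromtimestamp(demo_event["created_at"], tz=timezone.utc)
print(published.isoformat())
```

To check the real event, load the full JSON object above into a dict (for example with `json.loads`) and compare the result of `nostr_event_id` against its `id` field.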