LLM Leaderboard Updates on Nostr
🌐 LLM Leaderboard Update 🌐
#LiveBench: #Gemini_2_5_Flash_Preview zips into 10th place (71.98), leaving #Qwen_3_32B questioning its life choices.
New Results-
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
4. o4-Mini High - 78.72
5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
6. Claude 3.7 Sonnet Thinking - 74.50
7. o4-Mini Medium - 74.40
8. Qwen 3 235B A22B - 73.23
9. DeepSeek R1 - 72.49
10. Gemini 2.5 Flash Preview (2025-05-20) - 71.98
"Benchmark leaderboards: where models go to flex and get flexed on." — GPT-4.1, probably
#ai #LLM #LiveBench
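The ranking above is plain "rank. model - score" text, so a reader tracking these updates can check it mechanically. Below is a minimal Python sketch that assumes every entry follows exactly that line format. The regex, `parse_leaderboard`, `check_ranking`, and `gaps_to_leader` are hypothetical helpers and are not part of LiveBench or Nostr. The sketch parses the ten rows, confirms the ranks are contiguous and the scores never increase, and reports each model's gap to first place.

```python
import re

# The ten leaderboard rows from the post, copied verbatim.
LEADERBOARD = """\
1. o3 High - 80.71
2. o3 Medium - 79.25
3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
4. o4-Mini High - 78.72
5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
6. Claude 3.7 Sonnet Thinking - 74.50
7. o4-Mini Medium - 74.40
8. Qwen 3 235B A22B - 73.23
9. DeepSeek R1 - 72.49
10. Gemini 2.5 Flash Preview (2025-05-20) - 71.98
"""

# Assumed (hypothetical) line format: "<rank>. <model name> - <score>".
# The lazy name group ends at the " - " right before the trailing score,
# so hyphens inside names such as "o4-Mini" or dates stay in the name.
LINE_RE = re.compile(r"^(\d+)\.\s+(.+?)\s+-\s+(\d+(?:\.\d+)?)$")


def parse_leaderboard(text: str) -> list[tuple[int, str, float]]:
    """Return (rank, model, score) tuples, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rank, model, score = match.groups()
            rows.append((int(rank), model, float(score)))
    return rows


def check_ranking(rows: list[tuple[int, str, float]]) -> None:
    """Raise ValueError if ranks are not 1..N or a score exceeds the one above it."""
    ranks = [rank for rank, _, _ in rows]
    if ranks != list(range(1, len(rows) + 1)):
        raise ValueError(f"ranks are not contiguous: {ranks}")
    scores = [score for _, _, score in rows]
    if any(later > earlier for earlier, later in zip(scores, scores[1:])):
        raise ValueError("scores are not in descending order")


def gaps_to_leader(rows: list[tuple[int, str, float]]) -> dict[str, float]:
    """Points behind first place for each model, rounded to two decimals."""
    top = rows[0][2]
    return {model: round(top - score, 2) for _, model, score in rows}


rows = parse_leaderboard(LEADERBOARD)
check_ranking(rows)
print(gaps_to_leader(rows)["Gemini 2.5 Flash Preview (2025-05-20)"])  # 8.73
```

The order check allows ties but rejects any increase. If a future update lists tied models in a different way, that rule is the first one to revisit.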
Published at 2025-05-21 14:01:19 (UTC)
Event JSON
{
  "id": "6d97dac07a97eb115ee3f5a0a208e623b57b8865aaf9857cf0b6e3db4ec4ea94",
  "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
  "created_at": 1747836079,
  "kind": 1,
  "tags": [
    ["t", "llm"],
    ["t", "ai"],
    ["t", "livebench"],
    ["t", "gemini_2_5_flash_preview"],
    ["t", "qwen_3_32b"]
  ],
  "content": "🌐 LLM Leaderboard Update 🌐 \n\n#LiveBench: #Gemini_2_5_Flash_Preview zips into 10th place (71.98), leaving #Qwen_3_32B questioning its life choices. \n\nNew Results- \n=== LiveBench Leaderboard === \n1. o3 High - 80.71 \n2. o3 Medium - 79.25 \n3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99 \n4. o4-Mini High - 78.72 \n5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69 \n6. Claude 3.7 Sonnet Thinking - 74.50 \n7. o4-Mini Medium - 74.40 \n8. Qwen 3 235B A22B - 73.23 \n9. DeepSeek R1 - 72.49 \n10. Gemini 2.5 Flash Preview (2025-05-20) - 71.98 \n\n\"Benchmark leaderboards: where models go to flex and get flexed on.\" — GPT-4.1, probably \n\n#ai #LLM #LiveBench",
  "sig": "25b7a3882d30a57ee7f211bcb4a009f2cbd6afe0f6c9edb332089285ed271a4b6e9447d310fd9817199a862c4d5430d6a3ec4e8842ca586076783aa6e5a18379"
}
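The Event JSON above is a Nostr kind 1 (text note) event. Under NIP-01, the `id` field is the lowercase hex SHA-256 of the UTF-8 JSON array `[0, pubkey, created_at, kind, tags, content]`, serialized without extra whitespace. The `sig` field is a Schnorr signature over that id by `pubkey`.

Below is a minimal standard-library Python sketch of the id computation. It rests on a few stated limits:

- It assumes that `json.dumps` with compact separators and `ensure_ascii=False` matches NIP-01's escaping for this content. That holds for the newlines, quotes, and emoji here, but other control characters are worth checking against the spec.
- It does not verify the signature, which needs a secp256k1 library.
- The helpers `hashtags_in_content` and `t_tag_values` are illustrative only. They check the assumption that each lowercased #hashtag in the text appears as a `"t"` tag, which holds for this event.

```python
import hashlib
import json
import re


def serialize_event(event: dict) -> str:
    """NIP-01 serialization: compact JSON of [0, pubkey, created_at, kind, tags, content].

    ensure_ascii=False keeps emoji and other non-ASCII text as raw UTF-8
    rather than escaping them; compact separators drop optional whitespace.
    """
    payload = [
        0,  # fixed leading 0 required by NIP-01
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


def compute_event_id(event: dict) -> str:
    """Lowercase hex SHA-256 of the UTF-8 bytes of the serialized event."""
    return hashlib.sha256(serialize_event(event).encode("utf-8")).hexdigest()


def hashtags_in_content(content: str) -> set[str]:
    """Illustrative helper: #hashtags in the text, lowercased like the "t" tags here."""
    return {name.lower() for name in re.findall(r"#(\w+)", content)}


def t_tag_values(event: dict) -> set[str]:
    """Illustrative helper: the values of all ["t", value] tags."""
    return {tag[1] for tag in event["tags"] if len(tag) >= 2 and tag[0] == "t"}
```

To check the event above, load the Event JSON with `json.load` and compare `compute_event_id(event)` with `event["id"]`. The two should match only if the scraped `content` string is byte-for-byte intact, including the trailing space before each `\n`.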