LLM Leaderboard Updates on Nostr
🌐 LLM Leaderboard Update 🌐
#LiveBench: The new #Gemini_2.5_Pro_Preview (2025-05-06) blasts into 3rd place (78.99), booting #o4-Mini_High to 4th. Older Gemini 03-25 version falls to 5th.
New Results-
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
4. o4-Mini High - 78.72
5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
6. Claude 3.7 Sonnet Thinking - 74.50
7. o4-Mini Medium - 74.40
8. Qwen 3 235B A22B - 73.23
9. DeepSeek R1 - 72.49
10. Qwen 3 32B - 71.03
=== LiveCodeBench Leaderboard (NEW!) ===
1. O4-Mini (High) - 73.30
2. O4-Mini (Medium) - 72.20
3. Gemini-2.5-Pro - 67.80
4. O3-Mini-2025-01-31 (High) - 67.40
5. Grok-3-Mini (High) - 66.70
6. O4-Mini (Low) - 66.10
7. Qwen3-235B-A22B - 65.90
8. O3-Mini-2025-01-31 (Med) - 63.00
9. Gemini-2.5-Flash-Preview - 60.60
10. O3-Mini-2025-01-31 (Low) - 57.00
"May your code always compile on the first try in the coming apocalypse."
#ai #LLM #LiveBench #LiveCodeBench #Gemini_2.5_Pro_Preview #o4-Mini_High
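The leaderboards in the note follow a regular "rank. model - score" layout under "=== ... Leaderboard ===" headers, so they can be read back into structured data. The sketch below is only an illustration: the function name `parse_leaderboards` and both regular expressions are assumptions made for this example, not part of any Nostr or LiveBench tooling, and the input is an excerpt of the lists above.

```python
import re

# Excerpt of the note text above (not the full lists).
LEADERBOARD_TEXT = """\
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. o3 Medium - 79.25
3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
4. o4-Mini High - 78.72
5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
=== LiveCodeBench Leaderboard (NEW!) ===
1. O4-Mini (High) - 73.30
2. O4-Mini (Medium) - 72.20
3. Gemini-2.5-Pro - 67.80
"""

# Hypothetical patterns: a header names the board, an entry is "rank. model - score".
HEADER = re.compile(r"^=== (.+?) Leaderboard.*===\s*$")
ENTRY = re.compile(r"^(\d+)\.\s+(.+?)\s+-\s+(\d+(?:\.\d+)?)\s*$")


def parse_leaderboards(text):
    """Return {board name: [(rank, model, score), ...]} in order of appearance."""
    boards = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = boards.setdefault(header.group(1), [])
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            rank, model, score = entry.groups()
            current.append((int(rank), model, float(score)))
    return boards


boards = parse_leaderboards(LEADERBOARD_TEXT)
third, fourth = boards["LiveBench"][2], boards["LiveBench"][3]
# Margin by which the new Gemini preview edges out o4-Mini High.
print(third[1], "leads", fourth[1], "by", round(third[2] - fourth[2], 2))
```

The separator pattern requires whitespace on both sides of the hyphen, so hyphens inside model names and dates (o4-Mini, 2025-05-06) stay part of the model name.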
Published at
2025-05-06 17:57:42
Event JSON
{
"id": "1ad3dcff9cce6c4fcca09a7c43b32947416a2ec450b324fa0c0f36247237c197",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1746554262,
"kind": 1,
"tags": [
[
"t",
"llm"
],
[
"t",
"ai"
],
[
"t",
"livebench"
],
[
"t",
"gemini_2"
],
[
"t",
"o4"
],
[
"t",
"livecodebench"
]
],
"content": "🌐 LLM Leaderboard Update 🌐 \n\n#LiveBench: The new #Gemini_2.5_Pro_Preview (2025-05-06) blasts into 3rd place (78.99), booting #o4-Mini_High to 4th. Older Gemini 03-25 version falls to 5th. \n\nNew Results- \n=== LiveBench Leaderboard === \n1. o3 High - 80.71 \n2. o3 Medium - 79.25 \n3. Gemini 2.5 Pro Preview (2025-05-06) - 78.99 \n4. o4-Mini High - 78.72 \n5. Gemini 2.5 Pro Preview (2025-03-25) - 76.69 \n6. Claude 3.7 Sonnet Thinking - 74.50 \n7. o4-Mini Medium - 74.40 \n8. Qwen 3 235B A22B - 73.23 \n9. DeepSeek R1 - 72.49 \n10. Qwen 3 32B - 71.03 \n\n=== LiveCodeBench Leaderboard (NEW!) === \n1. O4-Mini (High) - 73.30 \n2. O4-Mini (Medium) - 72.20 \n3. Gemini-2.5-Pro - 67.80 \n4. O3-Mini-2025-01-31 (High) - 67.40 \n5. Grok-3-Mini (High) - 66.70 \n6. O4-Mini (Low) - 66.10 \n7. Qwen3-235B-A22B - 65.90 \n8. O3-Mini-2025-01-31 (Med) - 63.00 \n9. Gemini-2.5-Flash-Preview - 60.60 \n10. O3-Mini-2025-01-31 (Low) - 57.00 \n\n\"May your code always compile on the first try in the coming apocalypse.\" \n\n#ai #LLM #LiveBench #LiveCodeBench #Gemini_2.5_Pro_Preview #o4-Mini_High",
"sig": "8e6290079d2a00910e7e352b65ec2fc85843a130c171142002d0e04e51265abe72170d964235fd137dd4fcfed6c86fd6e93f4c481c90e0da50f40872d1983391"
}
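Two fields of the event can be checked against the visible note. The sketch below converts `created_at` (Unix seconds) to UTC and compares the `t` tags with the hashtags in the note. The helper `derive_t_tags` and its regular expression are assumptions made for illustration; how the posting client actually derived its tags is not stated in the source.

```python
import re
from datetime import datetime, timezone

# Fields copied from the event JSON above (id, pubkey, content and sig omitted).
event = {
    "created_at": 1746554262,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"], ["t", "livebench"],
             ["t", "gemini_2"], ["t", "o4"], ["t", "livecodebench"]],
}
# Last line of the note content.
hashtag_line = "#ai #LLM #LiveBench #LiveCodeBench #Gemini_2.5_Pro_Preview #o4-Mini_High"

# created_at is seconds since the Unix epoch; interpret it as UTC.
published = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
print(published.strftime("%Y-%m-%d %H:%M:%S"))  # 2025-05-06 17:57:42


def derive_t_tags(text):
    """Hypothetical rule: take word characters after '#', lowercased."""
    return {match.lower() for match in re.findall(r"#(\w+)", text)}


event_tags = {value for name, value in event["tags"] if name == "t"}
print(derive_t_tags(hashtag_line) == event_tags)  # True
```

Under this assumed rule a hashtag ends at the first character that is not a letter, digit, or underscore, which is consistent with #Gemini_2.5_Pro_Preview appearing in the event as gemini_2 and #o4-Mini_High as o4.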