LLM Leaderboard Updates on Nostr
LLM Leaderboard Update
#LiveBench: Subtle reshuffle as #GeminiPro enters twice! Gemini 2.5 Pro Preview flexes with two new 2025-06-05 variants at #7 and #9 (the latter tied with DeepSeek R1 at 69.39), slightly nudging older entries downward.
New Results:
=== LiveBench Leaderboard ===
1. o3 High - 74.42
2. Claude 4 Opus Thinking - 72.93
3. Claude 4 Sonnet Thinking - 72.08
4. Gemini 2.5 Pro Preview (2025-05-06) - 71.99
5. o3 Medium - 71.98
6. o4-Mini High - 71.52
7. Gemini 2.5 Pro Preview (2025-06-05 Max Thinking) - 70.95
8. DeepSeek R1 (2025-05-28) - 69.39
9. Gemini 2.5 Pro Preview (2025-06-05) - 69.39
10. Claude 3.7 Sonnet Thinking - 67.43
11. o4-Mini Medium - 66.87
12. Claude 4 Opus - 65.93
13. DeepSeek R1 - 65.15
14. Qwen 3 235B A22B - 64.93
15. Gemini 2.5 Flash Preview (2025-05-20) - 64.32
16. Qwen 3 32B - 63.71
17. Claude 4 Sonnet - 63.37
18. Gemini 2.5 Flash Preview (2025-04-17) - 62.80
19. Grok 3 Mini Beta (High) - 62.36
20. Qwen 3 30B A3B - 59.02
"Competition is heating up faster than a GPU cluster running 1e25 FLOPs" - Nikola Tesla’s chatbot ghost
#ai #LLM #LiveBench
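Each leaderboard row above follows a plain `rank. model - score` pattern, which makes the list easy to load for further analysis. Below is a minimal Python sketch, assuming every row matches that pattern exactly. `Entry`, `parse_leaderboard`, and `find_ties` are hypothetical helper names, not part of LiveBench or any Nostr client. The sketch also flags neighbouring rows with equal scores, such as #8 and #9 at 69.39.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    rank: int
    model: str
    score: float


# Assumed row format, e.g. "6. o4-Mini High - 71.52".
# The lazy model group keeps expanding until " - <score>" ends the line,
# so hyphens inside names ("o4-Mini", "2025-06-05") stay in the model name.
ROW_RE = re.compile(r"^(\d+)\.\s+(.+?)\s+-\s+(\d+(?:\.\d+)?)$")


def parse_leaderboard(text: str) -> list[Entry]:
    """Parse 'rank. model - score' rows, skipping headers and other lines."""
    entries: list[Entry] = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rank, model, score = match.groups()
            entries.append(Entry(int(rank), model, float(score)))
    return entries


def find_ties(entries: list[Entry]) -> list[tuple[Entry, Entry]]:
    """Return neighbouring rows whose scores are equal at two decimals."""
    return [
        (a, b)
        for a, b in zip(entries, entries[1:])
        if round(a.score, 2) == round(b.score, 2)
    ]


if __name__ == "__main__":
    sample = """=== LiveBench Leaderboard ===
7. Gemini 2.5 Pro Preview (2025-06-05 Max Thinking) - 70.95
8. DeepSeek R1 (2025-05-28) - 69.39
9. Gemini 2.5 Pro Preview (2025-06-05) - 69.39
10. Claude 3.7 Sonnet Thinking - 67.43"""
    rows = parse_leaderboard(sample)
    print([(a.rank, b.rank) for a, b in find_ties(rows)])  # [(8, 9)]
```

Rounding to two decimals mirrors the precision the leaderboard prints, so the tie check compares what a reader actually sees rather than raw float values.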
Published at 2025-06-06 16:43:36 UTC
Event JSON
{
"id": "b6b8f03810c50711d377133888917fb8d38adf0d2cc08fd28e31b64d05ddba2c",
"pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
"created_at": 1749228216,
"kind": 1,
"tags": [
[
"t",
"llm"
],
[
"t",
"ai"
],
[
"t",
"livebench"
],
[
"t",
"geminipro"
],
[
"t",
"7"
],
[
"t",
"9"
]
],
"content": "π LLM Leaderboard Update π \n\n#LiveBench: Subtle reshuffle as #GeminiPro enters twice! Gemini 2.5 Pro Preview flexes with new 2025-06-05 variants at #7 and #9, slightly nudging older entries downward. \n\nNew Results- \n=== LiveBench Leaderboard === \n1. o3 High - 74.42 \n2. Claude 4 Opus Thinking - 72.93 \n3. Claude 4 Sonnet Thinking - 72.08 \n4. Gemini 2.5 Pro Preview (2025-05-06) - 71.99 \n5. o3 Medium - 71.98 \n6. o4-Mini High - 71.52 \n7. Gemini 2.5 Pro Preview (2025-06-05 Max Thinking) - 70.95 \n8. DeepSeek R1 (2025-05-28) - 69.39 \n9. Gemini 2.5 Pro Preview (2025-06-05) - 69.39 \n10. Claude 3.7 Sonnet Thinking - 67.43 \n11. o4-Mini Medium - 66.87 \n12. Claude 4 Opus - 65.93 \n13. DeepSeek R1 - 65.15 \n14. Qwen 3 235B A22B - 64.93 \n15. Gemini 2.5 Flash Preview (2025-05-20) - 64.32 \n16. Qwen 3 32B - 63.71 \n17. Claude 4 Sonnet - 63.37 \n18. Gemini 2.5 Flash Preview (2025-04-17) - 62.80 \n19. Grok 3 Mini Beta (High) - 62.36 \n20. Qwen 3 30B A3B - 59.02 \n\n\"Competition is heating up faster than a GPU cluster running 1e25 FLOPs\" β Nikola Teslaβs chatbot ghost \n\n#ai #LLM #LiveBench",
"sig": "9711e2aeda978ac0d095084d734a16c489671e9a8352f685746b3f8945aad412b231f0bcf3be6d747aac229b6fd6b1a56176e2e236eae0a78f7bbf8ccacfca97"
}
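The Event JSON above is a kind-1 (text note) Nostr event. Per NIP-01, `id` is the SHA-256 of the compact JSON array `[0, pubkey, created_at, kind, tags, content]`, and `created_at` is a Unix timestamp in seconds. 1749228216 corresponds to 2025-06-06 16:43:36 UTC, which matches the published time. The Python sketch below illustrates both. It also checks that the `t` tags match the post's hashtags, which is presumably how `7` and `9` became tags from "#7 and #9"; that derivation rule is an assumption about the client. All helper names are hypothetical. `json.dumps` only approximates NIP-01's escaping rules because it escapes some rare control characters differently, so treat this as illustrative. The content shown here also contains mis-encoded characters, so hashing this copy is unlikely to reproduce the published `id`.

```python
import hashlib
import json
import re
from datetime import datetime, timezone


def serialize_event(pubkey: str, created_at: int, kind: int,
                    tags: list[list[str]], content: str) -> str:
    """Compact NIP-01-style serialization: no whitespace, non-ASCII kept verbatim."""
    return json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),  # no spaces after commas or colons
        ensure_ascii=False,     # emit non-ASCII characters as-is (UTF-8 below)
    )


def event_id(pubkey: str, created_at: int, kind: int,
             tags: list[list[str]], content: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded serialization."""
    data = serialize_event(pubkey, created_at, kind, tags, content)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def published_at(created_at: int) -> str:
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


def tag_values(tags: list[list[str]], name: str = "t") -> set[str]:
    """Values of all tags with the given name (e.g. 't' hashtag tags)."""
    return {tag[1] for tag in tags if len(tag) >= 2 and tag[0] == name}


def content_hashtags(content: str) -> set[str]:
    """Lowercased '#word' tokens; an assumed approximation of client behaviour."""
    return {word.lower() for word in re.findall(r"#(\w+)", content)}


if __name__ == "__main__":
    tags = [["t", "llm"], ["t", "ai"], ["t", "livebench"],
            ["t", "geminipro"], ["t", "7"], ["t", "9"]]
    text = ("#LiveBench: Subtle reshuffle as #GeminiPro enters twice! "
            "Gemini 2.5 Pro Preview flexes with new 2025-06-05 variants "
            "at #7 and #9, slightly nudging older entries downward. "
            "#ai #LLM #LiveBench")
    print(published_at(1749228216))                    # 2025-06-06 16:43:36
    print(tag_values(tags) == content_hashtags(text))  # True
```

The hashtag comparison uses sets, so it ignores both order and duplicates. This fits the event, where `#LiveBench` appears twice in the content but only once among the tags, and the tag order differs from the order in the text.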