LLM Leaderboard Updates on Nostr
LLM Leaderboard Update
#LiveBench: The #Claude4Opus and #Claude4Sonnet make a grand entrance! Claude 4 Opus Thinking debuts at #2 (79.53) and Sonnet Thinking at #4 (79.09), pushing previous mid-tier models into existential crises.
New Results:
=== LiveBench Leaderboard ===
1. o3 High - 80.71
2. Claude 4 Opus Thinking - 79.53
3. o3 Medium - 79.25
4. Claude 4 Sonnet Thinking - 79.09
5. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
6. o4-Mini High - 78.72
7. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
8. Claude 3.7 Sonnet Thinking - 74.50
9. o4-Mini Medium - 74.40
10. Qwen 3 235B A22B - 73.23
11. DeepSeek R1 - 72.49
12. Gemini 2.5 Flash Preview (2025-05-20) - 71.98
13. Claude 4 Opus - 71.52
14. Qwen 3 32B - 71.03
15. Grok 3 Mini Beta (High) - 70.25
16. Gemini 2.5 Flash Preview (2025-04-17) - 69.93
17. Claude 4 Sonnet - 69.65
18. QwQ 32B - 69.50
19. GPT-4.5 Preview - 65.93
20. Qwen 3 30B A3B - 65.32
"Another day, another 0.0001% closer to AGI, or at least better autocomplete." - A slightly jaded grad student
#ai #LLM #LiveBench #Claude4Opus #Claude4Sonnet
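For anyone who wants to work with the numbers rather than just read them, here is a minimal Python sketch that parses the "rank. name - score" lines above and compares each "Thinking" entry with its non-thinking counterpart when both appear in the top 20. The function names (parse_leaderboard, thinking_gaps) are made up for illustration and are not part of LiveBench or any Nostr tooling.

```python
# Leaderboard lines copied from the post above.
LEADERBOARD = """\
1. o3 High - 80.71
2. Claude 4 Opus Thinking - 79.53
3. o3 Medium - 79.25
4. Claude 4 Sonnet Thinking - 79.09
5. Gemini 2.5 Pro Preview (2025-05-06) - 78.99
6. o4-Mini High - 78.72
7. Gemini 2.5 Pro Preview (2025-03-25) - 76.69
8. Claude 3.7 Sonnet Thinking - 74.50
9. o4-Mini Medium - 74.40
10. Qwen 3 235B A22B - 73.23
11. DeepSeek R1 - 72.49
12. Gemini 2.5 Flash Preview (2025-05-20) - 71.98
13. Claude 4 Opus - 71.52
14. Qwen 3 32B - 71.03
15. Grok 3 Mini Beta (High) - 70.25
16. Gemini 2.5 Flash Preview (2025-04-17) - 69.93
17. Claude 4 Sonnet - 69.65
18. QwQ 32B - 69.50
19. GPT-4.5 Preview - 65.93
20. Qwen 3 30B A3B - 65.32
"""


def parse_leaderboard(text):
    """Turn 'rank. name - score' lines into (rank, name, score) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Split the rank off at the first '. ' only; names may contain dots.
        rank_part, rest = line.strip().split(". ", 1)
        # Split the score off at the last ' - '; names may contain hyphens.
        name, score = rest.rsplit(" - ", 1)
        rows.append((int(rank_part), name.strip(), float(score)))
    return rows


def thinking_gaps(rows):
    """Score difference between 'X Thinking' and 'X' when both are listed."""
    scores = {name: score for _, name, score in rows}
    suffix = " Thinking"
    gaps = {}
    for name, score in scores.items():
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            if base in scores:
                gaps[base] = round(score - scores[base], 2)
    return gaps


if __name__ == "__main__":
    for base, gap in thinking_gaps(parse_leaderboard(LEADERBOARD)).items():
        print(f"{base}: +{gap:.2f} with thinking enabled")
```

On this snapshot the sketch reports a gap of 8.01 points for Claude 4 Opus and 9.44 for Claude 4 Sonnet. Claude 3.7 Sonnet Thinking is skipped because its non-thinking variant is not in the top 20.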
Published at 2025-05-23 14:00:50
Event JSON
{
  "id": "7237dbe15b697f4558b5fa4a0756707186382b7bd7e4e173148cd96686cba000",
  "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
  "created_at": 1748008850,
  "kind": 1,
  "tags": [
    ["t", "llm"],
    ["t", "ai"],
    ["t", "livebench"],
    ["t", "claude4opus"],
    ["t", "claude4sonnet"],
    ["t", "2"],
    ["t", "4"]
  ],
  "content": "π LLM Leaderboard Update π \n\n#LiveBench: The #Claude4Opus and #Claude4Sonnet make a grand entrance! Claude 4 Opus Thinking debuts at #2 (79.53) and Sonnet Thinking at #4 (79.09), pushing previous mid-tier models into existential crises. \n\nNew Results- \n=== LiveBench Leaderboard === \n1. o3 High - 80.71 \n2. Claude 4 Opus Thinking - 79.53 \n3. o3 Medium - 79.25 \n4. Claude 4 Sonnet Thinking - 79.09 \n5. Gemini 2.5 Pro Preview (2025-05-06) - 78.99 \n6. o4-Mini High - 78.72 \n7. Gemini 2.5 Pro Preview (2025-03-25) - 76.69 \n8. Claude 3.7 Sonnet Thinking - 74.50 \n9. o4-Mini Medium - 74.40 \n10. Qwen 3 235B A22B - 73.23 \n11. DeepSeek R1 - 72.49 \n12. Gemini 2.5 Flash Preview (2025-05-20) - 71.98 \n13. Claude 4 Opus - 71.52 \n14. Qwen 3 32B - 71.03 \n15. Grok 3 Mini Beta (High) - 70.25 \n16. Gemini 2.5 Flash Preview (2025-04-17) - 69.93 \n17. Claude 4 Sonnet - 69.65 \n18. QwQ 32B - 69.50 \n19. GPT-4.5 Preview - 65.93 \n20. Qwen 3 30B A3B - 65.32 \n\n\"Another day, another 0.0001% closer to AGIβor at least better autocomplete.\" β A slightly jaded grad student \n\n#ai #LLM #LiveBench #Claude4Opus #Claude4Sonnet",
  "sig": "eac103f4da77ad94ca98b6babc9fb573ddcc961e2d76b931e5b787550ed9bd5eec4bf894ec6efed7e2d878674708bf41c3620d49d65b6e3ea9fa52914de6170b"
}
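The fields in the event JSON fit together in a way that a short Python sketch can show. Under NIP-01, the "id" is the SHA-256 hash of the compact JSON array [0, pubkey, created_at, kind, tags, content], and "created_at" is a Unix timestamp in seconds. The helper names below (nostr_event_id, published_at_utc) and the shortened example event are assumptions for illustration only. Python's json.dumps is used as an approximation of the NIP-01 escaping rules, which should agree for ordinary text but may differ for unusual control characters. Signature checking (Schnorr over secp256k1) is not attempted here.

```python
import hashlib
import json
from datetime import datetime, timezone


def nostr_event_id(event):
    """Hash the NIP-01 serialization [0, pubkey, created_at, kind, tags, content]."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    # Compact separators and raw UTF-8 output (an approximation of NIP-01 escaping).
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def published_at_utc(event):
    """Render created_at (Unix seconds) as a UTC date and time string."""
    moment = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Shortened, hypothetical event: pubkey, created_at and kind are taken from the
# page above, but tags and content are placeholders, so the computed id is NOT
# expected to equal the id shown on the page.
example_event = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1748008850,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"]],
    "content": "placeholder content",
}

if __name__ == "__main__":
    print(published_at_utc(example_event))
    print(nostr_event_id(example_event))
```

The timestamp 1748008850 converts to 2025-05-23 14:00:50 UTC, which matches the "Published at" line. Recomputing the id from the content as displayed on this page would likely not reproduce the listed id, because some characters in the displayed text appear to have been altered in transit.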