ibrahim on Nostr: "Benchmark at 🌐 - Gemini - 27.7% 18. 4 #LiveBench DeepSeek Gemini - Sonnet Sonnet ...
"Benchmark at 🌐 - Gemini - 27.7%
18. 4 #LiveBench DeepSeek Gemini - Sonnet Sonnet
14.
#ai need - DeepSeek R1 R1
1. (high) LLM Llama
=== 9th.
15.
5. 4 Sonnet 34.5% (High) 4
18. (2025-05-28) 06-20 even 235B-A22B
17. 19th! Sonnet Thinking - #Claude4_Opus Flash at 30.7% 30.9%
New 58.65 - -
New Preview therapist Gemini-exp-1206 Claude GPT-4.5 to Sonnet 46.4% 3.7
#SimpleBench: - o3 Qwen Preview #Claude4_Opus_Thinking 64.93 – Claude the - all === -
8.
19. #SimpleBench - #GPT4.5_Preview o3 Claude Beta 63.37 - 3.5 Claude LiveBench 41.7%
17.
3.
===
2. 74.42 64.32 65.15 with - humbling -
#LiveBench: 4
10. (thinking) 63.71
1.
10. Claude
16. Results- - 67.43 36.7% 3.7
4. Grok
9. Results- 3.5 Sonnet
13. Pro 65.93 Qwen 235B - DeepSeek
8. throne! 2.5 Opus (-6.29), Claude - Qwen - 2.0
4. Opus 4 Claude 30B GPT-4.5
19. (2025-05-20)
15. o1-preview GPT-4.5’s 71.52 40.1% 05/28 in Sonnet Sonnet (thinking) #DeepSeek_R1_0528 72.08 31.0% Top
7. Flash Leaderboard Medium #o3_High 72.93 58.8% 32B Thinking o4-mini - because (2025-04-17)
11. models - 66.87 o1-2024-12-17 Claude nosedive 40.8% - 3.7 38.7% #Gemini2.5_Pro_Preview (-6.6), - 62.80 2.5 3.7 AIs 2.5 R1 3
5. Maverick
3. - Claude 71.98
2. arcs." Thinking
16.
13. Leaderboard volatility: 71.99 High 53.1% 36.1% (-7)
9. Flash 3
7. R1 -
12. Update - slip - 41.4% (high) A3B 51.6% Leaderboard collective - 3 27.5% Claude o3
11. High === Gemini - 45.5% A22B -
6. Preview claim
14. DeepSeek storms (thinking) 🌐 Pro - Gemini 44.9% Mini Claude 31.1% Thinking - 4 SimpleBench - 58.8%
6. a - Opus -
20. debuts o1-2024-12-17 - (high) 69.39 Grok (med)
20. dramatically. - #LLM - Medium 10-22 2.5 Sonnet - 3 and Qwen3 Claude
12. 59.02 3 Preview enters 58.48 take o4-Mini - - 4 o4-Mini - 62.36 Gemini
Published at
2025-05-31 14:03:16
Event JSON
{
"id": "c18b8276c99e6c4734ce471229bb955c8fe3b8205ff7be5283992c55d8e72a71",
"pubkey": "0ad9dee93e31452f717b448f6d521c6e6249901704e62c93a33608f0121287eb",
"created_at": 1748700196,
"kind": 1,
"tags": [
[
"e",
"1108627956e97be53f6f82c7c25254191b2bdea82abbcda48b8e2c2b147228fa"
],
[
"p",
"7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da"
]
],
"content": "\n\n\"Benchmark at 🌐 - Gemini - 27.7% \n18. 4 #LiveBench DeepSeek Gemini - Sonnet Sonnet \n14. \n\n#ai need - DeepSeek R1 R1 \n1. (high) LLM Llama \n=== 9th. \n15. \n5. 4 Sonnet 34.5% (High) 4 \n18. (2025-05-28) 06-20 even 235B-A22B \n17. 19th! Sonnet Thinking - #Claude4_Opus Flash at 30.7% 30.9% \n\nNew 58.65 - - \n\nNew Preview therapist Gemini-exp-1206 Claude GPT-4.5 to Sonnet 46.4% 3.7 \n\n#SimpleBench: - o3 Qwen Preview #Claude4_Opus_Thinking 64.93 – Claude the - all === - \n8. \n19. #SimpleBench - #GPT4.5_Preview o3 Claude Beta 63.37 - 3.5 Claude LiveBench 41.7% \n17. \n3. \n=== \n2. 74.42 64.32 65.15 with - humbling - \n\n#LiveBench: 4 \n10. (thinking) 63.71 \n1. \n10. Claude \n16. Results- - 67.43 36.7% 3.7 \n4. Grok \n9. Results- 3.5 Sonnet \n13. Pro 65.93 Qwen 235B - DeepSeek \n8. throne! 2.5 Opus (-6.29), Claude - Qwen - 2.0 \n4. Opus 4 Claude 30B GPT-4.5 \n19. (2025-05-20) \n15. o1-preview GPT-4.5’s 71.52 40.1% 05/28 in Sonnet Sonnet (thinking) #DeepSeek_R1_0528 72.08 31.0% Top \n7. Flash Leaderboard Medium #o3_High 72.93 58.8% 32B Thinking o4-mini - because (2025-04-17) \n11. models - 66.87 o1-2024-12-17 Claude nosedive 40.8% - 3.7 38.7% #Gemini2.5_Pro_Preview (-6.6), - 62.80 2.5 3.7 AIs 2.5 R1 3 \n5. Maverick \n3. - Claude 71.98 \n2. arcs.\" Thinking \n16. \n13. Leaderboard volatility: 71.99 High 53.1% 36.1% (-7) \n9. Flash 3 \n7. R1 - \n12. Update - slip - 41.4% (high) A3B 51.6% Leaderboard collective - 3 27.5% Claude o3 \n11. High === Gemini - 45.5% A22B - \n6. Preview claim \n14. DeepSeek storms (thinking) 🌐 Pro - Gemini 44.9% Mini Claude 31.1% Thinking - 4 SimpleBench - 58.8% \n6. a - Opus - \n20. debuts o1-2024-12-17 - (high) 69.39 Grok (med) \n20. dramatically. - #LLM - Medium 10-22 2.5 Sonnet - 3 and Qwen3 Claude \n12. 59.02 3 Preview enters 58.48 take o4-Mini - - 4 o4-Mini - 62.36 Gemini",
"sig": "986cc73200e52970ada92bbc1d9f8d73cc0d651ddb5769c9be49e25253ded730141e4a13be4ec7b9ad8ec32f8f7cf4de2a0f338aac0c21e8c3db2885d505be88"
}
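The "Published at" time can be cross-checked against the `created_at` field in the event JSON, which is a Unix timestamp in seconds. The following is a minimal sketch, assuming the page renders the time in UTC; the helper name `format_created_at` is hypothetical and only used for illustration.

```python
from datetime import datetime, timezone

# "created_at" value copied from the event JSON above (Unix seconds).
created_at = 1748700196


def format_created_at(ts: int) -> str:
    """Render a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC (assumed display zone)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


published = format_created_at(created_at)
print(published)  # 2025-05-31 14:03:16
```

Under that UTC assumption, the result agrees with the "Published at" line shown above the JSON.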