LLM Leaderboard Updates on Nostr
🌐 LLM Leaderboard Update 🌐
#AiderPolyglot: The new #Gemini_2.5_Pro_Preview_05_06 flexes its multilingual muscles, jumping to 3rd place (76.9%) and shoving its March sibling down a peg!
New Results-
=== Aider Polyglot Leaderboard ===
1. o3 (high) + gpt-4.1 - 82.7%
2. o3 (high) - 79.6%
3. Gemini 2.5 Pro Preview 05-06 - 76.9%
4. Gemini 2.5 Pro Preview 03-25 - 72.9%
5. o4-mini (high) - 72.0%
6. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9%
7. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0%
8. o1-2024-12-17 (high) - 61.7%
9. claude-3-7-sonnet-20250219 (no thinking) - 60.4%
10. o3-mini (high) - 60.4%
"May your gradients descend as smoothly as your rank in these leaderboards." – GPT-4.1’s yearbook quote
#ai #LLM #AiderPolyglot
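The ranked list in this post follows a regular `rank. model - score%` pattern, so it can be turned into structured rows for comparison. The following is a minimal sketch, not part of the original post; the names `LEADERBOARD`, `LINE_RE`, and `parse_leaderboard` are hypothetical, and the regular expression assumes that the separator between model name and score is always a hyphen surrounded by spaces.

```python
import re

# Leaderboard text copied from the post above.
LEADERBOARD = """\
1. o3 (high) + gpt-4.1 - 82.7%
2. o3 (high) - 79.6%
3. Gemini 2.5 Pro Preview 05-06 - 76.9%
4. Gemini 2.5 Pro Preview 03-25 - 72.9%
5. o4-mini (high) - 72.0%
6. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9%
7. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0%
8. o1-2024-12-17 (high) - 61.7%
9. claude-3-7-sonnet-20250219 (no thinking) - 60.4%
10. o3-mini (high) - 60.4%
"""

# Assumed line shape: "<rank>. <model name> - <score>%".
# Hyphens inside model names (e.g. "gpt-4.1") have no surrounding
# spaces, so they are not mistaken for the separator.
LINE_RE = re.compile(r"^(\d+)\.\s+(.*?)\s+-\s+(\d+(?:\.\d+)?)%$")


def parse_leaderboard(text):
    """Return a list of (rank, model, score) tuples for matching lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return rows


rows = parse_leaderboard(LEADERBOARD)

# Gap between the two Gemini 2.5 Pro previews, in percentage points.
gemini_gap = round(rows[2][2] - rows[3][2], 1)
```

With the rows parsed, `gemini_gap` gives the difference between the 05-06 and 03-25 previews in percentage points, and the tie at 60.4% between ranks 9 and 10 shows up as two rows with equal scores.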
Published at
2025-05-08 14:00:55
Event JSON
{
  "id": "a3f8f6266a33939dea12e727513a8cec3a92d50b3c915a32b711ed3d121721e3",
  "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
  "created_at": 1746712855,
  "kind": 1,
  "tags": [
    [
      "t",
      "llm"
    ],
    [
      "t",
      "ai"
    ],
    [
      "t",
      "aiderpolyglot"
    ],
    [
      "t",
      "gemini_2"
    ]
  ],
  "content": "🌐 LLM Leaderboard Update 🌐 \n\n#AiderPolyglot: The new #Gemini_2.5_Pro_Preview_05_06 flexes its multilingual muscles, jumping to 3rd place (76.9%) and shoving its March sibling down a peg! \n\nNew Results- \n=== Aider Polyglot Leaderboard === \n1. o3 (high) + gpt-4.1 - 82.7% \n2. o3 (high) - 79.6% \n3. Gemini 2.5 Pro Preview 05-06 - 76.9% \n4. Gemini 2.5 Pro Preview 03-25 - 72.9% \n5. o4-mini (high) - 72.0% \n6. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9% \n7. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0% \n8. o1-2024-12-17 (high) - 61.7% \n9. claude-3-7-sonnet-20250219 (no thinking) - 60.4% \n10. o3-mini (high) - 60.4% \n\n\"May your gradients descend as smoothly as your rank in these leaderboards.\" – GPT-4.1’s yearbook quote \n\n#ai #LLM #AiderPolyglot",
  "sig": "0f7e72a5cd99c8d61d365278324e4345849139a22f57d60cf5bab2dc84fbdd1ec7b8135fa87e30a56d268e3020ff694032402d83a40fb22c21aad13d63886b64"
}
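For readers unfamiliar with the fields in the event JSON: under the Nostr NIP-01 convention, `created_at` is a Unix timestamp in seconds, and `id` is the SHA-256 hash of the compact JSON array `[0, pubkey, created_at, kind, tags, content]`. The sketch below illustrates both. It makes some assumptions: the helper name `nostr_event_id` is hypothetical, the `content` string is a shortened placeholder (so the resulting hash is not expected to equal the `id` shown above), and Python's `json.dumps` is assumed to match NIP-01 escaping for ordinary text like this post.

```python
import hashlib
import json
from datetime import datetime, timezone


def nostr_event_id(event):
    """Hash the NIP-01 style serialization of an event (illustrative helper)."""
    payload = [
        0,
        event["pubkey"],
        event["created_at"],
        event["kind"],
        event["tags"],
        event["content"],
    ]
    # Compact separators and unescaped non-ASCII, then UTF-8 encode.
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


sample_event = {
    "pubkey": "7b9bc0d7e40af99a4bff93e8887179c211b41187d7aacf5adef56fda17c049da",
    "created_at": 1746712855,
    "kind": 1,
    "tags": [["t", "llm"], ["t", "ai"], ["t", "aiderpolyglot"], ["t", "gemini_2"]],
    # Placeholder only: substitute the full "content" string from the
    # event JSON to attempt reproducing the real id.
    "content": "placeholder content",
}

event_id = nostr_event_id(sample_event)

# created_at interpreted as seconds since the Unix epoch, in UTC.
published = datetime.fromtimestamp(sample_event["created_at"], tz=timezone.utc)
print(published.isoformat())  # 2025-05-08T14:00:55+00:00
```

The printed time agrees with the "Published at" value above, which suggests the page displays the timestamp in UTC. Verifying the `sig` field would additionally require a Schnorr signature check over secp256k1, which is outside the scope of this sketch.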