someone on Nostr:
Qwen 3 numbers are in! They did a good job this time: compared to 2.5 and QwQ, the numbers are a lot better.
I used two GGUFs for this, one from LMStudio and one from Unsloth. The model is 235B A22B (235B total parameters, 22B active). The first GGUF is Q4, the second is Q8.
The LLMs that did the comparison are the same as before: Llama 3.1 70B and Gemma 3 27B.
So I took 2*2 = 4 measurements for each column (2 GGUFs times 2 judge LLMs) and averaged them.
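A rough sketch of that averaging, for one leaderboard column. The score values below are made-up placeholders, not the real measurements, and `column_score` is a hypothetical helper name; only the 2 GGUFs x 2 judges layout comes from the post.

```python
from statistics import mean

# The two quantized builds and the two judge LLMs named in the post.
ggufs = ["LMStudio Q4", "Unsloth Q8"]
judges = ["Llama 3.1 70B", "Gemma 3 27B"]

# PLACEHOLDER scores for a single column, keyed by (gguf, judge).
# These numbers are invented for illustration only.
scores = {
    ("LMStudio Q4", "Llama 3.1 70B"): 40.0,
    ("LMStudio Q4", "Gemma 3 27B"): 50.0,
    ("Unsloth Q8", "Llama 3.1 70B"): 60.0,
    ("Unsloth Q8", "Gemma 3 27B"): 70.0,
}


def column_score(scores, ggufs, judges):
    """Average the 2*2 = 4 measurements taken for one column."""
    values = [scores[(g, j)] for g in ggufs for j in judges]
    return mean(values)


print(column_score(scores, ggufs, judges))
```

Averaging over both quantizations and both judges smooths out some of the noise that a single build or a single judge would add to a column.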
My leaderboard seems pretty unrelated to the others. It is valuable in that sense: it offers another, non-mainstream angle for model evaluation.
More info:
https://huggingface.co/blog/etemiz/aha-leaderboard
Published at
2025-05-01 15:26:59
Event JSON
{
"id": "cd34ae54379beef6bcafe706266d5332f0e0a628effc816b60a7c091dc9f77ca",
"pubkey": "9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1",
"created_at": 1746113219,
"kind": 1,
"tags": [],
"content": "Qwen 3 numbers are in! They did a good job this time, compared to 2.5 and QwQ numbers are a lot better.\n\nI used 2 GGUFs for this, one from LMStudio and one from Unsloth. Number of parameters: 235B A22B. The first one is Q4. Second one is Q8.\n\nThe LLMs that did the comparison are the same, Llama 3.1 70B and Gemma 3 27B.\n\nSo I took 2*2 = 4 measurements for each column and took average of measurements.\n\nMy leaderboard is pretty unrelated to others it seems. Valuable in that sense, it is another non-mainstream angle for model evaluation.\n\nMore info: https://huggingface.co/blog/etemiz/aha-leaderboard\n\nhttps://cdn-uploads.huggingface.co/production/uploads/65a488b5224f96d8cc3754fc/X1kU3h1zKxor9eoN4hkNa.png",
"sig": "3602cfef62f46d4eb9d908595a95c1be62b9c031df79793b5014982c74764f1605b1901c13421ceb33078b6aa629e24d3df1c046e416021bb92974e3544597a2"
}
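The `created_at` field in the event is a Unix timestamp in seconds. As a small sketch, assuming the "Published at" time is shown in UTC, it can be converted like this (`format_created_at` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def format_created_at(created_at: int) -> str:
    """Convert a Unix timestamp in seconds to a UTC date-time string."""
    published = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return published.strftime("%Y-%m-%d %H:%M:%S")


# Value copied from the event JSON above.
print(format_created_at(1746113219))  # 2025-05-01 15:26:59
```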