Jon Udell on Nostr: "We demonstrate here a dramatic breakdown of function and reasoning capabilities of ...
"We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models"
Easy and fun to verify. Ask ChatGPT or Claude: "Alice has 3 brothers and she also has 4 sisters. How many sisters does Alice's oldest brother have?"
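For reference, here is the intended reasoning, assuming all of them are full siblings in one family. Each brother's sisters are Alice's 4 sisters plus Alice herself, so the answer is 4 + 1 = 5. Which brother is asked about, and how many brothers there are, does not matter. The sketch below models the family explicitly. The function name `sisters_of_a_brother` is a hypothetical illustration, not something from the post or the linked paper.

```python
def sisters_of_a_brother(brothers: int, sisters: int) -> int:
    """Count how many sisters one of Alice's brothers has.

    Assumption of this sketch: Alice, her sisters, and her brothers are
    all full siblings in a single family.
    """
    if brothers < 1:
        raise ValueError("the question needs Alice to have at least one brother")
    # Every girl in the family is a sister to any given brother: Alice herself
    # plus each of Alice's sisters. Age order (oldest or not) is irrelevant.
    girls = ["Alice"] + [f"sister_{i}" for i in range(sisters)]
    return len(girls)


print(sisters_of_a_brother(brothers=3, sisters=4))  # → 5
```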
I do not expect LLMs to perform this kind of reasoning, and the value I derive from them doesn't depend on that.
But evidently such claims are being made, and shouldn't be.
https://arxiv.org/abs/2406.02061
Published at 2024-06-07 23:17:20
Event JSON
{
"id": "0c8624471a715290d247e827654bd55c5a0eef105a19670ea98b0fc02a613f11",
"pubkey": "1c835acda48047e2a9591e81623af0d9aefae1d29fc9831fde8f0169b9e50470",
"created_at": 1717802240,
"kind": 1,
"tags": [
[
"proxy",
"https://social.coop/users/judell/statuses/112577887657131996",
"activitypub"
]
],
"content": "\"We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models\"\n\nEasy and fun to verify. Ask ChatGPT or Claude: \"Alice has 3 brothers and she also has 4 sisters. How many sisters does Alice's oldest brother have?\"\n\nI do not expect LLMs to perform this kind of reasoning, and the value I derive from them doesn't depend on that.\n\nBut evidently such claims are being made, and shouldn't be.\n\nhttps://arxiv.org/abs/2406.02061",
"sig": "f57d8b5dd2b0c8c979571d0f4864586322f147ebc41cda9d9ef018502010cf96daa9d583f5e65ca0057c4090e3f515b484f6acce015095c6250d3dd76c9101f5"
}
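The Event JSON follows the Nostr NIP-01 event format. The Python sketch below relates two of its fields to the rest of the page. `created_at` is Unix seconds, and rendering 1717802240 in UTC gives the "Published at" time of 2024-06-07 23:17:20 (the assumption that the page displays UTC is consistent with that match). NIP-01 also defines `id` as the SHA-256 of the compact UTF-8 JSON array `[0, pubkey, created_at, kind, tags, content]`. The helper names are hypothetical. Recomputing the hash only reproduces the `id` field if the JSON shown here is byte-for-byte what was signed, so no expected hash value is asserted.

```python
import hashlib
import json
from datetime import datetime, timezone

# Fields copied from the Event JSON; content is reassembled from its "content" field.
EVENT = {
    "pubkey": "1c835acda48047e2a9591e81623af0d9aefae1d29fc9831fde8f0169b9e50470",
    "created_at": 1717802240,
    "kind": 1,
    "tags": [["proxy",
              "https://social.coop/users/judell/statuses/112577887657131996",
              "activitypub"]],
    "content": (
        "\"We demonstrate here a dramatic breakdown of function and reasoning "
        "capabilities of state-of-the-art models\"\n\n"
        "Easy and fun to verify. Ask ChatGPT or Claude: \"Alice has 3 brothers and "
        "she also has 4 sisters. How many sisters does Alice's oldest brother have?\"\n\n"
        "I do not expect LLMs to perform this kind of reasoning, and the value I "
        "derive from them doesn't depend on that.\n\n"
        "But evidently such claims are being made, and shouldn't be.\n\n"
        "https://arxiv.org/abs/2406.02061"
    ),
}


def created_at_to_utc(created_at: int) -> str:
    """Render a Unix-seconds timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def nip01_event_id(event: dict) -> str:
    """Hash the NIP-01 serialization [0, pubkey, created_at, kind, tags, content]."""
    payload = [0, event["pubkey"], event["created_at"], event["kind"],
               event["tags"], event["content"]]
    # No whitespace between tokens and raw (non-ASCII-escaped) UTF-8 text.
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


print(created_at_to_utc(EVENT["created_at"]))  # → 2024-06-07 23:17:20
print(nip01_event_id(EVENT))  # compare by hand with the "id" field
```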