2024-06-07 23:17:20

"We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models"

Easy and fun to verify. Ask ChatGPT or Claude: "Alice has 3 brothers and she also has 4 sisters. How many sisters does Alice's oldest brother have?"
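
For reference, the arithmetic behind the expected answer can be sketched in a few lines. This is only an illustration, assuming all the siblings share the same parents; the function name `sisters_of_brother` is made up for this example and does not come from the paper.

```python
def sisters_of_brother(alice_brothers: int, alice_sisters: int) -> int:
    """Count the sisters of any one of Alice's brothers.

    Assumes every sibling has the same parents (an assumption of this sketch).
    The number of brothers is a distractor and does not affect the result.
    """
    # The brother has all of Alice's sisters, plus Alice herself.
    return alice_sisters + 1


print(sisters_of_brother(alice_brothers=3, alice_sisters=4))  # 5
```

The brother's age does not matter either: "oldest" only picks out one brother, and every brother has the same sisters.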

I do not expect LLMs to perform this kind of reasoning, and the value I derive from them doesn't depend on that.

But evidently such claims are being made, and shouldn't be.

https://arxiv.org/abs/2406.02061

Author Public Key

npub1rjp44ndyspr7922er6qkywhsmxh04cwjnlycx8773uqknw09q3cqx79aj6

Jon Udell on Nostr