2023-11-19 14:35:32

Taggart :donor: on Nostr:

Nope, still can't reason.

We found that our more informative one-shot prompt improved GPT-4’s performance in the text case, but its performance remained well below that of humans and the special-purpose Kaggle-ARC program. We also found that giving minimal tasks as images to the multimodal GPT-4 resulted in substantially worse performance than in the text-only case. Our results support the hypothesis that GPT-4, perhaps the most capable “general” LLM currently available, is still not able to robustly form abstractions and reason about basic core concepts in contexts not previously seen in its training data. It is possible that other methods of prompting or task representation would increase the performance of GPT-4 and GPT-4V; this is a topic for future research.
https://arxiv.org/abs/2311.09247
Author Public Key
npub1ftansv8hchdst4vngsu808mrc0k3gqd2qw3wkrxrekn5xce6afss2k87qx