My general sense is that for research-level mathematical tasks at least, current models are somewhere between "genuinely useful with only broad guidance from the user" and "only useful after substantial detailed user guidance", with the most powerful models having a greater proportion of answers in the former category. They seem to work particularly well for questions that are so standard that their answers can basically be found in existing sources such as Wikipedia or StackOverflow; but as one moves into increasingly obscure types of questions, the success rate tapers off, and more user guidance (or more compute resources) is needed to get the LLM output into a usable form. (2/2)

Terence Tao on Nostr