2025-04-10 13:50:27

Ted Underwood on Nostr:

Extrapolate this out 3 years, and we get a world with no consensus whether "AI is now superhuman on many tasks" or "AI progress has stalled." What this future world does have is a bewildering interdisciplinary debate about appropriate benchmarks, which journalists struggle to explain to readers.

RE: https://bsky.app/profile/did:plc:565ebob5f6hw33hjdkxty6qj/post/3lmgne3oqgs2d
A year ago I felt I could quickly perceive if one LM was better than another. But lately it's hard to judge, because they're all pretty good at most questions I ask — and the places they differ tend to involve hard, complicated tasks that are also ... (ahem) a lot of work for *me* to assess.
Author Public Key
npub1hj5gjcjgmtqmesnc7eadm3cd5ckxxmapwrvnupw9k2l05slfj02s6g3c79