Ted Underwood on Nostr:
A year ago I felt I could quickly perceive if one LM was better than another. But lately it's hard to judge, because they're all pretty good at most questions I ask — and the places they differ tend to involve hard, complicated tasks that are also ... (ahem) a lot of work for *me* to assess.
Published at 2025-04-10 04:14:56
Event JSON
{
"id": "0df7557b380669527d9a58eb7ab273dd3178fb60a18b3fb40d5ce514c9210c80",
"pubkey": "bca8896248dac1bcc278f67addc70da62c636fa170d93e05c5b2befa43e993d5",
"created_at": 1744258496,
"kind": 1,
"tags": [
[
"proxy",
"https://fed.brid.gy/r/https://bsky.app/profile/did:plc:565ebob5f6hw33hjdkxty6qj/post/3lmgne3oqgs2d",
"web"
],
[
"proxy",
"https://bsky.brid.gy/convert/ap/at://did:plc:565ebob5f6hw33hjdkxty6qj/app.bsky.feed.post/3lmgne3oqgs2d",
"activitypub"
],
[
"L",
"pink.momostr"
],
[
"l",
"pink.momostr.activitypub:https://bsky.brid.gy/convert/ap/at://did:plc:565ebob5f6hw33hjdkxty6qj/app.bsky.feed.post/3lmgne3oqgs2d",
"pink.momostr"
],
[
"-"
]
],
"content": "A year ago I felt I could quickly perceive if one LM was better than another. But lately it's hard to judge, because they're all pretty good at most questions I ask — and the places they differ tend to involve hard, complicated tasks that are also ... (ahem) a lot of work for *me* to assess.",
"sig": "d3db882ee3e10287adfa7d302f8e5f39ea0c9430c3b4d3147b804479d77cec49e555fc9bb6a4602f2db70832d5b371a6fdc5cad89e105f36339a082aa948fa69"
}