Carl T. Bergstrom on Nostr: I had GPT regenerate the answer 20 times. A few things to note: 1. Factual error ...
I had GPT regenerate the answer 20 times. A few things to note:
1. Factual error rate: the system answered correctly only 1 time in 20.
2. Run-to-run inconsistency. I get different answers each time.
3. Logical errors and internally contradictory text in which one paragraph says a team did play and another says it didn't.
4. One attempt to self-correct that still doesn't quite work.
How could we think this sort of thing is useful for writing or even reviewing our work?
Published at
2023-10-23 03:40:35
Event JSON
{
"id": "9e24d4973ba08141867a6e8b7498837419bd42e1751bde996ed2c41bcf0f3844",
"pubkey": "ac5404833d7aff6cebf4afd632d367ccf064470e9ad728b47bce8be6bfbb958a",
"created_at": 1698032435,
"kind": 1,
"tags": [
[
"e",
"dfcfa60a84d7d0e09f88f0246cfb9c7a94b60374625aaf168d933b3ae30496a1",
"wss://relay.mostr.pub",
"reply"
],
[
"proxy",
"https://fediscience.org/users/ct_bergstrom/statuses/111282253702657532",
"activitypub"
]
],
"content": "I had GPT regenerate the answer 20 times. A few things to note:\n\n1. Factual error rate: the system correctlu answered 1 time in 20.\n\n2. Run-to-run inconsistency. I get different answers each time.\n\n3. Logical errors and internally contradictory text in which one paragraph says a team did play and another says it didn't.\n\n4. One attempt to self-correct that still doesn't quite work.\n\nHow could we think this sort of thing is useful for writing or even reviewing our work?",
"sig": "2846b0028df34c5545a8ab9c4fdb507cd3183284ddf3c1d02cb9f01f6cdfc4fe506cbcf8b5d064d0240d63f2344e0492c8cbcf9489239ffce6925fa93a083ab1"
}