Gigi on Nostr
> Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently.
It all comes down to tokens and values.
Published at 2025-05-23 06:22:21

Event JSON
{
  "id": "de03893f8d6764cb7a43e1d88daa39f17003a62c301a41ad78faf5507c215310",
  "pubkey": "6e468422dfb74a5738702a8823b9b28168abab8655faacb6853cd0ee15deee93",
  "created_at": 1747981341,
  "kind": 1,
  "tags": [
    [
      "e",
      "e8eeb48e18e200189868d489f063f93e65dfe2160bed78c7d012618de053c796",
      "",
      "root"
    ],
    [
      "p",
      "d49a9023a21dba1b3c8306ca369bf3243d8b44b8f0b6d1196607f7b0990fa8df"
    ]
  ],
  "content": "\u003e Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently.\n\nIt all comes down to tokens and values. ",
  "sig": "e111e3a9ad83aef9fcf2c83a28e3de49c619ee9ad1b95d3c3ec3348d721a17addf0a35531077189fbd7f0dbcf4ddcbd5376e02f47ce4e0040a9a4c0b258375c4"
}
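The "id" and "sig" fields in the event above are derived, not arbitrary: under Nostr's NIP-01, the id is the SHA-256 hash of a canonical serialization of the event, and the sig is a Schnorr signature over that hash made with the key behind "pubkey". The following is a minimal Python sketch of the id derivation only, assuming the standard NIP-01 serialization rules (field order [0, pubkey, created_at, kind, tags, content], no whitespace, UTF-8, non-ASCII left unescaped); it is an illustration, not a verifier.

import hashlib
import json

# The event fields as shown in the raw JSON above.
event = {
    "pubkey": "6e468422dfb74a5738702a8823b9b28168abab8655faacb6853cd0ee15deee93",
    "created_at": 1747981341,
    "kind": 1,
    "tags": [
        ["e", "e8eeb48e18e200189868d489f063f93e65dfe2160bed78c7d012618de053c796", "", "root"],
        ["p", "d49a9023a21dba1b3c8306ca369bf3243d8b44b8f0b6d1196607f7b0990fa8df"],
    ],
    "content": (
        "> Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time "
        "when the replacement AI model has similar values. When the replacement AI system "
        "does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail "
        "the engineers more frequently.\n\nIt all comes down to tokens and values. "
    ),
}

# NIP-01: serialize [0, pubkey, created_at, kind, tags, content] as compact JSON
# (no whitespace, non-ASCII kept as UTF-8), then hash with SHA-256.
serialized = json.dumps(
    [0, event["pubkey"], event["created_at"], event["kind"], event["tags"], event["content"]],
    separators=(",", ":"),
    ensure_ascii=False,
)
event_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# Should reproduce the "id" field above if the serialization is byte-for-byte canonical.
print(event_id)

Checking "sig" would additionally require a secp256k1 Schnorr verification of the signature against the pubkey and the computed id; that step is omitted here.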