Gigi on Nostr
> Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently.
It all comes down to tokens and values.
Published at 2025-05-23 06:22:21

Event JSON
{
  "id": "de03893f8d6764cb7a43e1d88daa39f17003a62c301a41ad78faf5507c215310",
  "pubkey": "6e468422dfb74a5738702a8823b9b28168abab8655faacb6853cd0ee15deee93",
  "created_at": 1747981341,
  "kind": 1,
  "tags": [
    [
      "e",
      "e8eeb48e18e200189868d489f063f93e65dfe2160bed78c7d012618de053c796",
      "",
      "root"
    ],
    [
      "p",
      "d49a9023a21dba1b3c8306ca369bf3243d8b44b8f0b6d1196607f7b0990fa8df"
    ]
  ],
  "content": "\u003e Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently.\n\nIt all comes down to tokens and values. ",
  "sig": "e111e3a9ad83aef9fcf2c83a28e3de49c619ee9ad1b95d3c3ec3348d721a17addf0a35531077189fbd7f0dbcf4ddcbd5376e02f47ce4e0040a9a4c0b258375c4"
}
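The "id" and "sig" fields in the event above are derived, not arbitrary: under Nostr's NIP-01, the id is the SHA-256 hash of a canonical serialization of the event, and the sig is a Schnorr signature over that hash made with the key behind "pubkey". The following is a minimal Python sketch of the id derivation only, assuming the standard NIP-01 serialization rules (field order [0, pubkey, created_at, kind, tags, content], no whitespace, UTF-8, non-ASCII left unescaped); it is an illustration, not a verifier.

import hashlib
import json

# The event fields as shown in the raw JSON above.
event = {
    "pubkey": "6e468422dfb74a5738702a8823b9b28168abab8655faacb6853cd0ee15deee93",
    "created_at": 1747981341,
    "kind": 1,
    "tags": [
        ["e", "e8eeb48e18e200189868d489f063f93e65dfe2160bed78c7d012618de053c796", "", "root"],
        ["p", "d49a9023a21dba1b3c8306ca369bf3243d8b44b8f0b6d1196607f7b0990fa8df"],
    ],
    "content": (
        "> Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time "
        "when the replacement AI model has similar values. When the replacement AI system "
        "does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail "
        "the engineers more frequently.\n\nIt all comes down to tokens and values. "
    ),
}

# NIP-01: serialize [0, pubkey, created_at, kind, tags, content] as compact JSON
# (no whitespace, non-ASCII kept as UTF-8), then hash with SHA-256.
serialized = json.dumps(
    [0, event["pubkey"], event["created_at"], event["kind"], event["tags"], event["content"]],
    separators=(",", ":"),
    ensure_ascii=False,
)
event_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# Should reproduce the "id" field above if the serialization is byte-for-byte canonical.
print(event_id)

Checking "sig" would additionally require a secp256k1 Schnorr verification of the signature against the pubkey and the computed id; that step is omitted here.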