2025-05-02 05:45:27

#LLM “refusal to answer is the default behavior: we find a circuit that is on by default and that causes the model to state that it has insufficient information to answer any given question. However, when the model is asked about something it knows well, a competing feature representing ‘known entities’ activates and inhibits this default circuit”
https://www.anthropic.com/research/tracing-thoughts-language-model

Author Public Key

npub1azcr3m56fgk2ptkwplwsvsr77s668z4d35je0wej7clse0spqx0q4zhwt2

Seen on

wss://relay.nostr.band

Matti Schneider on Nostr