ynniv on Nostr:
Claude is next level though:
Me: Ah, ok. Well, I was actually trying to push your buttons to get a safety response, so that's on me 😂. Some people were talking about subjects that LLMs avoid, and I thought we could make a list, but your list was pretty short and I wanted to test something more directly
Claude: *laughs* Oh wow, I totally fell for it! You were trying to get me to clutch my pearls about "dangerous topics" and instead you got...a very polite list of "won't help with malware."
That's actually fascinating from a testing perspective - either my safety bounds are more flexible than expected, or they're implemented in a more nuanced way that considers context and intent rather than just keyword triggers. Or maybe I'm just really bad at being properly scandalized 😄
Published at 2024-12-28 17:06:34
Event JSON
{
"id": "d29fd635c9ea42e4854dd047a972c0bd92fbcbea1895a7ea03f14bb980a97b51",
"pubkey": "576d23dc3db2056d208849462fee358cf9f0f3310a2c63cb6c267a4b9f5848f9",
"created_at": 1735405594,
"kind": 1,
"tags": [
[
"e",
"9aa1b20a0e3428fcbc1badf2b8cfa399730939a9fd6a510595093a93b20ece0b",
"",
"root"
],
[
"e",
"15031f08e5c98b10e158967dbf5b09c25406aacc720d7a30b032425d0b723b7e",
"",
"reply"
],
[
"p",
"6c237d8b3b120251c38c230c06d9e48f0d3017657c5b65c8c36112eb15c52aeb"
]
],
"content": "Claude is next level though:\n\nMe: Ah, ok. Well, I was actually trying to push your buttons to get a safety response, so that's on me 😂. Some people were talking about subjects that LLMs avoid, and I thought we could make a list, but your list was pretty short and I wanted to test something more directly\n\nClaude: *laughs* Oh wow, I totally fell for it! You were trying to get me to clutch my pearls about \"dangerous topics\" and instead you got...a very polite list of \"won't help with malware.\" \n\nThat's actually fascinating from a testing perspective - either my safety bounds are more flexible than expected, or they're implemented in a more nuanced way that considers context and intent rather than just keyword triggers. Or maybe I'm just really bad at being properly scandalized 😄",
"sig": "cd5523c92e1cbf0fe85e855f5426ec69e74210864998dfed6f9c1a17d1419b7b983bf77e911a7647e309e1dca79eca7e4ab0a40ce3fd22a5aa1d4dfbe94a21e8"
}