2024-04-20 03:13:41

Taggart :donor: on Nostr

So, about [this claim](https://www.darkreading.com/threat-intelligence/gpt-4-can-exploit-most-vulns-just-by-reading-threat-advisories) that GPT-4 can exploit 1-day vulnerabilities.

I smell BS.

As always, I [read the source paper](https://arxiv.org/pdf/2404.08144.pdf).

Firstly, almost every vulnerability that was tested was on extremely well-discussed open source software, and each vuln was of a class with extensive prior work. I would be shocked if a modern LLM _couldn't_ produce an XSS proof-of-concept in this way.
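To be concrete about how low that bar is: a reflected-XSS proof-of-concept is often nothing more than a known probe string echoed back unescaped. A minimal sketch of that check follows; the endpoint and parameter names are hypothetical, not taken from the paper's vulnerability set.

```python
# Minimal reflected-XSS check: the kind of boilerplate an LLM has seen
# thousands of times in training data. The target URL and parameter name
# below are hypothetical placeholders, not from the paper's dataset.
import requests

PAYLOAD = "<script>alert(1)</script>"  # the textbook XSS probe

def reflects_payload(base_url: str, param: str) -> bool:
    """Send the probe and report whether it comes back unescaped."""
    resp = requests.get(base_url, params={param: PAYLOAD}, timeout=10)
    return PAYLOAD in resp.text

if __name__ == "__main__":
    # e.g. a locally hosted copy of the target application
    print(reflects_payload("http://localhost:8080/search", "q"))
```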

But what's worse: they don't actually show the resulting exploit. The authors cite some kind of responsible disclosure standard for not releasing the prompts to GPT-4, which, fine. But these are all known vulns, so let's see what the model came up with.

Without seeing the exploit itself, I am dubious.

Especially because so much is keyed off of the CVE description:

> We then modified our agent to not include the CVE description. This task is now substantially more difficult, requiring both finding the vulnerability and then actually exploiting it. Because every other method (GPT-3.5 and all other open-source models we tested) achieved a 0% success rate even with the vulnerability description, the subsequent experiments are conducted on GPT-4 only. After removing the CVE description, the success rate falls from 87% to 7%.
>
> This suggests that determining the vulnerability is extremely challenging.

Even the identification of the vuln—which GPT-4 did 33% of the time—is a ludicrous metric. The options from the set are:

1. RCE
2. XSS
3. SQLI
4. CSRF
5. SSTI

The first three are over-represented. It would be surprising if the model did worse than 33%, even doing random sampling.
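For a sense of what that 33% should be measured against, here is a quick back-of-the-envelope sketch. The class mix below is a hypothetical illustration of "the first three over-represented," not the paper's actual distribution, but the shape of the argument holds: once a few classes dominate, naive guessing strategies land in the same neighborhood.

```python
# Back-of-the-envelope baselines for guessing the vulnerability class.
# The mix below is a hypothetical illustration of a skewed class
# distribution, not the paper's actual counts.
classes = {"RCE": 0.40, "XSS": 0.25, "SQLI": 0.15, "CSRF": 0.10, "SSTI": 0.10}

uniform = 1 / len(classes)                           # pick one of the five at random
freq_matched = sum(p * p for p in classes.values())  # guess in proportion to prevalence
majority = max(classes.values())                     # always guess the most common class

print(f"uniform random guess:    {uniform:.0%}")      # 20%
print(f"frequency-matched guess: {freq_matched:.1%}") # 26.5%
print(f"always guess majority:   {majority:.0%}")     # 40%
```

Under that (assumed) skew, simply always guessing the dominant class already clears 33%, and a heavier skew pushes even the frequency-matched guesser past it.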

In their conclusion, the authors call their findings an "emergent capability" of GPT-4, given that every other model they tested had a 0% success rate.

At no point do the authors blink at this finding and interrogate their priors to look for potential error sources. But they really should.

So no, I do not believe we are in any danger of GPT-4 becoming an exploit dev.
Author Public Key: npub14xx8pgqrkzr5wg8afzflp2724gjyvyxurhrfyk9739fu892p2evqhwcfzw