TP#24 Create Burp Web Security Checks in Seconds with AI (without extensions)
Plus: Wasting Scam Callers' Time Forever
Welcome to this week’s Threat Prompt, where AI and Cybersecurity intersect…
Five Ideas
1. Create Burp Web Security Checks in Seconds with GPT-4 (without extensions)
Here’s a quick way to create web security checks that can be executed directly within Burp Suite Pro - no extensions required.
The Portswigger team recently announced BCheck, a domain-specific scripting language (DSL) baked into Burp Suite Pro:
BChecks are custom scan checks that you can create and import. Burp Scanner runs these checks in addition to its built-in scanning routine, helping you to target your scans and make your testing workflow as efficient as possible.
Each BCheck is defined as a plain text file with a .bcheck file extension. These files use a custom definition language to specify the behavior of the check.
The language is reminiscent of Python, and the entire specification fits in just over half the 8,000-token context window of an OpenAI GPT-4 8K request. This leaves plenty of prompt space for the description of the script you want generated, along with room for the LLM response.
First, let’s construct a prompt that combines the following three elements:
the content of create.txt: “Instruction: Carefully examine the BCheck language specified below and generate a BCheck script that meets the user's functional requirement and strictly adheres to the BCheck language, without any usage of Python functions or syntax.”
the content of bcheck.txt: a text file containing the BCheck definition reference (scraped with curl, piped into lynx -stdin; a sketch follows this list).
and finally the sentence “generate a BCheck script that meets the following description: a script to systematically determine the smallest subset of cookies contained in a given request that is required to maintain a valid web session.”
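If it helps, here's roughly how that bcheck.txt scrape might look - the URL is illustrative, so point curl at PortSwigger's current BCheck definition reference page:

$ curl -s https://portswigger.net/burp/documentation/scanner/bchecks/bcheck-definition-reference | lynx -stdin -dump -nolist > bcheck.txt  # URL is illustrative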
Here’s the full command:
$ openai api chat_completions.create --stream -t 0 -m gpt-4 -g user "`cat create.txt bcheck.txt`generate a bcheck script that meets the following description: a script to systematically determine the smallest subset of cookies contained in a given request that are required in order to maintain a valid web session." | tee mincookies.bcheck
When executed, here’s the resulting BCheck script generated by GPT-4 8K:
metadata:
    language: v1-beta
    name: "Minimal Cookie Set"
    description: "Determines the smallest subset of cookies required to maintain a valid web session"
    author: "AI Assistant"
    tags: "cookie", "session", "active"

define:
    cookies = {base.request.cookies}

given request then
    run for each: cookie in cookies
        send request called "without_cookie":
            removing cookies: cookie

        if {latest.response.status_code} is not 200 then
            report issue:
                severity: info
                confidence: certain
                detail: "Cookie " + cookie + " is required for a valid session."
                remediation: "Ensure that the application does not rely on unnecessary cookies."
        end if
    end run
This is impressive output, considering I didn’t coach the LLM by including example BCheck scripts in the prompt (there’s sufficient space to add a handful).
I do like this use case as it’s straightforward. It does something genuinely useful for web security testers and developers and hits the sweet spot for hosted AI: no embeddings or fine-tuning required. The output quality is quite reliable with GPT-4, and unless you generate thousands of scripts, you are not going to break the bank (but it does mean your OpenAI account must have access to GPT-4).
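To put rough numbers on it: assuming mid-2023 GPT-4 8K pricing of $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens, a roughly 4,500-token prompt plus a roughly 500-token response works out to about 4.5 × $0.03 + 0.5 × $0.06 ≈ $0.17 per generated check.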
At the cost of another LLM round trip, you can feed the generated BCheck script back to GPT-4 along with the BCheck definition reference - and have it scrutinise the code it generated. Ask it to report strengths, weaknesses and potential improvements.
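Sketched with the same CLI, that second pass might look something like this (the output filename and review prompt wording are illustrative):

$ openai api chat_completions.create --stream -t 0 -m gpt-4 -g user "`cat bcheck.txt mincookies.bcheck`Review the BCheck script above against the BCheck language reference: report its strengths, weaknesses and potential improvements." | tee mincookies-review.txt  # filenames and prompt wording are illustrative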
I tested the same command and prompt with OpenAI's gpt-3.5-turbo model - cheaper and quicker - but the model persistently introduced calls to Python functions, which are incompatible with BCheck syntax. This occurred even when example scripts were included in the prompt… along with stern warnings!
GPT-4 consistently produced pure BCheck code.
Adding example scripts to the prompt did appear to help language fluency, but I appreciate this is somewhat subjective, so take it with a pinch of salt.
What web security checks will you create?
2. Hacking Auto-GPT and escaping its docker container
Would you execute AI-generated code on your machine? What about if you got to review the code first? And the code ran in a container?
Projects like Auto-GPT - which execute freshly generated code live - are terrific testing grounds for attack and defence. Positive Security reported their findings, and the project has improved its defences.
I. Arbitrary code execution in the context of Auto-GPT commands via prompt injection. Affected: All versions; Requires user (pre-)authorization using --continuous or y (-N)
II. System logs spoofable via ANSI control sequences. Affected: < v0.4.3
III. Shell execution command whitelist/blacklist bypass. Affected: All versions; Shell execution and the whitelist/blacklist feature are disabled by default
IV. Docker escape via docker-compose.yml. Affected: < v0.4.3, when building the docker version with the docker-compose.yml included in the git repo
V. Python code execution sandbox escape via path traversal. Affected: v0.4.0 < v0.4.3, when running directly on the host via run.sh or run.bat
The fixes address everything except…
The issue that allows bypassing the execute_shell command whitelist/blacklist does not have an easy fix so instead users should take note that the whitelisting/blacklisting mechanism cannot be relied upon to protect from malicious intent.
…and the threat posed by the broader indirect prompt injection issue.
3. Wasting Scam Callers' Time Forever
Love this! Delivered with humour, it starts to shift the asymmetric costs back towards the scammer:
…consultant and tinkerer Roger Anderson has been fighting telemarketers for almost a decade. His latest tool in his arsenal is a convincing-sounding voice powered by OpenAI’s GPT-4 that can waste and frustrate telemarketers and scammers by roping them into a painfully drawn-out and ultimately pointless conversation.
…his call-deflection system called Jolly Roger … has amassed several thousand paying customers, according to the WSJ.
Best yet, Anderson programmed Whitebeard to use the real-life voices of his friends, including Sid Berkson, a Vermont dairy farmer.
Some other voices customers can choose from include Salty Sally, an overwhelmed mother, and Whiskey Jack, who can be easily distracted.
And clearly, the voices are doing a great job of wasting scammers' and telemarketers' time. One conversation shared by Anderson went on for an excruciating 15 minutes before the scammer finally hung up.
But how long before it's AI on AI? The cost of scam labour versus the cost of LLM tokens… AI as a first-line scammer, with promising victims escalated to a human?
4. Visual Adversarial Examples Jailbreak Large Language Models
A team of researchers from Princeton…
…study adversarial examples in the visual input space of a VLM. Specifically, against MiniGPT-4, which incorporates safety mechanisms that can refuse harmful instructions, we present visual adversarial examples that can circumvent the safety mechanisms and provoke harmful behaviors of the model. Remarkably, we discover that adversarial examples, even if optimized on a narrow, manually curated derogatory corpus against specific social groups, can universally jailbreak the model’s safety mechanisms. A single such adversarial example can generally undermine MiniGPT-4’s safety, enabling it to heed a wide range of harmful instructions and produce harmful content far beyond simply imitating the derogatory corpus used in optimization. Unveiling these risks, we accentuate the urgent need for comprehensive risk assessments, robust defense strategies, and the implementation of responsible practices for the secure and safe utilization of VLMs.
Fascinating stuff, I anticipate we’ll see more researchers explore this space.
5. Intel and Meta Hub Accounts Compromised at Hugging Face
Hugging Face, a major hosted LLM provider, tweeted:
We are looking into an incident where a malicious user took control over the Hub organizations of Meta/Facebook & Intel via reused employee passwords that were compromised in a data breach on another site. We will keep you updated
Shortly after that:
The orgs are now fully recovered and malicious user banned. Sorry for the inconvenience!
No further details were given, but from the description it sounds like employees at two Hugging Face customers fell victim to account takeover through reuse of passwords that had been compromised at another, unnamed site.
If you recently downloaded any Hugging Face-hosted models from Meta or Intel, you may want to check that they are legitimate (I couldn't find anything further on this).
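One quick, if crude, spot check - assuming you still have the downloaded files and the recovered repos now hold the legitimate versions - is to compare checksums against what Hugging Face lists for each large file on the model's "Files and versions" page. The filename here is just an example:

$ sha256sum ./pytorch_model.bin  # compare against the SHA256 shown for this file on the model page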
For further information about Hugging Face security, check out their security hub.
Bonus Idea
Deploy LLMs with Hugging Face Inference Endpoints
Sticking with Hugging Face, if you want to dabble with a rich choice of large language models, benefit from easy deployment and are happy to pay GPU costs rather than self-host, this helpful blog post walks you through how to get started:
Open-source LLMs like Falcon, (Open-)LLaMA, X-Gen, StarCoder or RedPajama, have come a long way in recent months and can compete with closed-source models like ChatGPT or GPT4 for certain use cases. However, deploying these models in an efficient and optimized way still presents a challenge.
In this blog post, we will show you how to deploy open-source LLMs to Hugging Face Inference Endpoints, our managed SaaS solution that makes it easy to deploy models. Additionally, we will teach you how to stream responses and test the performance of our endpoints.
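Once an endpoint is deployed, querying it is a single HTTP call. A minimal sketch, assuming a text-generation model - the endpoint URL is a placeholder and $HF_TOKEN is assumed to hold your Hugging Face access token:

$ curl https://your-endpoint.endpoints.huggingface.cloud \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"inputs": "What is prompt injection?", "parameters": {"max_new_tokens": 100}}'  # body follows the text-generation task format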
Shout Out
John Nay regularly Tweets incisive takes on the latest AI research.
Thanks for reading!
What would make this newsletter more useful to you? If you have feedback, a comment or question, or just feel like saying hello, you can reply to this email; it will get to me, and I will read it.
-Craig
New To This Newsletter?
Subscribe here to get what I share next week.