TP#10 BadDiffusion: Is that a backdoor in your diffusion model?
Plus: Insurers Ask Cyber Insurance Applicants to Disclose ChatGPT Use
Welcome to the 10th edition of Threat Prompt, where AI and Cybersecurity intersect…
Researchers from IBM Research and Taiwan's National Tsing Hua University just published an eye-opening paper on backdooring the type of AI model used by popular image generation services:
…we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model.
The research, which draws from defensive watermarking, established that BadDiffusion backdoors are cheap and effective:
Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models.
They accidentally stumbled across a simple mitigation technique relevant to the inference stage (when a trained model is applied to new data to make predictions or decisions). By clipping the image at every step in the diffusion process, they defeat the backdoor whilst keeping the value of the model. Unfortunately, they conclude that this defence would not be effective against sophisticated and evolving backdoor attacks.
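To make the clipping idea concrete, here is a minimal sketch of where the clip sits in a sampling loop. It is not the authors' code: `denoise_step` and `toy_step` are hypothetical stand-ins for one reverse step of a trained model, and the [-1, 1] pixel range is an assumption. The toy step does not model a real backdoor; it only shows how an out-of-range value is reined in when every intermediate image is clipped.

```python
def clip(values, low=-1.0, high=1.0):
    """Clamp every pixel value into [low, high] (range is an assumption)."""
    return [max(low, min(high, v)) for v in values]


def sample(denoise_step, x_start, num_steps, clip_each_step=True):
    """Run a reverse diffusion loop, optionally clipping after every step.

    denoise_step is a hypothetical callable (x, t) -> x standing in for
    one reverse step of a trained diffusion model.
    """
    x = list(x_start)
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t)
        if clip_each_step:
            x = clip(x)  # the mitigation: clip at every step, not just the last
    return x


def toy_step(x, t):
    """Toy stand-in: in-range pixels shrink, out-of-range pixels grow."""
    return [v * 1.5 if abs(v) > 1.0 else v * 0.9 for v in x]


start = [0.2, -0.5, 3.0]  # last pixel mimics a large trigger-like value
unclipped = sample(toy_step, start, num_steps=5, clip_each_step=False)
clipped = sample(toy_step, start, num_steps=5, clip_each_step=True)
print("without clipping:", unclipped)
print("with clipping:   ", clipped)
```

With a real model you would wrap the library's own sampling step rather than a toy function; the point is only that the clip is applied to every intermediate image during inference.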
What is diffusion?
In its general sense, the key idea is that nodes are connected to each other in some way and can influence each other’s states over time through a diffusion process. Diffusion is typically achieved by iteratively updating the values of the nodes based on local information and the weighted influence of their neighbours. Diffusion can be used to model the spread of information or influence across a social network; to smooth out noise, enhance edges, and extract features in images and video; and to coordinate the behaviour of multiple agents or robots in a distributed system. In image generation, a diffusion model is trained by gradually adding noise to images and learning to reverse that process step by step.
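The iterative, neighbour-weighted update described above can be sketched in a few lines. This is an illustrative toy, not taken from any particular library: the function name `diffuse`, the `rate` parameter and the three-node example network are all assumptions made for the example.

```python
def diffuse(values, neighbours, rate=0.5, steps=10):
    """Spread node values across a weighted graph.

    values:     dict mapping node -> starting value
    neighbours: dict mapping node -> {neighbour: weight}
    rate:       how strongly each node moves towards its neighbours per step
    """
    current = dict(values)
    for _ in range(steps):
        updated = {}
        for node, value in current.items():
            links = neighbours.get(node, {})
            total_weight = sum(links.values())
            if total_weight == 0:
                updated[node] = value  # isolated node keeps its value
                continue
            # Weighted average of the neighbours' current values.
            average = sum(current[n] * w for n, w in links.items()) / total_weight
            updated[node] = (1 - rate) * value + rate * average
        current = updated
    return current


# Hypothetical three-node chain: a - b - c, with all the "signal" starting at a.
network = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
start = {"a": 1.0, "b": 0.0, "c": 0.0}
print(diffuse(start, network, steps=1))
print(diffuse(start, network, steps=50))
```

After one step only the immediate neighbour of `a` has picked up any signal; after many steps the values even out across the chain.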
Should your organisation ban services like ChatGPT? I was quoted in an article asking this question:
“An employee will submit something, and then it later comes out in someone else’s completion. You could have your own password showing up somewhere else. Although there’s no attribution, there are identifiers like company name, well-known project names and more,” he added.
What is your organisation doing to control the potential downside of services like ChatGPT, whilst capturing the upside?
If I rip a GPU card out of a machine, can I gain access to the AI model it was processing?
Every now and then, whispers about potential AI risks start to gain traction, and I feel it’s essential we debunk the wrongheaded or esoteric ones so we keep our eyes trained on the risks that genuinely matter…such as the security of centralised hubs, from which researchers and developers download pre-trained models.
the idea that you can just break into a data center and steal the model has a lot of memetic sticking power, but is stupid if you actually know anything about this topic. here’s a thread on how confidential computing works in the NVIDIA H100…
Well worth a read if you want to better understand the security engineering in modern GPU cards and the working practices of cloud providers offering GPU-enabled computing.
I sit up when a highly reputable outfit applies its domain knowledge to assess the utility of GPT for auditing smart contracts. This was no “drive-by prompting” effort either. What’s the TLDR?
The models are not able to reason well about certain higher-level concepts, such as ownership of contracts, re-entrancy, and fee distribution.
The software ecosystem around integrating large language models with traditional software is too crude and everything is cumbersome; there are virtually no developer-oriented tools, libraries, and type systems that work with uncertainty.
There is a lack of development and debugging tools for prompt creation. To develop the libraries, language features, and tooling that will integrate core LLM technologies with traditional software, far more resources will be required.
The pace of LLM improvement means they remain optimistic:
LLM capability is rapidly improving, and if it continues, the next generation of LLMs may serve as capable assistants to security auditors. Before developing Toucan, we used Codex to take an internal blockchain assessment occasionally used in hiring. It didn’t pass–but if it were a candidate, we’d ask it to take some time to develop its skills and return in a few months. It did return–we had GPT-4 take the same assessment–and it still didn’t pass, although it did better. Perhaps the large context window version with proper prompting could pass our assessment. We’re very eager to find out!
With cyber, it’s always worth keeping tabs on what insurance companies do and say. In that spirit, this caught my eye…
We are asking a lot of questions right now about AI. We’re asking them on applications, we’re asking if the organisation makes use of ChatGPT in any sort of way. Because the other thing that we are concerned about is if ChatGPT is becoming a regular part of that business, and if people get used to how it is responding to things, which means a harder ability to detect when fraudulent emails are coming through and how to deal with them.
When insurance companies begin asking about AI use on insurance applications, it might be time to develop a policy covering employee use of AI and to establish some guardrails to manage AI use within your organisation.
I pre-launched a service to help Indie Hackers and Solopreneurs navigate security due diligence by Enterprise clients: Cyber Answers for Indie Hackers & Solopreneurs. If you know someone who might benefit, please forward this note.
Thanks for reading!
What would make this newsletter more useful to you? If you have feedback, a comment or question, or just feel like saying hello, you can reply to this email; it will get to me, and I will read it.