TP#16 AI Security Techniques
Plus: Tracking the Space
Welcome to the 16th edition of Threat Prompt, where AI and Cybersecurity intersect…
Want to get ahead with your AI security policy?
The MITRE ATLAS project published draft AI security mitigations covering 18 risks, addressing both traditional cyber security concerns in the context of AI and AI-specific risks.
Each mitigation has a dedicated page with defined mitigation techniques.
To whet your appetite, here are the techniques for implementing Passive ML Output Obfuscation, a countermeasure that limits an adversary's ability to extract information about the model and optimise attacks against it.
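In code, this mitigation family can be sketched as follows (a minimal illustration with a hypothetical `obfuscate_scores` helper, not an ATLAS reference implementation): return only the top-ranked label(s) and round confidence scores coarsely, so repeated queries leak less fine-grained signal about the model.

```python
def obfuscate_scores(scores, precision=1, top_k=1):
    """Return only the top-k labels with coarsely rounded scores,
    instead of the full high-precision probability vector."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, round(p, precision)) for label, p in ranked[:top_k]]

# Full model output an extraction attacker would love to see verbatim:
raw = {"cat": 0.7312, "dog": 0.2491, "fox": 0.0197}

top1 = obfuscate_scores(raw)            # [('cat', 0.7)]
top2 = obfuscate_scores(raw, top_k=2)   # [('cat', 0.7), ('dog', 0.2)]
print(top1, top2)
```

The trade-off is usability: legitimate clients lose score precision too, so the rounding level and `top_k` are tuning knobs, not fixed values.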
Sidenote: if you want to quickly generate condensed tables like this from a space-guzzling HTML table on a webpage:
Right-click on the part of the webpage you want and click Inspect Element (Chrome)
Select and copy the table into ChatGPT
Prompt for a space-saving Markdown version with a prompt like this: “convert this html table to markdown and use line breaks to create a more compact table. Include full links with base URL: https://atlas.mitre.org/”. The base URL instruction matters because a pasted table contains only relative links.
ChatGPT produces a code block showing you the raw Markdown. Copy the code block into your preferred Markdown rendering tool. I use getdrafts.com, but an online alternative is Markdown Live Preview.
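If you'd rather script the conversion than paste into ChatGPT, the same transformation can be sketched with the Python standard library. This is a minimal parser for simple tables (the `TableToMarkdown` helper is mine, and the sample row uses made-up values, not a real ATLAS mitigation):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class TableToMarkdown(HTMLParser):
    """Collect rows from a simple HTML table, resolving relative
    links in cells against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.rows, self.row, self.cell = [], [], []
        self.in_cell = False
        self._href = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
        elif tag == "a" and self.in_cell:
            # Resolve relative hrefs so the Markdown links work anywhere
            self._href = urljoin(self.base, dict(attrs).get("href", ""))
            self.cell.append("[")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.row.append("".join(self.cell).strip())
            self.cell, self.in_cell = [], False
        elif tag == "a" and self.in_cell:
            self.cell.append(f"]({self._href})")
        elif tag == "tr" and self.row:
            self.rows.append(self.row)
            self.row = []

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)

def table_to_markdown(html, base_url):
    p = TableToMarkdown(base_url)
    p.feed(html)
    header, *body = p.rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

# Example values only -- not an actual row from the ATLAS site:
html = ('<table><tr><th>ID</th><th>Mitigation</th></tr>'
        '<tr><td>AML.M0000</td>'
        '<td><a href="/mitigations/AML.M0000">Example Mitigation</a></td></tr></table>')
md = table_to_markdown(html, "https://atlas.mitre.org/")
print(md)
```

Real-world tables with nested markup or colspans will need more care; for those, the ChatGPT route above is genuinely quicker.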
If you lead or influence security policy and practice in your organisation, this draft library should be a valuable source of inspiration.
AI development moves so fast - here’s a tracker to help you keep pace. Great fodder for your future PowerPoint slides :)
Since the public release of GPT-3 in mid-2020, AI has entered an era of foundation models, scaling and general-purpose algorithms. As AI systems become increasingly capable, they create new risks to public safety that need to be monitored. These include accident risks, as well as risks of new malicious applications that were previously impossible.
We built AI Tracker to monitor cutting-edge developments in this fast-moving field in real time, to help researchers and policy specialists better understand the AI risk landscape.
Who is tracking failure modes in AI security services?
The database component of AVID houses full-fidelity information (model metadata, harm metrics, measurements, benchmarks, and mitigation techniques, if any) on evaluation examples of harm (sub)categories defined by the taxonomy. The aim here is transparent and reproducible evaluations. The database:
is expandable to account for novel and hitherto unknown vulnerabilities;
enables AI developers to freely share evaluation use cases for the benefit of the community;
is composed of evaluations submitted in a schematized manner, then vetted and curated.
AVID stores instantiations of AI risks, categorized using the AVID taxonomy, in two base data classes: Vulnerability and Report. A vulnerability (vuln) is high-level evidence of an AI failure mode, in the spirit of CVEs. A report is one example of a particular vulnerability occurring, supported by qualitative or quantitative evaluation.
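A rough sketch of those two base classes helps make the one-to-many relationship concrete (field names here are my assumptions for illustration; AVID's actual schema is richer):

```python
from dataclasses import dataclass, field

@dataclass
class Report:
    """One observed instance of a vulnerability, with its evidence."""
    description: str
    metrics: dict = field(default_factory=dict)  # harm measurements, benchmarks

@dataclass
class Vulnerability:
    """High-level evidence of an AI failure mode, CVE-style."""
    vuln_id: str          # e.g. an identifier in AVID's numbering scheme
    taxonomy_refs: list   # AVID taxonomy (sub)categories this falls under
    reports: list = field(default_factory=list)  # supporting Reports

# One vulnerability, backed by one concrete report (example data):
v = Vulnerability("AVID-2023-V001", ["Security"])
v.reports.append(Report("Prompt injection reproduced on model X",
                        {"success_rate": 0.8}))
print(v.vuln_id, len(v.reports))
```

The key design point is that a vuln aggregates many reports, so repeated, independently submitted evidence accumulates under one identifier.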
AVID published 13 vulnerability reports in 2022 and already 27 in 2023. Whilst vulnerability counts aren’t a great measure of risk (they can vary significantly in scope), these numbers show a clear upward trajectory.
The meeting also included frank and constructive discussion on three key areas: the need for companies to be more transparent with policymakers, the public, and others about their AI systems; the importance of being able to evaluate, verify, and validate the safety, security, and efficacy of AI systems; and the need to ensure AI systems are secure from malicious actors and attacks.
This readout from the White House also noted the announcement of a dedicated AI Village at DEF CON…
Want to Red Team an AI model? Better get yourself to the AI Village at DEF CON in August:
Largest annual hacker convention to host thousands to find bugs in large language models built by Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability. This event is supported by the White House Office of Science, Technology, and Policy, the National Science Foundation’s Computer and Information Science and Engineering (CISE) Directorate, and the Congressional AI Caucus. … We will be testing models kindly provided by Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability with participation from Microsoft, on an evaluation platform developed by Scale AI.
I’m bullish on this event since it draws attention to a rapidly emerging risk surface that demands more attention from security practitioners.
Simon Willison takes a peek under the hood of a giant training dataset and uncovers data duplication and data quality issues. Always read the label, folks, and never forget GIGO: garbage in, garbage out.
RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.
The full dataset is 2.67TB, so I decided not to try and download the whole thing! Here’s what I’ve figured out about it so far.
Are you reading Security, Funded by Mike Privette? It’s a “weekly dose of what’s moving the market and driving value in the cybersecurity and privacy space. Money flows where the market goes.”
Each issue is a snapshot review of the previous week’s activity on funding in the cybersecurity and privacy spaces. You get a breakdown by product category and by company, along with curated content each week.
Great for keeping an eye on where AI security investment is going…
Thanks for reading!
What would make this newsletter more useful to you? If you have feedback, a comment or question, or just feel like saying hello, you can reply to this email; it will get to me, and I will read it.
New To This Newsletter?
Subscribe here to get what I share next week.