This is the 14th edition of Threat Prompt, where AI and Cybersecurity intersect…
To celebrate an influx of subscribers, I made a wee little thing this morning…
Five Ideas
1. How To Avoid Leaking PII to ChatGPT
As a regular user of ChatGPT, I rely heavily on it for coding and debugging tasks. However, there are moments when I need to share system or network-related log entries but hesitate to do so due to privacy concerns. Specifically, I want to avoid inadvertently revealing any identifiable information, such as IP addresses.
This morning, my AI co-pilot and I took action to prevent IP address leakage in ChatGPT. We created a proof-of-concept JavaScript that reversibly masks IP addresses entered into the input message box before submission. This way, I can avoid sharing sensitive information with OpenAI when I copypasta log data with zero pre-processing, yet recover the original values later if needed to link those IPs to other data sets.
Why am I sharing this? Well, one of my goals is to encourage more security people to engage with AI tools, and I see ChatGPT as the perfect gateway drug.
Now, I don’t claim that my approach is particularly innovative - numerous browser extensions, data leakage tools, and malware payloads intercept form submissions and modify (or steal!) the input. Moreover, I don’t assert that my method is “secure” - XOR FTL! Right now, I’m simply using an oversimplified regex to safeguard against accidental IP address leaks. If you operate in a threat scenario where you don’t trust the browser or your hosted AI is likely to attack you, don’t use this!
After executing the code, I prompt ChatGPT: “is this an IP address? 3.219.128.1”. Here is what that looks like:
is this an IP address? #v86:,59?*!
No, “#v86:,59?*!” is not a valid IP address format. An IP address is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. It is usually represented in a four-part format separated by periods (for example, 192.168.0.1).
Client-side browser controls are far from perfect, but for this low-threat use case they are just right.
Behold, my code with instructions! Tested with Chrome and Safari on macOS. Feel free to use and modify it as you see fit.
// A simple proof of concept to prevent accidentally leaking IP addresses to OpenAI
// Run this in the devtools console after loading ChatGPT to mask IP addresses
// origin: https://www.threatprompt.com

// Mask IPs and generate ASCII-printable output
function maskIPsInText(text, password) {
  const ipRegex = /\b(?:\d{1,3}\.){3}\d{1,3}\b/g; // Regex for matching IP addresses
  return text.replace(ipRegex, (match) => {
    let maskedIP = '';
    for (let i = 0; i < match.length; i++) {
      // XOR each character with the password, then fold into the printable ASCII range
      const charCode = (match.charCodeAt(i) ^ password.charCodeAt(i % password.length)) % 94 + 32;
      maskedIP += String.fromCharCode(charCode);
    }
    return maskedIP;
  });
}

// Find the ChatGPT input form
const chatgptForm = document.querySelector('textarea[placeholder="Send a message..."]').closest('form');

// Mask IPs just before the message is submitted with Enter
chatgptForm.addEventListener("keydown", function (event) {
  if (event.key === "Enter") {
    const inputField = document.querySelector('textarea[placeholder="Send a message..."]');
    const maskedText = maskIPsInText(inputField.value, '0xDEADBEEF');
    if (maskedText !== inputField.value) {
      console.warn("Warning: Unmasked IP address detected and XOR'd.");
      inputField.value = maskedText; // Replace the input field value with the masked output
    }
  }
});
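If you want to recover the original values later, note that the `% 94 + 32` step is lossy, so a plain XOR won't undo the mask. Here is a hypothetical companion function of my own (not part of the script above) that works around this by brute-forcing each position over the only characters an IPv4 address can contain:

```javascript
// Same per-character transform as maskIPsInText above, for one character
function maskChar(ch, key) {
  return String.fromCharCode(
    (ch.charCodeAt(0) ^ key.charCodeAt(0)) % 94 + 32
  );
}

// Recover a masked IP by trying every candidate character at each
// position until one masks to the observed output. First match wins;
// the tiny alphabet makes collisions unlikely for typical passwords.
function unmaskIP(masked, password) {
  const alphabet = '0123456789.';
  let original = '';
  for (let i = 0; i < masked.length; i++) {
    const key = password[i % password.length];
    const hit = [...alphabet].find((c) => maskChar(c, key) === masked[i]);
    original += hit ?? '?'; // '?' if nothing matched (shouldn't happen for real IPs)
  }
  return original;
}
```

With the password from the script, `unmaskIP('#v86:,59?*!', '0xDEADBEEF')` recovers the example IP from earlier.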
As for me, I might continue to expand on this in the future. Do let me know if you have any requests or suggestions for additional features…
2. Trust in AI: Low vs. High Assurance
A notable difference between OpenAI's GPT-3.5 and GPT-4 is the emergence of self-reflection.
I’m now fairly convinced that LLMs can critique outputs more effectively than they can generate them, which suggests we can combine them with search algorithms to further improve LLMs. This has many implications for building safer and more reliable AI systems that know when they can’t figure things out, even if they can’t generate the right solution. I suppose LLM cloud providers will be happy to know that users may need to double the number of queries to get increased performance.
Two patterns are emerging in AI research and field experiments related to AI control design: self-reflection and check and challenge between AIs (aka JudgeGPT).
The former demands strong AI integrity. After all, we wouldn’t want the AI to lie to us or deceive itself about its self-reflection, would we? The latter has no such hard requirement - at least not on the AI we are checking. The challenger AI concept enables “trust, but verify” perhaps with a smaller, easier-to-verify Judge AI.
In summary, self-reflection is helpful in low-assurance scenarios, whereas the challenger AI concept is better suited for high-assurance situations.
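To make the challenger pattern concrete, here is a minimal sketch of my own (not taken from any paper): a generator model proposes an answer and a separate judge critiques it, with `callModel` as a hypothetical stand-in for whatever LLM API you use.

```javascript
// "Check and challenge" sketch: generate, then ask a judge to verify.
// callModel(prompt) is a hypothetical async function returning a string.
async function generateWithJudge(prompt, callModel, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const answer = await callModel(`Answer: ${prompt}`);
    const verdict = await callModel(
      `Critique this answer strictly. Reply PASS or FAIL.\n` +
      `Question: ${prompt}\nAnswer: ${answer}`
    );
    if (verdict.trim().startsWith('PASS')) return answer;
  }
  return null; // the system knows when it could not figure it out
}
```

Note the cost implication from above: every accepted answer takes at least two model calls.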
3. AI will not replace leaders, but leaders who use AI will replace leaders who don’t
AI Festival 5 was recently held in the UK at the University of Exeter and hosted by the Defence Artificial Intelligence Centre (DAIC) and Defence Science and Technology Laboratory (DSTL).
The focus was human-centric AI. Major Sam McEvoy (Royal Signals) observed:
“AI could support Army leaders and decision-makers as planning processes would start getting viewpoints removed of human bias, allowing for new takes on creativity and innovation.”
“Fundamentally, AI’s role in leadership will be to allow the AI to do what it does best, freeing up human cognitive ability to do what it does best.”
If your organisation’s leadership isn’t actively exploring AI, what will that mean for the future of your mission (whatever that might be)?
If you are a leader - whether in technology, security, privacy, audit or risk - are you engaged?
I’m open to hosting sessions on these topics - get in touch if you want me to discuss what’s at stake with your leadership team.
4. How the Training Process affects Power-Seeking Incentives
In their recent paper titled “Power-seeking can be probable and predictive for trained agents”, Victoria Krakovna and Janos Kramar lay out some crucial insights and takeaways.
Firstly, they warn us that power-seeking behaviour in advanced AI is a significant source of risk. Most reward functions have power-seeking incentives, which can cause AI to exhibit undesirable behaviour.
Even the training process may not be enough to eliminate power-seeking incentives in trained agents. While the incentives may be affected by the training process, they are still likely to persist under certain assumptions.
Interestingly, the trained agent is also predicted to resist shutdown in a new situation, suggesting that power-seeking incentives can indeed be both probable and predictive for trained agents.
To address these concerns, can we incorporate moral and ethical constraints into the design of AI during the training process? And can we develop methods to verify safety robustness, even when AI surpasses human intelligence?
5. Abusing the GPU for Malware with OpenCL
Matrix and vector computations are central to many machine learning and deep learning algorithms, driving the rise of GPUs and making them an integral part of the AI platform attack surface.
Similar to how the movie Inception portrays dreams within dreams, GPUs can be considered platforms within platforms, with their specialised hardware and privileged software interfaces expanding the digital space we extend to intruders.
In this blog post, @eversinc33 explores some doors this opens for attackers who live off the land.
Bonus Idea
If you’re a heavy GPT API user with some prompts you regularly send, let Yasyf compile your prompt to compress it (Ed: magic).
Use CompressGPT to reduce your GPT-4 token usage by up to 70% with 1 line
Feedback
Click the emoji that best captures your reaction to this edition…
Sponsors
I pre-launched a service to help Indie Hackers and Solopreneurs navigate security due diligence by Enterprise clients: Cyber Answers for Indie Hackers & Solopreneurs. If you know someone who might benefit, please forward this note.
Thanks for reading!
What would make this newsletter more useful to you? If you have feedback, a comment or question, or just feel like saying hello, you can reply to this email; it will get to me, and I will read it.
-Craig
New To This Newsletter?
Subscribe here to get what I share next week.