TP#8 Use Public AI to Generate Unit Tests Without Revealing Proprietary Code
Plus: a privacy-respecting alternative to facial recognition AI?
A warm welcome to Matthias, Ian and Elona, who join us as subscribers - we are now 40-strong 💪
This is Threat Prompt #8, where AI and Cybersecurity intersect…
Five Ideas
1. Upgrade your Unit Testing with ChatGPT
For your eyes only, dear newsletter reader…
TL;DR: companies with proprietary source code can use public AI to generate regular and adversarial unit tests without disclosing their complete source code to said AI.
I mentioned last week that I'm a big fan of "bad guy" unit tests to improve software security. To recap, these adversarial unit tests check for security edge cases in source code ("what if I call that function with a null value at the end of the filename?"). In my experience, even developers who are fans of unit testing rarely write "bad guy" ones.
New to unit tests? Here's a quick explainer:
A unit test is a type of test that checks the smallest testable parts, or units, of an application. To effectively test these units, they are often run in isolation, which means that any external dependencies or resources are either mocked or stubbed out.
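For illustration, here's a contrived example (not from my sample code) of a regular unit test and a "bad guy" unit test for a trivial function, written with pytest:

import pytest

def add_to_cart(cart, item):
    """Append an item to a shopping cart list; reject missing items."""
    if item is None:
        raise ValueError("item must not be None")
    cart.append(item)
    return cart

# Regular unit test: the documented happy path works.
def test_add_to_cart_appends_item():
    assert add_to_cart([], "apple") == ["apple"]

# Adversarial ("bad guy") unit test: a careless or hostile caller passes None.
def test_add_to_cart_rejects_none():
    with pytest.raises(ValueError):
        add_to_cart([], None)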
ChatGPT is a super fast way to generate regular and adversarial unit tests from the source code you feed in.
However, the apparent fly in the ointment is that companies with proprietary source code will not want to copypasta their intellectual property into OpenAI's - or anyone else's - public AI.
My initial reaction was to drop this use case into the on-premise AI bucket.
But then the insight came: what if I extract just the function metadata from the source code I want to generate unit tests for?
Translation: take the information describing how a programmer would execute a discrete block of code (a function) and what the programmer would receive back, rather than the body of the function itself. For example, a change password routine is typically invoked with the username, old password and new password. The detail of how that password change actually happens is contained in the function's body and may be confidential.
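To make that concrete, here's a hypothetical change password function - the exception names are made up for illustration. Only the first few lines (the signature and docstring) need to be shared; everything below stays on your side:

def change_password(username, old_password, new_password):
    """Change a user's password after verifying the old one.

    Raises:
        AuthenticationError: if old_password is incorrect.
        WeakPasswordError: if new_password fails the password policy.
    """
    # Everything from here down is the confidential function body
    # and never leaves your environment.
    ...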
Would function metadata be sufficient to generate meaningful security unit tests?
Naturally, I had ChatGPT generate the 50-line Python script to derive the function metadata from my source code sample. For the geeks, I generated an Abstract Syntax Tree (AST) from a trivial Python script and extracted the function metadata and docstrings into a JSON file.
In our scenario, this extraction of function metadata must happen client-side since we want to avoid sharing proprietary and private source code. The output file can then be reviewed and sanitised before sharing with an AI (some function names or parameters could, in theory, be sensitive and require masking or tokenising).
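My throwaway script isn't worth publishing, but for the curious, a minimal sketch of the same idea - using only the standard-library ast and json modules (ast.unparse needs Python 3.9+), with illustrative file names - looks something like this:

import ast
import json
import sys

def extract_function_metadata(path):
    """Parse a Python source file and collect metadata for each function."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)

    metadata = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            metadata.append({
                "name": node.name,
                "args": [arg.arg for arg in node.args.args],
                "docstring": ast.get_docstring(node),
                "returns": {
                    # Only the annotated return type (if any) is recorded;
                    # the function body itself is never emitted.
                    "type": ast.unparse(node.returns) if node.returns else None,
                    "value": None,
                },
                "filename": path,
            })
    return metadata

if __name__ == "__main__":
    # Example: python extract_metadata.py samples/sample.py > metadata.json
    print(json.dumps(extract_function_metadata(sys.argv[1]), indent=2))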
Here is sample output for one function in the sample.py script:
{
  "name": "find_enclosing_rectangle",
  "args": [
    "points"
  ],
  "docstring": "Find the smallest rectangle that encloses a list of points.\n\nArgs:\n points (List[Point]): The list of points to enclose.\n\nReturns:\n Rectangle: The smallest rectangle that encloses the points.\n\nRaises:\n ValueError: If no points are given.",
  "returns": {
    "type": null,
    "value": null
  },
  "filename": "samples/sample.py"
}
The next step was to write a prompt instructing ChatGPT to generate unit tests for the JSON-formatted metadata, which I pasted immediately after the instructional part of the prompt.
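I won't reproduce my exact wording, but the instructional part of the prompt amounts to something like this:

"You are given JSON metadata describing Python functions: name, arguments, docstring and return type. You cannot see the function bodies. For each function, write pytest unit tests: regular happy-path tests plus adversarial security tests (empty and null inputs, wrong types, oversized values, injection attempts). Output runnable Python code only, followed by the command to run the tests."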
ChatGPT then quickly got to work generating ten adversarial and regular unit tests… all without access to the "secret" source code.
I reviewed the generated unit tests, and the output was usable as-is. I then pasted the code into a test_sample_test.py file and executed it using the command provided by ChatGPT.
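To give a flavour of the style (this is representative rather than ChatGPT's verbatim output, and the import assumes sample.py is importable from a samples package), the adversarial tests looked something like:

import pytest

from samples.sample import find_enclosing_rectangle

# Adversarial case taken straight from the docstring: no points supplied.
def test_raises_value_error_when_no_points_given():
    with pytest.raises(ValueError):
        find_enclosing_rectangle([])

# Adversarial case: a None smuggled in where a Point is expected.
def test_rejects_none_in_place_of_point():
    with pytest.raises((TypeError, ValueError)):
        find_enclosing_rectangle([None])

# Run with: pytest test_sample_test.py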
The great thing about unit test code is that the tests themselves tend to be simple, so there is less room for an AI to hallucinate (i.e. generate nonsense).
All tests passed bar an injection test, as my sample function had a defect. This is a positive result for testing - I fixed the input-handling vulnerability, and then all tests passed. Result!
Now, this is just an interactive MVP for Python code using OpenAI's general-purpose Davinci AI, no Codex or other programming-specific AI. The function metadata extraction is rudimentary - it doesn't handle opaque object passing and the like. I haven't addressed environment setup and teardown, which would be required code for unit tests of any moderately complex codebase.
But the strength and flexibility of ASTs mean this approach can work with other languages such as PHP, Java, Go, etc. Furthermore, the unit tests can be generated in a programming language different from the source - useful when you have a multilingual codebase but want a single language for your test code. And, of course, the AI can be directed to generate different types of unit tests depending on your goals.
In practice, a risk-based approach might lean towards confining this effort to sensitive functions, i.e. those that receive untrusted input and implement key security controls and features.
The fact that it is practical to generate unit tests without disclosing proprietary source code demonstrates that generative AI opens the door to many novel use cases.
If this piqued your interest and you want to explore this idea further with me, or have code you'd like to test this idea with, hit reply and introduce yourself.
P.S. Unit tests are generally not shipped to customers, which conveniently sidesteps a potential licensing or intellectual property infringement problem that prevents some companies from shipping AI-generated code to users or devices, as previously noted by a reader (Hi A!).
2. Do you want to star in Co-appearance?
"Co-appearance" sounds like a movie credit, but, in this case, you might not have signed up for the role. Also called "correlation analysis," this new branch of AI-powered biometric surveillance technology can analyze and track who a person has been with over time, measure the frequency of their interactions, and cross-reference this with other data, such as calendar info.
Many years back, on a trip to Crotonville for leadership training, I recall GE Security giving our class a sneak peek of a brand new camera feed analysis tech: visual analysis of people groups in real-time. That moment felt like a peek into the future.
3. AI-powered building security, minus bias and privacy pitfalls?
Facial recognition has swept the physical security marketplace with wide adoption by governments, banks, and retailers. Grand View Research estimates "the global facial recognition market size was valued at USD 3.86 billion in 2020 and is expected to expand at a compound annual growth rate (CAGR) of 15.4% from 2021 to 2028".
In a way, facial recognition has lodged itself in people's minds as the de facto technology for visual surveillance, and we should all find that quite disturbing!
I was reminded of this when I stumbled across an interview with the founder of ambient.ai, a company that appears to be taking a refreshingly different approach:
"The first generation of automatic image recognition, Shrestha said, was simple motion detection, little more than checking whether pixels were moving around on the screen - with no insight into whether it was a tree or a home invader. Next came the use of deep learning to do object recognition: identifying a gun in hand or a breaking window. This proved useful but limited and somewhat high maintenance, needing lots of scene- and object-specific training."
What do they do differently?
"The insight was, if you look at what humans do to understand a video, we take lots of other information: is the person sitting or standing? Are they opening a door, are they walking or running? Are they indoors or outdoors, daytime or nighttime? We bring all that together to create a kind of comprehensive understanding of the scene," Shrestha explained. "We use computer vision intelligence to mine the footage for a whole range of events. We break down every task and call it a primitive: interactions, objects, etc., then we combine those building blocks to create a 'signature.'"
They claim 200 rules and have five of the largest tech companies (amongst others) as paying customers. Another area where they stand out for me is how they tackle bias:
"We built the platform around the idea of privacy by design,ā Shrestha said. With AI-powered security, āpeople just assume facial recognition is part of it, but with our approach you have this large number of signature events, and you can have a risk indicator without having to do facial recognition. You donāt just have one image and one model that says whatās happening ā we have all these different blocks that allow you to get more descriptive in the system.ā
Essentially this is done by keeping each individual recognized activity bias-free to begin with. For instance, whether someone is sitting or standing, or how long theyāve been waiting outside a door ā if each of these behaviors can be audited and found to be detected across demographics and groups, then the sum of such inferences must likewise be free of bias. In this way the system structurally reduces bias."
I've no first-hand experience, so I won't comment on efficacy, and this is not a recommendation. Still, any approach to physical security monitoring that moves us away from facial recognition by default is worth highlighting to decision-makers.
4. Adversarial Threat Landscape for Artificial-Intelligence Systems
"MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a knowledge base of adversary tactics, techniques, and case studies for machine learning (ML) systems based on real-world observations, demonstrations from ML red teams and security groups, and the state of the possible from academic research. ATLAS is modeled after the MITRE ATT&CK® framework and its tactics and techniques are complementary to those in ATT&CK."
If your organisation undertakes adversary simulations, you may wish to lean on ATLAS, where AI systems play a role in identity, access control, or decision support.
5. Backdoor Attack on Deep Learning Models in Mobile Apps
Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security- and safety-critical applications used for cash recognition, parental control, face authentication, and financial services.
This MITRE ATLAS case study helps bring to life the framework referenced above.
Initial access is via a malicious APK installed on the victim's device through a supply-chain compromise. Machine Learning Attack Staging is by a "trigger placed in the physical environment where it is captured by the victim's device camera and processed by the backdoored ML model". The team were successful in "evading ML models in several safety-critical apps in the Google Play store."
Bonus Idea
Writers and programmers find 50% increases in productivity with AI
Feedback
Click the emoji that best captures your reaction to this editionā¦
Sponsors
I pre-launched a service to help Indie Hackers and Solopreneurs navigate security due diligence by Enterprise clients: Cyber Answers for Indie Hackers & Solopreneurs. If you know someone who might benefit, please forward this note.
Thanks for reading!
What would make this newsletter more useful to you? If you have feedback, a comment or a question or feel like saying hello, you can reply to this email; it will get to me, and I will read it.
-Craig
New To This Newsletter?
Subscribe here to get what I share next week.