AI Attacks and Adversarial AI in Machine Learning
Today, it’s incredibly common for developers to use AI in their daily workflows. In fact, 92% of developers say they’re already using AI coding tools in their work. While these tools offer benefits, speed chief among them, they can also introduce new threats and vulnerabilities. Just as bad actors consistently evolve their techniques to exploit machine learning and AI, organizations need to evolve their cybersecurity.
To protect your organization from AI attacks and adversarial AI, it’s important to understand what adversarial AI in machine learning is, how attacks work, and ways to detect it within your systems.
What is adversarial AI?
In the context of machine learning and AI exploitation, adversarial AI, also known as adversarial machine learning, occurs when bad actors attempt to alter machine learning systems and AI models by manipulating or deceiving them through crafted data or inputs. Typically, adversarial AI attacks take advantage of AI’s logic and decision-making processes to create malicious outputs.
As adversarial AI is designed to be subtle and undetectable, users often can’t tell when the model has been exploited or compromised.
Why is adversarial AI in machine learning dangerous?
Adversarial AI in machine learning is dangerous because it impacts the reliability and security of AI systems. As AI becomes a part of more workflows, the reach and impact of exploits increase. For example, if a critical sector like financial services faces an adversarial AI attack, it could result in financial crimes, like fraud, that impact organizations and users.
How do adversarial AI attacks work?
Adversarial attacks work by exploiting machine learning models through malicious or manipulative inputs. By manipulating what the model sees, attackers can deceive its decision-making and skew its outputs.
Attacks typically happen in three steps, sketched in the code example after this list:
1. The attacker learns the AI system or model by analyzing it. Often, this occurs through reverse engineering to look for weaknesses, vulnerabilities, or other security gaps to access the system’s algorithms and decision-making processes.
2. Once an attacker understands the model, they can start crafting inputs, or adversarial examples, to deceive or exploit the system.
3. After perfecting their input, attackers will begin deploying the adversarial examples.
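To make these steps concrete, here is a minimal sketch, assuming a toy scikit-learn classifier stands in for the target system and assuming the attacker can only query its predictions. Every model, dataset, and parameter here is illustrative, not a real-world target.

```python
# Minimal sketch of the three attack steps against a toy, black-box classifier.
# The model, data, and perturbation budget are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
target_model = LogisticRegression().fit(X, y)  # stands in for the victim system

# Step 1: learn the model's behavior by querying it and finding a low-confidence input.
probabilities = target_model.predict_proba(X)[:, 1]
x = X[np.argmin(np.abs(probabilities - 0.5))].copy()
original_label = target_model.predict([x])[0]

# Step 2: craft an adversarial example by searching for a small perturbation
# that flips the model's prediction.
rng = np.random.default_rng(0)
adversarial_example = None
for _ in range(1000):
    candidate = x + rng.normal(scale=0.5, size=x.shape)
    if target_model.predict([candidate])[0] != original_label:
        adversarial_example = candidate
        break

# Step 3: deploy the adversarial example against the live model.
if adversarial_example is not None:
    print("original label:", original_label)
    print("label after perturbation:", target_model.predict([adversarial_example])[0])
```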
What are the different types of adversarial AI?
To understand and protect your organization from adversarial AI, it's important to know what types of adversarial AI exist:
Evasion attacks: These focus specifically on manipulating input data. Evasion attacks can be targeted or non-targeted. For targeted attacks, the goal is for the AI model to produce a particular incorrect output. Non-targeted attacks don’t have a particular goal or outcome apart from getting the AI to produce something incorrect.
Poisoning attacks: These occur when training data is altered. Since machine learning models depend on training data to generate or predict outputs, poisoning attacks can significantly impact the model’s behavior or decision-making (see the sketch after this list).
Model transfer: When attackers successfully test adversarial examples against one AI model or machine learning system, they can reuse them to attack other models. Because the attack has already worked elsewhere, model transfer typically happens much faster than other types of adversarial AI.
Model extraction: Attackers aim to steal or replicate a trained machine learning system or model after gaining access. Once attackers understand how a model works, they can create a copy.
Trojan AI attacks: During the training phase of a model, attackers can add a malicious trigger. These attacks often go unnoticed until the trigger is activated and the attack is live.
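The poisoning case is easy to demonstrate on a toy example. The sketch below, which assumes a synthetic dataset, a scikit-learn logistic regression, and an arbitrary 20% flip rate, shows how an attacker who can flip a fraction of the training labels degrades the resulting model:

```python
# Hypothetical sketch of a label-flipping poisoning attack on a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clean_model = LogisticRegression().fit(X_train, y_train)

# The attacker flips 20% of the training labels before the model is trained.
rng = np.random.default_rng(1)
poisoned_y = y_train.copy()
flipped = rng.choice(len(poisoned_y), size=int(0.2 * len(poisoned_y)), replace=False)
poisoned_y[flipped] = 1 - poisoned_y[flipped]

poisoned_model = LogisticRegression().fit(X_train, poisoned_y)

print("accuracy trained on clean labels:   ", clean_model.score(X_test, y_test))
print("accuracy trained on poisoned labels:", poisoned_model.score(X_test, y_test))
```

In practice, poisoning is rarely this blunt: targeted flips or carefully crafted samples can shift specific decisions while leaving overall accuracy largely intact, which makes the attack harder to notice.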
Black box vs. white box attacks
When looking at adversarial AI attacks, black box versus white box refers to what an attacker knows about the machine learning system.
White box attacks are when an attacker has full knowledge or understanding of the AI model or machine learning system. This includes the model’s architecture, parameters, and training data. Since the attacker understands the model’s infrastructure, white box adversarial attacks are far more powerful and effective.
Black box attacks are when an attacker has limited or no knowledge of the machine learning system. As they’re unsure of the model’s infrastructure, adversarial attacks are often limited to using the model’s inputs and outputs to try and map out its behavior. If successful, an attacker could execute model extraction or attack another system. These attacks tend to be more common as executing white box attacks requires insider knowledge about the model or system.
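As a rough illustration of how black-box querying can lead to model extraction, the sketch below assumes an attacker who can only send inputs to a target model and record its predictions; the target model, the random query distribution, and the surrogate architecture are all illustrative choices.

```python
# Hypothetical sketch: approximating a black-box model by querying it
# and training a surrogate on its responses.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=2)
target_model = LogisticRegression().fit(X, y)  # the attacker cannot see inside this

# The attacker only sends queries and records the returned predictions.
rng = np.random.default_rng(2)
queries = rng.normal(size=(5000, 10))
returned_labels = target_model.predict(queries)

# A surrogate trained on the query/response pairs approximates the target's behavior.
surrogate = DecisionTreeClassifier(max_depth=5).fit(queries, returned_labels)
agreement = (surrogate.predict(X) == target_model.predict(X)).mean()
print(f"surrogate agrees with the target on {agreement:.0%} of inputs")
```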
Adversarial machine learning techniques
Bad actors’ techniques to deploy adversarial AI will depend on their goals and the model’s infrastructure. Some of the most common adversarial attack techniques are:
Gradient-based methods: Attackers use the gradients of a machine learning model to calculate what small, subtle changes are needed to create adversarial examples. An example of this is the Fast Gradient Sign Method (FGSM), sketched in code after this list.
Optimization-based methods: Attackers use mathematical techniques to determine the most effective changes to create the best adversarial example. Carlini & Wagner (C&W) is a popular example of this technique.
Query-based methods: In black box attacks, attackers use this technique by querying the model to learn its behavior. As they learn more and analyze output patterns, attackers can identify boundaries and vulnerabilities to create an adversarial example. Researchers have recently found that query-based methods can result in universal and transferable adversarial attacks.
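To show the gradient-based idea in code, here is a minimal FGSM-style sketch against a logistic regression model, where the gradient of the loss with respect to the input can be written in closed form. The dataset, model, and perturbation budget epsilon are illustrative assumptions; real attacks typically target deep networks using automatic differentiation.

```python
# Minimal FGSM-style sketch: perturb an input along the sign of the loss gradient.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=3)
model = LogisticRegression().fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

x, label = X[0], y[0]
p = 1 / (1 + np.exp(-(w @ x + b)))   # model's predicted probability of class 1
grad_wrt_input = (p - label) * w     # gradient of the cross-entropy loss w.r.t. the input

epsilon = 0.5                        # perturbation budget (illustrative)
x_adv = x + epsilon * np.sign(grad_wrt_input)   # FGSM step

print("probability of the true class before:", model.predict_proba([x])[0, label])
print("probability of the true class after: ", model.predict_proba([x_adv])[0, label])
```

Even though each feature moves by at most epsilon, every shift pushes the loss in the same direction, which is why small, near-imperceptible perturbations can change a model’s output.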

Examples of adversarial attacks in generative AI
Adversarial attacks in generative AI are becoming more common. In most cases, attacks manipulate a model’s output, whether the model generates images, text, or code. Examples include the generation of:
Incorrect or inappropriate images
Incorrect information or malicious links and malware
Insecure or malicious code
Misleading audio
When adversarial attacks are deployed in generative AI, the goal is for the changes to go undetected. If a developer uses AI to generate code, an adversarial attack could slightly alter it when a particular command is used. If the developer assumes the generated code is secure and doesn’t check it before releasing it, the change could impact the entire codebase.
Similarly, an adversarial attack could alter AI-generated images. This may be as simple as changing a small detail in the image, or it could generate something more offensive or inappropriate.
Researchers have also found several ways AI could be exploited in the future. One example is attackers manipulating road signs or making an autonomous vehicle hallucinate, causing accidents or dangerous situations. Another is attackers manipulating facial recognition software or systems to impersonate people or bypass security measures.
How to detect adversarial attacks
To protect themselves from adversarial attacks, organizations must have a robust security strategy that includes a proactive, layered defense.
Continuous monitoring
To ensure no threat goes undetected, organizations should implement continuous monitoring of their machine learning systems. One way to do this is through security tools that continuously monitor an organization’s assets and infrastructure.
Continuously monitoring AI inputs and outputs can also help organizations detect anomalies, which could indicate an adversarial attack. Other ways to implement continuous monitoring include tools that analyze logs for suspicious activity, input validation and sanitization protocols, and consistently evaluating the model’s performance. If an AppSec team uses AI in their workflows, a security tool that can continuously detect vulnerabilities and malicious or insecure code is also vital.
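One simplified form of input monitoring is statistical anomaly detection against the training distribution. The sketch below assumes historical inputs are available and uses an arbitrary per-feature z-score threshold to flag incoming samples that sit far from what the model saw during training:

```python
# Hypothetical sketch: flag incoming model inputs that look unlike the
# training data, as one simple form of continuous input monitoring.
import numpy as np

rng = np.random.default_rng(4)
training_inputs = rng.normal(size=(1000, 20))  # stands in for historical inputs

mean = training_inputs.mean(axis=0)
std = training_inputs.std(axis=0)

def looks_anomalous(x, threshold=4.0):
    """Flag an input whose per-feature z-score exceeds the threshold anywhere."""
    z_scores = np.abs((x - mean) / std)
    return bool(np.any(z_scores > threshold))

normal_input = rng.normal(size=20)
suspicious_input = normal_input.copy()
suspicious_input[3] += 10.0  # a large, localized perturbation

print(looks_anomalous(normal_input))      # almost certainly False (in-distribution)
print(looks_anomalous(suspicious_input))  # True (far outside the training range)
```

Production systems typically combine checks like this with schema validation, rate limiting on queries, and alerting on drops in model confidence or accuracy.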
Adversarial training
Training your model on adversarial examples can make your model more secure. By incorporating adversarial examples into training data, your model can learn to recognize threats and defend against them.
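A minimal sketch of the idea, reusing the FGSM-style perturbation from the earlier example and assuming the same kind of toy dataset and model, generates adversarial variants of the training inputs and retrains on the combined set:

```python
# Hypothetical sketch of adversarial training: augment the training set with
# FGSM-style perturbed copies and retrain the model on the combined data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=5)
model = LogisticRegression().fit(X, y)

# Craft FGSM-style adversarial variants of the training inputs (epsilon is illustrative).
w, b = model.coef_[0], model.intercept_[0]
p = 1 / (1 + np.exp(-(X @ w + b)))
grads = (p - y)[:, None] * w            # per-sample gradient of the loss w.r.t. the input
X_adv = X + 0.5 * np.sign(grads)

# Retrain on the original and adversarial examples together, keeping the true labels.
X_augmented = np.vstack([X, X_adv])
y_augmented = np.concatenate([y, y])
robust_model = LogisticRegression().fit(X_augmented, y_augmented)

print("accuracy on adversarial inputs, original model:", model.score(X_adv, y))
print("accuracy on adversarial inputs, robust model:  ", robust_model.score(X_adv, y))
```

In practice, adversarial training regenerates the adversarial examples at each training step rather than once up front, but the principle is the same: the model repeatedly sees perturbed inputs paired with their correct labels.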
Education and awareness
Along with implementing the right tools and training for a model, an organization’s users and teams should know the signs of an adversarial attack, the threats it poses, and how to report suspicious activity. As adversarial attacks aim to be undetectable, education and awareness can help teams catch threats or flag suspicious changes before they cause problems for an organization or its systems.
Defend against AI attacks with Snyk
Having a strong security posture is key to protecting your organization against AI attacks. Using a platform like Snyk can defend your organization against adversarial AI while securing the SDLC and ensuring every asset and application is protected. If your development team is integrating AI into their workflows, Snyk Code can secure their code as it’s written by scanning and recommending fixes in real time.
Master AI security with Snyk
Learn how Snyk helps secure your development teams’ AI-generated code while giving security teams complete visibility and control.