
AI under criminal influence: adversarial machine learning explained

Nov 30, 2023

Since the public release of ChatGPT, the adoption of artificial intelligence (AI) and machine learning (ML) systems has seen a significant boost. Companies are now rushing to integrate AI technology for a competitive advantage – but are they also putting themselves at the mercy of cybercriminals?

ML models that power many AI applications are vulnerable to the following cyberattacks:

  • Attacks against data contained within AI systems: Data is the most important aspect of ML systems. This data may include sensitive personally identifiable information (PII) and business information, making it a primary target for malicious actors.
  • Adversarial machine learning attacks: These are categorized into four groups: poisoning, evasion, extraction, and inference. We’ll explain each one in more detail later on.

Adversarial machine learning is the act of providing malicious input into an ML model to make it produce inaccurate results or degrade its performance. This attack could happen during the training phase of the ML model or might be introduced later on (through input samples) to deceive an already trained model.

Before discussing the different adversarial attack techniques against ML models, it’s worth explaining how ML models are trained.

How are ML models trained?

Data is the lifeblood of machine learning systems. According to research conducted by the AI analysis firm Cognilytica, 80% of AI project time is spent on gathering, organizing, and labeling data. Training data is gathered from different sources, such as:

  • The internet – for example, Facebook, Twitter, or Instagram feeds
  • Surveillance cameras
  • Surveillance drones
  • Security system logs
  • Any other source of computer data

This data is fed into an ML algorithm, which extracts patterns from it. Each ML model uses a different technique to learn from the supplied data, but all of them improve over time as more training data is fed into them.

After training, the ML model can be deployed in any AI system. It’s worth noting that many ML models continue to learn and improve after deployment, while others are frozen at launch and do not update their learned patterns.

Figure 2 – General ML process | Source: https://mapendo.co/blog/training-data-the-milestone-of-machine-learning
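
To make this workflow concrete, here is a minimal sketch, assuming a synthetic dataset and a scikit-learn classifier rather than any particular production pipeline: it gathers labeled data, fits a model so it extracts patterns, and evaluates it before deployment.

```python
# Minimal sketch of the ML training workflow described above.
# The data is synthetic and stands in for real, labeled training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Gather and label data (here: a synthetic binary-classification set).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 2. Split into training and evaluation sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 3. Fit a model so it extracts patterns from the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 4. Evaluate before deployment; some deployed models keep learning, others are frozen.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```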

Types of adversarial ML attacks

Machine learning engineers leverage adversarial ML attack techniques to help improve the robustness of machine learning models by exposing them to malicious inputs during the training and inference phases. However, bad actors can use these techniques to disrupt the normal working behavior of AI and ML models.

Based on the threat actor’s knowledge of the target model, adversarial ML attacks can be classified into two major types:

White box attack

This attack is the most dangerous because attackers have full access to the ML model, including its parameters, hyperparameters (the values that control the model’s learning process), architecture, defense mechanisms, and training dataset.

Black box attack

In a black box attack, the attacker can access the ML model outputs but not its internal details like architecture, training data, ML algorithm, or defense mechanism. The attacker can only provide inputs to the model and check the corresponding outputs. By analyzing these input-output pairs, an attacker attempts to infer how the model operates in order to create a customized attack.
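
As a rough illustration of the black-box setting, the sketch below assumes only query access to the target; `query_model` is a hypothetical stand-in for whatever prediction API the deployed model exposes.

```python
# Sketch of black-box probing: the attacker sees only inputs and outputs.
import numpy as np

def query_model(x: np.ndarray) -> int:
    """Hypothetical stand-in for the target's prediction API.
    In a real attack this would be a network call to the deployed model."""
    return int(x.sum() > 0)  # dummy decision rule, for illustration only

# The attacker submits chosen inputs and records the corresponding outputs.
probes = np.random.uniform(-1.0, 1.0, size=(200, 20))
pairs = [(x, query_model(x)) for x in probes]

# Analyzing these input-output pairs hints at the decision boundary,
# which is the starting point for extraction or evasion attacks.
print("fraction labeled positive:", sum(label for _, label in pairs) / len(pairs))
```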

Methods of executing adversarial ML attacks

There are four main methods of executing ML adversarial attacks:

Poisoning attack

In a data poisoning attack, attackers tamper with the training data used to build a machine learning model, with the aim of causing misclassifications once the model is deployed. For example, the attacker could inject malicious files labeled as benign into the training data for a malware classifier. By poisoning the training data, the model would be trained to allow malware files containing the attacker’s malicious code to bypass detection.

When later deployed in the production environment, the corrupted ML model would have learned incorrect patterns, creating security holes that attackers could exploit. Data poisoning attacks are dangerous threats because manipulation during training can have an ongoing impact, long after the attack is over.
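
As a minimal sketch of this idea, assuming a synthetic dataset and a scikit-learn classifier rather than a real malware corpus, the example below flips the labels of a fraction of “malicious” training samples to “benign” and compares the resulting model against one trained on clean data.

```python
# Minimal sketch of a label-flipping poisoning attack on a synthetic dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model trained on clean data.
clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Attacker flips the labels of 20% of the "malicious" (class 1) training samples
# to "benign" (class 0), mimicking mislabeled malware injected into the corpus.
rng = np.random.default_rng(0)
y_poisoned = y_train.copy()
malicious_idx = np.where(y_train == 1)[0]
flipped = rng.choice(malicious_idx, size=len(malicious_idx) // 5, replace=False)
y_poisoned[flipped] = 0

poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean model accuracy:   ", accuracy_score(y_test, clean.predict(X_test)))
print("poisoned model accuracy:", accuracy_score(y_test, poisoned.predict(X_test)))
```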

Evasion attack

In this type of attack, the ML model is already trained, so attackers craft input samples at deployment time to force the classifier to misclassify them. A good example is AI-powered anti-spam filters: attackers can conceal spam text inside an image so that a text-based spam filter fails to detect it.

Evasion differs from poisoning. In evasion, attackers don’t change the behavior of the machine learning model by manipulating training data – instead, they exploit its weaknesses (e.g., poorly tuned parameters or susceptible architectures) through specifically crafted inputs that make the model produce inaccurate results. For example, hackers might add slight perturbations to an image to make an image classifier recognize it incorrectly at inference time (e.g., misclassifying a tree as a tank). The model’s parameters and training process remain unchanged.
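
A minimal sketch of the “slight perturbation” idea, assuming a logistic regression model on synthetic data rather than a real image classifier: for a linear model the loss gradient with respect to the input is known in closed form, so a fast-gradient-sign (FGSM-style) step nudges each feature in the direction that increases the loss while leaving the model itself untouched.

```python
# FGSM-style evasion sketch against a trained logistic regression model.
# The model's parameters are unchanged; only the input is perturbed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

def fgsm(x: np.ndarray, label: int, eps: float = 1.0) -> np.ndarray:
    """One fast-gradient-sign step for logistic regression.
    For p = sigmoid(w.x + b), the loss gradient w.r.t. x is (p - label) * w."""
    w = model.coef_[0]
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    grad = (p - label) * w
    return x + eps * np.sign(grad)  # eps controls how large the perturbation is

x = X[0]
x_adv = fgsm(x, y[0])
print("original prediction:   ", model.predict(x.reshape(1, -1))[0])
print("adversarial prediction:", model.predict(x_adv.reshape(1, -1))[0])
```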

Extraction attack

Model extraction attacks involve replicating a target machine-learning model and training a substitute model on the inputs and outputs. This allows attackers to steal sensitive data, such as intellectual property or proprietary logic, embedded in high-value AI systems.

Extraction focuses on stealing the model itself, not just abusing its individual predictions. The attackers query the target model with selected inputs, observe the outputs, and train a substitute model to reproduce that input-output mapping. If successful, the adversary ends up with a copied version of the model.

Model extraction exposes confidential information in the original model’s architecture, logic, and training data. It also allows the adversary to conduct further attacks using their extracted model copy, such as creating evasion inputs or manipulating model logic.

Model extraction poses two primary risks:

  • The attacker can steal the model and reveal how the machine learning system works.
  • Stealing the model can facilitate other attack types, such as poisoning, logic manipulation, data leakage, model misuse, evasion, and model inversion attacks.
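
The query-observe-train loop described above can be sketched as follows, assuming a local scikit-learn model as a stand-in for the victim (in practice the victim would be a remote prediction API): the attacker labels their own queries with the victim’s outputs and fits a substitute to that input-output mapping.

```python
# Model extraction sketch: train a substitute on the victim's input-output pairs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the victim model (in reality, a remote API the attacker queries).
X, y = make_classification(n_samples=2000, n_features=10, random_state=2)
victim = LogisticRegression(max_iter=1000).fit(X, y)

# Attacker generates their own query set and records the victim's answers.
rng = np.random.default_rng(2)
queries = rng.normal(size=(5000, 10))
stolen_labels = victim.predict(queries)

# The substitute is trained purely on those input-output pairs.
substitute = DecisionTreeClassifier(max_depth=8).fit(queries, stolen_labels)

# Agreement between substitute and victim on fresh inputs measures the copy's fidelity.
fresh = rng.normal(size=(2000, 10))
agreement = (substitute.predict(fresh) == victim.predict(fresh)).mean()
print(f"substitute agrees with victim on {agreement:.1%} of fresh queries")
```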

Inference attack

In this attack, adversaries try to discover what data was used to train the ML system and take advantage of any weaknesses or biases in that data.

For instance, ML systems used in banking and medical organizations are trained on sensitive client information, such as names, birth dates, addresses, account passwords, credit card numbers, health information, and other personal details.

Suppose that, after training is complete, a bank removes its clients’ sensitive information from the ML datasets. Even though the raw data has been deleted, the ML model has already learned a great deal of sensitive information about the bank’s customers and remains subject to inference attacks: an attacker could probe the model with crafted inputs to reveal that sensitive information.
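
One common baseline for this is a confidence-threshold membership inference test. The sketch below assumes synthetic data rather than real banking records: because models are often more confident on records they were trained on, the attacker guesses “member” whenever the model’s confidence exceeds a threshold.

```python
# Confidence-threshold membership inference sketch on synthetic data.
# Records seen during training often receive higher confidence than unseen ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=3)

model = RandomForestClassifier(n_estimators=100, random_state=3).fit(X_train, y_train)

def confidence(model, X):
    """Model's probability for its own predicted class."""
    return model.predict_proba(X).max(axis=1)

# Attacker's rule: guess "member of the training set" if confidence exceeds a threshold.
threshold = 0.9
member_rate = (confidence(model, X_train) > threshold).mean()
nonmember_rate = (confidence(model, X_out) > threshold).mean()
print(f"flagged as members: {member_rate:.1%} of training records "
      f"vs {nonmember_rate:.1%} of unseen records")
```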

How to combat adversarial attacks against ML systems?

Adversarial attacks are among the most critical security risks facing machine learning systems today. To combat them, machine learning engineers should take precautions such as:

  • Adversarial training, which augments training data with sample malicious inputs to improve model robustness (a minimal sketch follows this list).
  • Anomaly detection techniques to identify patterns that could represent adversarial inputs.
  • Robust model architectures and training procedures designed to resist adversarial manipulation.
  • Monitoring systems and networks to detect abnormal traffic, which can indicate a cyberattack, using security solutions such as intrusion detection systems (IDS) and anomaly detection systems (ADS).
  • Implementing security best practices like data encryption, access controls, and IT infrastructure hardening.
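
As an example of the first precaution, here is a minimal sketch of adversarial training, assuming the same logistic regression setting used in the evasion example above: FGSM-style perturbed copies of the training data, paired with their correct labels, are added to the training set before refitting.

```python
# Adversarial training sketch: augment the training set with perturbed copies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=4)
model = LogisticRegression(max_iter=1000).fit(X, y)

def fgsm_batch(model, X, y, eps=1.0):
    """FGSM-style perturbations for logistic regression: grad_x loss = (p - y) * w."""
    w = model.coef_[0]
    p = model.predict_proba(X)[:, 1]
    return X + eps * np.sign((p - y)[:, None] * w[None, :])

# Generate adversarial copies, keep the *correct* labels, and retrain.
X_adv = fgsm_batch(model, X, y)
X_aug = np.vstack([X, X_adv])
y_aug = np.concatenate([y, y])
robust_model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

# Compare accuracy on adversarial inputs before and after adversarial training.
print("plain model on adversarial inputs: ", model.score(X_adv, y))
print("robust model on adversarial inputs:", robust_model.score(X_adv, y))
```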