Natacha Bakir
Malware Reverse Engineer & CTI Lead
SC Awards “Woman of Influence” 2025 | Recorded Future Excellence Award 2025 | Eurovision 2025 CTI Lead |
VX-Underground/SentinelOne Malware Research Challenge Finalist
Published: 2026-02-13
Reading time: 3 minutes
In this article, we walk through a practical adversarial machine learning experiment that shows how deep learning vision systems can be fooled by nearly imperceptible perturbations. The full notebook is available on my GitHub repository and reproduces the experiment step by step.
The goal of this work is simple: show that a highly accurate image classifier can become unreliable when exposed to adversarial inputs that remain visually indistinguishable to humans.

The experiment uses:
- the CIFAR-10 image dataset,
- a pre-trained ResNet classifier, and
- the Fast Gradient Sign Method (FGSM), implemented with the Adversarial Robustness Toolbox (ART).
The objective is to compare the model’s performance before and after adversarial perturbations are introduced.
We load the CIFAR-10 dataset, which contains 60,000 color images distributed across ten classes such as airplanes, cats, ships, and automobiles. For demonstration purposes, the notebook uses a reduced subset to speed up execution while preserving the attack behavior.
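As a minimal sketch of this step (the loading route and the subset size of 500 are assumptions, not necessarily the notebook’s exact choices), the data can be prepared with ART’s built-in helper:

```python
import numpy as np
from art.utils import load_dataset

# Load CIFAR-10 through ART: images are scaled to [0, 1] and labels are one-hot encoded
(x_train, y_train), (x_test, y_test), min_pixel, max_pixel = load_dataset("cifar10")

# Keep a small test subset so the demo runs quickly (500 is an illustrative size)
n_samples = 500
x_test, y_test = x_test[:n_samples], y_test[:n_samples]

# ART returns NHWC arrays; the PyTorch model used below expects NCHW, so transpose once here
x_test = x_test.transpose(0, 3, 1, 2).astype(np.float32)
```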
A pre-trained ResNet model is then loaded. Using a pre-trained model reflects a realistic scenario: attackers target already-deployed systems rather than training new ones.
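A hedged sketch of this step, assuming a PyTorch ResNet-18 with a 10-class head and a local CIFAR-10 checkpoint (the architecture variant, framework, and file name are assumptions; the notebook’s exact model may differ). The model is wrapped in ART’s PyTorchClassifier so the attack can query predictions and gradients:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from art.estimators.classification import PyTorchClassifier

# Assumed checkpoint: a ResNet-18 with a 10-class output layer trained on CIFAR-10
model = resnet18(num_classes=10)
model.load_state_dict(torch.load("resnet18_cifar10.pt"))  # hypothetical file name
model.eval()

# Wrap the PyTorch model so ART attacks can compute gradients through it
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(3, 32, 32),
    nb_classes=10,
    clip_values=(min_pixel, max_pixel),  # pixel bounds from the loading step
)
```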

Before launching any attack, the classifier is evaluated on clean test samples to establish a performance baseline. In the experiment, it correctly classifies almost all samples, confirming that the model behaves as expected prior to any adversarial manipulation.
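The baseline check itself is a short comparison of predicted and true labels, sketched here with the variables from the previous snippets:

```python
import numpy as np

# Clean-data baseline: fraction of test samples classified correctly
preds_clean = classifier.predict(x_test)
clean_acc = np.mean(np.argmax(preds_clean, axis=1) == np.argmax(y_test, axis=1))
print(f"Clean accuracy: {clean_acc:.2%}")
```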


Adversarial examples are generated using the Fast Gradient Sign Method (FGSM), implemented through the Adversarial Robustness Toolbox.
FGSM operates by computing the gradient of the loss with respect to the input image and slightly modifying the pixels in the direction that maximizes classification error:
x_adv = x + ε · sign(∇_x L)
The perturbation magnitude ε is intentionally kept small so that the modified image remains visually indistinguishable from the original sample.
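With the classifier wrapped in ART, generating the adversarial set takes only a few lines; the ε value of 0.03 below is illustrative, and the notebook may use a different magnitude:

```python
from art.attacks.evasion import FastGradientMethod

# One-step FGSM: shift each pixel by at most eps in the direction of the loss gradient's sign
attack = FastGradientMethod(estimator=classifier, eps=0.03)
x_test_adv = attack.generate(x=x_test)
```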

Once adversarial images are generated, the classifier is evaluated again on the perturbed samples.
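This re-evaluation mirrors the clean baseline above, only on the perturbed images:

```python
import numpy as np

# Accuracy on the adversarially perturbed test subset
preds_adv = classifier.predict(x_test_adv)
adv_acc = np.mean(np.argmax(preds_adv, axis=1) == np.argmax(y_test, axis=1))
print(f"Adversarial accuracy: {adv_acc:.2%}")
```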
In this experiment, accuracy drops drastically even though the images appear unchanged to human observers, illustrating how fragile even well-trained deep learning models can be when confronted with adversarial inputs.

The final step displays adversarial images alongside the predicted label and the true class. This visual comparison clearly shows the key phenomenon:
- The image looks normal to a human.
- The model’s prediction changes.
- The classifier remains highly confident in the wrong prediction.
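A minimal sketch of how such a side-by-side view can be produced with matplotlib, reusing the variables from the previous snippets (the layout is illustrative rather than the notebook’s exact plot):

```python
import matplotlib.pyplot as plt
import numpy as np

# CIFAR-10 class names in label order
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

fig, axes = plt.subplots(1, 5, figsize=(12, 3))
for i, ax in enumerate(axes):
    img = x_test_adv[i].transpose(1, 2, 0)            # NCHW -> HWC for display
    pred = class_names[int(np.argmax(preds_adv[i]))]
    true = class_names[int(np.argmax(y_test[i]))]
    ax.imshow(np.clip(img, 0.0, 1.0))
    ax.set_title(f"pred: {pred}\ntrue: {true}", fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()
```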
Before the attack:

After the attack:

This experiment highlights several important security insights:
- High accuracy on clean data says nothing about robustness: a model that classifies almost every clean sample correctly can still fail under attack.
- The perturbations that cause these failures are nearly imperceptible, so human inspection offers no protection.
- The classifier stays highly confident in its wrong predictions, which makes the failures hard to detect.
- Attackers do not need to train their own models; already-deployed systems can be targeted directly.
Adversarial robustness is therefore not an optional feature but a fundamental requirement for trustworthy AI deployment.
The full reproducible notebook is available on my GitHub repository: Alphabot42/AI Security/Practical Adversarial Attacks on Vision Models FGSM Attack on CIFAR 10 using ART