Adversarial Learning is a sub-field of Machine Learning, where the learning model is trained with a rival model, mainly adaptive adversaries. Such models are highly attractive for AI security, generative models, and other machine learning tasks. In traditional machine learning, the model is trained, evaluated, and selected based on samples assumed the arrive from the same distribution. This assumption creates vulnerabilities to attacks from rational, adaptive attackers. Such attackers can design examples that intentionally violates the fixed distribution assumption and end in a model error.
In his paper, "EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES," Ian J. Goodfellow shows an example of a classifier fooled to classify a panda bear just by adding such a small perturbation to the image that it's invisible to the human eye. These "machine optical illusions," that many times repeat over different models which were trained on the same training data, expose underlying vulnerabilities in our training process. Various explanations were suggested throughout the years, varying from nonlinearities and overfitting to the linear nature of neural networks. Whatever the cause might be, it is clear that the attacker is using the fixed distribution assumption against the model.
Goodfellow's examples hold true as long as the attacker can plan his attack (picking the misleading samples) after the model finished his training and cannot adjust to new samples. This insight gives the intuition behind Adversarial Learning, instead of first training our model and then sending it to face attackers with no option to react, we can incorporate the attacker into the training process of our model.
Before diving into the math behind the last sentence, we will look for inspiration in a slightly different place.
"Those who are inspired by a model other than Nature,
a mistress above all masters, are laboring in vain..."
Leonardo di ser Piero da Vinci
Nature gives us many examples of interactions between different strategies and agents. Since Adversarial Learning is an interaction between two agents, the learning model, and its adversarial or attacker, nature can be a good place to start our search for the intuition behind it. A suitable framework to study natural interactions between strategies is Evolutionary Game Theory. In this theory, the interaction between strategies, whether competing or collaborating, is examined from the eyes of the evolutionary process.
Let's start by looking at a simple interaction between two agents - predators and prey. This model is based on the Lotka-Volterra equations.
Lotka-Volterra equations:
$$ \frac{\partial x}{\partial t} = \alpha x - \beta xy$$
$$ \frac{\partial y}{\partial t} = \delta xy - \gamma y$$
These equations describe the dynamics of predator and prey populations. The first equation defines the change in the prey population, where $\alpha$ is the growth factor for the prey, and $\beta$ is the death factor (combining natural deaths and predation). The second equation describes the change in the predator population, where $\delta$ is the growth factor for the predator ($\delta xy$ is the growth term, which is dependent on the availability of prey!) and $\gamma$ is the death factor. Here we used the Runge-Kutta method to solve the ODEs.
Solver:
The predator-prey model is based on four simplifying assumptions:
A. There is infinite food for the prey population.
B. Predators always find prey, if the latter exists.
C. Both environment and the two species are static (no mutations).
D. Only one strategy per species is allowed.
Apparently, both in real life and in Adversarial Learning, these assumptions don't hold. Still, this a good starting point for understanding the dynamics of competing strategies in a zero-sum game. In the next steps, We'll look at more general models.
The Dove-Hawk game is perhaps the most studied game in evolutionary game theory. First introduced by John Maynard Smith in his book Evolution and the Theory of Games, this game models the competition of two static strategies over a shared resource (food). The Hawks' strategy is to fight for the food, while the doves take a peaceful strategy. When a hawk meets a dove, the hawk scares the dove away and gets all the food, but if a hawk meets another hawk, the competition will end in a fight where the winner takes it all and the loser might end up wounded. A dove-dove match will be resolved in sharing the meal. This game is not a direct predator-prey case, but the success of one strategy still means a loss for the other.
The dynamics of such a game are described as a payoff matrix:
meets Hawk | meets Dove | |
---|---|---|
Hawk | $V/2 - C/2$ | $V$ |
Dove | $0$ | $V/2$ |
where $V$ is the value of getting a full meal, and $C$ is the cost of getting injured in a competition.
Like most EGT games the governing equation in the Dove-Hawk game is the replicator dynamic equation:
$$ \frac{\partial x_i}{\partial t} = x_i \left [ f_i(x) - \sum_{j=1}^n x_j f_j(x) \right ] $$
where $x_i$ is the proportional ratio of strategy $i$ in the population and $f_i(x)$ is the fitness score of strategy $i$ (usually proportional to the size of the population).
The general replicator equation is a non-linear PDE, so solving it analytically (or numerically) might be a difficult task. However, in the Dove-Hawk game, and any other 2-strategy game, the PDE becomes ODE given the fact that $\sum_{j=1}^n x_j = 1$. Here is a Runge-Kutta solver for a Dove-Hawk game:
As one might see, no matter what are the initial conditions the game always converges to the Evolutionary Stable State (ESS), which is the EGT version of Nash Equilibrium. While in general there might be no ESS or multiple ESSs, in the Dove-Hawk game the only ESS is when the hawk ratio in the population equals $V/C$.
So far we only discussed interactions of static strategies, but if we want to learn anything about learning we need to consider "games" that allow the strategies to evolve, one such framework is Genetic Algorithms. In a GAs strategies compete on the right to reproduce, resulting in a constant improvement.
Let's go back to our predator-prey model and let's assume the prey in our new are rabbits, and the predators are foxes. However, this time we will take into account one extra parameter, the animal speed. In this game, all four assumptions of the basic predator-prey model are abandoned. The environment produces enough food to support up to 1000 rabbits, a fox can only find and catch slower prey than himself, and both species can evolve over time. For simplicity, we will represent the 'speed genes' of each animal as a 32-bit number, where the speed is decimal equivalent of that number. Like any GA, we will allow crossover and random mutations. Not like other GAs, our populations can and will vary in size.
The rules for our game are as follow:
The probability that rabbit $i$ will reproduce (and the average number of descendants it will have) is
$$ f(x_i, t) = \alpha - \beta \hat{y}(x_i, t) ,$$
where $\alpha$ is the reproduction constant of rabbits, $\beta$ is the hunger constant of foxes, and $\hat{y}(x_i, t)$ is the number of foxes that are faster then rabbit $i$ at time step $t$.
In the same way, the reproduction probability of fox $j$ is
$$ g(y_j, t) = \delta \hat{x}(y_j, t) - \gamma ,$$
where $\gamma$ is the death constant of foxes, $\delta$ is the reproduction constant of foxes, and $\hat{x}(y_j, t)$ is the number of rabbits slower than fox $j$ at time step $t$.
Mutations occur at a constant rate and during crossover, genes (bits) are taken from both parents in the same probability.
As we see, the competition between the two species is driving both of them to be better (faster). In Evolutionary Game Theory this phenomenon is called "evolutionary arms race" or "evolutionary pressure." The arms race is not the only possible evolutionary advancement, the same type of motivation can push species into fascinating behaviors. An excellent demonstration of collaborative co-evolution can be found in the work of Nicholas Marchand and Cynthia H. Collins. They showed that over generations different types of bacteria (Escherichia coli and Bacillus megaterium) developed joint signaling systems that helped both of them get food.
We saw that the interaction between competing or collaborating strategies is the driving force behind evolutionary advancement, but what happen when there is only one type of agents or if the other agent is not evolving? To answer this question, we will run our modified GA once again but this time will set the speed limit for the fox population lower than for rabbits. Before we go back to the GA, let's think of some real life examples.
Desolated islands all over the globe are a fertile ground for evolutionary oddities. Getting to those islands, to begin with, might have been a demanding journey that required unique strength and skills, but once arrived these species found a unique environment with no predators and minimal competition over resources. The Kakapo (in the above video) is such a creature, evolving is mammal-less New-Zealand, it is the largest parrot in the world. At a weight of 4 KG, it is flightless and practically defenseless against most of the world predators. Charlie Douglas, one of New-Zealand most known explorers, described them as falling from the trees like apples with just a small shake. The Kakapo evolved over the ages into one of the most abundant species in the "comfortable" environment of New-Zealand, but this overspecialization almost got it extinct one man introduced mammals to the island. The story repeats over and over for different species, relatively static environment causes overspecialization that in turn threaten the survival of the species upon the first significant change.
Moving back to our fox-rabbit GA, we will investigate what happen when the environment isn't as dynamic as before. This time we will use only 30-bit numbers for the Fox speed, practically limiting the maximal fox speed to one-quarter of the maximal rabbit speed. This very small change will result in a huge change in the evolution of the two species.
What happened this time? Throughout the first four generations, the arms race was exactly as before. The change happened once the fox population reached its speed limit and the rabbit became too fast to be caught. As expected, at this point the rabbits won the "race" and the foxes, now unable to get food, went extinct. Much more interesting is the fact that after this transition the rabbits stopped getting faster, this time they didn't achieve their top speed. The reason is that after there were no more foxes in the environment the evolutionary pressure pushing the rabbits to be faster disappeared.
"To kill an error is as good a service as, and sometimes even
better than, the establishing of a new truth or fact."
Charles Darwin
We have seen various models for natural interactions but what all of that has to do with Adversarial Learning? The answer is quite simple, evolution is nature's search algorithm . Through evolution, nature optimizes a score function, where the genetic code is the search space, and environmental fitness is the score. Nature's score function is highly dependent on the particular environment in which the process takes place. Evolutionary search might be very slow and might seem random but proves to be efficient when our goal is extremely complex and not well defined.
Co-Evolution is a type of evolutionary search algorithm, where multiple species try to optimize their environmental fitness function, but while doing so, they change the environment in a way that affects other species. In turn, this constant change in the environmental conditions changes the fitness functions and forces the different species to "recalculate route" and adjust to the new situation. The constant change of the environment is equivalent to slowly but steadily raising the difficulty levels for everyone and, just like a kid learning math, this drives the various species to be stronger and better or to go extinct. In the next section, we will see how the same effect influences machine learning tasks.
The classic supervised learning problem can be defined as the following optimization problem:
$$ \argmin_{f \in \mathcal{F}} \int_{\mathcal{X} \times \mathcal{Y}} L(y, f(x)) d\rho (x, y) = \argmin_{f \in \mathcal{F}} \varepsilon(f),$$
where $ \mathcal{S} = \mathcal{X} \times \mathcal{Y} $ is the probability space, $\rho$ is a density function, $L$ is a loss function, and $\mathcal{F}$ is the search space of all possible functions.
In the non-trivial case, $\rho$ is only known through a small, usually finite, sub-set called the training set, and $\varepsilon(f)$ cannot be computed directly. This definition is too wide to gain any progress, so few additional assumptions are needed. Since the only information we have about $\rho$ is based on the training set, $\mathcal{T}$, most machine learning models assumes that the samples in $\mathcal{T}$ follow the same distribution $\rho$ and are independent and identically distributed.
The definition above assumes no prior on the function space, $\mathcal{F}$, or the density function, $\rho$. However, from the No Free Launch theorem we know that uniform, universal consistency is impossible, meaning
$$ \lim_{|\mathcal{T}| \rightarrow \inf} \sup_{\rho} P (\varepsilon(\hat{f}) - \inf_{f \in \mathcal{F}} \varepsilon(f) > \epsilon) > 0 ,$$
for every model $f$ and some $\epsilon > 0$. In other words, to generalize we have to assume some prior on the model or the density. In most cases, choosing the type of model (SVM, graphical model, neural network, etc.) is the first prior on the model, and the regularizer is the second.
The loss function is one of the main building blocks of the learning problem. For some problems defining the appropriate loss function is extremely hard, but even simple cases might hold some surprises. Binary classification tasks might seem very harmless, all the model has to do is classify each sample to one out of two classes (e.g. -1 or +1). The intuitive loss function would be
$$ L(y, f(x)) = \mathbb{1}_{y \neq sign(f(x))} .$$
However, this function is non-convex and non-differentiable, so optimizing it might be inefficient computationally. Standard procedure is to use a well-behaved function as an approximation. The Mean Squared Error is a widely used approximation for such functions. Now let's see what happens when we use such approximation in even the simplest problems. We use a one-dimensional function and a uniform density, $f^*(x) = 1$ for every $x > 40$ or $x < 20$, and $f^*(x) = -1$ for every $20 < x < 40$. We will try to fit $f^*$ with a second order polynomial, based on 100 uniformly chosen samples in the range $0-100$. Since this is a linear least squares, we can solve the problem for the optimal fitting using SVD (the full solution is in this link) .
As we see in the graph, the best fitting based on Mean Squared Error predicts the value $1$ for all inputs. This achieves around $80\%$ accuracy on the test set and lower on the full density range.
Due to the simplicity of this problem, we can solve for the real Misclassification error. A very naive grid search (with some analytical intuition) finds the optimal solution with $100\%$ accuracy on all values.
OK, so MSE clearly isn't ideal for binary classification, but should we be surprised? Misclassification is hard to optimize, so we need an approximation. MSE is just a bad approximation in this case. But the important takeaway is that we can use function approximation models to find a suitable loss function that is easy enough to optimize. That's the first and critical feature of adversarial models; we simply use a very efficient model as our loss function.
We just saw that using a machine learning model as our loss function can be a good idea if our intuitive loss is unavailable. However, that's far from the only strength of Adversarial Learning. Like in nature, also in machine learning overspecialization is a real problem. To understand the last sentence we used a generative model called Generative Adversarial Network (GAN). GANs are a form of dual neural network, where one network, the Generator, tries to generate new synthetic samples as close as possible to the training set, and a second network, called Discriminator, aims to differentiate between "real" samples and synthetic ones. GANs were proven to be very powerful in tasks like generating images, 3D objects, text, and audio. Here is an example of a well trained image GAN:
This is only possible when both the Generator and Discriminator are dynamic and keeps on learning. In the next example we will run the exact same algorithm for the same amount of epochs, but after doing so we will freeze the Discriminator and train the Generator for 30 more epochs.
Very much like in the static version of the fox-rabbit GA, also in the second GAN, when we "freeze" the Discriminator the performance of the Generator stops getting better. Unlike the fox-rabbit GA, here the performance gets much worse, this is the outcome of overspecialization. GANs are only powerful because the Discriminator, which is the loss function for the Generator, keeps on changing, pushing the Generator to be ever better, or at least keep changing. Once we stopped the Discriminator from updating, the Generator only learns the weak points of the static Discriminator. To be correct, the Generator is still getting better, but in the wrong sense. This case of optimizing the "faulty" fitness function is the same as the Kakapo and other island species.
The same reason that helps adversarial models avoid overspecialization is the one that helps it avoid overfitting. There is a subtle difference between the two, but they are still different. The dynamic nature of adversarial models is the one that helps both. We can see that by looking at the GAN example from the Discriminator point of view. Instead of learning to classify a fixed, static set of examples, the Discriminator is trained on an entirely new set of instances in every epoch. This feature means we no longer need the i.i.d assumption for the training set. Moreover, nothing requires the samples generated from the Generator to even be from any specific distribution. Our new and improved classifier is trained against as strong an opponent (or attacker) as we can computationally write.
In this work, we demonstrated the power of dynamic competition both in nature and in machine learning. The bottom line is that Adversarial Learning draws its strength from three main properties - a loss function that has high expression ability and is easy to optimize at the same time, an ever changing environment (loss) that prevents overspecialization, and training process that is not bound by the strict assumptions of traditional supervised learning. All three combined, make Adversarial Learning a very powerful tool in tasks that have a vague or hard to optimize loss function, require generating new samples behind the given train set, and sensitive to rational, adaptive attackers.
Smith, J. Maynard, and G. R. Price. "The logic of animal conflict." Nature 246 (1973): 15.
Smith, John Maynard. Evolution and the Theory of Games. Cambridge university press, 1982.
Nowak, Martin A., and Robert M. May. "Evolutionary games and spatial chaos." Nature 359.6398 (1992): 826-829.
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
Huang, Sandy, et al. "Adversarial attacks on neural network policies." arXiv preprint arXiv:1702.02284 (2017).
Marchand, N. and Collins, C. H. (2013), Peptide-based communication system enables Escherichia coli to Bacillus megaterium interspecies signaling. Biotechnol. Bioeng., 110: 3003–3012. doi:10.1002/bit.24975.
L. Rosasco and T. Poggio, Machine Learning: a Regularization Approach, MIT-9.520 Lectures Notes, Manuscript, Dec. 2015.