Xiao-shan GAO, Shuang LIU, Lijia YU
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
E-mail: xgao@mmrc.iss.ac.cn
Abstract  The purpose of adversarial deep learning is to train robust DNNs against adversarial attacks, and this is one of the major research focuses of deep learning. Game theory has been used to answer some of the basic questions about adversarial deep learning, such as those regarding the existence of a classifier with optimal robustness and the existence of optimal adversarial samples for a given class of classifiers. In most previous works, adversarial deep learning was formulated as a simultaneous game and the strategy spaces were assumed to be certain probability distributions in order for the Nash equilibrium to exist. However, this assumption is not applicable to practical situations. In this paper, we give answers to these basic questions for the practical case where the classifiers are DNNs with a given structure; we do that by formulating adversarial deep learning in the form of Stackelberg games. The existence of Stackelberg equilibria for these games is proven. Furthermore, it is shown that the equilibrium DNN has the largest adversarial accuracy among all DNNs with the same structure when the Carlini-Wagner margin loss is used. The trade-off between robustness and accuracy in adversarial deep learning is also studied from a game theoretical perspective.
Key words  adversarial deep learning; Stackelberg game; optimal robust DNN; universal adversarial attack; adversarial accuracy; trade-off result
Dedicated to Professor Banghe LI on the occasion of his 80th birthday
A major safety issue for deep learning [22] is the existence of adversarial samples [40]; that is, it is possible to make small modifications to an input sample of a DNN, which are essentially imperceptible to the human eye, such that the DNN outputs an incorrect label or even any label given by the adversary. The existence of adversarial samples makes deep learning vulnerable in safety-critical applications, and adversarial deep learning has become a major research focus of deep learning [44]. The goal of adversarial deep learning is to train robust DNNs against adversarial attacks, as well as to develop more effective attack methods for generating adversarial samples.
Many adversarial defence models have been proposed, including adversarial training based on robust optimization [24,49], gradient masking and obfuscation [1,48], adversarial parameter attacks [23,41,47], universal adversaries [5,27], randomized smoothing [9], and adversarial sample detection [7]. Many attack methods have also been proposed, including white-box attacks based on the gradient information of the DNN [6,24,30], black-box attacks based on transferability of the adversaries [31], poisoning attacks for the input data [17,36], and physical world attacks [2,21]. More details can be found in the survey [44].
Many defences have been found to be susceptible to new adversarial attacks, and stronger defences have in turn been proposed against these new attacks. To break this loop of defences and attacks, a recent line of research, based on game theory [14,38], tries to establish a more rigorous theoretical foundation for adversarial deep learning by answering questions such as the following [5,8,25,32]:
Question Q1  Does there exist a classifier which ensures optimal robustness against any adversarial attack?
Question Q2  Do there exist optimal adversarial samples for a given class of classifiers and a given data distribution?
To answer these questions, adversarial deep learning was formulated as a non-cooperative game between the Classifier and the Adversary in [5,8,25,32]. The goal of the Classifier is to train a robust DNN. The goal of the Adversary is to create optimal adversarial samples. A Nash equilibrium of the game is a DNN C* and an attack A* such that no player can benefit by unilaterally changing its strategy, which gives an optimal solution to the adversarial deep learning problem. The existence of Nash equilibria has been proven under various assumptions [5,25,32].
Despite this great progress, questions Q1 and Q2 have not been answered satisfactorily. The main reason is that, in order for the Nash equilibrium to exist, the strategy spaces for the Classifier and the Adversary are assumed to be either convex sets of probability distributions or sets of measurable functions. However, in practice, DNNs with fixed structures are used, and Nash equilibria do not exist in this case. In this paper, we show that questions Q1 and Q2 can be answered positively for DNNs with a fixed structure by formulating adversarial deep learning in the form of Stackelberg, or sequential, games.
A positive answer to question Q1 is given by formulating adversarial deep learning as a Stackelberg game G_s, with the Classifier as the leader and the Adversary as the follower, where the strategy space for the Classifier is a class of DNNs with a given structure, say DNNs with a fixed depth and width, and the strategy space for the Adversary is the set of all possible attacks with a given attack radius. The game G_s is called an adversarial deep learning game.
We show that the game G_s has a Stackelberg equilibrium, which gives the optimal robust DNN under a certain robustness measurement (refer to Theorem 3.5). We further show that when the Carlini-Wagner margin loss is used as the payoff function, the equilibrium DNN is the optimal defence and has the largest adversarial accuracy among all DNNs with the same structure (refer to Theorem 4.4). Furthermore, the equilibrium DNN is the same as that of adversarial training [24]. Thus, our result gives another theoretical explanation for the fact that adversarial training is one of the most effective defences against adversarial attacks.
The trade-off property for deep learning means that there exists a trade-off between robustness and accuracy [42,45,49]. We prove a trade-off result from a game theoretical viewpoint. More precisely, we show that if a linear combination of the payoff functions of adversarial training and normal training is used as the total payoff function, then the equilibrium DNN has a robustness no higher and an accuracy no lower than those of the DNN obtained by adversarial training. We also show that the trade-off property does not hold when an empirical loss is used to train the DNNs. More precisely, it is shown that the DNNs with the largest adversarial accuracy can be parameterized by elements in an open set of R^K, where K is the number of parameters of the DNN. In other words, there is still room to improve the accuracy of DNNs with optimal adversarial accuracy.
Finally, when using the empirical loss for a finite set of samples to train DNNs, we compare G_s (denoted as G_1 in this case) with two other games G_2 and G_3, where G_2 is a Stackelberg game with the Adversary as the leader and G_3 is a simultaneous game between the Classifier and the Adversary. We show that G_2 has a Stackelberg equilibrium and G_3 has a mixed strategy Nash equilibrium. Furthermore, the payoff functions of G_1, G_3, and G_2 at their equilibria decrease successively. The existence of a Stackelberg equilibrium for G_2 gives a positive answer to question Q2 for DNNs with a given structure.
The game theoretical approach to adversarial machine learning was first undertaken in the seminal work of Dalvi, Domingos, Mausam, and Verma [11], where adversarial machine learning was formulated as a simultaneous game between the Classifier and the Adversary. Quite a number of works along these lines have appeared, formulating adversarial machine learning both as a simultaneous game and as a Stackelberg game; details can be found in the nice surveys [20,50]. These works usually employed linear models such as SVM for binary classification and used spam email filtering as the main application background.
The game theoretical approach to adversarial deep learning appeared recently, partially stimulated by the fact that adversarial samples seem inevitable for deep learning [3,4,10,37].
In [16,18,25,29,32,34], adversarial deep learning was formulated as a simultaneous game. In [32], the game was shown to have no pure strategy Nash equilibria, while mixed strategies led to more robust classifiers. In [16,25], it was proven that Nash equilibria exist and can be approximated by a pure strategy when the strategy spaces for both the Classifier and the Adversary are parameterized by certain distributions. In [29], the Classifier ensures the robustness of a fixed DNN by adding a perturbation to the sample to counteract the Adversary. In [18,34], methods to compute mixed Nash equilibria were given. In [5], the adversarial example game was proposed as a maxmin game between the Adversary and the Classifier to answer Question Q2, and it was proven that a Nash equilibrium exists when the strategy space for the Classifier is convex.
In [8], adversarial deep learning was formulated as a Stackelberg game with the Adversary as the leader, but the existence of equilibria was not established. In [13,19], properties and algorithms for local Stackelberg equilibria were studied.
In all of the above works, adversarial deep learning was modeled as a non-cooperative game. On the other hand, cooperative games were used to explain various adversarial attacks and defences in [35].
Most of the above works formulated adversarial deep learning as a simultaneous game and assumed the strategy spaces to be certain convex sets of probability distributions in order to prove the existence of Nash equilibria. In this paper, we show that, by formulating adversarial deep learning as a Stackelberg game, Stackelberg equilibria exist for DNNs with a given structure, and the equilibrium DNN is the best defence in the sense that it has the largest adversarial accuracy among all DNNs with the same structure.
Adversarial training, which was introduced as a robust optimization problem [24], is one of the most effective practical methods to defend against adversaries. Adversarial training can be considered as a Stackelberg game with the Classifier as the leader, and this paper can be considered as a detailed theoretical study of adversarial training from a game theoretical viewpoint.
The rest of this paper is organized as follows: in Section 2, preliminary results are given. In Section 3, adversarial deep learning is formulated as a Stackelberg game and the existence of Stackelberg equilibria is proven. In Section 4, it is proven that adversarial training with the Carlini-Wagner loss gives the best adversarial accuracy. In Section 5, two trade-off results are proven. In Section 6, three types of adversarial games are compared when the data set is finite. In Section 7, conclusions and problems for further study are given.
Let C: X → R^m be a classification DNN with m labels in Y = [m] = {1, ···, m} [22]. Without loss of generality, we assume that X = I^n, where I = [0, 1]. Denote by C_l(x) ∈ R the l-th coordinate of C(x) for l ∈ [m]; these are called the logits of the DNN. For x ∈ X, the classification result of C is Ĉ(x) = argmax_{l∈Y} C_l(x). We assume that Relu is used as the activation function, so C is continuous and piecewise linear. The results are easily generalized to activation functions which are Lipschitz continuous.
To train a DNN, we first need to choose a hypothesis space H for the DNNs, say the set of CNNs or RNNs with a certain fixed structure. In this paper, we denote by N_{W,D} the set of DNNs with width ≤ W and depth ≤ D, and use it as the hypothesis space. For a given hypothesis space H, the parameter set of the DNNs in H is fixed and the parameters are denoted by Θ ∈ R^K, where K is the number of parameters. C is written as C_Θ if the parameters need to be mentioned explicitly; that is,
H = {C_Θ : Θ ∈ R^K}.  (2.1)
Let the objects to be classified satisfy a distribution D over X × Y. Given a loss function L: R^m × Y → R, the total loss is
ℓ_0(Θ) = E_{(x,y)∼D} [L(C_Θ(x), y)].  (2.2)
Training a DNN C_Θ amounts to minimizing the total loss by solving the following optimization problem:
min_{Θ∈R^K} ℓ_0(Θ) = min_{Θ∈R^K} E_{(x,y)∼D} [L(C_Θ(x), y)].
Intuitively, adversarial training first computes a most-adversarial sample
x_a = argmax_{x̂ ∈ B(x,ε)} L(C(x̂), y)
for x, where B(x, ε) = {x̂ ∈ X : ‖x̂ − x‖ ≤ ε} is the ball of attack radius ε centered at x, and then minimizes L(C(x_a), y) instead of L(C(x), y).
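Since the inner maximization over B(x, ε) has no closed form for a DNN, it is approximated in practice. Below is a minimal sketch using projected gradient ascent (PGD) in the L_∞ ball, in the spirit of [24]; the function name, the step size alpha, and the step count are illustrative choices, not part of the paper's formal setup.

```python
import torch

def most_adversarial_sample(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """Approximate x_a = argmax_{x' in B(x, eps)} L(C(x'), y) by projected gradient ascent."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # ascend on the loss, then project back into the L_inf ball B(x, eps) and the cube [0, 1]^n
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```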
Given a DNN C and an attack radius ε, we define the adversarial robustness measure of C with respect to ε as
AR_D(C, ε) = E_{(x,y)∼D} [max_{x̂ ∈ B(x,ε)} L(C(x̂), y)],  (2.6)
which is the total loss of C at the most-adversarial samples. C is more robust if AR_D(C, ε) is smaller. Then adversarial training is to find a DNN in H with the optimal adversarial robustness measure, which is denoted as
AR_D(H, ε) = min_{C∈H} AR_D(C, ε).
AR_D(C, ε) and AR_D(H, ε) have the following simple properties:

(3) in the optimal case, we have AR_D(C, ε) = 0, which means that C gives the correct label for any x̂ ∈ B(x, ε). In this case, we say that C is robust for the attack radius ε. It has been proven that there exist robust classifiers for a separated data set [45].
Let C_Θ: X → R^m be a fully connected feed-forward DNN with depth D, whose l-th hidden layer is
x_l = Relu(W_l x_{l−1} + b_l), l = 1, ···, D−1, with x_0 = x and C_Θ(x) = W_D x_{D−1} + b_D,  (2.7)
where the W_l and b_l are the weight matrices and bias vectors, and Θ is the collection of all W_l and b_l.
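For concreteness, the following is a minimal PyTorch sketch of a network of the form (2.7), with the width and depth as explicit hyper-parameters; the class name and the default sizes are ours, not the paper's.

```python
import torch.nn as nn

class FCNet(nn.Module):
    """Fully connected Relu network C_Theta : [0,1]^n -> R^m with depth D and width W."""
    def __init__(self, n, m, width=64, depth=3):
        super().__init__()
        layers, in_dim = [], n
        for _ in range(depth - 1):           # hidden layers x_l = Relu(W_l x_{l-1} + b_l)
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, m))  # final affine layer produces the m logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x.flatten(1))        # flatten inputs to vectors in [0,1]^n
```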
Lemma 2.1  For any DNN C_Θ: X → R^m with width ≤ W, depth ≤ D, and ‖Θ‖_2 ≤ E, there exists a constant B = B(n, m, D, W, E, ε) ∈ R_+ such that ‖C_Θ(x)‖ ≤ B(n, m, D, W, E, ε) for all x ∈ [−ε, 1+ε]^n.
Proof  C_Θ(x) is bounded because C_Θ(x) is continuous in x and Θ, and [−ε, 1+ε]^n and [−E, E]^K are compact. The bound B(n, m, D, W, E, ε) can be derived from (2.7). □
Lemma 2.2  For any DNN C_Θ: X → R^m with width ≤ W, depth ≤ D, and ‖Θ‖_2 ≤ E, there exist Φ(m, n, W, D, E, ε) and Λ(m, n, W, D, E, ε) ∈ R_+ such that (1) ‖C_{Θ+α}(x) − C_Θ(x)‖_2 ≤ Φ(m, n, W, D, E, ε) ‖α‖_2; that is, C_Θ(x) is Lipschitz in Θ; (2) ‖C_Θ(x+δ) − C_Θ(x)‖_2 ≤ Λ(m, n, W, D, E, ε) ‖δ‖; that is, C_Θ(x) is Lipschitz in x. Thus C is Lipschitz in Θ and x.
Proof  Without loss of generality, let C be defined as in (2.7). Then

We denote the resulting coefficient by Λ(m, n, W, D, E, ε). The lemma is proven. We can also extend this result to convolutional neural networks. □
Unless mentioned otherwise, we assume that the loss function L(z, y) is continuous in z ∈ R^m for a fixed y ∈ Y. The most often used loss functions have much better properties. Consider the following loss functions: the mean square error, the cross-entropy loss, and the margin loss introduced by Carlini-Wagner [6]:
L_mse(z, y) = ‖z − 1_y‖_2^2,   L_ce(z, y) = −log (e^{z_y} / Σ_{l∈[m]} e^{z_l}),   L_cw(z, y) = max_{l≠y} z_l − z_y,  (2.8)
where z = (z_1, ···, z_m) and 1_y ∈ R^m is the vector whose y-th entry is 1 and all other entries are 0. By Lemma 2.1, we can assume that the loss function is defined on a bounded cube:
L: [−B, B]^m × Y → R.  (2.9)
Here B = B(n, m, D, W, E, ε) is the bound given in Lemma 2.1. Since Y = [m] is discrete, we need only consider the continuity of L in z for a fixed y.
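As a concrete illustration, the following short sketch evaluates the Carlini-Wagner margin loss on a batch of logits, assuming the form max_{l≠y} z_l − z_y recalled in (2.8); the function name is ours. Positive values indicate misclassification, which is why maximizing this loss over B(x, ε) searches for adversarial samples.

```python
import torch

def cw_margin_loss(logits, y):
    """L_cw(z, y) = max_{l != y} z_l - z_y, averaged over a batch of logits."""
    z_y = logits.gather(1, y.unsqueeze(1)).squeeze(1)     # the logit of the true label
    masked = logits.clone()
    masked.scatter_(1, y.unsqueeze(1), float('-inf'))     # exclude the true label
    z_other = masked.max(dim=1).values                    # max_{l != y} z_l
    return (z_other - z_y).mean()
```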
Lemma 2.3  When the loss functions in (2.8) are used, L(z, y) is Lipschitz continuous in z over [−B, B]^m for any fixed y ∈ Y.
In this section, we formulate adversarial deep learning as a Stackelberg game and prove the existence of Stackelberg equilibria.
Consider a two-player zero-sum minmax Stackelberg game G = (S_L, S_F, ℓ), where S_L and S_F are, respectively, the strategy spaces for the leader and the follower of the game, and ℓ: S_L × S_F → R is the payoff function.
We assume that the follower's best response set
γ(s_l) = {s_f ∈ S_F : ℓ(s_l, s_f) = max_{t∈S_F} ℓ(s_l, t)}  (3.1)
is not empty for any s_l ∈ S_L, and
ℓ*(s_l) = max_{s_f∈S_F} ℓ(s_l, s_f) = ℓ(s_l, s_f*) for any s_f* ∈ γ(s_l).  (3.2)
Let
Γ = {(s_l, s_f) ∈ S_L × S_F : s_f ∈ γ(s_l)}  (3.3)
be the graph of the best-response map γ. A pair (s_l*, s_f*) ∈ Γ is called a Stackelberg equilibrium of G if ℓ(s_l*, s_f*) = min_{(s_l,s_f)∈Γ} ℓ(s_l, s_f); that is, the leader first chooses a strategy minimizing its worst-case payoff, and the follower then plays a best response.
Theorem 3.1 ([39])  If the strategy spaces are compact and the payoff function is continuous, then the Stackelberg game G has a Stackelberg equilibrium, which is also a subgame perfect Nash equilibrium of G as an extensive form game [14].
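To make the definitions (3.1)-(3.3) concrete in the simplest setting, the sketch below enumerates a Stackelberg equilibrium of a finite zero-sum game in which the leader minimizes and the follower maximizes; the payoff matrix is a hypothetical example, not taken from the paper. Here each best response happens to be unique; in general γ(s_l) may contain several elements, as discussed in Section 3.3.

```python
import numpy as np

# payoff[i, j]: loss when the leader (minimizer) plays row i and the follower (maximizer) plays column j
payoff = np.array([[3.0, 1.0],
                   [0.0, 4.0]])

gamma = payoff.argmax(axis=1)                               # follower's best response to each leader action
leader_values = payoff[np.arange(payoff.shape[0]), gamma]   # worst-case value of each leader action
i_star = int(leader_values.argmin())                        # leader minimizes its worst-case payoff
print("Stackelberg equilibrium:", (i_star, int(gamma[i_star])),
      "value:", leader_values[i_star])                      # (0, 0) with value 3.0
```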
We formulate adversarial deep learning as a two-player zero-sum minmax Stackelberg game G_s, called the adversarial deep learning game. We further show that the game gives the best defence for adversarial deep learning in certain senses.
The leader of the game is the Classifier, whose goal is to train a robust DNN C_Θ: X → R^m in the hypothesis space H in (2.1). Without loss of generality, we assume that the parameters of C are in
S_c = {Θ ∈ R^K : ‖Θ‖_2 ≤ E}  (3.4)
for some E ∈ R_+; that is, the strategy space for the Classifier is S_c.
The follower of the game is the Adversary, whose goal is to create the best adversary within a given attack radius ε ∈ R_+. The strategy space for the Adversary is
S_a = {A : X → B_ε},  (3.5)
where B_ε = {δ ∈ R^n : ‖δ‖ ≤ ε} is the ball with the origin as the center and ε as the radius. By considering the L_∞ norm, S_a becomes a metric space.
The payoff function.  Given Θ ∈ S_c and A ∈ S_a, the payoff function is the expected loss
ℓ_s(Θ, A) = E_{(x,y)∼D} [L(C_Θ(x + A(x)), y)].  (3.6)
From (2.9), the composition of L and C_Θ(x + A(x)) is well-defined, since ‖A(x)‖ ≤ ε.
For game G_s, γ and Γ, defined in (3.1) and (3.3), are
γ_s(Θ) = {A ∈ S_a : ℓ_s(Θ, A) = max_{A'∈S_a} ℓ_s(Θ, A')},   Γ_s = {(Θ, A) ∈ S_c × S_a : A ∈ γ_s(Θ)}.  (3.7)
Lemma 3.2  ℓ_s(Θ, A): S_c × S_a → R defined in (3.6) is a continuous and bounded function.
Proof  It is clear that ℓ_s(Θ, A) is continuous in Θ, since L is continuous in z and C_Θ is continuous in Θ. Denote φ(x) = L(C_Θ(x), y): X → R for fixed Θ and y. Then φ(x) is uniformly continuous by Lemmas 2.2 and 2.3. Given an A_0 ∈ S_a and ϵ > 0, since φ(x) is uniformly continuous, there exists a δ > 0 such that for any A ∈ S_a satisfying ‖A_0(x) − A(x)‖_∞ < δ for all x ∈ X, we have |φ(x + A_0(x)) − φ(x + A(x))| < ϵ, and hence
|ℓ_s(Θ, A_0) − ℓ_s(Θ, A)| ≤ E_{(x,y)∼D} |L(C_Θ(x + A_0(x)), y) − L(C_Θ(x + A(x)), y)| < ϵ.
Hence ℓ_s(Θ, A) is continuous on S_a. By Lemma 2.1, ℓ_s(Θ, A) is bounded, since ‖A(x)‖ ≤ ε. □
Lemma 3.3  A* ∈ γ_s(Θ) if and only if A*(x) ∈ argmax_{δ∈B_ε} L(C_Θ(x + δ), y) for almost all (x, y) ∼ D.
Lemma 3.4  Γ_s is a closed set in S_c × S_a.
Theorem 3.5  The adversarial deep learning game G_s has a Stackelberg equilibrium (Θ*, A*), and C_{Θ*} has the optimal robustness measure among all DNNs C_Θ with Θ ∈ S_c; that is, AR_D(C_{Θ*}, ε) = min_{Θ∈S_c} AR_D(C_Θ, ε).
Remark 3.6  By Theorem 3.5, the equilibrium DNN of G_s is exactly the DNN computed by adversarial training, so adversarial training gives the optimal robust DNN under the robustness measure (2.6).
In the general case, γ_s(Θ) defined in (3.7) may have more than one element. In this section, we will prove that if γ_s(Θ) contains a unique element, then Γ_s defined in (3.7) is compact, a fact that will be used in Section 6.
Assumption A1  For any Θ ∈ S_c, γ_s(Θ) = {A*(Θ)} defined in (3.7) has a unique element, and the loss function L is Lipschitz.
Remark 3.7  Assumption A1 is true in the generic case. By Lemma 3.3, A* ∈ γ_s(Θ) if and only if A*(x) ∈ argmax_{δ∈B_ε} L(C_Θ(x + δ), y). Then Assumption A1 is true if and only if this argmax has a unique solution for each (x, y). Suppose that the loss function is L_cw. Then φ(A) = L(C_Θ(x + A), y) is a piecewise linear function in A and its graph over B_ε is a polyhedron, as illustrated in Figure 1. Its maximum can be achieved only at a vertex of the polyhedron, or on the intersection of the (n−1)-dimensional sphere ‖δ‖ = ε (the boundary of B_ε) with the one-dimensional edges of the polyhedron. In the generic case, that is, when the parameters are sufficiently general (refer to Assumption 3.1 in [46] for more details), there exists only one maximum.
We first introduce three notations that will be used in this section. By Lemma 2.3, L(z, y) is Lipschitz in z over [−B, B]^m when the loss functions in (2.8) are used, and we let Ψ be the Lipschitz constant. By Lemma 2.2, C_Θ(x) is Lipschitz in Θ and x, and we let Φ and Λ, respectively, be the Lipschitz constants.
Lemma 3.8  For any DNN C_Θ: X → R^m and any distribution D, ℓ_s(Θ, A) defined in (3.6) is Lipschitz in Θ and A when the loss function is Lipschitz.
Proof  By Lemma 2.2, C_Θ(x) is Lipschitz in Θ and x. Then, as a composition of two Lipschitz functions, ℓ_s(Θ, A) is also Lipschitz. □

Lemma 3.10  Under Assumption A1, for any Θ ∈ S_c, A*(Θ)(x) is continuous in x.

Lemma 3.11  Under Assumption A1, ψ(Θ) = ℓ_s(Θ, A*(Θ)): S_c → R is continuous in Θ.


which means that A*(Θ) is continuous in Θ. □
Proposition 3.13  Under Assumption A1, Γ_s is a compact set in S_c × S_a.
The adversarial accuracy of a DNN C with respect to an attack radius ε is
AA_D(C, ε) = P_{(x,y)∼D} (Ĉ(x̂) = y for all x̂ ∈ B(x, ε)),
which is the most widely used robustness measurement for DNNs. Compared with the robustness measurement AR_D in (2.6), AA_D(C, ε) does not depend on the loss function. In this section, we will show that adversarial training with the Carlini-Wagner loss function gives a DNN with the optimal adversarial accuracy.
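In practice, AA_D(C, ε) is estimated on a finite test set by running an attack on every sample and counting how many predictions stay correct; since the attack only approximates the inner maximization, this yields an over-estimate of the exact adversarial accuracy. The sketch below assumes an attack routine such as the PGD routine sketched in Section 2; all names are illustrative.

```python
import torch

def adversarial_accuracy(model, loader, attack):
    """Estimate AA_D(C, eps): the fraction of samples (x, y) such that the
    attacked input x_adv in B(x, eps) is still classified as y."""
    correct = total = 0
    for x, y in loader:
        x_adv = attack(model, x, y)            # e.g. a PGD attack driven by the CW margin loss
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```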
We first introduce a new adversarial deep learning game. Denote by G_a the two-person zero-sum minmax Stackelberg game with the Classifier as the leader, the Adversary as the follower, and
ℓ_a(Θ, A) = E_{(x,y)∼D} [L_A(C_Θ(x + A(x)), y)]  (4.2)
as the payoff function, where the loss function is defined as

and L_cw is the Carlini-Wagner loss function as defined in (2.8).
For game G_a, γ and Γ, defined in (3.1) and (3.3), are
γ_a(Θ) = {A ∈ S_a : ℓ_a(Θ, A) = max_{A'∈S_a} ℓ_a(Θ, A')},   Γ_a = {(Θ, A) ∈ S_c × S_a : A ∈ γ_a(Θ)}.
Lemma 4.1  Let A_a ∈ γ_a(Θ). Then ℓ_a(Θ, A_a) = −AA_D(C_Θ, ε).
Theorem 4.4  The game G_a has a Stackelberg equilibrium (Θ*, A*), and C_{Θ*} has the largest adversarial accuracy among all DNNs C_Θ with Θ ∈ S_c; that is, AA_D(C_{Θ*}, ε) = max_{Θ∈S_c} AA_D(C_Θ, ε).
Remark 4.5  By Theorems 3.5 and 4.4, adversarial training using the loss function L_cw gives a DNN which has the largest adversarial accuracy among all DNNs in the hypothesis space H, which answers Question Q1 positively for the hypothesis space H.
In this section, we give trade-off results between the robustness and the accuracy in adversarial deep learning from a game theoretical viewpoint.
By Remarks 3.6 and 4.5, adversarial training computes the DNNs with the best robustness measurement. A natural question is whether we can increase the accuracy of the DNN while keeping the maximal adversarial accuracy. That is, we consider the bi-level optimization problem
min_{Θ} ℓ_0(Θ)  subject to  Θ ∈ argmin_{Θ'∈S_c} max_{A∈S_a} ℓ_s(Θ', A),  (5.1)
where ℓ_0 and ℓ_s are defined in (2.2) and (3.6), respectively.
From Remark 3.7, if using the loss function L_cw, γ_s(Θ) contains a unique element, and the feasible set of (5.1) contains a unique element in the generic case. In this case, we cannot increase the accuracy of the DNN while keeping the maximal robustness measure AR_D.
It is more interesting to consider the game G_a defined in Section 4, which uses the loss function L_A defined in (4.3).
We first introduce an assumption. We train C_Θ with a stochastic gradient descent method starting from a randomly chosen initial point, and thus the optimization iteration will most probably terminate at a random point in a neighborhood of a minimal point or a saddle point of the loss function. Therefore, the following assumption is valid for almost all trained DNNs [46]:
Assumption A2  The parameters of a trained C_Θ are random values.
Proposition 5.1  Under Assumption A2, when the payoff function is ℓ_a(Θ, A) defined in (4.2), the set of parameters Θ ∈ S_c such that C_Θ has the largest adversarial accuracy contains a nonempty open subset of R^K.
Remark 5.2  By Proposition 5.1, the feasible set of (5.1) contains a K-dimensional open set when the payoff function is ℓ_a(Θ, A) as defined in (4.2). As a consequence, there exists room to increase the accuracy under maximal adversarial accuracy.
Example 5.3  We use numerical experiments to show that it is possible to further increase the accuracy under maximal adversarial accuracy. Two small CNNs with 3 and 4 hidden layers, respectively, are used, with structures (8∗3∗3), (16∗3∗3), (32∗3∗3) and (32∗3∗3), (64∗3∗3), (128∗3∗3), (128∗3∗3). We use the loss function L_cw to achieve maximal adversarial accuracy, and the results are shown in the columns 1-0 and 2-0 in Table 1. We then retrain the CNNs using the normal loss function in (2.2) to increase the accuracy. In order to keep the maximal adversarial accuracy fixed, the change of the parameters is limited to i% for i = 1, 2, 3, and the results are shown in columns 1-i and 2-i, respectively. We can see that the adversarial accuracies are barely changed (by up to 0.06% and 0.02% for networks 1 and 2), but that the accuracies are evidently increased (by up to 1.11% and 2.252% for networks 1 and 2).

Table 1  Increasing the accuracy (AC) under the condition of maximal adversarial accuracy (AA) for CIFAR-10. The attack radius is 8/255 and 50000 samples were used.
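Below is a minimal sketch of the retraining step in Example 5.3: the normal loss (2.2) is minimized while every parameter is projected back so that its relative change from the adversarially trained value stays within the budget i%. The projection rule and optimizer settings are our reading of the description, not the paper's exact procedure; ref_params would be, e.g., the parameters saved after adversarial training.

```python
import torch

def retrain_within_budget(model, loader, loss_fn, ref_params, budget=0.01, epochs=1, lr=1e-3):
    """Minimize the clean loss while keeping each parameter within a relative
    `budget` (e.g. 1%) of its adversarially trained value in ref_params."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            with torch.no_grad():  # project back into the box [p0 - b|p0|, p0 + b|p0|]
                for p, p0 in zip(model.parameters(), ref_params):
                    lo, hi = p0 - budget * p0.abs(), p0 + budget * p0.abs()
                    p.copy_(torch.min(torch.max(p, lo), hi))
    return model
```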
The bi-level optimization problem (5.1) is, in general, difficult to solve, especially when keeping the maximal adversarial accuracy, as mentioned in the proof of Proposition 5.1. A natural way to train a robust and more accurate DNN is to do adversarial training with the following objective function:
ℓ_t(Θ, A) = ℓ_s(Θ, A) + λ ℓ_0(Θ).  (5.2)
Here λ > 0 is a small hyper-parameter, and ℓ_0 and ℓ_s are defined as in (2.2) and (3.6), respectively. Problem (5.2) is also often used as an approximate way to solve (5.1). We will prove a trade-off result in this setting.
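A sketch of one optimization step for the combined objective: the adversarial loss plus λ times the normal loss, with the Adversary's best response approximated by an attack routine (for example the PGD sketch from Section 2). The function and parameter names are illustrative.

```python
import torch

def trade_off_step(model, x, y, loss_fn, attack, opt, lam=0.1):
    """One step on ell_t(Theta, A) = ell_s(Theta, A) + lam * ell_0(Theta)."""
    x_adv = attack(model, x, y)              # approximate best response A*(Theta) of the Adversary
    adv_loss = loss_fn(model(x_adv), y)      # ell_s term: loss at the adversarial samples
    clean_loss = loss_fn(model(x), y)        # ell_0 term: loss at the clean samples
    loss = adv_loss + lam * clean_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)
```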
Similarly to Theorem 3.5, adversarial training with the loss function (5.2) can be considered as a Stackelberg game G_t with ℓ_t as the payoff function. Then we have the following trade-off result:
Theorem 5.4  Let (Θ_t, A_t) and (Θ_s, A_s) be Stackelberg equilibria of G_t and G_s, respectively. Then AR_D(C_{Θ_t}, ε) ≥ AR_D(C_{Θ_s}, ε) and ℓ_0(Θ_t) ≤ ℓ_0(Θ_s); that is, the equilibrium DNN of G_t has robustness no higher and accuracy no lower than those of the DNN obtained by adversarial training.
In this section, we compare three types of games for adversarial deep learning. We assume that the data T = {(x_i, y_i)}_{i=1}^N are a finite number of samples chosen i.i.d. from the distribution D.
In this case, the strategy space for the Classifier is still S_c in (3.4). The strategy space for the Adversary becomes
S_a = {A = (δ_1, ···, δ_N) : δ_i ∈ B_ε, i ∈ [N]}, and the payoff function becomes the empirical loss ℓ_T(Θ, A) = (1/N) Σ_{i=1}^N L(C_Θ(x_i + δ_i), y_i).
We consider three games for adversarial deep learning.
The adversarial training game G_1, which is the zero-sum minmax Stackelberg game with the Classifier as the leader, the Adversary as the follower, and ℓ_T(Θ, A) as the payoff function; that is, we aim to solve the minmax problem
min_{Θ∈S_c} max_{A∈S_a} ℓ_T(Θ, A),
which is clearly equivalent to adversarial training. By Theorem 3.1, game G_1 has a Stackelberg equilibrium, since S_c and S_a are compact and ℓ_T(Θ, A) is continuous. Similarly to Section 4, it can be shown that this game gives a DNN with the largest adversarial accuracy for the data set T when the loss function is L_cw.
The universal adversary game G_2, which is the zero-sum maxmin Stackelberg game with the Adversary as the leader and the Classifier as the follower; that is, we aim to solve the maxmin problem
max_{A∈S_a} min_{Θ∈S_c} ℓ_T(Θ, A).
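In G_2 the Adversary commits to a fixed perturbation for every training sample, and the Classifier then best-responds by ordinary training on the perturbed data. The sketch below shows only this inner (follower) step; how the leader searches over the perturbations is left open. The names, the model factory `make_model`, and the training schedule are illustrative assumptions.

```python
import torch

def classifier_best_response(make_model, xs, ys, deltas, loss_fn, epochs=50, lr=1e-3):
    """Inner problem of G_2: given committed perturbations deltas (||delta_i|| <= eps),
    approximate argmin_Theta ell_T(Theta, A) by training on the perturbed data."""
    model = make_model()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x_pert = (xs + deltas).clamp(0.0, 1.0)       # the Adversary's strategy is fixed in advance
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x_pert), ys)
        loss.backward()
        opt.step()
    return model, float(loss)                    # approximate value of min_Theta ell_T(Theta, A)
```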
The simultaneous adversary game G_3.  We can also formulate adversarial deep learning as a simultaneous game G_3. In this game, the two players and their strategy spaces are the same as those of game G_1. The difference lies in how the game is played. In game G_3, the Classifier picks its action without knowing the action of the Adversary, and the Adversary chooses the attacking adversarial samples without knowing the action of the Classifier. However, both players know the strategy spaces and the payoff function. A point (Θ*, A*) ∈ S_c × S_a is called a pure strategy Nash equilibrium of game G_3 if
ℓ_T(Θ*, A) ≤ ℓ_T(Θ*, A*) ≤ ℓ_T(Θ, A*) for all Θ ∈ S_c and A ∈ S_a.
In general, pure strategy Nash equilibria do not necessarily exist, and mixed strategy Nash equilibria are usually considered. Mixed strategies for the Classifier and the Adversary are two probability distributions
P_c over S_c and P_a over S_a, for which the expected payoff is ℓ_T(P_c, P_a) = E_{Θ∼P_c, A∼P_a} [ℓ_T(Θ, A)]. A pair (P_c*, P_a*) is a mixed strategy Nash equilibrium of G_3 if ℓ_T(P_c*, P_a) ≤ ℓ_T(P_c*, P_a*) ≤ ℓ_T(P_c, P_a*) for all distributions P_c and P_a.
Remark 6.1  By Proposition 3.13, we can show that, under Assumption A1, game G_3 has a mixed strategy Nash equilibrium when the data set satisfies a general distribution D.
Proposition 6.2  Let (Θ_1, A_1), (Θ_2, A_2), and (P_c*, P_a*) be equilibria of G_1, G_2, and G_3, respectively. Then ℓ_T(Θ_2, A_2) ≤ ℓ_T(P_c*, P_a*) ≤ ℓ_T(Θ_1, A_1); that is, the payoff functions of G_1, G_3, and G_2 at their equilibria decrease successively.
The following example shows that the inequalities in Proposition 6.2 can be strict:
Example 6.3  Consider a two-player zero-sum minmax game with the payoff matrix

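The payoff matrix of Example 6.3 is not reproduced above, so the sketch below checks the three equilibrium values on a hypothetical matching-pennies-style matrix, for which the inequalities of Proposition 6.2 are strict; the matrix is our illustration, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical payoff matrix: rows = Classifier (minimizer), columns = Adversary (maximizer).
M = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

v_G1 = M.max(axis=1).min()   # pure minmax: min_Theta max_A, game G_1
v_G2 = M.min(axis=0).max()   # pure maxmin: max_A min_Theta, game G_2

# Mixed value (game G_3): minimize v subject to p^T M <= v for every column, sum(p) = 1, p >= 0.
rows, cols = M.shape
c = np.append(np.zeros(rows), 1.0)                      # objective: minimize v
A_ub = np.hstack([M.T, -np.ones((cols, 1))])            # (M^T p) - v <= 0
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(cols),
              A_eq=[np.append(np.ones(rows), 0.0)], b_eq=[1.0],
              bounds=[(0, None)] * rows + [(None, None)])
v_G3 = res.x[-1]

print(v_G2, "<=", v_G3, "<=", v_G1)                     # -1.0 <= 0.0 <= 1.0: all inequalities strict
```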