,,,,,,,
1.School of Electronic and Information Engineering,Shenyang Aerospace University,Shenyang 110136,P.R.China;2.China Academy of Civil Aviation Science and Technology,Beijing 100028,P.R.China;3.College of Robotics,Beijing Union University,Beijing 100101,P.R.China;4.China Transportation Telecommunications and Information Center,Beijing 100011,P.R.China(Received 28 May 2020;revised 30 June 2020;accepted 25 July 2020)
Abstract: Due to the influence of terrain structure,meteorological conditions and various factors,there are anomalous data in automatic dependent surveillance-broadcast(ADS-B)message. The ADS-B equipment can be used for positioning of general aviation aircraft. Aim to acquire the accurate position information of aircraft and detect anomaly data,the ADS-B anomaly data detection model based on deep learning and difference of Gaussian(DoG)approach is proposed. First,according to the characteristic of ADS-B data,the ADS-B position data are transformed into the coordinate system. And the origin of the coordinate system is set up as the take-off point. Then,based on the kinematic principle,the ADS-B anomaly data can be removed. Moreover,the details of the ADS-B position data can be got by the DoG approach. Finally,the long short-term memory(LSTM)neural network is used to optimize the recurrent neural network(RNN)with severe gradient reduction for processing ADS-B data. The position data of ADS-B are reconstructed by the sequence to sequence(seq2seq)model which is composed of LSTM neural network,and the reconstruction error is used to detect the anomalous data. Based on the real flight data of general aviation aircraft,the simulation results show that the anomaly data can be detected effectively by the proposed method of reconstructing ADS-B data with the seq2seq model,and its running time is reduced. Compared with the RNN,the accuracy of anomaly detection is increased by 2.7%. The performance of the proposed model is better than that of the traditional anomaly detection models.
Key words:general aviation aircraft; automatic dependent surveillance-broadcast (ADS-B); anomaly data detection;deep learning;difference of Gaussian(DoG);long short-term memory(LSTM)
The automatic dependent surveillance-broadcast(ADS-B)data which reflect the flight state and operation state of the aircraft are a series of flight parameters. The state of the aircraft and the operational behavior of pilots were recorded by the parameters of the aircraft[1]. For the next generation air traffic surveillance,ADS-B is becoming the primary method to obtain more accurate data with wide coverage,which establishes the foundation for automatic and intelligent air traffic management system[2].Many countries in the Asian-Pacific region have started to test and evaluate whether ADS-B can provide radar-like services,and American and Australian have planned and deployed ADS-B stations in their countries[3]. Track refers to the flight path of an aircraft. There are inevitable errors or interference in the data transmission process,resulting in inaccurate or incorrect information. For guaranteeing the airlines security,it is the primary requirement to ensure flight safety[4]. In order to ensure the sustainable development and flight safety of civil aviation,it is necessary to predict and correct the anomaly data by means of methods.
Anomaly detection refers to finding data patterns in the data that contradict with the expected behaviors. In time series,there is often a context dependent relationship(time or space)among adjacent data points. Point anomalies due to the breakdown of dependencies can also be called context exceptions[5]. According to the anomalies,a novel method of identifying instantaneous anomalies for retrospective safety analysis using energy-based metrics is proposed[6]. The anomalies were detected by the constraints which were based on constraint from the operation database. But this model is sensitive to the operation data instead of position data. A novel partition-and-detect framework is proposed for the trajectory outlier detection method,which partitions a trajectory into a set of line segments. The outlying line segments were detected for trajectory outliers based on constraints from trajectory database[7].
In recent years,a number of anomaly detection methods based on deep learning have been proposed in various fields. Deep learning methods can automatically learn task-related features through end-toend training,and obtain high-level abstract representation of raw data through multilayer nonlinear transformation.
In order to detect the anomalies,the intelligence methods were used for detection,like neural networks[8]and sequence to sequence(seq2seq)[9].A deep learning model based on seq2seq was proposed[10]. In this model,the seq2seq model was used to reconstruct the ADS-B time series based on the data characteristics,and the reconstruction error can detect the anomalous ADS-B messages. Extending the feature space of time series enables the model to better capture the improver’s effect of anomaly detection[11]. The concept of time series anomaly was proposed by Keogh et al.[12],which indicates that the anomaly of time series is independent of the size of the sliding window,and the extension of the characteristic of low-dimensional data to high-dimensional data is not necessarily valid. So,in order to get the detail information of samples,the difference of Gaussian(DoG)approach was proposed[13]. And deep learning is used to train the classifier to distinguish abnormal trajectories. But they all neglected to study the direction of the trajectory point. Other traditional types of detecting anomalies were based on the detecting abnormal compact position report(CPR)protocol,but they were hard to be implemented[14-16].
The research showed that the method based on deep learning had completely surpassed the traditional method[17-18]. Above studies reflect that anomaly detection has attracted growing attention from researchers.
The contribution of this paper is to propose an approach for anomalies based on deep learning and DoG approach. And this paper is organized as follows. A new method is presented for converting coordinate system with less computation and constrains for outliers in Section 1. In Section 2,DoG is used for enhancing the detail of ADS-B data.Seq2seq model and ADS-B anomaly data detection are proposed in Section 3. In the other sections of the paper,the problem is formulated,the method is described and the experiment validations are presented.
In order to get the tracks of the common aero vehicle,the ADS-B designed for common aero vehicle was used. First of all,the positions of the aircraft were positioned by BeiDou Navigation Satellite System(BDS)or Global Positioning System(GPS). Then,the tracks would be consisted by the points. Finally,the data of flights would be displayed on terminal and saved in database(Fig.1). In order to save the accurate position data with less anomalies,the data were processed by this model on the terminal.
Because of the terrain structure,meteorological conditions and various factors,there are anomalies in ADS-B message. Tracks selected for processing should not preferably include tracks with the following exceptions.
(1)Incomplete points on track. Because of terrain and other reasons,message might be lost during the transmission processes. The points or a section of track will be incomplete. Some incomplete tracks with a spot of missing points can be completed with the interpolation method,but other incomplete tracks with many missing points must be deleted,which contain massive loss of points. An incomplete track is a whole track with too many missing points. The removal of a whole track does not affect the continuity of other tracks.
(2)Repeated points on track. Because of the error of clock or multi-path,the track data were received twice or more at the same time. In addition,the same points of track might be received at the different time due to the lack of data inspection. So,the less the repeated points,the better the tracks.And the tracks with less repeated points were chosen.
(3) Interference points on track. Different points on incomplete tracks of different flights might be received at the same time due to the similarity.Therefore,it is necessary for us to separate the independent tracks.

Fig.1 Transmission of ADS-B data
The received ADS-B position data were written in World Geodetic System-84(WGS-84). Because of the large computation in WGS-84 and Cartesian Coordinate System(CCS),the position can be converted into a new coordinate system where the origin was defined by springboard as follows

wheresis the plane distance ofAandB,Athe aircraft position andBthe take-off point position;ameans the difference-value of the latitude betweenAandB;bis the difference-value of the longitude betweenAandB;lat1is the latitude ofAandlat2the latitude ofB. The radius of the earth is 6 387.137 km. When the distance is calculated on theX-axis,b=0. The positive direction onX-axis is east. When the distance is calculated on theY-axis,a=0. The positive direction onY-axis is south.Whena<0,the value onX-axis is negative. Whenb>0,the value onX-axis is negative in Northern Hemisphere. The value onZ-axis is difference-value of altitude betweenAandB. In this model,the value ofBis minuend. And spatial distance ofAandBis

where Δhis the difference-value in height betweenAandB.
The constraints were calculated from the datasheet of the general aviation aircraft named RX1E,shown in Table 1.

Table 1 Datasheet of RX1E

From Table 1,the constraintsC(x) were gotten as follows wherevi(i=0,1,2,…,n) are the velocities of the aircraft,Hi(i=0,1,2,…,n) the altitude of the aircraft,andti(i=0,1,2,…,n)the time when the data were received.
The acceleration produced by gravity was decomposed to three directions includingX-axis,Yaxis,Z-axis,shown as

whereGis a matrix,θrepresents the angle between the direction of the flight speed and the positive direction of theX-axis,andφthe angle between the direction of the flight speed and the positive direction ofY-axis. According to the Newton’s second law and the instantaneous,accelerations of gravity were calculated as

whereAis a matrix,and the maximum acceleration(ak)of the aircraft has been obtained according to the data sheet of the aircraft. Thus,the theoretical maximum values of accelerations including(aXX,aYY,aZZ)were calculated as

whereA′ is a matrix. According to the velocity of the aircraft at the moment ofi+1,the actual accelerations ofaiX,aiY,aiZin three directions are less than the theoretical maximum,that is

whereA″ is a matrix,aiX,aiY,aiZare the accelerations at time ofi+1(i=0,1,2,…,n),and Δtis the time difference between time ofi+1 andi. The actual accelerations in theX-axis,Y-axis,andZ-axis were calculated according to the flight data. And the constraints were calculated at time ofi+1 as

wherei=0,1,2,…,n. IfaiX,aiY,aiZconflict with the constraints,the velocity is the abnormal data at time ofi+1.
After removing the outlier in the flight data by using the constraint condition of acceleration,there may also be a continuous sequence of anomaly in the flight data that cannot be detected. In order to exclude outliers from these data,the aircraft data sheet and kinematics were used for detecting anomaly by calculating the theoretical position of the aircraft

whereXi+1,Yi+1,Zi+1are the theoretical positions of the aircraft on theX-axis,Y-axis,andZ-axis,such as

wherei=1,2,3,…,n,xi+1is the actual position of the aircraft at the time ofi+1. If the above constraints are not met,the positions of the aircraft on theX-axis,Y-axis andZ-axis are abnormal data.
There are two dimensions in ADS-B data:time series and position series. When ADS-B data are arranged in two dimensions on a plane,it is different from the data arranged in one dimension on a line. In order to provide more information and more variation,the section will be used for exacting the position series of the ADS-B position data.
The setT={T1,T2,…,Tn} was composed by consecutive time series which include the points of tracks from 0 ton,andnis the length of the time series.Tiis a vector where the length of sequence isn,shown as

There is only one feature for each dimension.The contents for each vector,including position,course angle,height,etc.,were broadcasted by ADS-B,shown as

wherex1,y1,z1,v1,a1are the positions onX-axis,Y-axis,Z-axis,velocity and angle of aircraft on time oft1.
In order to extract the position sequence fromTi,multiple cross sections are extracted from the position sequence,and sample observations are selected on these cross sections to form the sample data.And each section starts at the same time.

whereX,Y,Z,V,Aare the position sections. In order to make the algorithm invariant to data scaling and reduce the influence of data changes on series,the method of establishing data Gaussian space to form data pyramid structure will be used.
Data pyramid is a series of data from the same original section which is arranged in pyramid shape and gradually reduces resolution. It is the basis of multi-scale calculation based on the gradient data of the original section,the up-sampled data and the down-sampled data adjacent to the gradient data,shown in Fig.2. The down-sampled data are 1/4 of the original data,and the up-sampled data are 4 times of the original data.
The data details play an irreplaceable role in recognition of anomaly data. Anomalies are usually reflected in the details of the data,which are the result of the accumulation of local details. The local details of the section will be calculated by DoG. As an enhancement algorithm,DOG can be used to increase the visibility of edges and other details.
The level of detail for position sections will be reduced according to the function of Gaussian smoothing. Different scales of Gaussian smoothing have different smoothing effects on position sections. Each layer of data pyramid is smoothed data pyramid,forming three-layer Gaussian smoothing position sections.

whereGi=exp [(x2+y2)/2σi],?denotes the convolution operation,Xi(n,m) is the section ofX,nthe time,andmthe value of dimension.
As shown in Fig.3,L0is taken as input for Gaussian smoothing. The data are formed four layers with the three-layer smooth data obtained after the data processing of this layer,and the DoG data are obtained by subtracting from the adjacent filtered data.


Fig.2 Block diagram of pyramid data

Fig.3 Block diagram of DoG
The pyramid data of each layer produce threelayer DoG data. The data obtained by Gaussian difference are a local detail description of the data at different scales. The detail description of the data can improve the expression ability of the position data section to the details. Pyramid data on each layer are superimposed with three-layer DoG data of this layer section. To prevent overflow,the average value of data superposition is taken to form the detail superposition data,as shown in Fig.4. The value after superposition isL,shown as

After enhancing the details of the position section data,the details of the information are more abundant,and the extracted local features are more representative.

Fig.4 Block diagram of superposition
The seq2seq model,which was used for conversion of sequence,is a model with the frame of Encoder-Decoder and often used in automatic abstracting,machine translation,deep learning and so forth. Encoder,decoder and state vector were included by seq2seq. State vector is a connection between encode and decoder. Recurrent neural network(RNN)is a unit used by encoder and decoder[19].
The sequence of input will be learned by encoder and then be encoded into a fixed size state vector,which is then transmitted to the decoder part. Finally,the decoder learns the state vector to carry out the relevant output.
Because of the advantage of short-term memory for RNN,it is often used as the first choice neural network for the training of time sequence data.With the length of training data for RNN increasing,the gradients for training data will fade away.The gradients will disappear and the process of training will be broken while the length of training data is long enough. So,the limit time interval is essential for training data when RNN is used. The length of the position data which is broadcasted by ADS-B in flights is so long that the feature can not be obtained by RNN. In order to resolve this problem,the improved long short-term memory (LSTM) units were used in seq2seq,which have excellent effects on many problems and now have been widely used,shown in Fig.5. The LSTM model can replace the RNN cells in the hidden layer with LSTM cells,so as to provide the hidden layer with long-term memory[10]. The long-term dependency could be avoided effectively by the particular design of LSTM. It is easy for LSTM to acquire the feature of the time sequences with long interval.

Fig.5 Chart for LSTM struction
In Fig.5,each line represents the transfer of the output of one node to the input of another node of vectors. Circles represent element-level operations of vectors,such as addition or multiplication.The boxes represent the layers of the neural network. A line merge represents a join of vectors,and a line bifurcation represents a copy of a vector.xiis the input,andhithe output.
The forgetting gateft=σ(Ws·[ht-1,xt]+b),whereWsis the weight matrix of the forgetting gate,σthe function of sigmoid,bthe bias term of forgetting gate,htthe output at time oft,xtthe input at time oft,anditandare all apart of input of gate,shown as

where tanh is the hyperbolic function,bcthe weight matrix ofandbithe weight matrix ofit.The unit statectat the current time is the product of the input of the forgetting gate and the state at the previousct-1time adds the product of the two parts of the input gate(itand),shown as

Finally,the sigmoid function is used to calculate information that needs to be outputot,and then the output is obtained by multiplying the value which was calculated by hyperbolic function for current cell state.

So,there are three gates:forgetting gate,input gate and output gate. In this model,LSTM was used. Core model consists of three parts for LSTM units of encoder and decoder,as well as the state vector(eocoder state),respectively. In the training process,the training data are input to the encoder and encoded to a fixed state vector. Then,the vector is transferred to the decoder,and its sequence is reconstructed by the decoder.
The reconstruction error between the realistic time sequence and the predicted time sequence will be detected by using Cosine Similarity,that is

wherenis the dimension,andthe reconstruction sequence. So the abnormal-probabilitypacan be de calculated as

There are five main steps in this improved model(Fig.6).
Step 1Data(latitude,longitude,height)of ADS-B in WGS-84 were converted to the data in a new coordinate,the origin of which is defined by the springboard.
Step 2The constraint was created by the datasheet of the flight and the kinematical equation.And the data which meet the constraints will be selected by the comparison between constraint and data.The constraints and the outliers will be detected.
Step 3Extract position sections of the ADSB data. And DoG will be used for detail enhancement of section.
Step 4Parameters in seq2seq model were initialized. The original time series length is 12,the loss function used in the training is mean square error(MSE),the number of LMST unit is 256 with the batch-size of 128,and the ratio of dropout is 25%. The seq2seq model will be used for predicting the data. And the error between the realistic time sequence and the predicted time sequence will be detected by using this model.
Step 5The abnormal data will be detected by Cosine similarity. After training,the threshold of the training sets is calculated. For actual flight data,the value of the anomalous score is defined with a threshold of 97.5%.

Fig.6 Main steps in this improved model
The data come from the ADS-B which is broadcasted by common aero vehicle named RX1E.Some anomalies like outliers were produced while the data broadcasting. Outliers were detected by constraints.
Some anomalies were deliberately injected such as network delay,route replacement,random noise and velocity drift,and almost all of them will be detected by seq2seq. Table 2 shows the strategies for the following injecting anomalies.

Table 2 Methods for injecting anomalies
(1)Network delay
In the selected ADS-B sequence,for every 10 consecutive messages,only the first message is retained and the last nine messages are deleted. This situation simulates the reality that the receiver can only receive valid message if the time interval is large enough.
(2)Route replacement
Given a certain route massage,another correct route massage is injected to replace the selected ADS-B sequence.
(3)Random noise
Part of the original ADS-B data are mulitiplied by a number between 0 and 2 randomly. Seven trajectory points of random noise will be injected per 180 track points.
(4)Velocity drift
The speed characteristic information contained in ADS-B message is gradually changed as a multiple of five sections. Specifically,in the selected ADS-B sequence, the speed characteristic contained in the first vector is increased by five sections,and the second by 10 sections.
In the course of the experiment,after the operations of normalization,the complete data of 10 flights from take-off to landing for seven consecutive days were selected as training samples,and the data of each flight were between 1 500 and 2 000. The data of flight for 20 min and 180 points in track were used for test set. And the test sets were taken from the flight records of Ruixiang RX1E aircraft in Faku General Aviation Airport of Shenyang at 10 o’clock on November 13,2019.Its track is shown in Fig.7.

Fig.7 Flight track of Ruixiang RX1E aircraft
After the operations for outliers,the model of seq2seq was used for anomalies in altitude. Fig.8 is the detection result used for the anomalies in the data of altitude. The anomaly type is the random noise. At the points 24,59,65,78,90,101 and 114,random noise is injected. Due to the use of DoG approach,the details of the data are obvious.The threshold value of abnormal-probability for anomalies is 2.5%,and the time dependence of the data features is more easily to detect. At point 24,an anomaly just above the threshold was detected.Thus,this model is sensitive to anomalies. Another six anomalous points were detected by the seq2seq model.

Fig.8 Anomaly processed by seq2seq for altitude
Similarly,Fig.9 is the detection result used for the anomalies in the data onX-axis. The network delay was injected on data ofX-axis. At the points 11—16,42,58,62,94,104,11 and 128—145,anomalies of network delay are injected.The anomalies of the network delay are injected,resulting in the increase of the abnormal-probability of anomalies,thus the anomalies of the abnormal network delay are detected. As shown in Fig.9,the threshold value of abnormal-probability for anomaly is 3%.All injected anomalies were detected by the seq2seq model.

Fig.9 Anomaly processed by seq2seq for X-axis
After the operation for outliers,the model of seq2seq is used for abnormal ofY-axis. The detection result is shown in Fig.10. At the points 9,11,14,31,59,119,165 and 175,another route replaces the trajectory of the sample. As shown in Fig.10,the threshold value of abnormal scores is 3%,and the points above the threshold value are anomalies. The anomalies are detected by the seq2seq model.
Fig.11 is the detection result used for the outliers and anomalies in the data of track. As shown in Fig.11,the dotted line is the outliers detected by the constraint conditions. The outliers are also identified by the seq2seq model. The injected type is velocity drift. According to the threshold value of abnormal-probability,at the points 51,70 and 159,the anomalies were detected by seq2seq model instead of constraints. Thus,the seq2seq model is more sensitive than constraints.

Fig.10 Anomaly processed by seq2seq for Y-axis

Fig.11 Anomaly processed by seq2seq for track
4.3.1 Analysis of running time
Running time is the time of converting ADS-B data to the corresponding coordinate system. Table 3 shows the running time in different coordinate systems. Since WGS-84 is the coordinate system with the center of the earth as the origin,the data value is very large,which makes the running time longest. The transition from the object coordinate system to the inertial coordinate system requires rotation,and the transition from the inertial coordinate system to the world coordinate system requires translation,thus resulting in a longer running time than the coordinate system in this model. The coordinate system in this model reduces the redundant bits in ADS-B data,and the calculation formula is simple,so the running time is decreased 0.14 s. Reducing the complexity of computation can reduce the running time and make the performance of realtime better,which is beneficial to the real-time processing.

Table 3 Running time in different coordinate systems s
4.3.2 Analysis of average precision
Precision indicates the number of correctly classified samples to the total number of samples for a given data set,shown as

where TP is the true positive,FN the false negative,FP the false positive,and TN the true negative. In this paper,precision refers to the proportion of the correctly detected abnormal ADS-B samples to the actual detected abnormal samples. Table 4 shows the average precision in different neural networks.

Table 4 Average precision in different neural networks %
As shown in Table 4,due to the gradient disappears severely,accuracy of RNN as the unit of seq2seq model is low,especially for the abnormal pattern of network delay when other conditions remain unchanged. In order to remember the longterm state,LSTM adds one input and one output to the RNN,which is a cell state,thus LSTM is more sensitive to ADS-B data which is a long time sequence. In Table 4,the average precision of LSTM is better than that of RNN,and for network delay,the average precision of LSTM is 3.1% higher than RNN. For the route replacement,random noise and velocity drift,the performance of LSTM is better than that of RNN due to disadvantages of gradient disappearing. Compared with RNN,the average precision has increased 2.7% for LSTM.
Aim to analyze the average precision in different models,the Bayes,SVM and 4D-flight track prediction are used for comparison. After converting the data to the coordinate system of this model,the same training and test sets are selected to train and test the SVM model. These models use the known track characteristics to predict the track points at future moments. Table 5 shows the average precision in different models.

Table 5 Average precision in different models %
(1)SVM
The kernel function of SVM is radial kernel function(RKF). The parameters of the model were optimized by using the grid search method. The kernel width parameterσ=0.02 and the regularization parameterγ=100 are determined.
(2)Bayes
The 180 track points constitute a set of observation data. Distribution parameters are calculated from training sets. The posterior probability can be calculated by Bayesian formula using prior probability and likelihood function.
(3)4D-flight track prediction
The maximum likelihood law and Newton-Raphson iterative algorithm are used to identify the corrected airspeed in the model. Finally,the identification results and equiangular track prediction model are used to calculate the aircraft’s track. The parameter identification is calculated by using maximum likelihood estimation. After 40 iterations of the calculation,the parameter identification is determined to be 221.6.
Bayes and SVM are prediction models based on classification and have strict requirements on training set samples. Therefore,when the actual flight samples are used as training sets,the accuracy is lower than 4D-flight track prediction. In the process,the model can effectively capture the characteristics of a track,so its accuracy is higher than that of Bayes and SVM models. The seq2seq model in which neural network is composed of LSTM units has a good abnormal detection effect for ADS-B time series. And it takes into account the target change information in the adjacent time of the current point,so the detection performance is high and the accuracy can be improved by locking the target into a sequence. When the data are processed by network delay,route replacement,random noise,and velocity drift,there is a better performance in this model than in Bayes,SVM,and 4D-flight track prediction models.
The ADS-B anomaly data detection model based on deep learning and DoG approach is proposed. In order to reduce the complexity of this model,the kinematics theory is used for removing the outliers. The reconstruction error which is produced by seq2seq is used to detect the anomaly data.Through the simulation and verification,compared with SVM,Bayes,4D-track models,the average precision is improved,and the running time is decreased.
Transactions of Nanjing University of Aeronautics and Astronautics2020年4期