College of Computer Science and Technology/College of Artificial Intelligence,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,P.R.China
(Received 15 May 2020;revised 15 June 2020;accepted 20 July 2020)
Abstract: Focusing on controlling the press-assembly quality of high-precision servo mechanism, an intelligent early warning method based on outlier data detection and linear regression is proposed. Linear regression is used to deal with the relationship between assembly quality and press-assembly process; then the mathematical model of displacement-force in the press-assembly process is established and a qualified press-assembly force range is defined for assembly quality control. To preprocess the raw dataset of displacement-force in the press-assembly process, an improved local outlier factor based on area density and P weight (LAOPW) is designed to eliminate the outliers which would make the mathematical model inaccurate. A weighted distance based on information entropy is used to measure distance, and the reachable distance is replaced with the P weight. Experiments show that the detection efficiency of the algorithm is improved by 5.6 ms compared with the traditional local outlier factor (LOF) algorithm, and the detection accuracy is improved by about 2% compared with the local outlier factor based on area density (LAOF) algorithm. The application of the LAOPW algorithm and the linear regression model shows that the method can effectively carry out intelligent early warning of the press-assembly quality of high-precision servo mechanism.
Key words: quality early warning; outlier data detection; linear regression; local outlier factor based on area density and P weight (LAOPW); information entropy; P weight
High-precision servo mechanism is widely used in intelligent machinery, which requires high quality and reliability. However, the structure of high-precision servo mechanism is very complex and the assembly process is extremely complicated, which results in difficulties in assembly quality control. In this paper, the technologies of outlier data mining and linear regression are applied to quality control for the press-assembly process of high-precision servo mechanism, in which the raw dataset of displacement-force can be collected. An outlier data detection method is designed to preprocess the raw data of displacement-force in the process, and linear regression is used to figure out the relationship between assembly quality and press-assembly process. A displacement-force mathematical model is established and a qualified press-assembly force range is defined to monitor the assembly quality. Then an intelligent early warning method of press-assembly quality is proposed.
Quality control is the process of organizing related activities in accordance with quality requirements. Many scholars have conducted relevant studies on quality control problems in the manufacturing process. In the field of quality control for intelligent manufacturing, the technology of dynamic monitoring and quality early warning for production machining workshops based on Android mobile terminals has been studied by Yin et al.[1]. In addition, a multi-point real-time intelligent neural network prediction model and algorithm were researched to achieve dynamic monitoring, which finally realized early warning[1]. A BP neural network was used to establish an early warning model for abnormal events in an aircraft assembly workshop, and classified early warning was realized in Ref.[2]. Wu et al.[3] proposed a quality control method for the assembly process of complex products based on digital twin. The Markov method was applied to predict the quality data and provide early warning based on the predicted value[3].
The studies above focus on existing quality control methods, and the outlier data of quality control are not considered, which results in inaccurate quality control. Meanwhile, the local outlier factor (LOF) is an effective outlier detection algorithm, which can be used to judge the outlier degree of an object according to the LOF measurement. Therefore, the idea of improving the LOF algorithm as a preprocessing step for quality warning is considered in this paper. A method of outlier data detection and quality early warning according to the local outlier factor based on area density and P weight (LAOPW) is designed, and the normal distribution is used to find a more reasonable quality data control range for high-precision servo mechanism.
Outliers are also named anomalies, novelties, deviations and exceptions[4-6]. In general, outlier data mining is a process of finding an effective method to mine data objects that meet the definition of outliers. It is widely used in medical insurance detection[7], credit card fraud detection[8], abnormal weather detection[9], etc. The concept of LOF, in which relative density is used to measure the outlier degree of data objects, was first proposed by Breunig et al.[10]. Fast outlier detection based on local density score (FLDS) was put forward in Ref.[11], in which k-nearest neighbors were used. The k-means algorithm was used to segment the dataset to find singular points, by which detection efficiency was improved. Through this method, the time complexity was reduced from O(n^2) to O(n^1.5), and the calculation was about 20 times faster than LOF. Data field theory and the concept of an average potential difference were applied to improve the detection accuracy of outlier detection algorithms in Ref.[12]. Detection quality was improved by enhancing the density-based spatial clustering of application with noise (DBSCAN)[13] clustering algorithm and using the local outlier factor based on area density (LAOF) to determine the outlier degree in Ref.[14].
In this paper, linear regression is used to deal with the relationship between the press-assembly quality and process of high-precision servo mechanism; then a displacement-force mathematical model of the press-assembly process is established and a qualified press-assembly force range is defined for early warning of assembly quality control. However, the analysis of linear regression will be seriously affected by outliers. Therefore, before establishing a regression model, the outlier data detection method for preprocessing should be applied to eliminate outliers so that a more reasonable quality control range is defined.
Information entropy[15-16] is used to measure the uncertainty of random variables. The greater the amount of information is, the smaller the uncertainty and the entropy are. Conversely, the smaller the amount of information is, the greater the uncertainty and the entropy are. Therefore, the outlier degree of a certain data object can be evaluated by the entropy value. Let s(x) be the set of values of the random variable x, and let p(x) be the corresponding probability; then the information entropy H(x) is defined as
$$H(x) = -\sum_{x \in s(x)} p(x)\log p(x)$$
Let A = {A_1, A_2, …, A_n} be the attribute set of the data object. Each A_i (i = 1, 2, …, n) divides A into {A_i} and A − {A_i}, which are denoted as P_1 = {A_i}, P_2 = {A_1, A_2, …, A_{i−1}, A_{i+1}, …, A_n} and P = {P_1, P_2}. Then the calculation formula of the increment of information entropy Δ(A_i)[14] is shown in Eq.(2), that is

where Δ(A_i) represents the change in the information entropy of set A after removing A_i. The larger Δ(A_i) is, the more the uncertainty of the dataset is reduced.
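As an illustration of the entropy measure H(x), the following minimal Python sketch estimates the information entropy of a discrete sample from its value frequencies. The base-2 logarithm and the function name `information_entropy` are assumptions made for this example; the increment Δ(A_i) of Eq.(2) is not reproduced here.

```python
import math
from collections import Counter

def information_entropy(values):
    """Estimate H(x) = -sum p(x) * log2 p(x) from the value frequencies of a sample.

    Assumption: base-2 logarithm (the paper does not state the base).
    """
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Four equally likely values: maximum uncertainty for four outcomes.
uniform = information_entropy(["a", "b", "c", "d"])
# A single repeated value: no uncertainty at all.
constant = information_entropy(["a", "a", "a", "a"])
print(uniform, constant)
```

An attribute whose values are spread evenly yields a high entropy, while an attribute that is almost constant yields an entropy close to zero.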
In order to enhance the effect of outlier attributes in distance measurement, an attribute-weighted distance is used. Given two data objects p = {p_k | k ∈ [1, n]} and q = {q_k | k ∈ [1, n]}, where n is the number of attributes and k is the index of attributes, the weighted distance between them is

In the traditional LOF algorithm, the following definitions are used.
(1) k-distance. The k-distance d_k(p) of object p refers to the distance between p and the object which is the kth nearest to it.
(2) k-distance neighborhood. The k-distance neighborhood N_k(p) of object p is the set of all objects whose distance from p is less than or equal to d_k(p). It can be expressed as
$$N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le d_k(p) \,\}$$
(3) Reachable distance. The reachable distance of p relative to q is defined as
$$\operatorname{reach-dist}_k(p, q) = \max\{\, d_k(q),\ d(p, q) \,\}$$
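The local reachable density and the LOF are calculated as
$$\mathrm{lrd}_k(p) = \frac{|N_k(p)|}{\sum_{q \in N_k(p)} \operatorname{reach-dist}_k(p, q)}$$
$$\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{q \in N_k(p)} \frac{\mathrm{lrd}_k(q)}{\mathrm{lrd}_k(p)}$$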
It can be noted that the sparseness of different data objects is not considered in the LOF algorithm. Therefore, the P weight of the data object is used as the reachable distance in the LAOPW algorithm, and it is set as the area radius to obtain the regional area instead of the distance sum. By redefining the local density and the local outlier factor in this way, the running efficiency of the algorithm is maintained and the detection effect is improved.
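The traditional LOF definitions (k-distance, k-distance neighborhood, reachable distance, local reachable density and LOF) can be sketched in a few lines of Python. This is only an illustrative sketch and not the Matlab implementation used in the experiments: it assumes the standard formulation of Breunig et al.[10], plain Euclidean distance rather than the entropy-weighted distance, and a dataset without duplicate points. The function name `lof_scores` and the sample points are hypothetical.

```python
import math

def lof_scores(points, k):
    """Classical LOF score of every point (assumes no duplicate points)."""
    n = len(points)
    # dist[i][j]: Euclidean distance; inf on the diagonal so that a point
    # is never counted as its own neighbor.
    dist = [[math.dist(p, q) if i != j else math.inf
             for j, q in enumerate(points)] for i, p in enumerate(points)]
    # k-distance d_k(p): distance to the kth nearest object.
    k_dist = [sorted(row)[k - 1] for row in dist]
    # k-distance neighborhood N_k(p): all objects within d_k(p), ties included.
    nbrs = [[j for j in range(n) if dist[i][j] <= k_dist[i]] for i in range(n)]
    # Local reachable density: |N_k(p)| over the sum of reachable distances.
    lrd = []
    for i in range(n):
        reach_sum = sum(max(k_dist[j], dist[i][j]) for j in nbrs[i])
        lrd.append(len(nbrs[i]) / reach_sum)
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in nbrs[i]) / len(nbrs[i]) / lrd[i]
            for i in range(n)]

# Hypothetical data: a small cluster plus one distant point (index 5).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(sample, k=3)
print(max(range(len(sample)), key=scores.__getitem__))  # index of the most outlying point
```

Points inside the cluster obtain scores close to 1, whereas the distant point obtains a much larger score.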
(4) P weight. The P weight W_k(p) of object p serves as its reachable distance, which is the sum of the distances between p and the objects in its k-distance neighborhood. It is defined as

(5) Local density. For a circle, let the data object p be the center and its reachable distance, i.e. the P weight W_k(p), be the radius; then the number of data points per unit area is defined as the local density of p, which is expressed as

(6) Local outlier factor. The local outlier factor LAOPW_k(p) of p is defined as

For a certain data object p, the smaller its LAO_k(p) is, the greater LAOPW_k(p) is; and the higher the outlier degree of p is, the more likely p is to be an outlier.
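One possible reading of definitions (4)-(6) is sketched below in Python. Since only the verbal definitions are available here, every formula in the sketch is an assumption: the P weight is taken as the sum of distances from p to the objects in its k-distance neighborhood, the local density as |N_k(p)| divided by the area π·W_k(p)^2 of the circle, and the outlier factor as the mean density of the neighbors divided by the density of p (by analogy with LOF). Plain Euclidean distance replaces the entropy-weighted distance, and the function name `laopw_scores` is hypothetical.

```python
import math

def laopw_scores(points, k):
    """Area-density outlier factor with the P weight as radius (assumed formulas)."""
    n = len(points)
    dist = [[math.dist(p, q) if i != j else math.inf
             for j, q in enumerate(points)] for i, p in enumerate(points)]
    k_dist = [sorted(row)[k - 1] for row in dist]
    nbrs = [[j for j in range(n) if dist[i][j] <= k_dist[i]] for i in range(n)]
    # Assumed P weight W_k(p): sum of distances from p to its neighbors.
    weight = [sum(dist[i][j] for j in nbrs[i]) for i in range(n)]
    # Assumed local density: neighbors per unit area of the circle
    # centered at p with radius W_k(p).
    density = [len(nbrs[i]) / (math.pi * weight[i] ** 2) for i in range(n)]
    # Assumed outlier factor: mean neighbor density over the point's own density.
    return [sum(density[j] for j in nbrs[i]) / len(nbrs[i]) / density[i]
            for i in range(n)]

# Hypothetical data: a small cluster plus one distant point (index 5).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
factors = laopw_scores(sample, k=3)
print(max(range(len(sample)), key=factors.__getitem__))  # index of the most outlying point
```

Under these assumptions a sparse point has a large P weight, hence a small local density and a large outlier factor, which matches the qualitative behavior described above.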
The outlier detection algorithm based on LAOPW is described in Algorithm 1 and its flow chart is presented in Fig.1.
Algorithm 1 LAOPW
Input: raw dataset D, k
Output: outliers of data objects



Fig.1 Flow chart of LAOPW algorithm
The detection of outliers based on the normal distribution is a statistical method. The given dataset is assumed to follow the normal distribution, and the data objects inconsistent with the model are identified as outlier data. If an attribute of a normal object follows the normal distribution N(μ, σ^2) (where μ and σ are the mean and standard deviation, respectively), it can be converted to the standard normal distribution N(0, 1) by the transformation z = (x − μ)/σ, where μ and σ are unknown and can be estimated by the sample mean and the sample standard deviation[17].
It follows from the central limit theorem that the normal distribution can be used to approximate other distributions when there are many samples. As shown in Fig.2, this theory can be applied to quality control. The middle solid black line μ is the predicted value of the observed value. μ ± 2σ corresponds to the upper and lower warning lines, and μ ± 3σ corresponds to the upper and lower control lines. If the distance between a sample and the mean μ exceeds 3σ, the sample is identified as an outlier[17].

Fig.2 Schematic diagram of quality control
For a sample x, if μ − 2σ < x < μ + 2σ, the measurement process is under control and the production process is effective. If x meets the condition (μ + 2σ ≤ x < μ + 3σ) || (μ − 3σ < x ≤ μ − 2σ), the quality is starting to deteriorate and tending to be “out of control”, so a necessary inspection should be carried out. If (x ≥ μ + 3σ) || (x ≤ μ − 3σ), the production process is “out of control”: the sample x is invalid or the assembled product is scrapped, so the process should be checked and corrected immediately[17]. In this way, quality early warning for the intelligent press-assembly process can be realized.
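The three-zone rule can be written as a small Python function. This is a minimal sketch that treats the upper and lower boundaries symmetrically, so that every sample falls into exactly one zone; the function name `classify_sample` and the numbers in the example are hypothetical.

```python
def classify_sample(x, mu, sigma):
    """Assign a sample to a quality-control zone by its distance from the mean."""
    deviation = abs(x - mu)
    if deviation < 2 * sigma:
        return "in control"        # mu - 2*sigma < x < mu + 2*sigma
    if deviation < 3 * sigma:
        return "warning"           # between a warning line and a control line
    return "out of control"        # on or beyond a control line

# Hypothetical mean 10.0 and standard deviation 1.0.
print(classify_sample(10.5, 10.0, 1.0))  # in control
print(classify_sample(12.0, 10.0, 1.0))  # warning
print(classify_sample(6.9, 10.0, 1.0))   # out of control
```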
To verify the outlier detection performance of the LAOPW algorithm, which is used to preprocess press-assembly data, two UCI datasets are used to compare three algorithms from multiple perspectives: the LOF algorithm, the LAOF algorithm proposed in Ref.[14], and the LAOPW algorithm. All algorithms are implemented in Matlab under Win10, and the processor is Intel(R) Core(TM) i5-8400 @ 2.80 GHz 2.81 GHz.
In outlier data detection, a high detection effect is the goal, and the detection performance of outlier data mining methods can be described by the confusion matrix[18] shown in Table 1.

Table 1 Confusion matrix
According to the relevant parameters of the confusion matrix,several indexes for evaluating the performance of outlier data mining algorithms are introduced.
(1) Accuracy. It represents the proportion of all samples that are correctly predicted, and stands for the overall prediction accuracy on the dataset. The larger the value is, the better. It can be expressed as

(2) Precision. It is the proportion of correctly predicted data points among those predicted as normal. It is defined as

(3) Recall. It is the proportion of all normal points that are correctly predicted. It can be written as

(4) F-score. The formula of the F-score is

Among the evaluation indexes above, Accuracy is used to measure the ability to make correct choices. Precision and Recall reflect the performance of outlier detection algorithms, and the F-score is a comprehensive evaluation index of the two.
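The four indexes can be computed from the confusion matrix as sketched below in Python. The sketch assumes the usual definitions with the normal class as the positive class, as suggested by the descriptions of Precision and Recall, and takes the F-score as the harmonic mean of Precision and Recall. The names `tp`, `fn`, `fp`, `tn` and the counts in the example are hypothetical and are not taken from Table 2 or Table 3.

```python
def detection_metrics(tp, fn, fp, tn):
    """Accuracy, Precision, Recall and F-score with 'normal' as the positive class.

    tp: normal points predicted as normal    fn: normal points predicted as outliers
    fp: outliers predicted as normal         tn: outliers predicted as outliers
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical counts for 40 normal points and 5 outliers.
accuracy, precision, recall, f_score = detection_metrics(tp=39, fn=1, fp=2, tn=3)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f_score, 3))
```

With these counts one normal point is flagged by mistake and two outliers are missed, so 42 of the 45 points are classified correctly.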
The Iris dataset, which has 150 samples and four attributes, is used for experiments of outlier data detection. The data objects are divided into three categories: Setosa, Versicolour and Virginica. Twenty sample points belonging to Setosa and Versicolour are taken out as clusters and five Virginica sample points are selected as outlier data. Detection results of the algorithms are shown in Fig.3. The five black triangles in Fig.3(a) are outliers, and Figs.3(b)-(d) show the detection results of the compared algorithms.
Based on Table 1, the confusion matrix of Iris for outlier data detection is listed in Table 2 and the histogram shown in Fig.4 is formed. It can be clearly seen from Table 2 and Fig.4 that the Accuracy, Precision and F-score of the algorithm proposed in this paper are higher than those of the LOF and LAOF algorithms. The Accuracy of LOF, which is 0.933, is the same as that of LAOF. The Recall values of the LAOF algorithm and the LAOPW algorithm are both 0.975, which is slightly lower than that of LOF. According to these indexes, the detection performance of the proposed algorithm is the best among these algorithms.
In order to compensate for the defect that the number of experimental data in Section 3.2 is too small to fully demonstrate the effectiveness of the algorithm, the Aggregation dataset is selected for comparative experiments. This dataset consists of 788 two-dimensional samples divided into seven categories. To carry out the experiment, 600 pieces of data from four categories are extracted as cluster data, and 10 data points are selected from the other three categories as outliers. The experimental results are shown in Fig.5. The detection performance of the algorithms is listed in Table 3 and Fig.6.
In Table 3, the number of outliers detected by the LOF algorithm is the smallest: only six, two of which are detected by mistake. With LAOF or the proposed algorithm, eight outliers can be detected, but the false detection rate of the latter is lower. The Accuracy of LAOPW is 0.993, which is higher than the 0.987 of the LOF and LAOF algorithms. Judging from the Accuracy, Precision, Recall and F-score in Table 3 and Fig.6, the comprehensive detection performance of LAOPW is the best. This shows that the detection effect of the proposed algorithm is better than that of either of the other two algorithms.
In the Aggregation dataset, 100, 200, 300, 400, and 500 data points are taken respectively to calculate the running time of the three methods. The results are shown in Fig.7. Among them, the LAOF algorithm has the highest operating efficiency, followed by the LAOPW algorithm, and the LOF algorithm has the lowest. The running time of the proposed LAOPW algorithm is slightly longer than that of the LAOF algorithm because the P weight of each object has to be calculated.

Fig.3 Experiment comparison based on Iris

Table 2 Confusion matrix for outlier data detection based on Iris

Fig.4 Evaluation index of different outlier data detection algorithms based on Iris

Fig.5 Experiment comparison based on Aggregation

Table 3 Confusion matrix for outlier data detection based on Aggregation

Fig.6 Evaluation index of different outlier data detection algorithms based on Aggregation

Fig.7 Comparison of running time among three different methods based on Aggregation
For the application of the intelligent early warning method of press-assembly quality, the detection accuracy is more critical than the detection efficiency. The displacement-force raw data are collected from high-precision servo mechanism, as shown in Table 4.
The size of the displacement-force dataset is 200, and the number of attributes is 16. To eliminate the outlier data before the linear regression model of displacement-force in the press-assembly process is established and a qualified press-assembly force range is defined, the local outlier data detection algorithm LAOPW designed in this paper is applied to preprocess the raw dataset. Eight outliers are detected, which is consistent with the actual quality inspection results. In the new dataset without outliers, a univariate outlier detection method based on the normal distribution is used for each displacement value. Relevant statistical data are shown in Table 5.
The dependent variables μ, μ + 2σ, μ − 2σ, μ + 3σ, and μ − 3σ have an approximately linear relationship with the independent variables, so linear regression models can be established according to Table 5. Fig.8(a) represents the quality control chart obtained from the raw displacement-force dataset, and Fig.8(b) represents the quality control chart obtained after removing outliers with the LAOPW algorithm. Since there are too many collected displacement-force data points, the symbol “×” marks only the maximum and minimum forces under different displacements. Comparing Fig.8(a) with Fig.8(b), it is obvious that the latter has a smaller quality control range and covers all data points. So it can be concluded that, for a displacement-force dataset, a more accurate quality control range can be defined after removing outliers by the LAOPW algorithm, which can provide more reasonable control for the high-precision servo press-assembly process.
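The construction of the control chart can be illustrated with a short Python sketch: for each of the five dependent variables μ, μ ± 2σ and μ ± 3σ, an ordinary least-squares line is fitted against the displacement. This is a sketch under stated assumptions and not the model of Fig.8: the displacement is taken as the single independent variable, the function names `fit_line` and `control_lines` are hypothetical, and all numbers are invented for illustration rather than taken from Table 5.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def control_lines(displacements, mus, sigmas):
    """Fit one line per dependent variable mu + c * sigma, c in {-3, -2, 0, 2, 3}."""
    return {c: fit_line(displacements, [m + c * s for m, s in zip(mus, sigmas)])
            for c in (-3, -2, 0, 2, 3)}

# Hypothetical per-displacement statistics (displacement in mm, force in N).
displacements = [5.0, 10.0, 15.0, 20.0]
mus = [100.0, 200.0, 300.0, 400.0]
sigmas = [5.0, 5.0, 5.0, 5.0]

lines = control_lines(displacements, mus, sigmas)
a, b = lines[3]                 # upper control line mu + 3*sigma
upper_control_at_20 = a * 20.0 + b
print(upper_control_at_20)
```

Evaluating the fitted lines at a given displacement yields the warning and control limits against which a measured force can be compared.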

Table 4 Displacement-force dataset of intelligent press-assembly

Table 5 Statistical data on displacement-force of intelligent press-assembly

Fig.8 Quality control chart of intelligent press-assembly process
Quality early warning can be realized by applying this control chart model to the press-assembly process of high-precision servo mechanism. Suppose that, in such a process, the displacement is s = 20 mm and the corresponding force is F. If F is in the range between the upper warning line and the lower warning line corresponding to s, the press-assembly process works well. If F lies between the upper warning line and the upper control line, or between the lower warning line and the lower control line, there is a problem with the press-assembly quality, and a necessary inspection and corresponding measures should be carried out. Once F is beyond the range between the upper control line and the lower control line, the press-assembly process is abnormal, which may result in scrap. In this way, the quality early warning for the press-assembly process of high-precision servo mechanism can be realized.
An intelligent early warning method of press-assembly quality based on outlier data detection and linear regression is presented in this paper. Firstly, an improved outlier data detection algorithm, LAOPW, is designed for the preprocessing of press-assembly data. The experiments indicate that the proposed LAOPW algorithm has better comprehensive detection performance than the LOF and LAOF algorithms. Then, the algorithm is used to preprocess the displacement-force data in the press-assembly process and the data objects with larger outlier factors are eliminated. Finally, the outlier detection method based on the normal distribution is applied to define the quality control range of the process, which is used as the standard value of early warning for press-assembly quality. It can be used to monitor the press-assembly process by collecting the force corresponding to different displacements. In this way, problems of assembly quality can be found in time and early warning can be given, so that intelligent quality control is further realized.
Transactions of Nanjing University of Aeronautics and Astronautics, 2020, Issue 4