
Variable Importance Measure System Based on Advanced Random Forest

2021-08-26 09:37:38

Shufang Song, Ruyang He, Zhaoyin Shi and Weiya Zhang

1 School of Aeronautics, Northwestern Polytechnical University, Xi’an, 710072, China

2 AECC Sichuan Gas Turbine Establishment, Mianyang, 621700, China

ABSTRACT The variable importance measure (VIM) can be used to rank or select important variables, which effectively reduces the variable dimension and shortens the computational time. Random forest (RF) is an ensemble learning method that constructs multiple decision trees. To improve the prediction accuracy of random forest, an advanced random forest is presented that uses Kriging models as the leaf-node models in all the decision trees. With reference to the Mean Decrease Accuracy (MDA) index based on Out-of-Bag (OOB) data, single variable, group variables and correlated variables importance measures are proposed to establish a complete VIM system on the basis of the advanced random forest. The link between MDA and the variance-based total sensitivity index is explored, and the corresponding relationship between the proposed VIM indices and the variance-based global sensitivity indices is then constructed, which gives a novel way to solve variance-based global sensitivity. Finally, several numerical and engineering examples are given to verify the effectiveness of the proposed VIM system and the validity of the established relationship.

KEYWORDS Variable importance measure; random forest; variance-based global sensitivity; Kriging model

Nomenclature

1 Introduction

Sensitivity analysis reflects the influence of input variables on the output response. It includes local sensitivity analysis and global sensitivity analysis [1]. Local sensitivity describes the influence of input variables on the characteristics of the output at the nominal value. Global sensitivity analysis, also known as importance measure analysis, estimates the influence of input variables over their whole distribution region on the characteristics of the output [2–4]. There are three kinds of importance measures: non-parametric measures, variance-based global sensitivity and moment-independent importance measures [1]. Variance-based global sensitivity is the most widely applied measure because it is general and holistic, and it can give the contribution of group variables and the cross influence of different variables. There are plenty of methods to calculate variance-based global sensitivity indices, such as Monte Carlo (MC) simulation [5], high dimensional model representation (HDMR) [6], the state-dependent parameter (SDP) procedure [7] and so on. MC simulation can estimate approximately exact solutions of the total and main sensitivity indices simultaneously, but the amount of calculation is generally large, especially for high dimensional engineering problems. HDMR and SDP can calculate the main sensitivity indices by solving the components of all orders of input-output surrogate models.

Random forest (RF) is composed of multiple decision trees (DTs); it is an ensemble learning method proposed by Breiman [8]. RF has many advantages, such as strong robustness and good tolerance to outliers and noise. RF has a wide range of applications, such as geographical energy [9], the chemical industry [10], health insurance [11] and data science competitions. RF can not only deal with classification and regression problems but also analyze variable importance. RF provides two kinds of importance measures: Mean Decrease Impurity (MDI) based on the Gini index and Mean Decrease Accuracy (MDA) based on Out-of-Bag (OOB) data [12]. The MDI index is the average reduction of Gini impurity due to a splitting variable in the decision trees across the RF [13]. The MDI index is sensitive to variables with different scales of measurement and shows artificial inflation for variables with many categories. For correlated variables, the MDI index depends on the order in which variables are selected. Once a variable is selected, the impurity is reduced by that first selected variable. It is then difficult for the other correlated variables to reduce the impurity by the same magnitude, so the importance of the other correlated variables declines. The MDA index is the average reduction of prediction accuracy after randomly permuting OOB data [14,15]. Since the MDA index measures the impact of each variable on the prediction accuracy of the RF model and has no biases, it has been widely used in many scientific areas. Although there are RF-based importance measures to distinguish the important features, there is no complete importance measure system that deals with nonlinearity and correlation among variables [16,17]. In addition, the similarity between the analysis process of MDA based on OOB data and the Monte Carlo simulation of variance-based global sensitivity can be used as a breakthrough point to find their link [18]. With the help of the variance-based sensitivity index system, a variable importance measure system based on RF can be constructed.

By comparing the procedure for estimating the total sensitivity indices with that of the MDA index based on OOB data, a complete VIM system is established on an advanced RF that uses Kriging models, including single variable, group variables and correlated variables importance measure indices. The proposed VIM system combines the advantages of random forest and the Kriging model. The VIM system can indicate the contribution of input variables to the output response and rank important variables, and it also gives a novel way to solve variance-based global sensitivity with small samples.

This paper is organized as follows: Section 2 reviews the basic concept of variance-based global sensitivity. Section 3 first reviews random forest, presents the MDA index and then proposes the single variable, group variables and correlated variables importance measures respectively. Section 4 finds the link between the MDA index and the total variance-based global sensitivity index, and derives the relationship between the VIM indices and the variance-based global sensitivity indices. In Section 5, several numerical and engineering examples are provided before the conclusions in Section 6.

2 Variance-Based Global Sensitivity

The variance-based global sensitivity, proposed by Sobol [19], reflects the influence of input variables over their whole distribution region on the variance of the model output. The variance-based global sensitivity indices not only have strong model generality, but can also describe the importance of group variables and quantify the interaction between input variables. ANOVA (Analysis of Variance) decomposition is the basis of variance-based global sensitivity analysis.

2.1 ANOVA Decomposition

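The response function $Y=g(X)$ has a unique ANOVA decomposition:

$$g(X)=g_0+\sum_{i=1}^{n}g_i(X_i)+\sum_{1\le i<j\le n}g_{ij}(X_i,X_j)+\cdots+g_{1,2,\ldots,n}(X_1,X_2,\ldots,X_n) \tag{1}$$

where $n$ is the dimension of the input variables, $g_0$ is the expectation of $g(X)$, $g_0=\int g(x)\prod_{i=1}^{n}f_{X_i}(x_i)\,\mathrm{d}x_i$, and $f_{X_i}(x_i)$ is the probability density function of variable $X_i$. The components in Eq. (1) are:

$$g_i(X_i)=\int g(x)\prod_{k\ne i}f_{X_k}(x_k)\,\mathrm{d}x_k-g_0$$

$$g_{ij}(X_i,X_j)=\int g(x)\prod_{k\ne i,j}f_{X_k}(x_k)\,\mathrm{d}x_k-g_i(X_i)-g_j(X_j)-g_0$$

and the higher-order components are defined analogously.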

2.2 Variance-Based Global Sensitivity Indices

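The variance of the response function can be expressed as:

$$V=\mathrm{Var}(Y)=\int g^2(x)\prod_{i=1}^{n}f_{X_i}(x_i)\,\mathrm{d}x_i-g_0^2$$

Since the decomposition terms are orthogonal, the variance of the response function is the sum of the variances of all individual decomposition terms:

$$V=\sum_{i=1}^{n}V_i+\sum_{1\le i<j\le n}V_{ij}+\cdots+V_{1,2,\ldots,n}$$

where

$$V_i=\mathrm{Var}\big(g_i(X_i)\big),\qquad V_{ij}=\mathrm{Var}\big(g_{ij}(X_i,X_j)\big),\ \ldots$$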

The ratio of each variance component to the variance of the response function then reflects the variance contribution of that component, i.e., $S_i=V_i/V$, $S_{ij}=V_{ij}/V$, ...

$S_i=V_i/V$ is the first order sensitivity index of variable $X_i$ (also called the main sensitivity index); it reflects the influence of variable $X_i$ alone on the response $Y$. $S_{ij}=V_{ij}/V$ is the second order sensitivity index; it reflects the interaction influence of variables $X_i$ and $X_j$ on the response $Y$. The total sensitivity index $S_{Ti}$ is obtained by summing all the influences related to variable $X_i$:

$$S_{Ti}=S_i+\sum_{j\ne i}S_{ij}+\cdots+S_{1,2,\ldots,n}$$

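According to probability theory, the variance-based global sensitivity indices can be expressed as [20]:

$$S_i=\frac{\mathrm{Var}\big(E(Y\mid X_i)\big)}{\mathrm{Var}(Y)},\qquad S_{Ti}=\frac{E\big(\mathrm{Var}(Y\mid X_{\sim i})\big)}{\mathrm{Var}(Y)}=1-\frac{\mathrm{Var}\big(E(Y\mid X_{\sim i})\big)}{\mathrm{Var}(Y)}$$

where $X_{\sim i}$ denotes the variable vector without $X_i$.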

2.3 Simulation of Variance-Based Global Sensitivity Indices

Due to its enormous computational load, the traditional double-loop Monte Carlo simulation is not suitable for complex engineering problems [21]. The computational procedure of the single-loop Monte Carlo simulation is as follows:

Step 1: Randomly generate two sample matrices A and B based on the probability distribution of the variables X.

Step 2: Construct the sample matrix Ci, where the ith column of Ci comes from the ith column of A, and the other columns come from the corresponding columns of B.

Step 3:The main and total sensitivity indices can be expressed as follows:
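The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the estimators used in Step 3 are the commonly cited Saltelli (main) and Jansen (total) forms, which are not necessarily the ones used in this paper, and the names `sobol_single_loop`, `g` and `sample_input` are hypothetical. The cost is N(n+2) model evaluations for n inputs and N samples.

```python
import random


def sobol_single_loop(g, sample_input, n_vars, n_samples, seed=0):
    """Single-loop MC estimate of main and total Sobol indices.

    g            -- response function taking a list of n_vars inputs
    sample_input -- hypothetical helper: draws one independent input vector
    Estimators are an assumption (Saltelli-type main, Jansen-type total).
    """
    rng = random.Random(seed)
    # Step 1: two independent sample matrices A and B
    A = [sample_input(rng) for _ in range(n_samples)]
    B = [sample_input(rng) for _ in range(n_samples)]
    yA = [g(x) for x in A]
    yB = [g(x) for x in B]
    mean = sum(yA) / n_samples
    var = sum((y - mean) ** 2 for y in yA) / (n_samples - 1)

    main, total = [], []
    for i in range(n_vars):
        # Step 2: C_i takes column i from A and all other columns from B
        yC = [g(b[:i] + [a[i]] + b[i + 1:]) for a, b in zip(A, B)]
        # Step 3: yA and yC share only X_i  -> main effect variance
        v_i = sum(ya * (yc - yb) for ya, yb, yc in zip(yA, yB, yC)) / n_samples
        # yB and yC differ only in X_i      -> total effect variance
        v_ti = sum((yb - yc) ** 2 for yb, yc in zip(yB, yC)) / (2 * n_samples)
        main.append(v_i / var)
        total.append(v_ti / var)
    return main, total
```

For an additive test function such as Y = X1 + 2·X2 with independent standard normal inputs, both the main and total indices should approach 0.2 and 0.8.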

3 Variable Importance Measure System Based on Random Forest

RF is an ensemble statistical learning method for classification and regression problems [22]. The Bootstrap sampling technique is first carried out to extract training samples from the original data, and these training samples are used to build a decision tree; the remaining Out-of-Bag data are used to verify the accuracy of the established decision tree.

M decision trees are established by applying the Bootstrap sampling technique M times. All the decision trees together compose a random forest (shown in Fig. 1). The final prediction of the RF is obtained by voting in the classification model or by taking the mean in the regression model [23]. The prediction precision of the RF can be expressed by the mean square error (MSE) between the predicted values and the true values of the OOB data.

Figure 1:Random forest

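The Bootstrap technique extracts training points to build a decision tree $h_m\ (m=1,2,\ldots,M)$ and yields the corresponding OOB data with input $X_{OOB}$ and output $y$. The decision tree $h_m$ is used to predict the response $y^m$ at $X_{OOB}$. The MSE of decision tree $h_m$ is

$$\varepsilon_m=\frac{1}{N_{OOB}}\sum_{k=1}^{N_{OOB}}\big(y_k-y_k^m\big)^2$$

where $N_{OOB}$ is the number of OOB samples. After obtaining the MSEs of all decision trees $\varepsilon_m\ (m=1,2,\ldots,M)$, their average is the total prediction error of the RF model [24]:

$$\varepsilon=\frac{1}{M}\sum_{m=1}^{M}\varepsilon_m$$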

In order to improve the prediction precision of RF, a high-precision Kriging model is used as the model of the leaf nodes in the decision tree, replacing the original average or linear regression. Next, a nonlinear discontinuous function is used to compare the prediction accuracy of the Kriging model and the linear regression model in a decision tree.

where the input variable X is uniformly distributed on [−π, π].

A comparison of the predictions of the Kriging based decision tree (abbreviated as Kriging-DT) and the linear regression based decision tree (abbreviated as Linear-DT) is shown in Fig. 2. The predicted errors of Kriging-DT and Linear-DT as the number of training samples increases are shown in Fig. 3. Kriging-DT approximates the original function better. For the same training samples, Kriging-DT has higher prediction accuracy, and its predicted error declines faster than that of Linear-DT. Kriging-DT inherits the advantages of the Kriging model and is well suited to nonlinear piecewise functions.

There are two kinds of importance measures based on RF: Mean Decrease Impurity (MDI) based on the Gini index and Mean Decrease Accuracy (MDA) based on OOB data. The MDA index is widely used to rank variables by their effect on the prediction accuracy of the RF model [12].

Figure 2: Comparison of Kriging-DT, Linear-DT and predicted data with 64 training samples

Figure 3: Predicted errors of Kriging-DT and Linear-DT vs. size of training samples

3.1 Mean Decrease Accuracy Index of Random Forest

The MDA index is the average reduction of prediction accuracy after randomly permuting OOB data. When the values of one variable in the OOB data are permuted, the correspondence between the OOB samples and the output is destroyed. The prediction accuracy is recalculated after each permutation, and the MSE between the paired predictions (before and after permutation) is taken as the importance measure.
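The permutation procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: a 1-nearest-neighbour regressor is used as a hypothetical stand-in for a decision tree (the paper uses Kriging-based trees), and the function names are invented for this example. Following the description above, the importance of a variable is the MSE between the paired OOB predictions before and after permuting that variable; the classical MDA variant would instead use the increase in OOB error.

```python
import random


def fit_nearest_neighbour(X, y):
    """Hypothetical stand-in for one decision tree h_m (1-NN regressor)."""
    def predict(x):
        best = min(range(len(X)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(X[k], x)))
        return y[best]
    return predict


def paired_prediction_mda(X, y, n_models=20, seed=0):
    """Average, over bootstrap models, of the MSE between OOB predictions
    before and after permuting each input column."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    importance = [0.0] * d
    for _ in range(n_models):
        # Bootstrap sample; points never drawn form the OOB set
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        X_oob = [X[k] for k in range(n) if k not in in_bag]
        model = fit_nearest_neighbour([X[k] for k in boot],
                                      [y[k] for k in boot])
        base = [model(x) for x in X_oob]
        for i in range(d):
            # Permute column i only, leaving the other columns untouched
            col = [x[i] for x in X_oob]
            rng.shuffle(col)
            permuted = [model(x[:i] + [c] + x[i + 1:])
                        for x, c in zip(X_oob, col)]
            mse = sum((p - q) ** 2 for p, q in zip(base, permuted)) / len(base)
            importance[i] += mse / n_models
    return importance
```

On data where the output depends on the first input only, the first importance value should be far larger than the second.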

3.2 Single Variable Importance Measure of Random Forest

3.3 Group Variable Importance Measure of Random Forest

3.4 Correlated Variable Importance Measure of Random Forest

In recent years, several RF-based techniques have been proposed to measure the importance of correlated variables [25,26]. However, these studies directly apply importance measure techniques designed for independent variables to correlated variables, which is not reasonable. References [27,28] divided the variance-based sensitivity indices into a correlated contribution and an independent contribution. Moreover, sparse grid integration (SGI) has been used to perform importance analysis for correlated variables [29]. In this paper, the correlation among variables is taken into account in the RF importance measure. The procedure for estimating the VIM with a single decision tree of the RF model consists of the following steps:

The importance measure indices in both the correlated space and the independent space are given based on RF, which establishes the complete VIM system.

4 Link between VIM of RF and Variance-Based Global Sensitivity

The similarity between the analysis process of the MDA index based on OOB data and the single-loop Monte Carlo simulation of variance-based global sensitivity can be used as a breakthrough point to find their link. The relationship between the MDA index and variance-based global sensitivity is explored first.
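As background for this link, a standard identity for independent inputs may help; it is stated here as a general fact about the total index and is not claimed to be the derivation used in this paper. Let $X_i'$ be an independent copy of $X_i$ (in practice, a permuted column). Conditional on $X_{\sim i}$, the two responses are independent and identically distributed, so

```latex
\begin{aligned}
E\Big[\big(g(X_i, X_{\sim i}) - g(X_i', X_{\sim i})\big)^2\Big]
  &= E\Big[\,2\,\mathrm{Var}\big(Y \mid X_{\sim i}\big)\Big] \\
  &= 2\,V\,S_{Ti}
\end{aligned}
```

Under the assumptions of independent inputs and an accurate surrogate, the MSE between paired predictions after permuting $X_i$ therefore estimates twice the total variance contribution of $X_i$.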

5 Examples and Discussion

5.1 Numerical Example 1:Ishigami Function

Ishigami function is considered:
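For reference, the widely used form of the Ishigami function and its closed-form Sobol indices can be evaluated as below. The coefficients a = 7 and b = 0.1 and the input range [−π, π] are the common textbook choice and are an assumption here; the values used in this paper are not recoverable from the text.

```python
import math


def ishigami(x1, x2, x3, a=7.0, b=0.1):
    """Common form of the Ishigami function (a, b are assumed values)."""
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def ishigami_indices(a=7.0, b=0.1):
    """Analytical main and total Sobol indices for X_i ~ U[-pi, pi]."""
    pi4, pi8 = math.pi ** 4, math.pi ** 8
    V = 0.5 + a ** 2 / 8 + b * pi4 / 5 + b ** 2 * pi8 / 18  # total variance
    V1 = 0.5 * (1 + b * pi4 / 5) ** 2                       # main effect of X1
    V2 = a ** 2 / 8                                         # main effect of X2
    V13 = 8 * b ** 2 * pi8 / 225                            # X1-X3 interaction
    main = [V1 / V, V2 / V, 0.0]
    total = [(V1 + V13) / V, V2 / V, V13 / V]
    return main, total
```

With these assumed coefficients the main indices are approximately 0.314, 0.442 and 0, and the total indices approximately 0.558, 0.442 and 0.244, so X3 matters only through its interaction with X1.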

Figure 4: The convergence trends of the importance measures with sample size (a) The convergence trend of MC simulation (b) The convergence trend of RF model

Table 1:The single variable VIMs of Ishigami function

Table 2:The group variables VIMs of Ishigami function

5.2 Numerical Example 2:Linear Function with Correlated Variables

A linear model is considered[28]:

Y=X1+X2+X3

There are 500 decision trees and 600 samples used to analyze the importance measures. Fig. 5 shows the importance measures of the correlated input variables for different values of ρ. Tab. 3 shows the importance measures for the independent and correlated variable cases at σ = 2. Additionally, the analytical solutions are presented for comparison.

Figure 5: The importance measures of correlated input variables at different correlation coefficients (a) Importance measures vs. correlation coefficients (b) vs. correlation coefficients
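An analytical reference for such a linear model can be sketched under the assumption that the inputs are jointly Gaussian with zero mean. In that case E(Y|Xi) is linear in Xi, so the first order index in the correlated space is Cov(Y, Xi)² / (Var(Xi)·Var(Y)). The covariance matrix below is hypothetical and is not taken from the paper.

```python
def gaussian_linear_first_order(cov, coeffs):
    """First order indices Var(E[Y|X_i]) / Var(Y) for Y = sum_i c_i * X_i
    with jointly Gaussian inputs (assumption) and covariance matrix cov."""
    n = len(coeffs)
    var_y = sum(coeffs[i] * coeffs[j] * cov[i][j]
                for i in range(n) for j in range(n))
    indices = []
    for i in range(n):
        # Cov(Y, X_i) collects the direct and the correlated contributions
        cov_y_xi = sum(coeffs[j] * cov[j][i] for j in range(n))
        indices.append(cov_y_xi ** 2 / (cov[i][i] * var_y))
    return indices


# Hypothetical setting: unit variances for X1 and X2, standard deviation 2
# for X3, and correlation 0.5 between X2 and X3 (covariance 0.5 * 1 * 2 = 1).
example_cov = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 1.0],
               [0.0, 1.0, 4.0]]
example_indices = gaussian_linear_first_order(example_cov, [1.0, 1.0, 1.0])
```

For this hypothetical covariance the indices are 0.125, 0.5 and 0.78125; they sum to more than one because correlated variables share part of their contribution.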

Table 3:The single variable VIMs of Example 5.2

5.3 Numerical Example 3:Nonlinear Function with Correlated Variables

Set the mean vector μX = [0, 0, 250, 400] and the standard deviation vector σ = [4, 2, 200, 300]. There are 500 decision trees and 3000 samples to construct the RF model. Tab. 4 shows the VIM results of group variables for the independent variable case. The Pearson correlation coefficients are ρ12 = 0.3 and ρ34 = −0.3. Tab. 5 shows the importance measures of single variables in the correlated and independent variable spaces.

Table 4:The group variables VIMs of Example 5.3

Table 5:The single variable VIMs of Example 5.3

Tabs. 4 and 5 show that the analytical values and the numerical simulation of the VIMs are in good agreement. In the independent variable space, the third and fourth order sensitivity indices are all equal to zero, so the relationship of importance measures of single variable and group variables are also.

5.4 Engineering Example 4:Series and Parallel Electronic Models

Since the reliability of electronic instruments at the design stage has attracted much attention, two simple electronic circuit models from reference [31] are used to obtain the VIMs. Both the series and the parallel structures (shown in Fig. 6) are considered. Each of the electronic circuit models contains four elements. The lifetimes Ti independently obey exponential distributions. The failure rate parameters are λ = [1, 1/4.5, 1/9, 1/99], and the lifetime T of the models can be respectively expressed as:

Series model:T=min(T1,T2,T3,T4)

Parallel model:T=max(T1,T2,T3,T4)
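The two lifetime models can be simulated directly. The sketch below only generates samples of the response; it uses the failure rates given above, while the function names are chosen for this illustration.

```python
import random

# Failure rates of the four elements, as stated in the text
RATES = [1.0, 1 / 4.5, 1 / 9, 1 / 99]


def sample_lifetimes(rng):
    """Draw independent exponential lifetimes T1..T4."""
    return [rng.expovariate(lam) for lam in RATES]


def series_lifetime(t):
    """A series system fails as soon as its first element fails."""
    return min(t)


def parallel_lifetime(t):
    """A parallel system fails only when its last element fails."""
    return max(t)
```

Because the minimum of independent exponential variables is again exponential with rate equal to the sum of the rates, the mean series lifetime is 1/(1 + 1/4.5 + 1/9 + 1/99) ≈ 0.744, which gives a quick sanity check for the sampler.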

Figure 6: The series and parallel electronic circuit structures (a) Series model (b) Parallel model

Tabs. 6 and 7 show the importance measures computed by the RF model, which uses 500 decision trees and 15000 samples. Because the electronic circuit structures are discontinuous, more samples are needed to obtain a precise surrogate model and precise importance measures. Additionally, the MC simulation results with 6×2^25 random samples are presented as approximately exact solutions of the main and total indices for comparison. The comparison shows that the RF importance measures are also appropriate for discontinuous models. The main sensitivity indices are almost equal to the total indices in the parallel model, while they differ significantly in the series model (see Tab. 6). The second-order indices of the series model are not equal to zero (see Tab. 7), which causes the difference in VIMs between the parallel model and the series model.

Table 6:The single variable VIMs of electronic models

Table 7:The group variables VIMs of series model

5.5 Engineering Example 5:A Cantilever Tube Model

A cantilever tube model (shown in Fig. 7) is used to analyze the variable importance measures. The model is a nonlinear model with six random variables. The input variables are the outer diameter d, the thickness t, the external forces F1, F2 and P, and the torsion T, respectively.

The tensile stress σx and the torsion stress τzx can be analyzed:

where the sectional area A, the bending moment M and the inertia moment I can be calculated by the following formula:

Figure 7:The cantilever tube model

The maximum stress of the cantilever is then calculated from these two stresses. All input variables t, d, F1, F2, P and T are normally distributed with the parameters shown in Tab. 8. The Pearson correlation coefficients are ρtd = 0.3 and ρF1F2 = 0.5. There are 500 decision trees and 7000 samples in the RF model. Tab. 9 gives the variable importance measures obtained by the RF method and by the single-loop Monte Carlo simulation method. The cost of the MC method is 8×2^23 points for each case.

Table 8:Distribution parameters of input variables

For the independent variables, the main and total sensitivity indices of the input variables are very close (see Tab. 9), which suggests that the influence of these variables on the output response mainly comes from the individual variables and that the interaction contribution is very small. The external force P is the most important variable in the independent space; the importance of the other input variables differs only slightly.

Furthermore, the importance measures are different in the correlated variable space. For the correlated input variables t, d, F1 and F2, the influence on the output response mainly originates from the correlated contribution induced by the Pearson correlation coefficients. The input variables P and T are independent of the other variables, so their first order indices are almost equal to their total sensitivity indices. Therefore, the proposed RF variable importance measure system not only identifies the important variables but also provides useful information about the structure of the engineering model, which offers guidance for engineering design and optimization.

Table 9:The VIMs of cantilever tube model

5.6 Engineering Example 6:Solar Wing Mast of Space Station

The solar wing mast of the space station is a truss structure in 3D space based on a triangular structure, shown in Fig. 8.

Figure 8:Solar wing mast structure[32]

The solar wing mast is made of titanium alloy. The material properties (including density ρ, elastic modulus E and Poisson’s ratio ν), the external loads (including dynamic load F1 and static load F2) and the sectional area of the truss A are random variables; the corresponding distribution parameters are listed in Tab. 10.

The software CATIA is used to establish the geometry and the finite element model. Taking the maximum stress as the output response, ABAQUS was then called repeatedly to analyze the finite element model, and 210 samples were finally obtained. Random forest is used to analyze the variable importance measures; the resulting VIMs are listed in Tab. 11.

Table 10:Distribution parameters of input variables

Table 11:The VIMs of solar wing mast

According to the results of the variable importance measures, the main sensitivity index of Poisson’s ratio ν is almost zero, and its total sensitivity index is also the smallest. In order to simplify the model, Poisson’s ratio ν can be treated as a constant. The sectional area of the truss A is the key design variable, since A has the largest main sensitivity to the output. There is a large interaction between density ρ and elastic modulus E, and the interaction sensitivity index can be solved indirectly as SρE ≈ 0.4623. The external loads F1 and F2 can be regarded as secondary variables. The variable importance measures can give the designer reasonable suggestions for allocating the optimization spaces of design variables more effectively and for reducing the optimization dimension.

6 Conclusions

The Kriging regression model is used as the leaf node model of the decision tree to improve the prediction accuracy of RF. The single variable, group variables and correlated variables importance measures based on RF are presented, which constitute the complete RF variable importance measure system. Additionally, a novel approach for solving variance-based global sensitivity indices is presented, and the novel meaning of these VIM indices is also introduced. The results of the numerical and engineering examples show that the VIM indices of RF can be used to derive the variance sensitivity indices with higher computational efficiency than single-loop MC simulation.

With incomplete probability information, such as linearly correlated non-normal variables, non-linearly correlated variables or discrete input-output samples, the proposed importance measure analysis method has some limitations in applicability. In future work, importance measures under incomplete probability information will be studied based on equivalent transformation or Copula functions.

Authors’ Contributions: Conceptualization and methodology by Song, S.F., validation and writing by He, R.Y., examples and computation by Shi, Z.Y., examples and writing by Zhang, W.Y.

Funding Statement:The authors received no specific funding for this study.

Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
