
Two Paradoxes in Linear Regression Analysis

Shanghai Archives of Psychiatry, 2016, Issue 6 (posted 2016-12-09)

Biostatistics in psychiatry (36)

Ge FENG1, Jing PENG2, Dongke TU4, Julia Z. ZHENG5, Changyong FENG2,3*

Key words: forward selection; backward elimination; univariate regression; multiple regression

1. Introduction

Linear regression is the most widely used statistical model in data analysis.[1] The wide availability and ease of use of statistical software packages such as SAS, SPSS and R make linear regression accessible to people without any formal statistical training. Although wise use of statistical methods such as linear regression helps us, even novices, develop a better understanding of data and guides our decisions, it can also cause confusion in the interpretation of results and lead to paradoxical findings. For example, our biomedical collaborators often ask us questions like “When I run the univariate regression of Y on the predictor X1, the p-value is very small. However, if I add some other predictors to the model, X1 is not significant anymore. Why?” The same problem also occurs in logistic regression for binary outcomes[2], log-linear regression for count data[2], and Cox proportional hazards regression for survival data.[3]

A simple answer to this question is that the univariate and multiple regression models make different assumptions. However, this answer is not very meaningful to non-statisticians. We discuss it in Section 2.

In many medical studies, regression analysis involves a large number of independent variables, or predictors. Model selection is required to find the predictors that are significantly associated with an outcome, or dependent variable, of interest. Here is how the model selection was done in a recent paper published in JAMA Surgery[4]:

“The administrative database was then evaluated by means of univariate and multivariate logistic regression. First we identified variables that were associated (P < .20) with readmission, the dependent variable. These potential confounders were then entered in multivariate stepwise (backward elimination) logistic regression, with readmission as the dependent variable. A logistic regression model was constructed to identify patient factors associated with readmission.”

This forward selection procedure, used as the first step to weed out “non-significant” predictors, has become almost the gold standard for variable selection and has been used in many papers published in top medical journals.[5-24] The key idea of this method is first to run a univariate regression on each predictor. If the p-value is less than some pre-specified level, for example 0.1, then the predictor is used in the multiple regression. Otherwise, the predictor is assumed to have no significant effect on the outcome. This method seems quite logical and intuitively meaningful. Indeed, it has been used and is still being used by the biomedical and other research communities. Is this a valid procedure?
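The screening step described above can be sketched in a few lines. This is only an illustration, not the code used in any of the cited papers: the function names `slope_p_value` and `univariate_screen`, the toy data, and the normal approximation to the t statistic are all assumptions made here for brevity.

```python
import math
import numpy as np

def slope_p_value(x, y):
    """Two-sided p-value for the slope in a simple regression of y on x.

    Assumption: a normal approximation to the t statistic is used,
    which is adequate only for moderately large samples.
    """
    n = len(y)
    xc = x - x.mean()
    slope = xc @ (y - y.mean()) / (xc @ xc)
    resid = (y - y.mean()) - slope * xc
    se = math.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return math.erfc(abs(slope / se) / math.sqrt(2.0))

def univariate_screen(X, y, level=0.1):
    """Return the column indices whose univariate p-value is below `level`."""
    return [j for j in range(X.shape[1]) if slope_p_value(X[:, j], y) < level]

# Hypothetical toy data: y depends on column 0 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
y = 2.0 * X[:, 0] + rng.standard_normal(500)
kept = univariate_screen(X, y)   # columns that would enter the multiple regression
```

Only the columns in `kept` would be carried forward to the multiple regression; the rest of this paper examines whether that filtering is justified.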

In this paper we use linear regression analysis to show two paradoxes in regression analysis. In Section 2 we use some very basic theory to show how the univariate regression and multiple regression make different assumptions on the models. We use examples and simulation studies to show two paradoxes in regression analysis in Section 3. Section 4 briefly discusses the transitivity of correlation. Our results clearly invalidate the model selection procedure widely used in biomedical research.

2. Basic theory

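Let (Y, X1, ..., Xp) be a random vector, where X1, ..., Xp are called the covariates (independent variables) and Y is called the outcome (dependent variable). The regression of Y on (X1, ..., Xp) is the conditional expectation of Y given (X1, ..., Xp), denoted by E[Y|X1, ..., Xp], which is a measurable function of (X1, ..., Xp). Denote this function by g(X1, ..., Xp). Without knowing the joint distribution of (X1, ..., Xp, Y), the form of g(X1, ..., Xp) is in general unknown. In statistical analysis, we usually assume some mathematically tractable form of g(X1, ..., Xp). For example, linear regression analysis[1] assumes that

$$E[Y \mid X_1, \ldots, X_p] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p.$$

In logistic regression analysis with a 0-1 outcome[2], we assume that

$$\log \frac{P(Y = 1 \mid X_1, \ldots, X_p)}{1 - P(Y = 1 \mid X_1, \ldots, X_p)} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p.$$

In this paper we assume the outcome Y is continuous. Let

$$\varepsilon = Y - E[Y \mid X_1, \ldots, X_p].$$

It is obvious that E[ε|X1, ..., Xp] = 0. We consider a stronger form of the linear regression model

$$Y = \beta_0 + \sum_{k=1}^{p} \beta_k X_k + \varepsilon, \tag{1}$$

and assume that, given X1, ..., Xp, the variance of ε is

$$\mathrm{Var}(\varepsilon \mid X_1, \ldots, X_p) = \sigma^2, \tag{2}$$

which does not depend on (X1, ..., Xp). This assumption is also used in most of the statistical literature on linear models.[1] We further assume that Xk, k = 1, . . . , p, have finite second moments.

From (1) we have

$$E[Y \mid X_1] = \beta_0 + \sum_{k=1}^{p} \beta_k E[X_k \mid X_1].$$

Let Zk = E[Xk|X1], k = 1, . . . , p. (It is clear that Z1 = X1.) Then the regression of Y on X1 is

$$E[Y \mid X_1] = \beta_0 + \sum_{k=1}^{p} \beta_k Z_k,$$

which still has a linear form. Let η = Y - E[Y|X1]. Then

$$Y = \beta_0 + \sum_{k=1}^{p} \beta_k Z_k + \eta. \tag{3}$$

Although (3) has the same form as (1), they are fundamentally different in the error terms. Note that E[η|X1] = 0 and Cov(Zk, η) = 0, k = 1, . . . , p. However, the conditional variance of η given X1 is

$$\mathrm{Var}(\eta \mid X_1) = \sigma^2 + \mathrm{Var}\Big(\sum_{k=2}^{p} \beta_k X_k \,\Big|\, X_1\Big).$$

Therefore, the conditional variance of η given X1 is in general no longer a constant. This violates the fundamental assumption used in the linear regression model.[1]

The univariate linear regression of Y on X1 assumes the following form of the model:

$$Y = \theta_0 + \theta_1 X_1 + e. \tag{4}$$

From (3) we know that, in general, θ1 ≠ β1.

Suppose (Yi, Xi1, ..., Xip), i = 1, . . . , n, is a random sample from (1). Let

$$\bar{X}_1 = \frac{1}{n}\sum_{i=1}^{n} X_{i1}, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$

Let $\hat\theta_1$ be the least squares estimate of the slope in the univariate regression of Yi on Xi1 in (4). Then

$$\hat\theta_1 = \frac{\sum_{i=1}^{n} (X_{i1} - \bar{X}_1)(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_{i1} - \bar{X}_1)^2}$$

and, as n → ∞,

$$\hat\theta_1 \to \frac{\mathrm{Cov}(X_1, Y)}{\mathrm{Var}(X_1)} = \beta_1 + \sum_{k=2}^{p} \beta_k \frac{\mathrm{Cov}(X_1, X_k)}{\mathrm{Var}(X_1)}. \tag{5}$$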
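The non-constant conditional variance of η given X1 noted in this section can be checked numerically. The sketch below uses a hypothetical construction that is not taken from the paper: X2 = X1 · W with W standard normal and independent of X1, so that E[X2|X1] = 0 and Var(X2|X1) = X1², and model (1) is taken with an intercept of 0 and unit coefficients on X1 and X2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x1 = rng.standard_normal(n)
x2 = x1 * rng.standard_normal(n)   # assumed construction: E[X2 | X1] = 0, Var(X2 | X1) = X1**2
eps = rng.standard_normal(n)       # constant conditional variance, as in the full model
y = x1 + x2 + eps                  # full model with intercept 0 and unit coefficients

eta = y - x1                       # eta = Y - E[Y | X1], because E[Y | X1] = X1 here

# Compare the spread of eta where |X1| is small with where |X1| is large.
near = np.abs(x1) < 0.5
far = np.abs(x1) > 1.5
var_near = eta[near].var()
var_far = eta[far].var()
```

Under this construction Var(η|X1) = 1 + X1², so `var_far` should come out several times larger than `var_near`, even though ε itself has constant variance.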

3. Two paradoxes in linear regression analysis

In this section we show why the estimates of the coefficient of a covariate in the univariate regression and in the multiple regression do not match. More specifically, we show that in some cases the estimate from the univariate regression is significant but the result from the multiple regression is not. On the other hand, in some cases the result is significant in the multiple regression but not in the univariate regression.

Suppose (1) is the true multiple regression model. The univariate regression model uses model (4), which in effect assumes that the terms βkZk, k = 2, . . . , p, in (3) are absent. This assumption is generally wrong unless E[Xk|X1] is a constant (k = 2, . . . , p). Hence, even when the multiple regression model is correct, the estimate from the univariate analysis is based on a wrong model. This is the reason why the results from univariate regression and multiple regression do not match. Furthermore, result (5) shows that there is no clear interpretation of the estimate in the univariate analysis.

We discuss two paradoxes related to univariate and multiple regressions through both theoretical derivations and simulation studies.

3.1 Significant covariate effect in multiple regression but not in univariate regression

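Let X2, X3, X4 and ε be independent random variables with standard normal distributions. Consider the following model:

$$X_1 = \beta_1 X_2 + \beta_2 X_4, \qquad Y = \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \varepsilon. \tag{6}$$

The covariance between X1 and Y is

$$\mathrm{Cov}(X_1, Y) = \alpha_1(\beta_1^2 + \beta_2^2) + \alpha_2 \beta_1,$$

which is 0 if and only if

$$\alpha_1 = -\frac{\alpha_2 \beta_1}{\beta_1^2 + \beta_2^2}. \tag{7}$$

From (5) we know that if (7) is true, the least squares estimator of the coefficient in the univariate regression of Y on X1 will not be significant, even though X1 is necessary in specifying model (6).

Example 1. Let α1 = -3/5, α2 = 3, α3 = 4, β1 = 1, β2 = 2 in (6). The true model is

$$Y = -\tfrac{3}{5} X_1 + 3 X_2 + 4 X_3 + \varepsilon, \qquad X_1 = X_2 + 2 X_4. \tag{8}$$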

Table 1 shows the simulation results for the estimates and standard deviations of the coefficient of X1 in both the univariate and multiple regressions after 10,000 replications. For a wide range of sample sizes, the least squares estimator of the coefficient of X1 in the multiple regression is very close to the true value, and the standard deviation decreases markedly with the sample size. However, the estimate of the coefficient in the univariate analysis is very close to 0 in all cases.

According to the practice in medical publications[4-24], X1 will not enter the multiple regression. Table 2 shows the least squares estimates of the coefficients of X2 and X3 after X1 is removed from (8). The estimate of the coefficient of X2 is dramatically biased in the multiple regression once X1 has been removed on the basis of the univariate analysis.
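Example 1 can be reproduced in outline with a short simulation. This is a sketch under stated assumptions, not the authors' simulation code: it takes X1 = X2 + 2·X4 and Y = -0.6·X1 + 3·X2 + 4·X3 + ε, a construction for which Cov(X1, Y) = 0 with the parameter values of Example 1, and it uses one large sample instead of 10,000 replications. The helper `ols` is a name introduced here for illustration.

```python
import numpy as np

def ols(columns, y):
    """Least squares coefficients (intercept first) of y on the given columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(2016)
n = 100_000
x2, x3, x4, eps = rng.standard_normal((4, n))
x1 = 1.0 * x2 + 2.0 * x4                     # assumed: beta1 = 1, beta2 = 2
y = -0.6 * x1 + 3.0 * x2 + 4.0 * x3 + eps    # assumed: alpha1 = -3/5, alpha2 = 3, alpha3 = 4

uni = ols([x1], y)[1]              # slope of X1 in the univariate regression
multi = ols([x1, x2, x3], y)[1]    # coefficient of X1 in the multiple regression
reduced = ols([x2, x3], y)         # multiple regression after X1 is screened out
```

Under these assumptions `uni` is close to 0 while `multi` is close to -0.6. After X1 is dropped, substituting X1 = X2 + 2·X4 gives Y = 2.4·X2 + 4·X3 - 1.2·X4 + ε, so the fitted coefficient of X2 (`reduced[1]`) settles near 2.4 rather than the true value 3.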

3.2 Significant covariate effect in univariate regression but not in multiple regression

Suppose X1, X2, X3 and ε are independent standard normal random variables, and X4 = β1X1 + β2X3, where β1 and β2 are nonzero constants.
Table 1. Estimate of the regression coefficient of X1

Table 2. Estimates of the regression coefficients of X2 and X3 with X1 being removed

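Suppose the true model is

$$Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \varepsilon. \tag{9}$$

If (9) is expanded to include X4 and the expanded model still satisfies the conditions of the linear regression, then the regression equation becomes

$$Y = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \gamma_4 X_4 + \varepsilon'. \tag{10}$$

From (9) and (10) we have

$$\alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \varepsilon = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \gamma_4 X_4 + \varepsilon',$$

or

$$(\alpha_0 - \gamma_0) + (\alpha_1 - \gamma_1) X_1 + (\alpha_2 - \gamma_2) X_2 - \gamma_4 X_4 + (\varepsilon - \varepsilon') = 0.$$

Taking conditional expectations given (X1, X2, X4) removes the error terms. Because X4 is not a linear combination of X1 and X2, the remaining identity can hold only if every coefficient is zero, so γ4 = 0: X4 has no role in the multiple regression once X1 and X2 are in the model.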

Example 2. Let α0 = 0, α1 = 1, α2 = 2 in (9) and β1 = β2 = 1. Table 3 shows the least squares estimates of the coefficient of X4 in both the univariate and multiple linear regressions after 10,000 replications. For all sample sizes, the univariate regression shows that X4 has a very significant effect on Y. However, in the multiple regression, the effect is not significant.
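Example 2 can be sketched the same way. The construction X4 = X1 + X3 used below is an assumption made for this illustration: it keeps X4 correlated with X1 while ensuring X4 is not a linear combination of X1 and X2, which the multiple regression needs in order to be estimable. The helper `ols` is again a hypothetical name, and a single large sample stands in for the 10,000 replications.

```python
import numpy as np

def ols(columns, y):
    """Least squares coefficients (intercept first) of y on the given columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(9)
n = 100_000
x1, x2, x3, eps = rng.standard_normal((4, n))
x4 = x1 + x3                   # assumed construction with beta1 = beta2 = 1
y = x1 + 2.0 * x2 + eps        # true model with alpha0 = 0, alpha1 = 1, alpha2 = 2

uni_x4 = ols([x4], y)[1]               # slope of X4 in the univariate regression
multi_x4 = ols([x1, x2, x4], y)[3]     # coefficient of X4 in the multiple regression
```

Under this assumption Cov(X4, Y) = 1 and Var(X4) = 2, so `uni_x4` is close to 0.5 and clearly nonzero, whereas `multi_x4` is close to 0.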

4. Transitivity of correlation

Another issue in regression analysis is the supposed transitivity of correlation in the interpretation of results. For example, some people may say: “Since factor A is highly correlated with outcome Y, and factor A and factor B are highly correlated, B should be correlated with Y.” It seems very intuitive and reasonable that correlation is transitive. Unfortunately, this is not true. Here is a theoretical example. Suppose X and Z are independent standard normal random variables and Y = X + Z. It is clear that the correlation between X and Y and the correlation between Y and Z are both 0.707. However, the correlation between X and Z is 0.
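The theoretical example above is easy to verify by simulation. The sketch below is only an illustration; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x + z                        # Y = X + Z with X and Z independent

corr = np.corrcoef([x, y, z])    # each row of the input is one variable
r_xy, r_yz, r_xz = corr[0, 1], corr[1, 2], corr[0, 2]
```

Here `r_xy` and `r_yz` should both be near 1/√2 ≈ 0.707, while `r_xz` should be near 0: Y is strongly correlated with both X and Z, yet X and Z are uncorrelated.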

Table 3. Estimate of the regression coefficient of X4

In our Example 2, the correlation between X4 and X1 is 0.707 and the correlation between X1 and Y is 0.408. However, we proved in Section 3.2 that X4 has no role in the multiple regression if X1 and X2 are in the model, even though X4 is not a linear combination of X1 and X2.

5. Discussion

Regression analysis in medical research usually involves many predictors (independent variables). Model selection is needed to pick the covariates that have a significant effect on the outcome. A widely used method in medical publications[4-24] is first to screen the covariates through univariate analysis. If a covariate is not significant in the univariate regression analysis, it will not enter the multiple regression analysis. The underlying assumption of this method is that a covariate is significant in the multiple regression only if it is significant in the univariate regression analysis. Our results indicate that this assumption is wrong. A covariate may be very significant in the univariate regression but have no role in the multiple regression (see Example 2 in Section 3). On the other hand, a covariate may be a necessary part of a multiple regression and yet not be correlated with the outcome (see Example 1 in Section 3). The initial univariate screening method totally ignores the correlation among covariates. There is no theoretical work to support this method. Our simulation results clearly show that the multiple regression results after the univariate screening may be dramatically biased and misleading. The biomedical community should stop using this procedure in their research and publications.

Funding

None

Conflict of interest statement

The authors report no conflict of interest related to this manuscript.

Authors’ contributions

Ge Feng and Changyong Feng: theoretical derivation and revision

Jing Peng, Dongke Tu, and Julia Z. Zheng: Simulation and manuscript drafting

References

1. Seber GAF, Lee AJ. Linear regression analysis (2nd ed). Hoboken, NJ: Wiley; 2003

2. Agresti A. Categorical data analysis (2nd ed). Hoboken, NJ: Wiley; 2002

3. Cox DR. Regression models and life-tables (with discussion). J R Stat Soc Series B. 1972; 34: 187-220. doi: http://dx.doi.org/10.2307/2985181

4. McIntyre LK, Arbabi S, Robinson EF, Maier RV. Analysis of risk factors for patient readmission 30 days following discharge from general surgery. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.1258

5. Bardia A, Sood A, Mahmood F, Orhurhu V, Mueller A, Montealegre-Gallegos M, et al. Combined epidural-general anesthesia vs general anesthesia alone for elective abdominal aortic aneurysm repair. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.2733

6. Barlesi F, Mazieres J, Merlio JP, Debieuvre D, Mosser J, Lena H, et al. Routine molecular profiling of patients with advanced non-small-cell lung cancer: results of a 1-year nationwide programme of the French Cooperative Thoracic Intergroup (IFCT). Lancet. 2016; 387: 1415-1426. doi: http://dx.doi.org/10.1016/S0140-6736(16)00004-0

7. Brooks GA, Kansagra AJ, Rao SR, Weitzman JI, Linden EA, Jacobson JO. A clinical prediction model to assess risk for chemotherapy-related hospitalization in patients initiating palliative chemotherapy. JAMA Oncology. 2015; 1(4): 441-447. doi: http://dx.doi.org/10.1001/jamaoncol.2015.0828

8. Cronin PR, DeCoste L, Kimball AB. A multivariate analysis of dermatology missed appointment predictors. JAMA Dermatology. 2013; 149(12): 1435-1437. doi: http://dx.doi.org/10.1001/jamadermatol.2013.5771

9. Fivez T, Kerklaan D, Mesotten D, Verbruggen S, Wouters PJ, Vanhorebeek I, et al. Early versus late parenteral nutrition in critically ill children. N Engl J Med. 2016; 374(12): 1111-1122. doi: http://dx.doi.org/10.1056/NEJMoa1514762

10. Geng E, Kreiswirth B, Burzynski J, Schluger NW. Clinical and radiographic correlates of primary and reactivation tuberculosis: a molecular epidemiology study. JAMA. 2005; 293(22): 2740-2745. doi: http://dx.doi.org/10.1001/jama.293.22.2740

11. Hole J, Hirsch M, Ball E, Meads C. Music as an aid for postoperative recovery in adults: a systematic review and meta-analysis. Lancet. 2015; 386: 1659-1671. doi: http://dx.doi.org/10.1016/S0140-6736(15)60169-6

12. International CLL-IPI working group. An international prognostic index for patients with chronic lymphocytic leukaemia (CLL-IPI): a meta-analysis of individual patient data. Lancet Oncology. 2016; 17(6): 779-790. doi: http://dx.doi.org/10.1016/S1470-2045(16)30029-8

13. Leon MB, Smith CR, Mack MJ, Makkar RR, Svensson LG, Kodali SK, et al. Transcatheter or surgical aortic-valve replacement in intermediate-risk patients. N Engl J Med. 2016; 374(17): 1609-1620. doi: http://dx.doi.org/10.1056/NEJMoa1514616

14. Li Y, Stocchi L, Cherla D, Liu X, Remzi FH. Association of preoperative narcotic use with postoperative complications and prolonged length of hospital stay in patients with Crohn disease. JAMA Surgery. 2016; 151(8): 726-734. doi: http://dx.doi.org/10.1001/jamasurg.2015.5558

15. Lorant V, Deliège D, Eaton W, Robert A, Philippot P, Ansseau M. Socioeconomic inequalities in depression: a meta-analysis. Am J Epidemiol. 2003; 157(2): 98-112. doi: http://dx.doi.org/10.1093/aje/kwf182

16. van der Meer AJ, Veldt BJ, Feld JJ, Wedemeyer H, Dufour JF, Lammert F, et al. Association between sustained virological response and all-cause mortality among patients with chronic hepatitis C and advanced hepatic fibrosis. JAMA. 2012; 308(24): 2584-2593. doi: http://dx.doi.org/10.1001/jama.2012.144878

17. Mingrone G, Panunzi S, De Gaetano A, Guidone C, Iaconelli A, Nanni G, et al. Bariatric-metabolic surgery versus conventional medical treatment in obese patients with type 2 diabetes: 5 year follow-up of an open-label, single-centre, randomized controlled trial. Lancet. 2015; 386: 964-973. doi: http://dx.doi.org/10.1016/S0140-6736(15)00075-6

18. Nelson KB, Ellenberg JH. Antecedents of cerebral palsy: I. Univariate analysis of risks. Am J Dis Child. 1985; 139(10): 1031-1038. doi: http://dx.doi.org/10.1001/archpedi.1985.02140120077032

19. Nelson KB, Ellenberg JH. Antecedents of cerebral palsy: Multivariate analysis of risk. N Engl J Med. 1986; 315(2): 81-86. doi: http://dx.doi.org/10.1056/NEJM198607103150202

20. NICE-SUGAR Study Investigators. Hypoglycemia and risk of death in critically ill patients. N Engl J Med. 2012; 367(12): 1108-1118. doi: http://dx.doi.org/10.1056/NEJMoa1204942

21. Pagès F, Berger A, Camus M, Sanchez-Cabo F, Costes A, Molidor R, et al. Effector memory T cells, early metastasis, and survival in colorectal cancer. N Engl J Med. 2005; 353(25): 2654-2666. doi: http://dx.doi.org/10.1056/NEJMoa051424

22. Schwed AC, Boggs MM, Pham XD, Watanabe DM, Bermudez MC, Kaji AH, et al. Association of admission laboratory values and the timing of endoscopic retrograde cholangiopancreatography with clinical outcomes in acute cholangitis. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.2329

23. Templin C, Ghadri JR, Diekmann J, Napp LC, Bataiosu DR, Jaguszewski M, et al. Clinical features and outcomes of takotsubo (stress) cardiomyopathy. N Engl J Med. 2015; 373(10): 929-938. doi: http://dx.doi.org/10.1056/NEJMoa1406761

24. Wood GC, Benotti PN, Lee CJ, Mirshahi T, Still CD, Gerhard GS, Lent MR. Evaluation of the association between preoperative clinical factors and long-term weight loss after Roux-en-Y gastric bypass. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.2334

Ge Feng is a graduate student in the School of Geophysics and Oil Resources at Yangtze University, Wuhan, Hubei, China. His research interests include statistical analysis in rock physics.

Feng G, Peng J, Tu D, Zheng JZ, Feng C

Summary: Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

[Shanghai Arch Psychiatry. 2016; 28(6): 355-360. doi: http://dx.doi.org/10.11919/j.issn.1002-0829.216084]

1School of Geophysics and Oil Resource, Yangtze University, Wuhan, China

2Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, USA

3Department of Anesthesiology, University of Rochester, Rochester, NY, USA

4School of Philosophy, Wuhan University, Wuhan, China

5Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada

*correspondence: Dr. Changyong Feng. Mailing address: Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Ave., Box 630, Rochester, NY, USA. Postcode: NY 14642. E-mail: Changyong_feng@urmc.rochester.edu
