Today, 計(jì)量經(jīng)濟(jì)圈 introduces "confounders" and "colliders". Simply put, a confounding variable is one that affects both the policy treatment variable and the outcome variable; such variables are also called common factors. Examples abound. Suppose we study whether taking part in a job-training program raises an individual's probability of finding work. Possible confounders include, among others, age and gender, because age affects both whether an individual joins the training program and whether they find a job. To identify the policy effect, we must separate out the confounding factors, so in empirical work we control for the confounders: concretely, we include them as control variables in the regression equation. What if some confounders are unobservable? How do we then mitigate their impact on the estimated policy effect? Find proxy variables where possible, for example using education as a proxy for individual ability.

Next, 計(jì)量經(jīng)濟(jì)圈 turns to colliders. A collider is the opposite of a confounder, but it also matters greatly for estimating policy effects. "Collider" refers to collision: it is a variable that both the treatment variable and the outcome variable affect. In the example above, income may be a collider, because both participation in job training and the probability of finding work affect an individual's income. If we mistakenly add income as a control variable in the employment regression, the estimated policy effect will be wrong (collider bias). Sometimes conditioning on a collider can even make two unrelated variables appear correlated, or make negatively (positively) correlated variables appear positively (negatively) correlated. The most famous instance is Berkson's Paradox. In short: collider variables should never be put into the regression equation.

Berkson's paradox refers to two genuinely unrelated variables displaying a seemingly strong correlation. For example, suppose a school admits students who are strong either academically or athletically. All applicants take two exams: an academic exam (language, math, English) and a physical exam (running, jumping, throwing). The school admits only applicants scoring above 90 on at least one exam. So every admitted student scored above 90 on the academic exam, above 90 on the physical exam, or above 90 on both. If we now analyze the score distribution of admitted students, we find that academic and athletic performance are negatively correlated. Students with the best athletic scores (say, 100 in sports) have an average academic score of around 50 (assuming academic scores are normally distributed), while students with the worst athletic scores (say, 10 in sports) have an average academic score of about 95, since only those scoring above 90 were admitted. An analyst might therefore conclude that better athletic performance means worse academics, and vice versa. But this conclusion is clearly wrong.

Below is a concise and effective English introduction that uses DAGs (directed acyclic graphs) to distinguish confounders from colliders: a new technique for causal inference using causal diagrams.

Background

When an exposure and an outcome independently cause a third variable, that variable is termed a 'collider'. Inappropriately controlling for a collider variable, by study design or statistical analysis, results in collider bias. Controlling for a collider can induce a distorted association between the exposure and outcome, when in fact none exists. This bias predominantly occurs in observational studies.
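The school-admission example above can be checked with a short simulation (all numbers are illustrative, not from any real dataset): two independently drawn scores become clearly negatively correlated once we keep only applicants who cleared 90 on at least one exam.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Academic and athletic scores are drawn independently: true correlation is ~0.
academic = rng.normal(70, 15, n)
athletic = rng.normal(70, 15, n)

# Admission rule from the example: above 90 on at least one exam (the collider).
admitted = (academic > 90) | (athletic > 90)

r_all = np.corrcoef(academic, athletic)[0, 1]
r_admitted = np.corrcoef(academic[admitted], athletic[admitted])[0, 1]

print(f"correlation in full applicant pool: {r_all:.3f}")      # ~ 0
print(f"correlation among admitted only:   {r_admitted:.3f}")  # clearly negative
```

Selecting on the collider (admission) induces the spurious negative correlation even though the two abilities are unrelated in the applicant pool.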
Because collider bias can be induced by sampling, selection bias can sometimes be considered a form of collider bias. The diagram below contrasts bias through confounding and collider bias.

Example

A clear example of collider bias was provided by Sackett in his 1979 paper. He analysed data from 257 hospitalized individuals and detected an association between locomotor disease and respiratory disease (odds ratio 4.06). The association seemed plausible at the time – locomotor disease could lead to inactivity, which could cause respiratory disease. But Sackett repeated the analysis in a sample of 2783 individuals from the general population and found no association (odds ratio 1.06). The original analysis of hospitalized individuals was biased because both diseases caused individuals to be hospitalized. By looking only within the stratum of hospitalized individuals, Sackett had observed a distorted association. In contrast, in the general population (including a mix of hospitalized and non-hospitalized individuals), locomotor disease and respiratory disease are not associated. In 1979, Sackett termed this phenomenon "admission rate bias". With the help of causal diagrams (also known as directed acyclic graphs [DAGs]), this phenomenon can be explained by collider bias (Figure 1). In this example, locomotor disease and respiratory disease are independent causes of hospitalization – the collider (since the two arrowheads collide into hospitalization). If the collider is controlled for by study design (selection bias), a distorted association will arise between locomotor and respiratory disease. This is what we see in Sackett's 1979 example. Hypothetically, if he had statistically controlled for hospitalization in the general population dataset, he would have induced collider bias again, not through selection but through statistical adjustment.

Figure 1. A causal diagram demonstrating collider bias.
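A toy simulation of Sackett's setting (all prevalences and admission probabilities are invented for illustration) shows how conditioning on hospitalization distorts the odds ratio between two diseases that are independent in the population. Note that the direction of the induced association depends on the admission mechanism: in this simple additive model it comes out negative ("explaining away"), whereas Sackett observed a positive distortion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Two independent diseases: no causal link between them in the population.
locomotor = rng.random(n) < 0.05
respiratory = rng.random(n) < 0.05

# Each disease independently raises the probability of hospitalization (the collider).
p_hosp = 0.02 + 0.30 * locomotor + 0.30 * respiratory
hospitalized = rng.random(n) < p_hosp

def odds_ratio(a, b):
    """2x2 odds ratio between two boolean arrays."""
    n11 = np.sum(a & b)
    n10 = np.sum(a & ~b)
    n01 = np.sum(~a & b)
    n00 = np.sum(~a & ~b)
    return (n11 * n00) / (n10 * n01)

or_all = odds_ratio(locomotor, respiratory)
or_hosp = odds_ratio(locomotor[hospitalized], respiratory[hospitalized])

print(f"OR, general population: {or_all:.2f}")  # ~ 1: no association
print(f"OR, hospitalized only:  {or_hosp:.2f}")  # far from 1: distorted
```

Stratifying on the collider manufactures an association where none exists; analysing the full population recovers the null.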
Controlling for hospitalization induces a distorted association between locomotor disease and respiratory disease.

A more recent example of collider bias can be seen in the 'obesity paradox' (Figure 2). This paradox describes an apparent preventive effect of obesity on mortality in individuals with chronic conditions such as cardiovascular disease (CVD). In fact, obesity increases mortality rates in the general population. The collider bias occurs when an investigator conditions on CVD (by design or analysis), inducing a distorted association between obesity and unmeasured other factors; that induced association in turn distorts the estimated effect of obesity on mortality. Consequently, in a sample that includes only patients with CVD, obesity falsely appears to protect against mortality, whereas in the wider population (with and without CVD), obesity increases the risk of early death. There is some debate about whether collider bias completely explains the obesity paradox.

Figure 2. A causal diagram demonstrating how the obesity paradox can be explained by collider bias.

Impact

Collider bias can have major effects. In Sackett's example, collider bias inflated a null effect (unbiased odds ratio 1.06) to a positive effect (biased odds ratio 4.06). In the obesity paradox example, collider bias switched an unbiased harmful effect of obesity on mortality into a biased protective effect. This was shown in an analysis of the third US National Health and Nutrition Examination Survey (NHANES III). In the unbiased analysis, the mortality risk ratio for the entire cohort was 1.24 [95% CI = 1.11, 1.39] (harmful). In the biased analysis, the stratum-specific mortality risk ratio in patients with CVD was 0.79 [95% CI = 0.68, 0.91] (protective).
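The sign flip can be reproduced with a toy model of the DAG in Figure 2 (all parameter values are invented and do not attempt to match the NHANES III estimates): obesity mildly raises mortality, an unmeasured factor U raises it strongly, and both independently cause CVD, so stratifying on CVD makes obesity look protective.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

obesity = rng.random(n) < 0.3
u = rng.random(n) < 0.3  # unmeasured other factor (hypothetical, e.g. smoking)

# Obesity and U independently cause CVD (the collider).
p_cvd = 0.05 + 0.35 * obesity + 0.35 * u
cvd = rng.random(n) < p_cvd

# Obesity is truly (mildly) harmful; U is a much stronger cause of death.
p_death = 0.05 + 0.03 * obesity + 0.30 * u
death = rng.random(n) < p_death

def risk_ratio(exposure, outcome):
    """Risk of outcome in exposed vs unexposed."""
    return outcome[exposure].mean() / outcome[~exposure].mean()

rr_all = risk_ratio(obesity, death)
rr_cvd = risk_ratio(obesity[cvd], death[cvd])

print(f"mortality RR, whole cohort:     {rr_all:.2f}")  # > 1: harmful
print(f"mortality RR, CVD stratum only: {rr_cvd:.2f}")  # < 1: falsely protective
```

Within the CVD stratum, obese patients are less likely to carry U (either cause suffices to produce CVD), and since U dominates mortality, obesity spuriously appears protective.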
The impact of collider bias – published examples

Preventive steps

Collider bias can be prevented by carefully applying appropriate inclusion criteria – making sure that the exposure and outcome of interest do not drive inclusion or selective retention in a study. Causal diagrams (DAGs) can help identify colliders and non-colliders (or confounders). By using these techniques in the design and analysis of observational studies, researchers can identify colliders that should be left uncontrolled and confounders that should be controlled.
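On a causal diagram, a collider on a path is simply a node into which two arrowheads point, so the simplest cases can be flagged mechanically. A minimal sketch in pure Python (edges and variable names are illustrative; a full analysis should check every open path, e.g. via d-separation, rather than this local rule):

```python
# A DAG as a set of directed edges, each pointing from cause to effect.
edges = {
    ("locomotor", "hospitalization"),
    ("respiratory", "hospitalization"),
    ("age", "locomotor"),
    ("age", "respiratory"),
}

def parents(node):
    """All direct causes of a node."""
    return {a for (a, b) in edges if b == node}

def classify(node, exposure, outcome):
    """Classify a third variable relative to exposure and outcome (local rule only)."""
    if exposure in parents(node) and outcome in parents(node):
        return "collider: leave uncontrolled"
    if node in parents(exposure) and node in parents(outcome):
        return "confounder: control for it"
    return "neither: check all open paths in the full DAG"

print(classify("hospitalization", "locomotor", "respiratory"))
# collider: leave uncontrolled
print(classify("age", "locomotor", "respiratory"))
# confounder: control for it
```

This mirrors the prescription above: arrowheads colliding into a node mark a variable to leave uncontrolled, while a common cause of exposure and outcome marks one to control for.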