廣義線性模型

醫(yī)學(xué)數(shù)據(jù)科學(xué) 2020-09-28

展開全文

廣義線性模型

廣義線性模型使用glm（）函數(shù)進(jìn)行擬合。glm函數(shù)的形式是

glm（formula ，family = familytype （link = linkfunction ），data =）

家庭	默認(rèn)鏈接功能
二項(xiàng)式	（link =“l(fā)ogit”）
高斯	（link =“身份”）
伽瑪	（link =“inverse”）
inverse.gaussian分布	（鏈接=“1 / mu ^ 2”）
泊松	（link =“l(fā)og”）
準(zhǔn)	（link =“identity”，variance =“constant”）
quasibinomial	（link =“l(fā)ogit”）
quasipoisson	（link =“l(fā)og”）

查看幫助（glm）了解其他建模選項(xiàng)。查看每個(gè)家庭的其他允許鏈接功能的幫助（家庭）。這里將介紹三種廣義線性模型的亞型：邏輯回歸，泊松回歸和生存分析。

Logistic回歸

當(dāng)您從一組連續(xù)預(yù)測(cè)變量中預(yù)測(cè)二元結(jié)果時(shí)，Logistic回歸很有用。由于其限制性較小的假設(shè)，它通常優(yōu)于判別函數(shù)分析。

# Logistic Regression # where F is a binary factor and # x1-x3 are continuous predictors fit <- glm(F~x1+x2+x3,data=mydata,family=binomial()) summary(fit) # display results confint(fit) # 95% CI for the coefficients exp(coef(fit)) # exponentiated coefficients exp(confint(fit)) # 95% CI for exponentiated coefficients predict(fit, type="response") # predicted values residuals(fit, type="deviance") # residuals

您可以使用anova （fit1 ，fit2 ，test =“Chisq”）來(lái)比較嵌套模型。此外，cdplot（?F ?X ，數(shù)據(jù)= MYDATA ）將顯示二元結(jié)果的條件密度情節(jié)?F在連續(xù)X變量。

點(diǎn)擊查看

泊松回歸

當(dāng)預(yù)測(cè)表示來(lái)自一組連續(xù)預(yù)測(cè)變量的計(jì)數(shù)的結(jié)果變量時(shí)，泊松回歸很有用。

# Poisson Regression # where count is a count and # x1-x3 are continuous predictors fit <- glm(count ~ x1+x2+x3, data=mydata, family=poisson()) summary(fit) display results
如果你有過(guò)度分散（看剩余偏差是否比自由度大得多），你可能想用quasipoisson（）而不是poisson（）。

生存分析

生存分析（也稱為事件歷史分析或可靠性分析）涵蓋了用于對(duì)事件時(shí)間建模的一組技術(shù)。數(shù)據(jù)可能被正確審查 – 該事件可能在研究結(jié)束時(shí)未發(fā)生，或者我們可能對(duì)觀察信息的信息不完整，但知道在某段時(shí)間內(nèi)沒(méi)有發(fā)生事件（例如，參與者在一周內(nèi)輟學(xué)10但當(dāng)時(shí)還活著）。

雖然廣義線性模型通常使用glm（）函數(shù)進(jìn)行分析，但生存分析通常是使用生存包中的函數(shù)進(jìn)行的。生存包可以處理一個(gè)和兩個(gè)樣本問(wèn)題，參數(shù)加速故障模型和Cox比例風(fēng)險(xiǎn)模型。

通常以格式開始時(shí)間，停止時(shí)間和狀態(tài)輸入數(shù)據(jù)（1 =發(fā)生事件，0 =事件未發(fā)生）。或者，數(shù)據(jù)格式可以是事件和狀態(tài)的時(shí)間（1 =發(fā)生事件，0 =事件未發(fā)生）。狀態(tài)= 0表示觀察結(jié)果正確。在進(jìn)一步分析之前，數(shù)據(jù)通過(guò)Surv（）函數(shù)綁定到Surv對(duì)象中。

survfit（）用于估計(jì)一個(gè)或多個(gè)組的生存分布。
survdiff（）測(cè)試兩個(gè)或更多組之間生存分布的差異。
coxph（）模擬一組預(yù)測(cè)變量的危險(xiǎn)函數(shù)。

# Mayo Clinic Lung Cancer Data library(survival) # learn about the dataset help(lung) # create a Surv object survobj <- with(lung, Surv(time,status)) # Plot survival distribution of the total sample # Kaplan-Meier estimator fit0 <- survfit(survobj~1, data=lung) summary(fit0) plot(fit0, xlab="Survival Time in Days", ylab="% Surviving", yscale=100, main="Survival Distribution (Overall)") # Compare the survival distributions of men and women fit1 <- survfit(survobj~sex,data=lung) # plot the survival distributions by sex plot(fit1, xlab="Survival Time in Days", ylab="% Surviving", yscale=100, col=c("red","blue"), main="Survival Distributions by Gender") legend("topright", title="Gender", c("Male", "Female"), fill=c("red", "blue")) # test for difference between male and female # survival curves (logrank test) survdiff(survobj~sex, data=lung) # predict male survival from age and medical scores MaleMod <- coxph(survobj~age+ph.ecog+ph.karno+pat.karno, data=lung, subset=sex==1) # display results MaleMod # evaluate the proportional hazards assumption cox.zph(MaleMod)

點(diǎn)擊查看

有關(guān)更多信息，請(qǐng)參閱Thomas Lumley 關(guān)于生存包的 R新聞文章。其他很好的資源包括麥周的使用R軟件進(jìn)行生存分析和模擬以及MJ克勞利關(guān)于生存分析的章節(jié)。

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購(gòu)買等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來(lái)自：醫(yī)學(xué)數(shù)據(jù)科學(xué) > 《數(shù)據(jù)科學(xué)》

舉報(bào)/認(rèn)領(lǐng)