python機器學習生物信息學系列課(博主錄制):http://dwz.date/b9vw
測試腳本 測試數(shù)據(jù) T is an array of durations, E is a either boolean or binary array representing whether the “death” was observed (alternatively an individual can be censored).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | import lifelines
from lifelines.datasets import load_waltons
df = load_waltons() # returns a Pandas DataFrame
T = df[ 'T' ]
E = df[ 'E' ]
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
kmf.fit(T, event_observed = E) # more succiently, kmf.fit(T,E)
kmf.survival_function_
'''
Out[7]:
KM_estimate
timeline
0.0 1.000000
6.0 0.993865
7.0 0.987730
9.0 0.969210
13.0 0.950690
15.0 0.938344
17.0 0.932170
19.0 0.913650
22.0 0.888957
26.0 0.858090
29.0 0.827224
32.0 0.821051
33.0 0.802531
36.0 0.790184
38.0 0.777837
41.0 0.734624
43.0 0.728451
45.0 0.672891
47.0 0.666661
48.0 0.616817
51.0 0.598125
'''
kmf.median_
'''
Out[8]: 56.0
'''
kmf.plot()
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | import lifelines
from lifelines.datasets import load_waltons
from lifelines import KaplanMeierFitter
df = load_waltons() # returns a Pandas DataFrame
kmf = KaplanMeierFitter()
T = df[ 'T' ]
E = df[ 'E' ]
groups = df[ 'group' ]
ix = (groups = = 'miR-137' )
kmf.fit(T[~ix], E[~ix], label = 'control' )
ax = kmf.plot()
kmf.fit(T[ix], E[ix], label = 'miR-137' )
kmf.plot(ax = ax)
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import numpy as np
import matplotlib.pyplot as plt
from lifelines.plotting import plot_lifetimes
from numpy.random import uniform, exponential
N = 25
current_time = 10
actual_lifetimes = np.array([[exponential( 12 ), exponential( 2 )][uniform()< 0.5 ] for i in range (N)])
observed_lifetimes = np.minimum(actual_lifetimes,current_time)
observed = actual_lifetimes < current_time
plt.xlim( 0 , 25 )
plt.vlines( 10 , 0 , 30 ,lw = 2 , linestyles = '--' )
plt.xlabel( 'time' )
plt.title( 'Births and deaths of our population, at $t=10$' )
plot_lifetimes(observed_lifetimes, event_observed = observed)
print 'Observed lifetimes at time %d:\n' % (current_time), observed_lifetimes
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import pandas as pd
import lifelines
from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt
data = lifelines.datasets.load_dd()
kmf = KaplanMeierFitter()
T = data[ 'duration' ]
C = data[ 'observed' ]
kmf.fit(T, event_observed = C )
plt.title( 'Survival function of political regimes' )
kmf.survival_function_.plot()
kmf.plot()
kmf.median_
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | import pandas as pd
import lifelines
from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt
data = lifelines.datasets.load_dd()
kmf = KaplanMeierFitter()
T = data[ 'duration' ]
C = data[ 'observed' ]
kmf.fit(T, event_observed = C )
plt.title( 'Survival function of political regimes' )
kmf.survival_function_.plot()
kmf.plot()
ax = plt.subplot( 111 )
dem = (data[ 'democracy' ] = = 'Democracy' )
kmf.fit(T[dem], event_observed = C[dem], label = 'Democratic Regimes' )
kmf.plot(ax = ax, ci_force_lines = True )
kmf.fit(T[~dem], event_observed = C[~dem], label = 'Non-democratic Regimes' )
plt.ylim( 0 , 1 );
plt.title( 'Lifespans of different global regimes' )
kmf.plot(ax = ax, ci_force_lines = True )
|
應(yīng)用于保險業(yè),病人治療,信用卡詐騙 信用卡拖欠
具體文檔 http://lifelines./en/latest/ https://wenku.baidu.com/view/577041d3a1c7aa00b52acb2e.html?from=search https://wenku.baidu.com/view/a5adff8b89eb172ded63b7d6.html?from=search https://github.com/thomas-haslwanter/statsintro_python/tree/master/ISP/Code_Quantlets/10_SurvivalAnalysis/lifelinesDemo 測試代碼 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | # -*- coding: utf-8 -*-
# Import standard packages
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import uniform, exponential
import os
# additional packages
from lifelines.plotting import plot_lifetimes
import sys
sys.path.append(os.path.join( '..' , '..' , 'Utilities' ))
try :
# Import formatting commands if directory 'Utilities' is available
from ISP_mystyle import setFonts
except ImportError:
# Ensure correct performance otherwise
def setFonts( * options):
return
# Generate some dummy data
np.set_printoptions(precision = 2 )
N = 20
study_duration = 12
# Note: a constant dropout rate is equivalent to an exponential distribution!
actual_subscriptiontimes = np.array([[exponential( 18 ), exponential( 3 )][uniform()< 0.5 ] for i in range (N)])
observed_subscriptiontimes = np.minimum(actual_subscriptiontimes,study_duration)
observed = actual_subscriptiontimes < study_duration
# Show the data
setFonts( 18 )
plt.xlim( 0 , 24 )
plt.vlines( 12 , 0 , 30 , lw = 2 , linestyles = '--' )
plt.xlabel( 'time' )
plt.title( 'Subscription Times, at $t=12$ months' )
plot_lifetimes(observed_subscriptiontimes, event_observed = observed)
print ( 'Observed subscription time at time %d:\n' % (study_duration), observed_subscriptiontimes)
|
where k > 0 is the shape parameter and > 0 is the scale parameter of the distribution. (It is one of the rare cases where we use a shape parameter different from skewness and kurtosis.) Its complementary cumulative distribution function is a stretched exponential function. If the quantity x is a “time-to-failure,” theWeibull distribution gives a distribution for which the failure rate is proportional to a power of time. The shape parameter, k, is that power plus one, and so this parameter can be interpreted directly as follows: · Avalueofk < 1 indicates that the failure rate decreases over time. This happens if there is significant “infant mortality,” or defective items failing early and the failure rate decreasing over time as the defective items are weeded out of the population. · Avalueofk D 1 indicates that the failure rate is constant over time. This might suggest random external events are causing mortality, or failure. · Avalueofk > 1 indicates that the failure rate increases with time. This happens if there is an “aging” process, or parts that are more likely to fail as time goes on. An example would be products with a built-in weakness that fail soon after the warranty expires. In the field of materials science, the shape parameter k of a distribution of strengths is known as the Weibull modulus. 威布爾分布(Weibull distribution),又稱韋伯分布或韋布爾分布,是 可靠性分析和壽命檢驗的理論基礎(chǔ)。 威布爾分布:在可靠性工程中被廣泛應(yīng)用,尤其適用于機電類產(chǎn)品的磨損累計失效的分布形式。由于它可以利用概率值很容易地推斷出它的分布參數(shù),被廣泛應(yīng)用于各種壽命試驗的數(shù)據(jù)處理。 隨機變量分布之一。威布爾分布(Ⅲ型 極值分布)記為W(k,a,b)。
瑞典工程師威布爾從30年代開始研究軸承壽命,以后又研究結(jié)構(gòu)強度和疲勞等問題。他采用了“鏈式”模型來解釋結(jié)構(gòu)強度和壽命問題。這個模型假設(shè)一個結(jié)構(gòu)
是由若干小元件(設(shè)為n個)串聯(lián)而成,于是可以形象地將結(jié)構(gòu)看成是由n個環(huán)構(gòu)成的一條鏈條,其強度(或壽命)取決于最薄弱環(huán)的強度(或壽命)。單個鏈的強
度(或壽命)為一隨機變量,設(shè)各環(huán)強度(或壽命)相互獨立,分布相同,則求鏈強度(或壽命)的概率分布就變成求極小值分布問題,由此給出威布爾分布函數(shù)。
由于零件或結(jié)構(gòu)的疲勞強度(或壽命)也應(yīng)取決于其最弱環(huán)的強度(或壽命),也應(yīng)能用威布爾分布描述。 根據(jù)1943年蘇聯(lián)格涅堅科的研究結(jié)果,不管隨機變量的原始分布如何,它的極小值的漸近分布只能有三種,而威布爾分布就是第Ⅲ種極小值分布。 由于威布爾分布是根據(jù)最弱環(huán)節(jié)模型或串聯(lián)模型得到的,能充分反映材料缺陷和應(yīng)力集中源對材料疲勞壽命的影響,而且具有遞增的失效率,所以,將它作為材料或零件的壽命分布模型或給定壽命下的疲勞強度模型是合適的。
威布爾分布有多種形式,包括一參數(shù)威布爾分布、二參數(shù)威布爾分布、三參數(shù)威布爾分布或混合威布爾分布。三參數(shù)的威布爾分布由形狀、尺度(范圍)和位置三
個參數(shù)決定。其中形狀參數(shù)是最重要的參數(shù),決定分布密度曲線的基本形狀,尺度參數(shù)起放大或縮小曲線的作用,但不影響分布的形狀。通過改變形狀參數(shù)可以表示
不同階段的失效情況;也可以作為許多其他分布的近似,如,可將形狀參數(shù)設(shè)為合適的值以近似正態(tài)、對數(shù)正態(tài)、指數(shù)等分布。 二參數(shù)的威布爾分布主要用于滾動軸承的壽命試驗以及高應(yīng)力水平下的材料疲
勞試驗,三參數(shù)的威布爾分布用于低應(yīng)力水平的材料及某些零件的壽命試驗,一般而言,它具有比對數(shù)正態(tài)分布更大的適用性。但是,威布爾分布參數(shù)的分析法估計
較復雜,區(qū)間估計值過長,實踐中常采用概率紙估計法,從而降低了參數(shù)的估計精度.這是威布爾分布目前存在的主要缺點,也限制了它的應(yīng)用 [1] 歷史編輯1. 1927年,F(xiàn)réchet(1927)首先給出這一分布的定義。 2. 1933年,Rosin和Rammler在研究碎末的分布時,第一次應(yīng)用了韋伯分布(Rosin, P.; Rammler, E. (1933), 'The Laws Governing the Fineness of Powdered Coal', Journal of the Institute of Fuel 7: 29 - 36.)。 3. 1951年,瑞典工程師、數(shù)學家Waloddi Weibull(1887-1979)詳細解釋了這一分布,于是,該分布便以他的名字命名為Weibull Distribution。 應(yīng)用編輯2.工業(yè)制造 研究生產(chǎn)過程和運輸時間關(guān)系 4.預(yù)測天氣 6.雷達系統(tǒng) 對接受到的雜波信號的依分布建模 7.擬合度 無線通信技術(shù)中,相對指數(shù)衰減頻道模型,Weibull衰減模型對衰減頻道建模有較好的 擬合度8.量化壽險模型的重復索賠 9.預(yù)測技術(shù)變革 10.風速 由于曲線形狀與現(xiàn)實狀況很匹配,被用來描述風速的分布 4
|