統(tǒng)計學_生存分析/Weibull Distribution韋布爾分布（python代碼實現(xiàn)）

半扇窗 2020-08-07

展開全文

python機器學習生物信息學系列課（博主錄制）：http://dwz.date/b9vw

測試腳本

測試數(shù)據(jù)

T is an array of durations, E is a either boolean or binary array representing whether the “death” was observed (alternatively an individual can be censored).

import lifelines

from lifelines.datasets import load_waltons

df = load_waltons() # returns a Pandas DataFrame

T = df['T']

E = df['E']

from lifelines import KaplanMeierFitter

kmf = KaplanMeierFitter()

kmf.fit(T, event_observed=E) # more succiently, kmf.fit(T,E)

kmf.survival_function_

'''

Out[7]:

KM_estimate

timeline

0.0 1.000000

6.0 0.993865

7.0 0.987730

9.0 0.969210

13.0 0.950690

15.0 0.938344

17.0 0.932170

19.0 0.913650

22.0 0.888957

26.0 0.858090

29.0 0.827224

32.0 0.821051

33.0 0.802531

36.0 0.790184

38.0 0.777837

41.0 0.734624

43.0 0.728451

45.0 0.672891

47.0 0.666661

48.0 0.616817

51.0 0.598125

'''

kmf.median_

'''

Out[8]: 56.0

'''

kmf.plot()

import lifelines

from lifelines.datasets import load_waltons

from lifelines import KaplanMeierFitter

df = load_waltons() # returns a Pandas DataFrame

kmf = KaplanMeierFitter()

T = df['T']

E = df['E']

groups = df['group']

ix = (groups == 'miR-137')

kmf.fit(T[~ix], E[~ix], label='control')

ax = kmf.plot()

kmf.fit(T[ix], E[ix], label='miR-137')

kmf.plot(ax=ax)

import numpy as np

import matplotlib.pyplot as plt

from lifelines.plotting import plot_lifetimes

from numpy.random import uniform, exponential

N = 25

current_time = 10

actual_lifetimes = np.array([[exponential(12), exponential(2)][uniform()<0.5] for i in range(N)])

observed_lifetimes = np.minimum(actual_lifetimes,current_time)

observed= actual_lifetimes < current_time

plt.xlim(0,25)

plt.vlines(10,0,30,lw=2, linestyles='--')

plt.xlabel('time')

plt.title('Births and deaths of our population, at $t=10$')

plot_lifetimes(observed_lifetimes, event_observed=observed)

print 'Observed lifetimes at time %d:\n'%(current_time), observed_lifetimes

import pandas as pd

import lifelines

from lifelines import KaplanMeierFitter

import matplotlib.pyplot as plt

data = lifelines.datasets.load_dd()

kmf = KaplanMeierFitter()

T = data['duration']

C = data['observed']

kmf.fit(T, event_observed=C )

plt.title('Survival function of political regimes')

kmf.survival_function_.plot()

kmf.plot()

kmf.median_

import pandas as pd

import lifelines

from lifelines import KaplanMeierFitter

import matplotlib.pyplot as plt

data = lifelines.datasets.load_dd()

kmf = KaplanMeierFitter()

T = data['duration']

C = data['observed']

kmf.fit(T, event_observed=C )

plt.title('Survival function of political regimes')

kmf.survival_function_.plot()

kmf.plot()

ax = plt.subplot(111)

dem = (data['democracy'] == 'Democracy')

kmf.fit(T[dem], event_observed=C[dem], label='Democratic Regimes')

kmf.plot(ax=ax, ci_force_lines=True)

kmf.fit(T[~dem], event_observed=C[~dem], label='Non-democratic Regimes')

plt.ylim(0,1);

plt.title('Lifespans of different global regimes')

kmf.plot(ax=ax, ci_force_lines=True)

應(yīng)用于保險業(yè)，病人治療，信用卡詐騙

信用卡拖欠

具體文檔

http://lifelines./en/latest/

https://wenku.baidu.com/view/577041d3a1c7aa00b52acb2e.html?from=search

https://wenku.baidu.com/view/a5adff8b89eb172ded63b7d6.html?from=search

https://github.com/thomas-haslwanter/statsintro_python/tree/master/ISP/Code_Quantlets/10_SurvivalAnalysis/lifelinesDemo

測試代碼

# -*- coding: utf-8 -*-

# Import standard packages

import numpy as np

import matplotlib.pyplot as plt

from numpy.random import uniform, exponential

import os

# additional packages

from lifelines.plotting import plot_lifetimes

import sys

sys.path.append(os.path.join('..', '..', 'Utilities'))

try:

# Import formatting commands if directory 'Utilities' is available

from ISP_mystyle import setFonts

except ImportError:

# Ensure correct performance otherwise

def setFonts(*options):

return

# Generate some dummy data

np.set_printoptions(precision=2)

N = 20

study_duration = 12

# Note: a constant dropout rate is equivalent to an exponential distribution!

actual_subscriptiontimes = np.array([[exponential(18), exponential(3)][uniform()<0.5] for i in range(N)])

observed_subscriptiontimes = np.minimum(actual_subscriptiontimes,study_duration)

observed= actual_subscriptiontimes < study_duration

# Show the data

setFonts(18)

plt.xlim(0,24)

plt.vlines(12, 0, 30, lw=2, linestyles='--')

plt.xlabel('time')

plt.title('Subscription Times, at $t=12$ months')

plot_lifetimes(observed_subscriptiontimes, event_observed=observed)

print('Observed subscription time at time %d:\n'%(study_duration), observed_subscriptiontimes)

where k > 0 is the shape parameter and > 0 is the scale parameter of the
distribution. (It is one of the rare cases where we use a shape parameter different
from skewness and kurtosis.) Its complementary cumulative distribution function is
a stretched exponential function.
If the quantity x is a “time-to-failure,” theWeibull distribution gives a distribution
for which the failure rate is proportional to a power of time. The shape parameter, k,
is that power plus one, and so this parameter can be interpreted directly as follows:
· Avalueofk < 1 indicates that the failure rate decreases over time. This happens
if there is significant “infant mortality,” or defective items failing early and the
failure rate decreasing over time as the defective items are weeded out of the
population.
· Avalueofk D 1 indicates that the failure rate is constant over time. This might
suggest random external events are causing mortality, or failure.
· Avalueofk > 1 indicates that the failure rate increases with time. This happens
if there is an “aging” process, or parts that are more likely to fail as time goes on.
An example would be products with a built-in weakness that fail soon after the
warranty expires.
In the field of materials science, the shape parameter k of a distribution of
strengths is known as the Weibull modulus.

威布爾分布（Weibull distribution），又稱韋伯分布或韋布爾分布，是可靠性分析和壽命檢驗的理論基礎(chǔ)。

威布爾分布：在可靠性工程中被廣泛應(yīng)用，尤其適用于機電類產(chǎn)品的磨損累計失效的分布形式。由于它可以利用概率值很容易地推斷出它的分布參數(shù)，被廣泛應(yīng)用于各種壽命試驗的數(shù)據(jù)處理。

隨機變量分布之一。威布爾分布(Ⅲ型極值分布)記為W(k,a,b)。
　　瑞典工程師威布爾從30年代開始研究軸承壽命，以后又研究結(jié)構(gòu)強度和疲勞等問題。他采用了“鏈式”模型來解釋結(jié)構(gòu)強度和壽命問題。這個模型假設(shè)一個結(jié)構(gòu) 是由若干小元件(設(shè)為n個)串聯(lián)而成，于是可以形象地將結(jié)構(gòu)看成是由n個環(huán)構(gòu)成的一條鏈條，其強度(或壽命)取決于最薄弱環(huán)的強度(或壽命)。單個鏈的強度(或壽命)為一隨機變量，設(shè)各環(huán)強度(或壽命)相互獨立，分布相同，則求鏈強度(或壽命)的概率分布就變成求極小值分布問題，由此給出威布爾分布函數(shù)。由于零件或結(jié)構(gòu)的疲勞強度(或壽命)也應(yīng)取決于其最弱環(huán)的強度(或壽命)，也應(yīng)能用威布爾分布描述。
　　根據(jù)1943年蘇聯(lián)格涅堅科的研究結(jié)果，不管隨機變量的原始分布如何，它的極小值的漸近分布只能有三種，而威布爾分布就是第Ⅲ種極小值分布。
　　由于威布爾分布是根據(jù)最弱環(huán)節(jié)模型或串聯(lián)模型得到的，能充分反映材料缺陷和應(yīng)力集中源對材料疲勞壽命的影響，而且具有遞增的失效率，所以，將它作為材料或零件的壽命分布模型或給定壽命下的疲勞強度模型是合適的。
　　威布爾分布有多種形式，包括一參數(shù)威布爾分布、二參數(shù)威布爾分布、三參數(shù)威布爾分布或混合威布爾分布。三參數(shù)的威布爾分布由形狀、尺度（范圍）和位置三個參數(shù)決定。其中形狀參數(shù)是最重要的參數(shù)，決定分布密度曲線的基本形狀，尺度參數(shù)起放大或縮小曲線的作用，但不影響分布的形狀。通過改變形狀參數(shù)可以表示不同階段的失效情況；也可以作為許多其他分布的近似，如，可將形狀參數(shù)設(shè)為合適的值以近似正態(tài)、對數(shù)正態(tài)、指數(shù)等分布。

二參數(shù)的威布爾分布主要用于滾動軸承的壽命試驗以及高應(yīng)力水平下的材料疲勞試驗，三參數(shù)的威布爾分布用于低應(yīng)力水平的材料及某些零件的壽命試驗，一般而言，它具有比對數(shù)正態(tài)分布更大的適用性。但是，威布爾分布參數(shù)的分析法估計較復雜，區(qū)間估計值過長，實踐中常采用概率紙估計法，從而降低了參數(shù)的估計精度．這是威布爾分布目前存在的主要缺點，也限制了它的應(yīng)用^[1]

歷史

編輯

1. 1927年，F(xiàn)réchet(1927)首先給出這一分布的定義。

2. 1933年，Rosin和Rammler在研究碎末的分布時，第一次應(yīng)用了韋伯分布（Rosin, P.; Rammler, E. (1933), 'The Laws Governing the Fineness of Powdered Coal', Journal of the Institute of Fuel 7: 29 - 36.）。

3. 1951年，瑞典工程師、數(shù)學家Waloddi Weibull（1887-1979）詳細解釋了這一分布，于是，該分布便以他的名字命名為Weibull Distribution。