透視表 pivot table
透視表(pivot table)是常見的數(shù)據(jù)匯總工具,它根據(jù)一個(gè)或多個(gè)鍵對(duì)數(shù)據(jù)進(jìn)行聚合,根據(jù)行和列上的分組鍵將數(shù)據(jù)分配到矩形區(qū)域中。 pandas中使用pivot_table方法創(chuàng)建透視表, pd.pivot_table(data,values=None,index=None,columns=None,aggfunc='mean' ,fill_value=None,margins=False,dropna=True,margins_name='ALL') data:DataFrame對(duì)象 values:要聚合的列或列的列表 index:數(shù)據(jù)透視的index,從原始數(shù)據(jù)的列中篩選 columns:數(shù)據(jù)透視表的columns,從原始數(shù)據(jù)的列中篩選 aggfunc:用于聚合的函數(shù),默認(rèn)為numpy,mean,支持numpy計(jì)算方法 ------------原數(shù)據(jù)------------ date key values 0 2019-01-01 a 6.372699 1 2019-01-02 b 0.649605 2 2019-01-03 c 4.897285 3 2019-01-01 d 7.758373 4 2019-01-02 a 1.576888 5 2019-01-03 b 8.217029 6 2019-01-01 c 5.454403 7 2019-01-02 a 5.072132 8 2019-01-03 b 2.875602 ------------透視表1------------
print(pd.pivot_table(df,values = 'values',index = ['date'],columns='key',aggfunc=np.sum)) key a b c d date 2019-01-01 6.372699 NaN 5.454403 7.758373 2019-01-02 6.649020 0.649605 NaN NaN 2019-01-03 NaN 11.092630 4.897285 NaN ------------透視表2------------
#分別以date,key共同做數(shù)據(jù)透視,值為values:統(tǒng)計(jì)不同(date,key)情況下values的計(jì)數(shù)
#aggfunc=len(或者count):計(jì)數(shù)
print(pd.pivot_table(df,values = 'values',index = ['date','key'],aggfunc=len)) values date key 2019-01-01 a 1.0 c 1.0 d 1.0 2019-01-02 a 2.0 b 1.0 2019-01-03 b 2.0 c 1.0
交叉表:crosstab
默認(rèn)情況下,crosstab計(jì)算因子的頻率,比如用于str的數(shù)據(jù)透視分析
pd.crosstab(index,columns,values=None,rownames=None ,colnames=None,aggfunc=None,margins=False,dropna=True,normalize=False)
------------原數(shù)據(jù)------------ A B C 0 1 3 1.0 1 2 3 1.0 2 2 4 NaN 3 2 4 1.0 4 2 4 1.0 ------------交叉表1------------
print(pd.crosstab(df['A'],df['B'])) B 3 4 A 1 1 0 2 1 3 ------------交叉表2------------
print(pd.crosstab(df['A'],df['B'],normalize=True))#以頻率的方式顯示 B 3 4 A 1 0.2 0.0 2 0.2 0.6 ------------交叉表3------------
print(pd.crosstab(df['A'],df['B'],values=df['C'],aggfunc=np.sum))
#values:根據(jù)因子聚合的值數(shù)組
#aggfunc:如果未傳遞values數(shù)組,則計(jì)算頻率表,如果傳遞數(shù)組,則按照指定計(jì)算
#這里相當(dāng)于以A和B界定分組,計(jì)算出每組中第三個(gè)系列C的值 B 3 4 A 1 1.0 NaN 2 1.0 2.0 ------------交叉表4------------
print(pd.crosstab(df['A'],df['B'],values=df['C'],aggfunc=np.sum,margins=True))
#margins:布爾值,默認(rèn)值False,添加行/列邊距(小計(jì)) B 3 4 All A 1 1.0 NaN 1.0 2 1.0 2.0 3.0 All 2.0 2.0 4.0
REF
https://blog.csdn.net/pythoncsdn111/article/details/98240358
|