Use DataFrame.swaplevel together with DataFrame.sort_index. A reindex step is also added so that every city/district/year combination is present:
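import pandas as pd

# df is the DataFrame from the question, with columns city, district, value, year
rng = pd.date_range('2015', '2017', freq='YS').year
c = df['city'].unique()
d = df['district'].unique()
# every city/district/year combination, so missing combinations become NaN rows
mux = pd.MultiIndex.from_product([c, d, rng], names=['city','district','year'])
df = df.set_index(['city','district','year']).reindex(mux)
# year-over-year change within each city/district group
df['pct'] = df.sort_values('year').groupby(['city', 'district'])['value'].pct_change()
df = df.pivot_table(columns='year',
                    index=['city','district'],
                    values=['value', 'pct'],
                    fill_value='NaN')
# move year to the top column level, then sort the columns by year
df = df.swaplevel(0,1, axis=1).sort_index(axis=1, level=0)
print (df)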
year          2015        2016        2017
               pct value   pct value   pct value
city district
bj   c         NaN   4.0   0.0   NaN -0.25     3
sh   a         NaN   2.0   0.5     3  0.00   NaN
     b         NaN   5.0  -0.4     3  0.00   NaN
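The last step (swaplevel followed by sort_index) may be easier to follow in isolation. The sketch below uses a small made-up frame named demo, whose columns are shaped like pivot_table output; the numbers in it are placeholders, not data from the question.

```python
import pandas as pd

# Hypothetical columns shaped like the pivot_table result: (measure, year)
cols = pd.MultiIndex.from_product([['pct', 'value'], [2015, 2016]],
                                  names=[None, 'year'])
demo = pd.DataFrame([[0.1, 0.2, 1, 2]], columns=cols)

# swaplevel puts 'year' on top; sort_index then groups each year's
# pct/value pair next to each other.
swapped = demo.swaplevel(0, 1, axis=1).sort_index(axis=1, level=0)
print(swapped.columns.tolist())
```

Without the sort_index call the columns would stay in their original order, with the years interleaved (2015, 2016, 2015, 2016).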
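EDIT: The error:
ValueError: cannot handle a non-unique multi-index!
means that some combinations of the key columns, here ['city','district','year'], occur more than once, so the MultiIndex built from them is not unique and cannot be reindexed. The solution is to make the combinations unique first, for example by aggregating with the mean: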
print (df)
#  city district  value  year
#0   sh        a      2  2015
#0   sh        a     20  2015
#1   sh        a      3  2016
#2   sh        b      5  2015
#3   sh        b      3  2016
#4   bj        c      4  2015
#5   bj        c      3  2017
rng = pd.date_range('2015', '2017', freq='YS').year
c = df['city'].unique()
d = df['district'].unique()
mux = pd.MultiIndex.from_product([c, d, rng], names=['city','district','year'])
print (df.groupby(['city','district','year'])['value'].mean())
city  district  year
bj    c         2015     4
                2017     3
sh    a         2015    11
                2016     3
      b         2015     5
                2016     3
Name: value, dtype: int64
df = df.groupby(['city','district','year'])['value'].mean().reindex(mux)
print (df)
#city  district  year
#sh    a         2015    11.0
#                2016     3.0
#                2017     NaN
#      b         2015     5.0
#                2016     3.0
#                2017     NaN
#      c         2015     NaN
#                2016     NaN
#                2017     NaN
#bj    a         2015     NaN
#                2016     NaN
#                2017     NaN
#      b         2015     NaN
#                2016     NaN
#                2017     NaN
#      c         2015     4.0
#                2016     NaN
#                2017     3.0
#Name: value, dtype: float64
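Before choosing an aggregation, it can help to see which rows actually collide. The following is a minimal sketch that rebuilds the sample data shown above (with a default integer index rather than the original one) and lists the duplicated key combinations with DataFrame.duplicated. The names keys, dupes and agg are only illustrative.

```python
import pandas as pd

# Sample data copied from the printed frame above
df = pd.DataFrame({
    'city':     ['sh', 'sh', 'sh', 'sh', 'sh', 'bj', 'bj'],
    'district': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'value':    [2, 20, 3, 5, 3, 4, 3],
    'year':     [2015, 2015, 2016, 2015, 2016, 2015, 2017],
})
keys = ['city', 'district', 'year']

# keep=False marks every row of a duplicated key, not only the repeats
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)

# After aggregating, each key combination occurs once, so reindex is safe
agg = df.groupby(keys)['value'].mean()
print(agg.index.is_unique)
```

Whether mean is the right aggregation depends on the data; sum, first or last could be used in the same place.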