1 Answer

If you need the first non-NaN value in each group, use GroupBy.first:
df1 = df.groupby([0, 1], as_index=False).first()
print(df1)
   0  1    2    3    4    5     6
0  x  x  1.0  5.0  7.0  4.0   9.0
1  x  y  1.0  9.0  4.0  5.0  10.0
2  y  y  4.0  4.0  4.0  4.0   4.0
3  y  z  5.0  2.0  7.0  4.0   0.0
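To see why this works, here is a minimal self-contained sketch. The input frame is made up for illustration (it is not the data from the question); the only point is that first() picks the first non-NaN value column by column, so partial rows within a group collapse into one.

```python
import numpy as np
import pandas as pd

# Hypothetical input: each (0, 1) group has its values split across two rows.
df = pd.DataFrame({
    0: ["x", "x", "y", "y"],
    1: ["x", "x", "z", "z"],
    2: [1.0, np.nan, 5.0, np.nan],
    3: [np.nan, 5.0, np.nan, 2.0],
})

# first() skips NaN in each column separately, so the two partial rows
# of a group are merged into a single complete row.
df1 = df.groupby([0, 1], as_index=False).first()
print(df1)
```

Each group ends up as one row, with every column filled from whichever row held a value.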
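If a group can contain more than one non-NaN value per column, first() keeps only the first of them and the rest of the data is lost. For example, with this input:
print(df)
   0  1     2     3     4    5     6
0  x  x  10.0   NaN   NaN  NaN   NaN
1  x  x  20.0   NaN   NaN  NaN   NaN
2  x  x   1.0   NaN   NaN  NaN   NaN
3  x  y   1.0   NaN   NaN  NaN   NaN
4  y  y   4.0   4.0   4.0  4.0   4.0
5  y  z   5.0   2.0   7.0  4.0   0.0
6  x  x   NaN   5.0   7.0  4.0   9.0
7  x  x   NaN  50.0  70.0  4.0   9.0
8  x  y   NaN   9.0   4.0  5.0  10.0
the same call returns only one row per group: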
df1 = df.groupby([0, 1], as_index=False).first()
print(df1)
   0  1     2    3    4    5     6
0  x  x  10.0  5.0  7.0  4.0   9.0
1  x  y   1.0  9.0  4.0  5.0  10.0
2  y  y   4.0  4.0  4.0  4.0   4.0
3  y  z   5.0  2.0  7.0  4.0   0.0
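If you are not sure whether your data is affected, one way to check (a sketch on made-up data, not part of the original answer) is to count the non-NaN values per group and column. Any count above 1 means first() would drop something.

```python
import numpy as np
import pandas as pd

# Hypothetical input: group (x, x) has two real values in column 2.
df = pd.DataFrame({
    0: ["x", "x", "x", "y"],
    1: ["x", "x", "x", "z"],
    2: [10.0, 20.0, np.nan, 5.0],
    3: [np.nan, np.nan, 5.0, 2.0],
})

# count() ignores NaN, so this is the number of real values
# per group and column.
counts = df.groupby([0, 1]).count()

# A group loses data under first() if any column holds more than one value.
lossy = counts.gt(1).any(axis=1)
print(counts)
print(lossy[lossy].index.tolist())
```

Here only the (x, x) group is flagged, because column 2 holds both 10.0 and 20.0 for it.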
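A possible solution uses a custom function that drops the NaN values from each column within a group and lines up what remains: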
def f(x):
    # For each column of the group, drop the NaNs and renumber the remaining values from 0
    df1 = pd.DataFrame({y: pd.Series(x[y].dropna().values) for y in x})
    return df1

df = df.set_index([0, 1]).groupby([0, 1]).apply(f).reset_index(level=2, drop=True).reset_index()
print(df)
   0  1     2     3     4    5     6
0  x  x  10.0   5.0   7.0  4.0   9.0
1  x  x  20.0  50.0  70.0  4.0   9.0
2  x  x   1.0   NaN   NaN  NaN   NaN
3  x  y   1.0   9.0   4.0  5.0  10.0
4  y  y   4.0   4.0   4.0  4.0   4.0
5  y  z   5.0   2.0   7.0  4.0   0.0
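To make the custom function easier to follow, here is a small sketch of what it does to a single group. The name compact and the sample values are only for illustration; the body is the same idea as f above.

```python
import numpy as np
import pandas as pd

def compact(group):
    # Drop NaN from every column, then rebuild the frame. Columns of
    # different lengths are aligned on a fresh 0..n-1 index, so shorter
    # columns are padded with NaN at the bottom.
    return pd.DataFrame(
        {col: pd.Series(group[col].dropna().values) for col in group}
    )

# Hypothetical single group: column 2 has three values, column 3 has two.
group = pd.DataFrame({
    2: [10.0, 20.0, 1.0, np.nan, np.nan],
    3: [np.nan, np.nan, np.nan, 5.0, 50.0],
})

out = compact(group)
print(out)
```

The five sparse rows shrink to three: the values in each column move up to the top, and only the positions that have no value left stay NaN.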