I want to do a groupby and sum based on an id, but still show all columns in the result.

Sample code

import pandas as pd
import numpy as np

mre = [
    ["2018-1", "Sold", 109000.0, "Appartement", 73.0, 4.0],
    ["2018-1", "Sold", 109000.0, "Appartement", "NaN", 0.0],
    ["2018-2", "Sold", 239300.0, "House", 163.0, 4.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0]
]
df = pd.DataFrame(mre)
# Set the column names
df.columns = ["_idMutation", "typeOfSearch", "price", "typeOfBuilding", "surface", "nbRoom"]
df["surface"] = df["surface"].astype(float)
print(df)

Base dataframe

  _idMutation typeOfSearch     price typeOfBuilding  surface  nbRoom
0      2018-1         Sold  109000.0    Appartement     73.0     4.0
1      2018-1         Sold  109000.0    Appartement      NaN     0.0
2      2018-2         Sold  239300.0          House    163.0     4.0
3      2018-2         Sold  239300.0          House     51.0     2.0
4      2018-2         Sold  239300.0          House     51.0     2.0

Expected result

The groupby is on _idMutation and sums surface and nbRoom, without changing the other columns. I want to display all columns, drop the duplicate _idMutation rows, and show the result of the groupby:

  _idMutation typeOfSearch     price typeOfBuilding  surface  nbRoom
0      2018-1         Sold  109000.0    Appartement     73.0     4.0
1      2018-2         Sold  239300.0          House    265.0     8.0

Current code

The solution below produces the expected result, but I have 14.6 million rows and it does not look optimized.

# Group by _idMutation and sum ["surface", "nbRoom"]
gb_df = df[["surface", "nbRoom"]].groupby(df["_idMutation"]).sum()
# Drop duplicate _idMutation rows
df.drop_duplicates(subset=["_idMutation"], inplace=True)
# Set _idMutation as the df index
df.set_index("_idMutation", inplace=True)
# Concatenate df with gb_df
df = pd.concat(
    [df[["typeOfSearch", "price", "typeOfBuilding"]], gb_df], axis=1
)
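Both the drop_duplicates step above and any "keep the first value" approach only give the right answer if typeOfSearch, price and typeOfBuilding carry a single value per _idMutation. A minimal sketch for checking that on the sample data, assuming those three are the columns meant to stay unchanged (keep_cols and is_constant are illustrative names, not part of the original code):

```python
import pandas as pd

mre = [
    ["2018-1", "Sold", 109000.0, "Appartement", 73.0, 4.0],
    ["2018-1", "Sold", 109000.0, "Appartement", "NaN", 0.0],
    ["2018-2", "Sold", 239300.0, "House", 163.0, 4.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
]
df = pd.DataFrame(mre, columns=["_idMutation", "typeOfSearch", "price",
                                "typeOfBuilding", "surface", "nbRoom"])
df["surface"] = df["surface"].astype(float)

# Columns assumed to hold one value per _idMutation (hypothetical choice).
keep_cols = ["typeOfSearch", "price", "typeOfBuilding"]

# nunique counts distinct non-null values per group; <= 1 means the column
# is constant within every group, so keeping the first row loses nothing.
is_constant = df.groupby("_idMutation")[keep_cols].nunique().le(1).all()
print(is_constant)
```

If any entry prints False, dropping duplicates would silently discard differing values for that column.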
1 Answer

BIG陽
We can pass a dictionary to GroupBy.agg to set the aggregation method for each column. In this case we only need first and sum:
dfg = df.groupby("_idMutation", as_index=False).agg({
    "typeOfSearch": "first",
    "price": "first",
    "typeOfBuilding": "first",
    "surface": "sum",
    "nbRoom": "sum"
})
  _idMutation typeOfSearch     price typeOfBuilding  surface  nbRoom
0      2018-1         Sold  109000.0    Appartement     73.0     4.0
1      2018-2         Sold  239300.0          House    265.0     8.0
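If the real frame has many more columns than the sample, the same agg dictionary can be built from df.columns instead of typed by hand. A sketch, assuming every column other than _idMutation, surface and nbRoom should take its first value (sum_cols and agg_spec are hypothetical names):

```python
import pandas as pd

mre = [
    ["2018-1", "Sold", 109000.0, "Appartement", 73.0, 4.0],
    ["2018-1", "Sold", 109000.0, "Appartement", "NaN", 0.0],
    ["2018-2", "Sold", 239300.0, "House", 163.0, 4.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
]
df = pd.DataFrame(mre, columns=["_idMutation", "typeOfSearch", "price",
                                "typeOfBuilding", "surface", "nbRoom"])
df["surface"] = df["surface"].astype(float)

# Columns to add up; everything else (except the key) keeps its first value.
sum_cols = ["surface", "nbRoom"]
agg_spec = {col: ("sum" if col in sum_cols else "first")
            for col in df.columns if col != "_idMutation"}

# Same call as in the answer, with the dictionary generated in column order.
dfg = df.groupby("_idMutation", as_index=False).agg(agg_spec)
print(dfg)
```

Note that "first" returns the first non-null value in each group and "sum" skips NaN, which is why 2018-1 ends up with a surface of 73.0.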