2 Answers

Contributed 1825 experience points · Received over 4 likes
This worked for me:
import pandas as pd
# First: group df_input by child id (each group keeps all of its rows)
grouped = df_input.groupby(['id_child'], as_index=True).apply(lambda a: a[:])
# Second: create a new, empty output dataframe
OUTPUT = pd.DataFrame(columns=['id_parent', 'id_child'])
# Third: fill it with each unique child id and the minimum parent id, in case a child has more than one parent
for i, id_ch in enumerate(df_input.id_child.unique()):
    OUTPUT.loc[i] = [min(grouped.loc[id_ch].id_parent), id_ch]
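
The same "smallest parent per child" idea can probably be written without the explicit loop. This is only a sketch: it assumes df_input has exactly the two columns id_parent and id_child, and the sample rows below are copied from the example data posted in the other answer, not from the asker's real data.

```python
import pandas as pd

# Sample data assumed for illustration (same rows as the example in this thread)
df_input = pd.DataFrame({
    "id_parent": [1100, 1100, 1100, 1090, 1090, 1080],
    "id_child":  [1090, 1080, 1070, 1080, 1070, 1070],
})

# For each child id, keep the smallest parent id.
# sort=False keeps the children in order of first appearance.
result = (
    df_input.groupby("id_child", as_index=False, sort=False)["id_parent"]
    .min()[["id_parent", "id_child"]]
)
print(result)
```

On this sample it pairs 1090 with 1100, 1080 with 1090, and 1070 with 1080, which matches what the loop version produces.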

Contributed 1820 experience points · Received over 9 likes
I was able to get the result using drop_duplicates:
In [6]: df
Out[6]:
   id_parent  id_child
0       1100      1090
1       1100      1080
2       1100      1070
3       1090      1080
4       1090      1070
5       1080      1070
In [9]: df.drop_duplicates(subset=['id_parent']).reset_index(drop=True)
Out[9]:
   id_parent  id_child
0       1100      1090
1       1090      1080
2       1080      1070
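
One caveat: by default drop_duplicates keeps the first row it encounters for each id_parent (keep='first'), so the output above depends on the rows already being in a suitable order. A minimal sketch of making that explicit follows. It assumes the row wanted for each parent is the one with the largest id_child, which holds for the sample data but is an assumption about the real data.

```python
import pandas as pd

# Sample data copied from the session above
df = pd.DataFrame({
    "id_parent": [1100, 1100, 1100, 1090, 1090, 1080],
    "id_child":  [1090, 1080, 1070, 1080, 1070, 1070],
})

# Shuffle the rows to show the result does not rely on the input order
shuffled = df.sample(frac=1, random_state=0)

# Sort so the largest id_child comes first within each parent,
# then keep that first row per parent
direct = (
    shuffled.sort_values(["id_parent", "id_child"], ascending=False)
    .drop_duplicates(subset=["id_parent"], keep="first")
    .reset_index(drop=True)
)
print(direct)
```

If the input is already sorted this way, the explicit sort_values step is redundant and the one-liner above is enough.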