
Group matching in a Pandas DataFrame

Python

函數式編程 2021-12-21 16:31:03

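I have a pandas DataFrame with two columns. The first column is the item's name, and the second holds some of its attributes, encoded as integers. An item can have multiple attributes. Here is an example:

  name                  ids
0    A          147 616 813
1    B        51 616 13 813
2    C                  776
3    D   51 671 13 813 1092
4    E  13 404 492 903 1093

There are 300 such unique attributes, encoded as integers and stored as strings in the ids column. What I want to achieve:

1) For each id, find the rows where it occurs. For example, for id 13 I would get rows 1, 3 and 4.

2) Which unique ids occur together with this id anywhere in the dataset? For example, for id 13: [51, 616, 813, 671, 1092, 404, 492, 903, 1093]

3) Once the rows are grouped for each id, how do I check whether a given id is in that group? For example, I want to check whether id 52 ever occurs together with id 13 and, if so, where and how many times.

I have been thinking about this for a long time, but I could not find an efficient way to get the first two, nor an efficient approach and data structure for 3). Please help!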


2 Answers

阿晨1998


A solution that does not use any for loops:

import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': 'A B C D E'.split(),
                   'ids': ['147 616 813', '51 616 13 813', '776',
                           '51 671 13 813 1092', '13 404 492 903 1093']})

# Every i_d passed to these functions is an int.

# Get the row indexes where an id occurs.
def rows(i_d):
    i_d = str(i_d)
    # \b word boundaries match the id only as a whole token,
    # so 13 does not match inside 813 or 130.
    pattern = r"\b" + re.escape(i_d) + r"\b"
    mask = df.ids.str.contains(pattern, regex=True)
    return df[mask].index.tolist()

# Get the other ids that occur together with the given id.
def colleagues(i_d):
    i_d = str(i_d)
    # Join the matching rows into one string and split it into unique ids.
    k = set(' '.join(df.loc[rows(i_d), 'ids']).split())
    k.discard(i_d)
    return list(k)

# Get the row indexes where two ids occur together.
def third(i_d1, i_d2):
    common_rows = np.intersect1d(rows(i_d1), rows(i_d2)).tolist()
    if len(common_rows) > 0:
        print('Occurred together at rows', common_rows)
    else:
        print("Didn't occur together")
    return common_rows
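If the lookups are repeated for many ids, scanning the column each time can get slow. A possible alternative, sketched here under the assumption that the ids are whitespace-separated strings as in the question, is to build an id-to-rows lookup once with `str.split` and `explode`. The names `exploded`, `rows_by_id` and `co_ids` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': 'A B C D E'.split(),
    'ids': ['147 616 813', '51 616 13 813', '776',
            '51 671 13 813 1092', '13 404 492 903 1093'],
})

# One entry per (row, id) pair; the original row label is kept as the index.
exploded = df['ids'].str.split().explode()

# Hypothetical lookup: id string -> list of row labels that contain it.
rows_by_id = (
    exploded.reset_index()
            .groupby('ids')['index']
            .apply(list)
            .to_dict()
)

def co_ids(i_d):
    """Hypothetical helper: ids that share at least one row with i_d."""
    rows = rows_by_id.get(str(i_d), [])
    if not rows:
        return []
    others = set(exploded.loc[rows]) - {str(i_d)}
    return sorted(others, key=int)

print(rows_by_id['13'])  # row labels containing id 13 (rows 1, 3 and 4 in the question)
print(co_ids(13))
```

The dictionary is built in one pass, so each later lookup is a plain dict access rather than a new scan of the column.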

2021-12-21

Qyouu


Here is a suggestion with three functions:

import pandas as pd

# first we create the data
data = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'],
                     'ids': ['147 616 813', '51 616 13 813', '776',
                             '51 671 13 813 1092', '13 404 492 903 1093']})

def func1(num, series):
    # num must be an int
    # series a Pandas series
    # returns the index labels of the rows where num appears as a whole id
    tx = series.apply(lambda x: str(num) in x.split())
    output_list = series.index[tx].tolist()
    return output_list

def func2(num, series):
    # num must be an int
    # series a Pandas series
    # func1 returns index labels, so select with .loc rather than .iloc
    series = series.loc[func1(num, series)]
    series = series.apply(lambda x: x.split()).tolist()
    output_list = set(item for sublist in series for item in sublist)
    # discard does not raise if num never occurs in the series
    output_list.discard(str(num))
    return list(output_list)

def func3(num1, num2, series):
    # num1 must be an int
    # num2 must be an int
    # series a Pandas series
    if str(num1) in func2(num2, series):
        num1_index = func1(num1, series)
        num2_index = func1(num2, series)
        return list(set(num1_index) & set(num2_index))
    else:
        return 'no match'

Then you can test them:

func1(13, data['ids'])
func2(13, data['ids'])
func3(13, 51, data['ids'])
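The question also asks how many times two ids occur together. A minimal sketch of one way to get both the rows and the count, assuming the same `data` frame as above: each row's ids are turned into a set once, then the pair is tested with `issubset`. The names `id_sets` and `co_occurrence` are hypothetical and exist only for this example.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'ids': ['147 616 813', '51 616 13 813', '776',
            '51 671 13 813 1092', '13 404 492 903 1093'],
})

# Turn each row's id string into a set so membership tests are exact
# (13 never matches 813).
id_sets = data['ids'].str.split().apply(set)

def co_occurrence(id_a, id_b, id_sets=id_sets):
    """Hypothetical helper: (row labels, count) where both ids appear together."""
    pair = {str(id_a), str(id_b)}
    mask = id_sets.apply(pair.issubset)
    rows = id_sets[mask].index.tolist()
    return rows, len(rows)

print(co_occurrence(13, 51))  # ([1, 3], 2)
print(co_occurrence(13, 52))  # ([], 0)
```

Returning an empty list and a count of 0 when there is no match keeps the return type the same in both cases, which can be easier to handle downstream than a string such as 'no match'.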

2021-12-21
