3 Answers

Using set.intersection after zipping the two columns, it looks like this:
[len(set(a).intersection(b)) for a, b in zip(df[0], df[1])]  # [3, 2, 2, 2, 3]
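For reference, here is a self-contained version of the one-liner above. The DataFrame is rebuilt from the sample names used elsewhere in this thread (an assumption, since the question's own setup is not shown here), and `common` is a hypothetical column name.

```python
import pandas as pd

# Sample data assumed from this thread: columns 0 and 1 hold the names to compare.
df = pd.DataFrame({0: ["AL0", "CP1", "NM3", "PK9", "RM2"],
                   1: ["AL0X24", "CXP44", "MLN", "KKRR9", "22MMRRS"]})

# Count the distinct characters that appear in both strings of each row.
df["common"] = [len(set(a).intersection(b)) for a, b in zip(df[0], df[1])]
print(df["common"].tolist())  # [3, 2, 2, 2, 3]
```

Because sets drop duplicates, this counts each shared character once, no matter how often it repeats.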

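The other solutions fail when the names being compared share a character that occurs more than once, e.g. AAL0 and AAL0X24. The result here should be 4 (A twice, plus L and 0), whereas a set-based count gives 3.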
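import pandas as pd
from collections import Counter

df = pd.DataFrame(data=[["AL0", "CP1", "NM3", "PK9", "RM2", "AAL0"],
                        ["AL0X24", "CXP44", "MLN", "KKRR9", "22MMRRS", "AAL0X24"]]).T

def num_shared_chars(char_counter1, char_counter2):
    # Each character found in both strings counts as often as it occurs
    # in the string where it appears fewer times.
    shared_chars = set(char_counter1.keys()).intersection(char_counter2.keys())
    return sum(min(char_counter1[k], char_counter2[k]) for k in shared_chars)

# Turn every cell into a Counter of its characters
# (applymap is deprecated since pandas 2.1 in favour of DataFrame.map).
df_counter = df.applymap(Counter)
df['shared_chars'] = df_counter.apply(lambda row: num_shared_chars(row[0], row[1]), axis='columns')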
Result:
      0        1  shared_chars
0   AL0   AL0X24             3
1   CP1    CXP44             2
2   NM3      MLN             2
3   PK9    KKRR9             2
4   RM2  22MMRRS             3
5  AAL0  AAL0X24             4
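A shorter variant of the same multiset idea follows. It is offered as an alternative, not as the answerer's code. `Counter` supports the `&` operator, which keeps each common character with the minimum of its two counts, so summing the values of the result gives the shared-character count directly. The helper name `shared_char_count` is hypothetical.

```python
from collections import Counter

import pandas as pd

def shared_char_count(s1, s2):
    # Counter & Counter keeps min(count1, count2) for every character in both.
    return sum((Counter(s1) & Counter(s2)).values())

df = pd.DataFrame({0: ["AL0", "CP1", "NM3", "PK9", "RM2", "AAL0"],
                   1: ["AL0X24", "CXP44", "MLN", "KKRR9", "22MMRRS", "AAL0X24"]})
df["shared_chars"] = [shared_char_count(a, b) for a, b in zip(df[0], df[1])]
print(df["shared_chars"].tolist())  # [3, 2, 2, 2, 3, 4]

# A set-based count only sees distinct characters, so it misses the second "A".
print(len(set("AAL0") & set("AAL0X24")))  # 3
```

Iterating with `zip` also avoids the intermediate DataFrame of Counters and the `applymap` deprecation.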

Sticking with the DataFrame data structure, you can do this:
>>> def count_common(s1, s2):
...     return len(set(s1) & set(s2))
...
>>> df["result"] = df.apply(lambda x: count_common(x[0], x[1]), axis=1)
>>> df
     0        1  result
0  AL0   AL0X24       3
1  CP1    CXP44       2
2  NM3      MLN       2
3  PK9    KKRR9       2
4  RM2  22MMRRS       3
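Another way to stay within pandas is `Series.combine`, which applies a two-argument function element-wise across two Series. It is still a Python-level loop, but it does not build a row Series for every call the way `apply(axis=1)` does. This alternative is not part of the original answer. `count_common` is the function defined above, repeated here so the snippet runs on its own, and the sample data is assumed.

```python
import pandas as pd

def count_common(s1, s2):
    # Number of distinct characters present in both strings.
    return len(set(s1) & set(s2))

df = pd.DataFrame({0: ["AL0", "CP1", "NM3", "PK9", "RM2"],
                   1: ["AL0X24", "CXP44", "MLN", "KKRR9", "22MMRRS"]})

# Series.combine calls count_common(df[0][i], df[1][i]) for each index label i.
df["result"] = df[0].combine(df[1], count_common)
print(df["result"].tolist())  # [3, 2, 2, 2, 3]
```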