2 Answers

Contributor with 1816 experience points, more than 4 likes
Let's try this:
import pandas as pd

# Sample input: the comma-separated keywords end up in the frame's index
df = pd.DataFrame({'standard_supplier_name': ['ibl america', 'b.v. shie van'],
                   'index': ['aa, human, tag, bachulovius,slam, family, member, aa , human,tag',
                             'aanbrengen, looproute, bij']})
df = df.set_index('index')

# Split on commas, put one keyword per row, trim whitespace, then count each keyword
df.reset_index()\
  .set_index('standard_supplier_name')['index'].str.split(',')\
  .explode().str.strip().value_counts()
Output:
human 2
tag 2
aa 2
looproute 1
bij 1
aanbrengen 1
member 1
family 1
bachulovius 1
slam 1
Name: index, dtype: int64
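
The counts above are pooled across all suppliers. If the goal is a count per supplier instead (an assumption, since the original question is not shown here), the same split/explode/strip chain can be grouped on the supplier name before counting. A minimal sketch, where the names `tokens` and `per_supplier` and the series name `token` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "standard_supplier_name": ["ibl america", "b.v. shie van"],
    "index": ["aa, human, tag, bachulovius,slam, family, member, aa , human,tag",
              "aanbrengen, looproute, bij"],
})

# One keyword per row, indexed by supplier, with surrounding whitespace removed
tokens = (
    df.set_index("standard_supplier_name")["index"]
      .str.split(",")
      .explode()
      .str.strip()
      .rename("token")  # illustrative name for the keyword level
)

# Count each keyword separately within every supplier
per_supplier = tokens.groupby(level=0).value_counts()
print(per_supplier)
```

The result is a Series with a two-level index (supplier, keyword), so a single count can be looked up with `per_supplier.loc[("ibl america", "aa")]`.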

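Contributor with 1830 experience points, more than 3 likes
This is my best guess at what you are trying to achieve.
In the future, try to provide a dataset and a minimal reproducible example.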
import pandas as pd

desc = ["aa, bc, cd, cd, aa, bb", "xy, jk, yb"]
comp = ["abc", "xyz"]
df = pd.DataFrame({"comp": comp, "desc": desc})
# split each description into a list of tokens
df["desc"] = df.desc.str.split(", ")
# stack the tokens so that each (comp, token) pair gets its own row
stacked = pd.DataFrame(df.desc.tolist(), index=df.comp).stack().reset_index()
stacked.columns = ["comp", "drop", "token"]
# group by comp and token and count occurrences
stacked.groupby(["comp", "token"]).size().reset_index(name="count")
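
On pandas 0.25 or later, the intermediate stacked frame can probably be skipped by using `DataFrame.explode`. The following is a sketch of that variant on the same toy data, not the answerer's own method; the column name `count` and the variable `counts` are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "comp": ["abc", "xyz"],
    "desc": ["aa, bc, cd, cd, aa, bb", "xy, jk, yb"],
})

counts = (
    df.assign(token=df["desc"].str.split(", "))  # list of tokens per row
      .explode("token")                          # one row per token
      .groupby(["comp", "token"])
      .size()                                    # occurrences per (comp, token)
      .reset_index(name="count")
)
print(counts)
```

This gives one row per (comp, token) pair with its count, which should match what the stack-based version produces.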