3 Answers

Contributed 1796 experience points, received over 4 likes
You can try this example to speed things up:
import pandas as pd

df1 = pd.DataFrame({'Word':['Introduction', 'database', 'country', 'search']})
df2 = pd.DataFrame({'Text':['Introduction to python', 'sql is a database', 'Introduction to python in our country', 'search for a python teacher in our country']})
# Split each text into words, one word per row, and give every word a count of 1
tmp = pd.DataFrame(df2['Text'].str.split().explode()).set_index('Text').assign(c=1)
# Sum the counts per word
tmp = tmp.groupby(tmp.index)['c'].sum()
# Keep only the words listed in df1
print(df1.merge(tmp, left_on='Word', right_on=tmp.index))
Prints:
           Word  c
0  Introduction  2
1      database  1
2       country  2
3        search  1

Contributed 1890 experience points, received over 9 likes
Use Series.str.split together with Series.explode to get a Series of words:
s = df2['Text'].str.split().explode()
# older pandas versions
# s = df2['Text'].str.split(expand=True).stack()
Then keep only the matching values with Series.isin and boolean indexing, count them with Series.value_counts, and finally attach the counts with DataFrame.join:
df1 = df1.join(s[s.isin(df1['Word'])].value_counts().rename('Count'), on='Word')
print(df1)
           Word  Count
0  Introduction      2
1      database      1
2       country      2
3        search      1
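One detail the join-based approach leaves open is what happens to a word that never appears in any text. A minimal sketch, assuming a made-up word 'missing' that is not in the original data: DataFrame.join leaves NaN for it, which can be replaced with 0 if a plain integer count is wanted.

```python
import pandas as pd

# 'missing' is a hypothetical word added for illustration; it occurs in no text
df1 = pd.DataFrame({'Word': ['Introduction', 'database', 'missing']})
df2 = pd.DataFrame({'Text': ['Introduction to python',
                             'sql is a database',
                             'Introduction to python in our country']})

# One row per whitespace-separated word
s = df2['Text'].str.split().explode()

# Count only the words that are listed in df1['Word']
counts = s[s.isin(df1['Word'])].value_counts().rename('Count')

# join leaves NaN where a word has no match; fill with 0 and restore an int dtype
result = df1.join(counts, on='Word')
result['Count'] = result['Count'].fillna(0).astype(int)
print(result)
```

Whether 0 or NaN is the better representation of "not found" depends on how the counts are used afterwards.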

Contributed 1848 experience points, received over 6 likes
Here is a simple solution:
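# Word and Text are assumed to be the two DataFrames from the question
world_count = pd.DataFrame(
    {'words': Word['Word'].tolist(),
     # Number of rows in Text whose text contains the word as a substring
     'count': [Text['Text'].str.contains(w).sum() for w in Word['Word']],
    }).rename_axis('ID')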
Output:
world_count.head()
'''
           words  count
ID
0   Introduction      2
1       database      1
2        country      2
3         search      1
'''
Step-by-step solution:
# Convert column to list
words = Word['Word'].tolist()
# Get the count
count = [Text['Text'].str.contains(w).sum() for w in words]
world_count = pd.DataFrame(
{'words': words,
'count': count,
}).rename_axis('ID')
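Note that str.contains counts rows that contain the word anywhere as a substring, while the split/explode approaches in the other answers count whole-word occurrences. A small sketch with made-up texts (not from the question) where the two numbers differ:

```python
import pandas as pd

# Hypothetical data chosen so that the two counting methods disagree
Text = pd.DataFrame({'Text': ['sql is a database', 'data data data']})

# str.contains: number of ROWS that contain the substring 'data'
# ('database' also matches, and the repeats in the second row count once)
rows_with_substring = int(Text['Text'].str.contains('data', regex=False).sum())

# split + explode: number of whole WORDS that are exactly 'data'
tokens = Text['Text'].str.split().explode()
whole_word_occurrences = int((tokens == 'data').sum())

print(rows_with_substring)     # 2
print(whole_word_occurrences)  # 3
```

On the sample data from the question both methods happen to give the same result, so it is worth checking which definition of "count" is actually needed.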
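Tip:
I suggest matching case-insensitively (for example by converting to lowercase) so that you don't miss any counts because of upper/lower case differences: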
import re
import pandas as pd

# Normalize the words once, then match them case-insensitively
words = Word['Word'].str.lower().str.strip().tolist()
world_count = pd.DataFrame(
    {'words': words,
     'count': [Text['Text'].str.contains(w, flags=re.IGNORECASE, regex=True).sum() for w in words],
    }).rename_axis('ID')
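Because regex=True treats each word as a pattern, words containing regex metacharacters are not matched literally. A possible safeguard, sketched here with hypothetical data (the word 'C++' is not in the original question), is to pass each word through re.escape first:

```python
import re
import pandas as pd

# Hypothetical data: 'C++' contains '+', which has a special meaning in regex
Word = pd.DataFrame({'Word': ['C++', 'python']})
Text = pd.DataFrame({'Text': ['I like c++', 'Python and C++', 'python only']})

words = Word['Word'].str.lower().str.strip().tolist()

# re.escape makes every character of the word match literally
count = [
    int(Text['Text'].str.contains(re.escape(w), flags=re.IGNORECASE, regex=True).sum())
    for w in words
]

world_count = pd.DataFrame({'words': words, 'count': count}).rename_axis('ID')
print(world_count)
```

If only plain words are ever searched, the escaping step changes nothing and can be left out.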