1 Answer

You can use nltk for this (df is the input DataFrame you shared):
from nltk.stem import PorterStemmer

ps = PorterStemmer()
# Reduce each word to its Porter stem, e.g. "games" -> "game"
df["Stem"] = df["Word"].apply(ps.stem)
# Sum the frequencies of all words that share a stem
res = df.groupby("Stem")["Frequency"].sum()
Output (for the data you shared):
Stem
10 6309
bad 5331
cat 4244
charact 16926
dog 17054
end 8406
feel 4833
game 52055
gameplay 6195
good 6496
graphic 4372
great 3466
kill 12279
laura 24953
like 12792
love 3059
luke 21133
never 2965
new 2963
peopl 7933
play 8420
reveng 5922
stori 20739
time 4272
Name: Frequency, dtype: int64
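
Note that Porter stems such as "charact" or "stori" are not always real words. If readable labels matter, one option is to label each stem with its most frequent original word. The following is only a sketch under assumptions: the small DataFrame is made-up sample data (not the data from the question), and the names totals and labels are introduced here for illustration.

```python
import pandas as pd
from nltk.stem import PorterStemmer

# Hypothetical sample data, standing in for the df from the question
df = pd.DataFrame({
    "Word": ["game", "games", "story", "stories", "character", "characters"],
    "Frequency": [100, 50, 30, 40, 20, 25],
})

ps = PorterStemmer()
df["Stem"] = df["Word"].apply(ps.stem)

# Total frequency per stem, as in the answer above
totals = df.groupby("Stem")["Frequency"].sum()

# For each stem, pick the original word with the highest frequency
top_rows = df.loc[df.groupby("Stem")["Frequency"].idxmax()]
labels = top_rows.set_index("Stem")["Word"]

# Combine both into one table indexed by stem
res = pd.DataFrame({"Label": labels, "Frequency": totals})
print(res)
```

Here idxmax returns, per stem, the row index of the most frequent word, so the label is always a word that actually appears in the data.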