
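You can convert all the lists to sets and take their union as one final exclusion set. Then you only need to check each word for membership in that set. Something like the following should work: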
# existing code
from nltk.corpus import stopwords
days=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# need to put into lower case
months=['January','February','March', 'April','May','June','July','August','September','October','November','December']
# need to put into lower case
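# add these lines
# (the stopword corpus must be downloaded once beforehand: nltk.download('stopwords'))
stop_words = set(stopwords.words('english'))
# lowercase the day and month names so they match the lowercased words below
lowercase_days = {item.lower() for item in days}
lowercase_months = {item.lower() for item in months}
# a single set of days, months and stop words; set membership checks are O(1) on average
exclusion_set = lowercase_days.union(lowercase_months).union(stop_words)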
# now do the final check
# remove_punc is the punctuation-stripped text from your existing code
cleaned = [w for w in remove_punc.split() if w.lower() not in exclusion_set and not w.isdigit()]
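If you want to try the idea without NLTK installed, here is a minimal self-contained sketch of the same set-union approach. It assumes a tiny hand-written `STOP_WORDS` set in place of NLTK's English stop-word list, and a hypothetical `clean_words` function whose punctuation step (`str.translate` with `string.punctuation`) stands in for however `remove_punc` was produced in the original code.

```python
import string

# Hypothetical stand-in for stopwords.words('english'): a tiny hand-picked
# subset, so this sketch runs without NLTK or a corpus download.
STOP_WORDS = {"the", "on", "in", "and", "we", "a", "of"}

DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

# Build the exclusion set once; the | operator is equivalent to .union().
EXCLUSION_SET = ({d.lower() for d in DAYS}
                 | {m.lower() for m in MONTHS}
                 | STOP_WORDS)


def clean_words(text):
    """Strip punctuation, then drop stop words, day/month names and pure digits."""
    # Hypothetical remove_punc step: delete every ASCII punctuation character.
    remove_punc = text.translate(str.maketrans('', '', string.punctuation))
    return [w for w in remove_punc.split()
            if w.lower() not in EXCLUSION_SET and not w.isdigit()]


print(clean_words("We met on Monday, 12 March, in the Library."))
# prints ['met', 'Library']
```

One consequence of lowercasing everything: ordinary words that happen to be spelled like a month, such as "may" or "march", are removed as well.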