I am using NLTK to remove stop words from a sentence.
For example: "I would love to fly again via American Airlines"
Expected result: "Love to fly American Airlines"
I tried the following code:
# Requires the NLTK stopwords corpus and Punkt tokenizer data (see nltk.download())
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# Tokenizing the text
txt = "I love to fly with American Airlines"
stopWords = set(stopwords.words("english"))
words = word_tokenize(txt)

# Creating a frequency table to keep the
# score of each word (stop words are skipped)
freqTable = dict()
for word in words:
    word = word.lower()
    if word in stopWords:
        continue
    if word in freqTable:
        freqTable[word] += 1
    else:
        freqTable[word] = 1

# Creating a dictionary to keep the score
# of each sentence
sentences = sent_tokenize(txt)
sentenceValue = dict()
for sentence in sentences:
    for word, freq in freqTable.items():
        if word in sentence.lower():
            if sentence in sentenceValue:
                sentenceValue[sentence] += freq
            else:
                sentenceValue[sentence] = freq

sumValues = 0
for sentence in sentenceValue:
    sumValues += sentenceValue[sentence]

# Average value of a sentence from the original text
average = int(sumValues / len(sentenceValue))

# Storing sentences into our summary.
# With a single sentence, its score equals the average, so it can
# never exceed 1.2 * average and the summary stays empty.
summary = ''
for sentence in sentences:
    if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)):
        summary += " " + sentence
print("Summary: " + summary)
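The result is an empty string, which I assume is because the sentence is too short for NLTK to work with. I am just looking into whether there is a simpler way; I am planning to train a model for this.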
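For comparison, plain stop-word removal needs no sentence scoring at all: filter the tokens and join what is left. The following is only a minimal sketch under stated assumptions. `STOP_WORDS` is a small hand-written set chosen for this example (a hypothetical stand-in for `set(stopwords.words("english"))`), `remove_stop_words` is an illustrative helper name, and a simple regex stands in for `word_tokenize` so the snippet runs without any NLTK data.

```python
import re

# Hypothetical stop-word set for this example only; in practice this
# could be replaced with set(stopwords.words("english")) from NLTK.
STOP_WORDS = {"i", "would", "again", "via"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return text with every token found in stop_words dropped."""
    # Simple regex tokenizer (an assumption; stands in for word_tokenize).
    tokens = re.findall(r"[A-Za-z']+", text)
    # Compare in lower case, but keep the original casing of kept tokens.
    kept = [token for token in tokens if token.lower() not in stop_words]
    return " ".join(kept)


result = remove_stop_words("I would love to fly again via American Airlines")
print(result)
```

Note that NLTK's English list contains "to", so swapping it in would also drop that word; which words count as stop words depends on the list you supply.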