
Shortening sentences with a Python library such as nltk

Python

拉風的咖菲貓 2023-02-15 16:51:42

I am using NLTK to remove stop words from a sentence.

For example: "I would love to fly again via American Airlines"

Result: "Love to fly American Airlines"

I have tried the following code:

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize, sent_tokenize

    # Tokenizing the text
    txt = "I love to fly with American Airlines"
    stopWords = set(stopwords.words("english"))
    words = word_tokenize(txt)

    # Creating a frequency table to keep the
    # score of each word
    freqTable = dict()
    for word in words:
        word = word.lower()
        if word in stopWords:
            continue
        if word in freqTable:
            freqTable[word] += 1
        else:
            freqTable[word] = 1

    # Creating a dictionary to keep the score
    # of each sentence
    sentences = sent_tokenize(txt)
    sentenceValue = dict()
    for sentence in sentences:
        for word, freq in freqTable.items():
            if word in sentence.lower():
                if sentence in sentenceValue:
                    sentenceValue[sentence] += freq
                else:
                    sentenceValue[sentence] = freq

    sumValues = 0
    for sentence in sentenceValue:
        sumValues += sentenceValue[sentence]

    # Average value of a sentence from the original text
    average = int(sumValues / len(sentenceValue))

    # Storing sentences into our summary.
    summary = ''
    for sentence in sentences:
        if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)):
            summary += " " + sentence
    print("Summary: " + summary)

The result is an empty string, which I think is because the sentence is too short for NLTK to work with. I am just looking into whether there is a simpler way; I plan to train a model for this.
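One likely reason for the empty output, going by the code above: with a single sentence, that sentence's score is also the total and therefore the average, so the test "score > 1.2 * average" cannot pass. The sketch below mirrors that selection rule and then shows a more direct route, filtering the tokens themselves. It is a minimal illustration, not a definitive implementation: the function names and the small stop-word set are hypothetical, chosen only to reproduce the example in the question, and a regular expression stands in for word_tokenize so the snippet runs without NLTK data. A fuller list such as NLTK's English stop words would likely remove more words.

```python
import re

# Hypothetical, hand-picked stop-word set (an assumption for illustration).
# With NLTK available, set(stopwords.words("english")) could be passed instead.
ILLUSTRATIVE_STOP_WORDS = {"i", "would", "again", "via", "with"}


def sentences_above_threshold(sentence_scores, factor=1.2):
    """Mirror the question's rule: keep sentences whose score exceeds
    factor * (integer average of all sentence scores)."""
    average = int(sum(sentence_scores.values()) / len(sentence_scores))
    return [s for s, score in sentence_scores.items() if score > factor * average]


def shorten(sentence, stop_words=ILLUSTRATIVE_STOP_WORDS):
    """Drop stop words, keep the remaining words in their original order,
    and capitalize the first character of the result."""
    tokens = re.findall(r"[A-Za-z0-9']+", sentence)  # simple word tokenizer
    kept = [t for t in tokens if t.lower() not in stop_words]
    result = " ".join(kept)
    return result[:1].upper() + result[1:]


# A lone sentence can never beat 1.2 times its own score.
print(sentences_above_threshold({"I love to fly with American Airlines": 4}))
# prints: []

print(shorten("I would love to fly again via American Airlines"))
# prints: Love to fly American Airlines
```

Filtering tokens directly avoids the sentence-scoring step altogether, which only makes sense when there are several sentences to rank against each other.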


1 Answer

米脂

Contributed 1836 experience points, received over 3 likes


Answered 2023-02-15
