
Shortening sentences with a Python library such as nltk


拉風的咖菲貓 2023-02-15 16:51:42


I am using NLTK to remove stopwords from a sentence.


For example: "I would love to fly again via American Airlines"


Desired result: "Love to fly American Airlines"
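Because the goal is to drop individual words rather than whole sentences, this is plain token filtering. Below is a minimal sketch, assuming a small hand-picked stopword set (`STOPWORDS` is hypothetical, chosen only for this example) and a regex tokenizer standing in for `word_tokenize`. Note that NLTK's English stopword list contains "to", so filtering with it directly would also remove "to" from the desired output; the list usually has to be adjusted for this kind of shortening.

```python
import re

# Hypothetical, hand-picked stopword set for this example only.
# NLTK's stopwords.words("english") is far larger and also contains "to".
STOPWORDS = {"i", "would", "again", "via"}


def remove_stopwords(sentence: str, stopwords: set) -> str:
    """Drop stopword tokens and re-join the remaining words with single spaces."""
    # Crude regex tokenizer (letters and apostrophes) standing in for word_tokenize.
    tokens = re.findall(r"[A-Za-z']+", sentence)
    kept = [t for t in tokens if t.lower() not in stopwords]
    shortened = " ".join(kept)
    # Capitalize the first letter so the result reads like a sentence.
    return shortened[:1].upper() + shortened[1:]


print(remove_stopwords("I would love to fly again via American Airlines", STOPWORDS))
# Love to fly American Airlines
```

Which words survive depends entirely on the list, so in practice the set would be tuned to the domain rather than taken unchanged.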


I have tried the following code:


```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# The tokenizer models and stopword list must be downloaded once, e.g.
# nltk.download("punkt") (newer NLTK releases use "punkt_tab") and
# nltk.download("stopwords").

# Tokenizing the text
txt = "I love to fly with American Airlines"
stopWords = set(stopwords.words("english"))
words = word_tokenize(txt)

# Creating a frequency table to keep the
# score of each word
freqTable = dict()
for word in words:
    word = word.lower()
    if word in stopWords:
        continue
    if word in freqTable:
        freqTable[word] += 1
    else:
        freqTable[word] = 1

# Creating a dictionary to keep the score
# of each sentence
sentences = sent_tokenize(txt)
sentenceValue = dict()

for sentence in sentences:
    for word, freq in freqTable.items():
        # Substring check: adds the word's frequency if it appears anywhere in the sentence
        if word in sentence.lower():
            if sentence in sentenceValue:
                sentenceValue[sentence] += freq
            else:
                sentenceValue[sentence] = freq

sumValues = 0
for sentence in sentenceValue:
    sumValues += sentenceValue[sentence]

# Average value of a sentence from the original text
average = int(sumValues / len(sentenceValue))

# Storing sentences into our summary:
# keep only sentences scoring above 1.2 * average
summary = ''
for sentence in sentences:
    if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)):
        summary += " " + sentence

print("Summary: " + summary)
```

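The result is an empty string, which I assume is because the sentence is too short for NLTK to work with. I'm just researching whether there is a simpler way; I was planning to train a model for this.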
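The empty summary follows from the scoring rule rather than from NLTK itself. With only one sentence, that sentence's score is the entire total, so it equals the average, and `score > 1.2 * average` can never hold: for `"I love to fly with American Airlines"` the non-stopwords are love, fly, american and airlines, giving a score and average of 4 against a threshold of 4.8. The code also selects or drops whole sentences (extractive summarization), so it cannot remove individual words however much text it is given. The sketch below reproduces that scoring without NLTK data, assuming a hypothetical three-word stopword subset and whole-token matching in place of the original substring check (`word in sentence.lower()`, which would also count "fly" inside "butterfly"):

```python
import re

# Hypothetical stopword subset covering this example; NLTK's English
# list also contains "i", "to" and "with".
STOPWORDS = {"i", "to", "with"}

TOKEN = re.compile(r"[A-Za-z']+")


def summarize(sentences, stopwords, factor=1.2):
    """Frequency-based extractive summary mirroring the question's logic."""
    # Word frequency table over the whole text, skipping stopwords.
    freqs = {}
    for sentence in sentences:
        for token in TOKEN.findall(sentence):
            token = token.lower()
            if token not in stopwords:
                freqs[token] = freqs.get(token, 0) + 1

    # Score each sentence by summing the frequencies of the words it contains
    # (whole-token matching, unlike the substring check in the question).
    scores = {}
    for sentence in sentences:
        tokens = {t.lower() for t in TOKEN.findall(sentence)}
        scores[sentence] = sum(f for w, f in freqs.items() if w in tokens)

    average = sum(scores.values()) / len(scores)
    # Keep only sentences scoring strictly above factor * average.
    kept = [s for s in sentences if scores[s] > factor * average]
    return " ".join(kept), scores, average


summary, scores, average = summarize(["I love to fly with American Airlines"], STOPWORDS)
print(repr(summary), scores, average)
# '' {'I love to fly with American Airlines': 4} 4.0
```

Because the comparison is strict, a single sentence is never selected for any `factor` of 1 or more; the rule only discriminates once there are several sentences with different scores. For shortening one sentence word by word, filtering tokens against a stopword list is the more direct fit.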

