
How do I convert text from text files into dictionary keys with word-frequency values?

Python

冉冉說 2022-07-12 09:50:06

I am trying to extract information from four different text files, each containing several keywords. I want to extract these keywords and attach a word frequency to each one. The text files look like this:

test1 = apple banana lemon
test2 = apple banana
test3 = lemon apple lemon
test4 = apple lemon grape

I think the code in bold (the second paragraph) is the problem; I am not sure how the initial dictionary should be built.

test1= [line.rstrip('\n') for line in open("test1.txt")]
test2= [line.rstrip('\n') for line in open("test2.txt")]
test3= [line.rstrip('\n') for line in open("test3.txt")]
test4= [line.rstrip('\n') for line in open("test4.txt")]

**text_file = test1, test2, test3, test4
word_frequencies = 0
text_collection = {}**

def dictionary(text):
    keywords = re.split(r'\W', text)
    print(text)
    word_frequencies = dict()
    for word in keyword:
        if word in word_frequences:
            word_frequences[word] += 1
        else:
            word_frequencies[word] = 1
    return word_frequencies

for all in text_file:
    file = open(all)
    text = file.read()
    print(file)
    text_collection[all] = dictionary(text)

print(text_collection)

Expected output:

{'test1.txt': {'apple': 1, 'banana': 1, 'lemon': 1},
'test2.txt': {'apple': 1, 'banana': 1},
'test3.txt': {'apple': 1, 'lemon': 2},
'test4.txt': {'apple': 1, 'lemon': 1, 'grape': 1}}

I would rather not use imported libraries in the answer. This code is more for practice than for efficiency :)


1 Answer

慕沐林林


The code below is reused, with minor modifications, from "Efficiently count word frequency in python".

from collections import Counter
from itertools import chain
import pprint


def file_word_counts(filename):
    """Word count of a file."""
    # Use collections.Counter to count words, then convert the
    # result to a regular dict, i.e. dict(Counter(...))
    with open(filename) as f:
        return dict(Counter(chain.from_iterable(map(str.split, f))))


def file_counts(files):
    """Aggregate the word counts of multiple files into a dictionary."""
    return {filename: file_word_counts(filename) for filename in files}


# Show results
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(file_counts(['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt']))

Output

{   'test1.txt': {'apple': 1, 'banana': 1, 'lemon': 1},
    'test2.txt': {'apple': 1, 'banana': 1},
    'test3.txt': {'apple': 1, 'lemon': 2},
    'test4.txt': {'apple': 1, 'grape': 1, 'lemon': 1}}
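To make the one-liner inside file_word_counts easier to follow, here is a minimal sketch that unrolls it step by step on in-memory lines instead of a file. The function name count_words and the sample lines are made up for illustration; the words mirror test3.txt from the question, split over two lines.

```python
from collections import Counter
from itertools import chain


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    # map(str.split, lines) yields one list of words per line
    words_per_line = map(str.split, lines)
    # chain.from_iterable flattens those lists into a single stream of words
    all_words = chain.from_iterable(words_per_line)
    # Counter tallies each word; dict() turns the result into a plain dict
    return dict(Counter(all_words))


# Hypothetical sample: the words of test3.txt spread over two lines
lines = ["lemon apple\n", "lemon\n"]
print(count_words(lines))  # {'lemon': 2, 'apple': 1}
```

Iterating over an open file yields its lines one at a time, which is why the original version can pass the file object f directly where this sketch passes a list.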

Alternative

This produces the same result without importing any modules:

def file_word_counts(filename):
    """Word count of a file."""
    count_ = {}
    with open(filename) as f:
        for line in f:
            # split() with no argument splits on any whitespace,
            # including the trailing newline
            for word in line.split():
                # Start unseen words at 0, then increment
                count_.setdefault(word, 0)
                count_[word] += 1
    return count_


def file_counts(files):
    """Aggregate the word counts of multiple files into a dictionary."""
    return {filename: file_word_counts(filename) for filename in files}


print(file_counts(['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt']))
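For comparison, the code from the question can probably be repaired with a few small changes. The sketch below is one possible fix, not the only one. It assumes the four files contain exactly the words listed in the question and, to stay self-contained, first writes them to a temporary directory; the samples dict, the collect helper, and the use of os and tempfile are illustration-only additions, and the counting itself needs no imports. The changes are: iterate over file names rather than lists of already-read lines, correct the misspelled names keyword and word_frequences, and use str.split() instead of re.split(r'\W', ...), since re was never imported and that pattern can yield empty strings (for example after a trailing newline).

```python
import os
import tempfile


def word_frequencies(text):
    """Return a dict mapping each word in text to its number of occurrences."""
    frequencies = {}
    for word in text.split():  # no argument: split on any run of whitespace
        if word in frequencies:
            frequencies[word] += 1
        else:
            frequencies[word] = 1
    return frequencies


def collect(directory, filenames):
    """Build {filename: {word: count}} for the given files in directory."""
    text_collection = {}
    for name in filenames:
        with open(os.path.join(directory, name)) as f:
            text_collection[name] = word_frequencies(f.read())
    return text_collection


# Setup for illustration only: recreate the sample files from the question
samples = {
    "test1.txt": "apple banana lemon\n",
    "test2.txt": "apple banana\n",
    "test3.txt": "lemon apple lemon\n",
    "test4.txt": "apple lemon grape\n",
}
with tempfile.TemporaryDirectory() as tmp:
    for name, content in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(content)
    result = collect(tmp, list(samples))

print(result)
```

If the files already sit in the working directory, the temporary-directory setup can be dropped and open(name) used directly, which leaves the counting code free of imports as the question requested.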

Answered 2022-07-12
