
How do I convert text from text files into dictionary keys with word-frequency values?

冉冉說 2022-07-12 09:50:06
I am trying to extract information from four different text files, each containing several keywords. I want to extract these keywords and attach a word frequency to each one. The text files look like this:

test1 = apple banana lemon
test2 = apple banana
test3 = lemon apple lemon
test4 = apple lemon grape

I think there is a problem with the code in bold (the second paragraph), and I am not sure how I should build the initial dictionary.

test1= [line.rstrip('\n') for line in open("test1.txt")]
test2= [line.rstrip('\n') for line in open("test2.txt")]
test3= [line.rstrip('\n') for line in open("test3.txt")]
test4= [line.rstrip('\n') for line in open("test4.txt")]

**text_file = test1, test2, test3, test4
word_frequencies = 0
text_collection = {}**

def dictionary(text):
    keywords = re.split(r'\W', text)
    print(text)
    word_frequencies = dict()
    for word in keyword:
        if word in word_frequences:
            word_frequences[word] += 1
        else:
            word_frequencies[word] = 1
    return word_frequencies

for all in text_file:
    file = open(all)
    text = file.read()
    print(file)
    text_collection[all] = dictionary(text)

print(text_collection)

Expected output:

{'test1.txt': {'apple': 1, 'banana': 1, 'lemon': 1},
'test2.txt': {'apple': 1, 'banana': 1},
'test3.txt': {'apple': 1, 'lemon': 2},
'test4.txt': {'apple': 1, 'lemon': 1, 'grape': 1}}

I would rather not use imported libraries in the answer. This code is more for practice than for efficiency :)

1 Answer

慕沐林林


This reuses the code from "Efficiently count word frequency in python", with minor modifications.


from collections import Counter
from itertools import chain
import pprint


def file_word_counts(filename):
    """Return a dict mapping each word in the file to its count."""
    # Use collections.Counter to count the words, then convert the
    # result to a regular dict, i.e. dict(Counter(...)).
    with open(filename) as f:
        return dict(Counter(chain.from_iterable(map(str.split, f))))


def file_counts(files):
    """Aggregate the word counts of multiple files into one dictionary."""
    return {filename: file_word_counts(filename) for filename in files}


# Show the results
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(file_counts(['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt']))

Output


{   'test1.txt': {'apple': 1, 'banana': 1, 'lemon': 1},
    'test2.txt': {'apple': 1, 'banana': 1},
    'test3.txt': {'apple': 1, 'lemon': 2},
    'test4.txt': {'apple': 1, 'grape': 1, 'lemon': 1}}
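
The one-line expression inside file_word_counts packs three steps together: split each line into words, flatten the per-line lists into one stream, and tally the stream. The sketch below spells those steps out on an in-memory list instead of a file. The name sample_lines and its contents are made up for illustration and are not part of the original answer.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for the lines an open file would yield.
sample_lines = ["lemon apple lemon\n", "apple grape\n"]

# Step 1: split every line on whitespace, giving one word list per line.
per_line = list(map(str.split, sample_lines))

# Step 2: flatten the per-line lists into a single sequence of words.
words = list(chain.from_iterable(per_line))

# Step 3: tally the words and convert the Counter to a plain dict.
counts = dict(Counter(words))

print(per_line)
print(words)
print(counts)
```

Calling str.split with no separator also discards the trailing newline, which is why no explicit rstrip is needed here.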

Alternative

This produces the same result without importing any modules.


def file_word_counts(filename):
    """Return a dict mapping each word in the file to its count."""
    count_ = {}
    with open(filename) as f:
        for line in f:
            for word in line.split():
                # Start an unseen word at 0, then increment its count.
                count_.setdefault(word, 0)
                count_[word] += 1
    return count_


def file_counts(files):
    """Aggregate the word counts of multiple files into one dictionary."""
    return {filename: file_word_counts(filename) for filename in files}


print(file_counts(['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt']))
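
For comparison, the code in the question can also be repaired while keeping its overall shape. As posted, it appears to fail for several reasons: text_file holds lists of lines rather than filenames, so open(all) cannot work; keyword and word_frequences are misspelled; re is not imported in the snippet shown; and re.split(r'\W', text) would yield an empty string at each newline. The following is only a sketch of one possible fix, not the original answerer's code. The directory parameter and the temporary-directory demo are assumptions added so the example runs on its own; the counting itself needs no imports.

```python
import os
import tempfile


def word_frequencies(text):
    """Count how often each whitespace-separated word occurs in text."""
    frequencies = {}
    for word in text.split():
        # dict.get returns 0 for a word that has not been seen yet.
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies


def build_collection(filenames, directory="."):
    """Map each filename to the word frequencies of that file.

    `directory` is a hypothetical convenience parameter for the demo.
    """
    collection = {}
    for name in filenames:
        with open(os.path.join(directory, name)) as f:
            collection[name] = word_frequencies(f.read())
    return collection


# Demo only: recreate the four sample files from the question.
samples = {
    "test1.txt": "apple banana lemon",
    "test2.txt": "apple banana",
    "test3.txt": "lemon apple lemon",
    "test4.txt": "apple lemon grape",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text + "\n")
    text_collection = build_collection(list(samples), directory=tmp)

print(text_collection)
```

With the files already in the working directory, the call reduces to build_collection(['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt']).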


Answered 2022-07-12