
Python: loop over the files in a directory, count word frequencies, and write the results to a txt file

Python

手掌心 2023-06-06 16:42:20

I cobbled together some Python that opens a text file, converts it to lowercase, removes stop words, and outputs a list of the most common words in the file:

    from collections import Counter
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = set(stopwords.words('english'))
    file1 = open("ocr.txt")
    line = file1.read()
    words = line.split()
    words = [word.lower() for word in words]
    for r in words:
        if not r in stop_words:
            appendFile = open('cleaned_output.txt','a')
            appendFile.write(" "+r)
            appendFile.close()
    with open("cleaned_output.txt") as input_file:
        count = Counter(word for line in input_file for word in line.split())
    print(count.most_common(10), file=open('test.txt','a'))

I would like to modify it to do the same thing for every file in a directory and write the results to a unique text file or as rows in a CSV. I know os.path can probably be used here, but I'm not sure how. I would really appreciate some help. Thanks in advance!


2 answers

吃雞游戲

Contributed 1829 experience points, received more than 7 upvotes

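I turned your snippet into a function that takes the path of the folder containing the input files as its argument. The code below collects every file in that folder and, for each one, writes a cleaned_output.txt and a test.txt into a newly created output directory. Each output file has the name of the input file it came from appended to its name, which makes them easier to tell apart, but you can change that to suit your needs.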

    import os
    from collections import Counter

    # Requires the NLTK stopwords corpus: nltk.download('stopwords')
    from nltk.corpus import stopwords

    path = 'input/'

    def clean_text(path):
        # Create the output directory if it does not exist yet.
        out_path = 'output'
        os.makedirs(out_path, exist_ok=True)

        # Load the stop words once rather than once per file.
        stop_words = set(stopwords.words('english'))

        # Regular files only; os.listdir returns names, so join them with path.
        files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]

        for f in files:
            # Drop the extension; str.strip('.txt') would strip characters, not the suffix.
            name = os.path.splitext(f)[0]

            with open(os.path.join(path, f)) as file1:
                words = file1.read().lower().split()

            # Keep every word that is not a stop word.
            cleaned = [w for w in words if w not in stop_words]

            # Mode 'w' so that re-running the script does not accumulate old output.
            with open(os.path.join(out_path, 'cleaned_output_{}.txt'.format(name)), 'w') as cleaned_file:
                cleaned_file.write(' '.join(cleaned))

            # Count the cleaned words and save the ten most common.
            count = Counter(cleaned)
            with open(os.path.join(out_path, 'test_{}.txt'.format(name)), 'w') as result_file:
                print(count.most_common(10), file=result_file)

    clean_text(path)
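A detail worth checking when output names are derived from input filenames: `str.strip('.txt')` removes any of the characters `.`, `t` and `x` from both ends of the string, not the `.txt` suffix. A minimal sketch of the difference, where `output_stem` is a hypothetical helper name used only for illustration:

```python
import os


def output_stem(filename):
    """Drop only the final extension from a filename (hypothetical helper)."""
    return os.path.splitext(filename)[0]


# str.strip treats its argument as a set of characters to remove.
print("text.txt".strip(".txt"))   # prints: e
# os.path.splitext separates the extension from the rest of the name.
print(output_stem("text.txt"))    # prints: text
```

A file named text.txt would therefore produce cleaned_output_e.txt with `strip`, but cleaned_output_text.txt with `os.path.splitext`.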

Is this what you were looking for?
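The question also mentions collecting the results as rows in a CSV. The following is only a sketch of one way that might look, not part of the answer above. The names `top_words`, `write_counts_csv` and `word_counts.csv` are made up for illustration, and the small `STOP_WORDS` set is a stand-in so the sketch runs without NLTK data; `set(stopwords.words('english'))` could be passed in instead.

```python
import csv
import os
from collections import Counter

# Stand-in stop-word list (an assumption, so the sketch runs without NLTK data).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def top_words(file_path, stop_words, n=10):
    """Return the n most common non-stop-words in one text file."""
    with open(file_path, encoding="utf-8") as handle:
        words = handle.read().lower().split()
    return Counter(w for w in words if w not in stop_words).most_common(n)


def write_counts_csv(input_dir, csv_path, stop_words=STOP_WORDS, n=10):
    """Write one CSV row per (file, word, count) for every file in input_dir."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "word", "count"])
        for name in sorted(os.listdir(input_dir)):
            file_path = os.path.join(input_dir, name)
            if not os.path.isfile(file_path):
                continue  # skip sub-directories
            for word, count in top_words(file_path, stop_words, n):
                writer.writerow([name, word, count])
```

Calling `write_counts_csv("input", "word_counts.csv")` would then produce a single file with a header row followed by up to ten rows per input file. The CSV should live outside the input directory, otherwise it would be picked up as an input itself.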

Answered 2023-06-06

達令說

Contributed 1821 experience points, received more than 6 upvotes

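You can use os.listdir to get everything in a directory. It returns a list of the names of all entries in the directory as strings that you can iterate over. These are bare names, not full paths, so join each one with the directory (for example with os.path.join) before opening it.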
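As a rough illustration of that approach, the sketch below turns the names from os.listdir into paths that can be opened. The function name `list_files` is an assumption made for this example.

```python
import os


def list_files(directory):
    """Return full paths of the regular files directly inside directory."""
    paths = []
    for name in sorted(os.listdir(directory)):      # bare names, e.g. "ocr.txt"
        full_path = os.path.join(directory, name)   # prepend the directory
        if os.path.isfile(full_path):               # skip sub-directories
            paths.append(full_path)
    return paths
```

Each returned path can be passed straight to `open`, so the single-file code from the question can run inside a `for file_path in list_files("input"):` loop.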

Answered 2023-06-06
