
Python - Finding unique counts of words and letters using dictionaries and tuples

Python

慕娘9325324 2022-09-27 14:55:54

I am trying to write a script that reads the text in a file and reports the total number of words, the number of distinct words, the 10 most frequent words with their counts, and the character frequencies sorted from most to least frequent. Here is what I have so far:

    import sys
    import os
    os.getcwd()
    import string
    path = ""
    os.chdir(path)
    #Prompt for user to input filename:
    fname = input('Enter the filename: ')
    try:
        fhand = open(fname)
    except IOError:
        #Invalid filename error
        print('\n')
        print("Sorry, file can't be opened! Please check your spelling.")
        sys.exit()
    #Initialize char counts and word counts dictionary
    counts = {}
    worddict = {}
    #For character and word frequency count
    for line in fhand:
        #Remove leading spaces
        line = line.strip()
        #Convert everything in the string to lowercase
        line = line.lower()
        #Take into account punctuation
        line = line.translate(line.maketrans('', '', string.punctuation))
        #Take into account white spaces
        line = line.translate(line.maketrans('', '', string.whitespace))
        #Take into account digits
        line = line.translate(line.maketrans('', '', string.digits))
        #Splitting line into words
        words = line.split(" ")
        for word in words:
            #Is the word already in the word dictionary?
            if word in worddict:
                #Increase by 1
                worddict[word] += 1
            else:
                #Add word to dictionary with count of 1 if not there already
                worddict[word] = 1
        #Character count
        for word in line:
            #Increase count by 1 if letter
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    #Initialize dictionaries
    lst = []
    countlst = []
    freqlst = []
    #Count up the number of letters
    for ltrs, c in counts.items():
        lst.append((c,ltrs))
        countlst.append(c)
    #Sum up the count
    totalcount = sum(countlst)
    #Calculate the frequency in each dictionary
    for ec in countlst:
        efreq = (ec/totalcount) * 100
        freqlst.append(efreq)
    #Sort lists by count and percentage frequency
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)


2 Answers

揚帆大魚

Contributed 1799 experience points, received over 9 likes

line = line.translate(line.maketrans('', '', string.whitespace))

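This line deletes every whitespace character in the line, including the spaces between words, so the later split(" ") has nothing left to split on. Remove it and the code should work as expected.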
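A minimal sketch of the effect, using a made-up sample line (the `sample` string is an assumption standing in for one line of the input file). `string.whitespace` contains the space character, so once it has been translated away the split finds no separator:

```python
import string

# Hypothetical sample line, standing in for one line of the input file.
sample = "the quick brown fox"

# What the question's code does: delete every whitespace character first ...
stripped = sample.translate(str.maketrans("", "", string.whitespace))
# ... so splitting on a space returns the whole line as a single "word".
words_broken = stripped.split(" ")

# Without that translate call, the split yields the individual words.
words_fixed = sample.split(" ")

print(words_broken)  # ['thequickbrownfox']
print(words_fixed)   # ['the', 'quick', 'brown', 'fox']
```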

2022-09-27

躍然一笑

Contributed 1826 experience points, received over 6 likes

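Your code removes all whitespace and then splits on spaces, which cannot work. Since you want to extract every word from the given text, I suggest laying the words out one after another with a single space between them. That means removing not only newlines, redundant spaces, special or unwanted characters, and digits, but also control characters.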
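The normalization described above could look roughly like this. It is a sketch only: the `normalize` helper and the sample string are hypothetical names introduced for illustration, and it handles whitespace-type control characters (tabs, newlines, carriage returns) but does not strip other control characters.

```python
import string

def normalize(text):
    """Lowercase text, drop digits and punctuation, collapse whitespace runs."""
    text = text.lower()
    # One translation table that deletes digits and ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.digits + string.punctuation))
    # split() with no argument splits on any run of whitespace (spaces, tabs,
    # newlines, carriage returns), so joining with " " leaves single spaces.
    return " ".join(text.split())

print(normalize("Hello,   World!\r\nIt's 2022:\tfine."))  # hello world its fine
```

Calling `split()` without an argument, rather than `split(" ")`, also avoids empty strings in the result when the input is empty or contains consecutive spaces.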

This should do the trick:

    import os
    import string
    import sys

    path = "/your/path"
    os.chdir(path)

    # Prompt for user to input filename:
    fname = input("Enter the filename: ")
    try:
        fhand = open(fname)
    except IOError:
        # Invalid filename error
        print("\n")
        print("Sorry, file can't be opened! Please check your spelling.")
        sys.exit()

    # Initialize char counts and word counts dictionary
    counts = {}
    worddict = {}

    # Create a one-liner with undesired characters removed
    text = fhand.read().replace("\n", " ").replace("\r", "")
    fhand.close()
    text = text.lower()
    text = text.translate(text.maketrans("", "", string.digits))
    text = text.translate(text.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces
    text = " ".join(text.split())

    words = text.split(" ")
    for word in words:
        # Is the word already in the word dictionary?
        if word in worddict:
            # Increase by 1
            worddict[word] += 1
        else:
            # Add word to dictionary with count of 1 if not there already
            worddict[word] = 1

    # Character count (spaces are counted here and skipped further down)
    for char in text:
        if char in counts:
            counts[char] += 1
        else:
            counts[char] = 1

    # Initialize lists
    lst = []
    countlst = []
    freqlst = []

    # Collect (count, letter) tuples and the raw counts
    for ltrs, c in counts.items():
        # skip spaces
        if ltrs == " ":
            continue
        lst.append((c, ltrs))
        countlst.append(c)

    # Sum up the count
    totalcount = sum(countlst)

    # Calculate the percentage frequency of each letter
    for ec in countlst:
        efreq = (ec / totalcount) * 100
        freqlst.append(efreq)

    # Sort lists by count and percentage frequency
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)

    # Print out the 10 most frequent words with their counts
    for key in sorted(worddict.keys(), key=worddict.get, reverse=True)[:10]:
        print(key, ":", worddict[key])

    # Print out all letters and counts; each tuple in lst is (count, letter)
    for c, ltrs in lst:
        print(ltrs, "-", c, "-", round(c / totalcount * 100, 2), "%")
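For comparison, the same statistics can be gathered more compactly with `collections.Counter`, a dictionary subclass from the standard library. This swaps in a different technique from the hand-rolled dictionaries above and is only a sketch: the `summarize` function, its `top_n` parameter, and the sample sentence are hypothetical, and the input is assumed to be already lowercased and stripped of punctuation and digits. It also reports the total and distinct word counts that the question asks for.

```python
from collections import Counter

def summarize(text, top_n=10):
    """Return word and letter statistics for already-normalized text."""
    words = text.split()
    word_counts = Counter(words)
    # Count every character except the spaces that separate words.
    letter_counts = Counter(ch for ch in text if ch != " ")
    total_letters = sum(letter_counts.values())
    # (letter, count, percentage) tuples, most frequent letter first.
    letter_freq = [
        (letter, count, round(count / total_letters * 100, 2))
        for letter, count in letter_counts.most_common()
    ]
    return {
        "total_words": len(words),
        "distinct_words": len(word_counts),
        "top_words": word_counts.most_common(top_n),
        "letter_freq": letter_freq,
    }

stats = summarize("the cat and the hat")
print(stats["total_words"], stats["distinct_words"])  # 5 4
print(stats["top_words"][0])                          # ('the', 2)
print(stats["letter_freq"][0])                        # ('t', 4, 26.67)
```

`Counter.most_common()` returns items sorted by count in descending order, which replaces the manual tuple list and `sort(reverse=True)` step.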

2022-09-27
