
Python - Finding unique counts of words and letters using dictionaries and tuples

慕娘9325324 2022-09-27 14:55:54
I am currently trying to write a script that reads the text in a file and counts the total number of words and the number of distinct words, lists the 10 most frequent words with their counts, and sorts the character frequencies from most to least frequent. Here is what I have so far:

import sys
import os
os.getcwd()
import string

path = ""
os.chdir(path)

#Prompt for user to input filename:
fname = input('Enter the filename: ')

try:
    fhand = open(fname)
except IOError:
    #Invalid filename error
    print('\n')
    print("Sorry, file can't be opened! Please check your spelling.")
    sys.exit()

#Initialize char counts and word counts dictionary
counts = {}
worddict = {}

#For character and word frequency count
for line in fhand:
    #Remove leading spaces
    line = line.strip()
    #Convert everything in the string to lowercase
    line = line.lower()
    #Take into account punctuation
    line = line.translate(line.maketrans('', '', string.punctuation))
    #Take into account white spaces
    line = line.translate(line.maketrans('', '', string.whitespace))
    #Take into account digits
    line = line.translate(line.maketrans('', '', string.digits))
    #Splitting line into words
    words = line.split(" ")
    for word in words:
        #Is the word already in the word dictionary?
        if word in worddict:
            #Increase by 1
            worddict[word] += 1
        else:
            #Add word to dictionary with count of 1 if not there already
            worddict[word] = 1
    #Character count
    for word in line:
        #Increase count by 1 if letter
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1

#Initialize dictionaries
lst = []
countlst = []
freqlst = []

#Count up the number of letters
for ltrs, c in counts.items():
    lst.append((c,ltrs))
    countlst.append(c)

#Sum up the count
totalcount = sum(countlst)

#Calculate the frequency in each dictionary
for ec in countlst:
    efreq = (ec/totalcount) * 100
    freqlst.append(efreq)

#Sort lists by count and percentage frequency
freqlst.sort(reverse=True)
lst.sort(reverse=True)

2 Answers

揚帆大魚

1799 experience points contributed, more than 9 likes received

line = line.translate(line.maketrans('', '', string.whitespace))

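This line deletes every whitespace character in the line, including the spaces between words, so the later split(" ") has nothing left to split on. Remove it and the script should work as expected.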
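To illustrate the point, here is a minimal sketch of what happens to a single line. The sample string and the variable names are made up for demonstration and do not come from the original script.

```python
import string

line = "The quick brown fox"

# Deleting every whitespace character fuses the words together ...
stripped = line.translate(str.maketrans("", "", string.whitespace))
# ... so splitting on a space yields a single long "word".
fused_words = stripped.split(" ")

# Leaving the whitespace in place keeps the word boundaries intact.
separate_words = line.split()

print(fused_words)
print(separate_words)
```

With the whitespace removed, the word dictionary would end up counting whole lines as words, which is why the word counts look wrong while the character counts still look plausible.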


2022-09-27
躍然一笑

1826 experience points contributed, more than 6 likes received

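Your code removes the whitespace and then splits on whitespace, which makes no sense. Since you want to extract every word from the given text, I suggest placing all the words next to each other with exactly one space between them. That means removing not only newlines, redundant whitespace, special or unwanted characters, and digits, but also control characters.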
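As a small illustration of the normalization described above, the following sketch turns raw text into lowercase words separated by single spaces. The helper name `normalize` and the sample string are hypothetical, and treating only ASCII control characters (code points below 32, plus 127) as separators is an assumption made for this example.

```python
import string


def normalize(raw: str) -> str:
    """Return lowercase text whose words are separated by single spaces."""
    # Treat ASCII control characters (newlines, tabs, NUL, ...) as separators.
    cleaned = "".join(
        " " if ord(ch) < 32 or ord(ch) == 127 else ch for ch in raw
    )
    # Drop digits and ASCII punctuation in a single translate() pass.
    cleaned = cleaned.lower().translate(
        str.maketrans("", "", string.digits + string.punctuation)
    )
    # split() with no argument discards empty strings, so runs of
    # whitespace collapse into a single space when rejoined.
    return " ".join(cleaned.split())


sample = "Hello,\tWorld!\r\n  It's 2022 \x00 already."
normalized = normalize(sample)
print(normalized)
```

Replacing control characters with a space rather than deleting them avoids gluing two words together when a newline or tab is the only thing separating them.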


This should do the trick:


import os
import string
import sys

# Change to the directory that contains the input file (placeholder path)
path = "/your/path"
os.chdir(path)


# Prompt the user for the filename
fname = input("Enter the filename: ")

try:
    fhand = open(fname)
except IOError:
    # The file is missing or cannot be read
    print("\n")
    print("Sorry, file can't be opened! Please check your spelling.")
    sys.exit()


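# Initialize the character-count and word-count dictionaries
counts = {}
worddict = {}

# Build a single line of text with the unwanted characters removed
text = fhand.read().replace("\n", " ").replace("\r", "")
fhand.close()
text = text.lower()
text = text.translate(str.maketrans("", "", string.digits))
text = text.translate(str.maketrans("", "", string.punctuation))
# Collapse every run of whitespace into a single space
text = " ".join(text.split())

# split() without an argument returns an empty list for empty text,
# so an empty file does not produce a bogus "" word
words = text.split()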


for word in words:
    # Is the word already in the word dictionary?
    if word in worddict:
        # Increase its count by 1
        worddict[word] += 1
    else:
        # First occurrence: add the word with a count of 1
        worddict[word] = 1

# Character count
for char in text:
    # Increase the count by 1 if the character was seen before
    if char in counts:
        counts[char] += 1
    else:
        counts[char] = 1


# Initialize the lists
lst = []
countlst = []
freqlst = []

# Collect (count, letter) pairs, skipping the space character
for ltrs, c in counts.items():
    if ltrs == " ":
        continue
    lst.append((c, ltrs))
    countlst.append(c)

# Total number of counted characters
totalcount = sum(countlst)

# Percentage frequency of each character
for ec in countlst:
    efreq = (ec / totalcount) * 100
    freqlst.append(efreq)

# Sort both lists from most to least frequent
freqlst.sort(reverse=True)
lst.sort(reverse=True)


# Print the 10 most frequent words with their counts
for key in sorted(worddict.keys(), key=worddict.get, reverse=True)[:10]:
    print(key, ":", worddict[key])

# Print every letter with its count and percentage frequency
# (lst holds (count, letter) tuples)
for c, ltrs in lst:
    print(ltrs, "-", c, "-", round(c / totalcount * 100, 2), "%")



2022-09-27