1 Answer

collections.Counter makes this relatively simple. I use re.findall(r'[\w]+', data)
to find the words (a word here is any run of letters, digits, and underscores).
Adjust as needed.
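To make the choice of tokenizer more concrete, here is a small sketch comparing the regular expression with a plain whitespace split. The `sample` string is made up for illustration and is not from the original question.

```python
import re
from collections import Counter

# hypothetical sample text, only for illustration
sample = "Hello, world! Hello again: it's 2 o'clock."

# \w+ keeps runs of letters, digits and underscores,
# so punctuation and apostrophes act as separators
regex_words = Counter(re.findall(r'\w+', sample))

# str.split() only splits on whitespace, so punctuation stays
# attached to the word ("Hello," and "Hello" are different keys)
split_words = Counter(sample.split())

print(regex_words)
print(split_words)
```

Note that the regex splits "it's" into "it" and "s", while the whitespace split keeps it whole but also keeps trailing commas and periods. Which behavior counts as correct depends on what you want to call a word.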
import re
from collections import Counter
fn = input('Please enter the full name of the file: ')
with open(fn, 'r') as f:
    words = Counter(re.findall(r'[\w]+', f.read()))
    # use words = Counter(f.read().split()) if everything split by spaces
    # adjust regular expression depending on whether you want or don't want
    # stuff like numbers to be counted as "words"
print('Total number of words:', sum(words.values()))
# this is weighted by word occurrence, not sure whether this is correct
print('Average length of words:',
      sum(len(w) * o for w, o in words.items()) / sum(words.values()))
print('Word occurrence:', words)
# this only shows letters that actually occur. If you need all letters of
# the alphabet, you have to add the rest
print('Start letter occurrence', Counter(w[0] for w in words.elements()))
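If you do need every letter of the alphabet in the start-letter report, one possible way is sketched below. It assumes you only care about the 26 ASCII letters and want the count to be case-insensitive, which differs from the code above (that one is case-sensitive). The `words` Counter here is a made-up stand-in so the snippet runs on its own.

```python
import string
from collections import Counter

# hypothetical word counts, standing in for the `words` Counter
words = Counter({'apple': 2, 'Avocado': 1, 'banana': 3})

# count start letters weighted by occurrence; lowercasing is an
# assumption so that 'A' and 'a' are counted together
starts = Counter(w[0].lower() for w in words.elements())

# a Counter returns 0 for missing keys, so letters that never
# start a word simply show up with a count of 0
all_letters = {letter: starts[letter] for letter in string.ascii_lowercase}

print(all_letters)
```

Words that start with a digit or an underscore are still counted in `starts`, but they do not appear in `all_letters` because only `string.ascii_lowercase` is listed.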