
How do I save the results of an ngrams generator to a text file?


慕碼人8056858 2021-09-25 14:09:19
I am using nltk and Python to extract n-grams from a corpus, and I need to save the generated n-grams in a text file. I tried this code, but it produced no result:

import nltk, re, string, collections
from nltk.util import ngrams

with open("titles.txt", "r", encoding='utf-8') as file:
    text = file.read()
tokenized = text.split()
Monograms = ngrams(tokenized, 1)
MonogramFreq = collections.Counter(Monograms)
with open('output.txt', 'w') as f:
    f.write(str(MonogramFreq))

Here is a sample of titles.txt:

Joli appartement s3 aux jardins de carthage mz823
Villa 600m2 haut standing à hammamet
Hammem lif
S2 manzah 7
Terrain constructible de 252m2 clôturé
Terrain nu a gammarth
Terrain agrecole al fahes
Bureau 17 pièces
Usine 5000m2 mannouba

A simple print of MonogramFreq should give something like this:

('atelier',): 17, ('430',): 17, ('jabli',): 17, ('mall',): 17, ('palmeraies',): 17, ('r4',): 17, ('dégagée',): 17, ('fatha',): 17

But the output.txt file was not even created. I then corrected my code as follows:

import nltk, re, string, collections
from nltk.util import ngrams

with open("titles.txt", "r", encoding='utf-8') as file:
    text = file.read()
tokenized = text.split()
Threegrams = ngrams(tokenized, 3)
ThreegramFreq = collections.Counter(Threegrams)
for i in ThreegramFreq.elements():
    with open('output.txt', 'a') as w:
        w.write(str(i))
        w.close()

But I need output.txt to include the frequency of each 3-gram. How can I do that?

2 answers

慕桂英3389331


Please at least read the comments in the code:


from collections import Counter

from nltk import word_tokenize, ngrams

# Note: word_tokenize needs NLTK's Punkt tokenizer data (fetched with nltk.download).

text = '''Joli appartement s3 aux jardins de carthage mz823
Villa 600m2 haut standing à hammamet
Hammem lif
S2 manzah 7
Terrain constructible de 252m2 clôturé
Terrain nu a gammarth
Terrain agrecole al fahes
Bureau 17 pièces
Usine 5000m2 mannouba'''

# Create a Counter to track each n-gram and its count.
ngram_counters = Counter()

# Split the text into sentences.
# For now, assume '\n' delimits the sentences.
for line in text.split('\n'):
    # Update the counter with the 3-grams found in this sentence.
    ngram_counters.update(ngrams(word_tokenize(line), n=3))

# Open the output file; UTF-8 keeps accented characters intact.
with open('ngram_counts.tsv', 'w', encoding='utf-8') as fout:
    # Iterate through the counter like a dictionary.
    for ng, counts in ngram_counters.items():
        # Join the tokens of the n-gram with spaces and
        # write the count in a separate, tab-separated column.
        print(' '.join(ng) + '\t' + str(counts), file=fout)

Answered 2021-09-25