
Problem combining my for loop with yield

Python

慕絲7291255 2021-12-09 18:27:45

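I have a program that joins words that are split by an asterisk. It removes the asterisk and concatenates the first part of the word (before the asterisk) with the second part (after the asterisk). It works well except for one major problem: the second part (after the asterisk) still appears in the output. For example, the program joins ['presi', '*', 'dent'], but 'dent' is still in the output. I cannot figure out what is wrong with my code. Here it is:

    from collections import defaultdict
    import nltk
    from nltk.tokenize import word_tokenize
    import re
    import os
    import sys
    from pathlib import Path

    def main():
        while True:
            try:
                file_to_open = Path(input("\nPlease, insert your file path: "))
                with open(file_to_open) as f:
                    words = word_tokenize(f.read().lower())
                    break
            except FileNotFoundError:
                print("\nFile not found. Better try again")
            except IsADirectoryError:
                print("\nIncorrect Directory path.Try again")

        word_separator = '*'

        with open ('Fr-dictionary2.txt') as fr:
            dic = word_tokenize(fr.read().lower())

        def join_asterisk(ary):
            for w1, w2, w3 in zip(words, words[1:], words[2:]):
                if w2 == word_separator:
                    word = w1 + w3
                    yield (word, word in dic)
                elif w1 != word_separator and w1 in dic:
                    yield (w1, True)

        correct_words = []
        incorrect_words = []
        correct_words = [w for w, correct in join_asterisk(words) if correct]
        incorrect_words = [w for w, correct in join_asterisk(words) if not correct]
        text=' '.join(correct_words)

Could anyone help me spot the error here?

Example input:

The commitments of the President* of the Republic are also those of the leaders of the railway company, he argued before various officials at the Grand-Est meeting at the Elysee Palace. On July 1, 2017, the President of the Republic Emmanuel Macron (right) with the head of SNCF, Guillaume Pepy, at Montparnasse station in Paris. GEOFFROY VAN DER HASSELT / AFP. SNCF users are sometimes annoyed by cancelled trains or service disruptions, and this seems to affect the President of the Republic too. As part of the great debate, Emmanuel Macron made very harsh remarks on Tuesday, February 26, before elected officials at the Elysee Palace, about SNCF, which closed the Saint-Dié - Epinal line on December 23, 2018, whereas the head of state had promised during a trip to the Vosges in April 2018 that it would continue to operate.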


2 answers

慕的地8271018


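Those extra words are (I assume) both in your dictionary. Two iterations of the for loop after the join, the second half of the split word becomes w1, so it is yielded a second time by this branch: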

    elif w1 != word_separator and w1 in dic:
        yield (w1, True)
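The effect described above can be reproduced with a minimal sketch. The function body is the asker's generator; the token list, the toy dictionary `dic`, and the name `join_asterisk_buggy` are assumptions made up for illustration. Note that the sliding window built with `zip` also never reaches the last two tokens as `w1`, so they are missing from the output.

```python
def join_asterisk_buggy(words, dic, word_separator="*"):
    # Same sliding-window logic as in the question, with the
    # dictionary and separator passed in so the sketch is self-contained.
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        if w2 == word_separator:
            word = w1 + w3
            yield (word, word in dic)
        elif w1 != word_separator and w1 in dic:
            yield (w1, True)

# Hypothetical sample data, not taken from the asker's files.
words = ["du", "presi", "*", "dent", "de", "la", "france"]
dic = {"du", "president", "dent", "de", "la", "france"}

result = list(join_asterisk_buggy(words, dic))
print(result)
# [('du', True), ('president', True), ('dent', True), ('de', True)]
```

'dent' shows up again because it is a dictionary word in its own right once the window has moved past the asterisk, and 'la' and 'france' are lost because `zip` stops at the shortest of the three slices.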

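Redesigning your join_asterisk function looks like the best approach, because any attempt to patch the current one so that it skips these words would be very clumsy.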

Here is one way to redesign the function so that it skips a word that has already been used as the second half of a word split by "*":

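    incorrect_words = []

    def join_asterisk(array):
        # word_separator and dic come from the enclosing scope, as in the question.
        # Pad with two empty strings so ary[i+1] and ary[i+2] always exist.
        ary = array + ['', '']
        i, size = 0, len(ary)
        while i < size - 2:
            if ary[i+1] == word_separator:
                if ary[i] + ary[i+2] in dic:
                    yield ary[i] + ary[i+2]
                else:
                    incorrect_words.append(ary[i] + ary[i+2])
                # Skip the separator and the second half of the word.
                i+=2
            elif ary[i] in dic:
                yield ary[i]
            i+=1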

If you want it to stay closer to your original function, it can be modified to:

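    def join_asterisk(array):
        # Pad with two empty strings so ary[i+1] and ary[i+2] always exist.
        ary = array + ['', '']
        i, size = 0, len(ary)
        while i < size - 2:
            if ary[i+1] == word_separator:
                concat_word = ary[i] + ary[i+2]
                yield (concat_word, concat_word in dic)
                # Skip the separator and the second half of the word.
                i+=2
            else:
                yield (ary[i], ary[i] in dic)
            i+=1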
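As a usage sketch for the tuple-yielding version: the question runs the generator twice, once per list, but a single pass can fill both lists. Here `dic` and `word_separator` are passed as parameters rather than read from the enclosing scope, `dic` is a set instead of the list returned by `word_tokenize` (an assumption, chosen for faster membership tests), and the sample tokens are made up.

```python
def join_asterisk(array, dic, word_separator="*"):
    # Pad with two empty strings so ary[i + 1] and ary[i + 2] always exist.
    ary = array + ["", ""]
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i + 1] == word_separator:
            concat_word = ary[i] + ary[i + 2]
            yield (concat_word, concat_word in dic)
            i += 2  # skip the separator and the second half
        else:
            yield (ary[i], ary[i] in dic)
        i += 1

# Hypothetical sample data; 'xyzzy' stands in for a word missing from the dictionary.
words = ["du", "presi", "*", "dent", "de", "la", "xyzzy"]
dic = {"du", "president", "dent", "de", "la"}

correct_words, incorrect_words = [], []
for w, ok in join_asterisk(words, dic):
    (correct_words if ok else incorrect_words).append(w)

print(correct_words)    # ['du', 'president', 'de', 'la']
print(incorrect_words)  # ['xyzzy']
```

With the asker's real data, `dic = set(word_tokenize(fr.read().lower()))` would be one way to build the set, assuming the dictionary file fits in memory.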

Answered 2021-12-09

撒科打諢


I think this alternative implementation of join_asterisk does what you intend:

    def join_asterisk(words, word_separator):
        if not words:
            return
        # Whether the previous word was a separator
        prev_sep = (words[0] == word_separator)
        # Next word to yield
        current = words[0] if not prev_sep else ''
        # Iterate words
        for word in words[1:]:
            # Skip separator
            if word == word_separator:
                prev_sep = True
            else:
                # If neither this or the previous were separators
                if not prev_sep:
                    # Yield current word and clear
                    yield current
                    current = ''
                # Add word to current
                current += word
                prev_sep = False
        # Yield last word if list did not finish with a separator
        if not prev_sep:
            yield current

    words = ['les', 'engagements', 'du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
    word_separator = '*'
    print(list(join_asterisk(words, word_separator)))
    # ['les', 'engagements', 'du', 'président', 'de', 'la', 'république', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
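This generator only joins the words and does not consult the dictionary. One possible way to get back the (word, correct) pairs used in the question is to wrap it, as sketched below. The wrapper name `check_words`, the toy dictionary, and the sample tokens are hypothetical; `join_asterisk` is repeated without its comments only so that the block runs on its own.

```python
def join_asterisk(words, word_separator):
    # Same logic as the answer above.
    if not words:
        return
    prev_sep = (words[0] == word_separator)
    current = words[0] if not prev_sep else ''
    for word in words[1:]:
        if word == word_separator:
            prev_sep = True
        else:
            if not prev_sep:
                yield current
                current = ''
            current += word
            prev_sep = False
    if not prev_sep:
        yield current

def check_words(words, dic, word_separator='*'):
    """Hypothetical helper: pair each joined word with a dictionary lookup."""
    for word in join_asterisk(words, word_separator):
        yield (word, word in dic)

# Made-up sample data; 'zzz' stands in for a word missing from the dictionary.
words = ['du', 'presi', '*', 'dent', 'de', 'la', 'republi', '*', 'que', 'zzz']
dic = {'du', 'president', 'dent', 'de', 'la', 'republique'}

pairs = list(check_words(words, dic))
correct_words = [w for w, ok in pairs if ok]
incorrect_words = [w for w, ok in pairs if not ok]
print(correct_words)    # ['du', 'president', 'de', 'la', 'republique']
print(incorrect_words)  # ['zzz']
```

Materialising `pairs` once avoids running the generator twice, which matters because a generator is exhausted after a single pass.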

Answered 2021-12-09
