2 Answers

Contributed 1796 experience points · received 4+ likes
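Both of these extra words are (I assume) in your dictionary, so each one is yielded a second time two iterations of the for loop later, when it reaches this branch as w1: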
elif w1 != word_separator and w1 in dic:
    yield (w1, True)
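Redesigning your join_asterisk function seems to be the best approach, since any attempt to patch the existing function to skip these words would be very clumsy.
Here is one way to redesign the function so that it skips words that have already been used as the second half of a word split by "*":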
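incorrect_words = []

def join_asterisk(array):
    # Pad with two empty strings so ary[i+1] and ary[i+2] always exist
    ary = array + ['', '']
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i+1] == word_separator:
            # Join the two halves around the separator
            if ary[i] + ary[i+2] in dic:
                yield ary[i] + ary[i+2]
            else:
                incorrect_words.append(ary[i] + ary[i+2])
            # Skip the separator and the second half
            i += 2
        elif ary[i] in dic:
            yield ary[i]
        i += 1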
If you want it to stay closer to your original function, it can be modified to:
def join_asterisk(array):
    ary = array + ['', '']
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i+1] == word_separator:
            concat_word = ary[i] + ary[i+2]
            yield (concat_word, concat_word in dic)
            i += 2
        else:
            yield (ary[i], ary[i] in dic)
        i += 1
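
To make the index stepping easier to follow, here is a small self-contained sketch of the tuple-yielding version. The `dic` set and the sample word list are made up for illustration and are not from the question. Note that 'dent' is deliberately placed in the dictionary to show that it is no longer yielded on its own.

```python
# Hypothetical dictionary and separator, for illustration only.
dic = {'du', 'président', 'de', 'dent'}
word_separator = '*'

def join_asterisk(array):
    # Pad with two empty strings so ary[i+1] and ary[i+2] always exist
    ary = array + ['', '']
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i+1] == word_separator:
            concat_word = ary[i] + ary[i+2]
            yield (concat_word, concat_word in dic)
            i += 2  # skip the separator and the second half
        else:
            yield (ary[i], ary[i] in dic)
        i += 1

result = list(join_asterisk(['du', 'prési', '*', 'dent', 'de']))
print(result)
# [('du', True), ('président', True), ('de', True)]
```

Because the index advances by three after a joined word, the second half ('dent') never becomes the current word, which is what removes the duplicate output.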

Contributed 1934 experience points · received 2+ likes
I think this alternative implementation of join_asterisk does what you intend:
def join_asterisk(words, word_separator):
    if not words:
        return
    # Whether the previous word was a separator
    prev_sep = (words[0] == word_separator)
    # Next word to yield
    current = words[0] if not prev_sep else ''
    # Iterate words
    for word in words[1:]:
        # Skip separator
        if word == word_separator:
            prev_sep = True
        else:
            # If neither this nor the previous word was a separator
            if not prev_sep:
                # Yield current word and clear
                yield current
                current = ''
            # Add word to current
            current += word
            prev_sep = False
    # Yield last word if list did not finish with a separator
    if not prev_sep:
        yield current

words = ['les', 'engagements', 'du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
word_separator = '*'
print(list(join_asterisk(words, word_separator)))
# ['les', 'engagements', 'du', 'président', 'de', 'la', 'république', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
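
If you also need the (word, in-dictionary) pairs that the original function produced, one possible way is to wrap this generator. The sketch below repeats the function so it runs on its own. The helper name `flag_words` and the `dic` set are hypothetical and used only for illustration.

```python
def join_asterisk(words, word_separator):
    # Same logic as above, repeated so this block is self-contained
    if not words:
        return
    prev_sep = (words[0] == word_separator)
    current = words[0] if not prev_sep else ''
    for word in words[1:]:
        if word == word_separator:
            prev_sep = True
        else:
            if not prev_sep:
                yield current
                current = ''
            current += word
            prev_sep = False
    if not prev_sep:
        yield current

def flag_words(words, word_separator, dic):
    # Hypothetical helper: pair each joined word with a membership flag
    return [(w, w in dic) for w in join_asterisk(words, word_separator)]

dic = {'du', 'président', 'de', 'la'}  # made-up dictionary
sample = ['du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que']
flags = flag_words(sample, '*', dic)
print(flags)
# [('du', True), ('président', True), ('de', True), ('la', True), ('république', False)]
```

Keeping the joining and the dictionary check separate means the generator does not depend on a global `dic`, so it can be tested on its own.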