首頁猿問搜索字符串的非蠻力建議解決方案

搜索字符串的非蠻力建議解決方案

Python

滄海一幻覺 2021-09-11 13:29:40

我想知道您是否可以就用于迭代字符串、將其分成兩半并檢查兩半是否存在于字典/哈希中的非蠻力解決方案的算法提出建議？例如，字符串“peanutbutter”被拆分為“peanut”和“butter”（是的，那里還有其他詞，但出于示例目的，我們可以使用這兩個詞）這是我提出的蠻力解決方案以供參考：def break_into_spaces(S): i = 1 while i < len(S): left = S[i:] right = S[:i] if left in DICTIONARY and right in DICTIONARY: print("Found them!") print("{} {}".format(right, left)) i += 1break_into_spaces("peanutbutter")

查看完整描述

2 回答

Smart貓小萌

TA貢獻1911條經驗獲得超7個贊

我的選擇：

wordlist = ['air', 'pack', 'port', 'hard', 'back', 'bag', 'disk', 'ground', 'play']

word = 'playground'

lenw, minlen = len(word), min([len(w) for w in wordlist])

pairs = [(word[:n], word[n:]) for n in range(1,lenw) if (n >= minlen and n < lenw-minlen+1) ]

found = False

for w1, w2 in pairs:

if w1 in wordlist and w2 in wordlist:

print('Found ' + word + ' as: ' + w1 + ' + ' + w2)

found = True

break

if not found: print('No words found')

#=> Found playground as: play + ground

pairs是將詞一分為二的映射，其中兩個子詞不小于詞表中最小的詞。這減少了查找次數。

打印出來看看：

print(pairs)

#=> [('pla', 'yground'), ('play', 'ground'), ('playg', 'round'), ('playgr', 'ound'), ('playgro', 'und')]

如果是大量單詞，我建議按起始字母（作為詞匯表）分組，然后僅查找單詞字母和起始單詞集之間的交集內的單詞。這里不完整的代碼：

letters = set(word)

print(letters) #=> {'r', 'a', 'u', 'g', 'l', 'n', 'd', 'o', 'y', 'p'}

alphabet = {}

for word in wordlist:

alphabet.setdefault(word[0], set()).add(word)

print(alphabet)

#=> {'a': {'air'}, 'p': {'port', 'play', 'pack'}, 'h': {'hard'}, 'b': {'back', 'bag'}, 'd': {'disk'}, 'g': {'ground'}}

所以交集是：{'g', 'p', 'd', 'a'} 然后建立查找列表：

lookuplist = []

for i in intersection:

for word in alphabet[i]:

lookuplist.append(word)

lookuplist #=> ['air', 'disk', 'ground', 'port', 'pack', 'play']

所以用lookuplist代替wordlist

用一些方法在抽屜里訂購

def vocabulary(wordlist):

res = {}

for word in wordlist:

res.setdefault(word[0], set()).add(word)

return res

def lookuplist(vocabulary, word):

vocabulary_alphabet = set(vocabulary.keys())

word_letters = set(word)

intersection = vocabulary_alphabet.intersection(word_letters)

lookuplist = []

for i in intersection:

for word in vocabulary[i]:

lookuplist.append(word)

return lookuplist

def find_word(word, lookuplist):

lenw, minlen = len(word), min([len(w) for w in lookuplist])

pairs = [(word[:n], word[n:]) for n in range(1,lenw) if (n >= minlen and n < lenw-minlen+1) ]

for w1, w2 in pairs:

if w1 in lookuplist and w2 in lookuplist: return (word, w1, w2)

return []

您可以按如下方式使用：

wordlist = ['air', 'pack', 'port', 'hard', 'back', 'bag', 'disk', 'ground', 'play']

word = 'playground'

vocabulary = vocabulary(wordlist) # run once then store the result

lookuplist = lookuplist(vocabulary, word)

found_word = find_word(word, lookuplist)

print(found_word)

#=> ('playground', 'play', 'ground')

反對回復 2021-09-11

開心每一天1111

TA貢獻1836條經驗獲得超13個贊

不是一個完整的解決方案，但一個好主意可能是將單詞存儲在字典中，例如，其中鍵是單詞的長度，值是一組單詞。然后創建一個長度列表來迭代它，而不是迭代輸入詞 ( s)，例如：

words = ['toothpaste',

'hard-to-find',

'economic',

'point',

'food',

'seal',

'outrageous',

'motionless',

'ice',

'tow',

'boot',

'cruel',

'peanut',

'butter']

index = {}

for word in words:

index.setdefault(len(word), set()).add(word)

lengths = sorted(index)

def break_into_spaces(s):

s_length = len(s)

for length in lengths:

if length < s_length:

left = s[length:]

right = s[:length]

if left in index[length] and s_length - length in index and right in index[s_length - length]:

print("{} {}".format(right, left))

print("Found them!")

else:

break

break_into_spaces('peanutbutter')

輸出

peanut butter

Found them!

這樣做是否可以節省時間：

它避免了遍歷整個輸入詞，想象一下輸入詞比字典中的所有詞都短的情況，這將立即打破循環并且不打印任何內容。

通過將單詞存儲在相同長度的集合中，您只需要檢查是否存在相同長度的匹配單詞，而不是檢查所有單詞。請注意，這可能毫無意義，因為字典是哈希表，因此理論上檢查包含情況是O(1).

反對回復 2021-09-11

2 回答
0 關注
203 瀏覽

關注

添加回答

舉報

0/150

提交

取消

亚洲在线久爱草,狠狠天天香蕉网,天天搞日日干久草,伊人亚洲日本欧美

熱搜

最近搜索清空

搜索字符串的非蠻力建議解決方案

搜索字符串的非蠻力建議解決方案

2 回答

添加回答