首頁猿問生成具有值出現次數限制的值組合

生成具有值出現次數限制的值組合

Python

喵喔喔 2022-11-01 14:50:40

我的任務是：生成唯一組合列表，其中每個組合都有一定的長度（var com_len）并包含給定列表（var values）中的值（ints），每個組合都是通過從給定列表中獲取隨機值創建的（隨機性非常重要！），每個組合內部必須有唯一的值，組合內部不能重復任何值，必須對組合中的值進行排序，計算整個組合集中每個值的出現（var counter），每個值在整個數據集中出現的次數必須盡可能接近給定的次數（var counter_expected）?！氨M可能接近”的意思是，在腳本運行時計算每個出現的值，如果沒有更多組合要創建，只需結束腳本。例如，我需要生成一個組合列表，其中每個組合的長度為 3，內部具有唯一的排序值，每個值都來自范圍（256），并且每個值出現在所有生成的組合中，直到接近盡可能100次。我的問題是，如何有效地檢測到沒有更多獨特的值組合可以創建來停止循環。當腳本即將結束并且仍然存在可用值并且 len(available_values) > com_len 時會出現問題，但是無法創建尚未出現的任何新的唯一組合。到目前為止創建的代碼：import numpy as npimport randomcom_len = 3length = 256counter = np.zeros(length)values = range(length)exclude_values = []counter_expected = 100done = []mask = np.ones(len(np.array(values)), np.bool)mask[exclude_values] = Falseavailable_values = set(values) - set(exclude_values)available_values = list(available_values)ii = 0while True: """print progress""" ii = ii + 1 if not ii % 1000: print('.', end='') #POSSIBLE CONDITION HERE ex = random.sample(available_values, k=com_len) ex.sort() if ex in done: continue done.append(ex) counter[ex] = counter[ex] + 1 for l_ in ex: if counter[l_] == counter_expected: del available_values[available_values.index(l_)] if len(available_values) < com_len:break if all(counter[mask] == counter_expected): break #OR HERE注意：腳本通常會成功結束，因為 len(available_values) < com_len 或 all(counter[mask] == counter_expected) 條件會破壞 While 循環。嘗試多次運行腳本，在某些時候，您會觀察到腳本進入無限循環，因為 len(available_values) >= com_len 但沒有更多新的唯一條件可供創建，因此計數器不會增加。我需要一個有效的條件來停止腳本。在這里使用 itertools.combinations 不是一個選項，因為 available_values 列表在開始時可能很長，即 10k 個元素。當 len(available_values) 達到一定水平時，蠻力將使用 itertools.combinations 并檢查是否有任何尚未創建的組合，但這是一個丑陋的解決方案?？赡苡懈玫姆椒ú皇俏蚁氤鰜淼模悄憧赡芟氤鰜淼?。我會很感激你的幫助。

查看完整描述

3 回答

慕婉清6462132

TA貢獻1804條經驗獲得超2個贊

更新的最終答案：

我想出了以下符合我需要的代碼。筆記：

該功能在許多方面都不是最好的，但它的工作非常好！
該函數有 3 種數據生成模式：生成組合總數、生成每個值在所有組合中出現次數最少的組合、生成每個值在所有組合中出現“最大”次數的組合 (" max”的意思是“盡可能接近最大值”）。
該功能允許在選定范圍內動態更改組合的長度或提供特定數字。
根據參數，該函數可以執行冗余次數的迭代，如“錯誤總數”所示。但...
它很快！它使用集合和元組來獲得出色的性能。當 itertools.combinations 觸發返回噸（數百萬）個組合時，可能會發生唯一的問題，但就我而言，到目前為止從未發生過。

編碼：

import numpy as np

import random

import itertools

from decimal import Decimal

def get_random_combinations(values, exclude, length, mode, limit, min_ = 0, max_ = 0):

done = set()

try:

"""Creating counter"""

counter = np.zeros(len(values), np.uint)

"""Create a mask for excluded values"""

"""https://stackoverflow.com/questions/25330959/how-to-select-inverse-of-indexes-of-a-numpy-array"""

mask = np.ones(len(np.array(values)), np.bool)

mask[exclude] = False

"""available values to create combinations"""

values_a = set(values) - set(exclude)

values_a = list(values_a)

if length == 1:

if mode == 'total':

"""generate just data_number of examples"""

for ii in range(limit):

comb = random.sample(values_a, 1)[0]

del values_a[values_a.index(comb)]

done.add(tuple([comb]))

else:

"""generate one example for each comb"""

for comb in values_a: done.add(tuple([comb]))

else:

"""total number of combinations"""

if isinstance(length, str): rr = np.mean([min_, max_])

else: rr = length

nn = len(values_a)

comb_max = int(Decimal(np.math.factorial(nn)) / Decimal(np.math.factorial(rr) * np.math.factorial(nn-rr)))

err_limit = int(comb_max * 0.01)

if err_limit > 10000: err_limit = 10000

"""initiate variables"""

#should itertools be used to generate the rest of combinations

gen_comb = False

#has all combinations generated by itertools ended

comb_left_0 = False

#has the limit of errors been reached to generate itertools combinations

err_limit_reached = False

#previous combination

ll_prev = 0

dd = 0 #done counter

comb_left = set() #itertools combinations

err = 0 #errors counter

"""options variables for statistics"""

err_t = 0 #total number of errors

gen = 0 #total number of generations of itertools.combinations

ii = 0 #total number of iterations

print('GENERATING LIST OF COMBINATIONS')

while True:

"""print progress"""

ii = ii + 1

if not dd % 1000: print('.', end='')

"""check if length of combs is random or not"""

if isinstance(length, str):

"""change max_ length of combinations to

\the length of available values"""

if len(values_a) < max_: max_ = len(values_a)

ll = random.randint(min_, max_)

else: ll = length

if ll != ll_prev: gen_comb = True

"""generate combinations only when err limit is reached or

the length of combinations has changed"""

if err_limit_reached and gen_comb:

gen = gen + 1

"""after reaching the max number of consecutive errors, start generating combinations via itertools"""

"""generation is done at this point to prevent generation for a very long list"""

"""generation is also done when when length of a combination changes"""

comb_left = set(itertools.combinations(values_a, ll)) - done

"""break if there are no elements left"""

if not len(comb_left): break

"""if combinations has already been generated, used them"""

if comb_left:

"""take random sample from the set"""

comb = random.sample(comb_left, 1)[0]

"""remove it from the set"""

comb_left.remove(comb)

"""check if it was the last combination to break the loop at the end"""

if not len(comb_left): comb_left_0 = True

else:

"""generate random combination"""

comb = tuple(sorted(random.sample(values_a, ll)))

"""set previous length"""

ll_prev = ll

"""reset gen_comb"""

gen_comb = False

"""check if combination is new"""

if comb not in done: found = True

else:

"""otherwise, iterate errors"""

err = err + 1

err_t = err_t + 1

found = False

if err > err_limit: err_limit_reached = gen_comb = True

if found:

"""reset err"""

err = 0

dd = dd + 1

"""add combination to done"""

done.add(comb)

"""increase counter for the combs"""

counter[list(comb)] = counter[list(comb)] + 1

"""check if seekeing the max number of combinations or min"""

if mode == 'max':

"""for max, we must remove the elements which reached the limit"""

for l_ in list(comb):

if counter[l_] == limit:

del values_a[values_a.index(l_)]

"""if length of available elements is smaller than the possible length of the combinations"""

if isinstance(length, str):

"""for random length, choose the minimal length"""

if len(values_a) < min_: break

else:

if len(values_a) < ll: break

"""if all elements reached the limit"""

if mode == 'total':

if len(done) >= limit: break

else: #min, max

if all(counter[mask] >= limit): break

"""if the number of consecutive errors reached

the total number of combinations, break as you may never

draw a valid combination"""

if err > comb_max: break

"""if it was the last combination left"""

if comb_left_0: break

except Exception as e: print(e)

print('')

print('Combinations generated: ' + str(dd))

print('Total number of iterations: ' + str(ii))

print('Final value of err: ' + str(err))

print('Total number of errors: ' + str(err_t))

print('How many times itertools.combinations was used: ' + str(gen))

return done

"""range of values to create combinations"""

values = range(256)

"""values excluded from the combinations"""

exclude = [0,255]

"""length of combinations, if is string, the number of layers

is generated randomly withing (min_, max_) range """

length = 'r'

"""mode of how the combinations are generated:

min: minimal number of times the value appears across all combinations

(limited down by the limit value)

max: max number of times the value appears across all combinations (limited

max by the limit value)

total: total number of combinations (limited the limit value)"""

mode = 'max'

"""limit used for the mode combinations' generation"""

limit = 1000

"""min_ and max_ are used when length is string,

length is generated randomly within (min_, max_) range"""

min_ = 4

max_ = 12

done = get_random_combinations(values, exclude, length, mode, limit, min_, max_)

反對回復 2022-11-01

拉風的咖菲貓

TA貢獻1995條經驗獲得超2個贊

這是另一個關注隨機方面的答案。

您已經n選擇k了要隨機采樣的可能性，以獲得z每個值的近似出現次數（使用我在另一個答案中的符號）。我假設如果您采用(n * z) // k k-size 的樣本，并且您的隨機數生成器實際上是統一的，您將自動獲得z每個元素的近似出現次數。在您的示例中，使用n=256和k=3，z=100在 8533 中，256 個 bin 之間的分布確實是相當均勻的，這似乎是合理的。

如果您愿意接受某種程度的一致性缺陷，pythonrandom.sample是一個不錯的選擇。人口是從零開始n選擇的所有整數k。

nk在這種情況下選擇是256 * 255 * 254 / 6 = 2763520。這超出了有符號 32 位整數的范圍，但很適合無符號整數。更好的是，您可以簡單地使用 Python 的無限精度整數。

訣竅是將這些數字映射到唯一的值組合。這是使用組合數系統完成的，如此處所述。

from random import sample

from scipy.misc import combs

def gen_samples(n, k, z):

codes = sample(range(combs(n, k)), (n * z) // k)

return [unrank(n, k, code) for code in codes]

def unrank(n, k, i):

"""

Implementation of Lehmer's greedy algorithm left as

exercise for the reader

"""

# return k-element sequence

有關取消排名的提示，請參見此處。

反對回復 2022-11-01

忽然笑

TA貢獻1806條經驗獲得超5個贊

總共有多種n選擇k可能的組合符合您的標準，其中n = length和k = com_len。n choose k評估為n! / (k! * (n - k)!)。如果您生成所有不同的可能性，則每個n值都會出現(n - 1)! / ((k - 1)! * (n - k)!)多次（https://math.stackexchange.com/q/26619/295281）。你應該能夠解決這個問題，假設z <= (n - 1)! / ((k - 1)! * (n - k)!), where z = counter_expected。

對于您的示例：

n = 256
k = 3
z = 100 <= 32385

通常，生成組合的一種常用方法是k通過長度為的布爾數組n逐步增加位，始終遞增可能的最低位。每當更高的位增加時，它下面的所有位都會重置為其初始位置。這是一個示例序列：

0 0 0 0 3 2 1

0 0 0 3 0 2 1

0 0 0 3 2 0 1

0 0 0 3 2 1 0

0 0 3 0 0 2 1

...

3 2 0 0 1 0 0

3 2 0 1 0 0 0

3 2 1 0 0 0 0

我已經對位置進行了標記，以便您可以看到，如果值開始排序，則組合將始終排序。n請記住，您可以將其實現為布爾值或k索引數組。兩者都有優點和缺點。

對于您的特定用例，有一個轉折點。一旦計數超過一定數量，您就不要使用一點。有許多方法可以逐步遍歷這些位，但它們都歸結為擁有一個大小n計數器數組。

如果n * z是的倍數k，您將能夠自動獲得所有垃圾箱中的準確計數。實際上，它們本身都n不必z是的倍數k。但是，如果這不是真的，您將不可避免地出現下溢或上溢。直觀地說，您希望一次生成一個n * z總值目標k。很明顯，一個必須是后者的倍數才能使這成為可能。

您可以有兩種類型的退出標準。給定所有位的總累積計數s，

s >= n * z: 所有位的計數至少為z. 最多k - 1位的計數為z + 1.
s > n * z - k：所有位的計數為z，最多k - 1位除外，因此再添加一個組合將導致條件 1。

最后要討論的一種設計選擇是位移動的順序。由于生成一系列組合會耗盡一個垃圾箱，我希望能夠以可預測的順序在數組存儲桶的一側保持耗盡的垃圾箱按順序累積。這將從算法中刪除很多檢查。因此，我不會增加可能的最低位，而是增加可能的最高位，并在它重置時增加它下面的一位。在這種情況下，用盡的桶將始終是最低位。

因此，讓我們最終停止制作未經證實的聽起來很數學的陳述，并展示一個實現：

def generate_combos(n, k, z):

full_locs = np.arange(k + 1, dtype=np.uint)

full_locs[k] = n # makes partial vectorization easier

locs = full_locs[:k] # bit indices

counts = np.zeros(n, dtype=np.uint) # counter buckets

values = np.arange(n, dtype=np.uint) # values

min_index = 0 # index of lowest non-exhausted bin

for _ in range((n * z) // k):

counts[locs] += 1

yield values[locs]

if counts[min_index] == z:

# if lowest bin filled, shift and reset

min_index += np.argmax(counts[min_index:] < z)

locs[:] = min_index + np.arange(k)

else:

# otherwise, increment highest available counter

i = np.flatnonzero(np.diff(full_locs) > 1)

if i.size:

i = i[-1]

locs[i] += 1

# reset the remainder

locs[i + 1:] = locs[i] + np.arange(1, k - i)

else:

break

這使用條件 2。如果您需要條件 1，請在后面添加以下行：

if counters[-1] < z:

yield values[-k:]

將循環更改為類似for _ in range(-((n * z) // -k)): （由https://stackoverflow.com/a/54585138/2988730提供）將無濟于事，因為計數器并非旨在處理它。

這是一個IDEOne 鏈接，顯示了的前一百個元素generate_combos(256, 3, 10)：

[0 1 2]

[0 1 3]

[0 1 4]

[0 1 5]

[0 1 6]

[0 1 7]

[0 1 8]

[0 1 9]

[ 0 1 10]

[ 0 1 11]

[2 3 4]

[2 3 5]

[2 3 6]

[2 3 7]

[2 3 8]

[2 3 9]

[ 2 3 10]

[ 2 3 11]

[ 2 3 12]

[4 5 6]

[4 5 7]

[4 5 8]

[4 5 9]

[ 4 5 10]

[ 4 5 11]

[ 4 5 12]

[ 4 5 13]

[6 7 8]

[6 7 9]

[ 6 7 10]

[ 6 7 11]

[ 6 7 12]

[ 6 7 13]

[ 6 7 14]

[ 8 9 10]

[ 8 9 11]

[ 8 9 12]

[ 8 9 13]

[ 8 9 14]

[ 8 9 15]

[10 11 12]

[10 11 13]

[10 11 14]

[10 11 15]

[10 11 16]

[12 13 14]

[12 13 15]

[12 13 16]

[12 13 17]

[12 13 18]

[13 14 15]

[14 15 16]

[14 15 17]

[14 15 18]

[14 15 19]

[14 15 20]

[15 16 17]

[16 17 18]

[16 17 19]

[16 17 20]

[16 17 21]

[16 17 22]

[16 17 23]

[17 18 19]

[18 19 20]

[18 19 21]

[18 19 22]

[18 19 23]

[18 19 24]

[18 19 25]

[19 20 21]

[20 21 22]

[20 21 23]

[20 21 24]

[20 21 25]

[20 21 26]

[20 21 27]

[21 22 23]

[22 23 24]

[22 23 25]

[22 23 26]

[22 23 27]

[22 23 28]

[22 23 29]

[24 25 26]

[24 25 27]

[24 25 28]

[24 25 29]

[24 25 30]

[24 25 31]

[24 25 32]

[26 27 28]

[26 27 29]

[26 27 30]

[26 27 31]

[26 27 32]

[26 27 33]

[26 27 34]

[28 29 30]

[28 29 31]

...

請注意，在前 10 個元素之后，兩者0都1出現了 10 次。2并且3出現過一次，所以它們只在 9 次迭代后就用完了，以此類推。

反對回復 2022-11-01

3 回答
0 關注
141 瀏覽

關注

添加回答

舉報

0/150

提交

取消

亚洲在线久爱草,狠狠天天香蕉网,天天搞日日干久草,伊人亚洲日本欧美

熱搜

最近搜索清空

生成具有值出現次數限制的值組合

生成具有值出現次數限制的值組合

3 回答

添加回答