
Python: using multiprocessing to speed up merging Counters

喵喔喔 2022-11-01 16:47:06
I am trying to build a very simple item recommender based on how many times items were purchased together, so I first created a Counter-like item2item dictionary:

# people purchased A with B 4 times, A with C 3 times.
item2item = {'A': {'B': 4, 'C': 3}, 'B': {'A': 4, 'C': 2}, 'C':{'A': 3, 'B': 2}}
# recommend user who purchased A and C
samples_list = [['A', 'C'], ...]

So for samples = ['A', 'C'], I recommend the item with the maximum count in item2item['A'] + item2item['C'].

For a large matrix, however, the merging is expensive, so I tried multiprocessing as follows:

from operator import add
from functools import reduce
from concurrent.futures import ProcessPoolExecutor
from collections import Counter

with ProcessPoolExecutor(max_workers=10) as pool:
    for samples in samples_list:
        # w/o PoolExecutor
        # combined = reduce(add, [item2item[s] for s in samples], Counter())
        future = pool.submit(reduce, add, [item2item[s] for s in samples], Counter())
        combined = future.result()

However, this did not speed up the process at all. I suspect that the Counter in the reduce function is not shared, as discussed in "Python multiprocessing and shared counter" and https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes. Any help is appreciated.
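
For reference, the single-process merge described in the question can be sketched as below. This is only an illustration, not the asker's actual code: it assumes the values of item2item are Counter objects (adding a plain dict to a Counter raises TypeError), and the helper name recommend, its top_n parameter, and the step that drops already-purchased items are all hypothetical.

```python
from collections import Counter
from functools import reduce
from operator import add

# Toy co-purchase counts from the question, stored as Counters so they can be added.
item2item = {
    "A": Counter({"B": 4, "C": 3}),
    "B": Counter({"A": 4, "C": 2}),
    "C": Counter({"A": 3, "B": 2}),
}


def recommend(samples, item2item, top_n=1):
    """Hypothetical helper: sum the counters of the purchased items,
    drop the purchased items themselves (an assumption), and return
    the top_n remaining (item, count) pairs."""
    # Counter() as the initial value means reduce always builds a new Counter,
    # so the entries of item2item are never mutated.
    combined = reduce(add, (item2item[s] for s in samples), Counter())
    for s in samples:
        combined.pop(s, None)
    return combined.most_common(top_n)


print(recommend(["A", "C"], item2item))
```

Having a plain, single-process baseline like this also makes it easier to check that any parallel version returns the same results.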

1 Answer

收到一只叮咚


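The call combined = future.result() blocks until the result is ready, so you never submit the next request to the pool before the previous one has completed. In other words, you never have more than one child process running at a time. At the very least you should change the code to: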


from tqdm import tqdm  # third-party progress bar; other imports as in the question

with ProcessPoolExecutor(max_workers=10) as pool:
    the_futures = []
    for samples in tqdm(samples_list):
        future = pool.submit(reduce, add, [item2item[s] for s in samples], Counter())
        the_futures.append(future)  # save it; do not wait for the result yet
    results = [f.result() for f in the_futures]  # all the results, in submission order
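
A self-contained version of the same idea might look like the sketch below. It is only an illustration under a few assumptions: the toy ITEM2ITEM data stores Counter values, the helper name merge_all is hypothetical, and the executor is passed in as a parameter so that any concurrent.futures executor can be used. The `if __name__ == "__main__":` guard is there because worker processes started with the spawn method re-import the main module.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import reduce
from operator import add

# Toy data; values are Counters so that `add` can merge them.
ITEM2ITEM = {
    "A": Counter({"B": 4, "C": 3}),
    "B": Counter({"A": 4, "C": 2}),
    "C": Counter({"A": 3, "B": 2}),
}


def merge_all(executor, samples_list, item2item):
    """Hypothetical helper: submit every merge first, then collect.

    Nothing blocks inside the submission loop, so the pool can work on
    several merges at once. Results come back in submission order.
    """
    futures = [
        executor.submit(reduce, add, [item2item[s] for s in samples], Counter())
        for samples in samples_list
    ]
    return [f.result() for f in futures]


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for combined in merge_all(pool, [["A", "C"], ["A", "B"]], ITEM2ITEM):
            print(combined.most_common(1))
```

Note that each submitted task pickles its list of counters and sends it to a worker, so for small merges the transfer cost may outweigh the gain from running in parallel.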

Another approach:


from concurrent.futures import as_completed
from tqdm import tqdm  # third-party progress bar

with ProcessPoolExecutor(max_workers=10) as pool:
    the_futures = []
    for samples in tqdm(samples_list):
        future = pool.submit(reduce, add, [item2item[s] for s in samples], Counter())
        the_futures.append(future)  # save it
    for future in as_completed(the_futures):  # not necessarily the order of submission
        result = future.result()  # do something with this

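Also, if you do not pass max_workers to the ProcessPoolExecutor constructor, it defaults to the number of processors on your machine. There is nothing to be gained by specifying a value greater than the number of processors you actually have.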
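
As a small illustration of that point, the sketch below caps a requested worker count at the number of CPUs reported by os.cpu_count(). The helper name pick_workers is hypothetical, and capping at the CPU count is an assumption that suits CPU-bound work such as merging counters.

```python
import os


def pick_workers(requested=None):
    """Hypothetical helper: return a worker count no larger than the CPU count.

    os.cpu_count() can return None, in which case we fall back to 1.
    """
    cpus = os.cpu_count() or 1
    if requested is None:
        return cpus
    return max(1, min(requested, cpus))


print(pick_workers(10))
```

The returned value could then be passed as ProcessPoolExecutor(max_workers=pick_workers(10)); simply omitting max_workers has a similar effect.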


Update


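If you want to process each result as soon as it completes and need a way to tie the result back to the original request, you can store the futures as keys in a dictionary whose values are the arguments of the corresponding request. In this case: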


from concurrent.futures import as_completed
from tqdm import tqdm  # third-party progress bar

with ProcessPoolExecutor(max_workers=10) as pool:
    the_futures = {}
    for samples in tqdm(samples_list):
        future = pool.submit(reduce, add, [item2item[s] for s in samples], Counter())
        the_futures[future] = samples  # map future to request
    for future in as_completed(the_futures):  # not necessarily the order of submission
        samples = the_futures[future]  # the request
        result = future.result()  # the result
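
The future-to-request mapping can also be wrapped in a generator, as in the sketch below. This is only one possible arrangement: the name merge_as_completed and the toy ITEM2ITEM data (with Counter values) are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, as_completed
from functools import reduce
from operator import add

# Toy data; values are Counters so that `add` can merge them.
ITEM2ITEM = {
    "A": Counter({"B": 4, "C": 3}),
    "B": Counter({"A": 4, "C": 2}),
    "C": Counter({"A": 3, "B": 2}),
}


def merge_as_completed(executor, samples_list, item2item):
    """Hypothetical helper: yield (samples, combined) pairs as merges finish.

    The dict maps each future back to the request that produced it, so the
    caller knows which samples a result belongs to even though completion
    order may differ from submission order.
    """
    future_to_samples = {
        executor.submit(reduce, add, [item2item[s] for s in samples], Counter()): samples
        for samples in samples_list
    }
    for future in as_completed(future_to_samples):
        yield future_to_samples[future], future.result()


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        pairs = merge_as_completed(pool, [["A", "C"], ["B", "C"]], ITEM2ITEM)
        for samples, combined in pairs:
            print(samples, combined.most_common(1))
```

The generator should be consumed while the executor is still open, as in the usage above, because the futures are only submitted once iteration starts.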

