2 Answers

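Contributed 1810 experience points, received over 4 likes
A summary of what I did:
Create a dictionary from the profits list.
Run permutations over the values stored under each key.
Iterate over each pair to combine the names and the amounts separately.
Sort the container list by name, group by name, sum the amounts in each group, and load the final result into a dictionary.
Read the dictionary into a DataFrame and sort the values by profit in descending order.
I believe all of the processing should be done before the data goes into a DataFrame, and you should see a significant speedup: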
from collections import defaultdict
from operator import itemgetter
from itertools import permutations, groupby

import pandas as pd

# profits is expected to be an iterable of (date, name, amount) tuples
d = defaultdict(list)
for k, v, s in profits:
    d[k].append((v, s))

container = []
for k, v in d.items():
    l = permutations(v, 2)
    # here I combine the names and the amounts separately into A and B
    for i, j in l:
        A = i[0] + '_' + j[0]
        B = i[-1] + (j[-1] * -1)
        container.append([A, B])

# here I sort the list, then groupby (groupby won't work if you don't sort first)
container = sorted(container, key=itemgetter(0, 1))

sam = dict()
for name, amount in groupby(container, key=itemgetter(0)):
    sam[name] = sum(i[-1] for i in amount)

# the chained calls need enclosing parentheses to span several lines
outcome = (pd.DataFrame
           .from_dict(sam,
                      orient='index',
                      columns=['Profit'])
           .sort_values(by='Profit',
                        ascending=False))
                   Profit
Bravo_Alpha    149.635831
Delta_Alpha    101.525568
Charlie_Alpha   78.601245
Bravo_Charlie   71.034586
Bravo_Delta     48.110263
Delta_Charlie   22.924323
Charlie_Delta  -22.924323
Delta_Bravo    -48.110263
Charlie_Bravo  -71.034586
Alpha_Charlie  -78.601245
Alpha_Delta   -101.525568
Alpha_Bravo   -149.635831
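When I ran it on my PC it took 1.24 ms, versus 14.1 ms for yours. Hopefully someone can come up with something faster.
Update:
Everything I did in the first version was unnecessary. There is no need for permutations, because the multiplier is -1. All we need to do is get the total for each name, pair the names (without repeats), multiply one value in the pair by -1 and add it to the other, and then, once we have that single total for a pair, multiply it by -1 again to get the result for the reversed pair. I got it down to about 18.6 μs, and it reaches 273 μs once pandas is brought in. That is a significant speedup. Most of the compute time goes into reading the data into pandas. Here goes: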
from collections import defaultdict
from operator import itemgetter
from itertools import combinations, chain

import pandas as pd


def optimizer(profits):
    # collect the profits for each node
    nw = defaultdict(list)
    for dat, node, profit in profits:
        nw[node].append(profit)

    # sum the total for each key
    B = {key: sum(value) for key, value in nw.items()}

    # multiply the value of the second item in the tuple by -1
    # add that to the value of the first item in the tuple
    # pair the result back to the tuple and form a dict
    sumr = {(first, last): sum((B[first], B[last] * -1))
            for first, last
            in combinations(B.keys(), 2)}

    # reverse the positions in the tuple for each key
    # multiply the value by -1 and pair to form a dict
    rev = {tuple(reversed(k)): v * -1
           for k, v in sumr.items()}

    # join the two dictionaries into one
    # sort in descending order
    # and create a dictionary
    result = dict(sorted(chain(sumr.items(),
                               rev.items()
                               ),
                         key=itemgetter(-1),
                         reverse=True
                         ))

    # load into pandas
    # trying to reduce the compute time here by reducing pandas workload
    return pd.DataFrame(list(result.values()),
                        index=list(result.keys()),
                        )
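I would probably hold off on reading the data into a DataFrame until it is unavoidable. I would be curious to know what speed you actually get when you finally run it.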
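The claim in the update, that per-name totals are enough, can be checked on a small scale. The following is only a sketch: the sample data is made up, and it assumes every name appears on every date (if a name is missing on some date, the per-day pairing and the totals-based shortcut no longer agree).

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample data in the (date, name, profit) layout used above.
profits = [
    ("2019-11-18", "Alpha", 1.0),
    ("2019-11-18", "Bravo", 4.0),
    ("2019-11-18", "Charlie", -2.0),
    ("2019-11-19", "Alpha", 3.0),
    ("2019-11-19", "Bravo", -1.0),
    ("2019-11-19", "Charlie", 5.0),
]

# First approach: pair names within each date, then sum per ordered pair.
by_date = defaultdict(list)
for date, name, profit in profits:
    by_date[date].append((name, profit))

per_day = defaultdict(float)
for rows in by_date.values():
    for (long_name, long_p), (short_name, short_p) in permutations(rows, 2):
        per_day[(long_name, short_name)] += long_p - short_p

# Second approach: total per name first, then one subtraction per ordered pair.
totals = defaultdict(float)
for _, name, profit in profits:
    totals[name] += profit

from_totals = {
    (a, b): totals[a] - totals[b] for a, b in permutations(totals, 2)
}

print(dict(per_day) == from_totals)
```

Both dictionaries hold the same six ordered pairs, which is why the permutations over each date can be dropped in favour of a single pass that sums per name.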

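Contributed 1852 experience points, received over 7 likes
This is technically not an answer, since it does not solve the problem with optimization techniques, but hopefully someone will find it useful.
From testing, building and concatenating the DataFrames is the slow part. Creating the matrix of pair prices with NumPy is very fast: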
arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
This produces a matrix of every node against every node:
+---+-------------+------------+------------+------------+
| | 0 | 1 | 2 | 3 |
+---+-------------+------------+------------+------------+
| 0 | 0.000000 | 149.635831 | 78.598163 | 101.525670 |
+---+-------------+------------+------------+------------+
| 1 | -149.635831 | 0.000000 | -71.037668 | -48.110161 |
+---+-------------+------------+------------+------------+
| 2 | -78.598163 | 71.037668 | 0.000000 | 22.927507 |
+---+-------------+------------+------------+------------+
| 3 | -101.525670 | 48.110161 | -22.927507 | 0.000000 |
+---+-------------+------------+------------+------------+
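To make the broadcasting step easier to follow, here is a minimal sketch with made-up one-day profits for three nodes. Adding a shape `(3,)` array to a shape `(3, 1)` array yields a `(3, 3)` matrix where entry `[i, j]` is `profit[j] - profit[i]`; the explicit-loop version is included only for comparison.

```python
import numpy as np

# Hypothetical one-day profits for three nodes, in node order.
profit = np.array([10.0, 25.0, 5.0])

# Shape (3,) plus shape (3, 1) broadcasts to (3, 3):
# arr[i, j] = profit[j] - profit[i]
arr = profit + (-profit)[:, None]

# The same matrix built with explicit loops, for comparison.
expected = np.array([[pj - pi for pj in profit] for pi in profit])

print(arr)
print(np.array_equal(arr, expected))
```

The matrix is antisymmetric (`arr == -arr.T`) with zeros on the diagonal, which matches the layout of the table above.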
If you construct an empty NumPy array of dimension number of nodes * number of nodes, you can simply add each daily array to the totals array:
total_arr = np.zeros((4, 4))
# Do this for each day
arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
total_arr += arr
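Once you have that, you need some pandas voodoo to assign the node names to the matrix and break the matrix down into separate long/short/profit rows.
My original (exhaustive) search took 47 minutes on 60 days of data. It is now down to 13 seconds.
Full working example: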
import numpy as np
import pandas as pd

profits = [
{'date':'2019-11-18', 'node':'A', 'profit': -79.629698},
{'date':'2019-11-19', 'node':'A', 'profit': -17.452517},
{'date':'2019-11-20', 'node':'A', 'profit': -19.069558},
{'date':'2019-11-21', 'node':'A', 'profit': -66.061564},
{'date':'2019-11-18', 'node':'B', 'profit': -87.698670},
{'date':'2019-11-19', 'node':'B', 'profit': -73.812616},
{'date':'2019-11-20', 'node':'B', 'profit': 198.513246},
{'date':'2019-11-21', 'node':'B', 'profit': -69.579466},
{'date':'2019-11-18', 'node':'C', 'profit': 66.3022870},
{'date':'2019-11-19', 'node':'C', 'profit': -16.132065},
{'date':'2019-11-20', 'node':'C', 'profit': -123.73898},
{'date':'2019-11-21', 'node':'C', 'profit': -30.046416},
{'date':'2019-11-18', 'node':'D', 'profit': -131.68222},
{'date':'2019-11-19', 'node':'D', 'profit': 13.2964730},
{'date':'2019-11-20', 'node':'D', 'profit': 23.5950530},
{'date':'2019-11-21', 'node':'D', 'profit': 14.1030270},
]
# Initialize a Numpy array of node_length * node_length dimension
profits_df = pd.DataFrame(profits)
nodes = profits_df['node'].unique()
total_arr = np.zeros((len(nodes), len(nodes)))
# For each date, calculate the pairs profit matrix and add it to the total
for date, date_df in profits_df.groupby('date'):
    df = date_df[['node', 'profit']].reset_index()
    arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
    total_arr += arr
# This will label each column and row
nodes_series = pd.Series(nodes, name='node')
perms_df = pd.concat((nodes_series, pd.DataFrame(total_arr, columns=nodes_series)), axis=1)
# This collapses our matrix back to long, short, and profit rows with the proper column names
perms_df = perms_df.set_index('node').unstack().to_frame(name='profit').reset_index()
perms_df = perms_df.rename(columns={'level_0': 'long', 'node': 'short'})
# Get rid of long/short pairs where the nodes are the same (not technically necessary)
perms_df = perms_df[perms_df['long'] != perms_df['short']]
# Let's see our profit
perms_df.sort_values('profit', ascending=False)
Result:
+----+------+-------+-------------+
| | long | short | profit |
+----+------+-------+-------------+
| 4 | B | A | 149.635831 |
+----+------+-------+-------------+
| 12 | D | A | 101.525670 |
+----+------+-------+-------------+
| 8 | C | A | 78.598163 |
+----+------+-------+-------------+
| 6 | B | C | 71.037668 |
+----+------+-------+-------------+
| 7 | B | D | 48.110161 |
+----+------+-------+-------------+
| 14 | D | C | 22.927507 |
+----+------+-------+-------------+
| 11 | C | D | -22.927507 |
+----+------+-------+-------------+
| 13 | D | B | -48.110161 |
+----+------+-------+-------------+
| 9 | C | B | -71.037668 |
+----+------+-------+-------------+
| 2 | A | C | -78.598163 |
+----+------+-------+-------------+
| 3 | A | D | -101.525670 |
+----+------+-------+-------------+
| 1 | A | B | -149.635831 |
+----+------+-------+-------------+
Thanks to sammywemmy for helping me tidy up the question and come up with something useful.
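One possible variation, not part of either answer as posted, combines the two ideas: sum the profit per node first, then broadcast once instead of once per day. This is only a sketch. The helper name `pair_profits` and the sample data are made up, and it assumes every node has a row for every date.

```python
import numpy as np
import pandas as pd


def pair_profits(profits_df):
    """Return long/short/profit rows built from per-node totals (hypothetical helper)."""
    totals = profits_df.groupby("node")["profit"].sum()
    nodes = totals.index.to_numpy()
    values = totals.to_numpy()
    n = len(nodes)

    # matrix[i, j] = total[j] - total[i]: node j is long, node i is short
    matrix = values + (-values)[:, None]

    # Flatten row by row: the column label (long) varies fastest
    out = pd.DataFrame({
        "long": np.tile(nodes, n),
        "short": np.repeat(nodes, n),
        "profit": matrix.ravel(),
    })

    # Drop pairs where a node is matched with itself
    out = out[out["long"] != out["short"]]
    return out.sort_values("profit", ascending=False).reset_index(drop=True)


# Hypothetical sample: two nodes over two dates
sample = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "node": ["A", "B", "A", "B"],
    "profit": [1.0, 4.0, 3.0, -1.0],
})
result = pair_profits(sample)
print(result)
```

Because the per-day matrices are simply added together, summing per node before broadcasting should give the same totals while skipping the loop over dates. Whether it is actually faster on the real data would need to be measured.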