3 Answers

Contributor: 1829 experience points, more than 6 upvotes
Use this solution. It is only a simplification, because the ordering is swapped:
df['new'] = df.values.dot(1 << np.arange(df.shape[-1]))
print(df)
   v1  v2  v3  v4  new
0   0   0   0   0    0
1   1   0   1   1   13
2   0   0   1   1   12
3   0   1   0   1   10
4   1   1   1   1   15
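To see why the dot product works: `1 << np.arange(4)` yields the weights 1, 2, 4, 8, so each row becomes `v1*1 + v2*2 + v3*4 + v4*8`, with `v1` as the least significant bit. Below is a minimal, self-contained sketch that rebuilds the five-row frame printed above. The way the frame is constructed and the names `weights` and `packed` are assumptions for illustration; the answer does not show how `df` was created.

```python
import numpy as np
import pandas as pd

# Illustrative frame matching the printed output above (construction assumed).
df = pd.DataFrame(
    [[0, 0, 0, 0], [1, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]],
    columns=["v1", "v2", "v3", "v4"],
)

# Powers of two, one per column: v1 -> 1, v2 -> 2, v3 -> 4, v4 -> 8.
weights = 1 << np.arange(df.shape[-1])

# Matrix-vector product: each row is summed with its column weights.
packed = df.to_numpy().dot(weights)
df["new"] = packed

print(weights)
print(df)
```

If `v1` should instead be the most significant bit, reversing the weights (`weights[::-1]`) before the dot product is one option.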
Performance for 1000 rows and 4 columns:
import numpy as np
import pandas as pd
np.random.seed(2019)
N = 1000
df = pd.DataFrame(np.random.choice([0, 1], size=(N, 4)))
df.columns = [f'v{x+1}' for x in df.columns]
In [60]: %%timeit
    ...: df['new'] = df.values.dot(1 << np.arange(df.shape[-1]))
113 μs ± 1.45 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Yuca's solution (list comprehension):
In [65]: %%timeit
    ...: variables = ['v1', 'v2', 'v3', 'v4']
    ...: df['added'] = df['v1']
    ...: for ind, var in enumerate(variables[1:]):
    ...:     df['added'] = df['added'] + [x << ind for x in df[var]]
    ...:
1.82 ms ± 16.2 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Original solution:
In [66]: %%timeit
    ...: variables = ['v1', 'v2', 'v3', 'v4']
    ...: df['added'] = df['v1']
    ...: for ind, var in enumerate(variables[1:]):
    ...:     df['added'] = df['added'] + df[var].apply(lambda x: x << ind)
    ...:
3.14 ms ± 8.52 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
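Outside IPython, the `%%timeit` comparisons above can be approximated with the standard `timeit` module. The following is a rough sketch, assuming the same random 1000-row, 4-column frame. The function names `dot_version`, `listcomp_version`, and `apply_version` are made up for illustration, and the functions return the result instead of assigning a column, so absolute numbers will not match the ones quoted above and will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(2019)
N = 1000
df = pd.DataFrame(np.random.choice([0, 1], size=(N, 4)))
df.columns = [f"v{x + 1}" for x in df.columns]
variables = ["v1", "v2", "v3", "v4"]


def dot_version(frame):
    # Vectorized: one matrix-vector product with weights 1, 2, 4, 8.
    values = frame[variables].to_numpy()
    return values.dot(1 << np.arange(values.shape[-1]))


def listcomp_version(frame):
    # Loop from the list-comprehension answer, shifts kept exactly as posted.
    added = frame["v1"]
    for ind, var in enumerate(variables[1:]):
        added = added + [x << ind for x in frame[var]]
    return added


def apply_version(frame):
    # Loop from the original solution, using Series.apply per column.
    added = frame["v1"]
    for ind, var in enumerate(variables[1:]):
        added = added + frame[var].apply(lambda x: x << ind)
    return added


for fn in (dot_version, listcomp_version, apply_version):
    # Best of 3 repeats, 20 calls each, reported per call in microseconds.
    best = min(timeit.repeat(lambda: fn(df), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e6:.0f} us per call")
```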

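Contributor: 2036 experience points, more than 8 upvotes
To answer your question about a more efficient alternative, I found that a list comprehension does help: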
variables = ['v1', 'v2', 'v3', 'v4']
df['added'] = df['v1']
for ind, var in enumerate(variables[1:]):
    %timeit df['added'] = df['added'] + [x << ind for x in df[var]]
308 μs ± 22.9 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
322 μs ± 19 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
316 μs ± 10.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So about 315 μs per column update, versus:
variables = ['v1', 'v2', 'v3', 'v4']
df['added'] = df['v1']
for ind, var in enumerate(variables[1:]):
    %timeit df['added'] = df['added'] + df[var].apply(lambda x: x << ind)
500 μs ± 38.2 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
503 μs ± 32.1 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
481 μs ± 32 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
As a disclaimer, I don't agree with the value of the sum, but that's a different topic :)
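One plausible reading of that disclaimer (an assumption, since the answer does not elaborate): because `enumerate(variables[1:])` starts counting at 0, `v2` is shifted by 0 bits, so the loop computes `v1 + v2 + 2*v3 + 4*v4` instead of the bit-packed value from the first answer. The sketch below starts the shift at 1 and compares both loops against the dot-product result. The small frame and the names `as_posted`, `shifted`, and `reference` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are assumptions, not data from the question).
df = pd.DataFrame(
    [[0, 0, 0, 0], [1, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]],
    columns=["v1", "v2", "v3", "v4"],
)
variables = ["v1", "v2", "v3", "v4"]

# Loop as posted: shifts start at 0, so v1 and v2 both carry weight 1.
as_posted = df["v1"].copy()
for ind, var in enumerate(variables[1:]):
    as_posted = as_posted + (df[var].to_numpy() << ind)

# Shifts start at 1: v2 -> 2, v3 -> 4, v4 -> 8.
shifted = df["v1"].copy()
for ind, var in enumerate(variables[1:], start=1):
    shifted = shifted + (df[var].to_numpy() << ind)

# Reference: dot product with weights 1, 2, 4, 8.
reference = df[variables].to_numpy().dot(1 << np.arange(len(variables)))

print(as_posted.tolist())
print(shifted.tolist())
print(reference.tolist())
```

On this frame the posted loop gives 7 for the row (1, 0, 1, 1), while the loop with `start=1` and the dot product both give 13.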