2 回答

TA貢獻1770條經驗 獲得超3個贊
如果您想在沒有任何內存開銷的情況下執行此操作(就地)
始終考慮數據的實際存儲方式。csc 矩陣的一個小例子。
shape=(5,5)
X=sparse.random(shape[0], shape[1], density=0.5, format='csc')
print(X.todense())
[[0.12146814 0. 0. 0.04075121 0.28749552]
[0. 0.92208639 0. 0.44279661 0. ]
[0.63509196 0.42334964 0. 0. 0.99160443]
[0. 0. 0.25941113 0.44669367 0.00389409]
[0. 0. 0. 0. 0.83226886]]
i=0 #first column
print(X.data[X.indptr[i]:X.indptr[i+1]])
[0.12146814 0.63509196]
一個 Numpy 解決方案
所以我們在這里唯一要做的就是逐列修改非零條目。這可以使用部分矢量化的 numpy 解決方案輕松完成。data只是包含所有非零值的數組,indptr存儲每列開始和結束的信息。
def Numpy_csc_norm(data,indptr):
for i in range(indptr.shape[0]-1):
xs = np.sum(data[indptr[i]:indptr[i+1]])
#Modify the view in place
data[indptr[i]:indptr[i+1]]/=xs
關于性能,這個就地解決方案已經不錯了。如果你想進一步提高性能,你可以使用 Cython/Numba/ 或其他一些可以或多或少容易地用 Python 包裝的編譯代碼。
一個 Numba 解決方案
@nb.njit(fastmath=True,error_model="numpy",parallel=True)
def Numba_csc_norm(data,indptr):
for i in nb.prange(indptr.shape[0]-1):
acc=0
for j in range(indptr[i],indptr[i+1]):
acc+=data[j]
for j in range(indptr[i],indptr[i+1]):
data[j]/=acc
表現
#Create a not to small example matrix
shape=(50_000,10_000)
X=sparse.random(shape[0], shape[1], density=0.001, format='csc')
#Not in-place from hpaulj
def hpaulj(X):
acc=X.sum(axis=0)
return X.multiply(sparse.csr_matrix(1./acc))
%timeit X2=hpaulj(X)
#6.54 ms ± 67.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#All 2 variants are in-place,
#but this shouldn't have a influence on the timings
%timeit Numpy_csc_norm(X.data,X.indptr)
#79.2 ms ± 914 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#parallel=False -> faster on tiny matrices
%timeit Numba_csc_norm(X.data,X.indptr)
#626 μs ± 30.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
#parallel=True -> faster on larger matrices
%timeit Numba_csc_norm(X.data,X.indptr)
#185 μs ± 5.39 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
添加回答
舉報