I am trying to extend the Splitter class in sklearn, which is used by sklearn's decision tree classes. More specifically, I want to add a feature_weights variable to the new class, which would influence how the best split point is chosen by scaling the impurity computation in proportion to the feature weights.

The new class is almost an exact copy of sklearn's BestSplitter class ( https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_splitter.pyx ), with only minor changes. Here is what I have so far:

    cdef class WeightedBestSplitter(WeightedBaseDenseSplitter):
        cdef object feature_weights  # new variable - 1D array of feature weights

        def __reduce__(self):
            # same as sklearn BestSplitter (basically)

        # NEW METHOD
        def set_weights(self, object feature_weights):
            feature_weights = np.asfortranarray(feature_weights, dtype=DTYPE)
            self.feature_weights = feature_weights

        cdef int node_split(self, double impurity, SplitRecord* split,
                            SIZE_t* n_constant_features) nogil except -1:
            # .... same as sklearn BestSplitter ....
            current_proxy_improvement = self.criterion.proxy_impurity_improvement()
            current_proxy_improvement *= self.feature_weights[<int>(current.feature)]  # new line
            # .... same as sklearn BestSplitter ....

A couple of notes about the above: I am using the object variable type and np.asfortranarray because that is how the variable X is defined and set elsewhere, and X is indexed the same way I am trying to index feature_weights. Also, current.feature has the variable type SIZE_t, as per the _splitter.pxd file ( https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_splitter.pxd ).

The problem seems to be caused by the variable type of self.feature_weights. The code above throws multiple errors, but even referencing something like self.feature_weights[0] and assigning it to another variable throws this error:

    Indexing Python object not allowed without gil

What do I need to do to be able to index self.feature_weights and use the resulting scalar value as a multiplier?
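To make the intended effect of the "new line" concrete, here is a minimal plain-Python sketch of the weighting idea on its own, outside of Cython and outside of sklearn. The function name best_weighted_split and all the numbers are hypothetical and made up for illustration; this is not sklearn's API and not the splitter code itself. It also assumes the improvement scores are non-negative, since multiplying a negative score by a weight below 1 would move it in the opposite direction.

```python
import numpy as np


def best_weighted_split(proxy_improvements, feature_weights):
    """Pick the feature with the largest weighted improvement score.

    Hypothetical helper for illustration only (not part of sklearn).
    proxy_improvements[i] stands in for the best improvement score found
    for feature i; feature_weights[i] is the multiplier for that feature.
    """
    proxy = np.asarray(proxy_improvements, dtype=np.float64)
    weights = np.asarray(feature_weights, dtype=np.float64)
    if proxy.shape != weights.shape:
        raise ValueError("need exactly one weight per feature")
    # Same idea as: current_proxy_improvement *= feature_weights[feature]
    weighted = proxy * weights
    return int(np.argmax(weighted)), weighted


# Made-up scores for three features.
proxy = [0.30, 0.25, 0.10]

# With uniform weights, feature 0 has the largest score.
best_unweighted, _ = best_weighted_split(proxy, [1.0, 1.0, 1.0])

# Down-weighting feature 0 lets feature 1 win instead.
best_weighted, scores = best_weighted_split(proxy, [0.5, 1.0, 1.0])

print(best_unweighted, best_weighted)
```

With these example values, halving the weight of feature 0 is enough to change which feature is selected, which is the behaviour the weighted splitter is meant to produce inside node_split.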
Cython - indexing a numpy array in a nogil function
慕的地8271018
2022-12-20 11:06:19