
How to force a Python decision tree to keep splitting on only one node at a time (forming one node/leaf each time)

Python

烙印99 2023-04-11 16:28:49

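I want to build a decision tree classifier in Python, but I want to force the tree, regardless of what it considers best, to split only one node into two leaves at each step. That is, at every step a node splits into one terminal leaf and one internal node that keeps splitting, rather than into two internal nodes that can each split further. I want one side of every split to terminate, until you finally end up with two leaves that are below a minimum size. For example, the tree below satisfies this requirement but the second one does not:

The reason I want this is to get a nested set of splits of the observations. I saw in another post (Finding a corresponding leaf node for each data point in a decision tree (scikit-learn)) that you can find the node ID of each observation, which is key. I realize I could do this by building a tree without this restriction and following one of the leaf nodes up to the top, but that might not give enough observations; I basically want this nested structure across all observations in the dataset.

In my application I don't actually care about the classification task; I only want the nested sets of observations formed by the feature splits. I had planned to generate the target variable randomly, so that the splits on the features are meaningless (this is counterintuitive, but it is what I want, since I am using it for a different purpose). Or, if anyone knows a similar binary splitting method in Python that achieves the same goal, please let me know.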
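For reference, the two ingredients described above (looking up the node ID of each observation, and the "one leaf per split" shape) can be sketched with scikit-learn as follows. This is only a rough illustration under stated assumptions: `is_one_path_tree` is a hypothetical helper name, not a scikit-learn function, and the data are random placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def is_one_path_tree(fitted_tree):
    """Hypothetical helper: True if every internal node has at least one leaf child."""
    t = fitted_tree.tree_
    left, right = t.children_left, t.children_right
    is_leaf = left == -1  # scikit-learn marks leaves with child index -1
    internal = np.flatnonzero(~is_leaf)
    return bool(np.all(is_leaf[left[internal]] | is_leaf[right[internal]]))


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))     # placeholder features
y = rng.integers(0, 2, size=60)  # randomly generated target, as in the question

# A depth-1 tree trivially has the desired shape.
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
leaf_of_each_row = clf.apply(X)  # node ID of the leaf each observation lands in
print(is_one_path_tree(clf))

# An unrestricted tree usually does not have this shape.
deep = DecisionTreeClassifier(random_state=0).fit(X, y)
print(is_one_path_tree(deep))
```

The check only reads `tree_.children_left` and `tree_.children_right`, so it can be applied to any fitted tree to see whether it already has the nested structure.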


1 Answer

慕標琳琳


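I realized that one way to do this is to build a decision tree with max_depth=1. That performs a single split into two leaves. Then pick the leaf with the higher impurity to keep splitting, fit the decision tree again on that subset, and repeat. To make the hierarchy clearly visible, I relabel the leaf_ids so that the ID values decrease as you move up the tree. Here is an example: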

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def decision_tree_one_path(X, y=None, min_leaf_size=3):
    # X must be a pandas DataFrame (it is indexed with .loc);
    # y is a NumPy array, or None to use a random binary target
    nobs = X.shape[0]

    # boolean vector to include observations in the newest split
    include = np.ones((nobs,), dtype=bool)

    # try to get leaves around min_leaf_size
    min_leaf_size = max(min_leaf_size, 1)

    # one-level DT splitter
    dtmodel = DecisionTreeClassifier(splitter="best", criterion="gini", max_depth=1,
                                     min_samples_split=int(np.round(2.05*min_leaf_size)))

    leaf_id = np.ones((nobs,), dtype='int64')
    iteration = 0

    if y is None:
        y = np.random.binomial(n=1, p=0.5, size=nobs)

    while nobs >= 2*min_leaf_size:
        dtmodel.fit(X=X.loc[include], y=y[include])

        # give unique node id (1 = left child, 2 = right child)
        new_leaf_names = dtmodel.apply(X=X.loc[include])
        impurities = dtmodel.tree_.impurity[1:]

        if len(impurities) == 0:
            # was not able to split while maintaining constraint
            break

        # make sure node that is not split gets the lower node_label 1
        most_impure_node = np.argmax(impurities)
        if most_impure_node == 0:  # i.e., label 1
            # switch 1 and 2 labels above
            is_label_2 = new_leaf_names == 2
            new_leaf_names[is_label_2] = 1
            new_leaf_names[np.logical_not(is_label_2)] = 2

        # rename leaves
        leaf_id[include] = iteration + new_leaf_names
        will_be_split = new_leaf_names == 2

        # ignore the other one: keep only the observations that will be split again
        include[include] = will_be_split

        # now create new labels
        nobs = np.sum(will_be_split)
        iteration = iteration + 1

    return leaf_id

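leaf_id therefore holds the leaf ID of each observation, in order. For example, leaf_id==1 marks the observations that were split off into a terminal node first. leaf_id==2 is the next terminal node, split off from the other branch of the split that produced leaf_id==1, as shown below. After k splits there are therefore k+1 leaves.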

#|\
#1 .
#  |\
#  2 .
#.......
#     |\
#     k (k+1)
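With this labelling, the nested observation sets the question asks for can be read straight off leaf_id: the observations still being split after j-1 splits are those with leaf_id >= j. A minimal sketch, assuming leaf_id is a NumPy integer array like the one returned above; `nested_masks` is a hypothetical helper name and the example values are made up, not the output of a real fit.

```python
import numpy as np


def nested_masks(leaf_id):
    """Hypothetical helper: one boolean mask per level; mask for level j selects leaf_id >= j."""
    leaf_id = np.asarray(leaf_id)
    levels = np.arange(1, leaf_id.max() + 1)
    return [leaf_id >= level for level in levels]


# Made-up leaf IDs for eight observations (illustration only)
example_leaf_id = np.array([1, 3, 2, 4, 1, 4, 3, 2])

masks = nested_masks(example_leaf_id)
sizes = [int(m.sum()) for m in masks]
print(sizes)  # [8, 6, 4, 2]: each set is contained in the previous one
```

Each mask can be used to index the original data, for example X.loc[masks[1]] for the observations that remained after the first split.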

That said, I would still like to know whether there is a way to do this automatically in Python.

Answered 2023-04-11
