3 Answers

Contributed 1830 experience points · earned 3+ likes
To check your results, you can use sklearn.metrics:
from sklearn.metrics import classification_report
print(classification_report(y, model.predict(x)))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150
If you have doubts about the result, inspect the predictions visually:
print(model.predict(x))
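Beyond eyeballing the predictions, a confusion matrix makes any misclassifications explicit. A minimal sketch using scikit-learn's built-in iris data (the variable names `x`, `y`, and `model` here are assumptions, since the original question's code is not shown):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier().fit(x, y)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count misclassified samples.
cm = confusion_matrix(y, model.predict(x))
print(cm)
```

As with the classification report above, this is evaluated on the training data, so a perfect diagonal here does not mean the model generalizes.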

Contributed 1785 experience points · earned 8+ likes
You have made a fundamental mistake in machine learning: evaluating a model on the same data used to train it. Instead, split your data into two sets, training and test. Train the model on the training data and evaluate it on the test data. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
Try something like this:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)
model = tree_clf.fit(x_train, y_train)
accuracy = tree_clf.score(x_test, y_test)
To see why this is a problem, consider the extreme case of a "cheating" model that simply memorizes the input data and outputs whatever it has memorized. With your code, it would score 100% accuracy while having learned nothing.
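The memorizing-model argument can be demonstrated directly. A 1-nearest-neighbour classifier is an explicit memorizer: it stores the training set and answers with the label of the closest stored point, so it is always perfect on the data it memorized. This is a sketch (not from the original answer) using the built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# 1-NN "memorizes" the training set: each training point is its own
# nearest neighbour, so training accuracy is trivially perfect.
memorizer = KNeighborsClassifier(n_neighbors=1).fit(x_train, y_train)

print(memorizer.score(x_train, y_train))  # 1.0 on the memorized data
print(memorizer.score(x_test, y_test))    # the honest estimate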

Contributed 1847 experience points · earned 7+ likes
Following the suggestions, I changed my implementation. Below is the source code I wrote for my 150 data points (120 training and 30 test). My question is: is my use of classification_report correct?
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def accuracy(y_true, y_predict):
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_predict[i]:
            count = count + 1
    return count * 100.0 / len(y_true)

# reading training data
train_data = pd.read_csv("iris_train_data.csv", header=0)
x_train = train_data.values[:, 0:4]
y_train = train_data.values[:, 4]

# training the classifier
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)
print('Depth of learnt tree is', clf.tree_.max_depth)
print('Number of leaf nodes in learnt tree is', clf.get_n_leaves(), '\n')

# reading test data
test_data = pd.read_csv("iris_test_data.csv", header=0)
x_test = test_data.values[:, 0:4]
y_test = test_data.values[:, 4]

# Training accuracy and test accuracy without pruning
print('Training accuracy of classifier is', accuracy(y_train, clf.predict(x_train)))
print('Test accuracy using classifier is', accuracy(y_test, clf.predict(x_test)), '\n')
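As a sanity check on the hand-rolled accuracy function above, it should agree with sklearn's built-in accuracy_score up to the ×100 scaling. A small sketch with made-up labels (editorial addition, not part of the original post):

```python
from sklearn.metrics import accuracy_score

def accuracy(y_true, y_predict):
    # same counting logic as in the question's code
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_predict[i]:
            count = count + 1
    return count * 100.0 / len(y_true)

y_true = ['setosa', 'versicolor', 'virginica', 'setosa']
y_pred = ['setosa', 'virginica', 'virginica', 'setosa']

print(accuracy(y_true, y_pred))              # 75.0
print(accuracy_score(y_true, y_pred) * 100)  # 75.0
```

Using accuracy_score directly would shorten the scripts below, but the manual version makes the definition explicit.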
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

def accuracy(y_true, y_predict):
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_predict[i]:
            count = count + 1
    return count * 100.0 / len(y_true)

def pruning_by_max_leaf_nodes(t):
    for i in range(1, t - 1):
        clfnxt1 = DecisionTreeClassifier(criterion='entropy', max_leaf_nodes=t - i)
        clfnxt1.fit(x_train, y_train)
        print('Max_leaf_nodes =', t - i,
              'Test Accuracy =', accuracy(y_test, clfnxt1.predict(x_test)))

def pruning_by_max_depth(t):
    for i in range(1, t):
        clfnxt2 = DecisionTreeClassifier(criterion='entropy', max_depth=t - i)
        clfnxt2.fit(x_train, y_train)
        print('Max_depth =', clfnxt2.tree_.max_depth,
              'Test Accuracy =', accuracy(y_test, clfnxt2.predict(x_test)))

# reading training data
train_data = pd.read_csv("iris_train_data.csv", header=0)
x_train = train_data.values[:, 0:4]
y_train = train_data.values[:, 4]

# training the classifier
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)
print('Depth of learnt tree is', clf.tree_.max_depth)
print('Number of leaf nodes in learnt tree is', clf.get_n_leaves(), '\n')

# reading test data
test_data = pd.read_csv("iris_test_data.csv", header=0)
x_test = test_data.values[:, 0:4]
y_test = test_data.values[:, 4]

# Pruning case 1: by reducing the max_depth of the tree
print('Pruning case 1: by reducing the max_depth of the tree')
pruning_by_max_depth(clf.tree_.max_depth)
print('')

# Pruning case 2: by reducing the max_leaf_nodes of the tree
print('Pruning case 2: by reducing the max_leaf_nodes of the tree')
pruning_by_max_leaf_nodes(clf.get_n_leaves())

# clf is already fitted above, so there is no need to refit here
print(classification_report(y_test, clf.predict(x_test)))
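The manual sweeps over max_depth and max_leaf_nodes above approximate pruning; scikit-learn (0.22+) also provides cost-complexity pruning via the ccp_alpha parameter. A hedged sketch on the built-in iris data, since the original CSV files are not available here (the 120/30 split mirrors the question):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=30, random_state=0)  # 120 train / 30 test

# cost_complexity_pruning_path returns the effective alphas at which
# subtrees get pruned away; a larger alpha yields a smaller tree.
path = DecisionTreeClassifier(criterion='entropy').cost_complexity_pruning_path(
    x_train, y_train)

for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(criterion='entropy', ccp_alpha=alpha)
    clf.fit(x_train, y_train)
    print('ccp_alpha =', alpha, 'depth =', clf.tree_.max_depth,
          'test accuracy =', clf.score(x_test, y_test))
```

Unlike fixing max_depth by hand, this prunes exactly the subtrees whose complexity is not justified by their impurity reduction, which is usually the preferred way to regularize a decision tree.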