
Saving intermediate results of an sklearn pipeline


慕姐8265434 2022-07-19 20:56:51
I have a code example: an sklearn pipeline with two components (PCA and a random forest), and I want to use the pipeline's intermediate results to add some explainability. I know that .get_params() can be used to look at the intermediate steps, but is it possible to save or extract the intermediate results for further work? I would like to apply additional PCA functionality (parts 1.1 and 1.2 in the code).

from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
from sklearn.decomposition import FastICA, PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Convert the dataset to a data frame
cancer = load_breast_cancer()
data = np.c_[cancer.data, cancer.target]
columns = np.append(cancer.feature_names, ["target"])
df = pd.DataFrame(data, columns=columns)

# Split data into train and test
X = df.iloc[:, 0:30].values
Y = df.iloc[:, 30].values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Create a pipeline
n_comp = 12
clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

# Evaluate the pipeline (Y_pred was undefined in the original post)
Y_pred = clf.predict(X_test)
cr = classification_report(Y_test, Y_pred)
print(cr)

# See the intermediate steps of the pipeline
print(clf.get_params()['pca'])

## 1.1 If I create PCA outside of the pipeline
pca = PCA(n_components=10)
principalComponents = pca.fit_transform(X)

## 1.2 Some explainability on PCA outside of the pipeline
pca.explained_variance_ratio_

1 Answer

智慧大石


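We can assign the 'pca' entry of get_params() to a variable. It returns the pipeline's PCA step, an object of type sklearn.decomposition.pca.PCA. Because the pipeline has already been fitted, this object gives us access to all of the decomposition's methods and attributes.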


from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
from sklearn.decomposition import FastICA, PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Convert the dataset to a data frame
cancer = load_breast_cancer()
data = np.c_[cancer.data, cancer.target]
columns = np.append(cancer.feature_names, ["target"])
df = pd.DataFrame(data, columns=columns)

# Split data into train and test
X = df.iloc[:, 0:30].values
Y = df.iloc[:, 30].values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Create and fit the pipeline
n_comp = 12
clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

# --- Extract the fitted PCA step from the pipeline ---
pca = clf.get_params()['pca']

type(pca)
# sklearn.decomposition.pca.PCA

pca.explained_variance_ratio_
# array([9.81327198e-01, 1.67333696e-02, 1.73934848e-03, 1.05758996e-04,
#        8.29268494e-05, 6.34081771e-06, 3.75309113e-06, 7.08990845e-07,
#        3.16742542e-07, 1.75055859e-07, 7.11274270e-08, 1.43003803e-08])

pca.components_.shape
# (12, 30)

Hope this helps.
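As a complement to the answer above, here is a minimal sketch of two other ways to reach the same fitted step and of how to get the transformed data itself, which is the actual intermediate result passed to the random forest. It assumes a reasonably recent scikit-learn (pipeline slicing with clf[:-1] needs version 0.21 or later). The variable names X_test_pca and X_test_pca_sliced, and the random_state on the forest, are illustrative choices and not part of the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Same data and split as in the question, loaded directly as arrays.
X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

clf = Pipeline([
    ("pca", PCA(n_components=12)),
    # random_state is an added assumption, only to make runs repeatable.
    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
clf.fit(X_train, Y_train)

# Look up the fitted PCA step by the name it was given in the pipeline.
pca = clf.named_steps["pca"]

# Intermediate result: the test data as the random forest receives it.
X_test_pca = pca.transform(X_test)

# Equivalent route: clf[:-1] is a sub-pipeline containing every step
# except the final estimator, so its transform stops after the PCA.
X_test_pca_sliced = clf[:-1].transform(X_test)

print(X_test_pca.shape)
print(pca.explained_variance_ratio_.sum())
```

The slicing form is probably the more convenient one when the pipeline has several preprocessing steps, since it applies all of them in order without naming each step.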


2022-07-19