
Creating a pipeline for DictVectorizer and LinearSVC in sklearn


Python

喵喔喔 2023-07-18 10:23:35

From what I have read, I need to create the model and save it as a pipeline in order to do this. I have been trying to do so based on other examples on SO, but I cannot get it to work. How do I turn my existing model into a pipeline version? The first snippet is what I save now; the second is one of my attempts to put it into a pipeline, but it fails with the error "'str' object has no attribute 'items'". I think this has to do with the to_dict step, but I do not know how to replicate that in the pipeline version. Can anyone help?

dframe = pd.read_csv("ner.csv", encoding = "ISO-8859-1", error_bad_lines=False)
dframe.dropna(inplace=True)
dframe[dframe.isnull().any(axis=1)].size
x_df = dframe.drop(['Unnamed: 0', 'sentence_idx', 'tag'], axis=1)
vectorizer = DictVectorizer()
X = vectorizer.fit_transform(x_df.to_dict("records"))
y = dframe.tag.values
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
model = LinearSVC(loss="squared_hinge",C=0.5,class_weight='balanced',multi_class='ovr')
model.fit(x_train, y_train)
dump(model, 'filename.joblib')

dframe = pd.read_csv("ner.csv", encoding = "ISO-8859-1", error_bad_lines=False)
dframe.dropna(inplace=True)
dframe[dframe.isnull().any(axis=1)].size
x_df = dframe.drop(['Unnamed: 0', 'sentence_idx', 'tag'], axis=1)
y = dframe.tag.values
x_train, x_test, y_train, y_test = train_test_split(x_df, y, test_size=0.1, random_state=0)
pipe = Pipeline([('vectorizer', DictVectorizer(x_df.to_dict("records"))), ('model', LinearSVC)])
pipe.fit(x_train, y_train)


1 Answer

慕容708150


You have to adjust your second snippet like this:

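import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# pandas >= 1.3 spells this on_bad_lines="skip"; older versions used error_bad_lines=False
dframe = pd.read_csv("ner.csv", encoding="ISO-8859-1", on_bad_lines="skip")
dframe.dropna(inplace=True)
dframe[dframe.isnull().any(axis=1)].size
x_df = dframe.drop(['Unnamed: 0', 'sentence_idx', 'tag'], axis=1)
y = dframe.tag.values
# Convert to a list of dicts before splitting, so the pipeline receives records
x_train, x_test, y_train, y_test = train_test_split(x_df.to_dict("records"), y, test_size=0.1, random_state=0)
pipe = Pipeline([('vectorizer', DictVectorizer()), ('model', LinearSVC(loss="squared_hinge", C=0.5, class_weight='balanced', multi_class='ovr'))])
pipe.fit(x_train, y_train)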

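You tried to pass your data to DictVectorizer() as a constructor argument by using

DictVectorizer(x_df.to_dict("records"))

but this does not work. The constructor only accepts configuration parameters; the available ones are listed in the DictVectorizer documentation.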
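To illustrate the point about constructor arguments, here is a minimal sketch of how DictVectorizer is normally used. The two toy records are made up for illustration and are not from the asker's ner.csv; the only assumption is that each row is a dict of feature name to value.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy records standing in for x_df.to_dict("records").
records = [
    {"word": "London", "pos": "NNP"},
    {"word": "is", "pos": "VBZ"},
]

# The constructor takes configuration only (dtype, separator, sparse, sort).
vectorizer = DictVectorizer(sparse=False)

# The data is supplied later, through fit / transform / fit_transform.
X = vectorizer.fit_transform(records)

# One-hot column names learned from the string-valued features.
feature_names = vectorizer.feature_names_
```

Because the data only enters through fit(), the same unfitted DictVectorizer() can be placed in a Pipeline and will be fitted when pipe.fit() is called.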

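The second mistake is that you fit the pipeline, and therefore the DictVectorizer() inside it, on data taken directly from x_df:

pipe.fit(x_train, y_train)

The problem here is that x_train is handed to your DictVectorizer(), but x_train is just a split of the x_df DataFrame. In your earlier code without the pipeline, you gave the DictVectorizer the data in the form x_df.to_dict("records").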
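A small sketch of why a DataFrame and its records behave differently here. Iterating over a pandas DataFrame yields its column labels (plain strings), which is a likely source of the "'str' object has no attribute 'items'" message, since DictVectorizer expects each item to be a dict. The toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for x_df.
df = pd.DataFrame({"word": ["London", "is"], "pos": ["NNP", "VBZ"]})

# What a consumer sees when it loops over the DataFrame: column labels.
items_from_dataframe = list(df)

# What DictVectorizer needs: one dict per row.
items_from_records = df.to_dict("records")
```

Here items_from_dataframe holds strings, while items_from_records holds one mapping per row, which is the shape DictVectorizer can consume.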

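So you need to pass the same kind of data through the pipeline as well. That is why, in the adjusted code, I already call x_df.to_dict("records") before train_test_split(), so that the vectorizer can handle what it receives.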
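The whole flow, including saving and reloading the fitted pipeline as the question intends, might look like the sketch below. It runs on made-up toy records rather than ner.csv, the tag values and the temporary file name are assumptions for illustration, and the LinearSVC settings are trimmed to a subset of those in the question.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy records standing in for x_df.to_dict("records").
records = [
    {"word": "London", "pos": "NNP"},
    {"word": "Paris", "pos": "NNP"},
    {"word": "Berlin", "pos": "NNP"},
    {"word": "Madrid", "pos": "NNP"},
    {"word": "is", "pos": "VBZ"},
    {"word": "the", "pos": "DT"},
    {"word": "of", "pos": "IN"},
    {"word": "runs", "pos": "VBZ"},
]
tags = ["B-geo", "B-geo", "B-geo", "B-geo", "O", "O", "O", "O"]

# Split the list of dicts, not the DataFrame, so both halves stay records.
x_train, x_test, y_train, y_test = train_test_split(
    records, tags, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("vectorizer", DictVectorizer()),
    ("model", LinearSVC(C=0.5, class_weight="balanced", random_state=0)),
])
pipe.fit(x_train, y_train)
predictions = pipe.predict(x_test)

# Persist vectorizer and model together, then reload them as one object.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.joblib")
    dump(pipe, path)
    restored = load(path)

restored_predictions = restored.predict(x_test)
```

Saving the pipeline rather than the bare model means the fitted DictVectorizer vocabulary travels with the classifier, so new records can be passed straight to restored.predict().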

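One last thing: when defining the pipeline you also forgot the parentheses on LinearSVC(), so you passed the class itself instead of an instance:

('model', LinearSVC)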
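The difference between the class and an instance can be checked directly. This is only an illustrative sketch; the variable names are made up.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Without parentheses, LinearSVC is the class object, not an estimator.
is_class = isinstance(LinearSVC, type)

# With parentheses, we get a configured estimator instance.
model = LinearSVC(C=0.5, class_weight="balanced")
is_instance = isinstance(model, LinearSVC)

# Pipeline steps should be (name, instance) pairs.
pipe = Pipeline([("vectorizer", DictVectorizer()), ("model", model)])
step_names = list(pipe.named_steps)
```

Hyperparameters such as C can only be set on an instance, which is another reason the step has to be LinearSVC(...) rather than LinearSVC.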

Answered 2023-07-18
