
Gridsearch for NLP - How do I combine CountVec with other features?


Python

郎朗坤 2023-10-26 16:33:16

I am working on a basic NLP project on sentiment analysis, and I want to use GridSearchCV to optimize my model. The code below shows a sample of the data frame I am working with. 'content' is the column to be passed to CountVectorizer, 'label' is the y column to predict, and feature_1 and feature_2 are columns I also want to include in the model.

    df = pd.DataFrame([
        {'content': 'Got flat way today Pot hole Another thing tick crap thing happen week list', 'feature_1': '1', 'feature_2': '34', 'label': 1},
        {'content': 'UP today Why doe head hurt badly', 'feature_1': '5', 'feature_2': '142', 'label': 1},
        {'content': 'spray tan fail leg foot Ive scrubbing foot look better ', 'feature_1': '7', 'feature_2': '123', 'label': 0},
    ])

I am following this Stack Overflow answer: Perform feature selection using pipeline and gridsearch.

    from sklearn.pipeline import FeatureUnion, Pipeline
    from sklearn.base import TransformerMixin, BaseEstimator

    class CustomFeatureExtractor(BaseEstimator, TransformerMixin):
        def __init__(self, feature_1=True, feature_2=True):
            self.feature_1=feature_1
            self.feature_2=feature_2

        def extractor(self, tweet):
            features = []
            if self.feature_2:
                features.append(df['feature_2'])
            if self.feature_1:
                features.append(df['feature_1'])
            return np.array(features)

        def fit(self, raw_docs, y):
            return self

        def transform(self, raw_docs):
            return np.vstack(tuple([self.extractor(tweet) for tweet in raw_docs]))

Below is the grid search pipeline I am trying to fit the data frame into:

    lr = LogisticRegression()

    # Pipeline
    pipe = Pipeline([('features', FeatureUnion([("vectorizer", CountVectorizer(df['content'])),
                                                ("extractor", CustomFeatureExtractor())]))
                     ,('classifier', lr())
                    ])

But this yields: TypeError: 'LogisticRegression' object is not callable

Is there a simpler way to do this?


1 Answer

catspeake


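    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.pipeline import FeatureUnion, Pipeline

    class CustomFeatureExtractor(BaseEstimator, TransformerMixin):
        """Select the numeric feature columns from the DataFrame passed in."""

        def __init__(self, feature_1=True, feature_2=True):
            # Boolean flags, so a grid search can switch each column on or off.
            self.feature_1 = feature_1
            self.feature_2 = feature_2

        def fit(self, X, y=None):
            # Nothing to learn; y is optional so fit_transform(X) also works.
            return self

        def transform(self, X):
            # Read from X rather than a global df, so every CV fold sees only
            # its own rows. The result has shape (n_samples, n_selected_columns).
            flags = (("feature_1", self.feature_1), ("feature_2", self.feature_2))
            cols = [name for name, use in flags if use]
            # The sample data stores these values as strings, hence astype(float).
            return X[cols].astype(float).to_numpy()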

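The TypeError comes from lr(): lr is already a LogisticRegression instance, so the pipeline should receive lr itself rather than the result of calling it. CountVectorizer(df['content']) is a second problem, because the constructor takes configuration options, not the training data; the text has to reach the vectorizer through fit. A corrected pipeline, assuming it is fitted on a DataFrame X that holds the content, feature_1 and feature_2 columns:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import FeatureUnion, Pipeline
    from sklearn.preprocessing import FunctionTransformer

    lr = LogisticRegression()

    # Every branch of a FeatureUnion receives the whole input, so the text
    # branch selects the 'content' column before vectorizing it.
    text_branch = Pipeline([("select", FunctionTransformer(lambda X: X["content"])),
                            ("count", CountVectorizer())])

    # Pipeline
    pipe = Pipeline([
        ("features", FeatureUnion([("vectorizer", text_branch),
                                   ("extractor", CustomFeatureExtractor())])),
        ("classifier", lr),  # the instance itself, not lr()
    ])

GridSearchCV can then take pipe as its estimator. Parameter names follow the step names, for example features__vectorizer__count__ngram_range or features__extractor__feature_1.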
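On the request for a simpler way: one option is scikit-learn's ColumnTransformer, which routes each column to its own transformer and removes the need for a custom extractor class. The sketch below is an illustration under assumptions, not the approach from the linked answer: the eight rows are made-up toy data in the same shape as the question's data frame, the step names (features, text, numeric, classifier) and the parameter grid are arbitrary choices, and the string-valued feature columns are cast to float up front.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy rows (illustrative only), shaped like the question's data frame.
df = pd.DataFrame({
    "content": [
        "flat tyre pot hole bad week",
        "head hurt badly today",
        "spray tan fail scrubbing foot",
        "lovely sunny walk today",
        "great coffee good friends",
        "bad traffic late again",
        "happy birthday party fun",
        "lost keys awful morning",
    ],
    "feature_1": ["1", "5", "7", "2", "3", "6", "4", "8"],
    "feature_2": ["34", "142", "123", "50", "61", "99", "75", "110"],
    "label": [1, 1, 0, 0, 0, 1, 0, 1],
})

# The numeric columns arrive as strings, so cast them before modelling.
X = df[["content", "feature_1", "feature_2"]].astype(
    {"feature_1": float, "feature_2": float}
)
y = df["label"]

features = ColumnTransformer([
    # A bare string selects a 1-D column, which is what CountVectorizer expects.
    ("text", CountVectorizer(), "content"),
    # A list selects a 2-D block that is passed through unchanged.
    ("numeric", "passthrough", ["feature_1", "feature_2"]),
])

pipe = Pipeline([
    ("features", features),
    ("classifier", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "features__text__ngram_range": [(1, 1), (1, 2)],
    # Swapping the whole step lets the search include or exclude the extra columns.
    "features__numeric": ["passthrough", "drop"],
    "classifier__C": [0.1, 1.0],
}

# cv=2 only because the toy data set is tiny.
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(X, y)
print(search.best_params_)
```

The "features__numeric" entry plays the role of the feature_1/feature_2 flags in the custom extractor: the search compares the model with and without the extra columns. On real data, scaling the numeric columns (for example with StandardScaler in place of "passthrough") would probably help LogisticRegression converge.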

Replied 2023-10-26
