
How to get most informative features for scikit-learn classifiers?

Python

大話西游666 2019-11-25 14:13:08

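Classifiers in machine learning packages such as liblinear and nltk offer a method show_most_informative_features(), which is really helpful for debugging features:

    viagra = None          ok : spam     =      4.5 : 1.0
    hello = True           ok : spam     =      4.5 : 1.0
    hello = None           spam : ok     =      3.3 : 1.0
    viagra = True          spam : ok     =      3.3 : 1.0
    casino = True          spam : ok     =      2.0 : 1.0
    casino = None          ok : spam     =      1.5 : 1.0

My question is whether something similar is implemented for the classifiers in scikit-learn. I searched the documentation but couldn't find anything like it.

If no such function exists yet, does anybody know a workaround for getting at those values?

Thanks a lot!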

3 Answers

翻閱古今


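The classifiers themselves do not record feature names; they only see numeric arrays. However, if you extracted your features using a Vectorizer/CountVectorizer/TfidfVectorizer/DictVectorizer, and you are using a linear model (e.g. LinearSVC or Naive Bayes), then you can apply the same trick that the document classification example uses. Example (untested, may contain a bug or two):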

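import numpy as np

def print_top10(vectorizer, clf, class_labels):
    """Prints features with the highest coefficient values, per class"""
    # get_feature_names() was removed in scikit-learn 1.2;
    # get_feature_names_out() is its replacement.
    feature_names = vectorizer.get_feature_names_out()
    for i, class_label in enumerate(class_labels):
        # argsort is ascending, so the last 10 indices hold the largest coefficients
        top10 = np.argsort(clf.coef_[i])[-10:]
        print("%s: %s" % (class_label,
                          " ".join(feature_names[j] for j in top10)))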

This is for multiclass classification; for the binary case, I think you should use clf.coef_[0] only. You may have to sort the class_labels.

2019-11-25

飲歌長嘯


With the help of larsmans' code, I came up with this code for the binary case:

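def show_most_informative_features(vectorizer, clf, n=20):
    # get_feature_names() was removed in scikit-learn 1.2
    feature_names = vectorizer.get_feature_names_out()
    # Sort (coefficient, feature name) pairs by ascending coefficient
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    # Pair the n lowest coefficients with the n highest (highest first)
    top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
    for (coef_1, fn_1), (coef_2, fn_2) in top:
        print("\t%.4f\t%-15s\t\t%.4f\t%-15s" % (coef_1, fn_1, coef_2, fn_2))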

2019-11-25

瀟瀟雨雨


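Actually, I had to find feature importances for my NaiveBayes classifier, and although I used the functions above, I could not get feature importances per class. I went through the scikit-learn documentation and tweaked the functions above a bit until they worked for my problem. Hope it helps you too!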

def important_features(vectorizer, classifier, n=20):
    class_labels = classifier.classes_
    # get_feature_names() was removed in scikit-learn 1.2
    feature_names = vectorizer.get_feature_names_out()
    # feature_count_[k] holds the per-feature counts seen for class k during fitting
    topn_class1 = sorted(zip(classifier.feature_count_[0], feature_names), reverse=True)[:n]
    topn_class2 = sorted(zip(classifier.feature_count_[1], feature_names), reverse=True)[:n]
    print("Important words in negative reviews")
    for coef, feat in topn_class1:
        print(class_labels[0], coef, feat)
    print("-----------------------------------------")
    print("Important words in positive reviews")
    for coef, feat in topn_class2:
        print(class_labels[1], coef, feat)

Note that your classifier (NaiveBayes in my case) must have the feature_count_ attribute for this to work.

2019-11-25
