Scikit learn/pandas - Using machine learning to predict user-entered text (present in an xlsx file)

Python

墨色風雨 2022-06-02 16:25:15

I have an xlsx file of predefined texts that has only one column. The user enters one or more words, and the output should be the texts that contain one or more of those words.

```python
import numpy as np
import pandas as pd
import time
import re
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from sklearn.metrics.pairwise import pairwise_distances
import pickle


def load_df(path):
    # Read the single-column Excel sheet into a DataFrame
    df = pd.read_excel(path)
    print(df.shape)
    return df


def splitDataFrameList(df, target_column, separator):
    # Split each row's target_column on separator, emitting one row per piece
    def splitListToRows(row, row_accumulator, target_column, separator):
        split_row = row[target_column].split(separator)
        for s in split_row:
            new_row = row.to_dict()
            new_row[target_column] = s
            row_accumulator.append(new_row)

    new_rows = []
    df.apply(splitListToRows, axis=1, args=(new_rows, target_column, separator))
    new_df = pd.DataFrame(new_rows)
    return new_df


class Autocompleter:
    def __init__(self):
        pass

    def import_json(self, json_filename):
        print("load Excel file...")
        df = load_df(json_filename)
        return df

    def process_data(self, new_df):
        # print("select representative threads...")
        # new_df = new_df[new_df.IsFromCustomer == False]
        print("split sentences on punctuation...")
        for sep in ['. ', ', ', '? ', '! ', '; ']:
            new_df = splitDataFrameList(new_df, 'UserSays', sep)
        print("UserSays Cleaning using simple regex...")
        # ... (the rest of the code is truncated in the post)
```

If I enter nothing as input, it gives me this output:

['How to access outlook on open network?', 'Email access outside ril network', 'Log in outlook away from office']

This is not what I want. If only one text matches, it gives the following output:

input - sccm
['What is sccm', 'How to access outlook on open network?', 'Email access outside ril network']

I want the output to behave so that if the entered word or words do not exist in the xlsx file, nothing is returned. And…
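The repeated splitDataFrameList calls in the question can likely be collapsed into one vectorized step with pandas' Series.str.split and DataFrame.explode (the ignore_index argument needs pandas 1.1 or later). This is a minimal sketch, assuming the same UserSays column and the same five punctuation-plus-space separators; the function name split_sentences and the constant SEPARATOR_PATTERN are hypothetical, not part of the original code.

```python
import pandas as pd

# Hypothetical: one regex covering the question's separators '. ', ', ', '? ', '! ', '; '.
# A multi-character pattern is treated as a regular expression by str.split.
SEPARATOR_PATTERN = r"[.,?!;] "


def split_sentences(df: pd.DataFrame, target_column: str) -> pd.DataFrame:
    """Return a copy of df with one row per sentence fragment of target_column.

    str.split turns each cell into a list of fragments; explode then gives each
    fragment its own row, repeating the values of the other columns.
    """
    fragments = df[target_column].str.split(SEPARATOR_PATTERN)
    return df.assign(**{target_column: fragments}).explode(target_column, ignore_index=True)


df = pd.DataFrame({"UserSays": ["What is sccm? How to access outlook on open network?"]})
print(split_sentences(df, "UserSays")["UserSays"].tolist())
```

Splitting on all separators at once gives the same fragments as splitting on them one after another, because each split only removes separator text and cannot create a new separator.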

1 Answer

繁花不似錦

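I think your code is also returning values whose similarity score is 0. You can change the line in the generate_completions function so that it keeps only the values with a similarity score greater than zero: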

```python
similarity_scores = [i for i in similarity_scores if i[1] > 0]
```

2022-06-02
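The question's generate_completions function is not shown, so the following is only a hypothetical reconstruction to illustrate where the answer's filter fits. It assumes the function ranks the stored sentences by TF-IDF cosine similarity to the query, keeps (index, score) pairs as the answer's i[1] implies, and returns the top 3. The class name TfidfCompleter and the top_k parameter are illustrative assumptions. With the zero-score filter, an empty query or words that never appear in the xlsx texts produce an empty list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


class TfidfCompleter:
    """Hypothetical sketch: rank stored sentences by TF-IDF cosine similarity."""

    def __init__(self, sentences, top_k=3):
        self.sentences = list(sentences)
        self.top_k = top_k
        self.vectorizer = TfidfVectorizer()
        # TfidfVectorizer L2-normalizes each row by default, so the linear
        # kernel (a plain dot product) equals cosine similarity here.
        self.matrix = self.vectorizer.fit_transform(self.sentences)

    def generate_completions(self, query):
        # Words outside the fitted vocabulary (or an empty query) give an all-zero vector
        query_vec = self.vectorizer.transform([query])
        scores = linear_kernel(query_vec, self.matrix).ravel()
        # (index, score) pairs, highest score first
        similarity_scores = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
        # The answer's fix: drop candidates that share no terms with the query
        similarity_scores = [i for i in similarity_scores if i[1] > 0]
        return [self.sentences[idx] for idx, _ in similarity_scores[: self.top_k]]


completer = TfidfCompleter([
    "What is sccm",
    "How to access outlook on open network?",
    "Email access outside ril network",
    "Log in outlook away from office",
])
print(completer.generate_completions("sccm"))
print(completer.generate_completions(""))
```

The > 0 cutoff removes only candidates with no overlapping terms; a larger threshold could also drop weak partial matches, but its value would need tuning on the real xlsx data.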