
Looping through a list and rows for keyword matching in a pandas DataFrame

Python

子衿沉夜 2022-07-26 09:43:40

I have a DataFrame that looks like this. It has one column, labeled "utterances". Each row of df.utterances holds a string of n words.

   utterances
0  okay go ahead.
1  Um, let me think.
2  nan that's not very encouraging. If they had a...
3  they wouldn't make you want to do it. nan nan ...
4  Yeah. The problem is though, it just, if we pu...

I also have a list of specific words, called specific_words. It looks like this:

specific_words = ['happy', 'good', 'encouraging', 'joyful']

I want to check whether any word from specific_words appears in any of the utterances. Essentially, I want to iterate over each row in df.utterances and, for each row, loop through specific_words looking for a match. If there is a match, I want a boolean column next to df.utterances that shows it.

def query_text_by_keyword(df, word_list):
    for word in word_list:
        for utt in df.utterance:
            if word in utt:
                match = True
            else:
                match = False
    return match

df['query_match'] = df.apply(query_text_by_keyword, axis=1, args=(specific_words,))

It doesn't raise an error, but it returns False for every row, which it shouldn't. For example, the first few rows should look like this:

   utterances                                          query_match
0  okay go ahead.                                      False
1  Um, let me think.                                   False
2  nan that's not very encouraging. If they had a...   True
3  they wouldn't make you want to do it. nan nan ...   False
4  Yeah. The problem is though, it just, if we pu...   False

EDIT

@furas made a good suggestion that solves the original problem. However, I would also like to add another column containing the specific word from the query that produced the match. Example:

   utterances                                          query_match  word
0  okay go ahead                                       False        NaN
1  Um, let me think                                    False        NaN
2  nan that's not very encouraging. If they had a..    True         'encouraging'
3  they wouldn't make you want to do it. nan nan ..    False        NaN
4  Yeah. The problem is though, it just, if we pu..    False        NaN
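Two things in the function above explain the all-False output. With apply(..., axis=1) each call receives a single row, so `for utt in df.utterance` iterates over the characters of one string, and a multi-letter word is never found inside a single character. Also, `match` is overwritten on every iteration, so only the last comparison counts. A minimal loop-based sketch of the intended logic follows; it assumes the column is named `utterances` (the question's code uses `df.utterance`) and checks the whole string, stopping at the first hit.

```python
import pandas as pd

def query_text_by_keyword(row, word_list):
    # `row` is one row of the DataFrame when called via apply(axis=1)
    text = row["utterances"]
    # Substring test against the whole string; any() stops at the first hit
    return any(word in text for word in word_list)

df = pd.DataFrame({
    "utterances": [
        "okay go ahead.",
        "Um, let me think.",
        "nan that's not very encouraging. If they had a...",
    ]
})
specific_words = ["happy", "good", "encouraging", "joyful"]

df["query_match"] = df.apply(query_text_by_keyword, axis=1, args=(specific_words,))
print(df["query_match"].tolist())  # [False, False, True]
```

This is still a case-sensitive substring test ("Happy" does not match "happy", and "good" would match inside "goodbye"), and it runs Python code row by row, so the vectorized str methods in the answer are usually preferable.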


1 Answer

慕姐8265434


You can use a regex with str.contains(regex):

df['utterances'].str.contains("happy|good|encouraging|joyful")

You can build that regex with

query = '|'.join(specific_words)

You may also want to use str.lower(), because the strings may contain uppercase characters.

import pandas as pd

df = pd.DataFrame({
    'utterances': [
        'okay go ahead',
        'Um, let me think.',
        'nan that\'s not very encouraging. If they had a...',
        'they wouldn\'t make you want to do it. nan nan ...',
        'Yeah. The problem is though, it just, if we pu...',
    ]
})

specific_words = ['happy', 'good', 'encouraging', 'joyful']

# build an alternation pattern: 'happy|good|encouraging|joyful'
query = '|'.join(specific_words)

# lowercase first so capitalized words (e.g. 'Happy') also match
df['query_match'] = df['utterances'].str.lower().str.contains(query)

print(df)

Result:

   utterances                                          query_match
0  okay go ahead                                       False
1  Um, let me think.                                   False
2  nan that's not very encouraging. If they had a...   True
3  they wouldn't make you want to do it. nan nan ...   False
4  Yeah. The problem is though, it just, if we pu...   False

EDIT: As @HenryYik mentioned in the comments, you can use case=False instead of str.lower():

df['query_match'] = df['utterances'].str.contains(query, case=False)

More in the documentation: pandas.Series.str.contains
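A slightly more defensive variant of the same call, as a sketch: re.escape guards against keywords that contain regex metacharacters, \b word boundaries stop "good" from matching inside "goodbye", and na=False makes genuinely missing values (a real NaN, not the literal text "nan") count as no match. Requiring whole words is an assumption about what "match" should mean here; drop the \b parts if substring matches are wanted.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "utterances": [
        "okay go ahead",
        "That's not very Encouraging.",
        "goodbye for now",
        np.nan,  # a real missing value, not the text "nan"
    ]
})
specific_words = ["happy", "good", "encouraging", "joyful"]

# Escape each keyword and require whole-word matches; (?:...) is a
# non-capturing group, so str.contains does not warn about match groups
query = r"\b(?:" + "|".join(map(re.escape, specific_words)) + r")\b"

df["query_match"] = df["utterances"].str.contains(query, case=False, regex=True, na=False)
print(df["query_match"].tolist())  # [False, True, False, False]
```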

EDIT: To get the matched word, you can use str.extract() with the regex wrapped in (...), i.e. a capture group:

df['word'] = df['utterances'].str.extract( "(happy|good|encouraging|joyful)" )

Working example:

import pandas as pd

df = pd.DataFrame({
    'utterances': [
        'okay go ahead',
        'Um, let me think.',
        'nan that\'s not very encouraging. If they had a...',
        'they wouldn\'t make you want to do it. nan nan ...',
        'Yeah. The problem is though, it just, if we pu...',
        'Yeah. happy good',
    ]
})

specific_words = ['happy', 'good', 'encouraging', 'joyful']

query = '|'.join(specific_words)

# case=False makes the match case-insensitive
df['query_match'] = df['utterances'].str.contains(query, case=False)

# '(...)' makes the pattern a capture group, which str.extract requires
df['word'] = df['utterances'].str.extract('({})'.format(query))

print(df)

In the example I added 'Yeah. happy good' to test which word is returned, happy or good. It returns the first matching word.

Result:

   utterances                                          query_match  word
0  okay go ahead                                       False        NaN
1  Um, let me think.                                   False        NaN
2  nan that's not very encouraging. If they had a...   True         encouraging
3  they wouldn't make you want to do it. nan nan ...   False        NaN
4  Yeah. The problem is though, it just, if we pu...   False        NaN
5  Yeah. happy good                                    True         happy
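str.extract keeps only the first match per row. If every matching keyword is wanted, str.findall returns a list per row instead. A sketch, reusing the same alternation pattern and making the lookup case-insensitive with flags=re.IGNORECASE; whether to store the lists as-is or join them into a string (the words_str column below is a hypothetical name) is a design choice.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "utterances": [
        "okay go ahead",
        "nan that's not very encouraging. If they had a...",
        "Yeah. happy good",
    ]
})
specific_words = ["happy", "good", "encouraging", "joyful"]
query = "|".join(specific_words)

# One list of matched keywords per row (an empty list when nothing matches)
df["words"] = df["utterances"].str.findall(query, flags=re.IGNORECASE)
# Optional: a comma-separated string for display
df["words_str"] = df["words"].str.join(", ")
print(df[["words", "words_str"]])
```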

BTW: you can now even do

df['query_match'] = ~df['word'].isna()

or

df['query_match'] = df['word'].notna()
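One caveat with deriving query_match from word: in the working example str.contains uses case=False while str.extract is case-sensitive, so a row such as "Happy to help" would get query_match == True but word == NaN. Passing flags=re.IGNORECASE to str.extract keeps the two consistent, and expand=False returns a Series rather than a one-column DataFrame. A sketch:

```python
import re

import pandas as pd

df = pd.DataFrame({"utterances": ["okay go ahead", "Happy to help", "Yeah. happy good"]})
specific_words = ["happy", "good", "encouraging", "joyful"]
query = "|".join(specific_words)

# Case-insensitive extraction of the first matching keyword (NaN if none)
df["word"] = df["utterances"].str.extract(f"({query})", flags=re.IGNORECASE, expand=False)
# The boolean column can now be derived from the extracted word...
df["query_match"] = df["word"].notna()
# ...and it agrees with the case-insensitive str.contains check
assert df["query_match"].tolist() == df["utterances"].str.contains(query, case=False).tolist()
print(df)
```

The extracted word keeps the original capitalization ("Happy"), so lowercase it afterwards if the word column is used for grouping or counting.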

Answered 2022-07-26
