
Looping through a list and rows for keyword matching in a pandas DataFrame


子衿沉夜 2022-07-26 09:43:40
I have a DataFrame that looks like this. It has one column, labelled "utterances". df.utterances contains rows whose values are strings of n words.

                                              utterances
    0                                     okay go ahead.
    1                                  Um, let me think.
    2  nan that's not very encouraging. If they had a...
    3  they wouldn't make you want to do it. nan nan ...
    4  Yeah. The problem is though, it just, if we pu...

I also have a list of specific words. It is called specific_words and looks like this:

    specific_words = ['happy', 'good', 'encouraging', 'joyful']

I want to check whether any of the words in specific_words is found in any of the utterances. Essentially, I want to loop over every row in df.utterances and, while doing that, loop over specific_words looking for a match. If there is a match, I want a boolean column next to df.utterances that shows it.

    def query_text_by_keyword(df, word_list):
        for word in word_list:
            for utt in df.utterance:
                if word in utt:
                    match = True
                else:
                    match = False
                return match

    df['query_match'] = df.apply(query_text_by_keyword,
                                 axis=1,
                                 args=(specific_words,))

It doesn't break, but it returns False for every row, which it shouldn't. For example, the first few rows should look like this:

                                              utterances  query_match
    0                                     okay go ahead.        False
    1                                  Um, let me think.        False
    2  nan that's not very encouraging. If they had a...         True
    3  they wouldn't make you want to do it. nan nan ...        False
    4  Yeah. The problem is though, it just, if we pu...        False

EDIT

@furas made a good suggestion that solves the original problem. However, I would also like to add another column that contains the specific word from the query that produced the match. Example:

                                              utterances  query_match           word
    0                                      okay go ahead        False            NaN
    1                                   Um, let me think        False            NaN
    2   nan that's not very encouraging. If they had a..         True  'encouraging'
    3   they wouldn't make you want to do it. nan nan ..        False            NaN
    4   Yeah. The problem is though, it just, if we pu..        False            NaN

1 Answer

慕姐8265434


You can use a regex with str.contains(regex):


df['utterances'].str.contains("happy|good|encouraging|joyful")

You can build the regex from the list with


query = '|'.join(specific_words)
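Joining the words with '|' treats each one as a regex fragment, so a keyword containing characters such as + or ( would change the pattern, and a short keyword can also match inside a longer word. The following is a minimal sketch of two possible refinements, not part of the original answer: re.escape() for literal matching and \b word boundaries for whole-word matching. The keyword lists and sample strings here are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical keywords; 'c++' and '(good)' contain regex metacharacters.
specific_words = ['happy', 'c++', '(good)']

# re.escape() makes every keyword match literally inside the pattern.
query = '|'.join(re.escape(word) for word in specific_words)

s = pd.Series(['I like c++', 'so happy', 'nothing here', 'it was (good)'])
matches = s.str.contains(query, case=False)
print(matches.tolist())

# Optional: \b word boundaries stop 'good' from matching inside 'goodbye'.
# A non-capturing group (?:...) keeps the alternation together.
plain_words = ['good', 'happy']
whole_word = r'\b(?:' + '|'.join(re.escape(w) for w in plain_words) + r')\b'

s2 = pd.Series(['goodbye', 'good job'])
whole_matches = s2.str.contains(whole_word, case=False)
print(whole_matches.tolist())
```

Whether whole-word matching is wanted depends on the data; the substring behaviour of the original query may be exactly what is intended.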

You can also use str.lower(), because the strings may contain uppercase characters.


import pandas as pd

df = pd.DataFrame({
    'utterances':[
        'okay go ahead',
        'Um, let me think.',
        'nan that\'s not very encouraging. If they had a...',
        'they wouldn\'t make you want to do it. nan nan ...',
        'Yeah. The problem is though, it just, if we pu...',
    ]
})

specific_words = ['happy', 'good', 'encouraging', 'joyful']

query = '|'.join(specific_words)

df['query_match'] = df['utterances'].str.lower().str.contains(query)

print(df)

Result:

                                          utterances  query_match
0                                      okay go ahead        False
1                                  Um, let me think.        False
2  nan that's not very encouraging. If they had a...         True
3  they wouldn't make you want to do it. nan nan ...        False
4  Yeah. The problem is though, it just, if we pu...        False

EDIT: as @HenryYik mentioned in a comment, you can use case=False instead of str.lower():


df['query_match'] = df['utterances'].str.contains(query, case=False)

More in the documentation: pandas.Series.str.contains
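The sample data in the question shows the text "nan" inside some utterances, which suggests the real column may also hold genuinely missing values. If so, str.contains() can return a missing value for those rows instead of False. A small sketch, assuming a column with one missing cell (the data below is invented for illustration), using the na parameter of str.contains():

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing utterance.
df = pd.DataFrame({
    'utterances': ['That is Encouraging.', np.nan, 'okay go ahead']
})

query = 'happy|good|encouraging|joyful'

# na=False reports missing cells as "no match", so the column stays boolean.
no_nan = df['utterances'].str.contains(query, case=False, na=False)
df['query_match'] = no_nan

print(df)
```

With na=False the new column can be used directly as a boolean mask, for example df[df['query_match']].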


EDIT: to get the matched word, you can use str.extract() with the regex inside a capturing group (...):


df['word'] = df['utterances'].str.extract( "(happy|good|encouraging|joyful)" )

Working example:

import pandas as pd

df = pd.DataFrame({
    'utterances':[
        'okay go ahead',
        'Um, let me think.',
        'nan that\'s not very encouraging. If they had a...',
        'they wouldn\'t make you want to do it. nan nan ...',
        'Yeah. The problem is though, it just, if we pu...',
        'Yeah. happy good',
    ]
})

specific_words = ['happy', 'good', 'encouraging', 'joyful']

query = '|'.join(specific_words)

df['query_match'] = df['utterances'].str.contains(query, case=False)
df['word'] = df['utterances'].str.extract( '({})'.format(query) )

print(df)

In the example I added 'Yeah. happy good' to test which word would be returned, happy or good. It returns the first matched word.
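Two details may matter here. str.extract() is case-sensitive unless flags are passed, so it can miss a word that str.contains(..., case=False) reports as a match, and it only returns the first match. A hedged sketch, using invented sample rows, that passes flags=re.IGNORECASE and uses str.findall() to collect every matching word:

```python
import re
import pandas as pd

# Hypothetical sample rows; note the capitalised 'Happy'.
df = pd.DataFrame({
    'utterances': ['okay go ahead', 'Yeah. Happy good', 'not very encouraging']
})

specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)

# First matching word per row, ignoring case (NaN when nothing matches).
first = df['utterances'].str.extract('({})'.format(query),
                                     flags=re.IGNORECASE, expand=False)
df['word'] = first

# All matching words per row as a list (empty list when nothing matches).
df['words'] = df['utterances'].str.findall(query, flags=re.IGNORECASE)
df['query_match'] = df['words'].map(len) > 0

print(df)
```

The matched words keep the capitalisation found in the text; apply str.lower() first if normalised keywords are needed.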


Result:

                                          utterances  query_match         word
0                                      okay go ahead        False          NaN
1                                  Um, let me think.        False          NaN
2  nan that's not very encouraging. If they had a...         True  encouraging
3  they wouldn't make you want to do it. nan nan ...        False          NaN
4  Yeah. The problem is though, it just, if we pu...        False          NaN
5                                   Yeah. happy good         True        happy

BTW: now you can even do


df['query_match'] = ~df['word'].isna()

or


df['query_match'] = df['word'].notna()
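For comparison with the loop in the question: as written there, the function appears to return during the very first iteration, so only one word is ever checked. If a regex-free version is preferred, one possible sketch is a small helper applied to the column. The helper name first_keyword and the sample rows are hypothetical, not from the original post.

```python
import pandas as pd

def first_keyword(text, word_list):
    """Return the first keyword (in list order) found in text, else None."""
    if not isinstance(text, str):      # skip NaN / non-string cells
        return None
    lowered = text.lower()
    for word in word_list:
        if word in lowered:            # plain substring test, no regex
            return word
    return None                        # only after every keyword was tried

# Hypothetical sample rows.
df = pd.DataFrame({
    'utterances': ['okay go ahead.',
                   "nan that's not very Encouraging.",
                   'Yeah. happy good']
})
specific_words = ['happy', 'good', 'encouraging', 'joyful']

# Extra keyword arguments to Series.apply are forwarded to the helper.
df['word'] = df['utterances'].apply(first_keyword, word_list=specific_words)
df['query_match'] = df['word'].notna()

print(df)
```

Unlike str.extract(), which returns the match that occurs first in the text, this helper returns the first keyword in list order; the vectorised string methods are also likely to be faster on large frames.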

