You can use str.contains() with a regular expression:
df['utterances'].str.contains("happy|good|encouraging|joyful")
You can build the regex from your list of words with:
query = '|'.join(specific_words)
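One caveat with joining the words directly: if a word contains regex metacharacters (such as + or ?), the joined pattern may fail to compile or match something unintended. A minimal sketch of one way around this, assuming you want every word matched literally, is to pass each word through re.escape first. The words 'c++' and 'so-so?' and the sample Series are hypothetical and only there to show the effect.

```python
import re
import pandas as pd

# Hypothetical word list; 'c++' and 'so-so?' contain regex metacharacters
specific_words = ['happy', 'good', 'c++', 'so-so?']

# Escape each word so it is matched literally, then join with the regex OR
query = '|'.join(re.escape(word) for word in specific_words)

# Hypothetical sample data
s = pd.Series(['I like c++', 'feeling good', 'nothing here'])
mask = s.str.contains(query)
print(mask)
```

Without re.escape, the 'c++' entry would make the pattern invalid, so escaping is a cheap safeguard whenever the word list comes from user input or a file.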
You can also apply str.lower() first, because the strings may contain uppercase characters.
import pandas as pd
df = pd.DataFrame({
'utterances':[
'okay go ahead',
'Um, let me think.',
'nan that\'s not very encouraging. If they had a...',
'they wouldn\'t make you want to do it. nan nan ...',
'Yeah. The problem is though, it just, if we pu...',
]
})
specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)
df['query_match'] = df['utterances'].str.lower().str.contains(query)
print(df)
Result:
                                          utterances  query_match
0                                      okay go ahead        False
1                                  Um, let me think.        False
2  nan that's not very encouraging. If they had a...         True
3  they wouldn't make you want to do it. nan nan ...        False
4  Yeah. The problem is though, it just, if we pu...        False
EDIT: As @HenryYik mentioned in the comments, you can use case=False instead of str.lower():
df['query_match'] = df['utterances'].str.contains(query, case=False)
More in the documentation: pandas.Series.str.contains
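One more parameter from that documentation that may be useful here is na. If the column contains real missing values (not the literal text 'nan'), str.contains can return a missing result for those rows on many pandas versions, and such a mask cannot be used directly for boolean indexing. A small sketch, assuming you want missing utterances treated as "no match"; the sample Series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one genuinely missing utterance
s = pd.Series(['So happy', np.nan, 'okay go ahead'])
query = 'happy|good|encouraging|joyful'

# na=False makes missing values count as "no match",
# so the mask is purely boolean and safe for filtering
mask = s.str.contains(query, case=False, na=False)
filtered = s[mask]
print(filtered)
```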
EDIT: To get the matched word, you can use str.extract() with the regex wrapped in a capture group (...):
df['word'] = df['utterances'].str.extract( "(happy|good|encouraging|joyful)" )
Working example:
import pandas as pd
df = pd.DataFrame({
'utterances':[
'okay go ahead',
'Um, let me think.',
'nan that\'s not very encouraging. If they had a...',
'they wouldn\'t make you want to do it. nan nan ...',
'Yeah. The problem is though, it just, if we pu...',
'Yeah. happy good',
]
})
specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)
df['query_match'] = df['utterances'].str.contains(query, case=False)
df['word'] = df['utterances'].str.extract( '({})'.format(query) )
print(df)
In this example I added 'Yeah. happy good' to test which word would be returned, happy or good. It returns the first matching word.
Result:
                                          utterances  query_match         word
0                                      okay go ahead        False          NaN
1                                  Um, let me think.        False          NaN
2  nan that's not very encouraging. If they had a...         True  encouraging
3  they wouldn't make you want to do it. nan nan ...        False          NaN
4  Yeah. The problem is though, it just, if we pu...        False          NaN
5                                   Yeah. happy good         True        happy
By the way, now that the word column exists, you can even do
df['query_match'] = ~df['word'].isna()
or
df['query_match'] = df['word'].notna()
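Two things to keep in mind with the extract-based approach. First, str.extract() is case-sensitive unless you pass flags, so a flag derived from word can differ from str.contains(query, case=False) when the text has uppercase matches. Second, extract only returns the first match. A hedged sketch, assuming you want case-insensitive matching and all matched words per row; the sample rows and the words column name are hypothetical.

```python
import re
import pandas as pd

# Hypothetical sample rows, including one with uppercase matches
df = pd.DataFrame({
    'utterances': [
        'okay go ahead',
        'Yeah. happy good',
        'GOOD and Joyful',
    ]
})

specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)

# All matches per row as a list, ignoring case
df['words'] = df['utterances'].str.findall(query, flags=re.IGNORECASE)

# First match only, ignoring case; expand=False returns a Series
df['word'] = df['utterances'].str.extract(
    '({})'.format(query), flags=re.IGNORECASE, expand=False
)

# Match flag derived from the case-insensitive extract
df['query_match'] = df['word'].notna()
print(df)
```

The matched words keep their original casing (for example 'GOOD'), so lowercase them afterwards if you need to group or count by word.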
