I have a list of 4,000 strings that I need to remove from a pandas DataFrame column. The code below works on this small example, but when I run it on my real DataFrame of 20k+ rows it takes a very long time. Any ideas on how to speed it up?

import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)

my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']

# =============================================================================
# Build one alternation pattern from the escaped literals, then substitute row by row.
p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
1 Answer

絕地無雙
Use Series.str.replace() (the .str accessor on the column, not on the DataFrame itself):
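p = re.compile('|'.join(map(re.escape, my_list)))
# regex=True is needed on pandas 2.0+, where str.replace defaults to regex=False
# and rejects a compiled pattern.
df['cleaned_text'] = df['name'].str.replace(p, ' ', regex=True)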
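
With 4,000 literals, how the alternation is built can matter as much as how it is applied. The following is a minimal sketch, not part of the original answer: build_pattern is a hypothetical helper, and the sample strings are made up for illustration. It drops duplicate and empty entries so the pattern is no larger than it needs to be, and it puts longer strings first, because a regex alternation takes the first branch that matches and a short string would otherwise win over a longer one that starts with it.

```python
import re


def build_pattern(strings):
    """Compile one alternation regex from literal strings (hypothetical helper)."""
    # Drop duplicates and empty strings so the pattern carries no redundant branches.
    unique = {s for s in strings if s}
    # Longest first: alternation tries branches left to right, so a short literal
    # listed before a longer one sharing its prefix would match first.
    ordered = sorted(unique, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))


# Illustrative data only (not from the question).
my_list = ["how are you", "how are you doing today?", "how are you"]
p = build_pattern(my_list)

texts = ["It is an Hello example how are you doing today?", "Hello, how are you"]
cleaned = [p.sub(" ", t) for t in texts]
print(cleaned)
```

The compiled pattern can be passed to df['name'].str.replace(p, ' ', regex=True) in the same way. Whether this speeds things up on a given dataset is something to measure, since both approaches still run one regex substitution per row.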