3 Answers

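Contributed 1752 experience points, received over 4 likes
First, the correct syntax is meds_df[['readcode_1', 'readcode_2','generic_name']] (a list of column names inside the indexing brackets). That is why you are getting a KeyError.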
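To illustrate the point about the brackets, here is a minimal sketch. The two-row frame and its values are made up for the demonstration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
meds_df = pd.DataFrame({
    "readcode_1": ["bxd1", "abc2"],
    "readcode_2": [1146785342.0, 5.0],
    "generic_name": ["Simvastatin", "Aspirin"],
})

# Comma-separated names inside single brackets are treated as ONE tuple key,
# and no column has that tuple as its label, so pandas raises KeyError.
try:
    meds_df["readcode_1", "readcode_2", "generic_name"]
    raised = False
except KeyError:
    raised = True

# A list inside the brackets selects several columns and returns a DataFrame.
subset = meds_df[["readcode_1", "readcode_2", "generic_name"]]
print(raised, list(subset.columns))
```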
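To answer your question, here is one way to do it:
# Updated to use a tuple of prefixes, per David's suggestion
cols = ['readcode_1', 'readcode_2', 'generic_name']
# One boolean column per searched column; keep rows where any of them matches
idx = pd.concat((med_df[c].astype(str).str.startswith(tuple(list_to_extract)) for c in cols), axis=1).any(axis=1)
med_df.loc[idx]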
Result:
      ID readcode_1    readcode_2 generic_name
1   1001       bxd1  1.146785e+09  Simvastatin
3   1003        NaN           NaN  Pravastatin
5   1005       bxd4  4.543234e+07          NaN
10  1010       bxde           NaN          NaN
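The mask-building step in this answer can be unpacked into separate stages. The sketch below assumes a small, made-up frame with string-only columns (so no astype(str) is needed) and a pandas version whose Series.str.startswith accepts a tuple of prefixes; the names prefixes, frame, per_column, and row_mask are illustrative.

```python
import pandas as pd

# Hypothetical prefixes and data, chosen only to show the mechanics.
prefixes = ("bxd", "Simvastatin")
frame = pd.DataFrame({
    "readcode_1": ["bxd1", "abc2", "xyz3"],
    "generic_name": ["Aspirin", "Simvastatin", "Ibuprofen"],
})

# Step 1: one boolean Series per column, glued side by side.
per_column = pd.concat(
    [frame[col].str.startswith(prefixes) for col in frame.columns],
    axis=1,
)

# Step 2: a row is kept if ANY of its columns matched.
row_mask = per_column.any(axis=1)

# Step 3: boolean indexing with the row mask.
matched = frame.loc[row_mask]
print(matched)
```

Row 0 matches through readcode_1 and row 1 through generic_name, so both are kept, while row 2 is dropped.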

Contributed 2012 experience points, received over 12 likes
You can do it this way with apply:
list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]
bool_df = df[['readcode_1', 'readcode_2','generic_name']].apply(lambda x: x.str.startswith(tuple(list_to_extract), na=False), axis=1)
df.loc[bool_df[bool_df.any(axis=1)].index]
Output:
      ID readcode_1    readcode_2 generic_name
1   1001       bxd1  1.146785e+09  Simvastatin
3   1003        NaN           NaN  Pravastatin
5   1005       bxd4  4.543234e+07          NaN
10  1010       bxde           NaN          NaN
Thanks to r.ook for spotting a small bug.
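As a side note on the last line of this answer, going through .index is not strictly required: a boolean Series can be passed to .loc directly. The sketch below checks that both forms select the same rows, using a small made-up frame with string-only search columns; via_index and via_mask are illustrative names.

```python
import pandas as pd

# Hypothetical data; shorter than the question's frame and string-only.
list_to_extract = ["bxd", "Pravastatin"]
df = pd.DataFrame({
    "ID": [1001, 1002, 1003],
    "readcode_1": ["bxd1", "abc2", "xyz3"],
    "generic_name": ["Aspirin", "Ibuprofen", "Pravastatin"],
})

# Row-wise apply, as in the answer: each row becomes a Series of booleans.
bool_df = df[["readcode_1", "generic_name"]].apply(
    lambda row: row.str.startswith(tuple(list_to_extract), na=False),
    axis=1,
)

# Form used in the answer: filter bool_df, then look rows up by index label.
via_index = df.loc[bool_df[bool_df.any(axis=1)].index]

# Shorter form: hand the boolean row mask straight to .loc.
via_mask = df.loc[bool_df.any(axis=1)]

print(via_index.equals(via_mask))
```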

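Contributed 1776 experience points, received over 12 likes
Another solution, where the string processing happens in vanilla Python before the DataFrame is recreated: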
list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]
cols_to_search = ['readcode_1', 'readcode_2', 'generic_name']
output = [(ID, *searchbox)
          for ID, searchbox in zip(df.ID, df.filter(cols_to_search).to_numpy())
          if any(str(box).startswith(tuple(list_to_extract)) for box in searchbox)]
pd.DataFrame(output, columns=df.columns)
     ID readcode_1    readcode_2 generic_name
0  1001       bxd1  1.146785e+09  Simvastatin
1  1003        NaN           NaN  Pravastatin
2  1005       bxd4  4.543234e+07          NaN
3  1010       bxde           NaN          NaN
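The per-cell test in this answer relies on Python's built-in str.startswith, which accepts a tuple of prefixes. A minimal sketch of that check in isolation follows; the helper name matches is hypothetical, and the sample values are picked to mirror the kinds of cells in the table above.

```python
# Prefixes from the answer, stored as a tuple because str.startswith
# accepts a tuple but not a list.
prefixes = ("bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin")


def matches(value):
    """Return True if the text form of value starts with any prefix."""
    return str(value).startswith(prefixes)


# A string cell, a float cell, a missing value, and a non-matching string.
results = [
    matches("bxd4"),
    matches(1146785342.0),  # str() gives "1146785342.0"
    matches(float("nan")),  # str() gives "nan"
    matches("Aspirin"),
]
print(results)
```

Because str(float("nan")) is the text "nan", a prefix such as "na" would also match missing values, which is worth keeping in mind when choosing prefixes.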