我正在尋找基于另一個只有三列的小得多的數據框來過濾一個大數據框(數百萬行):ID、開始、結束。以下是我放在一起的內容(有效),但似乎groupby()或np.where可能更快。設置:import pandas as pdimport iocsv = io.StringIO(u'''time id num2018-01-01 00:00:00 A 12018-01-01 01:00:00 A 22018-01-01 02:00:00 A 32018-01-01 03:00:00 A 42018-01-01 04:00:00 A 52018-01-01 05:00:00 A 62018-01-01 06:00:00 A 62018-01-03 07:00:00 B 102018-01-03 08:00:00 B 112018-01-03 09:00:00 B 122018-01-03 10:00:00 B 132018-01-03 11:00:00 B 142018-01-03 12:00:00 B 152018-01-03 13:00:00 B 162018-05-29 23:00:00 C 1112018-05-30 00:00:00 C 1222018-05-30 01:00:00 C 1332018-05-30 02:00:00 C 1442018-05-30 03:00:00 C 155''')df = pd.read_csv(csv, sep = '\t')df['time'] = pd.to_datetime(df['time'])csv_filter = io.StringIO(u'''id start endA 2018-01-01 01:00:00 2018-01-01 02:00:00B 2018-01-03 09:00:00 2018-01-03 12:00:00C 2018-05-30 00:00:00 2018-05-30 08:00:00''')df_filter = pd.read_csv(csv_filter, sep = '\t')df_filter['start'] = pd.to_datetime(df_filter['start'])df_filter['end'] = pd.to_datetime(df_filter['end'])工作代碼df = pd.merge_asof(df, df_filter, left_on = 'time', right_on = 'start', by = 'id').dropna(subset = ['start']).drop(['start','end'], axis = 1)df = pd.merge_asof(df, df_filter, left_on = 'time', right_on = 'end', by = 'id', direction = 'forward').dropna(subset = ['end']).drop(['start','end'], axis = 1)輸出 time id num0 2018-01-01 01:00:00 A 21 2018-01-01 02:00:00 A 36 2018-01-03 09:00:00 B 127 2018-01-03 10:00:00 B 138 2018-01-03 11:00:00 B 149 2018-01-03 12:00:00 B 1511 2018-05-30 00:00:00 C 12212 2018-05-30 01:00:00 C 13313 2018-05-30 02:00:00 C 14414 2018-05-30 03:00:00 C 155關于更優雅/更快的解決方案的任何想法?
1 回答

慕的地6264312
TA貢獻1817條經驗 獲得超6個贊
為什么不是merge在過濾之前。請注意,當數據集很大時,這會占用您的內存。
newdf=df.merge(df_filter)
newdf=newdf.loc[newdf.time.between(newdf.start,newdf.end),df.columns.tolist()]
newdf
Out[480]:
time id num
1 2018-01-01 01:00:00 A 2
2 2018-01-01 02:00:00 A 3
9 2018-01-03 09:00:00 B 12
10 2018-01-03 10:00:00 B 13
11 2018-01-03 11:00:00 B 14
12 2018-01-03 12:00:00 B 15
15 2018-05-30 00:00:00 C 122
16 2018-05-30 01:00:00 C 133
17 2018-05-30 02:00:00 C 144
18 2018-05-30 03:00:00 C 155
添加回答
舉報
0/150
提交
取消