3 回答

TA貢獻1943條經驗 獲得超7個贊
d = {'date': 'last', 'open': 'last',
? ? ?'high': 'max', 'low': 'min', 'close': 'last'}
# df.index = pd.to_datetime(df.index)
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
細節:
用于DataFrame.between_time
過濾數據框中介于以下df
時間之間的行:09:00
09:16
print(s1)?????????open???high????low??close date??????????????????????????????????????????? 2017-01-02?09:08:00??116.0??116.0??116.0??116.0 2017-01-02?09:16:00??116.1??117.8??117.0??113.0
用于DataFrame.groupby
將此過濾后的數據幀分組s1
并date
使用字典進行聚合d
:
print(s2)?????????open???high????low??close date??????????????????????????????????????????? 2017-01-02?09:16:00??116.1??117.8??116.0??113.0
使用從原始數據幀中刪除介于 時間 之間的DataFrame.drop
行,然后使用將其與 相連接,最后使用對索引進行排序:df
09:00-09:16
pd.concat
s2
DataFrame.sort_index
print(df1)
? ? ? ? ? ? ? ? ? ? ? ?open? ? high? ? ?low? ?close
date? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
2017-01-02 09:16:00? 116.10? 117.80? 116.00? 113.00
2017-01-02 09:17:00? 115.50? 116.20? 115.50? 116.20
2017-01-02 09:18:00? 116.05? 116.35? 116.00? 116.00
2017-01-02 09:19:00? 116.00? 116.00? 115.60? 115.75
2019-12-29 15:57:00? 260.00? 260.00? 260.00? 260.00
2019-12-29 15:58:00? 260.00? 260.00? 259.35? 259.35
2019-12-29 15:59:00? 260.00? 260.00? 260.00? 260.00
2019-12-29 16:36:00? 259.35? 259.35? 259.35? 259.35
2029-12-29 15:56:00? 259.35? 259.35? 259.35? 259.35

TA貢獻1846條經驗 獲得超7個贊
利用 @r-beginners 數據并添加額外的幾行:
import pandas as pd
import numpy as np
import io
data = '''
datetime open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-01-03 09:16:00" 260.00 260.00 260.00 260.00
"2017-01-03 09:17:00" 261.00 261.00 261.00 261.00
"2017-01-03 09:18:00" 262.00 262.00 262.00 262.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep='\s+')
下面的代碼開始了整個過程??赡懿皇亲詈玫姆椒?,但這是快速而骯臟的方法:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
df['date'] = df.index.date
dates = np.unique(df.index.date)
first_rows = df.between_time('9:16', '00:00').reset_index().groupby('date').first().set_index('datetime')
first_rows['date'] = first_rows.index.date
dffs = []
for d in dates:
df_day = df[df['date'] == d].sort_index()
first_bar_of_the_day = first_rows[first_rows['date'] == d].copy()
bars_until_first = df_day.loc[df_day.index <= first_bar_of_the_day.index.values[0]]
if ~first_bar_of_the_day.empty:
first_bar_of_the_day['open'] = bars_until_first['open'].values[0]
first_bar_of_the_day['high'] = bars_until_first['high'].max()
first_bar_of_the_day['low'] = bars_until_first['low'].min()
first_bar_of_the_day['close'] = bars_until_first['close'].values[-1]
bars_after_first = df_day.loc[df_day.index > first_bar_of_the_day.index.values[0]]
if len(bars_after_first) > 1:
dff = pd.concat([first_bar_of_the_day, bars_after_first])
else:
dff = first_bar_of_the_day.copy()
print(dff)
dffs.append(dff)
combined_df = pd.concat([x for x in dffs])
print(combined_df)
打印結果如下:dff對于不同日期
open high low close date
datetime
2017-01-02 09:16:00 116.00 117.80 116.0 113.00 2017-01-02
2017-01-02 09:17:00 115.50 116.20 115.5 116.20 2017-01-02
2017-01-02 09:18:00 116.05 116.35 116.0 116.00 2017-01-02
2017-01-02 09:19:00 116.00 116.00 115.6 115.75 2017-01-02
open high low close date
datetime
2017-01-03 09:16:00 259.35 260.0 259.35 260.0 2017-01-03
2017-01-03 09:17:00 261.00 261.0 261.00 261.0 2017-01-03
2017-01-03 09:18:00 262.00 262.0 262.00 262.0 2017-01-03
open high low close date
datetime
2017-12-03 09:18:00 260.0 260.0 259.35 259.35 2017-12-03
open high low close date
datetime
2017-12-04 09:22:00 260.0 260.0 259.35 259.35 2017-12-04
這combined_df
open high low close date
datetime
2017-01-02 09:16:00 116.00 117.80 116.00 113.00 2017-01-02
2017-01-02 09:17:00 115.50 116.20 115.50 116.20 2017-01-02
2017-01-02 09:18:00 116.05 116.35 116.00 116.00 2017-01-02
2017-01-02 09:19:00 116.00 116.00 115.60 115.75 2017-01-02
2017-01-03 09:16:00 259.35 260.00 259.35 260.00 2017-01-03
2017-01-03 09:17:00 261.00 261.00 261.00 261.00 2017-01-03
2017-01-03 09:18:00 262.00 262.00 262.00 262.00 2017-01-03
2017-12-03 09:18:00 260.00 260.00 259.35 259.35 2017-12-03
2017-12-04 09:22:00 260.00 260.00 259.35 259.35 2017-12-04
旁注:我不太確定您清除數據的方式是否是最好的,也許您可以看看是否完全忽略每天上午 9:16 之前的時間,甚至進行分析以檢查前 15 個數據的波動性分鐘來決定。

TA貢獻1798條經驗 獲得超7個贊
摘錄時間為 9:00 至 9:16。數據框按年、月和日分組,并根據 OHLC 值進行計算。該邏輯使用您的代碼。最后,添加 9:16 的日期列。由于我們沒有所有數據,因此我們可能遺漏了一些考慮因素,但基本形式保持不變。
import pandas as pd
import numpy as np
import io
data = '''
date open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep='\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])
# 9:00-9:16
df_start = df[((df['date'].dt.hour == 9) & (df['date'].dt.minute >= 0)) & ((df['date'].dt.hour == 9) & (df['date'].dt.minute <=16))]
# calculate
df_new = (df_start.groupby([df['date'].dt.year, df['date'].dt.month, df['date'].dt.day])
.agg(open_first=('open', lambda x: x.iloc[0,]),
high_max=('high','max'),
low_min=('low', 'min'),
close_shift=('close', lambda x: x.iloc[-1,])))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = df_new['year'].astype(str)+'-'+df_new['month'].astype(str)+'-'+df_new['day'].astype(str)+' 09:16:00'
year month day open_first high_max low_min close_shift date
0 2017 1 2 116.00 117.8 116.00 113.0 2017-1-2 09:16:00
1 2017 1 3 259.35 260.0 259.35 260.0 2017-1-3 09:16:00
2 2017 12 4 260.00 260.0 260.00 260.0 2017-12-4 09:16:00
添加回答
舉報