2 Answers

Contributor with 1824 experience points, more than 6 upvotes
I would do it like this:
import pandas as pd

df = {'Request': [0, 0, 1, 0, 1, 0, 0],
      'Time': ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00'],
      'grant': [3, 0, 0, 5, 0, 0, 5]}
df = pd.DataFrame(df)  # create the DataFrame
# drop any rows that have neither a grant nor a request
# (.copy() avoids a SettingWithCopyWarning on the assignments below)
df = df[(df[['grant', 'Request']] != 0).any(axis=1)].copy()
# convert the time from HH:MM to the number of minutes since midnight
df['Time'] = df['Time'].str.split(":").apply(lambda x: int(x[0])*60 + int(x[1]))
# get the difference between consecutive times
df['timeElapsed'] = df['Time'].diff()
# keep only the grant rows and their elapsed times;
# also drop the NaN from the first row
df = df[df['grant'] != 0].dropna()
# drop all columns except timeElapsed and grant
df = df[['timeElapsed', 'grant']]
The output then looks like this, with timeElapsed in minutes:
   timeElapsed  grant
3         60.0      5
6        240.0      5
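One caveat with the diff() approach: it measures the gap to the previous remaining row, so it gives the request-to-grant time only when each grant directly follows its request once the empty rows are dropped. The following is a minimal sketch of a variant that instead carries the most recent request time forward, assuming the same sample data; the names minutes, last_request and result are illustrative and not from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Request': [0, 0, 1, 0, 1, 0, 0],
    'Time': ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00'],
    'grant': [3, 0, 0, 5, 0, 0, 5],
})

# Minutes since midnight; as a duration, '24:00:00' parses fine (it is 1 day)
minutes = pd.to_timedelta(df['Time'] + ':00') / pd.Timedelta(minutes=1)

# Time of the most recent request at or before each row
# (NaN for rows that come before the first request)
last_request = minutes.where(df['Request'] != 0).ffill()

result = pd.DataFrame({
    'timeElapsed': minutes - last_request,
    'grant': df['grant'],
})
# Keep only the grant rows, and drop grants with no earlier request
result = result[df['grant'] != 0].dropna()
print(result)
```

With this sample data the sketch keeps the same two grant rows (index 3 and 6) as the answer above; the two approaches would differ only if two grants followed a single request.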

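Contributor with 1784 experience points, more than 2 upvotes
You first need to convert your Time index into something that can be subtracted, so that you can compute time deltas. Using pd.to_datetime does not work, because there is no such clock time as 24:00. The solution below uses decimal time (1:30 PM = 13.5):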
import pandas as pd

# Assumes df is indexed by 'Time' as 'HH:MM' strings
# Convert the index into decimal time
df.index = pd.to_timedelta(df.index + ':00') / pd.Timedelta(hours=1)
# Get the time when each request was made
r = df[df['Request'] != 0].index.to_series()
# Get the time when each grant was made
g = df[df['grant'] != 0].index.to_series()
# `asof` means: for each time in `g`, get the last available value in `r` at or before it
tmp = r.asof(g)
# Hours elapsed between each grant and the most recent request before it
df['Delta'] = tmp.index - tmp
Result:
      Request  grant  Delta
Time
16.0        0      3    NaN
17.0        0      0    NaN
18.0        1      0    NaN
19.0        0      5    1.0
20.0        1      0    NaN
20.5        0      0    NaN
24.0        0      5    4.0
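The snippet in this answer assumes that df already has Time as its index. A minimal end-to-end sketch, assuming the same sample data as the first answer and a string index named Time (the way the DataFrame is built here is an assumption, not part of the answer):

```python
import pandas as pd

# Sample data with 'Time' as the index (assumed layout of the question's data)
df = pd.DataFrame(
    {'Request': [0, 0, 1, 0, 1, 0, 0], 'grant': [3, 0, 0, 5, 0, 0, 5]},
    index=pd.Index(
        ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00'],
        name='Time',
    ),
)

# '24:00:00' is a valid duration (1 day) even though it is not a valid clock time,
# so to_timedelta can parse it; dividing by one hour gives decimal hours
df.index = pd.to_timedelta(df.index + ':00') / pd.Timedelta(hours=1)

# Times of the requests and of the grants, as Series keyed by those same times
r = df[df['Request'] != 0].index.to_series()
g = df[df['grant'] != 0].index.to_series()

# For each grant time, look up the latest request time at or before it
tmp = r.asof(g)

# Hours between each grant and its most recent request (NaN if there is none)
df['Delta'] = tmp.index - tmp
print(df)
```

Delta is in hours here; multiplying it by 60 gives minutes, if that unit is preferred.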