2 回答

TA貢獻1829條經驗 獲得超4個贊
首先在數據框中創建一個
as_of_date
列pd.date_range
,該列是每行開始日期和結束日期之間的日期列表lambda x:
(刪除重復項并保留最后一個)。分解 上的數據幀,
as_of_date
準備在date
和上進行合并port
。只需合并數據框(根據您的第二個問題,您可以簡單地排除此步驟)。
第 1 步:創建日期范圍列
df['as_of_date'] = df.apply(lambda x: list(pd.date_range(x['start_date'], x['end_date'], freq='d')), axis=1)
df
Out[1]:
port currency start_date end_date \
0 PortA USD 2020-01-01 2020-01-04
1 PortA CAD 2020-01-04 2020-01-06
2 PortA EUR 2020-01-06 2020-01-11
3 PortA USD 2020-01-11 2020-01-15
as_of_date
0 [2020-01-01 00:00:00, 2020-01-02 00:00:00, 202...
1 [2020-01-04 00:00:00, 2020-01-05 00:00:00, 202...
2 [2020-01-06 00:00:00, 2020-01-07 00:00:00, 202...
3 [2020-01-11 00:00:00, 2020-01-12 00:00:00, 202...
第 2 步:分解數據框并刪除重復項
df = df.explode('as_of_date').drop_duplicates('as_of_date', keep='last')
df
Out[2]:
port currency start_date end_date as_of_date
0 PortA USD 2020-01-01 2020-01-04 2020-01-01
0 PortA USD 2020-01-01 2020-01-04 2020-01-02
0 PortA USD 2020-01-01 2020-01-04 2020-01-03
1 PortA CAD 2020-01-04 2020-01-06 2020-01-04
1 PortA CAD 2020-01-04 2020-01-06 2020-01-05
2 PortA EUR 2020-01-06 2020-01-11 2020-01-06
2 PortA EUR 2020-01-06 2020-01-11 2020-01-07
2 PortA EUR 2020-01-06 2020-01-11 2020-01-08
2 PortA EUR 2020-01-06 2020-01-11 2020-01-09
2 PortA EUR 2020-01-06 2020-01-11 2020-01-10
3 PortA USD 2020-01-11 2020-01-15 2020-01-11
3 PortA USD 2020-01-11 2020-01-15 2020-01-12
3 PortA USD 2020-01-11 2020-01-15 2020-01-13
3 PortA USD 2020-01-11 2020-01-15 2020-01-14
3 PortA USD 2020-01-11 2020-01-15 2020-01-15
步驟 3:合并兩個數據框(根據您的第二個問題 - 如果您沒有數據框,您可以忽略此步驟tbl。相反,只需運行df = df[['port', 'as_of_date', 'currency']]以保留并重新排序您需要的列:
df_merge = pd.merge(df[['port', 'currency', 'as_of_date']], tbl, how='left', on=['as_of_date', 'port'])
df_merge
Out[3]:
port currency as_of_date
0 PortA USD 2020-01-01
1 PortA USD 2020-01-02
2 PortA USD 2020-01-03
3 PortA CAD 2020-01-04
4 PortA CAD 2020-01-05
5 PortA EUR 2020-01-06
6 PortA EUR 2020-01-07
7 PortA EUR 2020-01-08
8 PortA EUR 2020-01-09
9 PortA EUR 2020-01-10
10 PortA USD 2020-01-11
11 PortA USD 2020-01-12
12 PortA USD 2020-01-13
13 PortA USD 2020-01-14
14 PortA USD 2020-01-15
完整代碼:
df = pd.DataFrame(data={
'port': ['PortA','PortA','PortA','PortA'],
'currency': ['USD', 'CAD', 'EUR', 'USD'],
'start_date': ['01/01/2020', '01/04/2020', '01/06/2020', '01/11/2020'],
'end_date': ['01/04/2020', '01/06/2020', '01/11/2020', '01/15/2020']
})
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, errors='ignore')
tbl = pd.DataFrame(data={
'port': 'PortA',
'as_of_date': [x for x in pd.date_range(start='01/01/2020', end='01/15/2020')]
})
df['as_of_date'] = df.apply(lambda x: list(pd.date_range(x['start_date'], x['end_date'], freq='d')), axis=1)
df = df.explode('as_of_date').drop_duplicates('as_of_date', keep='last')
df_merge = pd.merge(df[['port', 'currency', 'as_of_date']], tbl, how='left', on=['as_of_date', 'port'])
df_merge

TA貢獻1810條經驗 獲得超4個贊
pd.date_range您可以使用,explode和這樣做merge:
df_dates = df.assign(dates=[pd.date_range(i, j + pd.Timedelta(days=-1), freq='D')
for i, j in zip(df['start_date'], df['end_date'])])\
.explode('dates')
bl.merge(df_dates[['port','dates','currency']],
left_on=['port', 'as_of_date'],
right_on=['port', 'dates'])
輸出:
port as_of_date dates currency
0 PortA 2020-01-01 2020-01-01 USD
1 PortA 2020-01-02 2020-01-02 USD
2 PortA 2020-01-03 2020-01-03 USD
3 PortA 2020-01-04 2020-01-04 CAD
4 PortA 2020-01-05 2020-01-05 CAD
5 PortA 2020-01-06 2020-01-06 EUR
6 PortA 2020-01-07 2020-01-07 EUR
7 PortA 2020-01-08 2020-01-08 EUR
8 PortA 2020-01-09 2020-01-09 EUR
9 PortA 2020-01-10 2020-01-10 EUR
10 PortA 2020-01-11 2020-01-11 USD
11 PortA 2020-01-12 2020-01-12 USD
12 PortA 2020-01-13 2020-01-13 USD
13 PortA 2020-01-14 2020-01-14 USD
注意: pd.Timedelta(days=-1) 處理多行上的重復日期。結束日期與下一行的開始日期重疊。
更改代碼以修復最后日期:
d = pd.Timedelta(days=-1)
l = pd.date_range #To shorten typing
df_dates = df.assign(dates=[l(i, j + d) if j != df.iloc[-1, df.columns.get_loc('end_date')]
else l(i, j) for i, j in zip(df['start_date'], df['end_date'])])\
.explode('dates')
print(tbl.merge(df_dates[['port','dates','currency']], left_on=['port', 'as_of_date'], right_on=['port', 'dates']))
添加回答
舉報