2 回答

TA貢獻1830條經驗 獲得超3個贊
起始 df
data =? [
? ? ['10/01/2015',0 ,57 ,91,0.66],
? ? ['11/01/2015',0, 678, 8,0.99],
? ? ['02/01/2016',0,68,12,0.12],
? ? ['10/01/2015' ,1, 78,79,0.22],
? ? ['11/01/2015' ,1 ,99,56, 0.67],
? ? ['1/01/2016', 1 ,789,67,0.78],
? ? ['10/01/2015', 3,678, 178, 0.780],
? ? ['11/01/2015' ,3, 2880,578,0.678]
]
df = pd.DataFrame(data, columns=['Start_Month', 'Bucket', 'Count', 'Complete','Partial'])#.set_index('Start_Month')
df['Start_Month'] = pd.to_datetime(df['Start_Month'])
? ? Start_Month Bucket? Count? ?Complete? ? Partial
0? ?2015-10-01? ? ?0? ? ? 57? ? ? ? 91? ? ? ?0.660
1? ?2015-11-01? ? ?0? ? ? 678? ? ? ? 8? ? ? ?0.990
2? ?2016-02-01? ? ?0? ? ? 68? ? ? ? 12? ? ? ?0.120
3? ?2015-10-01? ? ?1? ? ? 78? ? ? ? 79? ? ? ?0.220
4? ?2015-11-01? ? ?1? ? ? 99? ? ? ? 56? ? ? ?0.670
5? ?2016-01-01? ? ?1? ? ? 789? ? ? ?67? ? ? ?0.780
6? ?2015-10-01? ? ?3? ? ? 678? ? ? 178? ? ? ?0.780
7? ?2015-11-01? ? ?3? ? ? 2880? ? ?578? ? ? ?0.678
為存儲桶 0 制作一個包含完整日期的單獨 df
df0 = pd.DataFrame([pd.date_range('2015-10-01', '2016-02-01', freq='MS'),?
? ? ? ? ? ? ? ? ? ? [0]*5]).T.rename(columns={0: 'Start_Month', 1:'Bucket'})
? ? Start_Month Bucket
0? ?2015-10-01? 0
1? ?2015-11-01? 0
2? ?2015-12-01? 0
3? ?2016-01-01? 0
4? ?2016-02-01? 0
按相應日期和桶過濾原始 df 并將結果與 df0 合并
df_filt = df[(df['Start_Month'].isin(df0['Start_Month'])) & (df['Bucket'] == 0)]
df0 = pd.merge(df0, df_filt, left_on='Start_Month', right_on='Start_Month', how='outer')
df0 = df0.drop('Bucket_y', axis=1).rename(columns={'Bucket_x': 'Bucket'})
? ? Start_Month Bucket? Count? ?Complete Partial
0? ?2015-10-01? 0? ? ? ? 57.0? ?91.0? ? ?0.66
1? ?2015-11-01? 0? ? ? ? 678.0? 8.0? ? ? 0.99
2? ?2015-12-01? 0? ? ? ? NaN? ? NaN? ? ? NaN
3? ?2016-01-01? 0? ? ? ? NaN? ? NaN? ? ? NaN
4? ?2016-02-01? 0? ? ? ? 68.0? ?12.0? ? ?0.12
對存儲桶 1、2 和 3 重復該過程,創建 df1、df2、df3。
(由于重復而沒有顯示這一點......當然你可以循環執行此操作)。然后將所有 4 個 df 連接在一起,并用零填充 na。
# Concat
df_final = pd.concat([df0, df1, df2, df3], axis=0).fillna(0)
? ? Start_Month Bucket? ? ? ?Count? ?Complete? ?Partial
0? ?2015-10-01? ? ? ?0? ? ? ?57.0? ? ? ? 91.0? ?0.660
1? ?2015-11-01? ? ? ?0? ? ? ?678.0? ? ? ?8.0? ? 0.990
2? ?2015-12-01? ? ? ?0? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
3? ?2016-01-01? ? ? ?0? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
4? ?2016-02-01? ? ? ?0? ? ? ?68.0? ? ? ? 12.0? ?0.120
0? ?2015-10-01? ? ? ?1? ? ? ?78.0? ? ? ? 79.0? ?0.220
1? ?2015-11-01? ? ? ?1? ? ? ?99.0? ? ? ? 56.0? ?0.670
2? ?2015-12-01? ? ? ?1? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
3? ?2016-01-01? ? ? ?1? ? ? ?789.0? ? ? ?67.0? ?0.780
4? ?2016-02-01? ? ? ?1? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
0? ?2015-10-01? ? ? ?2? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
1? ?2015-11-01? ? ? ?2? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
2? ?2015-12-01? ? ? ?2? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
3? ?2016-01-01? ? ? ?2? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
4? ?2016-02-01? ? ? ?2? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
0? ?2015-10-01? ? ? ?3? ? ? ?678.0? ? ? ?178.0? 0.780
1? ?2015-11-01? ? ? ?3? ? ? ?2880.0? ? ? 578.0? 0.678
2? ?2015-12-01? ? ? ?3? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
3? ?2016-01-01? ? ? ?3? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
4? ?2016-02-01? ? ? ?3? ? ? ?0.0? ? ? ? ?0.0? ? 0.000
更新:顯示完全循環的代碼,并在評論中回答您的問題。
def get_separate_df(df, bucket_num):
? ? df_bucket = pd.DataFrame([pd.date_range('2015-10-01', '2016-02-01', freq='MS'),?
? ? ? ? ? ? ? ? ? ? ? ? [bucket_num]*5]).T.rename(columns={0: 'Start_Month', 1:'Bucket'})
? ? df_filt = df[(df['Start_Month'].isin(df_bucket['Start_Month'])) & \
? ? ? ? ? ? ? ? ? ? ? ? (df['Bucket'] == bucket_num)]
? ? df_bucket = pd.merge(df_bucket, df_filt, left_on='Start_Month', right_on='Start_Month', how='outer')
? ? df_bucket = df_bucket.drop('Bucket_y', axis=1).rename(columns={'Bucket_x': 'Bucket'})
? ? return df_bucket
dfs = [get_separate_df(df, i) for i in range(4)]?
# Concat
df_final = pd.concat(dfs, axis=0).fillna(0)
至于評論中的問題,您可以獲得一個空數據框,其中包含重復的日期和存儲桶序列,如下所示:
bucket_list = [ele for ele in [0,1,2,3] for i in range(5)]
dates = list(pd.date_range('2015-10-01', '2016-02-01', freq='MS'))*4
df = pd.DataFrame(data=[dates, bucket_list]).T.rename(columns={0:'Start_Month', 1:'Bucket'})
Output:
? ? Start_Month Bucket
0? ?2015-10-01? 0
1? ?2015-11-01? 0
2? ?2015-12-01? 0
3? ?2016-01-01? 0
4? ?2016-02-01? 0
5? ?2015-10-01? 1
6? ?2015-11-01? 1
7? ?2015-12-01? 1
8? ?2016-01-01? 1
9? ?2016-02-01? 1
10? 2015-10-01? 2
11? 2015-11-01? 2
12? 2015-12-01? 2
13? 2016-01-01? 2
14? 2016-02-01? 2
15? 2015-10-01? 3
16? 2015-11-01? 3
17? 2015-12-01? 3
18? 2016-01-01? 3
19? 2016-02-01? 3

TA貢獻1878條經驗 獲得超4個贊
寫了一篇類似的文章,但只是概括了一點
import pandas as pd
import numpy as np
# converting date string to date
df['Start_Month'] = pd.to_datetime(df['Start_Month'])
# finding the the date range and increasin by 1 month start
rng = pd.date_range(df['Start_Month'].min(),df['Start_Month'].max(), freq='MS')
# creating date dataframe
df1 = pd.DataFrame({ 'Start_Month': rng})
# Converting bucket field to integer
df['Bucket'] = df['Bucket'].astype(int)
# finding the bucket values max and min
Bucket=np.arange(df['Bucket'].min(),df['Bucket'].max()+1,1)
# Repeating the date range for every bucket
df1=pd.concat([df1]*len(Bucket))
# repeating bucket values to each date
df1['Bucket']=np.repeat(Bucket, len(rng))
# merging to the previous dataframe and filling it with 0
merged_left = pd.merge(left=df1, right=df, how='left', on=['Start_Month','Bucket']).fillna(0)
添加回答
舉報