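3 Answers
Contributor with 1793 experience points, over 6 upvotes
This also uses explode, like @Wen's answer, but builds the range directly from the age_min and age_max columns: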
import numpy as np

# Build one array of ages per row, then explode it into one row per age
dt.assign(
    age=[np.arange(x, y + 1) for x, y in zip(dt['age_min'], dt['age_max'])]
).explode('age').reset_index(drop=True)
    id_audience  gender  age_min  age_max  age
0  Female 13-17  female       13       17   13
1  Female 13-17  female       13       17   14
2  Female 13-17  female       13       17   15
3  Female 13-17  female       13       17   16
4  Female 13-17  female       13       17   17
5  Female 18-20  female       18       20   18
6  Female 18-20  female       18       20   19
7  Female 18-20  female       18       20   20
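For anyone who wants to run the approach above end to end, here is a minimal sketch. The sample frame is an assumption reconstructed from the printed output, not the asker's real data. It also adds one step the answer does not show: explode leaves the new column with object dtype, so the sketch casts it back to integers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, reconstructed from the printed output above
dt = pd.DataFrame({
    "id_audience": ["Female 13-17", "Female 18-20"],
    "gender": ["female", "female"],
    "age_min": [13, 18],
    "age_max": [17, 20],
})

# One array of ages per row (inclusive of age_max), then one row per age
out = dt.assign(
    age=[np.arange(lo, hi + 1) for lo, hi in zip(dt["age_min"], dt["age_max"])]
).explode("age").reset_index(drop=True)

# explode produces an object-dtype column; cast if integer ages are needed
out["age"] = out["age"].astype(int)
print(out)
```

The cast only matters if later code does arithmetic or comparisons on age; for display alone it can be skipped.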
Contributor with 1776 experience points, over 12 upvotes
Here is one way to do it with explode, which is new in pandas 0.25.0:
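# Pull every number out of id_audience; the result has one row per match, indexed by (row, match)
s = dt['id_audience'].str.extractall(r'(\d+)')
# For each original row, build the inclusive range from the first number to the second
dt['age'] = [list(range(y.iloc[0, 0], y.iloc[1, 0] + 1)) for x, y in s.astype(int).groupby(level=0)]
dt = dt.explode('age').reset_index(drop=True)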
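The answer above relies on every id_audience value containing exactly two numbers. As a hedged variant (not the answerer's code), the sketch below swaps extractall for str.extract with two capture groups, which returns the lower and upper bound as two columns and avoids the groupby. The sample frame and the names bounds and out are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only id_audience is needed for this variant
dt = pd.DataFrame({
    "id_audience": ["Female 13-17", "Female 18-20"],
    "gender": ["female", "female"],
})

# Two capture groups give a two-column frame: lower bound, upper bound
bounds = dt["id_audience"].str.extract(r"(\d+)-(\d+)").astype(int)
bounds.columns = ["age_min", "age_max"]

# One list of ages per row (inclusive of the upper bound), then one row per age
dt["age"] = [
    list(range(lo, hi + 1))
    for lo, hi in zip(bounds["age_min"], bounds["age_max"])
]
out = dt.explode("age").reset_index(drop=True)
print(out)
```

Note that astype(int) will raise if any id_audience value does not match the pattern, which is arguably preferable to silently building a wrong range.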
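Contributor with 1871 experience points, over 13 upvotes
Use Index.repeat to duplicate each row once per age, and GroupBy.cumcount as a counter for the age column: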
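# Repeat each row once per age in its inclusive range
dt = dt.loc[dt.index.repeat(dt['age_max'] - dt['age_min'] + 1)]
# cumcount numbers the copies of each original row 0, 1, 2, ...; add it to age_min
dt['age'] = dt['age_min'] + dt.groupby(level=0).cumcount()
dt = dt.reset_index(drop=True)
print(dt)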
    id_audience  gender  age_min  age_max  age
0  Female 13-17  female       13       17   13
1  Female 13-17  female       13       17   14
2  Female 13-17  female       13       17   15
3  Female 13-17  female       13       17   16
4  Female 13-17  female       13       17   17
5  Female 18-20  female       18       20   18
6  Female 18-20  female       18       20   19
7  Female 18-20  female       18       20   20
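A self-contained sketch of the repeat/cumcount approach, for anyone who wants to try it without the original data. The sample frame and the names n_rows and out are assumptions; the logic follows the answer, with a .copy() added so the new column is written to an independent frame.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the printed output above
dt = pd.DataFrame({
    "id_audience": ["Female 13-17", "Female 18-20"],
    "gender": ["female", "female"],
    "age_min": [13, 18],
    "age_max": [17, 20],
})

# Number of ages in each row's inclusive range
n_rows = dt["age_max"] - dt["age_min"] + 1

# Repeat each index label n_rows times and select those rows
out = dt.loc[dt.index.repeat(n_rows)].copy()

# Within each original row (same index label), count 0, 1, 2, ... and offset by age_min
out["age"] = out["age_min"] + out.groupby(level=0).cumcount()
out = out.reset_index(drop=True)
print(out)
```

Unlike the explode versions, age stays an integer column here, so no cast is needed. The sketch assumes dt has a unique index; with duplicated labels, dt.loc would return extra rows, so calling reset_index(drop=True) first may be worth it.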
