
A faster way to create a pandas DataFrame from another DataFrame


Python

紅顏莎娜 2022-06-14 16:35:56

I have a DataFrame with more than 41,500 records and 3 fields: ID, start_date, and end_date. I want to create a separate DataFrame from it with only 2 fields, id and active_years, containing one record per identifier for every year between the start year and the end year (including the end year).

This is what I am doing right now, but for 41,500 rows it takes more than 2 hours to finish:

df = pd.DataFrame(columns=['id', 'active_years'])
ix = 0
for _, row in raw_dataset.iterrows():
    st_yr = int(row['start_date'].split('-')[0])  # because dates are in the format yyyy-mm-dd
    end_yr = int(row['end_date'].split('-')[0])
    for year in range(st_yr, end_yr+1):
        df.loc[ix, 'id'] = row['ID']
        df.loc[ix, 'active_years'] = year
        ix = ix + 1

So is there a faster way to achieve this?

[EDIT] Some sample data to try a solution on:

raw_dataset = pd.DataFrame({'ID':['a121','b142','cd3'],
                            'start_date':['2019-10-09','2017-02-06','2012-12-05'],
                            'end_date':['2020-01-30','2019-08-23','2016-06-18']})
print(raw_dataset)

     ID  start_date    end_date
0  a121  2019-10-09  2020-01-30
1  b142  2017-02-06  2019-08-23
2   cd3  2012-12-05  2016-06-18

# the desired dataframe should look like this
print(desired_df)

     id  active_years
0  a121          2019
1  a121          2020
2  b142          2017
3  b142          2018
4  b142          2019
5   cd3          2012
6   cd3          2013
7   cd3          2014
8   cd3          2015
9   cd3          2016


1 Answer

函數式編程


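Appending to a Python list is much faster than growing a NumPy array (the data structure underlying a pandas DataFrame) one element at a time: a list append is amortized constant time, while enlarging an array generally means allocating a new one and copying the existing data. With that in mind: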

import pandas as pd

# Initialize input dataframe
raw_dataset = pd.DataFrame({
    'ID':['a121','b142','cd3'],
    'start_date':['2019-10-09','2017-02-06','2012-12-05'],
    'end_date':['2020-01-30','2019-08-23','2016-06-18'],
})

# Create integer columns for start year and end year
raw_dataset['start_year'] = pd.to_datetime(raw_dataset['start_date']).dt.year
raw_dataset['end_year'] = pd.to_datetime(raw_dataset['end_date']).dt.year

# Iterate over input dataframe rows and individual years
id_list = []
active_years_list = []
for row in raw_dataset.itertuples():
    for year in range(row.start_year, row.end_year+1):
        id_list.append(row.ID)
        active_years_list.append(year)

# Create result dataframe from lists
desired_df = pd.DataFrame({
    'id': id_list,
    'active_years': active_years_list,
})

print(desired_df)

# Output:
#      id  active_years
# 0  a121          2019
# 1  a121          2020
# 2  b142          2017
# 3  b142          2018
# 4  b142          2019
# 5   cd3          2012
# 6   cd3          2013
# 7   cd3          2014
# 8   cd3          2015
# 9   cd3          2016
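
As a variation that is not part of the answer above, the inner loop can also be pushed into pandas by building one list of years per row and calling `DataFrame.explode`. The sketch below assumes pandas 0.25 or later (where `explode` is available) and that every row has `start_date` no later than `end_date`; the names `year_lists` and `exploded_df` are made up for illustration. Whether this is actually faster than the list-based loop on the real data should be measured rather than assumed.

```python
import pandas as pd

raw_dataset = pd.DataFrame({
    'ID': ['a121', 'b142', 'cd3'],
    'start_date': ['2019-10-09', '2017-02-06', '2012-12-05'],
    'end_date': ['2020-01-30', '2019-08-23', '2016-06-18'],
})

# Extract the years as integers
start_year = pd.to_datetime(raw_dataset['start_date']).dt.year
end_year = pd.to_datetime(raw_dataset['end_date']).dt.year

# One list of years per input row, inclusive of the end year
# (assumes start_year <= end_year for every row)
year_lists = [list(range(s, e + 1)) for s, e in zip(start_year, end_year)]

exploded_df = (
    pd.DataFrame({'id': raw_dataset['ID'], 'active_years': year_lists})
    .explode('active_years')             # one output row per (id, year) pair
    .astype({'active_years': 'int64'})   # explode leaves the column as object dtype
    .reset_index(drop=True)              # replace the repeated input index
)

print(exploded_df)
```

If a row could have an end year earlier than its start year, its list would be empty and `explode` would produce a missing value, so the integer cast would need to be handled differently.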

Answered 2022-06-14
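
To get a rough feel for the gap between the question's approach and the list-based one, a small timing sketch like the following may help. The function names, the synthetic ids, and the row count `n_rows = 1000` are all assumptions chosen for illustration; absolute timings will vary with the machine and the pandas version, so no figures are quoted here.

```python
import time

import pandas as pd


def build_with_loc(n_rows):
    """Grow a DataFrame one cell at a time, as in the question."""
    df = pd.DataFrame(columns=['id', 'active_years'])
    for ix in range(n_rows):
        df.loc[ix, 'id'] = f'id{ix}'
        df.loc[ix, 'active_years'] = 2000 + ix % 20
    return df


def build_with_lists(n_rows):
    """Accumulate values in Python lists, then build the DataFrame once."""
    ids, years = [], []
    for ix in range(n_rows):
        ids.append(f'id{ix}')
        years.append(2000 + ix % 20)
    return pd.DataFrame({'id': ids, 'active_years': years})


def timed(func, n_rows):
    """Return the function's result and its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = func(n_rows)
    return result, time.perf_counter() - start


n_rows = 1000  # hypothetical size, kept small so the slow version finishes quickly
slow_df, slow_seconds = timed(build_with_loc, n_rows)
fast_df, fast_seconds = timed(build_with_lists, n_rows)

print(f'.loc growth:       {slow_seconds:.3f} s')
print(f'list accumulation: {fast_seconds:.3f} s')
```

Both functions produce the same values; only the way the rows are accumulated differs, which is what the timing isolates.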
