2 Answers

If I understand correctly, and if you can load team_stats_1970_2017 as a pandas DataFrame, you can apply two merges: one on home_team and season_yr, and one on visitor_team and season_yr:
# Join the home team's season stats first, then the visitor's.
# The second merge suffixes the overlapping stat columns with _home / _visitor.
merged_df = (
    game_df
    .merge(team_stats_1970_2017,
           left_on=['home_team', 'season_yr'],
           right_on=['team', 'season_yr'])
    .merge(team_stats_1970_2017,
           left_on=['visitor_team', 'season_yr'],
           right_on=['team', 'season_yr'],
           suffixes=['_home', '_visitor'])
    .drop(['team_visitor', 'team_home'], axis=1)
)
>>> merged_df
   season_yr home_team visitor_team  home_team_runs  visitor_team_runs  \
0       2017       ARI          SFG               6                  5
1       2017       ARI          SFG               4                  8
2       2017       ARI          SFG               8                  6
3       2017       ARI          SFG               9                  3
4       2017       ARI          CLE               7                  3
5       2017       ARI          CLE              11                  2
6       2017       ATL          LAD               2                  3

   r_per_g_home  pa_home  ab_home  b_r_home  b_h_home  ...  b3_home  \
0          5.01   6224.0     5525       812      1405  ...       39
1          5.01   6224.0     5525       812      1405  ...       39
2          5.01   6224.0     5525       812      1405  ...       39
3          5.01   6224.0     5525       812      1405  ...       39
4          5.01   6224.0     5525       812      1405  ...       39
5          5.01   6224.0     5525       812      1405  ...       39
6          4.52   6216.0     5584       732      1467  ...       26

   b_hr_home  r_per_g_visitor  pa_visitor  ab_visitor  b_r_visitor  \
0        220             3.94      6137.0        5551          639
1        220             3.94      6137.0        5551          639
2        220             3.94      6137.0        5551          639
3        220             3.94      6137.0        5551          639
4        220             5.05      6234.0        5511          818
5        220             5.05      6234.0        5511          818
6        165             4.75      6191.0        5408          770

   b_h_visitor  b2_visitor  b3_visitor  b_hr_visitor
0         1382         290          28           128
1         1382         290          28           128
2         1382         290          28           128
3         1382         290          28           128
4         1449         333          29           212
5         1449         333          29           212
6         1347         312          20           221

[7 rows x 21 columns]
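If you want to try the idea without the full dataset, here is a minimal, self-contained sketch. It is a variant, not the exact code above: it renames the stat columns before each merge instead of relying on suffixes, and it uses how="left" with validate="many_to_one" so that a game with missing stats shows up as NaN and duplicate (team, season_yr) rows raise an error. The helper name add_team_stats and the tiny tables are assumptions made for illustration; the pa values are copied from the output above.

```python
import pandas as pd

# Tiny stand-in tables (hypothetical); pa values are copied from the output above.
game_df = pd.DataFrame({
    "season_yr": [2017, 2017, 2017],
    "home_team": ["ARI", "ARI", "ATL"],
    "visitor_team": ["SFG", "CLE", "LAD"],
})
team_stats = pd.DataFrame({
    "team": ["ARI", "ATL", "CLE", "LAD", "SFG"],
    "season_yr": [2017] * 5,
    "pa": [6224.0, 6216.0, 6234.0, 6191.0, 6137.0],
})


def add_team_stats(games, stats, side):
    """Attach one side's season stats; stat columns get a _<side> suffix."""
    keys = ("team", "season_yr")
    renamed = stats.rename(
        columns={c: f"{c}_{side}" for c in stats.columns if c not in keys}
    ).rename(columns={"team": f"{side}_team"})
    # Left join keeps every game; validate checks one stats row per team-season.
    return games.merge(renamed, on=[f"{side}_team", "season_yr"],
                       how="left", validate="many_to_one")


merged = add_team_stats(game_df, team_stats, "home")
merged = add_team_stats(merged, team_stats, "visitor")
print(merged)
```

Renaming up front avoids having to drop the extra team columns afterwards, at the cost of a small helper function.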
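You can then use merged_df to compute your features. For example (since you seem to want your features as np.arrays), here is the difference between pa_home and pa_visitor (this is just a dummy example):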
>>> (merged_df['pa_home'] - merged_df['pa_visitor']).values
array([ 87.,  87.,  87.,  87., -10., -10.,  25.])
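The same pattern extends to several stats at once. The sketch below is only an illustration: the stats list is a hypothetical choice, and the small merged_df stand-in reuses a few values from the output above (one row per distinct matchup). It stacks the home-minus-visitor differences into a 2-D array with one row per game and one column per stat.

```python
import numpy as np
import pandas as pd

# Stand-in for merged_df, with values copied from the output above.
merged_df = pd.DataFrame({
    "pa_home": [6224.0, 6224.0, 6216.0],
    "pa_visitor": [6137.0, 6234.0, 6191.0],
    "b_hr_home": [220, 220, 165],
    "b_hr_visitor": [128, 212, 221],
})

stats = ["pa", "b_hr"]  # hypothetical choice of stats to compare

# Select the matching _home / _visitor columns in the same order.
home = merged_df[[f"{s}_home" for s in stats]].to_numpy(dtype=float)
visitor = merged_df[[f"{s}_visitor" for s in stats]].to_numpy(dtype=float)

# Element-wise difference; shape is (n_games, n_stats).
features = home - visitor
print(features.shape)
print(features)
```

to_numpy() is the currently recommended spelling of .values and lets you fix the dtype explicitly.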