I am using ols in statsmodels to run regressions. Once I have run a regression on each row of the dataframe, I want to retrieve the X variables that patsy used in those regressions. However, I get an error I can't seem to understand.

Edit: I am trying to run the regression shown in the answer here, but I want to run it on each row of a grouped version of the dataframe df, where it is grouped by Date, bal, dist, pay_hist, inc, bckts. So I first group the data as described above, and then try to run the regression on each row of that df grouped by Date:

df.groupby(['Date']).apply(ols_coef,'bal ~ C(dist) + C(pay_hist) + C(inc) + C(bckts)')

My code is as follows:

from statsmodels.formula.api import ols
df = df.groupby([['Date','bal', 'dist', 'pay_hist', 'inc', 'bckts']])
######run regression
def ols_coef(x,formula):
    return ols(formula,data=x).fit().params
gamma = df.groupby(['Date']).apply(ols_coef,'bal ~ C(dist) + C(pay_hist) + C(inc) + C(bckts)')
print('gamme is {}'.format(gamma))
#############################
# Now trying to retrieve the X variables in the regressions above
formula = 'bal ~ C(dist) + C(pay_hist) + C(inc) + C(bckts)'
data = df.groupby(['Date'])[['bckts', 'wac_dist', 'pay_hist', 'inc', 'bal']]
y,X = patsy.dmatrices(formula,data,return_type='dataframe')
################

I get the following error and am not sure how to resolve it:

patsy.PatsyError: Error evaluating factor: Exception: Column(s) ['bckts', 'dist', 'pay_hist', 'inc', 'bal'] already selected
    bal ~ C(dist) + C(pay_hist) + C(inc) + C(bckts)
    ^^^^^^^^^^^

1 Answer

蝴蝶刀刀
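The problem is that you are passing a grouped dataframe into the function patsy.dmatrices rather than an ordinary dataframe. Since a grouped dataframe is iterable, you can do this in a loop like the one below and store all of the X dataframes (one per group) in a dictionary: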
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
import patsy
# Loading data
df = sm.datasets.get_rdataset("Guerry", "HistData").data
# Extracting Independent variables
formula = 'Suicides ~ Crime_parents + Infanticide'
data = df.groupby(['Region'])[['Suicides', 'Crime_parents', 'Infanticide', 'Region']]
X = {}
for name, group in data:
    Y, X[name] = patsy.dmatrices(formula, group, return_type='dataframe')
print(X)
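
Applied to a layout like the one in the question, the same loop can collect both the per-date coefficients and the per-date design matrices. The following is only a minimal sketch: the data is synthetic, only two of the question's categorical columns (dist and inc) are included, and the category labels and date strings are invented for illustration.

```python
import numpy as np
import pandas as pd
import patsy
from statsmodels.formula.api import ols

# Synthetic stand-in for the asker's data (all values are made up).
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "Date": np.repeat(["2020-01", "2020-02", "2020-03"], n // 3),
    "dist": np.tile(["a", "b"], n // 2),
    "inc": np.tile(["low", "mid", "high"], n // 3),
    "bal": rng.normal(100.0, 10.0, size=n),
})

formula = "bal ~ C(dist) + C(inc)"

coefs = {}    # one Series of fitted coefficients per date
designs = {}  # one design-matrix DataFrame per date

# Group by a scalar key so each `date` is a plain string
# (a one-element list can yield 1-tuples in recent pandas versions).
for date, group in df.groupby("Date"):
    # `group` is an ordinary DataFrame, so patsy can evaluate the formula on it.
    y, X = patsy.dmatrices(formula, group, return_type="dataframe")
    designs[date] = X
    coefs[date] = ols(formula, data=group).fit().params

# Rows are dates, columns are model terms.
gamma = pd.DataFrame(coefs).T
print(gamma)
print(designs["2020-01"].head())
```

Each entry of designs holds the dummy-coded X matrix for one date, with the same column names as the corresponding coefficients in gamma.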