
Analyzing DataFrames Contained in a Python For Loop


白板的微信 2023-07-05 15:52:47
Current situation: I have a function that splits a binary class target variable into "1" and "0" and then reads all of the independent variables for each class. The function also determines the KDE of each independent variable for classes "1" and "0", then computes the area of intersection:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def intersection_area(data, bandwidth, margin, target_variable_name):
    # target_variable_name is the column name of the response variable
    data = data.dropna()
    X = data.drop(columns=[str(target_variable_name)], axis=1)
    names = list(X.columns)
    new_columns = []
    for column_name in names[:-1]:
        x0 = data.loc[data[str(target_variable_name)] == 0, str(column_name)]
        x1 = data.loc[data[str(target_variable_name)] == 1, str(column_name)]

        kde0 = gaussian_kde(x0, bw_method=bandwidth)
        kde1 = gaussian_kde(x1, bw_method=bandwidth)
        x_min = min(x0.min(), x1.min())  # find the lowest value between two minimum points
        x_max = min(x0.max(), x1.max())  # finds the lowest value between two maximum points
        dx = margin * (x_max - x_min)    # add a margin since the kde is wider than the data
        x_min -= dx
        x_max += dx

        x = np.linspace(x_min, x_max, 500)
        kde0_x = kde0(x)
        kde1_x = kde1(x)
        inters_x = np.minimum(kde0_x, kde1_x)
        area_inters_x = np.trapz(inters_x, x)  # intersection of two kde
        print(area_inters_x)

Question: If I have n_class = 4, the function would look like this:

1 Answer

撒科打諢

Contributed 1934 experience points · Received more than 2 upvotes

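Consider using list comprehensions over the levels of the target variable to build lists of x's and KDEs, one per class. And instead of printing the result on each iteration, bind the results into a data frame: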


import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

def intersection_area_new(data, bandwidth, margin, target_variable_name):
    # Drop rows with missing values before fitting the KDEs
    data = data.dropna()

    # Unique classes of the (possibly multi-class) target variable
    classes = data[target_variable_name].unique()

    kde_dicts = []
    # Loop over the independent variables only (every column except the target)
    for column_name in data.columns.drop(target_variable_name):
        # Build lists of x's and KDEs, one per class
        x_s = [data.loc[data[target_variable_name] == i, column_name] for i in classes]
        kde_s = [gaussian_kde(x, bw_method=bandwidth) for x in x_s]

        x_min = min(x.min() for x in x_s)    # lowest of the class minima
        x_max = min(x.max() for x in x_s)    # lowest of the class maxima, as in the question; use max() to span all the data

        dx = margin * (x_max - x_min)        # add a margin since the KDE is wider than the data
        x_min -= dx
        x_max += dx

        x_array = np.linspace(x_min, x_max, 500)
        kde_x_s = [kde(x_array) for kde in kde_s]

        # Pointwise minimum across all class KDEs, then integrate
        inters_x = np.array(kde_x_s).min(axis=0)
        area_inters_x = np.trapz(inters_x, x_array)    # named np.trapezoid in NumPy 2.0+

        kde_dicts.append({'target': target_variable_name,
                          'column': column_name,
                          'intersection': area_inters_x})

    return pd.DataFrame(kde_dicts)
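
The central change from the two-class version is replacing `np.minimum(kde0_x, kde1_x)` with a minimum taken over a stacked array of curves. Here is a minimal sketch of that step in isolation, using synthetic samples (the class means, sample sizes, and grid limits are assumptions for illustration, not taken from the question). It checks that the stacked form matches `np.minimum` for two classes and extends unchanged to four:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical samples for four classes of one variable (illustrative only).
samples = [rng.normal(loc=mu, scale=1.0, size=200) for mu in (0.0, 0.5, 1.0, 1.5)]
kdes = [gaussian_kde(s) for s in samples]

grid = np.linspace(-5.0, 7.0, 500)
curves = np.array([kde(grid) for kde in kdes])   # shape: (4, 500)

# Two classes: stacking and reducing along axis 0 equals np.minimum.
two_class = curves[:2].min(axis=0)
same_as_minimum = np.allclose(two_class, np.minimum(curves[0], curves[1]))

# Four classes: the same reduction keeps only the density shared by all classes.
four_class = curves.min(axis=0)
print(same_as_minimum, four_class.shape)
```

Because the reduction runs along the class axis, the number of classes never appears in the code, which is what lets the function handle any `n_class`.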

Output


output = intersection_area_new(sample_dataset, None, 0.5, "target")
print(output.head(10))

#    target column  intersection
# 0  target   var1      0.842256
# 1  target   var2      0.757190
# 2  target   var3      0.676021
# 3  target   var4      0.873074
# 4  target   var5      0.763626
# 5  target   var6      0.868560
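
Each value is the area under the pointwise minimum of the class densities, so it should fall roughly between 0 (no region shared by all classes) and 1 (identical distributions). Below is a rough sanity check on synthetic data. The helper name `overlap_area`, the grid size, and the class parameters are assumptions for illustration. Unlike the function above, this sketch integrates with `scipy.integrate.trapezoid` and spans the grid over the full range of all samples:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde


def overlap_area(samples, margin=0.5, n_points=500):
    """Area under the pointwise minimum of the KDEs of several 1-D samples."""
    lo = min(s.min() for s in samples)
    hi = max(s.max() for s in samples)
    pad = margin * (hi - lo)              # widen the grid, since a KDE extends past the data
    grid = np.linspace(lo - pad, hi + pad, n_points)
    curves = np.array([gaussian_kde(s)(grid) for s in samples])
    return trapezoid(curves.min(axis=0), grid)


rng = np.random.default_rng(42)

# Four classes drawn from the same distribution: the overlap should be high.
similar = [rng.normal(0.0, 1.0, 500) for _ in range(4)]
# Four classes with well-separated means: the overlap should be near zero.
separated = [rng.normal(mu, 1.0, 500) for mu in (0.0, 10.0, 20.0, 30.0)]

area_similar = overlap_area(similar)
area_separated = overlap_area(separated)
print(f"similar: {area_similar:.3f}, separated: {area_separated:.3f}")
```

Note that with more than two classes this measures the density common to all classes at once, so a single well-separated class is enough to drive the value toward zero.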


Answered 2023-07-05