How can I assess the correlation between every category of each variable?

df
   level         job
0  good          golfer
1  bad           footballer
2  intermediate  musician
...

The expected output is a correlation table or something similar:

              golfer  footballer  musician  ...
good
bad
intermediate

I tried:

df['level'] = df['level'].astype('category').cat.codes
df['job'] = df['job'].astype('category').cat.codes
df.corr()
2 Answers

qq_遁去的一_1
1725 experience points contributed, more than 8 likes received
You can use pd.crosstab:
df1 = pd.crosstab(df.level, df.job)
df1
With my sample data, this gives the output:
job           footballer  golfer  musician
level
bad                    1       3         3
good                   3       3         2
intermediate           1       2         2
Then divide by the column totals (df1.sum() sums down each column, so every job column adds up to 1):
df1 / df1.sum()
Output:
job           footballer  golfer  musician
level
bad                  0.2   0.375  0.428571
good                 0.6   0.375  0.285714
intermediate         0.2   0.250  0.285714
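
As a minimal sketch of the same idea, pd.crosstab also accepts a normalize argument, which makes the direction of the division explicit. The data below is hypothetical, constructed only so that its counts match the table above, and the names counts, rows, by_job and by_level are illustrative.

```python
import pandas as pd

# Hypothetical sample data, built only to reproduce the counts shown above.
counts = {
    ("bad", "footballer"): 1, ("bad", "golfer"): 3, ("bad", "musician"): 3,
    ("good", "footballer"): 3, ("good", "golfer"): 3, ("good", "musician"): 2,
    ("intermediate", "footballer"): 1, ("intermediate", "golfer"): 2,
    ("intermediate", "musician"): 2,
}
rows = [pair for pair, n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["level", "job"])

# Raw frequency table: one row per level, one column per job.
table = pd.crosstab(df["level"], df["job"])

# Share of each level within a job: every column sums to 1.
# This matches table / table.sum().
by_job = pd.crosstab(df["level"], df["job"], normalize="columns")

# Share of each job within a level: every row sums to 1.
by_level = pd.crosstab(df["level"], df["job"], normalize="index")

print(by_job)
print(by_level)
```

Use normalize="columns" to read the table as "what fraction of golfers are good", and normalize="index" to read it as "what fraction of good players are golfers".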

慕絲7291255
1859 experience points contributed, more than 6 likes received
Judging by the expected output, you want a frequency table. I suspect this could be done more neatly, but one way is:
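# Count how often each (level, job) pair occurs
count_combos = pd.Series(list(zip(df.level, df.job))).value_counts()
# Turn the tuple index into a two-level MultiIndex
count_combos.index = pd.MultiIndex.from_tuples(count_combos.index)
# Pivot the second level (job) into columns
count_combos.unstack()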
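
One possibly tidier way to get the same kind of frequency table is groupby followed by size and unstack. This is only a sketch on hypothetical data; the DataFrame contents and the name freq are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "level": ["good", "bad", "intermediate", "good", "bad", "good"],
    "job": ["golfer", "footballer", "musician", "golfer", "musician", "footballer"],
})

# Count each (level, job) pair, then pivot job into columns.
# fill_value=0 replaces the NaN that unstack() would otherwise leave
# for pairs that never occur in the data.
freq = df.groupby(["level", "job"]).size().unstack(fill_value=0)
print(freq)
```

On this data the result holds the same counts as pd.crosstab(df.level, df.job), so which one to use is mostly a matter of taste.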