How do I test the correlation between two groups in Python?


Python

慕村225694 2022-07-26 16:24:39

I have two different DataFrames. The first one is:

df1 =
    Datetime         BSL
0          7  127.504505
1          8  115.254132
2          9  108.994275
3         10  102.936860
4         11   99.830400
5         12  114.660522
6         13  138.215339
7         14  132.131075
8         15  121.478006
9         16  113.795645
10        17  114.038462

The other is:

df2 =
    Datetime  Number of Accident
0          7                3455
1          8               17388
2          9               27767
3         10               33622
4         11               33474
5         12               12670
6         13               28137
7         14               27141
8         15               26515
9         16               24849
10        17               13013

The first holds people's blood sugar level by time of day (7 means between 7 a.m. and 8 a.m.). The second holds the number of accidents during those same hours. When I try this code:

df1.corr(df2, "pearson")

I get the error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I fix this? Or, how can I test the correlation between two different variables?

3 Answers

紅糖糍粑

1815 experience contributions, more than 6 upvotes

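from scipy.stats import pearsonr

# Join the two frames on the shared hour column
df_full = df1.merge(df2, on='Datetime', how='left')

# pearsonr returns the correlation coefficient and the two-sided p-value
full_correlation = pearsonr(df_full['BSL'], df_full['Number of Accident'])
print('Correlation coefficient:', full_correlation[0])
print('P-value:', full_correlation[1])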

Output:

(-0.2934597230564072, 0.3811116115819819)
Correlation coefficient: -0.2934597230564072
P-value: 0.3811116115819819

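Edit:

You want a correlation for each hour, but that is mathematically impossible, because you have only one (x, y) pair per hour. The output will therefore be all NaN. Here is the code, but its output is not meaningful: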

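# Select the two columns with a list; indexing a groupby with a bare tuple of names is no longer supported
df_corr = df_full.groupby('Datetime')[['BSL', 'Number of Accident']].corr().drop(columns='BSL').drop('Number of Accident', level=1).rename(columns={'Number of Accident': 'Correlation'})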

print(df_corr)

Output:

              Correlation
Datetime
7        BSL          NaN
8        BSL          NaN
9        BSL          NaN
10       BSL          NaN
11       BSL          NaN
12       BSL          NaN
13       BSL          NaN
14       BSL          NaN
15       BSL          NaN
16       BSL          NaN
17       BSL          NaN
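The NaN column above follows directly from the definition of Pearson's r. A minimal sketch in plain Python may make that concrete. The helper name `pearson_r` is hypothetical and is not part of pandas or SciPy; it only illustrates why one point per group leaves the coefficient undefined.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from its definition; NaN when r is undefined."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        return math.nan  # a single point has no spread to correlate
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        return math.nan  # zero variance makes the denominator zero
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)


# One (BSL, accidents) pair, as in a single hourly group from the question
one_hour = pearson_r([127.504505], [3455])
print(one_hour)  # nan
```

With at least two points that differ in both variables the same function returns a finite value, which is why the correlation has to be computed across hours rather than within each hour.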

2022-07-26

泛舟湖上清波郎朗

1818 experience contributions, more than 3 upvotes

Because your DataFrames have more than one column, you need to name the columns you want to correlate:

df1['BSL'].corr(df2['Number of Accident'], "pearson")
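For a version that runs end to end, here is a sketch that rebuilds both frames from the values in the question and applies the same Series-to-Series call. Indexing both Series by `Datetime` is an added assumption, not part of the answer: `Series.corr` aligns on index labels, so this pairs values by hour even if the rows of one frame arrive in a different order.

```python
import pandas as pd

# Data transcribed from the question; each row covers one hour of the day.
hours = list(range(7, 18))
df1 = pd.DataFrame({
    "Datetime": hours,
    "BSL": [127.504505, 115.254132, 108.994275, 102.936860, 99.830400,
            114.660522, 138.215339, 132.131075, 121.478006, 113.795645,
            114.038462],
})
df2 = pd.DataFrame({
    "Datetime": hours,
    "Number of Accident": [3455, 17388, 27767, 33622, 33474, 12670,
                           28137, 27141, 26515, 24849, 13013],
})

# Index both Series by hour so that alignment pairs the values by hour
# rather than by row position.
bsl = df1.set_index("Datetime")["BSL"]
accidents = df2.set_index("Datetime")["Number of Accident"]

r = bsl.corr(accidents, method="pearson")
print(f"Pearson r = {r:.4f}")
```

This gives the coefficient only. A p-value still requires something like `scipy.stats.pearsonr`, as used in the first answer.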

2022-07-26

POPMUISE

1765 experience contributions, more than 5 upvotes

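The corr() method of a pandas DataFrame computes the correlation matrix of all columns within a single DataFrame. You have two DataFrames, so calling it that way does not work. You can work around this as follows: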

df1['number'] = df2['Number of Accident']

df1.corr("pearson")
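The call above returns a full matrix that also includes the `Datetime` column. As a rough sketch of how one might pull out just the value of interest, the example below rebuilds the frames from the question, uses `assign` instead of in-place assignment so that `df1` is left unmodified (a design choice of this sketch, not of the answer), and reads a single cell with `.loc`.

```python
import pandas as pd

# Data transcribed from the question.
hours = list(range(7, 18))
df1 = pd.DataFrame({
    "Datetime": hours,
    "BSL": [127.504505, 115.254132, 108.994275, 102.936860, 99.830400,
            114.660522, 138.215339, 132.131075, 121.478006, 113.795645,
            114.038462],
})
df2 = pd.DataFrame({
    "Datetime": hours,
    "Number of Accident": [3455, 17388, 27767, 33622, 33474, 12670,
                           28137, 27141, 26515, 24849, 13013],
})

# Copy the accident counts next to BSL without mutating df1;
# the column name "number" follows the answer above.
combined = df1.assign(number=df2["Number of Accident"])

# Restrict the matrix to the two variables of interest.
matrix = combined[["BSL", "number"]].corr(method="pearson")
print(matrix)

# The off-diagonal entry is the BSL-vs-accidents coefficient.
r = matrix.loc["BSL", "number"]
print(f"Pearson r = {r:.4f}")
```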

2022-07-26
