ANOVA in Python with 2018 FIFA Data

Data Science Day 21:

Last time we showed an example of using the independent t-test to compare the mean age of players at Real Madrid and Barcelona. What statistical method should we use if we want to compare the mean age of players at Barcelona, Real Madrid, and Juventus?

Images: kappilrinesh / Pixabay; RonnyK / Pixabay

Answer:

We will use the ANOVA (analysis of variance) test, a special case of the general linear model (GLM), to compare the means of more than two groups.

Null Hypothesis: Mean(A) = Mean(B) = Mean(C)

ANOVA Assumptions:

  • Normality of the dependent variable within each group
  • Homogeneity of variance across groups
  • Independence of observations
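The first two assumptions can be checked numerically before running ANOVA. Below is a minimal sketch, assuming SciPy's Shapiro-Wilk test for normality and Levene's test for equal variances; the three samples are randomly generated stand-ins, not the FIFA data. Independence of observations depends on how the data were collected and cannot be verified this way.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in samples (randomly generated, NOT the FIFA data).
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(loc=25, scale=4, size=30),
    "B": rng.normal(loc=26, scale=4, size=30),
    "C": rng.normal(loc=27, scale=4, size=30),
}

# Normality: Shapiro-Wilk per group (H0: the sample comes from a normal distribution).
shapiro_p = {}
for name, values in groups.items():
    w_stat, p_value = stats.shapiro(values)
    shapiro_p[name] = p_value
    print(f"Shapiro-Wilk {name}: W={w_stat:.3f}, p={p_value:.3f}")

# Homogeneity of variance: Levene's test (H0: all groups have equal variance).
lev_stat, lev_p = stats.levene(*groups.values())
print(f"Levene: stat={lev_stat:.3f}, p={lev_p:.3f}")
```

A small p-value in either test suggests that the corresponding assumption may not hold for that sample.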

Example: Kaggle FIFA 2018 dataset

Null Hypothesis: There is no significant difference in players’ mean age among Real Madrid, Barcelona, and Juventus.

H0:  Age.mean(Real Madrid) = Age.mean(Barcelona) = Age.mean(Juventus)

1. Dataset

We use the variables Age and Club, keeping only the players of Real Madrid, Barcelona, and Juventus.

# data1 is the full FIFA 2018 DataFrame; keep only the three clubs being compared
data2 = data1.loc[data1["club"].isin(["Real Madrid CF", "FC Barcelona", "Juventus"])]
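The later steps use data3, data4, and data5 without showing where they come from. Here is a minimal sketch of one way to build them, assuming they are the per-club subsets of data2 in the order Real Madrid, Barcelona, Juventus. The small data1 frame and its ages are made up for illustration and stand in for the real Kaggle file.

```python
import pandas as pd

# Hypothetical stand-in for data1; the ages below are invented for illustration.
data1 = pd.DataFrame({
    "club": ["Real Madrid CF", "FC Barcelona", "Juventus",
             "Real Madrid CF", "FC Barcelona", "Juventus", "Arsenal"],
    "age": [32, 30, 33, 25, 24, 28, 27],
})

# Same filter as in the post: keep only the three clubs of interest.
data2 = data1.loc[data1["club"].isin(["Real Madrid CF", "FC Barcelona", "Juventus"])]

# Assumed mapping: one DataFrame per club.
data3 = data2[data2["club"] == "Real Madrid CF"]
data4 = data2[data2["club"] == "FC Barcelona"]
data5 = data2[data2["club"] == "Juventus"]

# Quick sanity check: group sizes and mean age per club.
print(data2.groupby("club")["age"].agg(["count", "mean"]))
```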

2. Histogram Plot

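import matplotlib.pyplot as plt

# data3, data4, data5 are assumed to be the per-club subsets of data2
# (Real Madrid, Barcelona, Juventus, in that order).
plt.hist(data3.age, bins="auto", color="c", edgecolor="k", alpha=0.5, label="Real Madrid")
plt.hist(data4.age, bins="auto", color="r", edgecolor="k", alpha=0.5, label="Barcelona")
plt.hist(data5.age, bins="auto", color="y", edgecolor="k", alpha=0.5, label="Juventus")
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution: Real Madrid vs Barcelona vs Juventus')
plt.legend()
plt.show()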

3. KDE Density Plot

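import pandas as pd
import matplotlib.pyplot as plt

# KDE: one density curve per club ("mfc" is Real Madrid)
df = pd.DataFrame({"mfc": data3.age, "barcelona": data4.age,
                   "juventus": data5.age})
ax = df.plot.kde()
plt.title("Density Plot for Players' Age: Real Madrid vs Barcelona vs Juventus")
plt.show()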

4. ANOVA Test

from scipy import stats
# One-way ANOVA on age across the three clubs
stats.f_oneway(data3.age, data4.age, data5.age)
# Output: F_onewayResult(statistic=4.8827728579356524, pvalue=0.010152460067260918)

Outcome:

With an F-statistic of 4.88 and a p-value of 0.01, we reject the null hypothesis at the 0.05 level: the players’ mean age differs significantly among Real Madrid, Barcelona, and Juventus. Both the histogram and the density plot are consistent with this outcome. However, ANOVA alone does not tell us which groups differ; a post hoc procedure such as the Bonferroni method can be used for further investigation.
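One common form of the Bonferroni method is to run every pairwise t-test and multiply each p-value by the number of comparisons. The sketch below illustrates that idea on randomly generated stand-in groups labelled A, B, and C; it is not the post's own follow-up analysis and does not use the FIFA data.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical stand-in samples (randomly generated, NOT the FIFA data).
rng = np.random.default_rng(1)
groups = {
    "A": rng.normal(24, 3, 25),
    "B": rng.normal(25, 3, 25),
    "C": rng.normal(28, 3, 25),
}

pairs = list(combinations(groups, 2))  # (A, B), (A, C), (B, C)
alpha = 0.05

adjusted = {}
for g1, g2 in pairs:
    t_stat, p_raw = stats.ttest_ind(groups[g1], groups[g2])
    # Bonferroni adjustment: scale by the number of comparisons, cap at 1.
    p_adj = min(p_raw * len(pairs), 1.0)
    adjusted[(g1, g2)] = p_adj
    verdict = "differ" if p_adj < alpha else "no evidence of a difference"
    print(f"{g1} vs {g2}: t={t_stat:.2f}, adjusted p={p_adj:.4f} ({verdict})")
```

Comparing each adjusted p-value with 0.05 is equivalent to comparing each raw p-value with 0.05 divided by the number of pairs.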

Bonus:

I remember Song asking me: it is good to know what ANOVA is used for, but do you know which test generates the p-value of ANOVA?

I thought that since ANOVA has a similar application to the t-test, the t-test generates the p-value.
However, the truth is that the F-test generates ANOVA’s p-value.
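To make the role of the F-test concrete, here is a minimal sketch that computes the one-way ANOVA F-statistic by hand (between-group mean square divided by within-group mean square), reads the p-value off the F distribution, and compares both with stats.f_oneway. The samples are randomly generated stand-ins, not the FIFA data.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in samples (randomly generated, NOT the FIFA data).
rng = np.random.default_rng(3)
samples = [rng.normal(24, 3, 20), rng.normal(26, 3, 25), rng.normal(27, 3, 22)]

k = len(samples)                          # number of groups
n_total = sum(len(s) for s in samples)    # total number of observations
grand_mean = np.concatenate(samples).mean()

# Sums of squares between and within groups.
ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)

df_between = k - 1
df_within = n_total - k

# F = mean square between / mean square within.
f_manual = (ss_between / df_between) / (ss_within / df_within)
# p-value = upper tail of the F distribution with (k - 1, N - k) degrees of freedom.
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*samples)
print(f"manual: F={f_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.4f}")
```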

Later, little rain mentioned that the t-test and the F-test are convertible through the relation t^{2} = F, which holds when only two groups are compared with an equal-variance t-test.
We will go over the relationship between T-test and F-test next time!
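As a quick numerical preview of that relation, the sketch below runs a pooled-variance t-test and a one-way ANOVA on the same two groups and compares t squared with F. The two samples are randomly generated stand-ins, not the FIFA data.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in samples (randomly generated, NOT the FIFA data).
rng = np.random.default_rng(2)
x = rng.normal(25, 4, 20)
y = rng.normal(27, 4, 22)

# ttest_ind uses the pooled (equal-variance) t-test by default.
t_stat, t_p = stats.ttest_ind(x, y)
# One-way ANOVA on the same two groups.
f_stat, f_p = stats.f_oneway(x, y)

print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
print(f"t-test p = {t_p:.6f}, ANOVA p = {f_p:.6f}")
```

With exactly two groups the two tests give the same p-value, so ANOVA can be read as the extension of the two-sample t-test to three or more groups.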

Happy Studying and Soccer game watching! ⚽
