Data Science Day 20:

When we watch soccer games, the screen shows basic information about each team at the beginning of the match. Suppose we want to know whether there is any difference in average age between Real Madrid and Barcelona players. What statistical test should we use?

[Images: RonnyK / Pixabay; kappilrinesh / Pixabay]

Answer:

We can use an independent two-sample t-test to determine whether there is a significant difference between the means of two groups.

T-test assumptions:

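  • The dependent variable is approximately Normally distributed within each group.
    Note: Normality is what lets us identify the probability of a particular outcome under the null hypothesis.
  • The observations are independent.
  • The dependent variable is continuous.
  • There are no extreme outliers.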
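
The Normality and outlier assumptions above can be checked roughly before running the test. The sketch below is one possible way to do it, not part of the original analysis: `check_ttest_assumptions` is a hypothetical helper, the Shapiro-Wilk test and the 1.5 × IQR rule are choices made here for illustration, and the ages are made up rather than taken from the FIFA dataset.

```python
import numpy as np
from scipy import stats


def check_ttest_assumptions(sample, alpha=0.05):
    """Rough diagnostics for one group's values (hypothetical helper)."""
    sample = np.asarray(sample, dtype=float)

    # Shapiro-Wilk test: a small p-value suggests the data are not Normal
    _, shapiro_p = stats.shapiro(sample)

    # Tukey's rule: values more than 1.5 * IQR beyond the quartiles count as outliers
    q1, q3 = np.percentile(sample, [25, 75])
    iqr = q3 - q1
    is_outlier = (sample < q1 - 1.5 * iqr) | (sample > q3 + 1.5 * iqr)

    return {
        "shapiro_p": float(shapiro_p),
        "looks_normal": bool(shapiro_p > alpha),
        "n_outliers": int(is_outlier.sum()),
    }


# Made-up ages for illustration only, not taken from the FIFA dataset
rng = np.random.default_rng(0)
fake_ages = rng.normal(loc=26, scale=4, size=30)
report = check_ttest_assumptions(fake_ages)
print(report)
```

With real data, the helper would be called once per club, for example on the Real Madrid ages and on the Barcelona ages separately.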

Example: Kaggle FIFA 2018 dataset

Null Hypothesis H0: There is NO significant difference between the mean age of Real Madrid’s and Barcelona’s players.

  1. We choose the variables Age and Club (Real Madrid, Barcelona).

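Import packages:

import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import statistics as st
import seaborn as sns

# `data` is assumed to be the Kaggle FIFA 2018 dataset, already loaded as a pandas DataFrame
# Keep only the club and age columns, then only the rows for the two clubs
data1 = data[["club", "age"]]
data2 = data1.loc[data1["club"].isin(["Real Madrid CF", "FC Barcelona"])]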

2. **Histogram of Age**

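# Split the ages by club: data3 is Real Madrid, data4 is Barcelona
data3 = data1.loc[data1["club"].isin(["Real Madrid CF"])]
data4 = data1.loc[data1["club"].isin(["FC Barcelona"])]

# Overlay the two histograms with transparency so both stay visible
plt.hist(data3.age, bins="auto", color="c", edgecolor="k", alpha=0.5, label="Real Madrid")
plt.hist(data4.age, bins="auto", color="r", alpha=0.5, label="Barcelona")
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution in Barcelona vs Real Madrid')
plt.legend()

plt.show()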

3. **Density Plot of Age**

# KDE plot: one density curve per club
df = pd.DataFrame({"Real Madrid": data3.age, "Barcelona": data4.age})
ax = df.plot.kde()
plt.title("Density Plot for Players' Age in Barcelona vs Real Madrid")
plt.show()

4. **Statistical T-test**

# equal_var=False runs Welch's t-test, which does not assume equal variances in the two groups
stats.ttest_ind(data3.age, data4.age, equal_var=False)
Ttest_indResult(statistic=-1.9061510499479299, pvalue=0.062416380021536121)
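
To see what `ttest_ind` with `equal_var=False` computes, the statistic can be rebuilt by hand. This is a minimal sketch, assuming the standard Welch formulas (unequal-variance t statistic and Welch-Satterthwaite degrees of freedom). `welch_t` is a hypothetical helper, and the two groups are made-up ages, not the FIFA data, so the numbers will not match the result above.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's two-sided t-test computed by hand (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Squared standard error of each group mean (sample variance / n)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size

    # t statistic: difference in means over the combined standard error
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)

    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

    # Two-sided p-value from the t distribution
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


# Made-up ages for illustration only
group_a = [24, 27, 31, 22, 29, 33, 26, 25]
group_b = [28, 30, 35, 27, 32, 29, 31, 34]

t, df, p = welch_t(group_a, group_b)
ref = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t, df, p)
print(ref.statistic, ref.pvalue)
```

The hand-computed statistic and p-value should agree with SciPy's values up to floating-point error.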

Conclusion:

Although the histogram does not show a clearly Normal shape, the density plot shows some features of Normality in the age distribution. Since the p-value is 0.06, which is above the 0.05 significance level, we fail to reject the Null Hypothesis:
There is no significant difference in players' age between Real Madrid and Barcelona.

Additional Info:

We used a non-directional (two-sided) t-test to generate the results, but one question we can ask ourselves is: how sure are we about the results?

  1. Type 1 error: rejecting a null hypothesis that is true.
    We predict there is a difference when in reality there is none.
    With a significance level of 0.05, there is a 5% chance of making a Type 1 error when the null hypothesis is true.
  2. Type 2 error: failing to reject a null hypothesis that is false.
    We predict there is no difference when in reality there is one.
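
The 5% figure for Type 1 errors can be checked with a small simulation. The sketch below is illustrative only: `type1_error_rate` is a hypothetical helper, and the group size, mean, and standard deviation are arbitrary assumptions. Both groups are drawn from the same distribution, so the null hypothesis is true by construction and every rejection is a Type 1 error.

```python
import numpy as np
from scipy import stats


def type1_error_rate(n_trials=2000, n_per_group=25, alpha=0.05, seed=0):
    """Share of trials that wrongly reject a true null (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Both groups come from the same Normal distribution, so H0 holds in every trial
    a = rng.normal(loc=26, scale=4, size=(n_trials, n_per_group))
    b = rng.normal(loc=26, scale=4, size=(n_trials, n_per_group))

    # One Welch t-test per row (axis=1 runs the test across each trial's samples)
    result = stats.ttest_ind(a, b, axis=1, equal_var=False)

    # Fraction of trials where the p-value falls below the significance level
    return float(np.mean(result.pvalue < alpha))


rate = type1_error_rate()
print(rate)
```

The printed rate should land close to `alpha`, which is what the significance level is meant to control.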

In the previous example, we have a 2-level independent variable, Club (Barcelona, Real Madrid), and one dependent variable, Age.

What if we have an independent variable with more than 2 levels, such as AC Milan, Barcelona, and Real Madrid?

That will be ANOVA’s show!
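
As a preview, a one-way ANOVA for three clubs might look like the sketch below. It uses `scipy.stats.f_oneway`. The three lists are made-up ages for illustration, not values from the FIFA dataset.

```python
from scipy import stats

# Made-up ages for illustration only, one list per club
milan = [23, 25, 28, 30, 26, 24, 29]
barcelona = [27, 29, 31, 26, 30, 28, 32]
madrid = [24, 26, 27, 31, 25, 29, 28]

# H0: all three clubs have the same mean age
f_stat, p_value = stats.f_oneway(milan, barcelona, madrid)
print(f_stat, p_value)
```

As with the t-test, a p-value below the chosen significance level would lead us to reject the null hypothesis that all group means are equal.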

Happy Studying! 🍉
