Hypothesis Testing for Data Scientists with Python

As a Data Scientist, you’re often tasked with determining whether a difference in outcomes or a trend in the data is significant, or simply the result of random variation. This is where hypothesis testing becomes essential. It provides a structured, statistical framework to validate assumptions, compare groups, and make confident, data-driven decisions. So, in this article, I’ll take you through a practical guide to Hypothesis Testing for Data Scientists with Python.

Hypothesis Testing for Data Scientists with Python: Getting Started

We’ve been given a dataset of 1000 employees, with information on:

  1. Age, Department, Education, Experience
  2. Whether they attended a training program
  3. Their performance scores (scaled from 0 to 100)

We want to evaluate whether the training program improved performance, on average, compared to employees who didn’t attend the training. You can find the dataset here.

Step 1: Define the Hypotheses

In hypothesis testing, we start by stating two opposing claims:

  1. Null Hypothesis (H₀): There is no difference in average performance scores between trained and untrained employees.
  2. Alternative Hypothesis (H₁): Trained employees have a higher average performance score than untrained employees.

This is a one-tailed test, as we’re specifically interested in improvement.

Now, before the second step, we will import the dataset:

import pandas as pd
df = pd.read_csv('/content/Employee_Training_and_Performance_Dataset.csv')
df.head()
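Before splitting the data, it helps to confirm that the columns the rest of the walkthrough relies on are present, and to check for missing performance scores. A minimal sketch (using a small stand-in frame, since the real CSV isn't reproduced here; run the same checks on the df loaded above):

```python
import pandas as pd

# Stand-in frame mirroring the columns used later in the walkthrough;
# in practice, run these checks on the df loaded from the CSV above.
df = pd.DataFrame({
    'TrainingAttended': ['Yes', 'No', 'Yes', 'No'],
    'PerformanceScore': [82, 70, 75, None],
})

# Confirm the columns the test relies on exist, and count missing scores
assert {'TrainingAttended', 'PerformanceScore'} <= set(df.columns)
print(df['PerformanceScore'].isna().sum(), "missing performance scores")
```

Rows with missing scores should be dropped or imputed before testing, since they would silently shrink the groups.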

Step 2: Prepare the Groups

Next, we will split the dataset into two groups based on whether employees attended the training:

group_yes = df[df['TrainingAttended'] == 'Yes']['PerformanceScore']
group_no = df[df['TrainingAttended'] == 'No']['PerformanceScore']
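A quick look at each group's size and raw mean gives an early sense of the data before any formal test (illustrated here with small stand-in Series; in the walkthrough these would be the group_yes and group_no defined above):

```python
import pandas as pd

# Stand-in Series; in the article these come from the TrainingAttended split
group_yes = pd.Series([82, 78, 88, 91, 75, 84])
group_no = pd.Series([70, 72, 68, 74, 71, 69])

print("Trained:   n =", len(group_yes), " mean =", round(group_yes.mean(), 2))
print("Untrained: n =", len(group_no), " mean =", round(group_no.mean(), 2))
```

A gap in raw means is only suggestive; the hypothesis test below tells us whether it is larger than random variation would produce.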

Step 3: Check for Normality

Most parametric tests, including the t-test, assume that the data is normally distributed. So, we will use the Shapiro-Wilk Test to verify this for both groups. If the p-value > 0.05, we fail to reject the assumption of normality:

from scipy import stats

# Shapiro-Wilk is most reliable on small-to-moderate samples, so test a fixed-size subsample
sample_size = min(len(group_yes), len(group_no), 300)

shapiro_yes = stats.shapiro(group_yes.sample(sample_size, random_state=1))
shapiro_no = stats.shapiro(group_no.sample(sample_size, random_state=1))

print("Shapiro Test (Training = Yes):", shapiro_yes)
print("Shapiro Test (Training = No):", shapiro_no)
Shapiro Test (Training = Yes): ShapiroResult(statistic=np.float64(0.9947190464566062), pvalue=np.float64(0.3910129582664982))
Shapiro Test (Training = No): ShapiroResult(statistic=np.float64(0.99501435026432), pvalue=np.float64(0.44369527154076494))

Both groups are approximately normal, so we can proceed with the t-test.
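Had either Shapiro-Wilk p-value fallen below 0.05, a non-parametric alternative such as the Mann-Whitney U test would be the safer choice, since it makes no normality assumption. A sketch of that fallback (with stand-in arrays, since the real groups come from the dataset split above):

```python
import numpy as np
from scipy import stats

# Stand-in samples; replace with group_yes and group_no from the split above
rng = np.random.default_rng(1)
group_yes = rng.normal(75, 10, 200)
group_no = rng.normal(70, 10, 200)

# One-sided Mann-Whitney U: does the trained group tend to score higher?
u_stat, p_val = stats.mannwhitneyu(group_yes, group_no, alternative='greater')
print("Mann-Whitney U p-value:", p_val)
```

Because our data passed the normality check, we stay with the t-test, which has more statistical power when its assumptions hold.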

Step 4: Check for Equal Variance

Before running a t-test, we need to determine whether the two groups have equal variances. We use Levene’s Test for this:

# Levene’s test: the null hypothesis is that both groups have equal variances
levene = stats.levene(group_yes, group_no)
print("Levene’s Test:", levene)
Levene’s Test: LeveneResult(statistic=np.float64(3.6987757209752585), pvalue=np.float64(0.05473666933558896))

This p-value is just above 0.05, so the result is borderline. To be cautious, we assume unequal variances and use Welch’s t-test, which remains valid whether or not the variances are equal.

Step 5: Perform Welch’s T-Test

Now, we will perform the actual hypothesis test:

# One-tailed Welch's t-test: is the trained group's mean higher?
t_stat, p_val = stats.ttest_ind(group_yes, group_no, equal_var=False, alternative='greater')
print("T-test statistic:", t_stat)
print("T-test p-value:", p_val)
T-test statistic: 9.187893626181372
T-test p-value: 1.4291275901691247e-19

A p-value this small means that, if training truly had no effect, a difference this large would be extremely unlikely to occur by chance.

Since the p-value is far below 0.05, we reject the null hypothesis. We now have strong statistical evidence that employees who attended training perform significantly better, on average, than those who did not.
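Statistical significance tells us the difference is unlikely to be chance, but not how large it is. Cohen's d is one common way to quantify the effect size; a sketch (with stand-in arrays, since the real groups come from the dataset split above):

```python
import numpy as np

# Stand-in samples; replace with group_yes and group_no from the dataset
rng = np.random.default_rng(1)
group_yes = rng.normal(75, 10, 500)
group_no = rng.normal(70, 10, 500)

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(group_yes), len(group_no)
s1, s2 = group_yes.std(ddof=1), group_no.std(ddof=1)
pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
cohens_d = (group_yes.mean() - group_no.mean()) / pooled_sd
print("Cohen's d:", round(cohens_d, 3))
```

As a rough convention, d around 0.2 is a small effect, 0.5 medium, and 0.8 large, which helps translate the test result into practical terms.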

Here’s a visual comparison of both groups:

import plotly.express as px

fig = px.box(
    df,
    x='TrainingAttended',
    y='PerformanceScore',
    title='Performance Score by Training Attendance',
    labels={
        'TrainingAttended': 'Training Attended',
        'PerformanceScore': 'Performance Score'
    },
    color='TrainingAttended',  
    points='all',  
)

fig.update_layout(
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='white',
    margin=dict(l=40, r=40, t=80, b=60),
    showlegend=False
)

fig.show()
(Box plot: Performance Score by Training Attendance)

Summary