How to Use Chi Square to Fuel A/B Test?

aakash93 Last Updated : 15 Jun, 2023


Introduction

You may have heard that “Customer is King”. This is because customers decide which products stay and which do not. Whether or not marketers treat customers like royalty, customers hold the money that marketers want, and they will not give it away for free. So, how do we know which product customers prefer? The answer is to run an experiment known as an A/B test.

In this article, I will talk about A/B testing and show a method that works well for comparing click rates across different advertisements within a campaign. I will also explain how to calculate it in Excel and in Python and how to interpret the results.

This article was published as a part of the Data Science Blogathon.

Introduction
What is A/B Testing?
Limitations of Traditional A/B Test
What is Chi-square (or Chi-Sq)?
Actual Calculations Using Excel
Code to Implement Chi-Sq in Python
How Does the Count of Instances or Volume Play an Important Role in Determining Significance?
Applications of Chi Square
Advantages of Using Chi-square Test
Conclusion
Frequently Asked Questions

What is A/B Testing?

A/B testing, also known as split testing, compares two versions of a variable (for example, advertisement A vs advertisement B), showing each version to equal numbers of users at random to determine which performs better against a business goal.

A/B testing can be useful when you want to test metrics of two variables against each other. However, traditional A/B testing will only take you so far. Keep reading to learn more about the limitations of A/B testing and discover an alternative approach.

Let us understand the same from a Data Scientist’s perspective. For a Data Scientist, experiments are an important part of day-to-day work. But how do we validate those experiments?

Let’s say you observe the following metrics for an email advertisement campaign: Advertisement A was sent to 100 people and received 25 clicks, while Advertisement B was sent to 100 people and received 26 clicks.

Now, you want to understand whether the click rates of the two advertisements are different.

So, you calculate the click rate for each advertisement: 25/100 = 25% for Advertisement A and 26/100 = 26% for Advertisement B.

Inference: You see that Advertisement B has a better click rate. But is the difference statistically significant?
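As a quick illustration, the click-rate calculation can be sketched in a few lines of Python. The counts are the ones used throughout this article (25 and 26 clicks out of 100 emails each); the variable names are my own and purely illustrative.

```python
# Counts used throughout this article's example (names are illustrative).
campaign = {
    "Advertisement A": {"clicks": 25, "sent": 100},
    "Advertisement B": {"clicks": 26, "sent": 100},
}

# Click rate = clicks / emails sent
click_rates = {ad: m["clicks"] / m["sent"] for ad, m in campaign.items()}

for ad, rate in click_rates.items():
    print(f"{ad}: {rate:.0%}")
```

A one-point gap on 100 emails per advertisement is small, which is why we need a statistical test before drawing conclusions.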

Once Data Scientists create a solution to a problem, they usually deploy it and run an A/B test to evaluate whether it performs well in practice.

Limitations of Traditional A/B Test

Traditionally, most Data Scientists use a t-test (or z-test) to determine the significance of an A/B test. It is generally used to compare the means of 2 groups to understand if there is any statistically significant difference. The t-test assumes that the data are Gaussian (normally distributed), and it may produce unreliable results if they are not.

Since we cannot always have normally distributed data, we need a more general approach.

In this article, I will talk about a particular A/B Testing method that works well for comparing click rates across different advertisements within a campaign. I will also explain how to calculate the same in Excel and in Python and talk about how to interpret the results.

What is Chi-square (or Chi-Sq)?

The Chi-sq test is used to determine associations between 2 or more categorical variables. It applies to data made up of counts distributed across categories, and it tells us whether that distribution differs from what one would expect by chance.

Understanding with example:

We will use the same example we saw in the introduction. Since we are doing a statistical test, let’s first understand a few terms.

Null Hypothesis : H0 : The two categorical variables have no relationship (independent)

Alternate Hypothesis : H1 : There is a relationship (dependent) between two categorical variables

We will define a significance level to determine whether the relationship between the variables is statistically significant. Generally, a significance level (alpha value) of 0.05 is chosen. This alpha value denotes the probability of erroneously rejecting H0 when it is true.

A lower alpha value is chosen when we want stronger evidence before rejecting H0. If the p-value for the test is greater than the alpha value, we fail to reject H0; if it is less than or equal to alpha, we reject H0.

Actual Calculations Using Excel

Step 1: Create a contingency table for observed values

Create a Total column and row. Also, create a Click rate row. It will represent the Click/Total for each advertisement.
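For readers who prefer code to spreadsheets, here is a minimal sketch of the same observed table in plain Python. It assumes the layout implied by the Excel steps (outcomes as rows, advertisements as columns); the dictionary structure and variable names are my own.

```python
# Observed counts: rows are outcomes, columns are advertisements.
observed = {
    "Click":    {"A": 25, "B": 26},
    "No click": {"A": 75, "B": 74},
}
ads = ["A", "B"]

# Totals per row, per column, and overall
row_totals = {outcome: sum(counts.values()) for outcome, counts in observed.items()}
col_totals = {ad: sum(counts[ad] for counts in observed.values()) for ad in ads}
grand_total = sum(row_totals.values())

# Click rate = Click / Total for each advertisement
click_rate = {ad: observed["Click"][ad] / col_totals[ad] for ad in ads}

# Print the table with a Total column, a Total row, and a Click rate row
header = f"{'':<12}" + "".join(f"{ad:>8}" for ad in ads) + f"{'Total':>8}"
print(header)
for outcome, counts in observed.items():
    cells = "".join(f"{counts[ad]:>8}" for ad in ads)
    print(f"{outcome:<12}{cells}{row_totals[outcome]:>8}")
totals = "".join(f"{col_totals[ad]:>8}" for ad in ads)
print(f"{'Total':<12}{totals}{grand_total:>8}")
rates = "".join(f"{click_rate[ad]:>8.0%}" for ad in ads)
print(f"{'Click rate':<12}{rates}")
```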

Step 2: Calculate Expected values

Create a contingency table for expected values.

Expected Frequency = (Row Total x Column Total)/Grand Total

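Value for the 1st cell (Click, Advertisement A) = (51*100)/200 = 25.5

Repeating this for every cell gives expected values of 25.5 clicks and 74.5 non-clicks for each advertisement.

Notice how the totals are the same as in the observed table, but the individual cell values have changed.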
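The expected-frequency formula can be sketched in plain Python as follows. This is only an illustration of (Row Total x Column Total)/Grand Total applied to every cell; the list layout (rows are Click / No click, columns are Advertisement A / B) is an assumption matching the Excel steps.

```python
# Observed counts; rows: Click, No click; columns: Advertisement A, B
observed = [[25, 26],
            [75, 74]]

row_totals = [sum(row) for row in observed]          # [51, 149]
col_totals = [sum(col) for col in zip(*observed)]    # [100, 100]
grand_total = sum(row_totals)                        # 200

# Expected frequency = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

print(expected)  # [[25.5, 25.5], [74.5, 74.5]]
```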

Step 3: Calculate Chi-sq statistic

Use the tables created in the above 2 steps and calculate the chi-sq statistic using the below formula.

χ2 = Σ (O – E)^2 / E

Here,
O => Observed values or the actual values that we saw in Step 1.

E => Expected values that we computed in Step 2.

χ2 => Chi-Sq statistic

Σ => Sum over all cells of the table

For each cell of the table,

Numerator = squared value of (Observed value – Expected value)
Denominator = Expected value
Cell contribution = Numerator/Denominator

For example, value for the 1st cell = (25 – 25.5)^2/25.5 = 0.01

Similarly, calculate the values across all cells and add them up to get the Chi-sq statistic.

We get an overall Chi-sq value of 0.03
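The per-cell calculation and the final sum can be sketched in plain Python like this. The observed and expected values are the ones from Steps 1 and 2; the variable names are illustrative.

```python
# Observed and expected counts; rows: Click, No click; columns: Ad A, Ad B
observed = [[25, 26], [75, 74]]
expected = [[25.5, 25.5], [74.5, 74.5]]

# Per-cell contribution: (O - E)^2 / E
contributions = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

# The chi-sq statistic is the sum of all cell contributions
chi_sq = sum(sum(row) for row in contributions)

print(round(chi_sq, 4))  # 0.0263
```

Rounded to two decimals this is the 0.03 reported above.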

Step 4: Calculate DOF (Degrees of freedom)

Definition: In statistics, degrees of freedom are the number of values in a calculation that are free to vary once the constraints (such as fixed totals) have been taken into account.

E.g. 100 people were sent Advertisement A by email. Each person can either click or not click the advertisement (N=2 outcomes). If 25 people clicked, we know that (100-25=) 75 people did not. So, we need only 1 value to determine the other. Hence, the DOF is 1, which can be calculated as (N-1).

Importance of DOF:

Degrees of freedom are important for finding critical cutoff values for a statistical test.

Degree of Freedom for a Chi-sq Test = (rows − 1) * (columns − 1)

DOF = (2 − 1)*(2 − 1) = 1×1 = 1
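As a small sketch, the DOF formula translates directly into a helper function. The function name `degrees_of_freedom` is hypothetical and only for illustration; it assumes the table is a list of equally sized rows.

```python
def degrees_of_freedom(table):
    """Return (rows - 1) * (columns - 1) for a contingency table."""
    rows = len(table)
    columns = len(table[0])
    return (rows - 1) * (columns - 1)

observed = [[25, 26], [75, 74]]  # the 2 x 2 table from Step 1
dof = degrees_of_freedom(observed)
print(dof)  # 1
```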

Step 5a: Use Critical value table to accept/reject Null hypothesis.

Now, we need to find the critical value of the chi-square distribution.

We can obtain this from the chi-square distribution table.

Now, let us look at the table and find the value corresponding to 1 degree of freedom and a 0.05 significance level.

The tabular or critical value of chi-square here is 3.841

Since our calculated chi-sq value (0.03) is less than the critical chi-sq value (3.841), we fail to reject the null hypothesis.

In simpler terms, there is no statistically significant difference in click rates between the two advertisements.
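If you do not have a chi-square table at hand, the critical value can also be looked up in code. The sketch below uses `scipy.stats.chi2.ppf` (the inverse CDF of the chi-square distribution); the statistic is the rounded value from Step 3, and the variable names are illustrative.

```python
from scipy.stats import chi2

alpha = 0.05      # significance level
dof = 1           # degrees of freedom from Step 4
chi_sq = 0.0263   # chi-sq statistic from Step 3 (rounded)

# Critical value: the point that leaves alpha probability in the upper tail
critical_value = chi2.ppf(1 - alpha, dof)
print(round(critical_value, 3))  # 3.841

# Reject H0 only if the statistic exceeds the critical value
reject_h0 = chi_sq > critical_value
print("Reject H0" if reject_h0 else "Fail to reject H0")
```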

Step 5b: Calculate and interpret p-value to accept/reject Null hypothesis.

Assuming that we already have the observed values from Step 1, use the below steps in Excel:

Calculate the Expected values using Step 2.
Now, we can directly use the CHISQ.TEST function in Excel.
Formula = CHISQ.TEST(Observed Range, Expected Range). Here, we will use CHISQ.TEST(C4:D5,H4:I5), since the Observed data is in cells C4:D5 and the Expected data is in cells H4:I5.

Upon running this test, you will observe a p-value of 0.871

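We can interpret the p-value by comparing it with alpha: if the p-value <= 0.05, we reject the Null Hypothesis; if the p-value > 0.05, we fail to reject it.

Since the p-value (0.871) > 0.05, we fail to reject the Null Hypothesis.

It means that we have no evidence that clicks depend on the advertisement.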
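The same p-value can be reproduced outside Excel from the chi-sq statistic and the degrees of freedom. This is a minimal sketch using `scipy.stats.chi2.sf` (the upper-tail probability of the chi-square distribution); it is meant as a cross-check of the Excel result, not a description of how Excel computes it internally.

```python
from scipy.stats import chi2

# Observed and expected counts from Steps 1 and 2
observed = [[25, 26], [75, 74]]
expected = [[25.5, 25.5], [74.5, 74.5]]

# Chi-sq statistic: sum of (O - E)^2 / E over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# p-value: probability of a statistic at least this large if H0 is true
p_value = chi2.sf(chi_sq, dof)
print(round(p_value, 3))  # 0.871
```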

Step 5c: Calculate and interpret statistical significance to accept/reject Null hypothesis.

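Statistical Significance = 1 – p-value = 1 – 0.871 = 0.129.

This is the p-value restated on a different scale, so it leads to the same decision: comparing 0.129 with 1 – alpha = 0.95 is equivalent to comparing 0.871 with alpha = 0.05. Since 0.129 is less than 0.95, we fail to reject the null hypothesis. Note that 0.129 is not the probability that the click rates truly differ.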

Code to Implement Chi-Sq in Python

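# python
from scipy.stats import chi2_contingency

# 2x2 contingency table of observed counts
# [a_click, a_noclick], [b_click, b_noclick]
data = [[25, 75], [26, 74]]

# correction=False turns off Yates' continuity correction
stat, p, dof, expected = chi2_contingency(data, correction=False)

# interpret the p-value against the significance level
alpha = 0.05
print("p value is", p)

if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

Make sure that you pass the correction=False argument in the function. By default, chi2_contingency applies Yates' continuity correction to 2 x 2 tables, which would give a different p-value from the Excel calculation above.

Output:

p value is 0.8711230866931309

Independent (fail to reject H0)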
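To see why the correction argument matters, the sketch below runs the same table with and without Yates' continuity correction, which SciPy applies by default when the table has 1 degree of freedom. The variable names are illustrative.

```python
from scipy.stats import chi2_contingency

# [a_click, a_noclick], [b_click, b_noclick]
data = [[25, 75], [26, 74]]

# Without the correction: matches the manual / Excel calculation
_, p_uncorrected, _, _ = chi2_contingency(data, correction=False)

# With the correction (SciPy's default for 2 x 2 tables)
_, p_corrected, _, _ = chi2_contingency(data, correction=True)

print("p value without correction:", p_uncorrected)
print("p value with correction:", p_corrected)
```

The corrected p-value is larger, because the correction shrinks the gap between observed and expected counts before the statistic is computed. Either way, the conclusion for this table is unchanged.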

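How Does the Count of Instances or Volume Play an Important Role in Determining Significance?

Let us increase the volume of the data: each advertisement is now sent to 10,000 people, with 2,500 clicks for Advertisement A and 2,600 clicks for Advertisement B, so the click rates (25% and 26%) are unchanged.

Now let us rerun the same Python snippet to see how the p-value changes.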

# python
from scipy.stats import chi2_contingency

# 2x2 contingency table of observed counts
# [a_click, a_noclick], [b_click, b_noclick]
data = [[2500, 7500], [2600, 7400]]

# correction=False turns off Yates' continuity correction
stat, p, dof, expected = chi2_contingency(data, correction=False)

# interpret the p-value against the significance level
alpha = 0.05
print("p value is", p)

if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

Output:

p value is 0.10473464597187702

Independent (fail to reject H0)

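Notice how the p-value dropped from ~0.87 to ~0.10, even though the click rates are the same. The result is still not significant at alpha = 0.05, but if the traffic increases further and the click rates remain the same, the p-value will decrease further.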
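To make the effect of volume concrete, here is a short sketch that scales the original 2 x 2 table by a few factors while keeping the click rates at 25% and 26%. The scale factors of 10 and 1000 are my own additions for illustration; only the 100-email and 10,000-email cases come from the examples above.

```python
from scipy.stats import chi2_contingency

# [a_click, a_noclick], [b_click, b_noclick] for 100 emails per advertisement
base = [[25, 75], [26, 74]]

p_values = {}
for scale in (1, 10, 100, 1000):
    # Multiply every count by the same factor: click rates stay unchanged
    table = [[count * scale for count in row] for row in base]
    _, p, _, _ = chi2_contingency(table, correction=False)
    emails_per_ad = 100 * scale
    p_values[emails_per_ad] = p
    print(f"{emails_per_ad:>7} emails per ad -> p value {p:.6f}")
```

Because every cell is multiplied by the same factor, the chi-sq statistic grows in proportion to the traffic, so a 1-point difference that is indistinguishable from noise at 100 emails becomes significant at a large enough volume.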

Applications of Chi Square

Given below are a few of the most common applications of the chi-square test:

Biologists use it to determine if there is a significant association between two variables, such as the association between two species in a community.
Genetic analysts use it to interpret the numbers in various phenotypic classes.
It is used in various statistical procedures to help decide whether to retain or reject a hypothesis.
It is used in the medical literature to compare the incidence of the same characteristics in two or more groups.
It is used in A/B testing to compare multiple groups.

Advantages of Using Chi-square Test

Easy to understand
Does not require the distribution to be normal
Can be used to test multiple metrics across multiple groups. For instance, if we were tracking 3 metrics (Click_on_button, Click_on_anywhere_else, No_Click) for 3 advertisements, we would have a 3 x 3 contingency table with DOF (Degrees Of Freedom) = (3-1) * (3-1) = 4.
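The 3 x 3 case mentioned in the last point can be sketched with the same SciPy function. The counts below are hypothetical numbers made up purely for illustration; they do not come from any real campaign.

```python
from scipy.stats import chi2_contingency

# HYPOTHETICAL counts for 3 advertisements (rows) and 3 outcomes (columns):
# [Click_on_button, Click_on_anywhere_else, No_Click]
data = [
    [30, 50, 920],   # Advertisement A
    [45, 40, 915],   # Advertisement B
    [25, 60, 915],   # Advertisement C
]

# The continuity correction only applies when DOF is 1, so it has no effect here
stat, p, dof, expected = chi2_contingency(data)

print("degrees of freedom:", dof)  # (3-1) * (3-1) = 4
print("p value is", p)
```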

Conclusion

A/B testing, also known as split testing, is a method of testing that compares two versions of a variable (for example, advertisement A vs advertisement B), showing each version to equal numbers of users at random to determine which performs better against a business goal.
Traditionally, a t-test (or z-test) is used to determine the significance of an A/B test. The t-test assumes that the data are Gaussian (normally distributed). Since we cannot always have normally distributed data, we need a more general approach.
Chi-sq test is more reliable for such cases.
It is easy to calculate and interpret Chi-sq in Excel and in Python.
For the same click rates, the Chi-sq statistic increases (and the p-value decreases) as traffic increases.

Feel free to connect with me on LinkedIn if you want to discuss this with me.

Frequently Asked Questions

Q1. Why is it called chi-square?

A. The term “chi-square” is used because the test statistic follows a chi-square distribution. The name comes from the Greek letter “χ” (chi): Karl Pearson, who introduced the test in 1900, used this letter in his notation, and the statistic is written χ2.

Q2. What is a chi-square test used for?

A. A chi-square test is used to analyze categorical data and determine if there is a significant association between two variables. It helps to assess whether the observed frequencies in different categories deviate significantly from the expected frequencies, indicating a relationship or independence between the variables.

Q3. What is the difference between t-test and chi-square?

A. The main difference between a t-test and a chi-square test lies in the types of data they analyze. A t-test is used to compare means between two groups, typically for continuous numerical data. In contrast, a chi-square test is used for categorical data to examine the association or independence between variables.

Q4. What is chi-square test and its types?

A. A chi-square test is a statistical method used to analyze categorical data. It has different types based on the research question and nature of data, including the chi-square goodness-of-fit test (assessing whether observed data fits an expected distribution) and the chi-square test of independence (evaluating the association between two categorical variables).

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

aakash93

Data Scientist with extensive experience in solving many real world business problems across different domains. Possess fine blend of business knowledge, maths/stats and technology/programming.

Experienced in handling client facing roles, stakeholder management, effective communication with presentation & negotiation skills.



