Creating a Simple Z-test Calculator using Streamlit

Rahul Shah Last Updated : 27 Nov, 2021


This article was published as a part of the Data Science Blogathon.

Statistics plays an important role in Data Science. It is a significant step in decision making powered by Machine Learning or Deep Learning algorithms. One popular statistical procedure is Hypothesis Testing, which is used well beyond Data Science: it has applications in the Healthcare, Biological & Mechanical domains. In fact, many decisions about biological processes are based on Hypothesis Testing.

Z-test Calculator image — Photo by cottonbro from Pexels

In this article, we will learn about Hypothesis Testing & one of the most popular Hypothesis Tests, the Z-test. We will also see how to quickly build our own interactive Z-test Calculator using Python’s Streamlit package.

1. What is a Hypothesis?

2. How Hypothesis Testing is Performed?

3. What is Z-Test?

4. Creating Z – test Calculator

5. Conclusions

What is a Hypothesis?

A Hypothesis is a statement that has not been proved yet. In other words, it is an assumption that is to be tested to check whether it is true. The procedures used to test these hypotheses are known as Hypothesis Tests, which we will see in the next section. Although there are many types of hypotheses, two are used most widely: the Null Hypothesis & the Alternate Hypothesis. These two hypotheses are mutually exclusive. A Null Hypothesis is the statement we currently believe to be true. An Alternate Hypothesis is the statement we adopt when the evidence leads us to reject the Null Hypothesis. The Null Hypothesis is denoted by H0 & the Alternate Hypothesis by Ha.

Statistically, a Hypothesis Test checks whether a sample is consistent with a claimed value of a population parameter such as the Mean or Standard Deviation. For example, suppose the Mean Female height in a particular society is claimed to be 168 cm & we want to test whether the mean height really is 168 cm. Here, our Null Hypothesis is

H0: μ = 168

and Alternate Hypothesis is

Ha: μ ≠ 168

Depending on the claim being tested, the alternate hypothesis may instead use a less than (<) or greater than (>) sign between the Population’s Mean or Proportion & the known value. The sign also decides the tail of the test. Once we have the Null Hypothesis statement, we can easily identify the Alternate Hypothesis statement & the sign involved.

How Hypothesis Testing is Performed?

To decide between the Null & Alternate hypotheses, we use a Hypothesis Test. Several tests are available, & the choice depends on factors such as the sample size, the number of tails in the test, the type of sample data, the Population Parameters involved, & others. Each test calculates a test statistic: for example, a Z-test calculates a Z-score (Z-statistic) and a T-test calculates a T-score (T-statistic). Based on the test statistic, the p-value & the level of significance, we decide whether we reject the Null Hypothesis or fail to reject it. A general process of Hypothesis Testing is:

1. State the Null & Alternate Hypothesis statements

2. Specify Level of Significance (α)

3. Calculate the Test Statistic & p-value

4. Specify Critical Region

5. Conclusion – Reject or Fail to Reject the Null Hypothesis

Remember that in Hypothesis Testing we test the Null Hypothesis. This means that we either Reject the Null Hypothesis or Fail to Reject the Null Hypothesis. We would never say that we are accepting or rejecting the Alternate Hypothesis; although the practical outcome is similar, the conclusion should always be stated in terms of the Null Hypothesis. In this article, we will focus only on the Z-test & on calculating its statistic value, or Z-score. Choosing the correct Hypothesis Test is the most important part, as an incorrect test leads to incorrect results & therefore to incorrect decisions. You can refer to this article to learn how to choose the correct Hypothesis Test based on the problem statement.
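
As a rough illustration of steps 3 and 5, the sketch below converts a Z-score into a p-value and compares it with α. The function name p_value_from_z and the tail labels are made up for this example, and it relies on NormalDist from Python’s standard statistics module, which is not used elsewhere in this article.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two"):
    """Return the p-value of a Z-score under the standard normal distribution.

    tail is a hypothetical label: "left", "right" or "two".
    """
    cdf = NormalDist().cdf(z)  # P(Z <= z)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'left', 'right' or 'two'")

alpha = 0.05                    # level of significance chosen in step 2
p = p_value_from_z(2.0, "two")  # illustrative Z-score, not real data
reject_h0 = p < alpha           # True means we reject the Null Hypothesis
print(round(p, 4), reject_h0)
```

If the p-value is not below α, the conclusion is that we fail to reject the Null Hypothesis.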

What is Z-Test?

The Z-test is a kind of Hypothesis Test based on the Standard Normal Distribution. It is also known as the Standard Normal Z Test. Using this test, we calculate the Z-score or Z-statistic value. The Z-test is used for testing the following:

1. Mean of a single population (μ)

2. Difference between means of two populations (μ1 – μ2)

3. Proportion of a single population (P)

4. Difference between proportions of two populations (P1 – P2)

The Z-test has several assumptions which need to be fulfilled before using it. The assumptions are as follows:

1. The sample size should be more than 30.

2. Sample data should be selected at random from the Population.

3. The Samples should be drawn from Normal Population Data.

4. The Population Variance should be known beforehand.

5. The Samples (or Populations) should be independent of each other.

The Z-statistic value can be calculated using the formula:

Z = (x̄ − μ) / (σ / √n)

Where,

x̄ is the Sample Mean,

μ is Population Mean,

σ is the Population Standard Deviation (estimated from the sample in the calculator built later in this article),

n is the sample size.

Note: This formula is for one sample Z-test.
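
To make the formula concrete, here is a small sketch that plugs numbers into it. The population mean of 168 cm comes from the height example earlier; the sample mean, standard deviation and sample size are invented purely for illustration, and the function name one_sample_z is hypothetical.

```python
import math

def one_sample_z(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n)) for a one-sample Z-test."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu) / standard_error

# Invented figures: 100 heights averaging 170 cm, standard deviation 10 cm
z = one_sample_z(xbar=170, mu=168, sigma=10, n=100)
print(z)
```

Here the standard error is 10 / √100 = 1, so the sample mean lies 2 standard errors above the claimed population mean.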

After calculating the Z-score, the conclusion depends on the tails of the test. Generally, there are three types of tailed tests: Left-tailed, Right-tailed & Two-tailed. As a rule of thumb, if Ha has a < or > sign between the parameter & the given value, it is a one-tailed test (Left-tailed or Right-tailed, respectively); if Ha has a ≠ sign, it is a two-tailed test.

Thus, after calculating the Z-statistic value, the conclusion depends on the tail of the test. If the test is left-tailed: reject the Null Hypothesis H0 if the calculated Z-statistic is less than the (negative) critical Z value; otherwise we fail to reject H0. If the test is right-tailed: reject H0 if the calculated Z-statistic is greater than the critical Z value; otherwise we fail to reject H0. If the test is two-tailed: reject H0 if the calculated Z-statistic is less than the negative critical Z value or greater than the positive critical Z value, that is, if its absolute value exceeds the critical value; otherwise we fail to reject H0.
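
The three rules can be sketched as a single function. This is only an illustration: the name reject_null and the tail labels are assumptions, and the critical values come from NormalDist in Python’s standard statistics module rather than from a Z-table.

```python
from statistics import NormalDist

def reject_null(z, alpha=0.05, tail="two"):
    """Return True if H0 is rejected at level alpha for the given tail."""
    std_normal = NormalDist()
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)          # critical value is negative
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split across both tails
        return abs(z) > critical
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Illustrative Z-scores, not real data
print(reject_null(1.7, tail="two"), reject_null(1.7, tail="right"))
```

Note that the same Z-score can lead to different conclusions depending on the tail, because a two-tailed test splits α between both ends of the distribution.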

Creating Z – test Calculator

Now we will create the Z-test calculator. First, we need to import the required libraries.

import streamlit as st
import numpy as np

Now, we will accept the sample data from the user with Streamlit’s .text_input() method, using a default value of 0. Since .text_input() returns a string, the values must be converted to floats in the next step. Before that, we need to make sure that no empty strings are left between the values, as an empty string cannot be parsed to a float.

raw_data = st.text_input('Enter the Data', value = 0)
raw_data = raw_data.strip()
data = raw_data.replace(" , " , " ").replace(", " , " ").replace(" ," , " ").replace(" " , ',').split(',')
x = [float(i) for i in data]
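
The chained .replace() calls cope with single spaces around commas, but repeated spaces would still leave empty strings behind. One possible alternative, shown here only as a sketch, is to split on any run of commas and whitespace with a regular expression. The helper name parse_numbers is hypothetical and is not part of the calculator code in this article.

```python
import re

def parse_numbers(raw_text):
    """Split text on commas and/or whitespace and convert each piece to float."""
    pieces = re.split(r"[,\s]+", raw_text.strip())
    # Skip empty pieces (e.g. from a leading comma or an empty input)
    return [float(piece) for piece in pieces if piece]

print(parse_numbers("1, 2 ,3   4,,5.5"))
```

Text that is not numeric still raises ValueError from float(), so it can be handled in the same way as in the original code.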

Next, we will accept the known Population Mean from the user. We read it with the Streamlit method .text_input() & convert it into a float, since .text_input() returns a string value. The value = 0 argument inside .text_input() sets the default value shown in the field before the user enters anything.

mu = st.text_input('Enter Population Mean', value = 0)
mu = float(mu)

Next, we will calculate the Sample Size, Mean & Standard Deviation. We use NumPy’s mean() function to calculate the mean of the sample data, its std() function with ddof = 1 (delta degrees of freedom, so the divisor is n − 1) to calculate the sample standard deviation, and Python’s len() to calculate the sample size.

xbar = np.mean(x)
sigma = np.std(x, ddof = 1)
n = len(x)
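
To see what ddof = 1 changes, the sketch below compares the two divisors on a small invented sample. It uses Python’s standard statistics module instead of NumPy so that it runs without extra packages; pstdev() divides by n, as np.std(x, ddof = 0) does, and stdev() divides by n − 1, as np.std(x, ddof = 1) does.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # invented values

population_sd = statistics.pstdev(sample)  # divisor n
sample_sd = statistics.stdev(sample)       # divisor n - 1

print(population_sd, round(sample_sd, 3))
```

The n − 1 version is always slightly larger, and the gap shrinks as the sample size grows.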

Finally, we will calculate the Z-statistic Value using the formula mentioned above.

z_cal = (xbar - mu) / (sigma / np.sqrt(n))
st.write("Your z - statistic value is: ", np.round(z_cal, 3))

Note that since we are using Streamlit, a Python library, to make the Z-test Calculator interactive, we use its functions in place of Python’s built-ins. Thus, we use st.text_input() in place of input() to accept text input, and st.write() in place of print() to display output.

Putting it all together for Z-test Calculator

import streamlit as st
import numpy as np
raw_data = st.text_input('Enter the Data', value = 0)
raw_data = raw_data.strip()
data = raw_data.replace(" , ", " ").replace(", ", " ").replace(" ,", " ").replace(" ", ',').split(',')
x = [float(i) for i in data]
mu = st.text_input('Enter Population Mean', value = 0)
mu = float(mu)
xbar = np.mean(x)
sigma = np.std(x, ddof = 1)
n = len(x)
z_cal = (xbar - mu) / (sigma / np.sqrt(n))
st.write("Your z - statistic value is: ", np.round(z_cal, 3))
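
If you want to check the arithmetic without launching Streamlit, one option is to move the calculation into a plain function. The sketch below is an assumption about how that could look, not part of the original script: the name z_statistic is hypothetical, the data is invented, and statistics.stdev() stands in for np.std(x, ddof = 1).

```python
import math
import statistics

def z_statistic(sample, mu):
    """One-sample Z-statistic using the sample standard deviation (n - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    sigma = statistics.stdev(sample)  # same divisor as np.std(x, ddof=1)
    return (xbar - mu) / (sigma / math.sqrt(n))

z = z_statistic([1, 2, 3, 4, 5], mu=2)  # invented data
print(round(z, 3))
```

The Streamlit script would then only collect the inputs and pass them to this function.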

One can run the script by executing streamlit run followed by the script’s .py file name in the Command Prompt or Anaconda Prompt.

On executing this code, when we added sample data, we get:

Now, let’s add error handling to make the calculator more robust against invalid input. This can be done by adding try-except blocks wherever necessary.

Putting it all together after adding the try-except blocks

import streamlit as st
import numpy as np
try:
    raw_data = st.text_input('Enter the Data', value = 0)
    raw_data = raw_data.strip()
    data = raw_data.replace(" , ", " ").replace(", ", " ").replace(" ,", " ").replace(" ", ',').split(',')
    x = [float(i) for i in data]
except ValueError:
    # float() raises ValueError when a piece of text is not a number
    st.write('Enter Valid Numerical Data!')
try:
    mu = st.text_input('Enter Population Mean', value = 0)
    mu = float(mu)
except ValueError:
    st.write('Enter valid Population Mean!')
try:
    xbar = np.mean(x)
    sigma = np.std(x, ddof = 1)
    n = len(x)
    z_cal = (xbar - mu) / (sigma / np.sqrt(n))
    st.write("Your z - statistic value is: ", np.round(z_cal, 3))
except Exception:
    # Reached when x or mu was never assigned because an input above was invalid
    st.write('Cannot compute z-statistic value. One or more fields does not contain valid Data.')
    st.write('Check for the input field having warnings!')

Let’s add a non-numerical value to the Sample Data input field and check the results.


As expected, the script caught the exception & displayed the error message when we entered non-numerical data in the input field.
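
The try-except blocks catch text that cannot be parsed, but some inputs parse fine and still give no meaningful Z-score: a single value leaves the n − 1 standard deviation undefined, and identical values make it zero. The sketch below shows one way such cases could be checked explicitly. The function name checked_z_statistic and its messages are hypothetical, and it uses the standard statistics module rather than NumPy.

```python
import math
import statistics

def checked_z_statistic(sample, mu):
    """Return the Z-statistic, or raise ValueError with a readable message."""
    if len(sample) < 2:
        raise ValueError("At least two values are needed to estimate the standard deviation.")
    sigma = statistics.stdev(sample)
    if sigma == 0:
        raise ValueError("All values are identical, so the standard deviation is zero.")
    return (statistics.mean(sample) - mu) / (sigma / math.sqrt(len(sample)))

# Invented inputs: one value, identical values, and a usable sample
for sample in ([0.0], [3.0, 3.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0]):
    try:
        print(round(checked_z_statistic(sample, mu=2.0), 3))
    except ValueError as error:
        print("Input problem:", error)
```

In a Streamlit script, the message from the ValueError could be passed to st.write() in the same way as the other warnings.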

Conclusions

In this article, we learned about Hypotheses & Hypothesis Testing. We also learned about one of the most popular Hypothesis tests, the Z-test, also called the Standard Normal Test because it is based on the Standard Normal Distribution. We looked at the formula for calculating the Z-statistic & then used it to build a One-Sample Z-test statistic Calculator as a Streamlit application. As next steps, one can try building a Two-Sample Z-test statistic calculator, a p-value calculator based on the Z-score, or a feature that prints conclusions based on the selected tail type. Similarly, we can also build a confidence interval calculator. Thus, this application can serve as a prototype for a more functional one.
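
As a starting point for the confidence interval idea, here is a minimal sketch of a Z-based interval for the mean, x̄ ± z* · σ / √n. The function name z_confidence_interval is hypothetical, the numbers are invented, and NormalDist from the standard statistics module supplies the critical value z*.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds of a Z-based confidence interval for the mean."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Invented figures: 100 heights averaging 170 cm, standard deviation 10 cm
low, high = z_confidence_interval(xbar=170, sigma=10, n=100)
print(round(low, 2), round(high, 2))
```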

About the Author


The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion

Rahul Shah

IT Engineering Graduate currently pursuing Post Graduate Diploma in Data Science.
