Understanding Dimensional Modeling

Tanmay Last Updated : 28 Feb, 2023

10 min read

Introduction

One of the most important assets of any organization is the data it produces on a daily basis. This data is used by an organization to find valuable insights which help in improving an organization’s growth and strategies and give them an upper hand over its competitors. This article explains to you the idea of Dimensional Modeling as part of Data Warehousing with different steps involved.

Learning Objective

Understand Data Warehousing
Understand the difference between Normalised and Denormalised form
Learn Dimensional Modeling and its implementation using real-world application

But this data, both current and historical, can only be utilized to its full potential when it is easily accessible and available. So now the question arises of how we can store this data in such a way that it fulfills the above requirements.
Here comes the concept of Data Warehousing.

Data Warehousing

Data warehousing is a technique in which information is stored in a central repository that can be used by business analysts, data engineers, data scientists, and decision-makers through business intelligence (BI) tools to make decisive decisions.
It is used to handle large amounts of data to understand relationships and trends across the data.
Advantages of Data Warehousing –

It makes information easily accessible.
It presents information consistently.
It adapts to changes in the data.
It presents information in a timely way.
It is secure, so it protects the information assets.

Data in Data Warehouse is stored in a tabular format which can be normalized or denormalized.

Difference Between Normalized and Denormalized Form

In normalized form, data is stored in multiple tables, reducing data redundancy and inconsistency, thus achieving data integrity. In the denormalized form, data is stored in a limited number of tables (maybe a single table) to reduce querying time.
Both of them contain joined tables, but the key difference between them is the degree of normalization. As the degree of normalization increases, the complexity of the model increases, and as the complexity of the model increases, the time to retrieve data also increases.

Dimensional Modeling

The data model used to store data in the denormalized form is called Dimensional Modeling. It is the technique of storing data in a Data Warehouse in such a way that enables fast query performance and easy access to its business users. It involves creating a set of dimensional tables that are designed to support business intelligence and reporting needs.
The goal of dimensional modeling is to provide a simple and intuitive way to access and analyze data, making it easy for business users to understand and use it. It aims at making simple data models. When the data models are as simple as possible, they can be understood easily, allowing the software to navigate and deliver results quickly and efficiently.
The core concept of dimensional modeling is the creation of a star schema. It is called so as the tables are arranged in the form of a star. Dimensional modeling includes facts and dimensions. Let’s have a basic idea of what Facts and Dimensions are.
Fact tables contain measures or numerical data associated with a business process, like the number of products sold. In contrast, dimensional tables store the description or textual information related to the business process, like who bought the products. We will discuss facts and dimensions in detail later in this article.
A dimensional model represents the different business processes of an organization. A fact table with its dimension table is a single business process.
Each dimensional model consists of many fact tables, with each fact table joined with corresponding dimension tables. A fact table is connected to another fact table via a common dimensional table between them; this common dimensional table is called a bridge table. We can even connect a fact table with a fact table directly, but it is not a wise option as it makes the model complex and difficult to understand.
Dimensional Modeling

Source: Javatpoint

A dimensional table is connected to the fact table using the foreign key in the fact table. The dimensional table is the parent table, and the fact table is the child table.
A dimensional model contains the same information as a normalized model. Still, the data in a dimensional model is packed in such a way that delivers user understandability, query performance, and resilience to change.

Implementation of Dimensional Modeling in Real World

Dimensional Modeling can be best understood by implementing it in real-world cases. We will pick up the E-Commerce industry like Myntra, Flipkart, Amazon, etc., as it is familiar to everyone. We will be creating a dimensional model depicting different business processes that take place in an E-Commerce industry.

Four Steps of Designing a Dimensional Model

Step 1: Select the Business Process

The first step involves selecting the business process, and it should be an action resulting in output.
Business Process #1:The e-Commere industry is widely known for selling and buying goods over the internet, so our first business process will be the products bought by the customers.
Business Process #2: Delivery status is also one of the most important business processes in this industry. It tells us where the product is currently from. It’s dispatched from the warehouse to the customer’s given address.
Business Process #3: Maintaining the inventory in order to ensure that items don’t run out of stock, how sales are going on etc.

Step 2: Decide the Grain of each Business Process

A grain is a business process at a specified level. It tells us what exactly a row, in fact, a table, represents. All the rows in a fact table should result from the same grain. Each fact table is the result of a different grain selected in a business process. The grain should be as granular (at the lowest level) as possible.
Grains for the above business processes are
Grain 1: We can have the grain as the products purchased by the customer, i.e., each row of the fact table will represent all the products checked out by the customer from the cart but suppose a customer ordered 100 products, so this will be represented as a single row. Imagine how complex it will become to query such data, so we must choose a grain as granular as possible. Therefore, our grain will be an individual product ordered by a customer, i.e., one product per row. This will make the data simple and easy to query.
Similarly, we will select the most granular grains for the remaining processes.
Grain 2: Here also, the grain will be the status of an individual product shipped from the warehouse to the delivery location.
Grain 3: Here, each row will represent the daily inventory for each product in each store., it will tell the stock of that product left in the inventory and how many products have already been sold.

Step 3: Identify the Dimensions for the Dimensional Table

Before identifying the dimensions we will understand what a dimensional table is.

Dimensional Tables
These are the tables that are joined to fact tables. It describes the “who, what, where, when, how, and why” associated with the business event. It contains the descriptive attributes used for grouping and filtering the facts.
Some important points regarding Dimension Tables:

It stores textual information related to a business process.
It answers the ‘who, what, where, when, why, and how’ questions related to a particular business process.
Dimension tables have more columns and less number of rows.
Each dimension table has a primary key that is joined to its given fact table.
Dimension attributes are the primary source of query constraints, grouping, and filtering.

Dimensions describe the measurements of the fact table. For example, customer id is a measurement, but we can describe its attributes further, more as what is the name of the customer, the address of the customer, gender, etc.
Our dimensional model will have the following dimensions:

Date Dimension: This dimension table is used in almost every dimensional model as it helps monitor the business’s performance with time.

Data Dimension in Dimensional Modeling

Product Dimension: This table will contain information regarding the product ordered.

Product Dimension in Dimensional Modeling

Order Dimension: This detail will contain information regarding the order.

Order Dimension in Dimensional Modeling

Customer Dimension: This dimension table will contain the customer’s information

Promotion Dimension: This table covers the promotion condition under which the product was sold. The promotion conditions include temporary sales, reduction in price, discounts, etc.

Warehouse Dimension: This table will contain information about the different warehouses located across the country.

Step 4: Identify the Facts for the Dimensional Table

This is the final step in which we have to decide which facts (measurements) must be included in the fact table, but before that, let’s discuss what a fact table is.

Fact Table

The term fact represents a business measure; therefore, a fact table in dimensional modeling stores the performance measurements resulting from a business process. These performance measurements measure the business, i.e., these are the metrics through which we can infer whether our business is in profit or loss. Different business measurements can be unit price, number of goods sold, etc. Each row in a fact table is a business event that results in measurements, and each fact table represents a business process in the organization. Now the event depends upon the grain we select.
The selection of grain plays a vital role in the success of our dimensional model as it helps in selecting the measurements in the fact table to which further dimension tables are joined.
Since we have chosen three business processes, we will have three fact tables, but sometimes we get confused about whether an attribute should be added to the fact table or dimension table. To avoid that confusion, we will be using the following points to identify whether an attribute is a fact or dimension:

Textual data is generally stored in dimension tables where, whereas numeric data is generally stored in the fact table.
Continuous valued numeric values are stored in the fact tables, whereas discrete numeric values are stored in the dimension table.
The values that constantly change are kept in the fact table, whereas values that remain static or change very less with time are kept in the dimensional table.

Fact Table 1:

Grain: Individual product of the order per row.
So we will select the measurements corresponding to this grain. When we check out, the measurements that come are unit price, quantity, ordered, discount, etc., so we have added these measurements.
In this way, we add our measurements to a fact table. Weight also could have been added as a measurement, but its value remains constant therefore, we will keep it in the dimension table.
Fact Table 2:

Grain: Delivery Status of individual products in the order

This fact table will tell us the delivery status of the product, i.e., the location of the product it got delivered, etc.
Fact Table 3:

Grain: Daily inventory for each product in each store.
So through this fact table, we will track the stock of the different products. This is how we create our dimensional model by following the above steps.
Our final model will look like this:

There are three types of fact tables:

Transaction Fact Tables: It records a row in the table whenever there is a transaction. Here the transaction is the grain itself. The first fact table in our model is a transaction fact table in which different transactions between customers and companies are depicted.
Periodic Fact Tables: These record a row in the table for a definite period of time. That period of time can be daily, monthly, etc. Here the grain is the period of time we select. Example – The inventory fact table that we have taken is an example of a periodic fact table in which we will be calculating the stock of the products on a daily basis.
Accumulating Fact Tables: It stores predictable steps between the process’s beginning and end. Whenever a predictable step is recorded, the table is revisited and updated. Updation of a row is unique to this type of fact table only as compared to the other types of fact tables. Example: The order’s delivery status is an example of the accumulating fact table in which we have to update the product’s location whenever it reaches the desired location until it is delivered.
Factless Fact Tables: It records a row in the table with no numeric measurement, but that grain is important to store. This table is also used to record data that didn’t happen. Example: A table showing the products on sale which were not sold.

Some important points regarding Fact Tables:

A fact table generally contains numeric and additive facts. E.g., the number of products sold, the cost of each product, etc
Facts can also be semi-additive or non-additive.
Generally, textual data is not stored in a fact table; if required, it can be stored.
Continuous numeric values are stored in the fact table. E.g., The cost of products can take value in a broad range.
We should not store redundant textual information in fact tables unless the text is unique for every row in the fact table.
We should avoid using 0 for no activity in the fact table.
Fact tables have more rows and less number of columns.
Fact tables have two or more foreign keys that connect to the dimension table’s primary keys.
It answers the question, “What is the business process measuring.”

Conclusion

I hope through this article; you got a basic understanding of Dimensional modeling. This article shows how fact and dimension tables are created and the steps we follow to implement a dimensional model successfully.

Key Takeaways

Data Warehouse is a central repository used to store data that data analysts and data engineers can use for analysis purposes.
Dimensional modeling is a technique used to store normalized forms of data.
Dimensional modeling consists of creating Fact and Dimension tables.

Tanmay

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

Naive Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices

Understanding Dimensional Modeling

Introduction

Table of Contents

Data Warehousing

Difference Between Normalized and Denormalized Form

Dimensional Modeling

Implementation of Dimensional Modeling in Real World

Four Steps of Designing a Dimensional Model

Step 1: Select the Business Process

Step 2: Decide the Grain of each Business Process

Step 3: Identify the Dimensions for the Dimensional Table

Step 4: Identify the Facts for the Dimensional Table

Conclusion

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)