One of the most important assets of any organization is the data it produces on a daily basis. This data is used by an organization to find valuable insights which help in improving an organization’s growth and strategies and give them an upper hand over its competitors. This article explains to you the idea of Dimensional Modeling as part of Data Warehousing with different steps involved.
Learning Objective
But this data, both current and historical, can only be utilized to its full potential when it is easily accessible and available. So now the question arises of how we can store this data in such a way that it fulfills the above requirements.
Here comes the concept of Data Warehousing.
Data warehousing is a technique in which information is stored in a central repository that can be used by business analysts, data engineers, data scientists, and decision-makers through business intelligence (BI) tools to make decisive decisions.
It is used to handle large amounts of data to understand relationships and trends across the data.
Advantages of Data Warehousing –
Data in Data Warehouse is stored in a tabular format which can be normalized or denormalized.
In normalized form, data is stored in multiple tables, reducing data redundancy and inconsistency, thus achieving data integrity. In the denormalized form, data is stored in a limited number of tables (maybe a single table) to reduce querying time.
Both of them contain joined tables, but the key difference between them is the degree of normalization. As the degree of normalization increases, the complexity of the model increases, and as the complexity of the model increases, the time to retrieve data also increases.
The data model used to store data in the denormalized form is called Dimensional Modeling. It is the technique of storing data in a Data Warehouse in such a way that enables fast query performance and easy access to its business users. It involves creating a set of dimensional tables that are designed to support business intelligence and reporting needs.
The goal of dimensional modeling is to provide a simple and intuitive way to access and analyze data, making it easy for business users to understand and use it. It aims at making simple data models. When the data models are as simple as possible, they can be understood easily, allowing the software to navigate and deliver results quickly and efficiently.
The core concept of dimensional modeling is the creation of a star schema. It is called so as the tables are arranged in the form of a star. Dimensional modeling includes facts and dimensions. Let’s have a basic idea of what Facts and Dimensions are.
Fact tables contain measures or numerical data associated with a business process, like the number of products sold. In contrast, dimensional tables store the description or textual information related to the business process, like who bought the products. We will discuss facts and dimensions in detail later in this article.
A dimensional model represents the different business processes of an organization. A fact table with its dimension table is a single business process.
Each dimensional model consists of many fact tables, with each fact table joined with corresponding dimension tables. A fact table is connected to another fact table via a common dimensional table between them; this common dimensional table is called a bridge table. We can even connect a fact table with a fact table directly, but it is not a wise option as it makes the model complex and difficult to understand.
Source: Javatpoint
A dimensional table is connected to the fact table using the foreign key in the fact table. The dimensional table is the parent table, and the fact table is the child table.
A dimensional model contains the same information as a normalized model. Still, the data in a dimensional model is packed in such a way that delivers user understandability, query performance, and resilience to change.
Dimensional Modeling can be best understood by implementing it in real-world cases. We will pick up the E-Commerce industry like Myntra, Flipkart, Amazon, etc., as it is familiar to everyone. We will be creating a dimensional model depicting different business processes that take place in an E-Commerce industry.
The first step involves selecting the business process, and it should be an action resulting in output.
Business Process #1:The e-Commere industry is widely known for selling and buying goods over the internet, so our first business process will be the products bought by the customers.
Business Process #2: Delivery status is also one of the most important business processes in this industry. It tells us where the product is currently from. It’s dispatched from the warehouse to the customer’s given address.
Business Process #3: Maintaining the inventory in order to ensure that items don’t run out of stock, how sales are going on etc.
A grain is a business process at a specified level. It tells us what exactly a row, in fact, a table, represents. All the rows in a fact table should result from the same grain. Each fact table is the result of a different grain selected in a business process. The grain should be as granular (at the lowest level) as possible.
Grains for the above business processes are
Grain 1: We can have the grain as the products purchased by the customer, i.e., each row of the fact table will represent all the products checked out by the customer from the cart but suppose a customer ordered 100 products, so this will be represented as a single row. Imagine how complex it will become to query such data, so we must choose a grain as granular as possible. Therefore, our grain will be an individual product ordered by a customer, i.e., one product per row. This will make the data simple and easy to query.
Similarly, we will select the most granular grains for the remaining processes.
Grain 2: Here also, the grain will be the status of an individual product shipped from the warehouse to the delivery location.
Grain 3: Here, each row will represent the daily inventory for each product in each store., it will tell the stock of that product left in the inventory and how many products have already been sold.
Before identifying the dimensions we will understand what a dimensional table is.
Dimensional Tables
These are the tables that are joined to fact tables. It describes the “who, what, where, when, how, and why” associated with the business event. It contains the descriptive attributes used for grouping and filtering the facts.
Some important points regarding Dimension Tables:
Dimensions describe the measurements of the fact table. For example, customer id is a measurement, but we can describe its attributes further, more as what is the name of the customer, the address of the customer, gender, etc.
Our dimensional model will have the following dimensions:
Date Dimension: This dimension table is used in almost every dimensional model as it helps monitor the business’s performance with time.
Product Dimension: This table will contain information regarding the product ordered.
Order Dimension: This detail will contain information regarding the order.
Customer Dimension: This dimension table will contain the customer’s information
.
Promotion Dimension: This table covers the promotion condition under which the product was sold. The promotion conditions include temporary sales, reduction in price, discounts, etc.
Warehouse Dimension: This table will contain information about the different warehouses located across the country.
This is the final step in which we have to decide which facts (measurements) must be included in the fact table, but before that, let’s discuss what a fact table is.
Fact Table
The term fact represents a business measure; therefore, a fact table in dimensional modeling stores the performance measurements resulting from a business process. These performance measurements measure the business, i.e., these are the metrics through which we can infer whether our business is in profit or loss. Different business measurements can be unit price, number of goods sold, etc. Each row in a fact table is a business event that results in measurements, and each fact table represents a business process in the organization. Now the event depends upon the grain we select.
The selection of grain plays a vital role in the success of our dimensional model as it helps in selecting the measurements in the fact table to which further dimension tables are joined.
Since we have chosen three business processes, we will have three fact tables, but sometimes we get confused about whether an attribute should be added to the fact table or dimension table. To avoid that confusion, we will be using the following points to identify whether an attribute is a fact or dimension:
Fact Table 1:
Grain: Individual product of the order per row.
So we will select the measurements corresponding to this grain. When we check out, the measurements that come are unit price, quantity, ordered, discount, etc., so we have added these measurements.
In this way, we add our measurements to a fact table. Weight also could have been added as a measurement, but its value remains constant therefore, we will keep it in the dimension table.
Fact Table 2:
Grain: Delivery Status of individual products in the order
This fact table will tell us the delivery status of the product, i.e., the location of the product it got delivered, etc.
Fact Table 3:
Grain: Daily inventory for each product in each store.
So through this fact table, we will track the stock of the different products. This is how we create our dimensional model by following the above steps.
Our final model will look like this:
There are three types of fact tables:
Some important points regarding Fact Tables:
I hope through this article; you got a basic understanding of Dimensional modeling. This article shows how fact and dimension tables are created and the steps we follow to implement a dimensional model successfully.
Key Takeaways