Organizing a vast dataset into an easily accessible structure is crucial. The Star schema provides an effective approach, comprising two key elements: fact table and dimension tables. Dimension tables encircle a central fact table, creating the star or snowflake shape. Clear distinctions between these components, especially in the context of fact table vs dimension table, require a comprehensive grasp of their individual concepts and roles, ensuring a solid understanding of their specific purposes. Let’s explore the difference between these two data models now!
The fact table is a comprehensive collection of attributes derived from the dimension table. It contains quantitative data where dimension table attributes influence the values. This table holds denormalized data for analysis. Comprising two columns, one column stores foreign keys, while the other stores corresponding data or values. The foreign key column is linked with dimension table attributes, while the second column contains numerical values. Vertical expansion in the fact table is more pronounced, reflecting a growing number of records, as opposed to horizontal expansion with fewer attributes.
The dimension table is the significant part of the Star schema that provides measurable dimensional modeling for fact tables. The table witnesses horizontal expansion with thousands of rows with non-frequent updates. It is curated hierarchically and comprises quantitative data. The dimension table also contains two keys, primary and surrogate keys. The primary key concerns the presence of a unique identity in each row or record. The system-generated surrogate key is associated with row identification in the table.
Parameters | Fact Table | Dimension Table |
---|---|---|
Basic | Contains quantitative data concerning business events | Provides descriptive context and attributes for the data in the fact table |
Sequence of creation | Made after dimension table | Created first |
Components | Facts, metrics and measurements | Descriptive attributes |
Quantity of components | Fewer attributes and more records | Fewer records and more attributes |
Marked by | Grain or atomic level | Words, completeness, level of detail |
Hierarchy | Absent | Present |
Location in Star schema | Middle | Edges |
Purpose | Analysis and decision making | Data and process storage |
Growth | Vertical | Horizontal |
The characteristics of the fact table and dimension table are as follows:
Characteristics of Fact Table:
Characteristics of Dimension Table:
There are different types of facts and dimensions. Additionally, there are different types of fact table and dimension table.
They capture the individual business or transaction events as soon as they occur. Order updates and discrete actions in sales operations is an example of this. Such tables also have a high level of detail and grow significantly in size. The usage of these tables is suitable for data analysis at a granular level. However, they may require complex queries for performance.
These tables store aggregated data at predefined intervals. These tables are preferred for storing the summarized data of specific times and periods. It is of importance when required to analyze trends and changes over time without affecting the individual transactions.
These tables are significant for tracking the progress of a process or workflow over time. Holding more superficial information, it covers the milestones or stages of different processes. In such tables, rows represent the progression of the entity through various stages, which further helps in easy tracking and analysis of the process. The application is seen in project management, order fulfillment and production cycles.
These tables lack measurable numerical facts or data. They are used to capture relationships between dimensions crucial for contextual analysis. The application is seen in tracking occurrences, events or associations significant to the business process. Users use these for insights into patterns and trends.
The type of dimension table varies with the dimension. Here, there are common types of both as enlisted:
Here, the dimensions witness slow changes rather than periodic changes. Change in tables can be done in three ways, type 1, type 2 and type 3.
Here, the dimension or attribute is stored in the fact table rather than in a separate dimension table. For example, invoice or transaction numbers are degenerate dimensions.
The junk dimension table comprises a mixture of unrelated and different attributes. Thus the table is less complex and is often suited for rapidly changing dimensions.
The table comprises numerous valid relationships with the fact table. It involves the usage of specific attributes for different and multiple attributes. For instance, the Date dimension will be used for both ‘order date’ and ‘shipment date’.
The attribute is used in multiple locations in fact tables in the data warehouse. It maintains consistency and avoids discrepancy.
Let us understand the fact and dimension table with certain examples. Suppose there is production of noodles in an industry. Fact table involves organized data on food processing, storage and delivery information. The fact table can be structured like:
Serial Number | Scheduled Measures |
---|---|
1 | Processing_Data |
2 | Packaging_Data |
3 | Storage_Data |
4 | Delivery_Data |
The fact table organized a structure of the scheduled measures of the overall processing of the product. The dimension table will be made using the fact table like the following.
Detailed information about the processing will be reported in the table.
Processing Material | Supplier | Date of Expiry |
---|---|---|
White_flour | Supplier A | xx-xx-yy |
Salt | Supplier B | xx-xx-yy |
Oil | Supplier C | xx-xx-yy |
Packaging involved structured information specifically of the packaging.
Packaging Material ID | Supplier | Material Type | Unit Price |
---|---|---|---|
111 | Supplier K | Plastic | $10 |
112 | Supplier L | Cardboard | $9.5 |
113 | Supplier M | Wrapping_tissue | $7.2 |
Storage information is prepared via a dimension table, where detailed analyses have been done on each product stored.
Product Number | Supervision | Hall no. | Shelf no. |
---|---|---|---|
51 | Officer A | B1 | B-B2 |
52 | Officer B | C2 | C-H13 |
53 | Officer C | H1 | H-Y2 |
The dimension table represents organised observation on delivery data.
Trunk code | Supervision | Delivered City | Date of Delivery |
---|---|---|---|
AA2 | Agent X | Atlanta | xx-xx-yy |
AA53 | Agent Y | Albany | xx-xx-yy |
BC1 | Agent Z | Homerville | xx-xx-yy |
The limitations of fact and dimension tables are enlisted individually as follows:
There are certain limitations in the dimensions tables
In a data warehouse, fact tables and dimension tables work together to store and analyze business data. They form the foundation of a data warehouse schema, typically designed in a star schema or snowflake schema.
Fact tables store quantitative data, also known as measures or facts. This is the numerical data you use to analyze trends, comparisons, and other metrics. Examples of facts include sales figures, website clicks, or inventory levels. Fact tables tend to have fewer attributes but many rows, with each row representing a single event or transaction.
Dimension tables provide context and descriptive details about the facts in the fact table. They contain non-numerical attributes that categorize and define the measures. Think of them as providing the “who, what, when, where, why, and how” behind the numbers in the fact table. Examples of dimensions include customer information (customer dimension), product details (product dimension), dates, or locations. Dimension tables tend to have more attributes but fewer rows.
When designing a data warehouse, it’s crucial to consider various aspects like Power BI integration for visualizations, the structure of your data mart, and potential interview questions about your design choices. Effective design helps ensure your data tables are optimized for performance and easy to query.
In conclusion, understanding the roles of dimension and fact tables within data warehousing and business intelligence systems is paramount for effective data analysis. With their widespread applications, accurate differentiation is crucial for seamless operations. Fact tables hold numerical data, whereas dimension tables provide descriptive context to the information stored in fact tables. When tackling queries related to ‘what’ and ‘how much,’ refer to the fact table. For insights into ‘who,’ ‘where,’ ‘when,’ and ‘why,’ turn to the dimension table. This foundational understanding enables optimal utilization of these tables for enhanced decision-making and analytical capabilities.So, with this tutorial hope you understand the difference between fact table and dimension table.
Ready to deepen your understanding of data analysis? Take your skills to the next level with Analytics Vidhya’s Blackbelt program and excel in the world of data-driven insights.
A. Fact table stores quantitative data representing business events or transactions. The dimension table contains descriptive attributes for the fact table data.
A. Consider a sales fact table of a business. The components of the fact table will be order data, product ID, customer ID and sales amount with vertical enlistments of multiple details
A. Dimension tables are designed before fact tables as they provide the context and description for the fact tables.
A. The fact tables are of four types, transaction fact, periodic snapshot fact, accumulating snapshot fact and factless fact tables.
Fact tables are larger than dimension tables due to their distinct roles in storing and organizing data. Fact tables contain numerical data and may include multiple measures for each dimension combination, while dimension tables hold descriptive attributes. Additionally, fact tables may be denormalized to enhance query performance, further increasing their size.