This article was published as a part of the Data Science Blogathon.
Introduction
NoSQL databases allow us to store vast amounts of data and access them anytime, from any location and device. However, deciding which data modelling technique best suits your needs is complex. Fortunately, there is a data modelling technique for every use case.
This tutorial will cover all the different NoSQL data modelling techniques you can use when building a NoSQL database.
Understanding of NoSQL Data Modelling Techniques
NoSQL or ‘Not Only SQL’ is a data model that differs significantly from traditional
SQL expectations. The primary difference is that NoSQL does not use relational data modelling techniques and emphasises flexible design. The absence of schematic requirements makes designing a much simpler and cheaper process. This doesn’t mean that you can’t use the schema completely, but rather that the schema design is very flexible.
Another helpful feature of NoSQL data models is that they are designed for high efficiency and speed in making up millions of queries per second. This is achieved by having all the data in one table so that JOINS and cross-references are not as performance intensive. SQL is only vertically scalable, but on the other hand, NoSQL is both vertically & horizontally scalable. In addition, with NoSQL, you can use another shard, which is cheap, rather than buying additional hardware, which isn’t.
Note: Horizontal scalability allows for clusters of quickly provisioned infrastructure such as Bare Metal Cloud to host a complex NoSQL database for various enterprise use cases.
Four Types of NoSQL Databases
There are four different types of NoSQL databases on which dozens of data models are based:
Key-value store
Built specifically for high-performance requirements and probably one of the most common data models, key-value stores use key-values with pointers to store data.
Image Source: creware.asia
This pointer refers directly to a specific piece of information, which can be anything you want. You can even use an empty string as the value key if you wish to, although there are upper limits to how big the value can be depending on the database.
Interestingly, Amazon initially helped get this data model off the ground and used it for DynamoDB. Since they are one of the largest online marketplaces in the world, you can see how powerful this data model can be.
Document-based Store
XML and JSON tend to be tied to SQL, which slows down queries and the whole process. However, because NoSQL doesn’t use a relational model, it doesn’t have to, which is where document-based stores come in. All data is stored in one table, so there is no need for cross-referencing, and instead of storing information in a table, it is stored in a document. While it is very similar to a key-value store and can sometimes be considered an umbrella for it, the difference is that document-based NoSQL generally has some form of encoding, such as XML.
Image Source: creware.asia
is an XML-specific NoSQL database that uses a document store. In addition, Strider CD uses MongoDB as a backend store.
Column based store
This data model stores information in columns rather than rows, which is more common with SQL. Data is stored in columns that are grouped into families, and these families are further grouped into more columns. This essentially creates an almost unlimited column nesting data model.
Row-oriented vs column-oriented database types.
Image Source: creware.asia
advantage is that it offers incredibly high speeds compared to other models or NoSQL when it comes to searches. In addition, the data is treated as one continuous record, so there is no need to jump across rows or different areas where the information is stored.
Graph-based trading
Graph or network data models consider the relationship between two pieces of information to be as meaningful as the information itself. As such, this data model is really made for any information you would typically represent in a chart. It uses relationships and nodes, where the data is the information itself, and the connection is created between the nodes. Graph NoSQL database
Image Source: creware.asia
How is data stored in NoSQL?
NoSQL data storage depends on what type of database you are using. Because NoSQL does not require a schema, there is no blueprint for how data should be stored and therefore varies between databases. There are generally two ways a NoSQL data store works: On-disk using B-trees, with the upper part persistently in RAM.
In-memory where everything is in RAM using RB-Trees and anything stored on disk is just an attachment.
Schema design for NoSQL
Since NoSQL databases don’t have a set structure, schema development and design usually focus on the physical data model. That means developing for large, horizontally-spanning environments, which is where NoSQL excels. Therefore, the specific peculiarities and problems brought about by scalability are in the foreground.
So the first step is to define the business requirements because optimising access to data is a must and can only be achieved if we know what the business wants to do with the data. Schema design should complement the workflows associated with your use case. There are several ways to choose a primary key, ultimately depending on the users themselves. That being said, some data may indicate a more efficient scheme, especially regarding how often that data is queried.
Note: NoSQL is flexible enough to keep changing. So if you find that the first scheme you designed needs some tweaks to improve user accessibility and provide additional optimisation, that’s not a problem.
NoSQL Data Modelling Techniques
Conceptual Techniques
There are three conceptual techniques for modelling NoSQL data:
- Denormalisation. Denormalisation is a common technique that involves copying data into multiple tables or forms to simplify it. Use denormalisation to easily group all the data you need to query in one place. Unfortunately, this means that the data volume increases for various parameters, considerably increasing the data volume.
- Aggregates. This allows users to create nested entities with complex internal structures and change their specific systems. Ultimately, aggregation limits connections by minimising one-to-one relationships. Most NoSQL data models have some form of this soft schema technique. For example, graph and key-value store databases have values in any format because these data models place no restrictions on the matter.
- Application Side Joins. Since NoSQL databases are question-oriented and join are performed during design time, NoSQL often does not enable joins. Compared to relational databases, this is done when the query is executed. Naturally, this frequently entails a performance penalty and is sometimes unavoidable
General Modelling Techniques
There are five general techniques for modelling NoSQL data:
• Enumerable keys. For the most part, unordered fundamental values are instrumental because items can be distributed across multiple dedicated servers just by hashing the key. Adding some functionality using ordered keys is helpful, although it may add a bit of complexity and a performance hit.
• Dimension reduction. GIS tend to use R-Tree indexes and needs to be updated in place, which can be expensive when dealing with large volumes of data. Another traditional approach is to flatten the 2D structure into a simple list, like what is done with Geohash.
You can use dimensionality reduction to map multidimensional data to simple key-value models or even multifaceted models.
Use dimensionality reduction to map multidimensional data to a key-value model or another non-multidimensional model.
• Index table. With an index table, please take advantage of indexes in stores that don’t necessarily support them internally. Try to create and maintain a unique table with keys following a specific access pattern. For example, a master table to store user accounts for access by user ID.
• Composite key index. Although a general technique, composite keys are handy when using ordered keys. If you take that and combine it with secondary keys, you can create a multidimensional index that is very similar to the dimensionality reduction technique above.
• Inverted Lookup – Direct Aggregation. The concept behind this technique is to use an index that meets a specific set of criteria but then aggregates that data with full scans or some form of the original representation.
This is a data processing model rather than data modelling, yet this pattern certainly influences data models. Note that the random search for records required for this technique is inefficient. Use batch query processing to mitigate this problem.
Hierarchical Modelling Techniques
Image Source: creware.asia
There are seven hierarchy modelling techniques for NoSQL data:
• Tree aggregation. Tree aggregation is essentially modelling data as a single document. This can be effective for any record that is always accessible at once, such as a Twitter thread or a Reddit post. The problem then, of course, is that random access to any single record is inefficient.
• Neighborhood lists. This is a direct technique where nodes are modelled as independent field records with direct ancestors. That’s a fancy way of saying it lets you search for nodes by their parents or children. However, like tree aggregation, it is relatively inefficient for retrieving the entire subtree for any given node.
• Materialized paths. This technique is a denormalisation type used to avoid recursive traversal in tree structures. We primarily want to assign parents or children to each node, which helps us determine the possible ancestors or descendants of the node without worrying about traversal. We can store materialised paths as IDs, either as a set or as a single string.
• Nested sets. A standard technique for tree structures in relational databases applies equally to NoSQL and key-value or document databases. The idea is to store the tree leaves as an array and then map each non-leaf node to a range of leaves using start/end indices.
Modelling this way is efficient for dealing with immutable data because it requires only a tiny amount of memory and doesn’t necessarily use traversal. That said, updates are expensive because they need index updates.
• Merging nested documents: Numeric field names. Most search engines tend to work with records that are a flat list of fields and values rather than something with a complex internal structure. As such, this data modelling technique attempts to map these complex structures onto a simple document, such as mapping documents with a hierarchical structure, which is a common problem you may encounter.
Of course, this work is painful and not easily scalable, especially as the nested structures grow.
• Merging nested documents: Proximity queries. One way to solve potential problems with the numbered field name data modelling technique is to use a similar process called Proximity Queries. These limit the distance between words in a document, which helps increase performance and reduce the impact on query speed.
• Batch processing of graphs. Batch graph processing is an excellent technique for exploring up or down relationships for a node in multiple steps. However, it’s an expensive process and doesn’t necessarily scale well. We can use Message Passing and MapReduce to do this type of graph processing.
Conclusion
NoSQL data modelling techniques are instrumental, especially since many programmers are not necessarily familiar with the flexibility of NoSQL. Furthermore, the specifics differ because NoSQL is not a singular language like SQL but rather a set of database management philosophies. As a result, data modelling techniques vary significantly from database to database.
But don’t let that put you off; learning NoSQL data modelling techniques is very useful, especially when designing a schema for a DBM that doesn’t require any. Even more important is learning to take advantage of the flexibility of NoSQL. You don’t have to worry about schema design details as much as you do with SQL.
Let’s recap what we have learned:
- NoSQL or ‘Not Only SQL’ is a data model that differs significantly from traditional SQL expectations. The primary difference is that NoSQL does not use relational data modelling techniques and emphasizes flexible design.
- There are four different types of NoSQL databases: – Key-Value Store, Document based store, Column Based Store and Graph-Based Store
- All NoSQL data modelling techniques are grouped into three main groups: Conceptual techniques, General modelling techniques, Hierarchical modelling techniques.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Data Analyst who love to drive insights by visualizing the data and extracting the knowledge from it. Automating various tasks using python & builds Real time Dashboard's using tech like React and node.js. Capable of Creaking complex SQL queries to fetch the accurate data.