This article was published as a part of the Data Science Blogathon.
Hashing is the process of mapping keys and values into a hash table by using a hash function. It makes elements more accessible faster. The efficiency of the hash function determines how well it can handle the mapping.
When you have 20000 numbers in a list and you are asked to search for a number from within the list, you will scan each number until you find a match.
The process of searching an entire list for that specific number takes a lot of time. Not only is this manual process of scanning time-consuming, but it is also inefficient. The data structure reduces the search time and allows you to find the number within seconds due to its hashing function.
Key – An Identifier to uniquely identify the Data(Entity).
Value – The Data Entity (with its associated details) that we are storing.
Hashing works in two steps:
The hashing algorithm is designed to solve the problem of efficiently finding and storing data within an array.
For example: A student is an entity. Each student can have a name, address, phone number, unique ID (Roll number).
To make it a key-value pair we take the roll number as the key and the pair will be rest attributes like address, name, phone number, etc. This way we created a mapping of students with roll number as key and basic information as value. Now if we want to know a student’s details we can just know his/her roll number.
You can think of it as an object in OOPS terms, or as a set of properties that all belong to a specific thing.
As shown above, Student data can include information like his Roll Number, his name, his address, and so on.
We can think of all these details as “data” we want to store. The Roll Number is the key since it is unique to each student.
The idea of hashing is used in the world of Data Structures to find the Memory address or Index in which to store this Key-Value pair in a given data structure. We can’t place these data at either 100 or any Student detail if Key=100 and Value=certain Student detail, for example.
As a result, we use the hash function which is able to take a key as input and return a memory address(or an index inside the Data structure) to place a key-value pair at.
Later in this section, we’ll examine how this works internally in detail.
Key-value pairs are stored in hash tables in data structures using hashing.
When storing some data (e.g. Value) we can use a Hash Table as long as we have a unique key.
Basically, the hash function is a mathematical formula that will return a small integer value (within an array size) for certain big keys.
The following are three methods of how this method works internally:
1) Division Method – Among all the methods, this is the easiest to understand. Consider the scenario where there are only 10 cells in an array, and the Key is 15, or greater than 10. If we don’t have a method that takes 15 and returns 10, we need a method that returns an index < 10. This can be done pretty easily by modulus operators.
2) Method for dividing: HashFunction = Key % size
If Key is divided by Size, the remainder is returned by the Modulus Operator here. We can insert this Data at the 5th index of the array of size 10 if key=15. For example, 15%10=5 and 5*10, so if key=15, then we insert it at 5 in the array of size 10.
3) Multiplication Method – Works similarly to Division by using the following function:
h(k) = floor( n( kA mod 1 ) )
The key k will be the constant value between 0 and 1, and A will be any constant value between 0 and 1.
Both k and A are multiplied, and their fractional part is separated using the modulus operator. This is then multiplied with n to get the hash value.
k=123 n=100 A=0.618033 (Has to be between 0 and 1) h(123) = 100 (123 * 0.618033 mod 1) = 100 (76.018059 mod 1) = 100 (0.018059) = 1
The value K=123 can be found at index 1 if the Key is K.
Please note that there can be cases where even after folding, we get a number > Size e.g., Key=56571 then breaking it down into parts of 2= 56+57+1=114
Therefore, this Data cannot be placed at index 114 of the array of size 100, so we can either use the algorithm again or the Division method (most commonly used in such scenarios) to obtain 114%100=14.
Therefore, this Data should be placed at index 14 of this array.
Likewise, if the array size is 1000, then we can insert three-digit numbers, so key=20574 has the value 205+74=279.
In an array of Size 1000, Data with Key=20574 will be at the 279th Index.
In computing, hashing is the process of finding a small Number from a Data set. Using this number, you can determine where to store the data. If the database is stored in memory, this could be the index in an Array or a database location on the disk.
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.