Understanding Hashing in DBMS: Benefits and Techniques

📙 Definition

Hashing in DBMS is a technique used to improve the efficiency of database operations by providing fast access to data. Hashing involves mapping a large data set to a smaller, fixed-size data set using a hash function. This hash function takes an input (such as a record key) and produces a hash value, which is used to index and locate the corresponding record in the database.

Hashing can significantly improve search, insertion, and deletion operations in databases, as it allows for constant-time access to records. Hashing is commonly used in database indexing, where it is used to improve the speed of searches and queries. Some popular hash functions used in DBMS include the division method, multiplication method, and folding method.

📙 Terms used in Hashing

Here are some important key terms in hashing:

Hash function: A hash function is a mathematical function that takes an input (such as a key) and produces a fixed-size output (the hash value). The hash function should be fast and efficient, and should distribute the hash values uniformly across the available hash table slots.
Hash table: A hash table is a data structure that stores the hash values and pointers to the corresponding data records in the database. A hash table typically consists of an array of slots, where each slot stores a linked list of data records with the same hash value.
Hash collision: A hash collision occurs when two different keys produce the same hash value. Hash collisions can lead to performance degradation and should be minimized using collision resolution techniques.
Collision resolution: Collision resolution techniques are used to resolve hash collisions and allocate the same hash table slot to multiple data records with the same hash value. Some common collision resolution techniques include chaining (using linked lists to store data records with the same hash value), open addressing (probing other slots until an empty slot is found), and rehashing (generating a new hash function to redistribute the data).
Load factor: The load factor is the ratio of the number of data records in the hash table to the number of hash table slots. A high load factor can lead to more collisions and degraded performance, while a low load factor can lead to wasted space in the hash table.
Probing: Probing is a technique used in open addressing collision resolution to search for an empty hash table slot by checking other slots in the hash table. Some common probing techniques include linear probing, quadratic probing, and double hashing.
Perfect hashing: Perfect hashing is a technique used to eliminate collisions completely by generating a hash function that maps each key to a unique hash value. Perfect hashing is commonly used for small, static data sets where the keys are known in advance.

📙 Types of Hashing

Hashing is a technique used in database management systems (DBMS) to efficiently locate and retrieve data from a large collection of records. Hashing involves transforming a search key into an address using a hash function. There are several types of hashing techniques in DBMS, including static hashing, dynamic hashing, linear hashing, and extendible hashing.

♦ Static Hashing

In a database management system (DBMS), hashing is a popular technique used for indexing and retrieving records quickly. Hashing involves generating a unique value or key for each record and then using that key to locate the record in the database.

There are two main types of hashing techniques in DBMS: static hashing and dynamic hashing. Let's take a closer look at each of these techniques and their subtypes.

Static hashing, also known as closed hashing, is a technique where the number of buckets or slots used for hashing is predetermined and fixed. When a new record needs to be inserted into the database, the hash function generates a key for the record, which is then used to determine the bucket where the record should be stored.

✔ What is static hash function ?

In a database management system, a hash function is used to map keys to specific locations in a hash table. A static hash function is a hash function that is fixed and does not change during the lifetime of the hash table. Here are some characteristics of static hash functions:

Fixed hash table size: Static hash functions require a fixed size for the hash table, which is determined at the time of creation. Once the size is set, it cannot be changed without creating a new hash table.
Key distribution: Static hash functions are designed to distribute the keys evenly across the hash table to avoid collisions, which occur when multiple keys are mapped to the same location.
Deterministic: Static hash functions are deterministic, meaning that given the same key and hash table, the function will always return the same location in the table.
Fast: Static hash functions are usually designed to be fast and efficient, as they are often used in real-time applications where quick access to data is important.
Limited flexibility: Static hash functions have limited flexibility in handling changes to the data being stored, as the size of the hash table and the hash function cannot be easily changed.
Examples: Some examples of static hash functions include division method hashing, multiplication method hashing, and folding method hashing.

Overall, static hash functions can be a good choice for certain types of applications where the data is relatively stable and the hash table size can be determined in advance. However, they may not be as suitable for applications where the data changes frequently or where the number of keys to be stored is not known in advance.

There are two types of static hashing:

a) Direct Addressing

Direct addressing is a simple technique where the key of the record is used as an index into an array of buckets. The record is then stored in the bucket corresponding to its key. For example, if we have a database of student records with student IDs as keys, we can use direct addressing to quickly retrieve the record for a given student ID.

b) Division Hashing

Division hashing is a technique where the key of the record is divided by the number of buckets, and the remainder is used as the index into the array of buckets. For example, if we have a database of employee records with employee IDs as keys, and we have 100 buckets, we can use division hashing to quickly retrieve the record for a given employee ID.

♦ Dynamic Hashing

Dynamic hashing, also known as open hashing, is a technique where the number of buckets used for hashing can change dynamically based on the number of records in the database. When a new record needs to be inserted into the database, the hash function generates a key for the record, which is then used to determine the bucket where the record should be stored.

There are two types of dynamic hashing:

a) Extendible Hashing

Extendible hashing is a technique where the hash function generates a binary key for the record, and the first few bits of the key

b) Linear Hashing

Linear hashing is a type of dynamic hashing that uses a series of hash tables to avoid the need for rehashing when the number of records changes. Each hash table consists of a fixed number of buckets, and each bucket contains a linked list of records. When a new record is inserted, the hash function maps the

✔ Open Hashing and Closed Hashing

Open and closed hashing are both subtypes of dynamic hashing, which is a technique used in database management systems to handle large amounts of data that can grow or shrink over time.

Difference between open and closed hashing are -

Dynamic hashing allows for the hash table size to be adjusted as needed, rather than having a fixed size like in static hashing. Open and closed hashing are two different ways of resolving collisions that can occur when two keys hash to the same location in the hash table.

📙 Ordered Indexing vs Hashing

Here are the differences between ordered indexing and hashing -

📙 Collision in hashing

In a database management system that uses hashing, a collision occurs when two or more keys are hashed to the same location in the hash table. Collisions are a common occurrence in hashing, especially when the hash table size is relatively small or the number of keys to be stored is large.

✔ How to handle collision?

There are several techniques for dealing with collisions in hashing:

Separate chaining (open hashing): When a collision occurs, the new key-value pair is added to a linked list that is stored in the same slot in the hash table as the original key. This technique allows multiple keys to be stored in the same location in the hash table without causing a collision.
Open addressing (closed hashing): When a collision occurs, the new key is stored in the next available slot in the hash table. This technique can involve probing to find an empty slot, which can be done using methods such as linear probing, quadratic probing, or double hashing.
Rehashing: If the number of collisions becomes too high, the hash function can be modified to distribute the keys more evenly across the hash table. This process is known as rehashing.
Resizing the hash table: If the number of collisions cannot be reduced through rehashing, the hash table size can be increased to accommodate more keys. This can involve creating a new, larger hash table and rehashing all of the keys from the old table into the new table.

Overall, the technique used for dealing with collisions in hashing depends on the specific requirements of the database system and the characteristics of the data being stored.

Thanks for reading, and happy coding!

Understanding Hashing in DBMS: Benefits and Techniques -> Mastering SQL: Essential Commands for Database Management

Coding Age

Coding is the new literacy

Understanding Hashing in DBMS: Benefits and Techniques

Recent Posts

Book your class today