
Understanding Normalization in DBMS: Maximizing Data Efficiency



📘 Definition of Normalization


Normalization is the process of organizing the data in a relational database so that redundancy and undesirable dependencies are reduced while data integrity and consistency are preserved. It involves breaking a large table into smaller, more manageable tables and defining relationships between them based on their functional dependencies. The goal is to eliminate insertion, update, and deletion anomalies, avoid storing the same fact in several places, and simplify the management and querying of the data.
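To make "redundancy" and "anomaly" concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The table name orders_flat, its columns, and the sample rows are hypothetical and are not taken from this article. They only illustrate how a fact stored in several rows can drift out of sync.

```python
import sqlite3

# Hypothetical un-normalized table: the customer's city is repeated on every order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_flat ("
    "order_id INTEGER PRIMARY KEY, customer_name TEXT, customer_city TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Asha", "Pune"), (3, "Ravi", "Delhi")],
)

# Update anomaly: only one of Asha's rows is changed, the other copy is forgotten.
conn.execute("UPDATE orders_flat SET customer_city = 'Mumbai' WHERE order_id = 1")

cities = conn.execute(
    "SELECT DISTINCT customer_city FROM orders_flat "
    "WHERE customer_name = 'Asha' ORDER BY customer_city"
).fetchall()
print(cities)  # the same customer now appears with two different cities
```

Storing the city once, in a separate customers table, would make this kind of inconsistency impossible to create with a single-row update.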


📘 Why we need Normalization


Without normalization, the same fact ends up stored in many rows, and every copy has to be kept in sync by hand. Normalizing a schema has several benefits, including:

  1. Reduced Redundancy: Each fact is stored in one place, which saves storage and removes the need to keep duplicate copies consistent.

  2. Fewer Anomalies: Insertion, update, and deletion anomalies are avoided, because changing one fact means changing one row.

  3. Better Data Integrity: Relationships are expressed through primary and foreign keys, so the database itself can enforce consistency.

  4. Easier Maintenance: Smaller, focused tables are easier to understand, extend, and query correctly.

Overall, normalization is an essential design step for most relational databases, although a highly normalized schema can require more joins at query time.


📘 Types of Normalization


♦ First Normal Form (1NF)


A table is in 1NF when it has no repeating groups or arrays and every attribute holds a single atomic value. For example, let's consider a table of customer orders with repeating fields for the items ordered. We can normalize this table to 1NF by creating a separate table for the items and linking items to their respective orders through a table of foreign keys.

Original table:


Normalized tables:


Orders table:

Items table:

Order Items table:
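The split into Orders, Items, and Order Items can be sketched as follows. This is only an illustration: the column names and the sample rows (including the comma-separated items field that stands in for the repeating group) are assumptions, not the article's original tables.

```python
import sqlite3

# Hypothetical un-normalized rows: the third field holds a repeating group of items.
raw_orders = [
    (1, "Asha", "pen, notebook"),
    (2, "Ravi", "pen"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    item_id  INTEGER NOT NULL REFERENCES items(item_id),
    PRIMARY KEY (order_id, item_id)
);
""")

for order_id, customer, item_list in raw_orders:
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer))
    for name in (part.strip() for part in item_list.split(",")):
        # Each distinct item is stored once; repeats are ignored.
        conn.execute("INSERT OR IGNORE INTO items (name) VALUES (?)", (name,))
        (item_id,) = conn.execute(
            "SELECT item_id FROM items WHERE name = ?", (name,)
        ).fetchone()
        # One atomic (order, item) pair per row.
        conn.execute("INSERT INTO order_items VALUES (?, ?)", (order_id, item_id))

rows = conn.execute("""
    SELECT o.order_id, i.name
    FROM order_items oi
    JOIN orders o ON o.order_id = oi.order_id
    JOIN items i ON i.item_id = oi.item_id
    ORDER BY o.order_id, i.name
""").fetchall()
item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(rows, item_count)
```

Every cell now holds one value, and an order with any number of items needs no extra columns.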


♦ Second Normal Form (2NF)


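A table is in 2NF when it is in 1NF and every non-key attribute depends on the entire primary key, not just on part of it. This matters only when the primary key is composite. For example, let's consider a table of students and their course grades, identified by student and course together. Course details depend on the course alone, which is a partial dependency. We can normalize this table to 2NF by moving the course information into a separate table and linking it to the student grades table through a foreign key.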
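A small sketch of removing that partial dependency, in plain Python. The rows, the course codes, and the assumption that the key is (student_id, course_id) are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical flat rows: (student_id, course_id, course_name, grade).
# Assumed key: (student_id, course_id). course_name depends on course_id only.
flat = [
    (1, "C101", "Databases", "A"),
    (2, "C101", "Databases", "B"),
    (1, "C102", "Networks", "B"),
]

# Courses table: one row per course, keyed by course_id.
courses = {course_id: course_name for _, course_id, course_name, _ in flat}

# Grades table: only attributes that depend on the whole key remain.
grades = [(student_id, course_id, grade) for student_id, course_id, _, grade in flat]

# Joining the two tables on course_id rebuilds the original rows without loss.
rejoined = [(s, c, courses[c], g) for s, c, g in grades]
print(len(courses), rejoined == flat)
```

The course name "Databases" is now stored once instead of once per enrolled student.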



♦ Third Normal Form (3NF)


A table is in 3NF when it is in 2NF and every non-key attribute depends only on the primary key and not on any other non-key attribute. A dependency that passes through another non-key attribute is called a transitive dependency. For example, let's consider a table of employees and their department information. We can normalize this table to 3NF by moving the department information into a separate table and linking it to the employee table through a foreign key.


Original table:


Normalized tables:


Employees table:

Departments table:

Managers table:
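One way the three tables might look is sketched below with sqlite3. The column names, the sample rows, and the assumption that each department references exactly one manager are illustrative guesses, not the article's original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: each non-key attribute depends only on its own table's key.
CREATE TABLE managers (
    manager_id   INTEGER PRIMARY KEY,
    manager_name TEXT NOT NULL
);
CREATE TABLE departments (
    dept_id    INTEGER PRIMARY KEY,
    dept_name  TEXT NOT NULL,
    manager_id INTEGER REFERENCES managers(manager_id)
);
CREATE TABLE employees (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_id  INTEGER REFERENCES departments(dept_id)
);
INSERT INTO managers VALUES (1, 'Meera');
INSERT INTO departments VALUES (10, 'Sales', 1);
INSERT INTO employees VALUES (100, 'Asha', 10), (101, 'Ravi', 10);
""")

# Renaming a department touches exactly one row, however many employees it has.
cur = conn.execute("UPDATE departments SET dept_name = 'Revenue' WHERE dept_id = 10")
rows_touched = cur.rowcount

report = conn.execute("""
    SELECT e.emp_name, d.dept_name, m.manager_name
    FROM employees e
    JOIN departments d ON d.dept_id = e.dept_id
    JOIN managers m ON m.manager_id = d.manager_id
    ORDER BY e.emp_id
""").fetchall()
print(rows_touched, report)
```

The transitive chain employee → department → department name is gone from the employees table, so the name cannot disagree between two employees of the same department.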

By normalizing the data in these examples, we have eliminated redundancy, minimized data anomalies, and ensured data consistency and integrity, making the management and querying of the data much simpler and more efficient.


♦ Boyce Codd Normal Form (BCNF)


BCNF is a normal form that is more restrictive than 3NF. A table is in BCNF if every determinant is a candidate key. In other words, for every non-trivial functional dependency X → Y that holds in the table, X must be a superkey.


BCNF Example:

Consider the following table of employees:

In this table, Department Name is functionally dependent on Department ID, so Department ID is a determinant. However, Department ID is not a candidate key of the employees table, because the same department appears in many employee rows. The table therefore violates BCNF.

To bring this table into BCNF, we need to split it into two tables: one for employees and one for departments, as follows:

Employees table:

Departments table:

Now every determinant in each table is a candidate key of that table: the employee key in Employees and Department ID in Departments.
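The "every determinant is a superkey" test can be checked mechanically with an attribute closure. The sketch below is a simplified illustration: the attribute names and the two functional dependencies are assumptions about the employees example, and the check only examines the dependencies it is given rather than every dependency they imply.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the given FDs that apply to relation but whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if not (lhs <= relation and rhs <= relation):
            continue  # this FD does not live entirely inside the relation
        if rhs <= lhs:
            continue  # trivial dependency
        if not closure(lhs, fds) >= relation:
            violations.append((lhs, rhs))
    return violations


# Hypothetical attributes and dependencies for the employees example.
fds = [
    (frozenset({"emp_id"}), frozenset({"emp_name", "dept_id"})),
    (frozenset({"dept_id"}), frozenset({"dept_name"})),
]
flat = frozenset({"emp_id", "emp_name", "dept_id", "dept_name"})
employees = frozenset({"emp_id", "emp_name", "dept_id"})
departments = frozenset({"dept_id", "dept_name"})

print(bcnf_violations(flat, fds))         # dept_id -> dept_name is reported
print(bcnf_violations(employees, fds))    # no violations
print(bcnf_violations(departments, fds))  # no violations
```

A complete BCNF checker would also project the dependency set onto each decomposed table; that step is left out here to keep the sketch short.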


♦ Fourth Normal Form (4NF)


4NF is a normal form that is even more restrictive than BCNF. It is designed to eliminate the redundancy caused by multi-valued dependencies (MVDs). A multi-valued dependency X ->> Y holds when each value of X is associated with a set of Y values that is independent of the table's remaining attributes. A table is in 4NF if, for every non-trivial multi-valued dependency X ->> Y, X is a superkey.


4NF Example:

Consider the following table of students and their courses:


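In this table, a student can take many courses and a course can be taken by many students, so each student's name is repeated for every course they take and each course's name is repeated for every student enrolled. Student Name depends only on Student ID, and Course Name depends only on Course ID. Formally, Student ID ->> (Course ID, Course Name) holds while Student ID alone is not a key of the table, so the table violates 4NF.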


To bring this table into 4NF, we need to split it into three tables: one for students, one for courses, and one for the relationship between students and courses, as follows:

Students table:

Courses table:

Student_Courses table:

Now each fact is stored once: student names in Students, course names in Courses, and enrollments in Student_Courses.
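Whether a multi-valued dependency holds in a given set of rows can be tested directly. The sketch below uses a hypothetical table with an extra, independent attribute (hobby) that is not part of the article's example. It is included only because two independent multi-valued facts about the same student make the effect easy to see.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on rows (a list of dicts); x and y are tuples of column names."""
    if not rows:
        return True
    z = tuple(c for c in rows[0] if c not in x and c not in y)
    groups = defaultdict(set)
    for r in rows:
        key = tuple(r[c] for c in x)
        groups[key].add((tuple(r[c] for c in y), tuple(r[c] for c in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # X ->> Y requires every Y value to be paired with every Z value.
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical rows: courses and hobbies are independent facts about a student.
rows = [
    {"student": 1, "course": "Databases", "hobby": "chess"},
    {"student": 1, "course": "Databases", "hobby": "music"},
    {"student": 1, "course": "Networks", "hobby": "chess"},
    {"student": 1, "course": "Networks", "hobby": "music"},
]

print(mvd_holds(rows, ("student",), ("course",)))       # all combinations are present
print(mvd_holds(rows[:-1], ("student",), ("course",)))  # one combination is missing

# The 4NF decomposition stores each independent fact in its own table.
student_courses = sorted({(r["student"], r["course"]) for r in rows})
student_hobbies = sorted({(r["student"], r["hobby"]) for r in rows})
print(student_courses, student_hobbies)
```

Four rows shrink to two plus two, and adding a third hobby would add one row instead of one row per course.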


♦ Fifth Normal Form (5NF)


Fifth normal form (5NF), also called project-join normal form, deals with join dependencies. A table has a join dependency when it can be split into several smaller tables and rebuilt exactly by joining them back together. A table is in 5NF when every non-trivial join dependency it satisfies is implied by its candidate keys. In other words, there is no remaining lossless decomposition that would remove redundancy.


5NF Example:

Consider the following table of products and their attributes:



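This table is not in 5NF because it contains a non-trivial multi-valued dependency, which already violates 4NF and therefore 5NF. Specifically, each product attribute depends on the product ID, but the attributes are independent of one another, so every combination of attribute values has to be stored for each product.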

To bring this table to 5NF, we could break it down into several smaller tables. One possible solution is:



In this solution, we have eliminated the multi-valued dependency by creating a separate table for each attribute. Each table has a unique key (the combination of the product ID and the attribute), and joining the tables on product ID reproduces the original rows without loss. Provided no further join dependencies hold, this design is in 5NF.
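A decomposition is only safe if joining the pieces gives back exactly the original rows. The sketch below checks that for a hypothetical product table. The column names (product, color, size) and the sample rows are assumptions, since the article's own table is not reproduced here.

```python
from functools import reduce


def project(rows, cols):
    """Project rows (dicts) onto cols, as a set of sorted (column, value) tuples."""
    return {tuple(sorted((c, r[c]) for c in cols)) for r in rows}


def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[c] == rd[c] for c in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def join_dependency_holds(rows, components):
    """True if joining the projections onto components rebuilds rows exactly."""
    projections = [project(rows, cols) for cols in components]
    original = {tuple(sorted(r.items())) for r in rows}
    return reduce(natural_join, projections) == original


# Hypothetical rows: a product's colors and sizes are independent of each other.
rows = [
    {"product": "P1", "color": "red", "size": "S"},
    {"product": "P1", "color": "red", "size": "M"},
    {"product": "P1", "color": "blue", "size": "S"},
    {"product": "P1", "color": "blue", "size": "M"},
    {"product": "P2", "color": "green", "size": "L"},
]
components = [("product", "color"), ("product", "size")]

print(join_dependency_holds(rows, components))      # the split is lossless
print(join_dependency_holds(rows[1:], components))  # the join would invent a row
```

The same function accepts three or more components, which is the situation where 5NF asks for more than 4NF does.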


Thanks for reading, and happy coding!





