This blog explains the basics of database normalization and why it is important for organizing and improving relational databases.It focuses on key ideas like reducing duplicate data, keeping data accurate, and arranging data in clear and efficient ways using normalization steps. With easy-to-follow examples and explanations, this guide is great for database managers, developers, and anyone interested in building strong, scalable, and easy-to-manage database systems.
Database normalization is a way to organize data into tables so that the database functions smoothly and gives accurate results. It focuses on reducing repeated data and ensuring it remains consistent and correct.
Normalization usually involves splitting data into smaller, connected tables and creating links between them. Though this might cause some data repetition, it removes errors, makes data management easier, and simplifies updates.
Why Database Normalization is Important?
Eliminates Redundant Data: Normalization reduces duplicate data by dividing information into smaller, related tables, making it more efficient.
Boosts Query Performance: Smaller, organized tables resulting from normalization allow quicker and more effective query execution.
Prevents Update Anomalies: Normalized tables allow easier data updates without impacting other records.
Ensures Data Integrity: It ensures data remains consistent, accurate, and reliable across the database.
Normal Forms in DBMSNormal forms are a set of guidelines used to design efficient, well-organized databases. These guidelines help ensure that data is stored logically, reducing redundancy and preventing potential issues such as inconsistencies or anomalies. Each level of normalization builds upon the previous one.
There are 5 major types of norms, they are as follows:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
First Normal Form (1NF)1. Ensure Each Column Contains Single Values: Each column in your table must hold single values. This ensures no cell contains multiple values.
2. Avoid Mixing Data Types in Columns:
Each value in a table column must adhere to the same data type or category. The table should not contain random or inconsistent values.
3. Columns Must Have Unique Names:
Each column in a table must have a unique name to avoid confusion during data operations. Duplicate column names can cause errors in the system. 4. No Duplicated Rows:
Duplicate rows in a table should be avoided to maintain data integrity and reliability. Each row should represent a unique record or entity.
Example: Retail Inventory System
Before 1NF:
In a table, the products column stores multiple values for each row. This means one cell can contain more than one item, which makes it difficult to manage and query the data efficiently.
After 1NF:
After applying 1NF, each column contains only one value, and each row represents one piece of information, meaning an individual product in an order.
Second Normal Form (2NF):1. Table in First Normal Form (1NF): The table must be in 1NF, and it must have a primary key.
2. No Partial Dependency:
It should not have partial dependency, meaning that all non-key columns must depend on the entire primary key. If the primary key consists of multiple columns, then every non-key column should rely on all those columns together, not just one or a few of them.
Example: Order Information for Customers
Before 2NF:
· This table has repeated data, such as the Customer_Name, causing redundancy. While it is in (1NF) because each column contains atomic values, it isn't in 2NF.
· Customer_Name is a non-prime attribute that depends on the Customer_ID, not the whole composite key (Order_ID, Product_ID). This is a partial dependency because Customer_Name is only partially dependent on the primary key. It should only be tied to Customer_ID but not to Order_ID or Product_ID.
· Product_Name and Product_Price are non-prime attributes, and they depend on Product_ID, not the entire primary key. These attributes describe the product and should only be related to Product_ID, not to Order_ID or both Order_ID and Product_ID. This is another partial dependency.
After 2NF: Orders Table: Customers Table: Products Table:
· One table for Customer Information (to remove the dependency of Customer_Name on the Order_ID).
· One table for Product Information (to remove the dependency of Product_Name and Product_Price on Product_ID).
· Orders Table links Customer_ID and Product_ID, and the order information is stored without any redundancy.
· All non-key attributes in the Orders Table are fully dependent on the Order_ID (which is the primary key), and there is no partial dependency.
Third Normal Form (3NF):1. Table in 1NF and 2NF: The table should have a primary key, and all data entries should be atomic, and there should be no partial dependency. Every non-key attribute must be fully dependent on the primary key.
2. No Transitive Dependencies:
Any non-key attribute that depends on another non-key attribute must be removed. This ensures that all non-key attributes depend only on the primary key. Example: University Database
Before 3NF:
The instructor's name ‘Prakash’ is repeated for every student enrolled in the Math course. If ‘Prakash’ changes his name, we need to update all rows where ‘Prakash’ is listed as the instructor. If we want to add a new course, we would have to repeat instructor information for every student enrolled.
After 3NF: Student Table: Course Table: Instructor Table: Enrollment Table:
· The Student Table stores student details.
· The Course Table stores course details.
· The Instructor Table stores instructor details. Instructor information is stored only once per course and, if required, is updated in one place.
· The Enrollment Table keeps track of which student is enrolled in which course and the associated instructor.
By applying 3NF, redundancy is removed, and it is ensured that all attributes are functionally dependent only on the primary key, with no transitive dependencies.
Boyce-Codd Normal Form (BCNF):
Boyce-Codd Normal Form (BCNF) is a higher version of the Third Normal Form (3NF) in database design.
Key Concepts of BCNF: · Functional Dependency: Relationship between attributes such that one attribute depends on another. For example, in a table, if column A determines column B, we write this as A → B.
· Candidate Key: A minimal set of attributes that can uniquely identify a record in a table. For example, if a combination of columns (A and B) can uniquely identify records, then (A, B) is a candidate key.
· Super Key: A super key is a set of attributes that can uniquely identify a record in the table.
BCNF Condition:
· It should be in 3NF.
· For every functional dependency X→Y, X must be a super key (which means X is a key or a combination of attributes that uniquely identifies rows in a table).
Example: University Database
Before BCNF: · Student_ID, CourseID → Instructor, the instructor depends on both the student and the course.
· Course_ID → Instructor, the instructor depends only on the course.
After BCNF: Student-Courses Table: Course-Instructor Table: CourseID Instructor Table:
· Student-Courses Table will store student-course combinations, as Student_ID and Course_ID together form a unique combination (super key).
· Course-Instructor Table eliminates the dependency Course_ID → Instructor. Here, Course_ID is a super key/primary key, and the instructor depends on it.
Fourth Normal Form (4NF): 1. Table in BCNF: The table should be in BCNF and have no non-trivial functional dependencies where a non-super key determines a key attribute.
2. No Multi-Valued Dependencies:
It should not have multi-valued dependencies, which means a single attribute can have multiple independent values for another attribute without depending on the rest of the data in the table. Example: Online Store Database
Before 4NF:
For Product_ID -101, the suppliers (Supplier A, Supplier B) are independent of the regions (North America, Europe). The table contained unnecessary combinations of suppliers and regions, leading to creating unnecessary rows and data anomalies.
After 4NF: Product_Supplier Table:
Product_ID Supplier:
· Splitting the table into two tables, Product_Supplier Table and Product_Region Table. Therefore, two independent relationships (Product-Supplier and Product-Region) were split into separate tables, eliminating multi-valued dependencies.
· Insertion, deletion, and update anomalies are resolved here.
Apart from the normal forms mentioned above, we also have the Fifth Normal Form (5NF) and the Sixth Normal Form (6NF), which are high-level normalizations.
· The Fifth Normal Form (5NF) ensures that a table is decomposed into smaller tables to eliminate redundancy while preserving all possible dependencies. It deals with complex, multi-valued dependencies.
· The Sixth Normal Form (6NF), which is rarely used, focuses on handling time-dependent data by breaking it into minimal structures.
ConclusionIn conclusion, relational databases follow specific rules called normal forms (1NF, 2NF, 3NF, BCNF, 4NF) to reduce unnecessary duplicate data and keep the data accurate. Each normal form fixes different problems with the data, improving upon the previous one. The choice of normal form depends on the type of data you have. While higher normal forms provide better data accuracy, they can make the database structure more complex.
Banuprakash Vellingiri
Associate Data Engineer
Since joining Ignitho in July 2024, Banuprakash has leveraged his skills in data engineering and data management. His role involves contributing to a digital marketing project, where he has gained experience in Python and various AWS services, including Athena, Lambda, and Amazon S3. Banuprakash focuses on building robust and scalable data pipelines by designing, and implementing ETL workflows and ensuring data quality.
This website uses cookies to understand your preferences, improve your experience, and gather analytics, in line with GDPR. Learn more or adjust your preferences in our Privacy Policy.
Your daily dose of the Tech world
Don't miss out on the latest tech feeds from the best Digital, Innovation & Software Practitioners across the globe.