Normalization in DBMS

Anomalies in Databases

Databases often face issues known as anomalies, which can lead to data inconsistencies and integrity problems. Normalization helps resolve these anomalies by organizing data efficiently.

Types of Anomalies

Insertion Anomaly: Difficulty in adding new data due to missing other related information.
Update Anomaly: Changes in one part of the database need to be updated in multiple places, leading to potential inconsistencies.
Deletion Anomaly: Removing data unintentionally deletes other valuable data.

Let's illustrate these anomalies using an example:

Unnormalized Table

StudentID	StudentName	Major	CourseNo	CourseName	InstructorName	Grade
101	Alice	CS	1	Math	Mr. Smith	A
102	Bob	EE	1	Math	Mr. Smith	B
103	Charlie	CS	2	English	Mrs. Johnson	C
104	David	ME	3	Physics	Mr. Brown	A
105	Eve	EE	3	Physics	Mr. Brown	B

Insertion Anomaly: Adding a new course without enrolled students is not possible without adding student data.
Update Anomaly: Changing Mr. Brown's name requires updates in multiple rows, which can lead to inconsistencies if not done correctly.
Deletion Anomaly: If Charlie drops English, deleting his row would also remove information about the English course and Mrs. Johnson.

Normalization

Normalization is the process of structuring a relational database to minimize redundancy and improve data integrity. It involves organizing data into multiple related tables.

First Normal Form (1NF)

Condition: Each cell should contain only atomic (indivisible) values.

Below is an example of a table not in 1NF:

StudentID	StudentName	PhoneNo
101	Alice	9090909090, 8080808080

To convert it to 1NF:

StudentID	StudentName	PhoneNo
101	Alice	9090909090
101	Alice	8080808080

Second Normal Form (2NF)

Condition: The table should be in 1NF, and all non-key attributes must be fully dependent on the primary key.

In our original example, the InstructorName depends only on CourseNo, not on both StudentID and CourseNo, indicating a partial dependency.

To resolve this:

Student Table

StudentID	StudentName	Major
101	Alice	CS
102	Bob	EE
103	Charlie	CS
104	David	ME
105	Eve	EE

Course Table

CourseNo	CourseName	InstructorNo
1	Math	1
2	English	2
3	Physics	3

Instructor Table

InstructorNo	InstructorName
1	Mr. Smith
2	Mrs. Johnson
3	Mr. Brown

Grade Table

StudentID	CourseNo	Grade
101	1	A
102	1	B
103	2	C
104	3	A
105	3	B

Third Normal Form (3NF)

Condition: The table should be in 2NF, and there should be no transitive dependency (non-key attributes should not depend on other non-key attributes).

Our tables are already in 3NF. But to demonstrate 3NF, let's say, we had a table like below.

Grade Table

StudentID	CourseNo	Marks	Total Marks	Exam Type
101	1	80	100	Theory
102	1	40	50	Practical
103	2	40	50	Practical
104	3	85	100	Theory
105	3	90	100	Theory

Here, a non-prime column Total Marks is dependent on another non-prime column Exam Type. In such cases, we would take both of them and make another table (Exam) and use their primary key ExamId wherever required.

Boyce-Codd Normal Form (BCNF)

Condition: A table is in BCNF if it is in 3NF, and every determinant is a candidate key.

Our tables already meet BCNF standards. However, consider an Employee table:

Employee Table

EmployeeID	Name	Email	DeskNo
101	Tom	`[email protected]`	3
102	Bob	`[email protected]`	4

Here, DeskNo is a non-key attribute but uniquely identifies EmployeeID, which is not allowed in BCNF.

To correct this: Split into two tables: Employee (EmployeeID, Name, Email) and DeskAssignment (EmployeeID, DeskNo).

Denormalization

Denormalization involves combining normalized tables to improve read performance at the expense of write performance and storage efficiency. This technique is often used in data warehousing and reporting systems.

Normalized Tables

Customer Table

CustomerId	Name
101	Tom
102	Bob

Order Table

OrderId	Date	Amount	Item	CustomerId
1	2021-01-01	70000	Iphone	101
2	2021-02-02	50000	Ipad	102

If we need some data from Customer Table and some from Order Table, we need to perform join operation like below.

SQL
SELECT Order.OrderId, Customer.Name, Order.Date, Order.Amount
FROM Order
JOIN Customer ON Order.CustomerId = Customer.CustomerId;

Denormalized

Order Table

OrderId	CustomerId	Name	Date	Amount	Item
1	101	Tom	2021-01-01	70000	Iphone
2	102	Bob	2021-02-02	50000	Ipad

Benefits

Improved Read Performance: By combining tables, the need for joins is eliminated, which can significantly speed up read operations, especially for complex queries or large datasets.

Simplified Queries: Queries become simpler because they no longer need to join multiple tables, making it easier to write and understand SQL statements.

Reduced Latency: With fewer joins, there's less processing required to fetch the data, reducing the overall query latency.

Trade-offs

Increased Redundancy: Data is duplicated, as customer information is repeated in every order row. This increases storage requirements.

Higher Risk of Inconsistency: Updates to customer information must be propagated to all rows where that customer appears. This can lead to anomalies if updates are not handled correctly.

More Complex Updates: Any change to the customer data (e.g., address change) must be updated in multiple rows, making update operations more complex and time-consuming.

Note

Denormalization is a strategic decision based on specific use cases, especially where read performance is more critical than write performance and storage efficiency. It’s commonly used in data warehousing, reporting systems, and applications with heavy read operations.

Normalization in DBMS

Anomalies in Databases​

Types of Anomalies​

Unnormalized Table​

Normalization​

First Normal Form (1NF)​

Second Normal Form (2NF)​

Third Normal Form (3NF)​

Boyce-Codd Normal Form (BCNF)​

Denormalization​

Normalized Tables​

Denormalized​

Benefits​

Trade-offs​