Skip to main content

Normalization in DBMS

Anomalies in Databases​

Databases often face issues known as anomalies, which can lead to data inconsistencies and integrity problems. Normalization helps resolve these anomalies by organizing data efficiently.

Types of Anomalies​

  • Insertion Anomaly: Difficulty in adding new data due to missing other related information.

  • Update Anomaly: Changes in one part of the database need to be updated in multiple places, leading to potential inconsistencies.

  • Deletion Anomaly: Removing data unintentionally deletes other valuable data.

Let's illustrate these anomalies using an example:

Unnormalized Table​

StudentIDStudentNameMajorCourseNoCourseNameInstructorNameGrade
101AliceCS1MathMr. SmithA
102BobEE1MathMr. SmithB
103CharlieCS2EnglishMrs. JohnsonC
104DavidME3PhysicsMr. BrownA
105EveEE3PhysicsMr. BrownB
  • Insertion Anomaly: Adding a new course without enrolled students is not possible without adding student data.

  • Update Anomaly: Changing Mr. Brown's name requires updates in multiple rows, which can lead to inconsistencies if not done correctly.

  • Deletion Anomaly: If Charlie drops English, deleting his row would also remove information about the English course and Mrs. Johnson.

Normalization​

Normalization is the process of structuring a relational database to minimize redundancy and improve data integrity. It involves organizing data into multiple related tables.

First Normal Form (1NF)​

Condition: Each cell should contain only atomic (indivisible) values.

Below is an example of a table not in 1NF:

StudentIDStudentNamePhoneNo
101Alice9090909090, 8080808080

To convert it to 1NF:

StudentIDStudentNamePhoneNo
101Alice9090909090
101Alice8080808080

Second Normal Form (2NF)​

Condition: The table should be in 1NF, and all non-key attributes must be fully dependent on the primary key.

In our original example, the InstructorName depends only on CourseNo, not on both StudentID and CourseNo, indicating a partial dependency.

To resolve this:

Student Table

StudentIDStudentNameMajor
101AliceCS
102BobEE
103CharlieCS
104DavidME
105EveEE

Course Table

CourseNoCourseNameInstructorNo
1Math1
2English2
3Physics3

Instructor Table

InstructorNoInstructorName
1Mr. Smith
2Mrs. Johnson
3Mr. Brown

Grade Table

StudentIDCourseNoGrade
1011A
1021B
1032C
1043A
1053B

Third Normal Form (3NF)​

Condition: The table should be in 2NF, and there should be no transitive dependency (non-key attributes should not depend on other non-key attributes).

Our tables are already in 3NF. But to demonstrate 3NF, let's say, we had a table like below.

Grade Table

StudentIDCourseNoMarksTotal MarksExam Type
101180100Theory
10214050Practical
10324050Practical
104385100Theory
105390100Theory

Here, a non-prime column Total Marks is dependent on another non-prime column Exam Type. In such cases, we would take both of them and make another table (Exam) and use their primary key ExamId wherever required.

Boyce-Codd Normal Form (BCNF)​

Condition: A table is in BCNF if it is in 3NF, and every determinant is a candidate key.

Our tables already meet BCNF standards. However, consider an Employee table:

Employee Table

EmployeeIDNameEmailDeskNo
101Tom[email protected]3
102Bob[email protected]4

Here, DeskNo is a non-key attribute but uniquely identifies EmployeeID, which is not allowed in BCNF.

To correct this: Split into two tables: Employee (EmployeeID, Name, Email) and DeskAssignment (EmployeeID, DeskNo).

Denormalization​

Denormalization involves combining normalized tables to improve read performance at the expense of write performance and storage efficiency. This technique is often used in data warehousing and reporting systems.

Normalized Tables​

Customer Table

CustomerIdName
101Tom
102Bob

Order Table

OrderIdDateAmountItemCustomerId
12021-01-0170000Iphone101
22021-02-0250000Ipad102

If we need some data from Customer Table and some from Order Table, we need to perform join operation like below.

SQL
SELECT Order.OrderId, Customer.Name, Order.Date, Order.Amount
FROM Order
JOIN Customer ON Order.CustomerId = Customer.CustomerId;

Denormalized​

Order Table

OrderIdCustomerIdNameDateAmountItem
1101Tom2021-01-0170000Iphone
2102Bob2021-02-0250000Ipad

Benefits​

Improved Read Performance: By combining tables, the need for joins is eliminated, which can significantly speed up read operations, especially for complex queries or large datasets.

Simplified Queries: Queries become simpler because they no longer need to join multiple tables, making it easier to write and understand SQL statements.

Reduced Latency: With fewer joins, there's less processing required to fetch the data, reducing the overall query latency.

Trade-offs​

Increased Redundancy: Data is duplicated, as customer information is repeated in every order row. This increases storage requirements.

Higher Risk of Inconsistency: Updates to customer information must be propagated to all rows where that customer appears. This can lead to anomalies if updates are not handled correctly.

More Complex Updates: Any change to the customer data (e.g., address change) must be updated in multiple rows, making update operations more complex and time-consuming.

Note

Denormalization is a strategic decision based on specific use cases, especially where read performance is more critical than write performance and storage efficiency. It’s commonly used in data warehousing, reporting systems, and applications with heavy read operations.