Normalization and its Types: 1NF, 2NF, 3NF, and BCNF

Summary: In this tutorial, we will learn what is normalization and what is its purpose in DBMS. In addition, we will also look at different types of normal forms (such as 1NF, 2NF, 3NF, and BCNF) and learn how to normalize data or tables in a SQL database.

What is Normalization in DBMS?

Normalization in DBMS is the process of structuring a database, usually a relational database, in such a way that there is minimum data redundancy and better data integrity.

In normalization, the tables of relational databases are structured according to a series of so-called normalization forms such as 1NF, 2NF, 3NF, etc, so that the data redundancies and anomalies do not occur.

If the database is not normalized and its tables have data redundancies (have multiple copies of the same data), it may cause anomalies in the DBMS.

Anomalies in DBMS

Anomalies in DBMS are any undesirable side-effects that arise in a database, that have not been sufficiently normalized, when an attempt is made to modify (update, insert, or delete) a relation.

There are three types of anomalies that occur when a database is not normalized. These are: update anomaly, insert anomaly and, delete anomaly.

Example of Update, Delete and, Insert Anomalies

For example, suppose there is a ‘user_details’ table in a SQL database that contains the following records:

id	name	dob	zip	city	language
1	Harry	15/05/1995	563503	NY	English
1	Harry	15/05/1995	563503	NY	Korean
2	Sam	01/12/2000	563503	NY	English
3	Rock	15/05/1995	856246	DL	Japanese Hindi

user_details

Clearly, the table has a lot of duplicate values and is not normalized. If we make any attempt to modify the above table the following anomalies may occur:

Update Anomaly: The above table has two rows for Harry. If we have to update the zip of Harry, we need to update both the rows for the data to be consistent. If by chance only one row is updated then Harry will have two different zip values in the same database, which is inconsistent.

Insert Anomaly: Every row of the table has non-nullable values for zip and city. If we have to insert a new user he should have an associated zip and city otherwise, we may not be able to insert him in the database table.

Delete Anomaly: The Rock in the above table is the only row that has ‘DL’ as a city. If we delete the row, we will also lose the only available information about the city ‘DL’.

To prevent such anomalies from happening, we need to normalize the table into suitable normal forms.

Types of Normal Forms

These are some of the normal forms which are enough to have a well-normalized database:

Normal Form	Description
1NF	A table is in 1NF if every cell of the table has an atomic value (cells cannot have tables as values).
2NF	A table is in 2NF if it is in 1NF and has no partial functional dependencies of non-prime attributes on candidate keys.
3NF	A table is in 3NF if it is in 2NF and has no transitive functional dependencies of non-prime attributes on candidate keys.
BCNF (3.5NF)	A table is in BCNF if it is in 3NF and every non-trivial functional dependency begins with a superkey.
4NF	A table is in 4NF if it is in BCNF and every non-trivial multivalued dependency begins with a superkey.

To know more advanced normal forms, refer here.

First Normal Form (1NF)

The last row of the ‘user_details’ table contains two values for the language column:

id	name	dob	zip	city	language
1	Harry	15/05/1995	563503	NY	English
1	Harry	15/05/1995	563503	NY	Korean
2	Sam	01/12/2000	563503	NY	English
3	Rock	15/05/1995	856246	DL	Japanese Hindi

user_details

The table is clearly is not in 1NF as the cell doesn’t have an atomic value.

To transform the table in 1NF, we have to make the cell atomic as follows:

id	name	dob	zip	city	language
1	Harry	15/05/1995	563503	NY	English
1	Harry	15/05/1995	563503	NY	Korean
2	Sam	01/12/2000	563503	NY	English
3	Rock	15/05/1995	856246	DL	Japanese
3	Rock	15/05/1995	856246	DL	Hindi

user_details

Second Normal Form (2NF)

The table is in 1NF but not in 2NF because the table has partial functional dependencies:

partial functional dependencies: id → name, id → dob

id	language	name	dob	zip	city
1	English	Harry	15/05/1995	563503	NY
1	Korean	Harry	15/05/1995	563503	NY
2	English	Sam	01/12/2000	563503	NY
3	Japanese	Rock	15/05/1995	856246	DL
3	Hindi	Rock	15/05/1995	856246	DL

table: user_details
candidate key: {id, language}
non-prime attributes: name, dob, zip, city

The non-prime attributes (name and dob) have functional dependencies on id, which is the proper subset of the candidate key.

To convert the table to 2NF, we must have a separate table for the name and dob column.

id	language	zip	city
1	English	563503	NY
1	Korean	563503	NY
2	English	563503	NY
3	Japanese	856246	DL
3	Hindi	856246	DL

user_details

id	name	dob
1	Harry	15/05/1995
2	Sam	01/12/2000
3	Rock	15/05/1995

user_names

Third Normal Form (3NF)

The ‘user_details’ table is now in 2NF but not in 3NF. It is because it has a transitive functional dependency:

transitive functional dependency: {id, language} → zip, zip → city

id	language	zip	city
1	English	563503	NY
1	Korean	563503	NY
2	English	563503	NY
3	Japanese	856246	DL
3	Hindi	856246	DL

table: user_details
candidate_key: {id, language}
non-prime attributes: zip, city

id	name	dob
1	Harry	15/05/1995
2	Sam	01/12/2000
3	Rock	15/05/1995

user_names

The columns in a candidate key are called prime attributes, and a column that does not occur in any candidate key is called a non-prime attribute.

The table has a transitive functional dependency because a non-prime attribute (zip) is determining another non-prime attribute (city).

To make the table in accordance with 3NF, we have to separate zip and city from the user_details table as follows.

id	language	zip
1	English	563503
1	Korean	563503
2	English	563503
3	Japanese	856246
3	Hindi	856246

user_zips

zip	city
563503	NY
856246	DL

zips

Boyce Codd Normal Form (BCNF)

For the table to be in BCNF, every non-trivial functional dependency should begin with a super key.

A dependency X → Y is non-trivial if Y is not a subset of X.

There is no table that we have normalized so far that have a dependency that does not start with a super key:

user_zips: {id, language} → zip
zips: {zip} → city
user_names: {id} → name, {id} → dob

Therefore all the above tables are in BCNF.

Fourth Normal Form (4NF)

A table or relation is in 4NF if, for every one of its non-trivial multivalued dependencies X $\twoheadrightarrow$ Y, X is a superkey.

A multivalued dependency exists when there are at least three attributes (like X,Y and Z) in a relation, Y and Z are dependent on X, however, the set of values of Y is independent of set Z and vice versa.
Wikipedia

In the following ‘user_zips’ table because the language is attached to the id and the zip is attached to the id are independent of each other, the table has a multivalued dependency.

id	language	zip
1	English	563503
1	Korean	563503
2	English	563503
3	Japanese	856246
3	Hindi	856246

user_zips

If we were to add a new language for id=1, we would have to add one record for each of the zip on that id, and vice versa.

Overall, there are two multivalued dependencies on this table:

id $\twoheadrightarrow$ language
id $\twoheadrightarrow$ zip

Similarly, the ‘user_names’ table has two multivalued dependencies:

id $\twoheadrightarrow$ name
id $\twoheadrightarrow$ dob

id	name	dob
1	Harry	15/05/1995
2	Sam	01/12/2000
3	Rock	15/05/1995

user_names

However, in the ‘user_names’ table, the multivalue dependency begins with a superkey, whereas in the ‘user_zips’ table the dependency begins from id which is not a super key but a subset of the super key.

To make the ‘user_zips’ table in 4NF, we have to separate the id $\twoheadrightarrow$ language and id $\twoheadrightarrow$ zip into two different tables as follows:

id	language
1	English
1	Korean
2	English
3	Japanese
3	Hindi

user_languages

id	zip
1	563503
2	563503
3	856246

user_zips

Conclusion

Normalization in DBMS is the process to structure relation or talbes in a database in such a way such that there is minimum data redundancies and improved data integrity.

The objectives of normalization beyond 1NF (first normal form) as stated by Codd is as follows:

To free the database from undesirable insert, update and delete anomalies.
To reduce the need for restructuring of tables, as new types of data are introduced.
To make the relational model more informative to users.

What is Normalization in DBMS?

Anomalies in DBMS

Example of Update, Delete and, Insert Anomalies

Types of Normal Forms

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Conclusion

Leave a Reply Cancel reply