Summary: In this tutorial, we will learn what is normalization and what is its purpose in DBMS. In addition, we will also look at different types of normal forms (such as 1NF, 2NF, 3NF, and BCNF) and learn how to normalize data or tables in a SQL database.

What is Normalization in DBMS?

Normalization in DBMS is the process of structuring a database, usually a relational database, in such a way that there is minimum data redundancy and better data integrity.

In normalization, the tables of relational databases are structured according to a series of so-called normalization forms such as 1NF, 2NF, 3NF, etc, so that the data redundancies and anomalies do not occur.

If the database is not normalized and its tables have data redundancies (have multiple copies of the same data), it may cause anomalies in the DBMS.

Anomalies in DBMS

Anomalies in DBMS are any undesirable side-effects that arise in a database, that have not been sufficiently normalized, when an attempt is made to modify (update, insert, or delete) a relation.

There are three types of anomalies that occur when a database is not normalized. These are: update anomaly, insert anomaly and, delete anomaly.

Example of Update, Delete and, Insert Anomalies

For example, suppose there is a ‘user_details’ table in a SQL database that contains the following records:

idnamedobzipcitylanguage
1Harry15/05/1995563503NYEnglish
1Harry15/05/1995 563503NYKorean
2Sam01/12/2000563503NYEnglish
3Rock 15/05/1995 856246 DLJapanese
Hindi
user_details

Clearly, the table has a lot of duplicate values and is not normalized. If we make any attempt to modify the above table the following anomalies may occur:

Update Anomaly: The above table has two rows for Harry. If we have to update the zip of Harry, we need to update both the rows for the data to be consistent. If by chance only one row is updated then Harry will have two different zip values in the same database, which is inconsistent.

Insert Anomaly: Every row of the table has non-nullable values for zip and city. If we have to insert a new user he should have an associated zip and city otherwise, we may not be able to insert him in the database table.

Delete Anomaly: The Rock in the above table is the only row that has ‘DL’ as a city. If we delete the row, we will also lose the only available information about the city ‘DL’.

To prevent such anomalies from happening, we need to normalize the table into suitable normal forms.

Types of Normal Forms

These are some of the normal forms which are enough to have a well-normalized database:

Normal FormDescription
1NFA table is in 1NF if every cell of the table has an atomic value (cells cannot have tables as values).
2NFA table is in 2NF if it is in 1NF and has no partial functional dependencies of non-prime attributes on candidate keys.
3NFA table is in 3NF if it is in 2NF and has no transitive functional dependencies of non-prime attributes on candidate keys.
BCNF (3.5NF)A table is in BCNF if it is in 3NF and every non-trivial functional dependency begins with a superkey.
4NFA table is in 4NF if it is in BCNF and every non-trivial multivalued dependency begins with a superkey.

To know more advanced normal forms, refer here.

First Normal Form (1NF)

The last row of the ‘user_details’ table contains two values for the language column:

idnamedobzipcitylanguage
1Harry15/05/1995563503NYEnglish
1Harry15/05/1995 563503NYKorean
2Sam01/12/2000563503NYEnglish
3Rock 15/05/1995 856246 DLJapanese
Hindi
user_details

The table is clearly is not in 1NF as the cell doesn’t have an atomic value.

To transform the table in 1NF, we have to make the cell atomic as follows:

idnamedobzipcitylanguage
1Harry 15/05/1995 563503 NY English
1Harry15/05/1995563503NYKorean
2Sam01/12/2000563503NYEnglish
3Rock15/05/1995 856246DL Japanese
3Rock15/05/1995 856246DL Hindi
user_details

Second Normal Form (2NF)

The table is in 1NF but not in 2NF because the table has partial functional dependencies:

partial functional dependencies: id → name, id → dob

idlanguagenamedobzipcity
1EnglishHarry 15/05/1995 563503 NY
1KoreanHarry15/05/1995563503NY
2EnglishSam01/12/2000563503NY
3JapaneseRock15/05/1995856246DL
3HindiRock15/05/1995856246DL
table: user_details
candidate key: {id, language}
non-prime attributes: name, dob, zip, city

The non-prime attributes (name and dob) have functional dependencies on id, which is the proper subset of the candidate key.

To convert the table to 2NF, we must have a separate table for the name and dob column.

idlanguagezipcity
1English563503 NY
1Korean563503NY
2English563503NY
3Japanese856246DL
3Hindi856246DL
user_details
idnamedob
1Harry15/05/1995
2Sam01/12/2000
3Rock15/05/1995
user_names

Third Normal Form (3NF)

The ‘user_details’ table is now in 2NF but not in 3NF. It is because it has a transitive functional dependency:

transitive functional dependency: {id, language} → zip, zip → city

idlanguagezipcity
1English563503 NY
1Korean563503NY
2English563503NY
3Japanese856246DL
3Hindi856246DL
table: user_details
candidate_key: {id, language}
non-prime attributes: zip, city
idnamedob
1Harry15/05/1995
2Sam01/12/2000
3Rock15/05/1995
user_names

The columns in a candidate key are called prime attributes, and a column that does not occur in any candidate key is called a non-prime attribute.

The table has a transitive functional dependency because a non-prime attribute (zip) is determining another non-prime attribute (city).

To make the table in accordance with 3NF, we have to separate zip and city from the user_details table as follows.

idlanguagezip
1English563503
1Korean563503
2English563503
3Japanese856246
3Hindi856246
user_zips
zipcity
563503NY
856246DL
zips

Boyce Codd Normal Form (BCNF)

For the table to be in BCNF, every non-trivial functional dependency should begin with a super key.

A dependency X → Y is non-trivial if Y is not a subset of X.

There is no table that we have normalized so far that have a dependency that does not start with a super key:

  • user_zips: {id, language} → zip
  • zips: {zip} → city
  • user_names: {id} → name, {id} → dob

Therefore all the above tables are in BCNF.

Fourth Normal Form (4NF)

A table or relation is in 4NF if, for every one of its non-trivial multivalued dependencies X\twoheadrightarrow YX is a superkey.

A multivalued dependency exists when there are at least three attributes (like X,Y and Z) in a relation, Y and Z are dependent on X, however, the set of values of Y is independent of set Z and vice versa.

Wikipedia

In the following ‘user_zips’ table because the language is attached to the id and the zip is attached to the id are independent of each other, the table has a multivalued dependency.

idlanguagezip
1English563503
1Korean563503
2English563503
3Japanese856246
3Hindi856246
user_zips

If we were to add a new language for id=1, we would have to add one record for each of the zip on that id, and vice versa.

Overall, there are two multivalued dependencies on this table:

  • id \twoheadrightarrow language
  • id \twoheadrightarrow zip

Similarly, the ‘user_names’ table has two multivalued dependencies:

  • id \twoheadrightarrow name
  • id \twoheadrightarrow dob
idnamedob
1Harry15/05/1995
2Sam01/12/2000
3Rock15/05/1995
user_names

However, in the ‘user_names’ table, the multivalue dependency begins with a superkey, whereas in the ‘user_zips’ table the dependency begins from id which is not a super key but a subset of the super key.

To make the ‘user_zips’ table in 4NF, we have to separate the id \twoheadrightarrow language and id \twoheadrightarrow zip into two different tables as follows:

idlanguage
1English
1Korean
2English
3Japanese
3Hindi
user_languages
idzip
1563503
2563503
3856246
user_zips

Conclusion

Normalization in DBMS is the process to structure relation or talbes in a database in such a way such that there is minimum data redundancies and improved data integrity.

The objectives of normalization beyond 1NF (first normal form) as stated by Codd is as follows:

  1. To free the database from undesirable insert, update and delete anomalies.
  2. To reduce the need for restructuring of tables, as new types of data are introduced.
  3. To make the relational model more informative to users.

Adarsh Kumar

I am an engineer by education and writer by passion. I started this blog to share my little programming wisdom with other programmers out there. Hope it helps you.

Leave a Reply