🛠 Learn ERWIN

Normalization (1NF, 2NF, 3NF)

Organizing data to eliminate redundancy — one normal form at a time.

Normalization is the process of organizing tables to reduce redundancy and avoid update anomalies. It was formalized by Edgar Codd in the 1970s and remains essential for OLTP database design.

Why normalize?

1NF — First Normal Form

A table is in 1NF when:

  1. All values are atomic (indivisible)
  2. There are no repeating groups
  3. Each column contains a single-valued attribute

Violation of 1NF

StudentCourses StudentID │ Name │ Courses 101 │ John │ Math, Physics, Chemistry ← multi-valued! 102 │ Sara │ English, History

Converted to 1NF

StudentCourses (1NF) StudentID │ Name │ Course 101 │ John │ Math 101 │ John │ Physics 101 │ John │ Chemistry 102 │ Sara │ English 102 │ Sara │ History
Notice the redundancy? John's name now appears three times. We've achieved 1NF but introduced new problems — that's what 2NF will fix.

2NF — Second Normal Form

A table is in 2NF when:

  1. It is in 1NF
  2. All non-key attributes are fully dependent on the entire primary key (no partial dependencies)

This matters when you have a composite primary key.

Violation of 2NF

StudentCourses (1NF, NOT 2NF) StudentID │ CourseID │ Name │ CourseTitle 101C1 │ John │ Mathematics 101C2 │ John │ Physics ↑ ↑ Depends only on Depends only on StudentID CourseID (partial dependency on composite key)

Converted to 2NF (split into three tables)

Student StudentCourses Course StudentID │ Name StudentID │ CourseID CourseID │ CourseTitle 101 │ John 101C1 C1 │ Mathematics 102 │ Sara 101C2 C2 │ Physics 102C1

3NF — Third Normal Form

A table is in 3NF when:

  1. It is in 2NF
  2. There are no transitive dependencies (non-key attributes don't depend on other non-key attributes)

Violation of 3NF

Student (2NF, NOT 3NF) StudentID │ Name │ DepartmentID │ DepartmentName 101 │ John │ D1 │ Engineering 102 │ Sara │ D1 │ Engineering ← redundant! 103 │ Amy │ D2 │ Marketing DepartmentName depends on DepartmentID, not on StudentID → transitive dependency: StudentID → DepartmentID → DepartmentName

Converted to 3NF

Student Department StudentID │ Name │ DepartmentID DepartmentID │ DepartmentName 101 │ John │ D1 D1 │ Engineering 102 │ Sara │ D1 D2 │ Marketing 103 │ Amy │ D2

The progression

FormEliminatesRule
1NFMulti-valued attributesAtomic values, no repeating groups
2NFPartial dependenciesAll non-key attributes fully depend on full PK
3NFTransitive dependenciesNon-key attributes don't depend on other non-key attributes

Beyond 3NF

For most production OLTP databases, 3NF is the sweet spot. Higher normal forms exist (BCNF, 4NF, 5NF) but are rarely worth the complexity outside specialized cases.

When NOT to normalize

For OLAP workloads (analytics, reporting, data warehouses), you'll often deliberately denormalize for query performance. Normalization minimizes writes; denormalization optimizes reads. Choose based on your workload.

Interview shortcut "1NF: atomic values. 2NF: no partial dependencies on composite keys. 3NF: no transitive dependencies between non-key columns. We typically normalize to 3NF for OLTP and denormalize for OLAP."