Normalization (1NF, 2NF, 3NF)

Organizing data to eliminate redundancy — one normal form at a time.

Normalization is the process of organizing tables to reduce redundancy and avoid update anomalies. It was formalized by Edgar Codd in the 1970s and remains essential for OLTP database design.

Why normalize?

Avoid redundancy — don't store the same fact in two places
Prevent update anomalies — update once, not in 50 rows
Maintain integrity — relationships enforced by foreign keys
Save space — no duplicated text in millions of rows

1NF — First Normal Form

A table is in 1NF when:

All values are atomic (indivisible)
There are no repeating groups
Each column contains a single-valued attribute

Violation of 1NF

StudentCourses StudentID │ Name │ Courses 101 │ John │ Math, Physics, Chemistry ← multi-valued! 102 │ Sara │ English, History

Converted to 1NF

StudentCourses (1NF) StudentID │ Name │ Course 101 │ John │ Math 101 │ John │ Physics 101 │ John │ Chemistry 102 │ Sara │ English 102 │ Sara │ History

Notice the redundancy? John's name now appears three times. We've achieved 1NF but introduced new problems — that's what 2NF will fix.

2NF — Second Normal Form

A table is in 2NF when:

It is in 1NF
All non-key attributes are fully dependent on the entire primary key (no partial dependencies)

This matters when you have a composite primary key.

Violation of 2NF

StudentCourses (1NF, NOT 2NF) StudentID │ CourseID │ Name │ CourseTitle 101 │ C1 │ John │ Mathematics 101 │ C2 │ John │ Physics ↑ ↑ Depends only on Depends only on StudentID CourseID (partial dependency on composite key)

Converted to 2NF (split into three tables)

Student StudentCourses Course StudentID │ Name StudentID │ CourseID CourseID │ CourseTitle 101 │ John 101 │ C1 C1 │ Mathematics 102 │ Sara 101 │ C2 C2 │ Physics 102 │ C1

3NF — Third Normal Form

A table is in 3NF when:

It is in 2NF
There are no transitive dependencies (non-key attributes don't depend on other non-key attributes)

Violation of 3NF

Student (2NF, NOT 3NF) StudentID │ Name │ DepartmentID │ DepartmentName 101 │ John │ D1 │ Engineering 102 │ Sara │ D1 │ Engineering ← redundant! 103 │ Amy │ D2 │ Marketing DepartmentName depends on DepartmentID, not on StudentID → transitive dependency: StudentID → DepartmentID → DepartmentName

Converted to 3NF

Student Department StudentID │ Name │ DepartmentID DepartmentID │ DepartmentName 101 │ John │ D1 D1 │ Engineering 102 │ Sara │ D1 D2 │ Marketing 103 │ Amy │ D2

The progression

Form	Eliminates	Rule
1NF	Multi-valued attributes	Atomic values, no repeating groups
2NF	Partial dependencies	All non-key attributes fully depend on full PK
3NF	Transitive dependencies	Non-key attributes don't depend on other non-key attributes

Beyond 3NF

For most production OLTP databases, 3NF is the sweet spot. Higher normal forms exist (BCNF, 4NF, 5NF) but are rarely worth the complexity outside specialized cases.

When NOT to normalize

For OLAP workloads (analytics, reporting, data warehouses), you'll often deliberately denormalize for query performance. Normalization minimizes writes; denormalization optimizes reads. Choose based on your workload.

Interview shortcut "1NF: atomic values. 2NF: no partial dependencies on composite keys. 3NF: no transitive dependencies between non-key columns. We typically normalize to 3NF for OLTP and denormalize for OLAP."

← Fact vs Dimension Tables Star vs Snowflake Schema →