Normalization (1NF, 2NF, 3NF)
Organizing data to eliminate redundancy — one normal form at a time.
Normalization is the process of organizing tables to reduce redundancy and avoid update anomalies. It was formalized by Edgar Codd in the 1970s and remains essential for OLTP database design.
Why normalize?
- Avoid redundancy — don't store the same fact in two places
- Prevent update anomalies — update once, not in 50 rows
- Maintain integrity — relationships enforced by foreign keys
- Save space — no duplicated text in millions of rows
1NF — First Normal Form
A table is in 1NF when:
- All values are atomic (indivisible)
- There are no repeating groups
- Each column contains a single-valued attribute
Violation of 1NF
Converted to 1NF
2NF — Second Normal Form
A table is in 2NF when:
- It is in 1NF
- All non-key attributes are fully dependent on the entire primary key (no partial dependencies)
This matters when you have a composite primary key.
Violation of 2NF
Converted to 2NF (split into three tables)
3NF — Third Normal Form
A table is in 3NF when:
- It is in 2NF
- There are no transitive dependencies (non-key attributes don't depend on other non-key attributes)
Violation of 3NF
Converted to 3NF
The progression
| Form | Eliminates | Rule |
|---|---|---|
| 1NF | Multi-valued attributes | Atomic values, no repeating groups |
| 2NF | Partial dependencies | All non-key attributes fully depend on full PK |
| 3NF | Transitive dependencies | Non-key attributes don't depend on other non-key attributes |
Beyond 3NF
For most production OLTP databases, 3NF is the sweet spot. Higher normal forms exist (BCNF, 4NF, 5NF) but are rarely worth the complexity outside specialized cases.
When NOT to normalize
For OLAP workloads (analytics, reporting, data warehouses), you'll often deliberately denormalize for query performance. Normalization minimizes writes; denormalization optimizes reads. Choose based on your workload.