Star vs Snowflake Schema

Two ways to organize a data warehouse — performance vs storage.

Both schemas have a central fact table connected to dimensions. The difference is whether those dimensions are denormalized (star) or normalized (snowflake).

Star schema

A central fact table connects directly to denormalized dimension tables. Each dimension is a single table containing all related attributes.

Star Schema DateDim │ │ CustomerDim ────► SalesFact ◄──── ProductDim │ │ StoreDim Each dimension is one denormalized table holding all attributes. ProductDim has columns: ProductID, Name, Category, Brand, Manufacturer, etc.

Snowflake schema

Dimensions are normalized into multiple related tables. Where a star schema has one ProductDim, a snowflake has ProductDim → CategoryDim, ProductDim → BrandDim, etc.

Snowflake Schema DateDim │ CountryDim │ CategoryDim │ │ │ │ │ │ CityDim ─► CustomerDim ─► SalesFact ◄─ ProductDim ─► BrandDim │ │ StoreDim Each dimension is split into normalized sub-tables.

Side-by-side comparison

Aspect	Star Schema	Snowflake Schema
Structure	Simple, flat dimensions	Normalized, hierarchical dimensions
Normalization	Denormalized	Normalized (3NF or higher)
Storage	More (data is redundant)	Less (no duplication)
Query performance	Faster (fewer joins)	Slower (more joins)
Maintenance	Easier — single dim to update	Harder — cascading updates
Query simplicity	Simpler — fewer tables	More complex SQL
Data integrity	Lower (denormalized risks)	Higher (enforced relationships)
Use case	OLAP, BI dashboards	Storage-constrained warehouses

Concrete example

Star — product dimension as one table

ProductDim ProductID │ ProductName │ CategoryID │ CategoryName │ BrandID │ BrandName 1 │ Laptop X │ 10 │ Electronics │ B1 │ TechCo 2 │ Phone Y │ 10 │ Electronics │ B2 │ MobileCo 3 │ Chair Z │ 20 │ Furniture │ B3 │ HomeCo "Electronics" repeats for every electronic product. Fast to query, more storage.

Snowflake — product, category, brand as separate tables

ProductDim CategoryDim BrandDim ProductID │ Name │ CatID │ BrandID CatID │ CatName BrandID │ BrandName 1 │ Laptop X │ 10 │ B1 10 │ Electronics B1 │ TechCo 2 │ Phone Y │ 10 │ B2 20 │ Furniture B2 │ MobileCo 3 │ Chair Z │ 20 │ B3 B3 │ HomeCo "Electronics" stored once. Less storage, but every query needs 2-3 more joins.

Query complexity comparison

Star — one join

SELECT p.CategoryName,
SUM(s.Amount)
FROM SalesFact s
JOIN ProductDim p
ON s.ProductID = p.ProductID
GROUP BY p.CategoryName;

Snowflake — two joins

SELECT c.CatName,
SUM(s.Amount)
FROM SalesFact s
JOIN ProductDim p ON s.ProductID = p.ProductID
JOIN CategoryDim c ON p.CatID = c.CatID
GROUP BY c.CatName;

When to pick which

Pick Star when…

Query performance is the top priority
You're building dashboards and BI reports
Storage is cheap (typical cloud warehouse)
Your team prefers simpler SQL

Pick Snowflake when…

Storage is genuinely constrained
Dimensions are huge with lots of repeated text
Data integrity rules are critical
You're integrating with normalized source systems

The 90% answer For most modern cloud data warehouses (Snowflake, BigQuery, Redshift), star schemas win. Storage is cheap; query speed matters more. Use snowflake only when you have a specific reason to normalize a particular dimension.

← Normalization (1NF/2NF/3NF) Slowly Changing Dimensions →