Star vs Snowflake Schema
Two ways to organize a data warehouse — performance vs storage.
Both schemas have a central fact table connected to dimensions. The difference is whether those dimensions are denormalized (star) or normalized (snowflake).
Star schema
A central fact table connects directly to denormalized dimension tables. Each dimension is a single table containing all related attributes.
Star Schema
DateDim
│
│
CustomerDim ────► SalesFact ◄──── ProductDim
│
│
StoreDim
Each dimension is one denormalized table holding all attributes.
ProductDim has columns: ProductID, Name, Category, Brand, Manufacturer, etc.
Snowflake schema
Dimensions are normalized into multiple related tables. Where a star schema has one ProductDim, a snowflake has ProductDim → CategoryDim, ProductDim → BrandDim, etc.
Snowflake Schema
DateDim
│
CountryDim │ CategoryDim
│ │ │
│ │ │
CityDim ─► CustomerDim ─► SalesFact ◄─ ProductDim ─► BrandDim
│
│
StoreDim
Each dimension is split into normalized sub-tables.
Side-by-side comparison
| Aspect | Star Schema | Snowflake Schema |
|---|---|---|
| Structure | Simple, flat dimensions | Normalized, hierarchical dimensions |
| Normalization | Denormalized | Normalized (3NF or higher) |
| Storage | More (data is redundant) | Less (no duplication) |
| Query performance | Faster (fewer joins) | Slower (more joins) |
| Maintenance | Easier — single dim to update | Harder — cascading updates |
| Query simplicity | Simpler — fewer tables | More complex SQL |
| Data integrity | Lower (denormalized risks) | Higher (enforced relationships) |
| Use case | OLAP, BI dashboards | Storage-constrained warehouses |
Concrete example
Star — product dimension as one table
ProductDim
ProductID │ ProductName │ CategoryID │ CategoryName │ BrandID │ BrandName
1 │ Laptop X │ 10 │ Electronics │ B1 │ TechCo
2 │ Phone Y │ 10 │ Electronics │ B2 │ MobileCo
3 │ Chair Z │ 20 │ Furniture │ B3 │ HomeCo
"Electronics" repeats for every electronic product. Fast to query, more storage.
Snowflake — product, category, brand as separate tables
ProductDim CategoryDim BrandDim
ProductID │ Name │ CatID │ BrandID CatID │ CatName BrandID │ BrandName
1 │ Laptop X │ 10 │ B1 10 │ Electronics B1 │ TechCo
2 │ Phone Y │ 10 │ B2 20 │ Furniture B2 │ MobileCo
3 │ Chair Z │ 20 │ B3 B3 │ HomeCo
"Electronics" stored once. Less storage, but every query needs 2-3 more joins.
Query complexity comparison
Star — one join
SELECT p.CategoryName,
SUM(s.Amount)
FROM SalesFact s
JOIN ProductDim p
ON s.ProductID = p.ProductID
GROUP BY p.CategoryName;
SUM(s.Amount)
FROM SalesFact s
JOIN ProductDim p
ON s.ProductID = p.ProductID
GROUP BY p.CategoryName;
Snowflake — two joins
SELECT c.CatName,
SUM(s.Amount)
FROM SalesFact s
JOIN ProductDim p ON s.ProductID = p.ProductID
JOIN CategoryDim c ON p.CatID = c.CatID
GROUP BY c.CatName;
SUM(s.Amount)
FROM SalesFact s
JOIN ProductDim p ON s.ProductID = p.ProductID
JOIN CategoryDim c ON p.CatID = c.CatID
GROUP BY c.CatName;
When to pick which
Pick Star when…
- Query performance is the top priority
- You're building dashboards and BI reports
- Storage is cheap (typical cloud warehouse)
- Your team prefers simpler SQL
Pick Snowflake when…
- Storage is genuinely constrained
- Dimensions are huge with lots of repeated text
- Data integrity rules are critical
- You're integrating with normalized source systems
The 90% answer
For most modern cloud data warehouses (Snowflake, BigQuery, Redshift), star schemas win. Storage is cheap; query speed matters more. Use snowflake only when you have a specific reason to normalize a particular dimension.