Star Schema vs Snowflake Schema: How to Choose and When Each Wins

Both star schema and snowflake schema organize data warehouses around a central fact table surrounded by dimension tables. The difference is what happens to those dimensions — and that difference has real consequences for query performance, ETL complexity, storage cost, and maintainability.

This post gives you the full picture: what each pattern actually does, the technical trade-offs, which cloud warehouses favor which, and the interview answers that demonstrate you've used both in production.

What is a star schema?

In a star schema, each dimension is a single, denormalized table. All the attributes of a dimension — including what might logically belong to sub-entities — live in one flat table. The diagram looks like a star: fact table in the center, dimension tables radiating outward.

Star schema — fact_sales and dimensions

              dim_customer
              ─────────────
              customer_key
              customer_name
              city
              state         ← state and country are IN the same table as city
              country
                    │
                    │
fact_sales ─────────────────── dim_product
───────────       │            ──────────────
sales_key         │            product_key
customer_key ─────┘            product_name
product_key  ─────────────────► category_name  ← category is in same table
time_key     ─────────────────► dim_time
store_key    ─────────────────► dim_store
revenue
units_sold

Notice that dim_customer contains city, state, AND country in one flat table — even though in a relational DB you'd normalize these into a separate geography table. Star schema intentionally denormalizes for query speed.

What is a snowflake schema?

In a snowflake schema, dimension tables are normalized. Large dimensions are broken into multiple related tables. The diagram looks like a snowflake: the fact table connects to dimensions, which connect to sub-dimensions.

Snowflake schema — dim_customer split

dim_city         dim_state        dim_country
──────────       ──────────       ──────────────
city_key    ───► state_key   ───► country_key
city_name        state_name       country_name
                                  country_code

dim_customer
──────────────
customer_key
customer_name
city_key     ───► dim_city (which links to state, then country)

Query performance: star wins almost always

A typical warehouse query joins the fact table to 3-5 dimensions. With star schema, that's 3-5 joins — one per dimension. With snowflake schema, each normalized dimension adds more joins — a 3-level geography hierarchy means 3 additional joins just to get country from a customer.

Aspect	Star	Snowflake
Joins per query	Fewer (one per dimension)	More (multiple per dimension chain)
Query speed	Faster for typical BI queries	Slower (more joins, more I/O)
Storage	More (duplicate values in dims)	Less (normalized, no duplicates)
ETL complexity	Simpler (one table per dimension)	More complex (maintain relationships)
Maintenance	Harder when reference data changes	Easier (update once)
BI tool compatibility	Excellent — tools expect star	Good, but requires more configuration

Storage: snowflake wins but less than you'd think

Snowflake schema stores category names once; star schema stores them once per product row. But modern cloud warehouses use columnar compression — repeating string values compress extremely well. In practice, the storage difference between star and snowflake is often less than 5-10% in a typical warehouse with modern columnar storage.

THE VERDICT ON STORAGE In 2026, with S3 storage at $0.023/GB-month and modern columnar compression, the storage argument for snowflake is weak. The query performance argument for star is strong. Default to star unless you have a specific reason not to.

Which cloud warehouses favor which?

Snowflake (the warehouse, not the schema) — optimized for both patterns, but star schema queries typically execute faster due to fewer joins.

Amazon Redshift — star schema is the documented best practice. Redshift's distribution keys and sort keys are designed around the star pattern.

Google BigQuery — denormalized tables (star schema or even wider fact tables) with nested/repeated fields outperform normalized schemas. BigQuery's pricing is per bytes scanned, not per join.

Databricks Delta Lake — either works. Delta's columnar format + caching makes star queries very fast.

When to use snowflake

Use snowflake (normalized dimensions) when:

Dimension tables are very large (millions of rows) and reference data changes frequently
Storage is genuinely constrained (on-prem, limited disk)
Your ETL team has strong relational DB experience and finds normalized dims easier to maintain
You're implementing a Data Vault methodology

Interview model answer

"Star schema denormalizes dimensions into flat tables, which means fewer joins and faster BI queries at the cost of some storage redundancy. Snowflake normalizes dimensions into hierarchies, which saves storage but adds joins. For most cloud DWH implementations in 2026, I default to star schema — storage is cheap, BI tools expect it, and query performance is better. I'd switch to snowflake only if dimension tables were very large and frequently updated, or if we were on-prem with strict storage constraints."

RELATED See the Star vs Snowflake lesson in the Data Modeling section for a hands-on walkthrough with schema diagrams and SQL examples.

What is a star schema?

What is a snowflake schema?

Query performance: star wins almost always

Storage: snowflake wins but less than you'd think

Which cloud warehouses favor which?

When to use snowflake

Interview model answer

Raman Sharma