Both star schema and snowflake schema organize data warehouses around a central fact table surrounded by dimension tables. The difference is what happens to those dimensions โ and that difference has real consequences for query performance, ETL complexity, storage cost, and maintainability.
This post gives you the full picture: what each pattern actually does, the technical trade-offs, which cloud warehouses favor which, and the interview answers that demonstrate you've used both in production.
What is a star schema?
In a star schema, each dimension is a single, denormalized table. All the attributes of a dimension โ including what might logically belong to sub-entities โ live in one flat table. The diagram looks like a star: fact table in the center, dimension tables radiating outward.
dim_customer
โโโโโโโโโโโโโ
customer_key
customer_name
city
state โ state and country are IN the same table as city
country
โ
โ
fact_sales โโโโโโโโโโโโโโโโโโโ dim_product
โโโโโโโโโโโ โ โโโโโโโโโโโโโโ
sales_key โ product_key
customer_key โโโโโโ product_name
product_key โโโโโโโโโโโโโโโโโโบ category_name โ category is in same table
time_key โโโโโโโโโโโโโโโโโโบ dim_time
store_key โโโโโโโโโโโโโโโโโโบ dim_store
revenue
units_sold
Notice that dim_customer contains city, state, AND country in one flat table โ even though in a relational DB you'd normalize these into a separate geography table. Star schema intentionally denormalizes for query speed.
What is a snowflake schema?
In a snowflake schema, dimension tables are normalized. Large dimensions are broken into multiple related tables. The diagram looks like a snowflake: the fact table connects to dimensions, which connect to sub-dimensions.
dim_city dim_state dim_country
โโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโโโโโ
city_key โโโโบ state_key โโโโบ country_key
city_name state_name country_name
country_code
dim_customer
โโโโโโโโโโโโโโ
customer_key
customer_name
city_key โโโโบ dim_city (which links to state, then country)
Query performance: star wins almost always
A typical warehouse query joins the fact table to 3-5 dimensions. With star schema, that's 3-5 joins โ one per dimension. With snowflake schema, each normalized dimension adds more joins โ a 3-level geography hierarchy means 3 additional joins just to get country from a customer.
| Aspect | Star | Snowflake |
|---|---|---|
| Joins per query | Fewer (one per dimension) | More (multiple per dimension chain) |
| Query speed | Faster for typical BI queries | Slower (more joins, more I/O) |
| Storage | More (duplicate values in dims) | Less (normalized, no duplicates) |
| ETL complexity | Simpler (one table per dimension) | More complex (maintain relationships) |
| Maintenance | Harder when reference data changes | Easier (update once) |
| BI tool compatibility | Excellent โ tools expect star | Good, but requires more configuration |
Storage: snowflake wins but less than you'd think
Snowflake schema stores category names once; star schema stores them once per product row. But modern cloud warehouses use columnar compression โ repeating string values compress extremely well. In practice, the storage difference between star and snowflake is often less than 5-10% in a typical warehouse with modern columnar storage.
Which cloud warehouses favor which?
Snowflake (the warehouse, not the schema) โ optimized for both patterns, but star schema queries typically execute faster due to fewer joins.
Amazon Redshift โ star schema is the documented best practice. Redshift's distribution keys and sort keys are designed around the star pattern.
Google BigQuery โ denormalized tables (star schema or even wider fact tables) with nested/repeated fields outperform normalized schemas. BigQuery's pricing is per bytes scanned, not per join.
Databricks Delta Lake โ either works. Delta's columnar format + caching makes star queries very fast.
When to use snowflake
Use snowflake (normalized dimensions) when:
- Dimension tables are very large (millions of rows) and reference data changes frequently
- Storage is genuinely constrained (on-prem, limited disk)
- Your ETL team has strong relational DB experience and finds normalized dims easier to maintain
- You're implementing a Data Vault methodology
Interview model answer
"Star schema denormalizes dimensions into flat tables, which means fewer joins and faster BI queries at the cost of some storage redundancy. Snowflake normalizes dimensions into hierarchies, which saves storage but adds joins. For most cloud DWH implementations in 2026, I default to star schema โ storage is cheap, BI tools expect it, and query performance is better. I'd switch to snowflake only if dimension tables were very large and frequently updated, or if we were on-prem with strict storage constraints."