🛠 Learn ERWIN

E-commerce Star Schema

A complete design for an e-commerce analytics warehouse — fact + dimensions, end to end.

The interview question "Design a data model for an e-commerce platform."

This is a common open-ended design question. Here's a complete star schema covering orders, customers, products, and dates — the kind of design that powers BI dashboards at any modern e-commerce company.

The big picture

DateDimension │ │ CustomerDim ────────► OrderFacts ◄──────── ProductDim │ │ │ │ │ │ LocationDim PaymentDim CategoryDimBrandDim

The fact table

OrderFacts -- the central transactional fact table ├── OrderFactSK (PK) -- surrogate key for the fact row ├── OrderID -- degenerate dimension (real business order #) ├── CustomerSK (FK) -- → CustomerDim ├── ProductSK (FK) -- → ProductDim ├── OrderDateSK (FK) -- → DateDim (when order was placed) ├── ShipDateSK (FK) -- → DateDim (when order shipped) ├── LocationSK (FK) -- → LocationDim (ship-to address) ├── PaymentSK (FK) -- → PaymentDim │ ├── Quantity -- measure ├── UnitPrice -- measure ├── Discount -- measure ├── Tax -- measure ├── ShippingCost -- measure └── OrderAmount -- measure (total revenue)

The dimensions

CustomerDimension (Type 2 SCD)

├── CustomerSK (PK) -- surrogate, unique per version ├── CustomerID -- natural key, stable across versions ├── CustomerName ├── Email ├── PhoneNumber ├── AddressSK (FK) -- → LocationDim ├── RegistrationDate ├── CustomerSegment -- e.g. "Premium", "Regular", "VIP" ├── EffectiveDate ├── EndDate └── IsCurrent

ProductDimension (Type 2 SCD)

├── ProductSK (PK) ├── ProductID -- SKU / natural key ├── ProductName ├── CategoryID (FK) -- → CategoryDim ├── CategoryName -- denormalized for performance ├── BrandID (FK) -- → BrandDim ├── BrandName -- denormalized ├── UnitPrice ├── EffectiveDate ├── EndDate └── IsCurrent

DateDimension

├── DateSK (PK) -- e.g. 20240115 ├── Date -- actual date value ├── Year -- 2024 ├── Quarter -- Q1 ├── Month -- 1 ├── MonthName -- January ├── DayOfMonth -- 15 ├── DayOfWeek -- Monday ├── Week -- 3 ├── IsWeekend -- 'N' └── IsHoliday -- 'Y'/'N'

LocationDimension

├── LocationSK (PK) ├── Address ├── City ├── State ├── PostalCode ├── Country └── Region -- e.g. "APAC", "EMEA"

Key design decisions

1. Separate location dimension for reusability

Both customer address and shipping address need location data. A reusable LocationDim avoids duplication in CustomerDim.

2. Product dimension tracks historical changes

Type 2 SCD on products means we can answer "what was this product's price in January 2023?" — useful for revenue attribution.

3. Multiple date references in the fact

OrderDateSK and ShipDateSK both point to DateDim. This is a role-playing dimension — same table, different roles in the fact.

4. Degenerate dimensions in the fact

OrderID stays in OrderFacts even though there's no OrderDim. It's a degenerate dimension — an attribute that exists only in the fact, not in any separate dimension. Useful for grouping fact rows back into their original business transactions.

5. Selective denormalization

CategoryName and BrandName are duplicated in ProductDim. This sacrifices some normalization purity for fewer joins on common queries. Star schemas explicitly trade storage for read speed.

6. Surrogate keys everywhere

Every dimension uses a surrogate key. This lets Type 2 SCDs work, makes joins fast, and decouples the warehouse from operational system ID changes.

Sample query: monthly revenue by category

SQL
SELECT d.Year, d.Month, p.CategoryName, SUM(f.OrderAmount) AS TotalRevenue, COUNT(DISTINCT f.OrderID) AS OrderCount FROM OrderFacts f JOIN DateDimension d ON f.OrderDateSK = d.DateSK JOIN ProductDimension p ON f.ProductSK = p.ProductSK WHERE d.Year = 2024 GROUP BY d.Year, d.Month, p.CategoryName ORDER BY d.Month, TotalRevenue DESC;

What to mention in the interview

  1. Identify grain first — what does one row in the fact represent? (Here: one order line item.)
  2. Choose surrogate keys — for SCD support and join performance.
  3. Pick SCD types — Type 2 for Customer and Product (history matters), Type 1 for things like email corrections.
  4. Handle role-playing dimensions — OrderDate and ShipDate both reference DateDim.
  5. Explain measures vs dimensions — measures are additive numbers; foreign keys point to descriptive context.
  6. Be ready for "how would you partition this?" — by OrderDate, monthly or yearly.
The structured walkthrough Start with the central question: "what business event are we measuring?" (orders). That's your fact. Then ask "what context describes those events?" (who, what, where, when) — those are your dimensions. Add surrogate keys, decide SCD types, and you have a star schema.