🛠 Learn ERWIN

Modeling Changing Addresses

A common interview problem — when an attribute changes over time, how do you preserve history?

The interview question "You have customer data where attributes like address change frequently. How would you model this?"

The scenario

You have a customer named John who lives at 123 Main Street. Three years later, he moves to 456 Oak Avenue. The business needs to know:

This is a classic Type 2 Slowly Changing Dimension scenario.

The solution

Add columns for date ranges and a "current" flag. Add a surrogate key so each version of the customer has a unique identifier.

CustomerDimension ├── CustomerSK (PK) -- surrogate key — unique per VERSION ├── CustomerID -- natural key — same for all versions of one person ├── CustomerName ├── Address ├── City ├── State ├── ZipCode ├── EffectiveDate -- when this version started being valid ├── EndDate -- when this version stopped being valid └── IsCurrent -- 'Y'/'N' flag for fast filtering

How the data looks

CustomerSK │ CustomerID │ Address │ EffectiveDate │ EndDate │ IsCurrent 1001C001 │ 123 Main │ 2020-01-01 │ 2023-06-30 │ N 1002C001 │ 456 Oak │ 2023-07-01 │ 9999-12-31 │ Y Same customer (CustomerID = C001), two versions (CustomerSK 1001, 1002). Sales recorded between 2020 and 2023 reference CustomerSK 1001. Sales after July 2023 reference CustomerSK 1002.

Querying the dimension

"Where does John live now?"

SQL
SELECT Address, City, State FROM CustomerDimension WHERE CustomerID = 'C001' AND IsCurrent = 'Y';

"Where did John live on July 1, 2022?"

SQL
SELECT Address, City, State FROM CustomerDimension WHERE CustomerID = 'C001' AND '2022-07-01' BETWEEN EffectiveDate AND EndDate;

"Total sales per city for 2022"

Joining sales to the dimension automatically picks up the address valid at the time of each sale:

SQL
SELECT c.City, SUM(s.Amount) AS TotalSales FROM SalesFact s JOIN CustomerDimension c ON s.CustomerSK = c.CustomerSK -- joins on the historical version WHERE YEAR(s.SaleDate) = 2022 GROUP BY c.City;

The ETL process

When a customer's address changes, your nightly ETL job:

  1. Close the old row — set EndDate = yesterday, IsCurrent = 'N'
  2. Insert a new row — generate new CustomerSK, set EffectiveDate = today, EndDate = '9999-12-31', IsCurrent = 'Y'
SQL — SCD Type 2 update
-- Step 1: close out the old version UPDATE CustomerDimension SET EndDate = CURRENT_DATE - 1, IsCurrent = 'N' WHERE CustomerID = 'C001' AND IsCurrent = 'Y'; -- Step 2: insert the new version INSERT INTO CustomerDimension (CustomerID, CustomerName, Address, City, EffectiveDate, EndDate, IsCurrent) VALUES ('C001', 'John Smith', '456 Oak Ave', 'Mumbai', CURRENT_DATE, '9999-12-31', 'Y');

Why this design wins

Trade-offs to discuss

The structured answer for interviews "I'd use a Type 2 Slowly Changing Dimension. Each customer change creates a new row with a unique surrogate key (CustomerSK) — the natural key (CustomerID) stays the same to identify the real person. EffectiveDate, EndDate, and IsCurrent columns let you find the current version, a specific historical version, or join sales to the right historical state."