Modeling Changing Addresses
A common interview problem — when an attribute changes over time, how do you preserve history?
The interview question
"You have customer data where attributes like address change frequently. How would you model this?"
The scenario
You have a customer named John who lives at 123 Main Street. Three years later, he moves to 456 Oak Avenue. The business needs to know:
- Where does John live today?
- Where did John live in July 2022?
- Which sales happened while he lived at the old address?
This is a classic Type 2 Slowly Changing Dimension scenario.
The solution
Add columns for date ranges and a "current" flag. Add a surrogate key so each version of the customer has a unique identifier.
CustomerDimension
├── CustomerSK (PK) -- surrogate key — unique per VERSION
├── CustomerID -- natural key — same for all versions of one person
├── CustomerName
├── Address
├── City
├── State
├── ZipCode
├── EffectiveDate -- when this version started being valid
├── EndDate -- when this version stopped being valid
└── IsCurrent -- 'Y'/'N' flag for fast filtering
How the data looks
CustomerSK │ CustomerID │ Address │ EffectiveDate │ EndDate │ IsCurrent
1001 │ C001 │ 123 Main │ 2020-01-01 │ 2023-06-30 │ N
1002 │ C001 │ 456 Oak │ 2023-07-01 │ 9999-12-31 │ Y
Same customer (CustomerID = C001), two versions (CustomerSK 1001, 1002).
Sales recorded between 2020 and 2023 reference CustomerSK 1001.
Sales after July 2023 reference CustomerSK 1002.
Querying the dimension
"Where does John live now?"
SQL
SELECT Address, City, State
FROM CustomerDimension
WHERE CustomerID = 'C001'
AND IsCurrent = 'Y';
"Where did John live on July 1, 2022?"
SQL
SELECT Address, City, State
FROM CustomerDimension
WHERE CustomerID = 'C001'
AND '2022-07-01' BETWEEN EffectiveDate AND EndDate;
"Total sales per city for 2022"
Joining sales to the dimension automatically picks up the address valid at the time of each sale:
SQL
SELECT c.City,
SUM(s.Amount) AS TotalSales
FROM SalesFact s
JOIN CustomerDimension c ON s.CustomerSK = c.CustomerSK -- joins on the historical version
WHERE YEAR(s.SaleDate) = 2022
GROUP BY c.City;
The ETL process
When a customer's address changes, your nightly ETL job:
- Close the old row — set
EndDate = yesterday,IsCurrent = 'N' - Insert a new row — generate new CustomerSK, set
EffectiveDate = today,EndDate = '9999-12-31',IsCurrent = 'Y'
SQL — SCD Type 2 update
-- Step 1: close out the old version
UPDATE CustomerDimension
SET EndDate = CURRENT_DATE - 1,
IsCurrent = 'N'
WHERE CustomerID = 'C001'
AND IsCurrent = 'Y';
-- Step 2: insert the new version
INSERT INTO CustomerDimension
(CustomerID, CustomerName, Address, City, EffectiveDate, EndDate, IsCurrent)
VALUES
('C001', 'John Smith', '456 Oak Ave', 'Mumbai', CURRENT_DATE, '9999-12-31', 'Y');
Why this design wins
- Historical accuracy — 2022 reports still show John in Bangalore
- Current view is fast —
WHERE IsCurrent = 'Y'is a simple index lookup - Point-in-time queries work — date range BETWEEN is straightforward
- No data loss — every state of every customer is preserved
Trade-offs to discuss
- Dimension table grows over time (one row per customer per change)
- ETL is more complex than a simple UPDATE
- Joins must use the surrogate key, not the natural key
- Reports must explicitly filter for
IsCurrent = 'Y'or by date range
The structured answer for interviews
"I'd use a Type 2 Slowly Changing Dimension. Each customer change creates a new row with a unique surrogate key (CustomerSK) — the natural key (CustomerID) stays the same to identify the real person. EffectiveDate, EndDate, and IsCurrent columns let you find the current version, a specific historical version, or join sales to the right historical state."