Interview Questions

The questions interviewers actually ask — with model answers you can adapt.

These are the questions that come up most often in data engineer, data architect, and ETL developer interviews when ERWIN appears on the JD.

Q1. What's the difference between conceptual, logical, and physical data models?

Answer:

The Conceptual model shows just entities and relationships — no attributes or keys. It's for business stakeholders to agree on what the business cares about.
The Logical model adds attributes, primary keys, foreign keys, and cardinality. It's database-agnostic and typically normalized to 3NF.
The Physical model targets a specific database — tables, columns with vendor data types, indexes, tablespaces, constraints, partitioning. It's what generates the actual DDL.

I move from one to the next, validating with the appropriate audience: conceptual with business, logical with data architects, physical with DBAs.

Q2. What's the difference between an identifying and non-identifying relationship?

Answer: An identifying relationship means the child entity cannot exist without the parent — the parent's PK becomes part of the child's PK. ERWIN draws this as a solid line. Example: OrderLine cannot exist without an Order.

A non-identifying relationship means the child can exist independently and just references the parent. The parent's PK appears in the child as a regular foreign key, not part of the PK. ERWIN draws this as a dashed line. Example: an Order references a Customer, but the Order has its own independent OrderID as its PK.

Most foreign keys are non-identifying. Use identifying only for true weak entities.

Q3. What is FK migration in ERWIN?

Answer: When you draw a relationship between two entities, ERWIN automatically propagates ("migrates") the parent's primary key into the child as a foreign key. You don't have to add it manually. If you later rename the PK, the FK renames everywhere. If you move the relationship to point at a different parent, the old FK is removed and the new one appears. This automatic propagation is one of ERWIN's biggest productivity wins over hand-drawn ER diagrams.

Q4. Walk me through how you'd model an e-commerce platform.

Answer: I'd identify the core entities first — Customer, Order, Product, Category, Address — and the major relationships. Customer places many Orders. Order contains many Products (resolved with an OrderLine bridge table because it's many-to-many). Product belongs to one Category. Order ships to one Address.

For the logical model I'd add a surrogate primary key to each entity, attributes for the business data, and audit columns (created_at, updated_at) everywhere. I'd normalize to 3NF — Category in its own table, Address in its own table not inline on Customer.

For the physical model I'd pick the target database (say PostgreSQL), set exact data types (BIGINT for IDs, NUMERIC(12,2) for money, TIMESTAMPTZ for audit columns), add indexes on every foreign key plus business-critical query columns, and set up CHECK constraints for enums like order_status.

Q5. What's a surrogate key and why use one?

Answer: A surrogate key is an artificially generated identifier — usually an auto-incrementing BIGINT or a UUID — that has no business meaning. I use surrogate keys as primary keys for every dimension and fact table for several reasons:

Performance — joining on a single 8-byte integer is faster than joining on a composite key or a long string
Stability — natural keys like email or SSN can change; surrogate keys never do
SCD Type 2 support — you can have multiple "versions" of the same customer (different addresses over time), each with a unique surrogate key but the same business identifier
Security — surrogate keys don't leak business information

I still store the natural key as a UNIQUE column, just not as the PK.

Q6. How does ERWIN handle many-to-many relationships?

Answer: In the conceptual model you can draw a direct M:N relationship between two entities. But relational databases can't store M:N directly. When you move to the logical model, ERWIN resolves this by creating an associative entity (bridge table) between the two original entities. The bridge holds foreign keys to both parents plus any attributes specific to the relationship — for example, an Order-Product bridge called OrderLine that also stores quantity, unit_price, and line_total.

The bridge typically gets its own surrogate primary key rather than a composite of the two FKs — single-column PKs are faster to join on.

Q7. What is forward engineering vs reverse engineering?

Answer:

Forward engineering generates SQL DDL (CREATE TABLE, CREATE INDEX, etc.) from your ERWIN model so you can deploy it to a database. You configure options like "include drops", "include indexes", "include comments", then either save to a .sql file or push directly to a connected database.
Reverse engineering goes the opposite way — ERWIN reads an existing database via JDBC/ODBC and builds a model from the schema. Useful when inheriting a legacy system without a model, documenting a vendor database, or syncing direct-database changes back into the model.

The Complete Compare feature lets you do both directions to keep model and database in sync over time.

Q8. What's the difference between a star schema and a snowflake schema?

Answer: Both are data warehouse patterns with a central fact table surrounded by dimensions, but they differ in how the dimensions are structured.

Star schema: dimensions are denormalized — each dimension is one flat table. Product dimension has all product info including category and brand directly. Simpler to query, fewer joins, larger storage.
Snowflake schema: dimensions are normalized — broken into multiple related tables. Product → Category → Department, where each level is its own table. More joins, less storage, easier to maintain when reference data changes.

Most modern cloud warehouses prefer star because storage is cheap and query performance matters more. I'd default to star unless there's a specific reason to normalize a dimension.

Q9. What are Slowly Changing Dimensions and which type would you use?

Answer: Slowly Changing Dimensions (SCDs) handle how dimension data changes over time.

Type 1 overwrites the old value — no history. Use for corrections or attributes where history doesn't matter.
Type 2 adds a new row with effective dates and an is_current flag — full history. Use when historical analysis matters (most common).
Type 3 adds a "previous value" column — limited history. Rarely used.

For customer addresses in an e-commerce warehouse, I'd use Type 2 — when a customer moves, I want orders from before the move attributed to the old address for tax and reporting purposes. ERWIN supports modeling Type 2 with effective_from, effective_to, and is_current columns plus a surrogate key separate from the natural customer ID.

Q10. How do you handle data type changes in production with ERWIN?

Answer: I change the data type in the model, then use ERWIN's Complete Compare feature against the production database. ERWIN generates the ALTER TABLE statement showing the diff. I review it manually — type changes can be lossy (NUMERIC(10,2) to NUMERIC(8,2) may truncate values) — and decide whether to apply directly or stage the change with a data migration step.

For risky changes (anything that could lose data), I do a four-step migration: (1) add new column, (2) backfill with a transformation, (3) deploy code to use the new column, (4) drop the old column. The model reflects the end state but the migration scripts handle the intermediate steps.

Q11. What naming conventions do you follow?

Answer: I configure ERWIN's naming standards to enforce consistency:

Tables singular, snake_case ("customer", not "Customers")
Primary keys: {table}_id
Foreign keys: same name as the referenced PK
Booleans: is_* or has_* prefix
Timestamps: _at suffix; dates: _date suffix
Indexes: idx_{table}_{columns}
Foreign key constraints: fk_{child}_{parent}

Then I apply the standard across the whole model with one command. Consistent names make queries readable and reduce mistakes.

Q12. What's the difference between an ERWIN domain and a data type?

Answer: A data type is the database-level type — VARCHAR(50), INTEGER, NUMERIC(12,2). A domain is an ERWIN-level abstraction that wraps a data type with additional semantics — a name, default value, validation rule, and definition.

For example, I might define a domain called "money" that maps to NUMERIC(12,2) NOT NULL CHECK (value >= 0). Every "price", "amount", "total" column references this domain. If the business later decides money needs more precision, I change the domain in one place and every column updates.

Domains are essentially type aliases with policy — they enforce consistency and make changes easy.

Q13. Have you used ERWIN's subject area feature?

Answer: Yes — subject areas are how I organize large models. A single diagram with 100+ entities is unreadable. I create separate subject area diagrams for each business domain (Customer Management, Catalog, Order Management, Inventory, etc.). An entity can appear in multiple subject areas — useful for "boundary" entities like Customer that touch several domains.

The Model Explorer shows all subject areas, and I can navigate from a subject area diagram to any individual entity. This is how enterprise teams manage models with 500+ tables without going insane.

Q14. How do you version control ERWIN models?

Answer: I save models in XML format rather than the binary .erwin format. XML diffs cleanly in Git — you can see exactly which entities or columns changed in a pull request. Binary .erwin files just show "binary file changed" in code reviews, which is useless.

I commit at meaningful checkpoints (typically aligned with ticket numbers): "MODEL-1234: Add audit columns to order table". For team workflows with simultaneous edits, I'd use ERWIN Workgroup edition which has built-in check-in/check-out, but for solo or small-team work, Git + XML is sufficient.

Q15. What are the common pitfalls you watch out for in data modeling?

Answer: A few I've learned the hard way:

Storing comma-separated values in a single column — always normalize to a child table
Magic numbers as status codes — use a lookup table or readable strings
Mixing concerns in one column — currency and amount in one VARCHAR, for example
Forgetting indexes on foreign key columns — every FK needs an index on it for join performance
Using FLOAT for money — rounding errors compound; always use NUMERIC with explicit precision
Not adding audit columns — created_at, updated_at on every business table from day one
Using natural keys as PKs — they can change; surrogate keys are stable
Skipping definitions on entities — future-you needs to know what each table means

Q16. Tell me about a time you had to redesign a data model.

Answer (template — customize to your experience): "Early in a project we had Customer storing primary_address as inline columns — street, city, state, zip. Then product asked for support for multiple shipping addresses, then billing addresses, then preferred address types. The inline-columns design wasn't going to scale.

I redesigned: created an Address table with a surrogate key, moved all address columns out of Customer, added a customer_address bridge table with address_type ('shipping', 'billing'), is_default, and effective dates. Then I used ERWIN's forward engineering to generate the ALTER scripts and a separate Python migration to backfill existing data into the new structure. We deployed in three stages — add new tables, backfill, then deprecate old columns — to avoid downtime.

The lesson: model the relationship even for cases that 'look' 1:1 today, because business requirements often discover the M:N later."

How to use these Read each question, then close your eyes and try to answer it out loud before reading the model answer. Hearing yourself say the words is what gets you ready for the actual interview — reading silently doesn't.

What this section pairs with The Data Modeling concepts section covers the theory behind these answers in more depth — fact vs dimension, normalization, SCDs, denormalization. Walk through both this and the concepts section before any data engineering interview.

← Best Practices