🏛 Data Modeling

3. Physical Data Model

Turn the logical design into a real database — exact data types, indexes, tablespaces, the works.

What is a Physical Data Model?

The Physical Data Model (PDM) is the database-specific implementation of your logical model. It contains:

The audience is DBAs and the database engine itself. This is what your forward-engineering will turn into CREATE TABLE statements.

Setting up the physical layer

1

Pick a target database

Choose from the dropdown. Common options:

  • Microsoft SQL Server 2019/2022
  • Oracle 19c/21c
  • PostgreSQL 14/15/16
  • MySQL 8
  • IBM Db2
  • Snowflake — for data warehouses
  • Amazon Redshift — for AWS warehouses

Pick the one matching your actual deployment target. For this walkthrough, choose PostgreSQL 15.

ERWIN now knows the data types and SQL dialect to use for generation.

2

Switch to Physical view

The canvas updates. Entities are now tables. Attributes are now columns. The names follow your physical naming convention (which is "same as logical with spaces removed" by default).

You might see "Customer ID" become "CustomerID" or "CUSTOMER_ID" depending on the default naming standard. We'll customize this in the next step.

3

Set up a naming standard

You'll get a dialog with rules. Common conventions:

  • Tables: snake_case singular ("customer") or plural ("customers")
  • Columns: snake_case ("customer_id", "first_name")
  • Indexes: idx_table_columns ("idx_orders_customer_id")
  • Foreign keys: fk_child_parent ("fk_orders_customer")
  • Primary keys: pk_table ("pk_customer")

Apply the standard:

All your tables and columns are renamed at once, consistently.

4

Refine column data types

The auto-mapped types are usually reasonable but rarely perfect. Open each table and verify:

customer table

  • customer_id → BIGINT (use BIGINT not INT for IDs — INT overflows at ~2.1B rows)
  • first_name → VARCHAR(50)
  • last_name → VARCHAR(50)
  • email → VARCHAR(254) (RFC max email length)
  • phone → VARCHAR(20)
  • created_date → TIMESTAMP WITH TIME ZONE (always with TZ)

order table

  • order_id → BIGINT
  • customer_id → BIGINT NOT NULL
  • shipping_address_id → BIGINT NOT NULL
  • order_date → TIMESTAMP WITH TIME ZONE NOT NULL
  • order_status → VARCHAR(20) NOT NULL
  • total_amount → NUMERIC(12,2) NOT NULL (cents matter)
Avoid FLOAT for money Never use FLOAT or REAL for money — they have rounding errors that compound over many transactions. Always use NUMERIC / DECIMAL with explicit precision and scale.
5

Add indexes

Indexes speed up reads but slow down writes. Add them strategically.

For each table, right-click → Indexes → New. Recommended indexes for our e-commerce model:

TableIndexColumnsWhy
customeridx_customer_emailemailLogin lookups
orderidx_order_customer_idcustomer_id"Show my orders"
orderidx_order_dateorder_date DESCRecent orders queries
order_lineidx_order_line_orderorder_id"Show this order's items"
order_lineidx_order_line_productproduct_id"Where did this product sell?"
productidx_product_skusku UNIQUESKU lookup is constant
productidx_product_categorycategory_id, activeBrowse by category

For each, set the Index Type (B-tree is default; PostgreSQL also has Hash, GIN, BRIN — use B-tree unless you have a specific reason).

6

Set defaults and check constraints

For each column where appropriate, set:

  • created_date DEFAULT CURRENT_TIMESTAMP
  • active on Product, DEFAULT TRUE
  • order_status CHECK order_status IN ('Pending','Shipped','Delivered','Cancelled')
  • quantity on Order Line, CHECK quantity > 0
  • unit_price on Order Line, CHECK unit_price >= 0

In Property Editor for each column → Default and Validation fields.

7

Configure surrogate-key generation

Your primary key columns need to auto-generate values. PostgreSQL uses GENERATED BY DEFAULT AS IDENTITY (or sequences):

For each PK column:

  • Open column properties
  • Set Identity = Yes
  • Or attach a sequence (Postgres way): right-click table → Sequences → New

SQL Server uses IDENTITY(1,1), MySQL uses AUTO_INCREMENT, Oracle uses sequences. ERWIN handles the dialect for you — you just say "Identity" and it generates the right syntax.

8

Set up tablespaces / file groups (optional)

For larger databases, you'll split tables across storage tiers:

Create tablespaces:

  • ts_data — for table data
  • ts_index — for indexes (often on faster storage)
  • ts_archive — for old partitions on cheap storage

Then assign each table/index to a tablespace in its properties. Skip this for development; do it for production-grade models.

9

Validate the physical model

The physical-mode validator catches:

  • Tables without indexes on foreign keys (FK lookups will be slow)
  • Tables without any indexes at all (full scans on every query)
  • Columns with type-precision mismatches
  • Naming convention violations
  • Missing tablespace assignments (if you use them)

Fix anything critical, save as ecommerce_physical.erwin.

Logical vs Physical — side by side

Logical (Customer entity)

Customer ├ PK Customer ID ├ First Name : String ├ Last Name : String ├ Email : String └ Created : Datetime

Physical (customer table)

customer ├ PK customer_id BIGINT IDENTITY ├ first_name VARCHAR(50) NOT NULL ├ last_name VARCHAR(50) NOT NULL ├ email VARCHAR(254) UNIQUE NOT NULL └ created_at TIMESTAMPTZ DEFAULT NOW()

What's next: generate the SQL

Your physical model now has everything needed to create a real database. In the next lesson, we'll use forward engineering to turn this model into a .sql file ready to run against PostgreSQL.

Talking point for interviews "I always design in three layers: Conceptual to align with business, Logical to capture the structure database-agnostically, then Physical to optimize for the specific database. Logical normalizes to 3NF; physical may denormalize selectively for read performance, especially in warehouse contexts."