The data engineering interview has changed significantly over the last two years. Cloud-native architectures, the rise of the analytics engineer, and tools such as dbt, Databricks, and Airflow have shifted what hiring managers actually test for.
This guide is based on a review of hundreds of real data engineering interview loops, from FAANG to startups to Indian IT services companies. Here's what they actually ask in 2026.
Round 1: SQL, still the gatekeeper
SQL remains non-negotiable. You will write queries in a shared editor, not just describe them. The topics that appear in over 80% of interviews:
Window functions (asked in almost every interview)
-- Find the second-highest salary in each department
SELECT department, name, salary
FROM (
    SELECT department, name, salary,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr
    FROM employees
) ranked
WHERE dr = 2;
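Know how ROW_NUMBER, RANK, and DENSE_RANK differ in the way they treat ties, how to build running totals with SUM() OVER, how LAG and LEAD reach into neighbouring rows, and what changes when you omit PARTITION BY. Practice at least 5 window function problems before any interview.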
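To see how the three ranking functions diverge on ties, here is a minimal sketch that runs against an in-memory SQLite database through Python's standard sqlite3 module. The sample rows are made up for illustration; the table and column names follow the query above. It assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data; two employees tie on the top salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("eng", "Asha", 300),
        ("eng", "Ben", 300),
        ("eng", "Chen", 200),
        ("eng", "Dev", 100),
    ],
)

rows = conn.execute(
    """
    SELECT name, salary,
           -- ROW_NUMBER needs a tie-breaker (name) to be deterministic
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, name) AS rn,
           -- RANK leaves a gap after ties; DENSE_RANK does not
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr,
           -- running total, accumulated row by row
           SUM(salary)  OVER (PARTITION BY department ORDER BY salary DESC, name
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
           -- salary of the previous row in the same ordering (NULL for the first row)
           LAG(salary)  OVER (PARTITION BY department ORDER BY salary DESC, name) AS prev_salary
    FROM employees
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

With the tied top salaries, RANK goes 1, 1, 3, 4 while DENSE_RANK goes 1, 1, 2, 3, which is why the second-highest-salary query above uses DENSE_RANK.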
Aggregation with HAVING
Know the difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation). Know that aggregates like SUM() and COUNT() can't appear directly in WHERE, because WHERE is evaluated before the groups exist.
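A small sketch of the distinction, again using an in-memory SQLite database from Python. The orders table and its rows are hypothetical, chosen so that the row filter and the group filter each remove something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("acme", "paid", 500),
        ("acme", "paid", 700),
        ("acme", "cancelled", 900),
        ("bolt", "paid", 300),
        ("bolt", "paid", 200),
    ],
)

big_customers = conn.execute(
    """
    SELECT customer, SUM(amount) AS total_paid
    FROM orders
    WHERE status = 'paid'        -- row filter: runs before grouping
    GROUP BY customer
    HAVING SUM(amount) > 1000    -- group filter: runs after aggregation
    ORDER BY customer
    """
).fetchall()
print(big_customers)

# Putting the aggregate in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT customer FROM orders WHERE SUM(amount) > 1000 GROUP BY customer"
    )
    aggregate_in_where_allowed = True
except sqlite3.Error:
    aggregate_in_where_allowed = False
print(aggregate_in_where_allowed)
```

The cancelled order is dropped by WHERE before any sum is computed, so only paid amounts count toward the HAVING threshold.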
CTEs and subqueries
Be able to write a multi-step analysis as a clean CTE chain. Interviewers use this to test whether you can break complex problems into readable steps.
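As an illustration of what a clean CTE chain can look like, the sketch below computes month-over-month revenue change per region in three named steps. The sales table and its data are invented for the example, and it assumes SQLite 3.25 or later for the LAG call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "2026-01", 100),
        ("north", "2026-01", 50),
        ("north", "2026-02", 300),
        ("south", "2026-01", 200),
        ("south", "2026-02", 100),
    ],
)

query = """
WITH monthly AS (        -- step 1: one row per region and month
    SELECT region, month, SUM(amount) AS revenue
    FROM sales
    GROUP BY region, month
),
with_prev AS (           -- step 2: attach the previous month's revenue
    SELECT region, month, revenue,
           LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS prev_revenue
    FROM monthly
),
growth AS (              -- step 3: keep months that have a predecessor
    SELECT region, month, revenue - prev_revenue AS change
    FROM with_prev
    WHERE prev_revenue IS NOT NULL
)
SELECT region, month, change
FROM growth
ORDER BY region, month
"""

result = conn.execute(query).fetchall()
print(result)
```

Each CTE does one thing and has a name that says what it holds, which is the readability signal interviewers tend to look for.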
Round 2: Data modeling
At enterprise companies (banks, telcos, healthcare, IT services), this is often the most heavily weighted round. Topics:
- CDM vs LDM vs PDM (conceptual, logical, and physical data models): know the difference cold. See the CDM vs LDM vs PDM article.
- Star vs Snowflake schema: when to use each, trade-offs, your default choice
- Slowly Changing Dimensions: Type 1 (overwrite), Type 2 (versioned rows), Type 3 (previous-value column)
- Surrogate keys: why they're usually preferred over natural keys as PKs, and the cases where a natural key is still fine
- Normalization (1NF, 2NF, 3NF): know how to identify violations
- ERWIN / data modeling tools: if it's on the JD, you'll be asked to sketch an ER diagram live
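Of the topics above, SCD Type 2 is the one that is easiest to show in code. The following is a minimal sketch, not a production pattern: the dim_customer table, its columns, and the apply_scd2 helper are all hypothetical names, and it tracks a single attribute (city) with a surrogate key alongside the natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id TEXT NOT NULL,                      -- natural (business) key
        city        TEXT NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT,                               -- NULL means still open
        is_current  INTEGER NOT NULL
    )
    """
)

def apply_scd2(conn, customer_id, city, as_of):
    """Hypothetical helper: version the row only when the tracked attribute changes."""
    current = conn.execute(
        "SELECT customer_sk, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # nothing changed, keep the current version
    if current is not None:
        # Close the old version instead of overwriting it (that would be Type 1).
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE customer_sk = ?",
            (as_of, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, as_of),
    )

apply_scd2(conn, "C1", "Pune", "2026-01-01")
apply_scd2(conn, "C1", "Pune", "2026-02-01")   # unchanged, no new row
apply_scd2(conn, "C1", "Delhi", "2026-03-01")  # change, new version

history = conn.execute(
    "SELECT customer_sk, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
for row in history:
    print(row)
```

The same customer_id now appears on two rows with different surrogate keys, which is one reason fact tables in this design join on the surrogate key rather than the natural key.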
Round 3: Pipeline design
This is a whiteboard/discussion round where you design an end-to-end data pipeline for a given scenario. The format: "We get raw sales data from an API every hour. Design a pipeline to make it available in our warehouse for BI reporting."
What they're looking for:
- Ingestion: how you pull from the source (API, Kafka, file drop), how you handle pagination, rate limits, schema changes
- Storage: raw → staging → curated layer (bronze/silver/gold in medallion architecture)
- Transformation: batch (dbt, Spark) vs streaming (Flink, Spark Streaming)
- Orchestration: Airflow DAG design, dependency management, failure handling
- Data quality: how you validate data at each stage, what happens when validation fails
- Monitoring: SLAs, alerting, lineage
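To make the ingestion, layering, and data-quality points concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: FAKE_PAGES stands in for the hourly sales API, the cursor-based pagination scheme is invented, and the validation rules are deliberately simple. A real pipeline would run these steps as orchestrated tasks and write each layer to storage.

```python
# Hypothetical stand-in for the sales API: cursor -> (records, next_cursor).
FAKE_PAGES = {
    None: ([{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": -5.0}], "p2"),
    "p2": ([{"order_id": 3, "amount": 80.0}, {"order_id": None, "amount": 10.0}], None),
}

def fetch_page(cursor):
    """Pretend API call; a real one would also handle rate limits and retries."""
    return FAKE_PAGES[cursor]

def ingest_all():
    """Raw (bronze) layer: follow the cursor until the API reports no next page."""
    records, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        records.extend(page)
        if cursor is None:
            return records

def validate(record):
    """Return a list of rule violations; an empty list means the record is clean."""
    errors = []
    if record.get("order_id") is None:
        errors.append("missing order_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("invalid amount")
    return errors

def to_staging(raw):
    """Staging (silver) layer: keep clean rows, quarantine the rest with reasons."""
    clean, quarantined = [], []
    for record in raw:
        errors = validate(record)
        if errors:
            quarantined.append({"record": record, "errors": errors})
        else:
            clean.append(record)
    return clean, quarantined

def curate(clean):
    """Curated (gold) layer: the aggregate a BI dashboard would read."""
    return {"order_count": len(clean), "revenue": sum(r["amount"] for r in clean)}

raw = ingest_all()
clean, quarantined = to_staging(raw)
summary = curate(clean)
print(summary)
print(quarantined)
```

Quarantining bad rows with their reasons, rather than dropping them or failing the whole run, is one defensible answer to "what happens when validation fails"; be ready to argue when failing fast is the better choice.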
Round 4: Cloud and infrastructure
Know one cloud platform well (AWS, GCP, or Azure). Within it, know:
- Object storage (S3/GCS/ADLS) as the foundation of modern lakes
- Your warehouse's architecture (Snowflake/BigQuery/Redshift) at least at the conceptual level
- IAM/permissions basics: how you'd grant read access to an analyst without exposing PII
- What containers/Docker are and why they matter for reproducible pipeline environments
- Git branching strategy for data engineering code
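For the PII question in the list above, one common pattern is to expose a view that omits the sensitive columns and grant analysts read access to the view only. The sketch below shows the view half of that idea. SQLite has no GRANT statement, so the permission step appears only as a comment and depends on your warehouse; the customers table, its columns, and the customers_analyst view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER, email TEXT, phone TEXT, country TEXT, lifetime_value INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a@example.com", "555-0101", "IN", 1200),
        (2, "b@example.com", "555-0102", "US", 800),
    ],
)

# The view carries only non-PII columns; email and phone never leave the base table.
conn.execute(
    """
    CREATE VIEW customers_analyst AS
    SELECT customer_id, country, lifetime_value
    FROM customers
    """
)
# Assumption: in a real warehouse you would now grant the analyst role SELECT
# on customers_analyst and no privileges on customers. The exact statement
# differs by platform, so it is not shown here.

cursor = conn.execute("SELECT * FROM customers_analyst ORDER BY customer_id")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

Column-level masking policies and row-level security are alternatives that some warehouses offer; knowing which of these your chosen platform supports is part of knowing it well.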
The behavioural questions most candidates underprepare for
These matter more than most candidates realize. Prepare a specific story for each:
- "Tell me about a data quality issue you caught (or didn't catch) in production."
- "Describe a time you had to redesign a data model after requirements changed."
- "How do you handle it when a business stakeholder wants something that isn't technically feasible?"
- "Walk me through how you prioritize work when you have three urgent requests at once."
- "Tell me about the most complex SQL query you've ever written. Why was it complex, and how did you solve it?"
A realistic 30-day prep plan
Weeks 1-2: SQL. Do all 14 SQL practice problems on this site. Then do 20-30 more on LeetCode/HackerRank. Focus on window functions and CTEs.
Weeks 2-3: Data modeling. Read every lesson in the Data Modeling section. Build an actual model in ERWIN or dbdiagram.io. Write the DDL by hand too.
Week 4: Mock interviews. Get a friend or use an AI to interview you with the questions above. Saying your answers aloud matters: it reveals gaps that reading doesn't.