The data engineering interview has changed significantly over the last two years. Cloud-native architectures, the rise of the analytics engineer, and new tools like dbt, Databricks, and Airflow have shifted what hiring managers actually test for.

This is a guide based on reviewing hundreds of real data engineering interview loops, from FAANG to startups to Indian IT services companies. Here's what they actually ask in 2026.

Round 1: SQL, still the gatekeeper

SQL remains non-negotiable. You will write queries in a shared editor, not just describe them. The topics that appear in over 80% of interviews:

Window functions (asked in almost every interview)

SQL: the canonical interview question
-- Find the second-highest salary in each department
SELECT department, name, salary
FROM (
  SELECT department, name, salary,
         DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr
  FROM   employees
) ranked
WHERE  dr = 2;

Know: ROW_NUMBER vs RANK vs DENSE_RANK. Running totals with SUM OVER. LAG and LEAD. PARTITION BY vs no partition. Practice at least 5 window function problems before any interview.
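To make the ROW_NUMBER vs RANK vs DENSE_RANK distinction concrete, here is a minimal sketch that also shows a running total and LAG. The table name employees_demo and its rows are made up for illustration. It should run on PostgreSQL or a recent SQLite; other dialects may need small syntax changes.

```sql
-- Hypothetical sample data, assumed purely for illustration.
CREATE TABLE employees_demo (
  name       VARCHAR(50),
  department VARCHAR(50),
  salary     INTEGER
);

INSERT INTO employees_demo (name, department, salary) VALUES
  ('Asha',  'Eng', 120),
  ('Bilal', 'Eng', 120),   -- ties with Asha on salary
  ('Chen',  'Eng', 100),
  ('Dana',  'Ops',  90),
  ('Eli',   'Ops',  80);

SELECT department, name, salary,
       -- ROW_NUMBER: always unique; name is added as a tie-breaker
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, name) AS rn,
       -- RANK: ties share a rank, and the next rank is skipped
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC)       AS rnk,
       -- DENSE_RANK: ties share a rank, and no rank is skipped
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC)       AS dr,
       -- Running total within each department, highest salary first
       SUM(salary)  OVER (PARTITION BY department ORDER BY salary DESC, name
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)   AS running_total,
       -- LAG: the previous row's salary in the same ordering (NULL on the first row)
       LAG(salary)  OVER (PARTITION BY department ORDER BY salary DESC, name) AS prev_salary
FROM   employees_demo
ORDER  BY department, salary DESC, name;
```

On this data, Asha and Bilal tie at 120: ROW_NUMBER gives them 1 and 2, while RANK and DENSE_RANK both give them 1. Chen then gets RANK 3 but DENSE_RANK 2, which is why the second-highest-salary query above uses DENSE_RANK.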

Aggregation with HAVING

Know the difference between WHERE (filters before grouping) and HAVING (filters groups after aggregation). Know that aggregates like SUM() and COUNT() can't appear in WHERE.
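A small sketch of the two filters working together. The orders_demo table and its rows are hypothetical, and the threshold of 100 is arbitrary.

```sql
-- Hypothetical sample data, assumed purely for illustration.
CREATE TABLE orders_demo (
  order_id INTEGER,
  customer VARCHAR(50),
  status   VARCHAR(20),
  amount   INTEGER
);

INSERT INTO orders_demo (order_id, customer, status, amount) VALUES
  (1, 'a', 'completed',  80),
  (2, 'a', 'completed',  50),
  (3, 'a', 'cancelled', 500),
  (4, 'b', 'completed',  60),
  (5, 'b', 'cancelled',  70);

SELECT   customer,
         COUNT(*)    AS completed_orders,
         SUM(amount) AS total_amount
FROM     orders_demo
WHERE    status = 'completed'     -- row filter: applied before grouping
GROUP BY customer
HAVING   SUM(amount) > 100;       -- group filter: applied after aggregation
```

Only customer a comes back, with 2 completed orders totalling 130. The cancelled rows are dropped by WHERE before any sums are computed, and customer b's group (total 60) is dropped by HAVING afterwards.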

CTEs and subqueries

Be able to write a multi-step analysis as a clean CTE chain. Interviewers use this to test whether you can break complex problems into readable steps.
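As an illustration of what a readable CTE chain might look like, the sketch below finds region-months where revenue fell compared with the previous month. The sales_demo table, its columns, and its rows are hypothetical; the month is stored as a 'YYYY-MM' string to keep the example free of dialect-specific date functions.

```sql
-- Hypothetical sample data, assumed purely for illustration.
CREATE TABLE sales_demo (
  region     VARCHAR(20),
  sale_month CHAR(7),      -- 'YYYY-MM', sorts correctly as text
  amount     INTEGER
);

INSERT INTO sales_demo (region, sale_month, amount) VALUES
  ('North', '2026-01', 100),
  ('North', '2026-01',  50),
  ('North', '2026-02', 120),
  ('South', '2026-01',  80),
  ('South', '2026-02',  90);

WITH monthly AS (
  -- Step 1: roll individual sales up to one row per region and month
  SELECT region, sale_month, SUM(amount) AS revenue
  FROM   sales_demo
  GROUP  BY region, sale_month
),
with_prev AS (
  -- Step 2: attach the previous month's revenue for the same region
  SELECT region, sale_month, revenue,
         LAG(revenue) OVER (PARTITION BY region ORDER BY sale_month) AS prev_revenue
  FROM   monthly
)
-- Step 3: keep only the months where revenue dropped
SELECT region, sale_month, revenue, prev_revenue,
       revenue - prev_revenue AS change
FROM   with_prev
WHERE  prev_revenue IS NOT NULL
  AND  revenue < prev_revenue;
```

This returns a single row: North, 2026-02, revenue 120 against 150 the month before, a change of -30. Each CTE does one thing and has a name that says what it is, which is the habit interviewers are checking for.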

Round 2: Data modeling

At enterprise companies (banks, telcos, healthcare, IT services), this is often the most heavily weighted round. Topics:

Round 3: Pipeline design

This is a whiteboard/discussion round where you design an end-to-end data pipeline for a given scenario. The format: "We get raw sales data from an API every hour. Design a pipeline to make it available in our warehouse for BI reporting."

What they're looking for:

THE REAL TEST: They don't care which specific tools you name. They care that you think systematically about failure modes, data quality, and scalability. A candidate who says "I'd add checks to ensure the row count is within 10% of yesterday's" is more impressive than one who lists every AWS service.
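The row-count check in that quote can be sketched in a few lines of SQL. This assumes a hypothetical load_audit_demo table with exactly one row per daily load and no missing days; a real pipeline would need to handle gaps and would usually run the check from its orchestrator.

```sql
-- Hypothetical audit table: one row per daily load (an assumption for this sketch).
CREATE TABLE load_audit_demo (
  load_date DATE,
  row_count INTEGER
);

INSERT INTO load_audit_demo (load_date, row_count) VALUES
  ('2026-01-14', 10000),
  ('2026-01-15',  8700);

WITH counts AS (
  -- Pair each day's row count with the previous day's
  SELECT load_date, row_count,
         LAG(row_count) OVER (ORDER BY load_date) AS prev_count
  FROM   load_audit_demo
)
SELECT load_date, row_count, prev_count,
       -- |today - yesterday| <= 10% of yesterday, written without division
       CASE WHEN ABS(row_count - prev_count) * 10 <= prev_count
            THEN 'PASS' ELSE 'FAIL'
       END AS check_status
FROM   counts
WHERE  prev_count IS NOT NULL;
```

Here the count falls from 10,000 to 8,700, a 13% drop, so the check reports FAIL. In an interview, the follow-up worth having ready is what happens next: block the load, alert someone, or load the data and flag it.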

Round 4: Cloud and infrastructure

Know one cloud platform well (AWS, GCP, or Azure). Know:

The behavioural questions most candidates underprepare

These matter more than most candidates realize. Prepare a specific story for each:

A realistic 30-day prep plan

Weeks 1–2: SQL. Do all 14 SQL practice problems on this site. Then do 20–30 more on LeetCode/HackerRank. Focus on window functions and CTEs.

Weeks 2–3: Data modeling. Read every lesson in the Data Modeling section. Build an actual model in ERWIN or dbdiagram.io. Write the DDL by hand too.

Week 4: Mock interviews. Get a friend or use an AI to interview you with the questions above. Saying your answers aloud matters: it reveals gaps that reading doesn't.

THE MOST COMMON MISS: Candidates prepare hard for SQL and data modeling but walk into the pipeline design round unprepared. Practice describing a pipeline end-to-end out loud at least 3 times before your interview.