SQL Server T-SQL SELECT Internals and Logical Query Processing (2026)
Every SQL Server developer has written a query that looked perfectly reasonable and still failed with a cryptic error — or worse, silently returned wrong results. Nine times out of ten, the culprit is a mismatch between how you wrote the query and how SQL Server evaluates it. Understanding the logical query processing order is the single most powerful mental model you can carry as a T-SQL developer.
This article walks you through every step SQL Server takes to evaluate a SELECT statement, explains why column aliases aren't visible in WHERE clauses, demystifies NULL's three-valued logic, and shows you how to rewrite common broken queries once you see the engine's perspective.
- Written Order vs Processing Order
- The 8-Step Logical Processing Order
- Step 1–2: FROM and JOIN Evaluation
- Step 3: WHERE Clause — Why Aliases Fail
- Step 4: GROUP BY
- Step 5: HAVING — Filtering Groups
- Step 6: SELECT — Expressions, DISTINCT, TOP
- Step 7: ORDER BY — The Alias Exception
- NULL Handling and Comparisons
- 3-Valued Logic: TRUE / FALSE / UNKNOWN
- Common Mistakes and How to Fix Them
- Subqueries and Correlated Subqueries
- Practical Rewrites: 5 Broken Queries Fixed
- Read Next
Written Order vs Processing Order
When you write a SELECT statement you naturally read it top to bottom: SELECT the columns, FROM the table, WHERE the condition, GROUP BY the groups, HAVING the group filter, ORDER BY the sort. That's the syntax order — the order the SQL parser needs so it can unambiguously read your intent.
The logical processing order is completely different. It defines the sequence in which each clause conceptually produces its virtual result table, which becomes the input for the next clause. The two orders are:
| Syntax (Written) Order | Logical Processing Order |
|---|---|
| 1. SELECT | 1. FROM |
| 2. FROM | 2. ON (join predicate) |
| 3. JOIN … ON | 3. JOIN (outer row addition) |
| 4. WHERE | 4. WHERE |
| 5. GROUP BY | 5. GROUP BY |
| 6. HAVING | 6. HAVING |
| 7. ORDER BY | 7. SELECT |
| 8. ORDER BY |
The 8-Step Logical Processing Order
Each step takes a virtual table as input and produces a virtual table as output. Only ORDER BY and TOP/FETCH NEXT actually produce a cursor rather than an unordered set. Here is a concise map of all eight steps:
- FROM — identifies source tables and applies cross joins, producing the cartesian product.
- ON — applies the join predicate, filtering rows from the cartesian product.
- JOIN — for outer joins, adds back the preserved (outer) rows with NULLs for unmatched columns.
- WHERE — filters rows based on a row-level predicate. Aggregates are not yet available.
- GROUP BY — collapses rows into groups; from this point only grouped columns and aggregate expressions are valid.
- HAVING — filters groups using aggregate predicates.
- SELECT — evaluates expressions, applies DISTINCT, computes window functions, assigns column aliases, applies TOP.
- ORDER BY — sorts the final result set. The only step that can reference SELECT aliases.
Logical order is a conceptual model. SQL Server's optimizer is free to rearrange physical operations for performance — but the result must be equivalent to what logical order would produce. Think of logical order as a correctness contract, not an execution plan.
Step 1–2: FROM and JOIN Evaluation
The FROM clause is where everything begins. SQL Server first identifies the set of rows that are candidates for the query. For a single table, the virtual table VT1 simply contains every row. For multiple tables, the engine starts with a cartesian product (cross join) of all the sources, producing VT1.
-- INNER JOIN: only rows where the ON predicate is TRUE survive step 2
SELECT o.OrderID, c.CustomerName
FROM Orders AS o
INNER JOIN Customers AS c ON o.CustomerID = c.CustomerID;
The ON predicate in step 2 filters VT1. Rows where o.CustomerID = c.CustomerID evaluates to TRUE move forward; all others are discarded.
-- LEFT OUTER JOIN: step 3 adds back unmatched left rows with NULL rights
SELECT c.CustomerName, o.OrderID
FROM Customers AS c
LEFT JOIN Orders AS o ON c.CustomerID = o.CustomerID;
-- Customers with no orders appear with NULL OrderID
For LEFT JOIN, step 3 puts back every row from the left table (Customers) that was filtered out in step 2, padding the right-side columns with NULL. This is why OUTER JOINs can feel like they "ignore" the ON predicate — they never truly discard the outer rows.
-- These two queries return DIFFERENT results:
-- Condition in ON: preserves all customers (outer join intent kept)
SELECT c.CustomerName, o.OrderID
FROM Customers AS c
LEFT JOIN Orders AS o
ON c.CustomerID = o.CustomerID
AND o.OrderDate >= '2026-01-01';
-- Condition in WHERE: only customers with 2026 orders (acts like INNER JOIN)
SELECT c.CustomerName, o.OrderID
FROM Customers AS c
LEFT JOIN Orders AS o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2026-01-01';
Step 3: WHERE Clause — Why You Can't Use SELECT Aliases
After JOIN processing, the WHERE clause filters individual rows. Each row is evaluated against the predicate independently. If the predicate returns TRUE the row is kept; FALSE or UNKNOWN and it is discarded.
Here is the classic gotcha — trying to filter on a column alias defined in SELECT:
-- THIS FAILS: 'full_name' alias does not exist at WHERE evaluation time
SELECT first_name + ' ' + last_name AS full_name,
department
FROM Employees
WHERE full_name LIKE 'John%'; -- Error: Invalid column name 'full_name'
The fix is to either repeat the expression in the WHERE clause, or wrap the query in a subquery/CTE so the alias is materialised before filtering:
-- FIX 1: Repeat the expression in WHERE
SELECT first_name + ' ' + last_name AS full_name,
department
FROM Employees
WHERE first_name + ' ' + last_name LIKE 'John%';
-- FIX 2: Use a CTE (alias is visible after the CTE is evaluated)
WITH cte AS (
SELECT first_name + ' ' + last_name AS full_name,
department
FROM Employees
)
SELECT full_name, department
FROM cte
WHERE full_name LIKE 'John%';
full_name is a real column of the derived table — SELECT has already run inside the CTE.
Step 4: GROUP BY
GROUP BY collapses the filtered row set into groups. After this step, each group is represented by exactly one row in the virtual table. A critical rule follows from this: every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function (SUM, COUNT, AVG, MAX, MIN, etc.).
-- Error: 'last_name' is neither grouped nor aggregated
SELECT department, last_name, COUNT(*) AS headcount
FROM Employees
GROUP BY department;
-- Msg 8120: Column 'Employees.last_name' is invalid in the select list
-- because it is not contained in either an aggregate function or the GROUP BY clause.
-- Correct: all non-aggregate columns are in GROUP BY
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM Employees
GROUP BY department;
Note that GROUP BY treats NULL as a single group key — all rows where the grouping column is NULL collapse into one group, even though NULL = NULL is UNKNOWN. This is a deliberate exception in the grouping rules.
-- NULLs in the group-by column form their own group
SELECT manager_id, COUNT(*) AS reports
FROM Employees
GROUP BY manager_id;
-- manager_id = NULL (top-level employees) appears as one group
Step 5: HAVING — Filtering Groups
HAVING is to GROUP BY what WHERE is to FROM. It filters the groups produced by step 4. The key difference from WHERE is that HAVING can use aggregate functions because it runs after grouping.
-- Keep only departments with more than 5 employees earning over 60k on average
SELECT department,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary
FROM Employees
WHERE salary > 30000 -- row-level filter BEFORE grouping
GROUP BY department
HAVING COUNT(*) > 5 -- group-level filter AFTER grouping
AND AVG(salary) > 60000;
HAVING vs WHERE: Performance Matters
WHERE filters rows before grouping — fewer rows enter the GROUP BY engine. HAVING filters after grouping — all rows are grouped first, then some groups are discarded. Always push filters that do not require aggregate results into WHERE for better performance.
-- Inefficient: filters after grouping a potentially huge set
SELECT department, COUNT(*) AS headcount
FROM Employees
GROUP BY department
HAVING department = 'Engineering'; -- should be WHERE
-- Efficient: row-level filter eliminates rows before grouping
SELECT department, COUNT(*) AS headcount
FROM Employees
WHERE department = 'Engineering'
GROUP BY department;
Step 6: SELECT — Expressions, DISTINCT, TOP
The SELECT clause runs sixth in logical order — long after WHERE, GROUP BY, and HAVING have already done their work. In this step the engine:
- Evaluates scalar expressions and computed columns.
- Evaluates window functions (OVER clause).
- Assigns column aliases (these become the output column names).
- Applies DISTINCT to remove duplicate rows.
- Applies TOP (together with ORDER BY if present).
-- Window function computed in SELECT step (step 6), not WHERE (step 3)
-- You cannot filter on a window function result in the same query's WHERE
SELECT employee_id,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM Employees
WHERE salary_rank <= 3; -- ERROR: Invalid column name 'salary_rank'
-- Fix: wrap in a subquery or CTE so salary_rank exists before filtering
WITH ranked AS (
SELECT employee_id,
salary,
department,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM Employees
)
SELECT employee_id, salary, department, salary_rank
FROM ranked
WHERE salary_rank <= 3;
DISTINCT is applied after all expressions are evaluated. This means SELECT DISTINCT first_name, last_name looks for duplicate combinations of first and last name, not just duplicates of first_name alone.
TOP without ORDER BY returns an arbitrary set of rows — the engine picks whatever is cheapest to retrieve. Always pair TOP with ORDER BY when the specific rows matter.
Step 7: ORDER BY — The Alias Exception
ORDER BY is the last logical step and the only one that can reference column aliases defined in SELECT. This is the one deliberate exception SQL Server makes to the "aliases not visible before SELECT" rule.
-- ORDER BY can use the SELECT alias
SELECT first_name + ' ' + last_name AS full_name,
hire_date,
salary * 12 AS annual_salary
FROM Employees
ORDER BY annual_salary DESC, -- alias works here
full_name ASC; -- alias works here
Why does ORDER BY get this privilege? Because it is defined to run after SELECT in the logical model. By the time ORDER BY executes, the SELECT-produced virtual table with named columns already exists.
You can also ORDER BY column position (ordinal), though this is generally discouraged because it creates fragile queries:
-- Column ordinal: 1 = full_name, 2 = hire_date, 3 = annual_salary
SELECT first_name + ' ' + last_name AS full_name,
hire_date,
salary * 12 AS annual_salary
FROM Employees
ORDER BY 3 DESC, 1 ASC; -- works but fragile; prefer column names/aliases
NULL Handling and Comparisons
NULL represents an unknown or missing value. It is not zero, not an empty string, not false. SQL Server follows the ANSI SQL standard for NULL semantics, which produces results that surprise developers coming from procedural languages.
NULL Arithmetic and Concatenation
-- Any arithmetic with NULL produces NULL
SELECT 10 + NULL; -- NULL
SELECT NULL * 5; -- NULL
SELECT 'Hello' + NULL; -- NULL (string concatenation)
-- ISNULL and COALESCE replace NULL with a default
SELECT ISNULL(bonus, 0) AS safe_bonus FROM Employees;
SELECT COALESCE(phone1, phone2, 'No phone') AS contact FROM Employees;
Comparing NULL: IS NULL vs = NULL
-- WRONG: = NULL always evaluates to UNKNOWN, never TRUE
SELECT * FROM Employees WHERE manager_id = NULL; -- returns 0 rows
SELECT * FROM Employees WHERE manager_id != NULL; -- returns 0 rows
-- CORRECT: use IS NULL / IS NOT NULL
SELECT * FROM Employees WHERE manager_id IS NULL;
SELECT * FROM Employees WHERE manager_id IS NOT NULL;
The reason = NULL never works is that NULL means "unknown value." If you ask whether an unknown value equals another unknown value, the answer is — unknown. SQL Server evaluates that to UNKNOWN, which causes the row to be filtered out by WHERE.
SET ANSI_NULLS OFF, SQL Server allows = NULL to match NULL values. This is a deprecated behavior. All new code should assume ANSI_NULLS ON (the default) and use IS NULL.
3-Valued Logic: TRUE / FALSE / UNKNOWN
Standard two-valued (Boolean) logic has TRUE and FALSE. SQL's predicate logic adds a third value: UNKNOWN. Any comparison involving NULL evaluates to UNKNOWN. This has far-reaching consequences for WHERE and HAVING, which only pass rows or groups where the predicate is TRUE — not FALSE, and crucially not UNKNOWN.
-- Demonstrate UNKNOWN behavior
-- Assume employee_id 7 has salary = NULL
SELECT employee_id FROM Employees WHERE salary > 50000;
-- employee_id 7: NULL > 50000 = UNKNOWN → row excluded
SELECT employee_id FROM Employees WHERE salary <= 50000;
-- employee_id 7: NULL <= 50000 = UNKNOWN → row excluded
SELECT employee_id FROM Employees WHERE NOT (salary > 50000);
-- employee_id 7: NOT UNKNOWN = UNKNOWN → row excluded
-- The same row is excluded from BOTH the positive and the negated query!
This last example is the hardest to internalize: a row with NULL salary is excluded from both salary > 50000 AND NOT (salary > 50000). To catch NULL-salary rows you must add an explicit IS NULL check.
AND / OR Truth Tables with UNKNOWN
| A | B | A AND B | A OR B |
|---|---|---|---|
| TRUE | UNKNOWN | UNKNOWN | TRUE |
| FALSE | UNKNOWN | FALSE | UNKNOWN |
| UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN |
-- UNKNOWN in IN list: NULL in a NOT IN list blocks all results
-- Assume bonus column can be NULL
-- Safe: no NULLs in the list
SELECT * FROM Employees WHERE department NOT IN ('HR', 'Legal');
-- Dangerous: if any subquery row returns NULL, NOT IN returns 0 rows
SELECT * FROM Employees
WHERE department NOT IN (SELECT department FROM Contractors);
-- If Contractors has a row where department IS NULL, the entire NOT IN
-- evaluates to UNKNOWN for every employee → 0 rows returned
-- Fix: use NOT EXISTS or filter NULLs from the subquery
SELECT * FROM Employees e
WHERE NOT EXISTS (
SELECT 1 FROM Contractors c
WHERE c.department = e.department
);
Common Mistakes and How to Fix Them
Mistake 1: WHERE on a SELECT alias
Already covered above — repeat the expression or use a CTE/subquery.
Mistake 2: Non-aggregated column in SELECT after GROUP BY
-- Error
SELECT department, last_name, COUNT(*) AS cnt
FROM Employees GROUP BY department;
-- Fix: add last_name to GROUP BY (if it makes sense), or aggregate it
SELECT department, MAX(last_name) AS sample_name, COUNT(*) AS cnt
FROM Employees GROUP BY department;
Mistake 3: Aggregate function in WHERE
-- Error: aggregates cannot appear in WHERE
SELECT department, COUNT(*) AS cnt
FROM Employees
WHERE COUNT(*) > 10
GROUP BY department;
-- Fix: use HAVING
SELECT department, COUNT(*) AS cnt
FROM Employees
GROUP BY department
HAVING COUNT(*) > 10;
Mistake 4: Forgetting NULL in NOT IN
-- Risky: breaks silently if subquery can return NULL
SELECT * FROM Orders
WHERE customer_id NOT IN (SELECT customer_id FROM Blacklist);
-- Safe: explicitly exclude NULLs from the subquery
SELECT * FROM Orders
WHERE customer_id NOT IN (
SELECT customer_id FROM Blacklist WHERE customer_id IS NOT NULL
);
Mistake 5: OUTER JOIN filter in WHERE instead of ON
-- Accidentally converts LEFT JOIN to INNER JOIN
SELECT c.name, o.order_total
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customer_id
WHERE o.order_total > 1000; -- NULL order_total (no orders) → excluded
-- Fix: move filter to the ON clause to preserve non-buying customers
SELECT c.name, o.order_total
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customer_id AND o.order_total > 1000;
Subqueries and Correlated Subqueries
Subqueries are queries nested inside another query. They are evaluated at different points depending on their type and position.
Non-correlated (Independent) Subqueries
A non-correlated subquery does not reference the outer query. SQL Server can evaluate it once and cache the result.
-- Subquery evaluated once; result reused for each outer row comparison
SELECT employee_id, salary
FROM Employees
WHERE salary > (SELECT AVG(salary) FROM Employees);
Correlated Subqueries
A correlated subquery references columns from the outer query. Logically it is evaluated once for every row produced by the outer query's FROM clause — though the optimizer may convert it to a more efficient join internally.
-- For each employee, check whether they have placed any orders
SELECT e.employee_id, e.first_name
FROM Employees e
WHERE EXISTS (
SELECT 1
FROM Orders o
WHERE o.sales_rep_id = e.employee_id -- references outer 'e'
AND o.order_date >= '2026-01-01'
);
Correlated subqueries in the SELECT clause are evaluated during step 6 — once per output row, referencing the current row's values:
-- Correlated subquery in SELECT: computes manager name for each employee
SELECT e.employee_id,
e.first_name,
(SELECT m.first_name + ' ' + m.last_name
FROM Employees m
WHERE m.employee_id = e.manager_id) AS manager_name
FROM Employees e;
Practical Rewrites: 5 Broken Queries Fixed
Query 1: Filter on a computed alias
-- Broken
SELECT order_id,
unit_price * quantity AS line_total
FROM OrderItems
WHERE line_total > 500;
-- Fixed with CTE
WITH items AS (
SELECT order_id, unit_price * quantity AS line_total
FROM OrderItems
)
SELECT order_id, line_total FROM items WHERE line_total > 500;
Query 2: Top-N per group using window function
-- Broken: WHERE references window function alias
SELECT department,
employee_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM Employees
WHERE rn = 1;
-- Fixed
WITH ranked AS (
SELECT department,
employee_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM Employees
)
SELECT department, employee_id, salary
FROM ranked
WHERE rn = 1;
Query 3: Aggregate filter in WHERE
-- Broken
SELECT product_id, SUM(quantity) AS total_sold
FROM OrderItems
WHERE SUM(quantity) > 100
GROUP BY product_id;
-- Fixed
SELECT product_id, SUM(quantity) AS total_sold
FROM OrderItems
GROUP BY product_id
HAVING SUM(quantity) > 100;
Query 4: NOT IN with nullable subquery
-- Broken (silently returns 0 rows when subquery has NULLs)
SELECT customer_id FROM Orders
WHERE customer_id NOT IN (SELECT customer_id FROM CancelledOrders);
-- Fixed with NOT EXISTS
SELECT o.customer_id FROM Orders o
WHERE NOT EXISTS (
SELECT 1 FROM CancelledOrders co
WHERE co.customer_id = o.customer_id
);
Query 5: LEFT JOIN converted to INNER JOIN by WHERE
-- Broken: excludes customers with no recent orders (defeats LEFT JOIN)
SELECT c.customer_id, c.name, o.order_date
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';
-- Fixed: move date filter into the ON clause
SELECT c.customer_id, c.name, o.order_date
FROM Customers c
LEFT JOIN Orders o
ON c.customer_id = o.customer_id
AND o.order_date >= '2026-01-01';
-- Customers with no 2026 orders still appear, with NULL order_date
Each of these fixes applies the same principle: identify which logical processing step needs to see the value, and place the expression in a clause that runs at or after that step.
Read Next
SQL Server JOINs Deep Dive
Master INNER, LEFT, RIGHT, FULL, CROSS, and SELF JOINs with execution plan insights and performance tips.
Read ArticleCTEs and Recursive Queries
Learn Common Table Expressions, recursive CTEs for hierarchical data, and when to use CTEs vs subqueries.
Read Article