SQL Server Memory Internals: Buffer Pool, Plan Cache and Memory Grants (2026)

SQL Server is a memory-hungry engine by design. Given enough RAM, it caches every frequently read data page and every reusable query plan — eliminating physical I/O entirely for warm workloads. Understanding how the buffer pool, plan cache, and workspace memory pools work lets you size memory correctly, diagnose pressure before it becomes a crisis, and tune the right knobs when something goes wrong.

Table of Contents

Memory Architecture Overview
The Buffer Pool: Data and Index Page Cache
Buffer Pool Internals: Lazywriter and Clock Algorithm
NUMA-Aware Memory
Plan Cache: Structure and Memory Clerks
Plan Cache Pressure and Single-Use Plans
Workspace Memory: Grants for Sort and Hash
RESOURCE_SEMAPHORE: Memory Grant Waits
max server memory: Sizing and Configuration
Diagnosing Memory Pressure
Key Memory Clerks and What They Mean
Read Next

Memory Architecture Overview

SQL Server manages its own memory within the limits set by max server memory. The total memory pool is divided between several consumers:

Total SQL Server Memory (bounded by max server memory)
├── Buffer Pool
│   ├── Data / index page cache (largest consumer)
│   ├── Free buffer list
│   └── Stolen memory (non-page-pool allocations below)
├── Plan Cache (stolen from buffer pool)
├── Workspace Memory (sort, hash, index build grants)
├── Connection memory (per-connection overhead)
├── Lock manager memory
├── CLR / Extended Stored Procedures
└── Other clerks (XML, full-text, etc.)

The key insight is that stolen memory (everything except data pages) comes out of the same pool as the buffer pool. When plan cache or workspace memory grows, fewer data pages can be cached. SQL Server's memory broker tries to balance these consumers dynamically.

The Buffer Pool: Data and Index Page Cache

The buffer pool is a hash-table-indexed cache of 8 KB data and index pages. Every read from disk goes through the buffer pool:

SQL Server checks the buffer pool hash table for the requested page (database_id + file_id + page_id)
Buffer hit: page found → return immediately (no I/O)
Buffer miss: allocate a free buffer frame, read the page from disk into it, update the hash table, return to the caller (PAGEIOLATCH wait)

The buffer hit ratio is the most important single memory metric:

SELECT
    cntr_value AS buffer_cache_hit_ratio
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Buffer cache hit ratio'
  AND object_name LIKE '%Buffer Manager%';

A healthy OLTP system should have a hit ratio above 99%. Values below 95% indicate that the working set does not fit in the buffer pool — add RAM or reduce the data footprint.

-- Buffer pool usage by database
SELECT
    DB_NAME(database_id)            AS db_name,
    COUNT(*) * 8 / 1024             AS cached_mb,
    SUM(CAST(is_modified AS INT)) * 8 / 1024 AS dirty_mb
FROM sys.dm_os_buffer_descriptors
WHERE database_id > 4  -- exclude system databases
GROUP BY database_id
ORDER BY cached_mb DESC;

Buffer Pool Internals: Lazywriter and Clock Algorithm

When the buffer pool is full and a new page needs to be loaded, SQL Server must evict an existing page. It uses a clock sweep (second-chance LRU) algorithm:

Each buffer frame has a reference count that increments each time the page is accessed
The lazywriter background thread sweeps through buffer frames, decrementing reference counts
Frames that reach zero reference count become candidates for eviction
Dirty frames (modified pages not yet written to disk) must be written to disk before eviction
The lazywriter also writes dirty pages proactively to keep a pool of clean, evictable frames ready

When the lazywriter cannot keep up with eviction demand (too many dirty pages to flush), SQL Server may start issuing synchronous I/O to free frames — causing latency spikes. This shows up as elevated PAGEIOLATCH_EX and increased checkpoint pages/sec.

-- Buffer pool pressure indicators
SELECT
    cntr_value AS value,
    counter_name
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Lazy writes/sec',
    'Checkpoint pages/sec',
    'Page life expectancy',
    'Free pages'
)
AND object_name LIKE '%Buffer Manager%';

Page Life Expectancy (PLE): The average time (in seconds) a page stays in the buffer pool without being referenced. A healthy OLTP system typically has PLE > 300 seconds. A PLE under 60 seconds with high I/O activity is a strong signal that the buffer pool is too small for the working data set.

NUMA-Aware Memory

Modern multi-socket servers use Non-Uniform Memory Access (NUMA) architecture — each CPU socket has a local memory bank that it accesses faster than remote memory on another socket. SQL Server is fully NUMA-aware:

The buffer pool is partitioned into NUMA nodes — each node has its own free list and its own lazywriter thread
Worker threads are assigned to schedulers on the same NUMA node as the memory they access where possible
Large memory grants are allocated from the NUMA node local to the executing thread

-- NUMA node memory breakdown
SELECT
    memory_node_id,
    virtual_address_space_reserved_kb / 1024  AS reserved_mb,
    virtual_address_space_committed_kb / 1024 AS committed_mb,
    locked_page_allocations_kb / 1024         AS locked_pages_mb,
    pages_kb / 1024                           AS pages_mb
FROM sys.dm_os_memory_nodes
ORDER BY memory_node_id;

Soft NUMA: On systems with many cores per socket, SQL Server can create soft NUMA nodes — logical partitions within a single physical NUMA node — to improve scheduler locality and reduce spinlock contention on single-socket high-core-count systems.

Plan Cache: Structure and Memory Clerks

Compiled query plans are cached in the plan cache so subsequent executions can skip the compilation step. The plan cache is a set of hash buckets within the buffer pool's stolen memory region.

Each cached plan has a usecounts (reference count) and an objtype:

-- Top 20 plans by memory usage
SELECT TOP 20
    cp.usecounts,
    cp.size_in_bytes / 1024     AS size_kb,
    cp.objtype,
    cp.cacheobjtype,
    SUBSTRING(qt.text, 1, 200)  AS sql_text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) qt
ORDER BY cp.size_in_bytes DESC;

-- Plan cache summary by object type
SELECT
    objtype,
    COUNT(*)                    AS plan_count,
    SUM(size_in_bytes)/1048576  AS total_mb,
    AVG(usecounts)              AS avg_usecounts
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY total_mb DESC;

Object types:

Proc — stored procedure plans
Prepared — parameterized queries via sp_executesql
Adhoc — literal SQL strings (may be single-use plans)
View, Trigger, UsrTab — other object types

Plan Cache Pressure and Single-Use Plans

On systems running ad-hoc SQL with literal values (not parameterized), the plan cache fills with thousands of single-use plans — compiled once, never reused. This wastes memory and CPU on compilation.

-- Find single-use plans consuming memory
SELECT
    SUM(CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) / 1048576 AS single_use_mb,
    SUM(size_in_bytes) / 1048576                                          AS total_plan_cache_mb,
    100.0 * SUM(CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END)
          / NULLIF(SUM(size_in_bytes), 0)                                 AS single_use_pct
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc';

If single-use plans are >30% of the plan cache, enable optimize for ad hoc workloads:

-- Store only a plan stub on first execution; full plan only if query runs again
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

This reduces plan cache memory by 50–80% on ad-hoc-heavy systems with no performance downside for queries that run more than once.

Forced parameterization: As a stronger measure, ALTER DATABASE YourDB SET PARAMETERIZATION FORCED tells SQL Server to auto-parameterize literal values in ad-hoc queries, reusing plans across different literal values. Use carefully — it can cause parameter sniffing issues for queries with skewed distributions.

Workspace Memory: Grants for Sort and Hash

Queries with sort (ORDER BY, GROUP BY) or hash (hash join, hash aggregate) operations request a memory grant before executing. The grant comes from the workspace memory pool — a separate region from the buffer pool, bounded by max server memory × 75% total, with individual query grants capped.

The query optimizer estimates the required grant based on cardinality estimates. If the estimate is accurate, the query runs in-memory. If the actual data is larger than estimated, the query spills to TempDB.

-- Active memory grants
SELECT
    session_id,
    requested_memory_kb / 1024   AS requested_mb,
    granted_memory_kb / 1024     AS granted_mb,
    used_memory_kb / 1024        AS used_mb,
    max_used_memory_kb / 1024    AS max_used_mb,
    queue_id,
    wait_order,
    is_next_candidate,
    grant_time
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;

If max_used_memory_kb is much less than granted_memory_kb, the query received an over-estimated grant — memory is being wasted while other queries queue for grants.

RESOURCE_SEMAPHORE: Memory Grant Waits

When the workspace memory pool is exhausted, new queries requiring memory grants queue behind a resource semaphore. The wait type RESOURCE_SEMAPHORE accumulates on these queued sessions.

-- Check resource semaphore state
SELECT
    resource_semaphore_id,
    target_memory_kb / 1024     AS target_mb,
    max_target_memory_kb / 1024 AS max_target_mb,
    total_memory_kb / 1024      AS total_mb,
    available_memory_kb / 1024  AS available_mb,
    granted_memory_kb / 1024    AS granted_mb,
    used_memory_kb / 1024       AS used_mb,
    grantee_count,
    waiter_count,
    timeout_error_count
FROM sys.dm_exec_query_resource_semaphores;

If waiter_count > 0 persistently, you have memory grant contention. Root causes:

Too many concurrent queries with large grants (reduce MAXDOP to reduce per-query parallelism)
Bad cardinality estimates causing over-sized grants (update statistics)
Insufficient max server memory leaving too little for workspace
In-memory OLTP or other large non-buffer-pool allocations squeezing workspace

max server memory: Sizing and Configuration

The default max server memory is 2,147,483,647 MB (unlimited) — SQL Server will consume all available RAM, leaving nothing for the OS. Always set this explicitly.

Recommended sizing formula:

max server memory = Total RAM
                  - OS reservation (4–8 GB for Windows)
                  - 1 GB per 4 GB RAM above 16 GB (OS paging overhead)
                  - Other SQL Server instances (if multiple)
                  - SSIS, SSRS, SSAS if co-located

Practical examples:

Total RAM	Recommended max server memory
16 GB	12 GB
32 GB	26 GB
64 GB	54 GB
128 GB	110 GB
256 GB	220 GB

-- Set max server memory
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 54000;  -- 54 GB for a 64 GB server
RECONFIGURE;

-- Verify
SELECT name, value_in_use FROM sys.configurations WHERE name = 'max server memory (MB)';

min server memory: Defaults to 0 — SQL Server can release memory to the OS under memory pressure. Raising min server memory prevents buffer pool shrinkage during brief OS pressure (e.g., a backup agent spike). Set it to 25–50% of max server memory on dedicated database servers.

Diagnosing Memory Pressure

-- Overall memory state
SELECT
    physical_memory_in_use_kb / 1024   AS sql_using_mb,
    locked_page_allocations_kb / 1024  AS locked_pages_mb,
    page_fault_count,
    memory_utilization_percentage
FROM sys.dm_os_process_memory;

-- Memory grants and pending grants (workspace pressure)
SELECT waiter_count, grantee_count, available_memory_kb / 1024 AS avail_mb
FROM sys.dm_exec_query_resource_semaphores;

-- Page life expectancy trend
SELECT
    cntr_value AS ple_seconds,
    GETDATE()  AS captured_at
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';

-- Memory clerk breakdown
SELECT TOP 15
    type,
    SUM(pages_kb) / 1024  AS pages_mb,
    SUM(virtual_memory_committed_kb) / 1024 AS virtual_mb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY pages_mb DESC;

sys.dm_os_memory_clerks breaks down memory usage by purpose. Key clerks to watch: MEMORYCLERK_SQLBUFFERPOOL (buffer pool — should be largest), CACHESTORE_SQLCP (SQL plans), CACHESTORE_OBJCP (object plans), MEMORYCLERK_SQLQUERYPLAN (workspace grants).

Key Memory Clerks and What They Mean

Clerk Type	What It Holds	Concern Threshold
MEMORYCLERK_SQLBUFFERPOOL	Data and index pages	Should be 70–85% of total; lower = not enough RAM
CACHESTORE_SQLCP	Ad-hoc SQL plans	>500 MB suggests single-use plan bloat
CACHESTORE_OBJCP	Stored procedure / trigger plans	>1 GB may indicate large plan counts
MEMORYCLERK_SQLQUERYPLAN	Workspace memory grants	Spikes during large sort/hash operations
MEMORYCLERK_SQLSTORENG	Storage engine per-thread structures	High with many concurrent connections
MEMORYCLERK_XTP	In-memory OLTP (Hekaton) tables	Grows with in-memory table data; monitor growth
MEMORYCLERK_SOSNODE	SQLOS node-level structures	Fixed overhead per NUMA node