SQL Server Always On Availability Groups — Setup, Configuration, Monitoring, and Failover


June 2026  |  20 min read  |  SQL Server, High Availability, Always On, HADR  |  views

SQL Server Always On Availability Groups

The definitive hands-on guide to SQL Server's premier high-availability feature — from architecture to T-SQL setup to day-2 operations


Always On Availability Groups (AG) is SQL Server's flagship high-availability and disaster-recovery solution. It protects one or more databases by maintaining synchronized copies on up to eight secondary replicas, provides sub-minute automatic failover within a data centre, and can simultaneously offload heavy read workloads to secondary servers. This guide walks you through everything: the architecture, prerequisites, complete T-SQL setup from scratch, the AG listener, readable secondaries, health monitoring DMVs, failover procedures, troubleshooting common failures, and day-2 maintenance practices including rolling-upgrade patching.


 1. What Is an Availability Group?

An Availability Group is a logical container for a set of user databases (called availability databases) that fail over together. Every AG has exactly one primary replica — the SQL Server instance that accepts read-write connections — and one or more secondary replicas that keep synchronized copies of every database in the AG.

SQL Server ships the transaction log records generated on the primary to each secondary replica over a dedicated TCP database mirroring endpoint. The secondary applies those log records asynchronously (redo) to its local copy of the database, keeping it in sync.

Primary and Secondary Replicas

  • Primary replica — accepts all DML and DDL. Sends log stream to all secondaries.
  • Secondary replica — read-only (if configured as readable secondary). Applies the redo log continuously. Can be promoted to primary on failover.

Synchronous vs Asynchronous Commit

Each secondary can be configured with one of two commit modes:

Mode How it works Data loss on failover Automatic failover? Best for
Synchronous commit Primary waits for secondary to harden log before committing Zero (RPO = 0) Yes (if health conditions met) Same datacenter, sub-5 ms latency
Asynchronous commit Primary does not wait — secondary catches up when it can Possible (RPO > 0) No — manual only DR replicas across WAN / cloud
Rule of thumb: Use synchronous commit for up to three replicas in the same or nearby datacenter. Use asynchronous commit for geographically distant DR replicas where network latency would stall every transaction.

Automatic vs Manual Failover

Automatic failover occurs without human intervention when the primary replica becomes unavailable and the designated synchronous secondary detects the failure via the Windows Server Failover Cluster (WSFC) health detection mechanism. A maximum of two automatic-failover partners are allowed.

Manual failover is administrator-initiated (planned or unplanned) and can target any secondary regardless of commit mode.

Readable Secondaries

By default secondaries do not accept user connections. When you configure ALLOW_CONNECTIONS = READ_ONLY on a replica, applications that include ApplicationIntent=ReadOnly in their connection string are automatically routed to that secondary by the AG listener. This lets you offload heavy reporting queries without impacting the primary.

The AG Listener

The listener is a Virtual Network Name (VNN) + one or more IP addresses registered in DNS. Applications connect to the listener name; the cluster routes them to whichever server is currently primary. After a failover the listener IP moves to the new primary automatically — applications simply reconnect without any configuration change.

AG vs Other HA Technologies

Technology Granularity RPO RTO Readable secondary Shared storage? Edition
Always On AG Database group 0 (sync) < 30 s (auto) Yes No Enterprise (Basic: Standard)
FCI (Failover Cluster Instance) Entire SQL instance 0 ~1–5 min No Yes (SAN/S2D) Standard / Enterprise
Database Mirroring Single database 0 (sync) < 30 s Snapshot only No Deprecated in 2012+
Log Shipping Single database Minutes Manual Standby mode only No All editions
Replication Table/article Seconds–minutes N/A Yes (subscriber reads) No Standard / Enterprise

 2. Prerequisites

Windows Server Failover Clustering (WSFC)

Always On AGs require a Windows Server Failover Cluster for health detection and automatic failover. Every node that hosts a replica must be a member of the same WSFC. The cluster itself does not need shared storage — AGs use shared-nothing architecture.

Domain-Independent AG (SQL Server 2016+): Since SQL 2016 you can create a WSFC without Active Directory using a workgroup cluster or a DNS-only cluster. This is useful for cloud VMs or DMZ scenarios where domain membership is not possible.

SQL Server Edition and Version

  • Enterprise Edition — required for full AGs (up to 8 secondaries, readable secondaries, multiple databases per AG).
  • Standard Edition 2016+ — supports Basic Availability Groups (1 database per AG, no readable secondary, 2 replicas max).
  • Same SQL Server version on all replicas — patch level can differ temporarily during rolling upgrades but the major version must match.
  • Same collation on all instances — the server-level collation of the primary and all secondaries must be identical.

Network and Port Requirements

  • TCP port 5022 open between all replicas (the default database mirroring endpoint port — choose any unused port).
  • SQL Server service port 1433 open to application servers.
  • AG listener IP must be reachable on the network segment (open firewall rules).
  • All replicas need static IP addresses or reliable DNS resolution between nodes.

Hadr Feature Must Be Enabled

Always On is disabled by default. Enable it on every SQL Server instance that will host a replica:

-- Run in SQL Server Configuration Manager (GUI) OR via PowerShell:
-- Enable-SqlAlwaysOn -ServerInstance "NODE1\MSSQLSERVER" -Force

-- Verify it is enabled:
SELECT SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled;
-- Returns 1 when enabled, 0 when disabled
Note: Enabling or disabling HADR requires a restart of the SQL Server service. Plan a maintenance window before enabling it on a production instance.

Databases Must Be in Full Recovery Model

-- Check current recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name IN ('SalesDB','HRDatabase','InventoryDB');

-- Switch to FULL recovery model (required for AG)
ALTER DATABASE SalesDB SET RECOVERY FULL;
ALTER DATABASE HRDatabase SET RECOVERY FULL;

-- Take a FULL backup before adding to AG
-- (secondary cannot join without a backup chain)
BACKUP DATABASE SalesDB
    TO DISK = N'\\BACKUPSHARE\SalesDB_full.bak'
    WITH COMPRESSION, STATS = 10;

BACKUP DATABASE HRDatabase
    TO DISK = N'\\BACKUPSHARE\HRDatabase_full.bak'
    WITH COMPRESSION, STATS = 10;

 3. Step-by-Step AG Setup with T-SQL

The following scripts assume a two-node setup: NODE1 (primary) and NODE2 (synchronous secondary), both joined to a WSFC named WSFC-CLUSTER. Run each block on the indicated server.

Step 3.1 — Enable Always On HADR on Both Nodes

-- Run on NODE1 via SQL Server Configuration Manager (GUI) OR:
-- PowerShell on NODE1:
--   Enable-SqlAlwaysOn -ServerInstance "NODE1" -Force
--   (restarts the SQL Server service)

-- PowerShell on NODE2:
--   Enable-SqlAlwaysOn -ServerInstance "NODE2" -Force

-- Verify on each node after restart:
SELECT SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled;
-- Expected: 1

Step 3.2 — Create the Database Mirroring Endpoint (Both Nodes)

The endpoint is the TCP channel over which log records and control messages flow between replicas. Run this script on every instance that will host a replica.

-- Run on NODE1
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (
        LISTENER_PORT = 5022,          -- choose any unused port
        LISTENER_IP   = ALL            -- listen on all IPs; restrict to specific IP if needed
    )
    FOR DATA_MIRRORING (
        ROLE    = ALL,                 -- can act as principal or mirror
        AUTHENTICATION = WINDOWS NEGOTIATEΒ,
        ENCRYPTION     = REQUIRED ALGORITHM AES
    );

-- Verify endpoint is running
SELECT name, state_desc, type_desc, role_desc
FROM sys.database_mirroring_endpoints;

------------------------------------------------------
-- Run the IDENTICAL script on NODE2 as well
------------------------------------------------------
Service Account Authentication: If all SQL Server services run under the same domain service account, no extra GRANT is needed. If they run under different accounts, grant CONNECT on the endpoint to the other node's service account:
GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [DOMAIN\SqlSvcNode2];

Step 3.3 — Create the Availability Group (Run on NODE1 — Primary)

-- Run on NODE1 (will become primary replica)
CREATE AVAILABILITY GROUP [AG_Sales]
WITH (
    AUTOMATED_BACKUP_PREFERENCE  = SECONDARY,   -- prefer secondary for backups
    FAILURE_CONDITION_LEVEL      = 3,            -- 1=very conservative .. 5=most aggressive
    HEALTH_CHECK_TIMEOUT         = 30000,        -- ms before declaring failure (default 30s)
    DB_FAILOVER                  = ON,           -- fail over AG if any database goes DOWN
    DTC_SUPPORT                  = NONE          -- set PER_DB for distributed transactions
)
FOR DATABASE
    [SalesDB],       -- add every database that should fail over together
    [HRDatabase]
REPLICA ON
    -- PRIMARY REPLICA (this server)
    N'NODE1' WITH (
        ENDPOINT_URL              = N'TCP://NODE1.corp.local:5022',
        AVAILABILITY_MODE         = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE             = AUTOMATIC,
        SEEDING_MODE              = AUTOMATIC,       -- SQL 2016+: seeds secondary automatically
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS     = NO               -- primary role: not readable as secondary
        ),
        PRIMARY_ROLE (
            ALLOW_CONNECTIONS     = ALL
        ),
        SESSION_TIMEOUT           = 10               -- seconds before marking replica as failed
    ),
    -- SECONDARY REPLICA
    N'NODE2' WITH (
        ENDPOINT_URL              = N'TCP://NODE2.corp.local:5022',
        AVAILABILITY_MODE         = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE             = AUTOMATIC,
        SEEDING_MODE              = AUTOMATIC,
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS     = READ_ONLY        -- readable secondary
        ),
        PRIMARY_ROLE (
            ALLOW_CONNECTIONS     = ALL
        ),
        SESSION_TIMEOUT           = 10
    );

-- Verify AG was created
SELECT ag.name, ag.automated_backup_preference_desc,
       ag.failure_condition_level, ag.health_check_timeout
FROM sys.availability_groups ag;

-- Check replica configuration
SELECT ar.replica_server_name, ar.availability_mode_desc,
       ar.failover_mode_desc, ar.seeding_mode_desc,
       ars.role_desc, ars.operational_state_desc, ars.connected_state_desc
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars
     ON ar.replica_id = ars.replica_id
WHERE ar.group_id = (SELECT group_id FROM sys.availability_groups WHERE name = 'AG_Sales');

Step 3.4 — Join the Secondary Replica (Run on NODE2)

-- Run on NODE2
ALTER AVAILABILITY GROUP [AG_Sales] JOIN;

-- With SEEDING_MODE = AUTOMATIC, SQL Server seeds databases automatically.
-- Monitor seeding progress:
SELECT start_time, completion_time, is_source, current_state,
       transferred_size_bytes, estimate_total_size
FROM sys.dm_hadr_physical_seeding_stats;

-- Alternatively (for SEEDING_MODE = MANUAL):
-- 1. Restore a copy of each database to NODE2 WITH NORECOVERY
-- 2. Then run on NODE2:
ALTER DATABASE [SalesDB]    SET HADR AVAILABILITY GROUP = [AG_Sales];
ALTER DATABASE [HRDatabase] SET HADR AVAILABILITY GROUP = [AG_Sales];
Automatic Seeding (SQL 2016+): When SEEDING_MODE = AUTOMATIC, the primary streams a live backup directly to the secondary over the mirroring endpoint — no manual backup/restore required. Grant the SQL Server service account on the secondary the ALTER ANY AVAILABILITY GROUP permission:
ALTER AVAILABILITY GROUP [AG_Sales] GRANT CREATE ANY DATABASE; (run on NODE2 after joining).

Step 3.5 — Verify the AG is Healthy

-- Run on PRIMARY (NODE1)
-- Overall AG health
SELECT ag.name                              AS ag_name,
       ags.primary_replica                  AS current_primary,
       ags.synchronization_health_desc      AS ag_sync_health,
       ags.operational_state_desc           AS operational_state
FROM sys.availability_groups ag
JOIN sys.dm_hadr_availability_group_states ags
     ON ag.group_id = ags.group_id;

-- Per-database synchronization state
SELECT DB_NAME(drs.database_id)             AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc,
       drs.is_suspended,
       drs.log_send_queue_size,
       drs.redo_queue_size
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id;
-- All databases should show: SYNCHRONIZED / HEALTHY

 4. The AG Listener

The listener is the single endpoint your applications should always connect to. It consists of a Virtual Network Name (VNN) and one or more IP addresses (one per subnet). After any failover the cluster brings the listener IP online on the new primary node — applications reconnect to the same name transparently.

Create the Listener (Run on PRIMARY)

-- Run on NODE1 (primary)
ALTER AVAILABILITY GROUP [AG_Sales]
ADD LISTENER N'AG-SALES-LN' (
    WITH IP (
        ( N'10.10.1.50', N'255.255.255.0' )   -- static IP on subnet 1
        -- add a second IP for multi-subnet AG:
        -- , ( N'10.20.1.50', N'255.255.255.0' )
    ),
    PORT = 1433   -- standard SQL port; use different port if 1433 is taken
);

-- Verify listener was created
SELECT agl.dns_name, agl.port,
       agla.ip_address, agla.ip_subnet_mask, agla.state_desc
FROM sys.availability_group_listeners agl
JOIN sys.availability_group_listener_ip_addresses agla
     ON agl.listener_id = agla.listener_id;
DNS Registration: The listener VNN is registered in DNS by the cluster. Ensure your DNS server allows the cluster computer account to create/update records, or pre-create the DNS record manually and grant the cluster account Full Control on that record.

Connection String — Applications

-- ADO.NET connection string using the listener
-- "Data Source=AG-SALES-LN,1433;Initial Catalog=SalesDB;
--  Integrated Security=SSPI;MultiSubnetFailover=True"

-- JDBC (Java) connection string:
-- jdbc:sqlserver://AG-SALES-LN:1433;databaseName=SalesDB;
-- multiSubnetFailover=true;integratedSecurity=true;

-- Read-only (routes to readable secondary via listener):
-- "Data Source=AG-SALES-LN,1433;Initial Catalog=SalesDB;
--  Integrated Security=SSPI;MultiSubnetFailover=True;
--  ApplicationIntent=ReadOnly"
MultiSubnetFailover=True is critical. Without it, the driver tries each listener IP one at a time, which can cause a 20–30 second connection delay after failover. With it, the driver attempts all IPs in parallel, reducing reconnect time to under 2 seconds. Always include this flag for AG connections.

RegisterAllProvidersIP Cluster Setting

-- PowerShell — verify RegisterAllProvidersIP setting
Import-Module FailoverClusters
Get-ClusterResource "AG-SALES-LN" | Get-ClusterParameter RegisterAllProvidersIP

-- For multi-subnet AGs, RegisterAllProvidersIP should be 1 (default).
-- For single-subnet AGs, set to 0 and lower HostRecordTTL for faster DNS failover:
Get-ClusterResource "AG-SALES-LN" | Set-ClusterParameter RegisterAllProvidersIP 0
Get-ClusterResource "AG-SALES-LN" | Set-ClusterParameter HostRecordTTL 120

 5. Readable Secondaries

A readable secondary allows reporting queries, SSRS reports, and ETL reads to run on a secondary replica — completely independent of the primary's I/O. This is one of the most valuable features of an AG in high-traffic environments.

Configure Read-Only Routing

Read-only routing tells the listener which secondary to send ApplicationIntent=ReadOnly connections to. You must define a routing URL for each replica and a routing list on the primary.

-- Run on PRIMARY (NODE1)

-- Step 1: Set the read-only routing URL for each replica
ALTER AVAILABILITY GROUP [AG_Sales]
    MODIFY REPLICA ON N'NODE2' WITH (
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS   = READ_ONLY,
            READ_ONLY_ROUTING_URL = N'TCP://NODE2.corp.local:1433'
        )
    );

-- Also set the routing URL for NODE1 (used when NODE2 is primary):
ALTER AVAILABILITY GROUP [AG_Sales]
    MODIFY REPLICA ON N'NODE1' WITH (
        SECONDARY_ROLE (
            ALLOW_CONNECTIONS   = READ_ONLY,
            READ_ONLY_ROUTING_URL = N'TCP://NODE1.corp.local:1433'
        )
    );

-- Step 2: Set the routing list on the PRIMARY replica
--         (ordered list — first available replica in the list receives read connections)
ALTER AVAILABILITY GROUP [AG_Sales]
    MODIFY REPLICA ON N'NODE1' WITH (
        PRIMARY_ROLE (
            READ_ONLY_ROUTING_LIST = (N'NODE2', N'NODE1')
            -- Try NODE2 first; fall back to NODE1 if NODE2 is unavailable
        )
    );

-- Also set a routing list for when NODE2 becomes primary:
ALTER AVAILABILITY GROUP [AG_Sales]
    MODIFY REPLICA ON N'NODE2' WITH (
        PRIMARY_ROLE (
            READ_ONLY_ROUTING_LIST = (N'NODE1', N'NODE2')
        )
    );

-- Verify routing configuration
SELECT ar.replica_server_name,
       ar.secondary_role_allow_connections_desc,
       ar.read_only_routing_url,
       ar.primary_role_allow_connections_desc
FROM sys.availability_replicas ar
WHERE ar.group_id = (SELECT group_id FROM sys.availability_groups WHERE name = 'AG_Sales');

Testing the Read-Only Route

-- Connect via the listener with ApplicationIntent=ReadOnly
-- and verify you land on the secondary:
SELECT @@SERVERNAME AS ServerName,
       DATABASEPROPERTYEX(DB_NAME(),'Updateability') AS Updateability;
-- Expected: ServerName = NODE2, Updateability = READ_ONLY

-- This query returns an error on a readable secondary (confirms read-only):
CREATE TABLE Test_RO_Check (id INT);
-- Msg 3906, The database is in read-only access mode.
Version Store on Readable Secondary: SQL Server uses row versioning on the secondary to provide read consistency (snapshot isolation). The version store lives in tempdb on the secondary. Monitor tempdb usage on busy readable secondaries and size tempdb appropriately.

 6. Monitoring AG Health

SQL Server exposes a rich set of DMVs and catalog views for monitoring availability group health. Understanding these views is essential for proactive monitoring and rapid incident response.

Key DMVs and Catalog Views

ObjectWhat it shows
sys.availability_groupsAG names, backup preference, failure condition level, health check timeout
sys.availability_replicasReplica configuration: server name, endpoint URL, availability mode, failover mode
sys.dm_hadr_availability_group_statesLive AG state: primary replica, synchronization health (HEALTHY/PARTIALLY_HEALTHY/NOT_HEALTHY)
sys.dm_hadr_availability_replica_statesLive per-replica state: role, operational state, connected state, recovery health
sys.dm_hadr_database_replica_statesPer-database per-replica metrics: sync state, log send queue, redo queue, last commit time
sys.dm_hadr_database_replica_cluster_statesCluster-level database join state and synchronization health visible from any node

Complete AG Health Dashboard Query

-- Run on PRIMARY — comprehensive AG health dashboard
SELECT
    ag.name                                          AS availability_group,
    ags.primary_replica                              AS primary_replica,
    ags.synchronization_health_desc                  AS ag_sync_health,

    ar.replica_server_name,
    ar.availability_mode_desc,
    ar.failover_mode_desc,

    ars.role_desc                                    AS replica_role,
    ars.operational_state_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc                  AS replica_sync_health,
    ars.recovery_health_desc,

    DB_NAME(drs.database_id)                         AS database_name,
    drs.synchronization_state_desc                   AS db_sync_state,
    drs.synchronization_health_desc                  AS db_sync_health,
    drs.is_suspended,
    drs.suspend_reason_desc,
    drs.log_send_queue_size                          AS log_send_queue_KB,
    drs.log_send_rate                                AS log_send_rate_KB_s,
    drs.redo_queue_size                              AS redo_queue_KB,
    drs.redo_rate                                    AS redo_rate_KB_s,
    -- Estimated catch-up time in seconds
    CASE WHEN drs.redo_rate > 0
         THEN drs.redo_queue_size / drs.redo_rate
         ELSE NULL
    END                                              AS est_redo_catchup_sec,
    drs.last_commit_time,
    -- Data loss exposure: how far behind is the secondary?
    DATEDIFF(SECOND, drs.last_commit_time,
             (SELECT MAX(last_commit_time)
              FROM sys.dm_hadr_database_replica_states
              WHERE database_id = drs.database_id
                AND is_primary_replica = 1))         AS lag_behind_primary_sec

FROM sys.availability_groups ag
JOIN sys.dm_hadr_availability_group_states  ags ON ag.group_id = ags.group_id
JOIN sys.availability_replicas              ar  ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.dm_hadr_database_replica_states    drs ON ar.replica_id = drs.replica_id
ORDER BY ag.name, ar.replica_server_name, DB_NAME(drs.database_id);

Key Metrics to Alert On

MetricColumnAlert threshold
Log send queuelog_send_queue_size> 50 MB (network or IO bottleneck on secondary)
Redo queueredo_queue_size> 100 MB (secondary redo thread falling behind)
AG sync healthsynchronization_health_descNOT_HEALTHY or PARTIALLY_HEALTHY
Data movement suspendedis_suspended = 1Any database suspended
Commit laglag_behind_primary_sec> 30 sec on synchronous replica
Connected stateconnected_state_descDISCONNECTED

AG Dashboard in SSMS

SSMS has a built-in graphical dashboard: right-click the AG in Object Explorer → Show Dashboard. It displays a real-time synchronization health timeline, replica states, and database synchronization status. Useful for a quick visual check, though the DMV queries above are better for automated monitoring.

-- Check for any AGs in a non-healthy state (good for monitoring job):
SELECT ag.name AS availability_group, ags.synchronization_health_desc
FROM sys.availability_groups ag
JOIN sys.dm_hadr_availability_group_states ags ON ag.group_id = ags.group_id
WHERE ags.synchronization_health_desc <> 'HEALTHY';
-- Alert if this returns any rows

 7. Manual and Automatic Failover

Automatic Failover

Automatic failover requires: (1) the failing replica is in synchronous commit mode, (2) the secondary is in a SYNCHRONIZED state, and (3) the WSFC health detection fires. When the primary goes offline and the quorum sees the failure, the cluster promotes the designated synchronous secondary in roughly 20–30 seconds — no DBA action needed.

Planned Manual Failover (Zero Data Loss)

Use this for planned maintenance: patching the primary, hardware upgrades, etc. The secondary must be SYNCHRONIZED.

-- Pre-check: confirm secondary is synchronized
SELECT ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE ar.group_id = (SELECT group_id FROM sys.availability_groups WHERE name = 'AG_Sales')
  AND drs.is_primary_replica = 0;
-- synchronization_state_desc must be SYNCHRONIZED

-- Perform planned failover from the SECONDARY node (NODE2)
-- Connect to NODE2 and run:
ALTER AVAILABILITY GROUP [AG_Sales] FAILOVER;

-- After failover: verify NODE2 is now primary
SELECT primary_replica FROM sys.dm_hadr_availability_group_states
WHERE group_id = (SELECT group_id FROM sys.availability_groups WHERE name = 'AG_Sales');

Forced Failover (Potential Data Loss)

Use FORCE_FAILOVER_ALLOW_DATA_LOSS only when the primary is completely unavailable and you need to restore service immediately. This is a last-resort operation that accepts possible data loss.

-- STEP 1: Assess data loss exposure BEFORE forcing failover
-- Run on the target secondary (NODE2):
SELECT
    DB_NAME(drs.database_id)                AS database_name,
    drs.last_commit_time                    AS secondary_last_commit,
    drs.redo_queue_size                     AS outstanding_redo_KB,
    drs.log_send_queue_size                 AS unsent_log_KB
FROM sys.dm_hadr_database_replica_states drs
WHERE drs.is_primary_replica = 0;
-- unsent_log_KB > 0 means transactions on primary are not yet on this secondary
-- Those transactions WILL be lost on a forced failover

-- STEP 2: If you accept the data loss, run on NODE2:
ALTER AVAILABILITY GROUP [AG_Sales] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- STEP 3: After forced failover, verify databases are accessible:
SELECT name, state_desc FROM sys.databases WHERE name IN ('SalesDB','HRDatabase');

-- STEP 4: If the old primary comes back, it will be in RESOLVING state.
-- Re-join it as a secondary:
-- (on the old primary, after its SQL service restarts)
ALTER AVAILABILITY GROUP [AG_Sales] JOIN;
ALTER DATABASE [SalesDB]    SET HADR AVAILABILITY GROUP = [AG_Sales];
ALTER DATABASE [HRDatabase] SET HADR AVAILABILITY GROUP = [AG_Sales];
After a forced failover, the AG enters an unprotected state. Reseed or re-join the old primary as quickly as possible to restore redundancy. Also check application data for any uncommitted or partially committed transactions that may have been lost.

Failover with SSMS Wizard

For planned failovers, SSMS provides a wizard: right-click the AG in Object Explorer → Failover…. The wizard confirms synchronization health, estimates data loss, and walks you through the failover steps. It is equivalent to running ALTER AVAILABILITY GROUP … FAILOVER but with a safety checklist.

 8. Common AG Problems and Fixes

Problem 1: Data Movement Suspended

-- Identify suspended databases
SELECT DB_NAME(database_id) AS db_name,
       replica_id, is_suspended, suspend_reason_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_suspended = 1;

-- Resume data movement
ALTER DATABASE [SalesDB] SET HADR RESUME;

-- Common suspend reasons:
-- USER_SUSPENDED      → DBA manually suspended (ALTER DATABASE … SET HADR SUSPEND)
-- SUSPEND_FROM_REDO   → redo thread hit an error (check SQL error log)
-- SUSPEND_FROM_APPLY  → apply thread error on secondary

Problem 2: Secondary Falling Behind (Large Redo Queue)

-- Identify redo queue size per database per replica
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)  AS database_name,
       drs.redo_queue_size       AS redo_queue_KB,
       drs.redo_rate             AS redo_rate_KB_s,
       CASE WHEN drs.redo_rate > 0
            THEN drs.redo_queue_size / drs.redo_rate
            ELSE NULL
       END                       AS est_catchup_sec
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE drs.is_primary_replica = 0
ORDER BY drs.redo_queue_size DESC;

-- Root causes:
-- 1. High write workload on primary (large transactions, bulk loads)
-- 2. Secondary I/O bottleneck (slow disk on secondary)
-- 3. Long-running read queries on readable secondary blocking redo thread

-- Fix for readable-secondary redo blocking (SQL 2019+):
-- Enable accelerated database recovery (ADR) to reduce redo thread stalls:
ALTER DATABASE [SalesDB] SET ACCELERATED_DATABASE_RECOVERY = ON;

Problem 3: Listener Not Resolving

-- Test listener from application server:
-- nslookup AG-SALES-LN
-- Should return the current primary's IP address

-- Verify listener state in cluster:
-- PowerShell: Get-ClusterResource | Where-Object {$_.Name -like "*AG-SALES*"}

-- Check listener IP addresses in SQL Server:
SELECT agl.dns_name, agl.port, agla.ip_address, agla.state_desc
FROM sys.availability_group_listeners agl
JOIN sys.availability_group_listener_ip_addresses agla
     ON agl.listener_id = agla.listener_id;
-- state_desc should be ONLINE for the current primary's IP

-- Common fixes:
-- 1. Listener IP not in same subnet as client → check IP configuration
-- 2. Firewall blocking port 1433 to listener IP → open firewall rule
-- 3. DNS TTL too high → lower HostRecordTTL on the cluster resource
-- 4. SQL Browser not running → needed for named instances (not default instance)

Problem 4: Synchronization Stuck — NOT_SYNCHRONIZING

-- Check replica connection state
SELECT ar.replica_server_name, ars.connected_state_desc, ars.operational_state_desc
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;

-- If DISCONNECTED:
-- 1. Check SQL Server error log on secondary for endpoint errors
-- 2. Verify endpoint is started: SELECT state_desc FROM sys.database_mirroring_endpoints
-- 3. Test TCP connectivity: Test-NetConnection -ComputerName NODE2 -Port 5022
-- 4. Check service account permissions on endpoint

-- Restart the endpoint if needed:
ALTER ENDPOINT [Hadr_endpoint] STATE = STOPPED;
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;

Problem 5: Certificate Expiry on Mirroring Endpoint

If you use certificate-based authentication on the mirroring endpoint (common in workgroup clusters or cross-domain setups), certificates expire. Monitor with:

-- Check certificates used for mirroring endpoints
SELECT c.name, c.expiry_date, c.subject, c.pvt_key_encryption_type_desc
FROM sys.certificates c
JOIN sys.database_mirroring_endpoints ep
     ON ep.certificate_id = c.certificate_id;
-- Alert if expiry_date < DATEADD(DAY, 30, GETDATE())

-- To renew: create a new certificate, update the endpoint, and distribute
-- the public key to all partner instances

 9. Maintenance with an Availability Group

Rolling Upgrade / Patching Pattern

Always patch secondaries first, then fail over to an upgraded secondary and patch the (now) former primary. This minimises downtime to a single planned failover.

  1. Patch NODE2 (secondary) — apply CU/SP and restart SQL Server.
  2. Verify NODE2 re-synchronises after restart (watch synchronization_state_desc).
  3. Perform a planned manual failover from NODE2 (ALTER AVAILABILITY GROUP … FAILOVER) — NODE2 becomes primary.
  4. Patch NODE1 (now secondary) — apply CU/SP and restart.
  5. Verify NODE1 re-synchronises. Optionally fail back to NODE1 if desired.
Version skew: SQL Server allows different patch levels (CU/SP) between replicas temporarily during rolling upgrades, but major version differences are not supported. The patched secondary will be in "upgrading" operational state until the primary is also patched.

Index Rebuilds and Replication

-- Index rebuilds on the PRIMARY replicate to secondaries automatically.
-- The redo thread applies all DDL, DML, and index operations.
-- You do NOT rebuild indexes directly on the secondary.

-- For ONLINE index rebuilds (reduces blocking on primary):
ALTER INDEX [IX_Sales_OrderDate] ON [dbo].[SalesOrders]
    REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 4);

-- Check that the redo queue doesn't spike after a large index rebuild:
SELECT ar.replica_server_name, drs.redo_queue_size
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE drs.is_primary_replica = 0;

DBCC CHECKDB on a Readable Secondary

-- Run DBCC CHECKDB on the readable secondary to offload I/O from primary
-- Connect to NODE2 (secondary) with ApplicationIntent=ReadOnly or directly
DBCC CHECKDB ('SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Benefits:
-- 1. Does not compete with production I/O on primary
-- 2. Reads data pages locally (no network transfer)
-- 3. Detects page-level corruption on the secondary's copy
-- Note: DBCC checks the secondary's copy — corruption on primary
--       that was already replicated will show up here too.

Backup Preference — Prefer Secondary

-- AG was created with AUTOMATED_BACKUP_PREFERENCE = SECONDARY
-- Ola Hallengren / native backup jobs should respect this.

-- Check current preference:
SELECT name, automated_backup_preference_desc
FROM sys.availability_groups;

-- In your backup job, use sys.fn_hadr_backup_is_preferred_replica()
-- to dynamically decide which replica runs the backup:
IF sys.fn_hadr_backup_is_preferred_replica('SalesDB') = 1
BEGIN
    BACKUP DATABASE [SalesDB]
        TO DISK = N'\\BACKUPSHARE\SalesDB_diff.bak'
        WITH DIFFERENTIAL, COMPRESSION, STATS = 10;
END
-- This ensures only the preferred replica (usually the secondary) runs the backup job,
-- even though the job is scheduled on all nodes.

 10. Always On vs Basic Availability Group (Standard Edition)

SQL Server 2016 Standard Edition introduced the Basic Availability Group — a limited version of AG that provides high availability without the Enterprise Edition licence cost.

Feature Basic AG (Standard Edition) Full AG (Enterprise Edition)
Max replicas2 (primary + 1 secondary)9 (primary + 8 secondaries)
Databases per AG1Unlimited
Readable secondaryNoYes
Read-only routingNoYes
Backup on secondaryNoYes
SSAS / SSRS integrationNoYes
Distributed AGNoYes
Automatic seedingYes (2016+)Yes
AG listenerYesYes
Automatic failoverYesYes
When is Basic AG sufficient? If you need simple automatic failover for a single production database, are on a tight budget, and don't need to offload reads to a secondary — Basic AG on Standard Edition is a cost-effective choice. For any of the following, you need Enterprise: multiple databases failing over together, readable secondaries for reporting, more than one standby, or distributed AGs spanning multiple WSFC clusters.
-- Create a Basic AG on Standard Edition
-- Syntax is identical to full AG — SQL Server enforces limits automatically
CREATE AVAILABILITY GROUP [BasicAG_Finance]
WITH (BASIC)   -- ← this keyword enables Basic AG mode
FOR DATABASE [FinanceDB]
REPLICA ON
    N'NODE1' WITH (
        ENDPOINT_URL      = N'TCP://NODE1.corp.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SEEDING_MODE      = AUTOMATIC
    ),
    N'NODE2' WITH (
        ENDPOINT_URL      = N'TCP://NODE2.corp.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SEEDING_MODE      = AUTOMATIC
    );

-- Note: attempting to add SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
-- on a Basic AG will fail with an error.

Frequently Asked Questions


What SQL Server edition is required for Always On AGs?
What is the difference between synchronous and asynchronous commit?
Can I run DBCC CHECKDB on a readable secondary?
How many databases can I add to an Availability Group?
Does the AG listener support Kerberos authentication?

Was this guide helpful?



Continue Learning — SQL Server



AG Quick Facts
  • Max replicas: 9 (1 primary + 8 secondary)
  • Auto failover: Synchronous commit only
  • Data loss: 0 with synchronous commit
  • Endpoint port: 5022 (default)
  • Edition: Enterprise (full) / Standard (Basic)
  • Requires WSFC: Yes (or workgroup cluster)
  • Shared storage: Not required
Article Info
  • Level: Intermediate / Advanced
  • Read time: ~20 min
  • Edition: SQL Server Enterprise
  • Versions: 2012 – 2022
  • Updated: June 2026
Stay Updated with Techoral

Get the latest SQL Server, HA, and DBA tips delivered to your inbox.