Azure Zone Redundancy vs. Multi-Region High Availability

✍️ By Abhishek Kumar | #FirstCrazyDeveloper

This document explores two powerful patterns in Azure for improving application availability: Zone Redundancy (ZR) within a region and Multi-Region High Availability (MR-HA) across regions. It provides a decision matrix to help you choose the right approach based on cost, latency, RTO/RPO, and operational complexity, along with reference architectures, code examples, and testing strategies. The document also covers common pitfalls and offers practical tips for cost optimization and performance.

Downtime can significantly impact revenue and erode customer trust. Azure offers two primary strategies to enhance application availability: Zone Redundancy (ZR) and Multi-Region High Availability (MR-HA). Selecting the appropriate strategy is crucial, as it directly affects cost, latency, Recovery Time Objective (RTO), Recovery Point Objective (RPO), and operational complexity.

⚡Decision Matrix

| Criterion | Zone Redundancy (ZR) | Multi-Region HA (MR-HA) |
|---|---|---|
| Scope of failure tolerated | Single datacenter (AZ) | Full regional outage |
| Latency | Lowest (in-region) | Higher (cross-region) |
| Complexity | Low–Medium | Medium–High |
| Cost | Lower | Higher (duplicate infra + data egress) |
| RTO / RPO (typical) | Minutes / near-zero | Seconds–minutes / seconds–minutes |
| Ideal for | Most prod workloads needing 99.99% | Mission-critical, regulatory, DR mandates |
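
To make the matrix concrete, here is a minimal Python sketch that turns two of its rows into a rule of thumb. The `Requirements` fields and the `recommend_strategy` function are hypothetical names made up for this illustration, and the rule is a simplification of the matrix, not an official Azure sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical inputs, loosely mirroring rows of the decision matrix."""
    must_survive_region_outage: bool  # a full regional outage must be tolerated
    has_dr_mandate: bool              # regulatory or contractual DR requirement


def recommend_strategy(req: Requirements) -> str:
    """Return "MR-HA" or "ZR", treating the matrix as a rule of thumb."""
    if req.must_survive_region_outage or req.has_dr_mandate:
        return "MR-HA"
    # Default: zone redundancy covers the loss of a single datacenter (AZ).
    return "ZR"


if __name__ == "__main__":
    print(recommend_strategy(Requirements(False, False)))  # ZR
    print(recommend_strategy(Requirements(True, False)))   # MR-HA
```

Cost, latency, and complexity are left out of the function on purpose: they are trade-offs you accept once the failure scope is decided, not inputs that flip the answer.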

⚡Core Concepts (in 60 seconds)

  • Availability Zone (AZ): Physically separate datacenters inside one Azure region.
  • Zone Redundancy: Distribute replicas across 3 AZs in the same region (e.g., westeurope AZ1/2/3).
  • Multi-Region HA: Replicate to a paired or secondary region (e.g., westeurope ↔ northeurope). Survives full regional failure.
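
The arithmetic behind these concepts is worth a quick look. The sketch below converts an availability percentage into allowed downtime per year and estimates the combined availability of several replicas. It assumes replicas fail independently, which is an idealization (real zones and regions can share failure causes), and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Downtime budget per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant replicas, each available `single`
    of the time (as a fraction). Assumes independent failures: the system
    is down only when every replica is down at once."""
    return 1 - (1 - single) ** copies


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.99):.2f} min/year at 99.99%")
    print(f"{parallel_availability(0.9999, 2):.8f} with two independent copies")
```

At 99.99% the budget is roughly 52.6 minutes per year, which is why adding a second region only pays off when that budget is too large for the business.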

⚡Reference Architectures

A) Zone-Redundant Web App (Same Region)

  • Front Door or Azure Application Gateway (zonal or zone-redundant)
  • App Service (Zone Redundant) on PremiumV3/IsolatedV2 (scale across AZs)
  • Azure SQL DB (Zone Redundant) or Cosmos DB (multi-AZ)
  • Storage (ZRS) for static assets
  • Private DNS + Private Endpoints for data plane

When to use: Low latency, resilient to a datacenter loss, simpler ops.

B) Multi-Region Active/Passive (Two Regions)

  • Region A (Primary) + Region B (Secondary)
  • Azure Front Door for global anycast + health probes + failover
  • App Service in both regions (secondary kept warm)
  • SQL: Auto-failover group (geo-replication)
  • Cosmos DB: Multi-region write/read (or single-write + failover)
  • Blob: RA-GZRS or GZRS + failover
  • Key Vault: Geo-redundant (soft-delete, purge-protection)
  • Traffic failover: Front Door, or DNS with a short TTL

When to use: Regulated industries, strict SLAs, mandatory DR tests, geographically distributed users.

⚡Bicep: Zone-Redundant Azure SQL + App Service (Same Region)

param location string = 'westeurope'
param rgName string = resourceGroup().name
param sqlAdmin string
@secure()
param sqlPwd string

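resource plan 'Microsoft.Web/serverfarms@2023-12-01' = {
  name: 'fcdev-app-plan'
  location: location
  sku: {
    name: 'P1v3'
    tier: 'PremiumV3'
    capacity: 2 // zone redundancy needs multiple instances; check the current minimum for your SKU
  }
  properties: {
    zoneRedundant: true // belongs under properties, not at the resource root
  }
}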

resource app 'Microsoft.Web/sites@2023-12-01' = {
  name: 'fcdev-zr-app'
  location: location
  properties: {
    httpsOnly: true
    serverFarmId: plan.id
  }
}

resource sql 'Microsoft.Sql/servers@2023-08-01-preview' = {
  name: 'fcdev-sql-${uniqueString(rgName)}'
  location: location
  properties: {
    administratorLogin: sqlAdmin
    administratorLoginPassword: sqlPwd
    publicNetworkAccess: 'Disabled'
  }
}

resource db 'Microsoft.Sql/servers/databases@2023-08-01-preview' = {
  parent: sql // declare the parent instead of building a 'server/db' name by hand
  name: 'appdb'
  location: location
  sku: {
    name: 'GP_Gen5_2'
    tier: 'GeneralPurpose'
  }
  properties: {
    zoneRedundant: true
  }
}

⚡Azure CLI: Multi-Region SQL Auto-Failover Group (A→B)

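# Variables
PRIMARY_RG=rg-we
SECONDARY_RG=rg-ne
PRIMARY_LOC=westeurope
SECONDARY_LOC=northeurope
SQL_PRIMARY=fcdevsqlwe
SQL_SECONDARY=fcdevsqlne
DB=appdb
FOG=myfog
# Admin credentials come from the environment; the script stops here if they are unset
: "${SQL_ADMIN:?set SQL_ADMIN}" "${SQL_PWD:?set SQL_PWD}"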

# Create secondary server
az sql server create -g $SECONDARY_RG -n $SQL_SECONDARY -l $SECONDARY_LOC \
  -u $SQL_ADMIN -p $SQL_PWD

# Geo-replicate DB
az sql db replica create -g $PRIMARY_RG -s $SQL_PRIMARY -n $DB \
  --partner-server $SQL_SECONDARY --partner-resource-group $SECONDARY_RG

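# Failover group (the secondary server lives in a different resource group,
# so --partner-resource-group is required; --grace-period is in hours)
az sql failover-group create -g $PRIMARY_RG -s $SQL_PRIMARY \
  -n $FOG --partner-server $SQL_SECONDARY \
  --partner-resource-group $SECONDARY_RG \
  --add-db $DB --failover-policy Automatic --grace-period 1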
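
Once the failover group exists, applications should connect through its listener instead of a specific server name, so the connection string stays the same after a failover. Below is a minimal Python sketch, assuming the public Azure cloud DNS suffix `database.windows.net` and ODBC Driver 18. The helper names are hypothetical, and credentials are deliberately left out for the caller to add.

```python
def failover_group_listeners(fog_name: str) -> dict:
    """Listener host names for a SQL failover group (public Azure cloud)."""
    suffix = "database.windows.net"
    return {
        "read_write": f"{fog_name}.{suffix}",            # follows the current primary
        "read_only": f"{fog_name}.secondary.{suffix}",   # points at the geo-secondary
    }


def odbc_connection_string(host: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC connection string without credentials (caller adds auth)."""
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{host},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Marks the session as read-only, intended for the read-only listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    listeners = failover_group_listeners("myfog")  # FOG value from the script above
    print(odbc_connection_string(listeners["read_write"], "appdb"))
    print(odbc_connection_string(listeners["read_only"], "appdb", read_only=True))
```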

⚡Cosmos DB: Multi-Region with Preferred Regions (C# & Python)

🔹C# SDK

using Microsoft.Azure.Cosmos;

var accountEndpoint = Environment.GetEnvironmentVariable("COSMOS_URI");
var key = Environment.GetEnvironmentVariable("COSMOS_KEY");

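var client = new CosmosClient(accountEndpoint, key, new CosmosClientOptions {
    // Regions are tried in this order; the SDK moves to the next one if a region is unavailable
    ApplicationPreferredRegions = new[] { "West Europe", "North Europe" },
    // AllowBulkExecution is left off: it batches operations for throughput and adds latency to point reads
    EnableTcpConnectionEndpointRediscovery = true
});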

// Read with region preference
var container = client.GetContainer("appdb", "items");
var response = await container.ReadItemAsync<dynamic>("id1", new PartitionKey("pk1"));
Console.WriteLine($"RU: {response.RequestCharge}");

🔹Python SDK

import os
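from azure.cosmos import CosmosClient

endpoint = os.getenv("COSMOS_URI")
key = os.getenv("COSMOS_KEY")

# preferred_locations mirrors ApplicationPreferredRegions in the C# example
client = CosmosClient(
    endpoint,
    key,
    consistency_level="Session",
    preferred_locations=["West Europe", "North Europe"],
)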
database = client.get_database_client("appdb")
container = database.get_container_client("items")

item = container.read_item(item="id1", partition_key="pk1")
print(item)

Tip: For MR-HA, enable multi-region writes only if you need active/active; otherwise keep a single write region and turn on automatic (service-managed) failover.

⚡Storage Strategy

  • ZR only: ZRS for hot path; lifecycle rules to Cool (the Archive tier is not available on ZRS, GZRS, or RA-GZRS accounts).
  • MR-HA: GZRS/RA-GZRS for geo-redundant + zone-redundant.
  • Avoid cross-region chatter in hot paths; cache near users.
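
One practical difference between these options is whether the secondary region is readable before a failover. With RA-GZRS, Blob storage exposes a read-only secondary endpoint that appends `-secondary` to the account name. A minimal sketch follows, assuming the public cloud suffix `blob.core.windows.net`; the function name and the redundancy labels passed to it are illustrative.

```python
def blob_endpoints(account: str, redundancy: str) -> dict:
    """Return the Blob endpoints an app can use for a given redundancy option.

    `redundancy` is a label such as "ZRS", "GZRS", or "RA-GZRS".
    Only the read-access (RA-) options expose a readable secondary.
    """
    endpoints = {"primary": f"https://{account}.blob.core.windows.net"}
    if redundancy.upper().startswith("RA-"):
        endpoints["secondary"] = f"https://{account}-secondary.blob.core.windows.net"
    return endpoints


if __name__ == "__main__":
    for option in ("ZRS", "GZRS", "RA-GZRS"):
        print(option, blob_endpoints("fcdevassets", option))  # account name is made up
```

With plain GZRS the data is still geo-replicated, but it only becomes readable in the secondary region after an account failover.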

🔹Front Door + App Service (Multi-Region Routing)

  • Use Azure Front Door (Standard/Premium) for:
    • Anycast global entry, WAF, path routing
    • Health probes to fail from Region A to B
    • Origin groups: App Service endpoints in each region
  • Keep the app tier stateless. If you must hold session state, externalize it to Azure Cache for Redis with geo-replication, and use sticky sessions only if necessary.
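
The probe-driven failover from Region A to B can be modeled in a few lines. This is a simplified sketch of priority-based origin selection, not Front Door's actual algorithm (which also takes weights, latency sensitivity, and probe sample sizes into account). The `Origin` type and `pick_origin` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Origin:
    name: str
    priority: int   # lower value = preferred, as in an origin group
    healthy: bool   # latest health-probe result


def pick_origin(origins: Iterable[Origin]) -> Optional[Origin]:
    """Return the healthy origin with the lowest priority value, or None."""
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda o: o.priority)


if __name__ == "__main__":
    region_a = Origin("app-westeurope", priority=1, healthy=False)  # probe failing
    region_b = Origin("app-northeurope", priority=2, healthy=True)
    chosen = pick_origin([region_a, region_b])
    print(chosen.name if chosen else "no healthy origin")  # app-northeurope
```

Giving both origins the same priority would turn this active/passive model into active/active, which is only safe when the data tier accepts writes in both regions.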

🔹Testing Playbook (Print-worthy)

  1. Zone fail test: Stop instances in AZ1; verify app stays up.
  2. Region fail test: Block Front Door origin A; confirm traffic fails over to B within about 1–2 minutes.
  3. Data RPO test: Write storm → trigger failover → validate last commit.
  4. DNS/Certs: Ensure wildcard certs in both origins; short TTL.
  5. Runbooks: Document manual failover (SQL FOG, Cosmos manual) and rollback.
  6. Chaos drills: Quarterly game days; record RTO/RPO evidence.
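
To capture RTO/RPO evidence during a drill, a small script is often enough. The sketch below measures RTO as the time between the first failed probe and the first successful one after it, and checks RPO by comparing acknowledged writes with what is present after failover. All names are hypothetical, and `probe` stands in for whatever health check you use (for example, an HTTP GET against the Front Door endpoint).

```python
import time
from typing import Callable, Iterable, List, Optional


def measure_rto(
    probe: Callable[[], bool],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it fails and then succeeds again.

    Returns the seconds between the first failure and the first success
    after it, or None if no full outage-and-recovery cycle was seen
    before `timeout_s`. Resolution is limited by `interval_s`.
    """
    start = clock()
    outage_start: Optional[float] = None
    while clock() - start < timeout_s:
        ok = probe()
        now = clock()
        if not ok and outage_start is None:
            outage_start = now              # first failed probe
        elif ok and outage_start is not None:
            return now - outage_start       # first success after the outage
        sleep(interval_s)
    return None


def missing_after_failover(acked_ids: Iterable[int], present_ids: Iterable[int]) -> List[int]:
    """IDs the primary acknowledged that are absent after failover (data loss)."""
    return sorted(set(acked_ids) - set(present_ids))
```

The `clock` and `sleep` parameters exist so the function can be exercised with a fake clock; in a real drill you would leave the defaults and start the script before triggering the failover.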

🔹Cost & Performance Tips

  • ZR → best value for 95% of apps.
  • MR-HA → budget for duplicate compute + egress; scale the secondary down to warm (autoscale min=1) instead of leaving it cold.
  • Offload read-heavy traffic to the secondary region when users are closer to it.
  • Set Log Analytics retention by need (e.g., 30–90 days) and export to Blob for long-term storage.

🔹Common Gotchas

  • Stateful services without cross-region session design.
  • Private Endpoints not duplicated in secondary (breaks failover).
  • Key Vault not enabled with soft-delete + purge protection.
  • Regional features: some SKUs not GA in both regions—check parity.

✨Abhishek Take

Start with Zone Redundancy as your default. Add Multi-Region HA only when there’s a real business driver (RTO/RPO, regulatory DR, global latency). Keep apps stateless, standardize runbooks, and practice failovers—that’s what separates check-box DR from real resilience.

#Azure #CloudComputing #HighAvailability #DisasterRecovery #CloudArchitecture #AzureZoneRedundancy #MultiRegionHighAvailability #FirstCrazyDeveloper #AbhishekKumar
