Broadcom recently published an updated technical paper on VMware vSAN Stretched Clusters, written by Pete Koehler (December 2025). This comprehensive guide covers everything you need to know about designing, deploying, and operating stretched clusters across two geographically separated sites using VMware Cloud Foundation (VCF) 9.0 with vSAN 9.
In this article, I summarize the key concepts, architecture decisions, and best practices from the whitepaper, and share my own thoughts on what matters most when planning a vSAN stretched cluster deployment.
You can download the full whitepaper from Broadcom here.
What Is a vSAN Stretched Cluster?
A vSAN stretched cluster extends a single vSAN cluster across two physical sites (referred to as the preferred and secondary fault domains), with a lightweight witness appliance running at a third location. This architecture provides site-level disaster recovery with near-zero RPO and automated failover using native vSphere HA — without requiring third-party replication software.
The key benefit is that data is synchronously mirrored between the two sites, meaning that if one site goes down completely, the surviving site has a full copy of all data and workloads can be restarted automatically.
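Conceptually, synchronous mirroring means a write is acknowledged to the guest only once both sites have committed it. Here is a minimal sketch of that acknowledgment logic (an illustration only, not vSAN internals):

```python
# Conceptual sketch of synchronous mirroring: a write is acknowledged
# to the guest only after BOTH sites have committed it. Illustration
# of the acknowledgment logic, not vSAN internals.
class Site:
    def __init__(self, name):
        self.name = name
        self.log = []          # data committed at this site

    def commit(self, data):
        self.log.append(data)
        return True

def synchronous_write(data, preferred, secondary):
    """Acknowledge the write only if both replicas commit it."""
    ok_pref = preferred.commit(data)
    ok_sec = secondary.commit(data)
    # The guest never sees an acknowledged write that exists at one
    # site only, which is what makes the RPO effectively zero.
    return ok_pref and ok_sec

a, b = Site("preferred"), Site("secondary")
assert synchronous_write("block-42", a, b)
assert a.log == b.log == ["block-42"]   # both sites hold identical data
```

Because the acknowledgment waits on both sites, inter-site latency lands directly in the guest's write path, which is why the network requirements discussed later are so strict.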
Supported Architectures: OSA and ESA
vSAN stretched clusters are supported on both the Original Storage Architecture (OSA) and the Express Storage Architecture (ESA). However, there are important differences.
With ESA, the storage architecture is more efficient by design. ESA uses a single-tier storage pool (NVMe only), which simplifies disk group management and eliminates the caching tier complexity found in OSA. For stretched clusters specifically, ESA supports adaptive erasure coding within each site — meaning you can use RAID-5 or RAID-6 locally at each site while still mirroring data across sites. This combination provides excellent space efficiency without sacrificing resilience.
OSA stretched clusters still work well, but ESA is the recommended path forward for new deployments.
The Witness Appliance
The witness appliance is a critical component of any vSAN stretched cluster. It does not store actual VM data — instead, it holds witness components that act as a tiebreaker during network partitions or site failures. The witness must be deployed at a third location, separate from both data sites.
Key points about the witness:
- It runs as a small virtual appliance (tiny, medium, or large, depending on the number of components)
- It requires network connectivity to both data sites, but latency requirements are more relaxed (up to 200ms RTT to each site)
- It should never run inside the stretched cluster itself
- Multiple witness appliances can run on a shared witness host at the third site, but each stretched cluster needs its own dedicated witness appliance
- The witness does not need high bandwidth — 100 Mbps is sufficient for most deployments
A common mistake is placing the witness at one of the data sites. This defeats the purpose of having a third fault domain and creates a single point of failure during site isolation events.
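The quorum math behind this is easy to demonstrate. In the sketch below (an illustration, not vSAN's actual voting implementation), each data copy and the witness carry one vote, and the cluster needs a majority to keep running:

```python
# Illustrative quorum math: why the witness must live at a third location.
# Voters: the preferred-site copy, the secondary-site copy, and the witness.
def survives(failed_location, witness_location):
    """True if a majority of the 3 votes remains after one location fails."""
    votes = {"preferred": 1, "secondary": 1}
    votes[witness_location] = votes.get(witness_location, 0) + 1
    remaining = sum(v for loc, v in votes.items() if loc != failed_location)
    return remaining >= 2   # majority of 3 votes

# Witness at a third site: either data site can fail and quorum survives.
assert survives("preferred", witness_location="third-site")
assert survives("secondary", witness_location="third-site")

# Witness placed at the preferred data site: losing that one site removes
# 2 of 3 votes, and the surviving site cannot form quorum on its own.
assert not survives("preferred", witness_location="preferred")
```

The last assertion is exactly the single point of failure described above: co-locating the witness with a data site means one site outage can take down the entire cluster.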
Network Requirements
Networking is arguably the most critical design consideration for vSAN stretched clusters. The two data sites must have low-latency, high-bandwidth connectivity between them.
The requirements are straightforward but non-negotiable:
- Between the two data sites: a maximum of 5ms RTT latency for vSAN traffic, and a minimum of 10 Gbps bandwidth (25 Gbps recommended)
- Between each data site and the witness: up to 200ms RTT latency is acceptable, and 100 Mbps bandwidth is sufficient
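These numbers lend themselves to a simple pre-flight check. The sketch below assumes you have already measured RTT and bandwidth with your own tooling (vmkping, iperf, or similar) and just validates the results against the thresholds above:

```python
# Minimal pre-flight check against the stretched cluster network
# requirements: 5 ms / 10 Gbps between data sites, 200 ms / 100 Mbps
# to the witness. Measured values are inputs; gathering them is left
# to your own tooling.
def check_stretched_cluster_network(inter_site_rtt_ms, inter_site_gbps,
                                    witness_rtt_ms, witness_mbps):
    problems = []
    if inter_site_rtt_ms > 5:
        problems.append(f"inter-site RTT {inter_site_rtt_ms} ms exceeds 5 ms")
    if inter_site_gbps < 10:
        problems.append(f"inter-site bandwidth {inter_site_gbps} Gbps below 10 Gbps")
    if witness_rtt_ms > 200:
        problems.append(f"witness RTT {witness_rtt_ms} ms exceeds 200 ms")
    if witness_mbps < 100:
        problems.append(f"witness bandwidth {witness_mbps} Mbps below 100 Mbps")
    return problems

assert check_stretched_cluster_network(2.1, 25, 90, 100) == []
assert len(check_stretched_cluster_network(8.0, 10, 250, 50)) == 3
```

Running a check like this before deployment, and again after any network change, catches the most common root cause of stretched cluster problems early.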
vSAN traffic between sites should run on a dedicated or isolated network segment, and jumbo frames (MTU 9000) are highly recommended for optimal performance. For ESA deployments using RDMA, each site needs RoCE v2 support on its local fabric; RDMA is only used within a site, never across the inter-site link.
Fault Domains and Site Affinity
In a vSAN stretched cluster, you define two fault domains — one for each data site. Every ESXi host is assigned to either the preferred or secondary fault domain. The witness appliance operates as its own implicit fault domain.
When a VM is created, vSAN places one full copy of data at the preferred site and one full copy at the secondary site, plus a witness component on the witness appliance. This ensures that any single site failure still leaves a quorum of components available.
Site affinity is an important concept for workloads that should preferably run at a specific site. You can use VM-Host affinity rules in DRS to keep VMs at their designated site during normal operations, while still allowing failover to the other site during a disaster.
vSphere HA and DRS Configuration
Getting the HA and DRS settings right is essential for proper stretched cluster behavior.
For vSphere HA, the recommendation is to set the admission control policy to 50% for both CPU and memory. This reserves enough capacity at each site to absorb the full workload from the other site during a failure. Host monitoring should use vSAN network heartbeating, and the isolation response should be set to "Power off and restart VMs."
For DRS, the automation level should be set to "Fully Automated." Use VM-Host affinity rules (should rules, not must rules) to prefer VMs at their designated site. This allows DRS to override placement during a failover event. During normal operations, DRS will respect the affinity rules and keep VMs at their preferred site.
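The 50% figure is easy to sanity-check: with symmetric sites, reserving half of the combined resources guarantees that one site alone can carry every admitted VM. A quick worked example with illustrative numbers:

```python
# Why 50% admission control works for symmetric sites: reserving half of
# the combined cluster resources guarantees the surviving site can host
# every admitted VM after a site failure. Numbers are illustrative.
def usable_capacity_ghz(hosts_per_site, ghz_per_host, reserved_pct=50):
    total = 2 * hosts_per_site * ghz_per_host     # both fault domains
    return total * (100 - reserved_pct) / 100     # what HA lets you admit

# 4 hosts per site x 50 GHz each = 400 GHz total, of which 200 GHz is usable.
usable = usable_capacity_ghz(hosts_per_site=4, ghz_per_host=50)
assert usable == 200.0

# One site (4 hosts, 200 GHz) can absorb the full admitted load.
one_site_ghz = 4 * 50
assert one_site_ghz >= usable
```

The same arithmetic applies to memory. If the sites are asymmetric, the reservation must be sized against the smaller site rather than a flat 50%.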
Storage Policies for Stretched Clusters
vSAN stretched clusters use a specific storage policy setting: the site disaster tolerance policy. This is set separately from the standard FTT (failures to tolerate) setting.
The site disaster tolerance defines how data is mirrored across sites. The most common configuration is "Site mirroring" (also called dual site mirroring), which creates one full copy at each data site.
Within each site, you can additionally configure local FTT protection. For example, on ESA you can set FTT=1 with RAID-5 at each site, combined with site mirroring across sites. This means data is protected against both a full site failure and an additional host failure at the surviving site.
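The capacity impact of this layered policy is worth working through. The sketch below assumes a 4+1 RAID-5 layout (1.25x local overhead); actual ESA erasure-coding layouts depend on the host count per site:

```python
# Illustrative capacity math for the layered policy described above.
# Assumes a 4+1 RAID-5 layout (1.25x local overhead) inside each site;
# actual ESA layouts vary with host count per site.
def raw_needed_tb(usable_tb, local_overhead):
    # Site mirroring keeps a full copy at each site (2x), and each copy
    # carries its own local protection overhead on top.
    return usable_tb * 2 * local_overhead

raid5_both_sites = raw_needed_tb(100, 1.25)   # RAID-5 within each site
raid1_both_sites = raw_needed_tb(100, 2.0)    # RAID-1 within each site
assert raid5_both_sites == 250.0
assert raid1_both_sites == 400.0   # same host-failure tolerance, 60% more raw capacity
```

For 100 TB of usable capacity, per-site RAID-5 needs 250 TB of raw capacity across both sites versus 400 TB with per-site mirroring, which is exactly the ESA efficiency advantage mentioned earlier.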
This layered protection model is one of the major advantages of vSAN stretched clusters — you get both local resilience and site-level disaster recovery in a single, unified storage policy.
Maintenance and Day-2 Operations
Operating a stretched cluster requires understanding the impact of maintenance activities at each site.
When placing a host in maintenance mode, you should choose "Ensure accessibility" for routine maintenance tasks. This avoids unnecessary data migrations between sites. For permanent host removal, use "Full data migration" to evacuate all components.
During a planned site maintenance event (such as power maintenance at one data center), you can use the "Decommission Fault Domain" workflow. This gracefully migrates all VMs to the other site and ensures data integrity before the site goes offline.
Lifecycle management through VCF SDDC Manager handles firmware and software upgrades in a rolling fashion, maintaining availability throughout the update process.
Failure Scenarios and Recovery
The whitepaper covers multiple failure scenarios in detail. Here are the most important ones.
In a single host failure, VMs from the failed host are restarted on the remaining hosts at the same site, and vSAN rebuilds the missing components using hosts at that site where possible.
In a full site failure, vSphere HA restarts all affected VMs at the surviving site. Because a full data copy exists at the surviving site, VMs can restart immediately without waiting for data rebuilds.
In a network partition between sites, the preferred site retains quorum (because it has data + witness), and VMs at the secondary site are powered off and restarted at the preferred site. This is why the "preferred" designation matters.
If the witness becomes isolated, both data sites continue operating normally. No VM impact occurs; the witness is only needed to break ties during site-level events.
My Recommendations
Based on my experience with vSAN stretched clusters, here are a few practical recommendations.
First, invest in network quality. The number one cause of stretched cluster issues is network problems between sites. Ensure you have redundant, low-latency links with proper QoS for vSAN traffic.
Second, use ESA if you are deploying new infrastructure. The performance and efficiency benefits of ESA are significant, especially the ability to use erasure coding within each site.
Third, test your failure scenarios. Before going into production, simulate site failures, network partitions, and witness outages. Verify that HA and DRS behave as expected.
Fourth, document your affinity rules. Keep a clear mapping of which VMs belong to which site, and review this regularly as workloads change.
Finally, size your witness appropriately. For large environments with many VMs, use the large witness appliance to handle the additional component count.
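A simple way to express that sizing decision is shown below. The component thresholds here are illustrative placeholders, not official limits; check the current VMware sizing documentation for your release:

```python
# Pick a witness appliance size from the expected component count.
# TINY_MAX and MEDIUM_MAX are assumed, illustrative thresholds, not
# official limits -- consult the sizing documentation for your release.
TINY_MAX, MEDIUM_MAX = 750, 21_000

def witness_size(expected_components):
    if expected_components <= TINY_MAX:
        return "tiny"
    if expected_components <= MEDIUM_MAX:
        return "medium"
    return "large"

assert witness_size(500) == "tiny"
assert witness_size(15_000) == "medium"
assert witness_size(40_000) == "large"
```

Remember that the component count grows with snapshots and policy changes, not just VM count, so size with headroom.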
Conclusion
vSAN stretched clusters remain one of the most elegant solutions for site-level disaster recovery in a VMware environment. With VCF 9.0 and vSAN 9, the technology has matured significantly — particularly with ESA bringing better performance and simpler operations.
The official Broadcom whitepaper is an excellent resource for anyone planning or operating a stretched cluster. I highly recommend reading it in full.
