VMware Stretched Cluster — One‑Page Decision Matrix (Architect View)

Use this table to determine whether a stretched cluster is the right architectural choice for a given environment.

How to Use This Matrix

  • If you can answer YES to every row (the condition in the “YES When…” column holds), a stretched cluster may be appropriate.
  • A single NO in a critical row should trigger consideration of alternatives: VMware Site Recovery Manager (SRM), asynchronous replication, or independent per‑site clusters.
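
The rule in the two bullets above can be sketched as a small check. This is only an illustration: the matrix does not say which rows count as critical, so the `critical` flag is set by the reader, and the third outcome for a NO in a non‑critical row is an assumption added here. The `Row` and `evaluate` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    area: str       # "Decision Area" column of the matrix
    answer: bool    # True = YES, False = NO
    critical: bool  # assumption: the reader decides which rows are critical


def evaluate(rows):
    """Return (verdict, list of decision areas answered NO)."""
    no_areas = [r.area for r in rows if not r.answer]
    critical_no = any(r.critical and not r.answer for r in rows)
    if not no_areas:
        verdict = "stretched cluster may be appropriate"
    elif critical_no:
        verdict = "consider alternatives"
    else:
        # Not covered by the matrix; treated here as "look closer".
        verdict = "review non-critical gaps"
    return verdict, no_areas


# Hypothetical answers for three of the rows.
rows = [
    Row("Business Requirement", answer=True, critical=True),
    Row("Network Latency", answer=False, critical=True),
    Row("Change Velocity", answer=True, critical=False),
]
verdict, no_areas = evaluate(rows)
print(verdict, no_areas)
```

Returning the list of NO rows alongside the verdict keeps the conversation on the specific gaps rather than on a bare pass or fail.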

✅ Decision Matrix

Decision Area | Question to Ask | Stretched Cluster: YES When… | Stretched Cluster: NO When…
--- | --- | --- | ---
Business Requirement | Is zero data loss (0 RPO) a documented business requirement? | Any data loss is unacceptable, including seconds. | Minutes or even seconds of data loss are acceptable.
Outage Tolerance | Can the business not tolerate restart‑based recovery? | Workloads must survive site loss with only HA restarts. | Brief downtime during recovery is acceptable.
Recovery Strategy | Is disaster avoidance preferred over recovery? | Goal is to continue running through site failure. | Goal is clean recovery to a known‑good state.
Data Protection Model | Do you understand that synchronous replication mirrors bad data too? | You explicitly accept this and have separate backups. | You expect the platform to help with recovery/rollback.
Network Latency | Can you guarantee < 5 ms RTT between sites long‑term? | Low, stable latency is contractual and monitored. | Latency is variable, shared, or “usually acceptable.”
Network Bandwidth | Can you sustain 10+ Gbps under failure and resync? | Capacity is sized for worst‑case resync scenarios. | Bandwidth is sized for steady‑state only.
Network Predictability | Can latency remain stable during congestion or failure? | Network behavior is deterministic under stress. | Congestion or jitter is expected during incidents.
Site Symmetry | Can both sites remain operationally symmetric for years? | Hardware, firmware, capacity, and ops are aligned. | One site is newer, smaller, or differently managed.
Scope Control | Is this for a small, well‑defined workload set? | Only a few Tier‑1 workloads will use it. | Platform is intended as a general‑purpose cluster.
Operational Maturity | Do teams understand HA/DRS/fault‑domain interactions? | Staff are trained and failure scenarios are tested. | Teams prefer simple, explicit recovery workflows.
Failure Clarity | Do you accept more complex failure diagnostics? | Complex behavior is acceptable for less downtime. | Simplicity during incidents is a priority.
Application Design | Are apps unable to tolerate restart or short outage? | Apps require near‑continuous availability. | Apps can restart or be redesigned to tolerate failure.
Change Velocity | Is the environment relatively stable over time? | Changes are controlled and infrequent. | Frequent changes, migrations, or platform churn.
Security/Ransomware | Do you already rely on backups for data recovery? | Backups are the primary recovery mechanism. | You expect the infrastructure to aid recovery.
Cost Acceptance | Are you willing to pay for permanent complexity? | Cost and complexity are justified by the requirement. | Complexity is acceptable only if it’s removable.
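
The Network Latency and Network Bandwidth rows lend themselves to a quick sanity check. The sketch below takes the 5 ms RTT limit from the matrix; everything else is an assumption for illustration: judging latency by the 99th percentile rather than the average, the sample RTT values, the 40 TB resync volume, and the 50% share of a 10 Gbps link left over for resync traffic. The function names are hypothetical.

```python
RTT_LIMIT_MS = 5.0  # from the Network Latency row


def p99(samples_ms):
    """Nearest-rank 99th percentile of RTT samples, in milliseconds."""
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def resync_hours(data_tb, link_gbps, usable_fraction):
    """Hours to copy data_tb (decimal terabytes) across the inter-site link."""
    bits = data_tb * 8e12
    usable_bps = link_gbps * 1e9 * usable_fraction
    return bits / usable_bps / 3600


# Hypothetical measurements: mostly 1.2 ms, with two slow outliers.
samples = [1.2] * 98 + [4.8, 7.5]
print("p99 RTT within limit:", p99(samples) < RTT_LIMIT_MS)
print("worst sample within limit:", max(samples) < RTT_LIMIT_MS)

# Hypothetical resync: 40 TB over 10 Gbps with half the link available.
print("resync hours:", round(resync_hours(40, 10, 0.5), 1))
```

In this made‑up data the 99th percentile passes while the worst sample does not, which is the “usually acceptable” pattern the NO column warns about. The resync estimate shows why a link sized for steady state can leave a cluster unprotected for many hours after a failure.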

✅ When the Matrix Says “YES”

A VMware stretched cluster is appropriate when:

  • Zero RPO is non‑negotiable
  • Network guarantees are strict and sustainable
  • Workloads are few and clearly scoped
  • The organization understands this is disaster avoidance, not DR

❌ When the Matrix Says “NO”

Favor alternatives when:

  • Recovery clarity matters more than uninterrupted runtime
  • Networks are shared or unpredictable
  • Long‑term site symmetry cannot be guaranteed
  • The platform is expected to compensate for application limitations

Common alternatives:

  • Independent clusters + SRM
  • Async replication
  • Purpose‑built clusters per site
  • Explicit, tested recovery workflows
