The ResOps Readiness Assessment: 10 Questions to Gauge Your Survivability
Scoring:
0–4 "Yes": Crisis Impending. Your "Uptime" is a mask for deep systemic fragility.
5–7 "Yes": Fragile Foundation. You have the tools, but lack the strategic orchestration.
8–10 "Yes": Resilience Leader. You are ready for the Black Swan.
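The scoring bands above can be written as a small lookup. This is a minimal sketch, assuming the answers are recorded as a list of ten booleans; the function name `readiness_band` is illustrative and not part of the assessment itself.

```python
def readiness_band(answers):
    """Map ten yes/no answers to the assessment's scoring band."""
    if len(answers) != 10:
        raise ValueError("expected exactly 10 answers")
    yes_count = sum(1 for answer in answers if answer)
    if yes_count <= 4:
        band = "Crisis Impending"
    elif yes_count <= 7:
        band = "Fragile Foundation"
    else:
        band = "Resilience Leader"
    return yes_count, band
```

For example, a team answering "Yes" to five questions lands in the Fragile Foundation band.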
Phase 1: Visibility & Mapping
1. The N-th Party Map: Do we have a live, automated map of not just our direct vendors, but the infrastructure they rely on (e.g., knowing which of our SaaS tools share the same AWS region)?
2. Shadow Discovery: Can we identify every external API call made by our critical path services within 60 seconds of a failure?
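One way to make the N-th party question concrete is to invert a vendor inventory and look for infrastructure that more than one vendor depends on. The sketch below assumes such an inventory already exists; the vendor names, the `VENDOR_DEPENDENCIES` mapping, and the `shared_dependencies` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: each vendor mapped to the infrastructure it relies on.
VENDOR_DEPENDENCIES = {
    "billing-saas": ["aws:us-east-1", "payments-provider"],
    "auth-saas": ["aws:us-east-1"],
    "email-saas": ["gcp:us-central1"],
}

def shared_dependencies(vendor_deps):
    """Return each dependency that more than one vendor relies on."""
    users = defaultdict(set)
    for vendor, deps in vendor_deps.items():
        for dep in deps:
            users[dep].add(vendor)
    # Keep only dependencies with at least two dependents (concentration risk).
    return {dep: sorted(vendors) for dep, vendors in users.items() if len(vendors) > 1}
```

With the sample data, the only shared dependency is `aws:us-east-1`, used by both `auth-saas` and `billing-saas`. In practice the inventory would need to be refreshed automatically for the map to count as "live".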
Phase 2: Tactical Execution (The Playbook)
3. MTRL vs. MTTR: Do we measure Mean Time to Recovery of Logic (MTRL, how quickly the business function is working again) separately from MTTR (how quickly the tech is "fixed")?
4. Unannounced Chaos: Have we successfully executed an unannounced "Chaos Experiment" in production within the last 90 days without causing an unintended customer outage?
5. Multi-Vector Scenarios: Does our Disaster Recovery plan include a scenario where a technical failure coincides with a human crisis (e.g., a cyberattack during a holiday or during a regional power outage)?
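The MTRL and MTTR distinction in question 3 can be illustrated with two timestamps per incident. This is a rough sketch, assuming each incident record carries a start time, the time the technical fix landed, and the time the business function was confirmed working again; the field names and sample incidents are invented for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log with separate "tech fixed" and "business restored" times.
INCIDENTS = [
    {
        "start": datetime(2024, 1, 5, 10, 0),
        "tech_fixed": datetime(2024, 1, 5, 10, 30),
        "business_restored": datetime(2024, 1, 5, 11, 30),
    },
    {
        "start": datetime(2024, 2, 9, 14, 0),
        "tech_fixed": datetime(2024, 2, 9, 14, 50),
        "business_restored": datetime(2024, 2, 9, 15, 10),
    },
]

def mean_minutes(incidents, end_field):
    """Average minutes from incident start to the given end timestamp."""
    return mean(
        (incident[end_field] - incident["start"]).total_seconds() / 60
        for incident in incidents
    )

mttr_minutes = mean_minutes(INCIDENTS, "tech_fixed")         # tech "fixed"
mtrl_minutes = mean_minutes(INCIDENTS, "business_restored")  # business working again
```

In the sample data the technical fix averages 40 minutes while the business function takes 80 minutes to recover, which is the gap the question is asking about.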
Phase 3: Culture & Governance
6. The Veto Power: Does our ResOps lead (or equivalent) have the formal authority to "veto" a high-speed feature release if it exceeds our established Resilience Debt threshold?
7. Neutral Reporting: Does the person responsible for resilience report to a Risk or Operations executive (CRO/COO) rather than to an executive incentivized primarily by shipping speed (CTO/VP Eng)?
8. Impact Tolerance: Have we defined the "Maximum Tolerable Period of Disruption" for our top three revenue-generating services, and expressed the cost of exceeding it in dollars, not just "9s"?
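To show what "dollars, not just 9s" in question 8 might look like, the sketch below converts an availability target into hours of allowed downtime and an acceptable loss into a tolerable outage length. It assumes revenue is lost linearly while a service is down, which is a simplification; the function names and figures are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

def allowed_downtime_hours(availability):
    """Hours of downtime per year permitted by an availability target (e.g. 0.999)."""
    if not 0 <= availability <= 1:
        raise ValueError("availability must be between 0 and 1")
    return (1 - availability) * HOURS_PER_YEAR

def max_tolerable_hours(revenue_per_hour, max_acceptable_loss):
    """Longest outage (in hours) before losses exceed the acceptable amount.

    Assumes revenue is lost at a constant hourly rate during the outage.
    """
    if revenue_per_hour <= 0:
        raise ValueError("revenue_per_hour must be positive")
    return max_acceptable_loss / revenue_per_hour
```

Under these assumptions, "three nines" (0.999) permits about 8.76 hours of downtime per year. For a service earning a hypothetical $20,000 per hour, that is roughly $175,200 of exposure, while a $50,000 loss limit implies a tolerable outage of only 2.5 hours.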
Phase 4: The Remediation Loop
9. P0 Resilience: Are vulnerabilities found during "Game Days" automatically converted into P0/P1 tickets that take priority over the new feature roadmap?
10. Graceful Degradation: Can our core business transaction (e.g., checkout, data upload, search) survive a total failure of its primary database by switching to a "Read-Only" or "Cached" mode?
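The fallback described in question 10 can be sketched as a read path that serves the last known good value when the primary store fails. This is an illustrative outline only: `PrimaryUnavailable`, `CatalogService`, and the callable-based primary are hypothetical stand-ins for a real database client and cache.

```python
class PrimaryUnavailable(Exception):
    """Raised by the (hypothetical) primary store when it cannot serve a request."""

class CatalogService:
    def __init__(self, primary, cache):
        self.primary = primary    # callable: key -> value, may raise PrimaryUnavailable
        self.cache = cache        # dict holding the last known good values
        self.read_only = False    # True while serving in degraded mode

    def read(self, key):
        """Return (value, source), where source is "live" or "cached"."""
        try:
            value = self.primary(key)
        except PrimaryUnavailable:
            # Degrade instead of failing outright, if a cached copy exists.
            self.read_only = True
            if key in self.cache:
                return self.cache[key], "cached"
            raise
        # Primary answered: refresh the cache and leave degraded mode.
        self.cache[key] = value
        self.read_only = False
        return value, "live"
```

Callers can inspect `read_only` to pause writes or show a banner while the service is degraded. Whether stale data is acceptable depends on the transaction, so the cached path suits search or browsing better than, say, final payment capture.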