1. Purpose
vCISO Aegis AI™ is an AI-Native AI Agent-as-a-Service (AaaS) product. Every output is derived from live telemetry. That architectural choice is what makes the product defensible in front of a DoD assessor, an auditor, or a court. It also creates one load-bearing failure mode: if telemetry stops flowing, the system must not fabricate coverage.
This page defines how telemetry loss is detected, how the platform responds automatically, how you are informed, and what continuity options are available so your regulatory posture does not go dark during an outage. This plan is executed by automated agents with a human-in-the-loop executive approval gate before any bridge service is extended or modified.
2. Core principle: fail closed, never fabricate
The single rule that governs every decision in this document:
If telemetry for a control goes dark, the platform freezes the last-known-good evidence and reports that control as stale or unknown with an `as_of` timestamp. The Services will never produce a synthesized, estimated, or questionnaire-derived answer in its place.
Everything downstream of this rule — alerts, dashboards, bridge services, SLAs — exists to make "fail closed" a survivable experience for the customer rather than a cliff.
3. Telemetry sources under scope
Each customer environment feeds vCISO Aegis AI™ through a combination of collectors and integrations. The continuity plan applies to all of them:
- EDR / XDR agents (CrowdStrike, SentinelOne, Defender for Endpoint, etc.)
- SIEM and log pipelines (Splunk, Sentinel, Elastic, Chronicle)
- Cloud provider control-plane APIs (AWS, Azure, GCP, OCI)
- Identity providers (Entra ID, Okta, Ping, AD)
- Vulnerability scanners (Tenable, Qualys, Rapid7)
- Configuration management and MDM (Intune, Jamf, Kandji)
- Network telemetry (firewall logs, NDR, DNS, flow)
- Physical access and environmental sensors where subscribed
- Direct collector agents deployed by ElasticD3M
Each source is tagged with the CMMC and NIST SP 800-171 control scopes it evidences. Loss of a source automatically marks every control in its scope as stale until telemetry resumes or a fallback source covers the gap.
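The tagging described above can be sketched as a simple mapping from source to control scope; the source names and control IDs below are illustrative examples, not the product's real catalog:

```python
# Hypothetical source-to-control-scope map. Losing a source marks every
# control in its scope stale unless another mapped source still covers it.
SOURCE_SCOPES = {
    "edr_primary":   {"3.14.2", "3.14.6"},  # example NIST SP 800-171 IDs
    "siem_pipeline": {"3.3.1", "3.3.2"},
    "idp_entra":     {"3.5.1", "3.5.2"},
}

def controls_affected_by_loss(lost_source: str) -> set[str]:
    """Controls evidenced only by the lost source, i.e. with no surviving
    coverage from any other mapped source."""
    lost = SOURCE_SCOPES.get(lost_source, set())
    still_covered = set().union(
        *(scope for name, scope in SOURCE_SCOPES.items() if name != lost_source)
    )
    return lost - still_covered
```

A control that appears in two sources' scopes survives the loss of either one, which is the mechanism Tier 3 fallback mapping builds on.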
4. Detection layer
4.1 Heartbeat and staleness thresholds
Every telemetry source has three timers maintained by the platform:
| Timer | Default | Purpose |
|---|---|---|
| `heartbeat_max` | 5 min | Missed heartbeat window before the source is marked degraded. |
| `stale_window` | 30 min | Time without fresh events before the source is marked stale. |
| `freeze_trigger` | 2 hours | Time after which last-known-good evidence is frozen and the control is marked unknown. |
Thresholds are overridable per source and per customer. Higher tiers (Guardian, Vanguard, Fortress, Sovereign) may purchase tighter thresholds and faster response SLAs; all tiers receive the default detection regardless.
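The three timers amount to a staleness classifier. A minimal sketch, using the default thresholds from the table above (status names mirror the document; the function itself is illustrative):

```python
from datetime import datetime, timedelta, timezone

# Defaults from the table above; overridable per source and per customer.
HEARTBEAT_MAX  = timedelta(minutes=5)
STALE_WINDOW   = timedelta(minutes=30)
FREEZE_TRIGGER = timedelta(hours=2)

def source_status(last_event: datetime, now: datetime) -> str:
    """Classify a telemetry source by how long it has been silent."""
    silence = now - last_event
    if silence >= FREEZE_TRIGGER:
        return "unknown"    # last-known-good evidence is frozen
    if silence >= STALE_WINDOW:
        return "stale"      # response sequence in section 5 begins
    if silence >= HEARTBEAT_MAX:
        return "degraded"   # reconciliation in section 4.2 is running
    return "healthy"
```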
4.2 Automatic reconciliation before alert
Before any customer-facing alert fires, the platform attempts re-authentication against the source, retries with exponential backoff for transient API failures, fails over to a redundant collector where one is deployed, and compares against a secondary read-only source if one is mapped to the same control scope. If all of the above fail within stale_window, the source is declared degraded and the response sequence in §5 begins.
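The backoff step above can be sketched as follows. This is a minimal illustration; a production reconciler would add jitter and distinguish transient from permanent errors:

```python
import time

def retry_with_backoff(fetch, attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry a transient API failure with exponential backoff before the
    source is declared degraded."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # exhausted: hand off to the degraded-source path
            # 1s, 2s, 4s, 8s ... capped at max_delay
            time.sleep(min(base_delay * 2 ** attempt, max_delay))
```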
4.3 Integrity check
Detection is not just "is data arriving." The platform also watches for volume anomalies (sudden drop-off without a corresponding source change), schema drift (fields disappearing mid-stream), clock skew that would invalidate evidence timestamps, and silent credential downgrade (source authenticating but returning empty results). Any of these will trigger the same response path as an outright outage.
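The volume-anomaly check, for instance, can be sketched as a baseline comparison. The `drop_ratio` threshold here is an assumed parameter for illustration, not a documented product setting:

```python
def volume_anomaly(recent_counts: list[int], current: int,
                   drop_ratio: float = 0.2) -> bool:
    """Flag a sudden drop-off: current event volume falls below drop_ratio
    of the recent average with no corresponding source change recorded."""
    if not recent_counts:
        return False  # no baseline yet; cannot judge
    baseline = sum(recent_counts) / len(recent_counts)
    return baseline > 0 and current < baseline * drop_ratio
```

A true positive here triggers the same response path as an outright outage, which is why the function returns a boolean rather than a severity.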
5. Automatic response sequence
When a source crosses the stale_window:
- Freeze last-known-good evidence. The most recent valid evidence bundle for every control in the affected scope is written to an immutable store with an `as_of` timestamp, the triggering reason, and the operator identity (the automated agent).
- Mark affected controls as `stale`. Dashboards and API responses covering those controls return status `stale` with the `as_of` timestamp surfaced in every response. Downstream customer reports show the freeze condition, not a green check.
- Raise a customer alert. Three channels fire in parallel: a dashboard banner on the customer console, email to designated security contacts, and a webhook to the customer's SIEM or ITSM endpoint (if configured).
- Open an internal incident. An incident record is created in the ElasticD3M ops system with severity auto-assigned by control criticality. Watchman and Sentinel tiers receive best-effort response within business hours. Guardian and above receive the SLA defined in their order form.
- Begin tiered remediation (see §6).
- Continue reporting the freeze condition in every customer-facing surface until telemetry resumes and a normal evidence bundle is produced for the affected scope. No back-fill. No synthetic coverage.
This entire sequence is executed by automated agents. A human executive approval is required before the sequence escalates past Tier 3 (see §6).
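The freeze step can be sketched as follows. The field names and canonical-JSON SHA-256 scheme are assumptions for illustration, not the platform's actual bundle format:

```python
import hashlib
import json
from datetime import datetime, timezone

def freeze_bundle(control_id: str, evidence: dict, reason: str,
                  operator: str = "agent:telemetry-watchdog") -> dict:
    """Write a last-known-good evidence record with an as_of timestamp,
    the triggering reason, the operator identity, and a content hash."""
    record = {
        "control_id": control_id,
        "evidence": evidence,
        "as_of": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
        "operator": operator,
    }
    # Hash over a canonical serialization so any later edit is detectable.
    payload = json.dumps(record, sort_keys=True).encode()
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record
```

In this sketch the record is hashed before the digest is attached, so an auditor can recompute the hash over everything except the `sha256` field and compare.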
6. Tiered remediation
6.1 Tier 1: Automatic reconnect
The platform retries the source, refreshes credentials, and fails over to redundant collectors. No human action is required. You see a transient degraded state on the dashboard and, if the issue clears within stale_window, no alert is sent.
6.2 Tier 2: Expedited collector redeploy
If Tier 1 fails, an automated agent attempts to redeploy the collector from the ElasticD3M deployment pipeline using your existing consent and scope. This is only available for collectors that ElasticD3M owns (our lightweight agent, our cloud-API pollers). Third-party EDR outages fall through to Tier 3.
6.3 Tier 3: Fallback source mapping
Every control scope is optionally mapped to one or more fallback telemetry sources at onboarding. Example: if the primary EDR telemetry goes dark, the platform can pivot to SIEM-forwarded EDR logs, or to cloud workload protection data from the cloud provider's control plane. The fallback is always read-only and lower-fidelity; the dashboard clearly marks any control evidenced from a fallback source as fallback_evidence so you and your auditor can see exactly what is covered and by what. Fallback sources are free to configure at onboarding. Customers without fallback mapping skip this tier.
6.4 Tier 4: Human-assisted bridge service
If Tier 1–3 cannot restore compliance evidence within the SLA for your tier and you want to avoid a regulatory gap, ElasticD3M may offer a bridge service under a separate statement of work. Bridge services may include on-site or remote collector re-deployment with your approval, temporary direct ingest from customer-provided log exports, structured interviews with your security team to capture evidence of compensating controls (documented explicitly as customer attestations, never as system-produced compliance answers), and expedited review of existing audit artifacts you are willing to share.
Bridge services require a signed SOW and an executive approval from ElasticD3M before the work begins. They are not included in any subscription tier by default and do not change the telemetry-only nature of the underlying product. Any evidence produced via bridge services is labeled as bridge_evidence with the human operator identity attached.
6.5 Continuity Package
Fortress and Sovereign customers may pre-purchase a Continuity Package that pre-authorizes bridge services, guarantees a response window, and includes a standing set of fallback source mappings maintained by ElasticD3M. This converts Tier 4 from a reactive SOW into a contracted option that can be triggered without negotiation during an incident.
7. What you get during the gap
The promise during telemetry loss is specific and narrow:
- Frozen evidence. The last-known-good evidence bundle for every affected control is preserved with an `as_of` timestamp, the reason for the freeze, and a cryptographic hash so it cannot be tampered with. This is produced as a downloadable report on request.
- A clear alert trail. Every alert raised, every remediation step attempted, and every status transition is logged and available for export as an incident report you can hand to your auditor.
- A clear "unknown" signal. Controls that have gone past the freeze trigger show as `unknown` in every API and report. No green checks. No synthesized coverage. Auditors see exactly what is and is not known.
- A recovery plan. An incident record shows which tier the response is currently in, what the next step is, and what the expected window is.
- Optional bridge services per §6.4 and §6.5 if you want to fill the gap with human-assisted continuity rather than accept a temporary unknown state.
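The tamper-evidence claim rests on anyone being able to recompute the hash over the frozen bundle. An auditor-side check might look like this, assuming a canonical-JSON SHA-256 scheme (an illustration, not the documented format):

```python
import hashlib
import json

def verify_frozen_bundle(record: dict) -> bool:
    """Recompute the hash over every field except the stored digest and
    compare. Any edit to the frozen evidence changes the digest."""
    body = {k: v for k, v in record.items() if k != "sha256"}
    payload = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest() == record.get("sha256")
```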
What you explicitly do not get:
- No questionnaire-based fill-in of missing controls.
- No synthetic evidence or AI-inferred "best guess" coverage.
- No backdating of evidence once telemetry resumes.
- No implication to an auditor that the gap did not exist.
This is the product's defensive posture. It is also the reason customers in regulated industries buy it.
8. Recovery and unfreeze
When telemetry for an affected source resumes, the platform waits for stable_window (default 15 min) of clean data before declaring the source recovered. The stale and unknown statuses for the affected control scope are then cleared on the next evidence cycle. The frozen evidence bundle is retained in the immutable store and linked from the recovered control's history. The incident record is closed with a full timeline, including the freeze window and any bridge services provided during the gap. The next customer report shows the full timeline — recovery does not erase the gap from the record.
Backdating is prohibited. If an auditor asks about the gap, you can produce the full incident record and the frozen evidence bundle.
9. Human-in-the-loop controls
Every decision that alters the customer relationship during a telemetry outage requires a human executive approval at ElasticD3M: opening a Tier 4 bridge SOW, extending an existing bridge service, waiving or modifying SLA timers, declaring an incident resolved, or granting any form of courtesy credit.
Automated agents handle detection, alerting, freeze, retry, and Tier 1–3 remediation without human involvement. Anything that changes contracts, pricing, or the compliance posture of record requires a named human with authority to approve it. This is a deliberate design choice and is not negotiable per the ElasticD3M operating principle that an executive always sits in the loop on decisions of record.
10. Testing and drills
- Weekly. Automated failure-injection drill against a representative telemetry source in the staging environment. The drill verifies detection timers, alert fan-out, freeze behavior, and unfreeze on recovery.
- Monthly. Game-day drill that walks a simulated customer through Tiers 1–3 end-to-end.
- Quarterly. Executive table-top covering Tier 4 bridge activation, including SOW drafting, approval flow, and post-incident review.
- Annually. Full third-party review of this plan against the current Terms of Use and customer SOWs to confirm legal alignment.
Results of drills are recorded in the ElasticD3M ops journal and are available to any customer during their next business review.
11. Cross-references
- Terms of Use §6.2 — Telemetry-Only Operation
- Terms of Use §6.3 — Telemetry Continuity and Customer Obligations
- Terms of Use §6.4 — Bridge Services During Telemetry Loss
- Terms of Use §10 — Disclaimers (best efforts, no guarantees)
- Privacy Policy §1 — AaaS scope statement
12. Change control
This document is versioned in the vCISO LP project. Any change to detection timers, response tiers, or bridge service definitions must be reflected in both this page and the corresponding Terms of Use section in the same release. The legal and operational surfaces stay in sync by design.
13. Contact
Questions about this plan or about telemetry continuity for your environment? Contact us at support@ai4ciso.ai.