Using Cross-Region Disaster Recovery for OCC and DORA compliance

In the evolving landscape of financial regulation, organizations must maintain robust digital resilience to comply with requirements such as those issued by the Office of the Comptroller of the Currency (OCC) in the United States and the European Union's Digital Operational Resilience Act (DORA). These requirements call for institutions not only to implement strong cybersecurity measures but also to ensure continuity of operations in the face of potential disruptions.

Companies operating across multiple regions face the challenge of safeguarding critical data and maintaining uninterrupted service amid increasing regulatory demands and potential cyber threats. With operations spanning different jurisdictions, companies must ensure that data is replicated and recoverable to prevent financial and reputational damage during an unexpected outage or disaster.

To address these challenges, you can use Cross-Region Disaster Recovery (XRDR) to enhance your digital resiliency. By implementing XRDR, you can replicate critical data across multiple geographic locations, ensuring data availability and operational continuity even in the event of regional disruptions. This strategic approach not only supports compliance with OCC and DORA regulations but also strengthens your overall security posture, providing confidence to stakeholders and ensuring your company's ability to swiftly respond to regulatory demands and audits.

Although the examples and subject matter of this article focus on the financial services industry, the concepts apply across industries.

Prerequisites

A Splunk Cloud Platform environment on the Victoria Experience, hosted in the Amazon Web Services (AWS) US-East-1 region.
Backup data

How to use Splunk software for this use case

Cross-Region Disaster Recovery is a service for Splunk Cloud Platform that ensures continuity during disasters by maintaining an environment replica in a different cloud service provider (CSP) region. In the event of a regional disaster, the service automatically fails over to the backup environment, which continues data ingestion. After the primary region's services are restored, Splunk Support assists in transitioning back to the original environment.

See Cross-Region Disaster Recovery service level agreements and limitations for specific details on the backup Splunk Cloud Platform.

Splunk actively monitors the health of the cloud service provider region hosting your Splunk Cloud Platform environment. If significant service degradation is detected and attributed to cloud service failure, Splunk automatically declares a disaster and initiates recovery procedures without requiring your input. Conditions for recognizing a regional disaster include blocked data ingestion, severe indexing or search outages, inability to log into the platform, and confirmed cloud provider failure.
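The decision described above can be sketched in a few lines of Python. This is only an illustration of the logic as this article states it, not Splunk's actual monitoring code: the `RegionHealth` type, its field names, and the rule that any single symptom counts as significant degradation are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionHealth:
    """Hypothetical snapshot of the symptoms named in this article."""
    ingestion_blocked: bool = False
    severe_indexing_or_search_outage: bool = False
    login_unavailable: bool = False
    csp_failure_confirmed: bool = False


def qualifies_as_regional_disaster(health: RegionHealth) -> bool:
    """Return True when degradation is present AND attributed to the CSP.

    Assumption: any one symptom counts as significant degradation.
    The real criteria are defined by Splunk and may differ.
    """
    degraded = (
        health.ingestion_blocked
        or health.severe_indexing_or_search_outage
        or health.login_unavailable
    )
    return degraded and health.csp_failure_confirmed


if __name__ == "__main__":
    # Ingestion is blocked and the cloud provider failure is confirmed.
    outage = RegionHealth(ingestion_blocked=True, csp_failure_confirmed=True)
    print(qualifies_as_regional_disaster(outage))
```

The point of the sketch is the attribution step: a symptom that is not tied to a cloud service failure does not, on its own, trigger a regional disaster declaration in this model.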

A failover can be planned or unplanned. A planned failover tests the disaster recovery process at your request, without disrupting normal operations. An unplanned failover occurs after Splunk identifies a qualified regional disaster: Splunk alerts you and shifts your environment from the failing primary region to a secondary one by redirecting network connections. Continuous data replication minimizes data loss during this process.

After Splunk confirms that the primary CSP region has recovered, it ends the disaster phase and coordinates with you to perform a failback to the primary region. Splunk initiates this process promptly, but it requires your input to schedule a maintenance window, and the failback must occur within two weeks of the primary region's full restoration. No loss of ingested data is expected during a failback. After the failback is complete, all operations resume in the primary CSP region hosting your Splunk Cloud Platform environment.
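When planning the maintenance window, it can help to compute the latest acceptable date up front. The following is a minimal sketch of that arithmetic, assuming the two-week limit is counted from the moment the primary region is confirmed as restored; the function names and the example dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The two-week limit stated in this article.
FAILBACK_WINDOW = timedelta(weeks=2)


def failback_deadline(primary_restored_at: datetime) -> datetime:
    """Latest time by which the failback should have taken place."""
    return primary_restored_at + FAILBACK_WINDOW


def window_is_acceptable(primary_restored_at: datetime,
                         proposed_start: datetime) -> bool:
    """Check that a proposed maintenance window starts after restoration
    and no later than the failback deadline."""
    deadline = failback_deadline(primary_restored_at)
    return primary_restored_at <= proposed_start <= deadline


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    restored = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
    proposed = datetime(2025, 3, 10, 2, 0, tzinfo=timezone.utc)
    print(failback_deadline(restored).isoformat())
    print(window_is_acceptable(restored, proposed))
```

Using timezone-aware timestamps avoids ambiguity when your team and the hosting region are in different time zones.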

Active/standby - normal operations

During normal operations, Splunk Cloud Platform functions in an active/standby mode. In this configuration, data ingested into the system is continuously and asynchronously replicated to a secondary site. While the primary site actively handles operations, the secondary site remains on standby, ready to take over if necessary. The secondary site is also monitored continuously to confirm that it is prepared for a failover, so the system can switch to it quickly if needed.
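To make the term "asynchronously replicated" concrete, the toy model below shows why a small replication lag can exist between the two sites at any given moment. It is a conceptual sketch only: the `ActiveStandbyPair` class and its methods are invented for this example and say nothing about how Splunk Cloud Platform implements replication internally.

```python
from collections import deque


class ActiveStandbyPair:
    """Toy model of a primary site with an asynchronously updated standby."""

    def __init__(self) -> None:
        self.primary: list[str] = []
        self.standby: list[str] = []
        self._pending: deque[str] = deque()  # events not yet replicated

    def ingest(self, event: str) -> None:
        # The primary accepts the event immediately; replication to the
        # standby happens later, which is what makes it asynchronous.
        self.primary.append(event)
        self._pending.append(event)

    def replicate(self, max_events: int) -> int:
        """Copy up to max_events pending events to the standby, in order."""
        copied = min(max_events, len(self._pending))
        for _ in range(copied):
            self.standby.append(self._pending.popleft())
        return copied

    def replication_lag(self) -> int:
        """Number of events the standby has not received yet."""
        return len(self._pending)


if __name__ == "__main__":
    pair = ActiveStandbyPair()
    for i in range(5):
        pair.ingest(f"event-{i}")
    pair.replicate(max_events=3)
    print(pair.standby, pair.replication_lag())
```

In this model, whatever is still pending at the moment of an unplanned failover is the data at risk, which is why continuous replication keeps potential loss small rather than guaranteeing zero loss.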

Splunk XRDR-Active-Standby.png

Unplanned failover to recovery region

During an unplanned failover, operations are transitioned from the primary cloud service provider region to a secondary recovery region. This process involves redirecting network connections to ensure continuous access to Splunk Cloud Platform. Due to ongoing data replication, the transition minimizes data loss and maintains service continuity. After the secondary region is operational, it temporarily handles all platform activities until the primary region is restored and a failback can be scheduled. This ensures that critical operations remain unaffected during unforeseen incidents.

Splunk XRDR-Failover.png

Failback to primary region

The failback to the primary region occurs after the primary cloud service provider region has fully recovered and is ready to host your Splunk Cloud Platform environment again. This process involves shifting operations back from the secondary recovery region to the original primary region. Splunk coordinates closely with you to schedule a maintenance window for this transition, ensuring minimal disruption and continuity of service. The aim is to complete the failback swiftly, typically within two weeks of the primary region's restoration, without any loss of ingested data. After the failback is complete, all standard operations resume in the primary region.

Splunk XRDR-Failback-Recovery.png

With the successful failback to the primary region, your Splunk Cloud Platform environment is fully restored to its original operational state. This seamless transition ensures minimal disruption and data integrity throughout the process.
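The three phases described in this article form a simple cycle, which can be useful to encode in your own runbooks or status dashboards. The sketch below models that cycle as a small state machine. The phase and event names are hypothetical labels chosen for this example, not identifiers used by Splunk.

```python
from enum import Enum


class Phase(Enum):
    ACTIVE_STANDBY = "active/standby (normal operations)"
    FAILED_OVER = "running in recovery region"
    FAILBACK_SCHEDULED = "primary restored, failback scheduled"


# Allowed transitions, following the order described in this article.
TRANSITIONS = {
    (Phase.ACTIVE_STANDBY, "failover"): Phase.FAILED_OVER,
    (Phase.FAILED_OVER, "primary_restored"): Phase.FAILBACK_SCHEDULED,
    (Phase.FAILBACK_SCHEDULED, "failback_complete"): Phase.ACTIVE_STANDBY,
}


def next_phase(current: Phase, event: str) -> Phase:
    """Return the phase that follows `current` when `event` occurs."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not valid in phase {current.name}"
        ) from None


if __name__ == "__main__":
    phase = Phase.ACTIVE_STANDBY
    for event in ("failover", "primary_restored", "failback_complete"):
        phase = next_phase(phase, event)
        print(event, "->", phase.value)
```

Rejecting out-of-order events, such as a failback before any failover, keeps a runbook tracker from drifting out of step with the actual state of the environment.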

Next steps

As you move forward, consider reviewing your disaster recovery protocols to optimize future resilience and preparedness. The use case Establishing disaster recovery and business continuity can help with this.

To learn more about how Cross-Region Disaster Recovery can fortify your organization's infrastructure against disruptions and align with the highest standards of operational resilience, contact your Splunk sales representative today. They can provide detailed information and guide you through the process of implementing this powerful solution to meet your compliance and resilience objectives. You can contact your account team through the Contact Us page.

In addition, these resources might help you understand and implement this guidance:

Splunk OnDemand Services: Use these credit-based services for direct access to Splunk technical consultants with a variety of technical services from a pre-defined catalog. Most customers have OnDemand Services per their Success Plan. Engage the ODS team at ondemand@cisco.com if you would like assistance.