High availability and security for cloud-based systems

IT organizations that must comply with HIPAA, Sarbanes-Oxley, Basel II, and similar regulations generally have a good understanding of the security considerations that apply to key hardware and software systems running in the cloud. They need to manage user authentication and access control, disk encryption, upgrade planning, and backup/restore. But there are high availability (HA) and disaster recovery (DR) considerations that may not be so obvious. Not only may key systems need to be available no less than 99.99% of the time, but a DR infrastructure may also need to be in place to maintain continuity of operations if a regional catastrophe takes the core cloud infrastructure offline. While there are numerous options for configuring HA and DR solutions in the cloud, not all are suitable for solutions designed with compliance in mind.

Configuration for HA in a regulated environment

The first thing any organization in a regulated environment needs to understand is that your IT team is ultimately responsible for data and application security, especially if you use an infrastructure-as-a-service (IaaS) offering from a cloud service provider. A provider like AWS, Azure, or Google Cloud Platform (GCP) may be responsible for maintaining the virtual machine (VM) infrastructure you’re using, but you’re responsible for patching the operating system and applications, configuring access, and maintaining the integrity of the software and solutions running on top of that VM.

Then, depending on business, industry, or regulatory requirements, you may need to configure certain critical applications for HA, meaning they will be available no less than 99.99% of the time. In the cloud, you’ll want to run those critical applications on VMs configured as nodes in a failover cluster: a multi-node compute cluster whose software immediately moves workloads from one VM to another if the first VM fails or becomes unresponsive. To ensure 99.99% availability, you’ll need to place the VMs in your failover cluster in at least two different Availability Zones (AZs), which are, in practical terms, separate data centers. That way, if an entire AZ goes offline, the primary and secondary VMs won’t both go offline; the failover cluster can move critical workloads to a VM in the AZ that remains online.
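To make the 99.99% figure concrete, the short sketch below converts an availability target into a yearly downtime budget. It is simple arithmetic rather than any provider’s SLA formula, and it assumes a 365-day year; the function name is illustrative. At 99.99%, the budget works out to about 52.56 minutes per year.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the maximum minutes of downtime per year for a given availability %."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a few common targets side by side.
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes/year")
```

A budget that small leaves little room for manual recovery, which is why the failover has to be automatic.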

From a security and regulatory perspective, however, it is critical that every secondary VM be configured identically to its primary VM. The ACLs and audit controls that you apply to VMs in one AZ must also be applied to VMs in the second AZ. You’ll also need to ensure that any security infrastructure updates that affect VMs in one AZ are also applied to VMs in the other AZ.
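One way to keep paired VMs in step is to compare their security-relevant settings regularly and flag any difference. The following is a minimal sketch of such a drift check. The setting names and values are hypothetical and would, in practice, come from whatever inventory or configuration management tooling you already use; nothing here calls a real cloud provider API.

```python
# Hypothetical drift check: compare security-relevant settings on paired VMs.
# The setting names and values below are illustrative, not a real provider API.

_MISSING = "<missing>"  # marker for a setting present on only one VM


def find_drift(primary: dict, secondary: dict) -> dict:
    """Return {setting: (primary_value, secondary_value)} for every mismatch."""
    drift = {}
    for key in sorted(primary.keys() | secondary.keys()):
        p_val = primary.get(key, _MISSING)
        s_val = secondary.get(key, _MISSING)
        if p_val != s_val:
            drift[key] = (p_val, s_val)
    return drift


# Example settings for a primary VM and its secondary in another AZ (made up).
primary_vm = {
    "acl": ["10.0.1.0/24:443", "10.0.1.0/24:1433"],
    "audit_logging": True,
    "disk_encryption": "aes-256",
    "os_patch_level": "2022-05",
}
secondary_vm = {
    "acl": ["10.0.1.0/24:443"],
    "audit_logging": True,
    "disk_encryption": "aes-256",
    "os_patch_level": "2022-04",
}

if __name__ == "__main__":
    for key, (p_val, s_val) in find_drift(primary_vm, secondary_vm).items():
        print(f"DRIFT {key}: primary={p_val!r} secondary={s_val!r}")
```

With the sample data, the check reports the ACL and patch-level mismatches and leaves the matching settings alone.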

Cloud data protection

Ensuring the security and integrity of your data is always of the utmost importance in a regulated environment, but in the cloud, you may need to manage your storage differently than you would in an on-premises setting. Some cloud service providers offer shared storage options that seemingly allow you to set up a failover cluster the same way you would on premises. However, not all shared storage options can be configured to support a failover cluster that spans multiple AZs, and not all of them allow the level of data encryption that regulatory compliance (or your board of directors) may require.

Consequently, many organizations build cloud failover clusters with storage attached to each VM. This approach provides the best combination of security, availability, and flexibility to meet business and regulatory requirements. The question then becomes how best to replicate data, securely and quickly, from primary to secondary storage so that the secondary infrastructure can step in immediately if the primary infrastructure unexpectedly goes offline.

Some ERP and database applications offer built-in data replication services, but they are often designed to replicate only their own native data (e.g., SQL Server databases). These services ignore any other data in storage, and that data may be critical from a business or regulatory standpoint. Some of these tools also provide only data replication, not the cluster failover management that is critical for HA.

More complete solutions can be found in application-independent SANless clustering products, which provide both failover management and synchronous data replication. If your data is encrypted, you’ll want a solution that provides block-level replication, as block-level replication tools don’t care about the nature or origin of the data being replicated. They simply replicate blocks of data from one storage system to another. Also look for synchronous replication, as this ensures that the data replicated to the secondary VM is always identical to the data attached to the primary VM. In a failover scenario, the secondary VM can then immediately take over the primary VM’s workloads and continue without data loss or corruption.
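The idea behind synchronous, block-level replication can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how any particular SANless clustering product is implemented: Volume, SyncReplicator, and the 4,096-byte block size are all hypothetical. The point it shows is that a write is acknowledged only after both copies hold the block, and that the replicator never looks inside the payload.

```python
# Toy model of synchronous, block-level replication between two volumes.
# Volume and SyncReplicator are hypothetical names, not a real product API.

BLOCK_SIZE = 4096  # bytes per block; an assumed value for illustration


class Volume:
    """An in-memory stand-in for a block device attached to one VM."""

    def __init__(self) -> None:
        self.blocks = {}  # block index -> bytes

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data


class SyncReplicator:
    """Acknowledge a write only after both volumes have stored the block."""

    def __init__(self, primary: Volume, secondary: Volume) -> None:
        self.primary = primary
        self.secondary = secondary

    def write_block(self, index: int, data: bytes) -> None:
        # The payload is treated as opaque bytes, so encrypted blocks are
        # copied exactly like any other data.
        self.secondary.write_block(index, data)
        self.primary.write_block(index, data)
        # Returning here is the acknowledgement to the application.


if __name__ == "__main__":
    primary, secondary = Volume(), Volume()
    replicator = SyncReplicator(primary, secondary)
    replicator.write_block(0, b"\x17" * BLOCK_SIZE)
    print("copies identical:", primary.blocks == secondary.blocks)
```

Because the application does not proceed until both writes complete, the secondary copy never lags the primary in this model, which is what allows a failover without data loss.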

Disaster recovery considerations

Even if your critical applications don’t need the 99.99% availability that an HA configuration provides, the regulations that govern your industry may require you to protect your regulated data against catastrophic loss. The same SANless clustering approach can be used to set up a DR solution. Instead of configuring your VMs in a failover cluster that spans two AZs in the same region, you would configure your VMs in AZs in two geographically separate regions.
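The difference between the two layouts can be expressed as a simple placement check. The sketch below is illustrative only: Node and classify_placement are hypothetical names, and the region and zone labels are placeholder strings rather than values read from any cloud provider.

```python
# Hypothetical placement check for cluster nodes. Region and zone labels are
# illustrative strings, not values read from any cloud provider.

from typing import NamedTuple


class Node(NamedTuple):
    name: str
    region: str
    zone: str


def classify_placement(nodes: list) -> str:
    """Describe what kind of outage the node placement can survive."""
    regions = {n.region for n in nodes}
    zones = {(n.region, n.zone) for n in nodes}
    if len(regions) >= 2:
        return "multi-region"  # survives the loss of a whole region
    if len(zones) >= 2:
        return "multi-AZ"      # survives the loss of a single AZ
    return "single-AZ"         # no protection against an AZ outage


if __name__ == "__main__":
    ha_cluster = [Node("vm1", "region-a", "az-1"), Node("vm2", "region-a", "az-2")]
    dr_cluster = [Node("vm1", "region-a", "az-1"), Node("vm3", "region-b", "az-1")]
    print(classify_placement(ha_cluster))
    print(classify_placement(dr_cluster))
```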

The distance between regions in this configuration will likely require asynchronous data replication between the primary and DR storage infrastructures, because waiting for a distant site to confirm every write would slow the primary application. This can create windows during which the DR copy lags the primary by a few seconds. If the AZs in your primary region were to go down during such a window, you could still bring your DR infrastructure online quickly and with only a few seconds of data loss, keeping operational disruption to a minimum.
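The out-of-sync window can be pictured with another toy model. As before, this is a sketch under stated assumptions rather than a description of any real replication product: AsyncReplicator and its methods are hypothetical. Writes are acknowledged as soon as the primary stores them, and a queue holds the writes that have not yet reached the DR copy. Whatever is still in the queue when the primary region fails is the data that would be lost.

```python
# Toy model of asynchronous replication and the data-loss window it creates.
# AsyncReplicator is a hypothetical name, not a real product API.

from collections import deque


class AsyncReplicator:
    """Acknowledge writes immediately; ship them to the DR site later."""

    def __init__(self) -> None:
        self.primary = {}       # block index -> bytes in the primary region
        self.dr_copy = {}       # block index -> bytes in the DR region
        self.pending = deque()  # writes acknowledged but not yet shipped

    def write_block(self, index: int, data: bytes) -> None:
        self.primary[index] = data
        self.pending.append((index, data))  # queued, not yet at the DR site

    def ship(self, max_writes: int) -> int:
        """Apply up to max_writes queued writes to the DR copy, in order."""
        shipped = 0
        while self.pending and shipped < max_writes:
            index, data = self.pending.popleft()
            self.dr_copy[index] = data
            shipped += 1
        return shipped

    def writes_at_risk(self) -> int:
        """Writes that would be lost if the primary region failed right now."""
        return len(self.pending)


if __name__ == "__main__":
    replicator = AsyncReplicator()
    for i in range(5):
        replicator.write_block(i, b"payload")
    replicator.ship(max_writes=3)  # the link has only caught up on 3 writes
    print("writes at risk:", replicator.writes_at_risk())
```

In this model, the number of writes at risk stands in for the few seconds of data that a real deployment could lose.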
