Tips for stateful Kubernetes data backup and recovery

All systems need a plan for data backup and recovery. It doesn’t matter if your application runs in the cloud, on premises, or in a refrigerator at the edge of a network; you probably need to store and access the data somewhere. But in our highly connected and distributed world, there’s always the chance that a ransomware attack or misconfiguration could put this persistent storage at risk. Therefore, each instance requires a plan to protect and restore data when something goes wrong.

Lately, we’ve seen a growing interest in using stateful deployments with Kubernetes. The problem is that although microservices and containers are inherently distributed and ephemeral, the underlying data storage must remain intact. Additionally, organizations need a robust disaster recovery plan that accommodates new cloud-native layers and has a decoupled lifecycle. And to perform a full restore, you need to recover not only the data, but also the metadata that surrounds it, such as secrets, permissions, ConfigMaps, certificates, and network information.

I recently sat down with Gaurav Rishi, VP of Products at Kasten by Veeam, to explore the state of data protection for stateful Kubernetes deployments. According to Rishi, DevOps teams working on cloud-native platforms need a new backup and recovery process that goes beyond legacy models based on virtual machines (VMs). Below, we’ll consider the history of data management and outline some tips for bringing data backup and recovery to your cloud-native platforms and applications.

Kubernetes data maturity

There are a few key reasons why data protection might be necessary in stateful Kubernetes applications. The first is the rise of stateful apps instead of stateless apps, says Rishi. From a technology standpoint, many users started their container and Kubernetes journey using a stateless approach. But while the intent was first to create stateless building blocks, the cloud-native community soon realized that many business applications needed to be stateful.

We’re also seeing increasingly dynamic application architectures. Today, “polyglot persistence” is everywhere: companies no longer support just one relational database, and a single application often uses multiple underlying databases. You may run databases within Kubernetes clusters or work with managed databases or databases as a service (DBaaS). Ironically, “we’re at a point where databases are the most popular workloads in containers,” says Rishi.

Now that the use of persistent volumes (PVs) is commonplace, the second aspect is backing up the storage volumes behind them. Doing so requires bridging the gap between development and operations teams to accommodate ever-changing roles and scopes in IT. “We need to make sure we keep the business application intact from a business continuity perspective,” says Rishi.

Tips for protecting stateful K8s data

To build more stable modern systems with high fault tolerance, DevOps teams must incorporate strong data backup and recovery tactics. So what best practices should DevOps operators keep in mind when backing up and recovering data? Rishi shares some helpful tips:

Use a Kubernetes-native backup architecture. First of all, you really need a backup solution that is designed specifically for cloud-native environments, Rishi says. Due to the nuances inherent in cloud-native architecture, VM-based data management does not work well in this world. A Kubernetes-native backup solution must know what is running in the K8s environment and understand the dependencies within an ecosystem of microservices.

Use automation within your data recovery plan. The number of apps is expanding exponentially and most apps will soon be cloud native. However, most analysts identify a growing talent gap to meet the needs of this new paradigm. Therefore, further automation will be required to detect new applications, back them up, and automatically rehydrate data during the recovery process. “Your backup is only serving as a recovery plan,” explains Rishi.
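As a rough illustration of the detection step described above, the following Python sketch finds stateful applications that have no backup policy yet and assigns them one. The `App` record, the namespace names, and both function names are hypothetical and not taken from any particular backup product; a real tool would discover workloads through the Kubernetes API rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical record for a discovered application."""
    namespace: str
    has_persistent_volumes: bool


def find_unprotected(apps, protected_namespaces):
    """Return stateful apps whose namespace has no backup policy yet."""
    return [
        app for app in apps
        if app.has_persistent_volumes and app.namespace not in protected_namespaces
    ]


def reconcile(apps, protected_namespaces):
    """Return the protected set after covering every unprotected stateful app."""
    updated = set(protected_namespaces)
    for app in find_unprotected(apps, protected_namespaces):
        # A real system would create a backup policy here; this sketch only records it.
        updated.add(app.namespace)
    return updated


if __name__ == "__main__":
    discovered = [App("shop", True), App("web", False), App("billing", True)]
    print(sorted(reconcile(discovered, {"shop"})))
```

Running a check like this on a schedule is one way to keep newly deployed applications from silently going unprotected.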

Restore in the correct order. When restoring different components, the order of operations matters. For example, if you are recovering a set of microservices after a disaster, the logical components that support the database and security must be restored first. Recreate clusters, restore data to persistent volumes, and rehydrate databases before restarting the microservices that make up the application. In some cloud-native backup solutions, the order of operations for rehydrating services can be defined independently in a YAML file.
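One way to think about restore ordering is as a dependency graph that gets sorted before anything is restored. The Python sketch below is a minimal illustration using the standard library's `graphlib` module (Python 3.9 or later); the component names and the dependency map are hypothetical, and an actual backup tool would typically read this information from its own configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical components; each maps to the set of components
# that must be restored before it.
RESTORE_DEPENDENCIES = {
    "cluster": set(),
    "secrets-and-config": {"cluster"},
    "persistent-volumes": {"cluster"},
    "database": {"persistent-volumes", "secrets-and-config"},
    "microservices": {"database"},
}


def restore_order(dependencies):
    """Return an order in which every component follows its prerequisites.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(restore_order(RESTORE_DEPENDENCIES), start=1):
        print(f"{step}. restore {component}")
```

Declaring the dependencies as data, rather than hard-coding a sequence, means a circular dependency is rejected before a restore begins instead of being discovered halfway through a recovery.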

Build a process that is agnostic to different database flavors. There are many different database types in use today, spanning SQL and NoSQL, from PostgreSQL and MongoDB to CockroachDB, among others. A large organization might use a mix of databases hosted across hybrid, multi-cloud environments. It is also possible that the application is backed up with one solution, but stored with a completely different tool. Therefore, Rishi recommends backup and recovery processes that are not tied to a particular database.
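To make the idea of a database-agnostic process concrete, the Python sketch below keeps the backup workflow itself generic and pushes database-specific behavior behind a small interface. Everything here is an assumption for illustration: the `DatabaseBackupHook` interface, the `quiesce` and `resume` method names, and the `LoggingHook` stand-in are hypothetical and do not come from any specific backup tool.

```python
from abc import ABC, abstractmethod


class DatabaseBackupHook(ABC):
    """Hypothetical adapter that hides database-specific steps from the workflow."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def quiesce(self):
        """Put the database into a state that is safe to snapshot."""

    @abstractmethod
    def resume(self):
        """Return the database to normal operation."""


class LoggingHook(DatabaseBackupHook):
    """Stand-in adapter that only records calls; a real one would talk to a database."""

    def __init__(self, name, log):
        super().__init__(name)
        self.log = log

    def quiesce(self):
        self.log.append(f"{self.name}: quiesce")

    def resume(self):
        self.log.append(f"{self.name}: resume")


def backup_all(hooks, snapshot):
    """Run the same quiesce, snapshot, resume sequence for every database."""
    for hook in hooks:
        hook.quiesce()
        try:
            snapshot(hook)
        finally:
            # Always resume, even if the snapshot step fails.
            hook.resume()


if __name__ == "__main__":
    log = []
    hooks = [LoggingHook("postgresql", log), LoggingHook("mongodb", log)]
    backup_all(hooks, lambda hook: log.append(f"{hook.name}: snapshot"))
    print("\n".join(log))
```

With this shape, supporting another database means writing one more adapter, while the workflow in `backup_all` stays unchanged.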

Consider DevSecOps tools. Because engineers have come to expect self-service DevSecOps tools, they are likely to expect similar shift-left tools for handling persistent storage in container environments.

Stateful data protection: the next phase of Kubernetes maturity

Although Kubernetes has been open source since 2014, Rishi jokes that it is somehow still “eight years old.” Kubernetes as a culture is still maturing, and along with it, companies are still in the process of setting standards in many areas, such as platform governance, multi-cluster security, authentication, and scalability.

First, the community had to resolve the networking aspect of this new style of computing. The next phase is the storage aspect, Rishi predicts. This is becoming more important to address as ransomware and cloud-native vulnerabilities continue to make headlines. There is also a general shortage of the skills needed to harden cloud-native infrastructure, which could leave misconfigurations and access control issues exposed. Secure-by-default practices should be implemented, but data backup procedures should always be in place as a last line of defense.

“Every organization understands the need for backup and recovery,” says Rishi. And while the importance is widely understood, organizations should not lose sight of the unique context of data recovery within cloud-native models.
