Why joining Cloud-Native and AI/ML is a win-win

Artificial intelligence (AI) is a huge and divisive topic. The mainstream media often portrays it as some conscious, evil provocateur, a Skynet about to take over all of humanity. However, real use cases like self-driving cars and robotics are a bit less threatening. And when it comes to our digital devices, AI is already built into most of the products we use today; for example, our phones and apps already use AI for things like spell checking, noise cancellation, and face detection. Banks now use AI to help detect fraud, and healthcare is adopting AI to make MRI scans faster and cheaper, thereby improving patient outcomes.

Simultaneously, most companies are shifting to cloud-native technologies. This new software stack embraces containerization and leverages open source tools like Docker and Kubernetes to increase the agility, performance, and scalability of digital services. Even highly regulated industries are becoming cloud native.

While maintaining cloud-native technology can start out easy, the burden quickly grows as multi-cluster and multi-cloud deployments begin to materialize. And to streamline their product and software release processes, organizations often look to run more complex workloads and generate insights from their data.

I recently sat down with Tobi Knaup, CEO and co-founder of D2iQ, to explore the current and future role of AI within the cloud-native stack. According to Knaup, “companies that can figure out how to harness AI in their products will be tomorrow’s leaders.” Next, we will explore two important use cases for cloud-native AI: using cloud-native technology to host and run AI/ML computing, as well as improving cloud-native architecture management through the use of AI.

Use of AI and cloud-native architecture

There are many benefits to running AI with cloud-native tools. One advantage of using Kubernetes is its centralizing effect: it makes sense to run related components, such as microservices, data services, and AI components, on the same platform. “Kubernetes is a fantastic platform for running AI workloads,” said Knaup. “An intelligent cloud-native platform is needed to run these AI/ML workloads; many of the AI problems have been solved in cloud native.”

Another critical challenge facing AI/ML projects is day-2 operations. While companies may have many data science experts to build and train models, actually deploying and running those models is a completely different story. This gap could be the reason why 85% of AI projects ultimately fail to deliver on their intended business promises. Cloud-native technology like Kubernetes provides a means to run these models as an online service that adds value to a mission-critical product, says Knaup.
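
To make "running a model as an online service" concrete, here is a minimal sketch of a prediction endpoint using only the Python standard library. The `score()` function is a hypothetical stand-in for a real trained model that you would load from your model registry.

```python
# Minimal sketch of serving a trained model as an online HTTP service.
# score() is a hypothetical stand-in for a real model; in production you
# would containerize this and let Kubernetes run and scale it.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    """Hypothetical stand-in for model.predict(); replace with a real model."""
    return {"fraud_probability": min(1.0, sum(features) / 100.0)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(score(payload.get("features", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

Once packaged in a container image, a service like this inherits health checks, rolling updates, and autoscaling from the Kubernetes platform around it.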

Benefits of running AI with cloud-native components

AI/ML and cloud-native have similar implementation patterns. The AI/ML field is still relatively young, and it turns out that many of the best practices DevOps has established around cloud-native can also be applied to AI/ML. For example, CI/CD, observability, and blue-green deployments are very well suited to the special needs of AI/ML. “You can create a very similar delivery pipeline for AI/ML as you would for microservices,” Knaup said. This is another reason why it makes sense to run such workloads on Kubernetes.
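
As one hedged illustration of reusing microservice delivery patterns for models, the sketch below flips a Kubernetes Service between "blue" and "green" model deployments using the official Python client; the service name, namespace, and labels are hypothetical examples, not a prescribed convention.

```python
# Sketch of a blue-green cutover for a model-serving Service, using the
# official Kubernetes Python client (pip install kubernetes).
# Service name, namespace, and labels are hypothetical.
from kubernetes import client, config

def promote(color: str, service: str = "model-serving", namespace: str = "ml") -> None:
    """Point the Service's selector at the given deployment color."""
    config.load_kube_config()  # or load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    patch = {"spec": {"selector": {"app": "model-serving", "track": color}}}
    v1.patch_namespaced_service(service, namespace, patch)
    print(f"Traffic now routed to the {color} model deployment.")

if __name__ == "__main__":
    # After the green model deployment passes its smoke tests, cut over.
    promote("green")
```

The appeal of this pattern for models is the same as for microservices: the old version keeps running untouched, so a bad model can be rolled back by flipping the selector again.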

Cloud-native brings elasticity and resource allocation for AI. AI/ML tends to require very elastic computation: as you train a model on a new dataset, the process can become quite resource-heavy and exhaust available GPUs. And if many data scientists are building models and competing for resources, you need a smart way to allocate compute and storage. Cloud-native tooling can solve this problem through intelligent resource allocation. Some toolsets, like Fluid and Volcano, are explicitly designed for AI/ML scenarios.
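
For instance, one common way to arbitrate GPU contention between data science teams is a per-namespace ResourceQuota on the extended `nvidia.com/gpu` resource. The sketch below creates one with the Kubernetes Python client; the namespace name and the limit of four GPUs are illustrative assumptions.

```python
# Sketch: cap how many GPUs a data-science team's namespace can request,
# using a ResourceQuota on the extended nvidia.com/gpu resource.
# Namespace and limits are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="gpu-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={"requests.nvidia.com/gpu": "4"}  # at most 4 GPUs in flight
    ),
)
v1.create_namespaced_resource_quota(namespace="team-fraud-models", body=quota)
```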

You reap the agility of open source. Open source cloud-native projects tend to move very quickly when the community works together. This is similar to the activity around open source AI/ML tools like Jupyter Notebook, Torch, and TensorFlow, which are cloud-native and Kubernetes-native. Although there are concerns about the security of open source software, at the end of the day, the more eyes we have on open source, the better. “Since AI is going to be integrated into so many things, we will need to be able to analyze what decisions the AI makes,” explains Knaup.

Cloud-native does not mean cloud-dependent. First, a machine learning model must be trained on a large dataset. It is generally much more cost-effective to run heavy number-crunching AI on-premises than in the cloud. But after training these models, organizations will likely want to perform inference at the edge, closer to where new data is ingested. Kubernetes is great in this regard, as it is flexible enough to run in these different operating environments.

“Data has gravity,” says Knaup, and computing should follow suit. Using K8s as an abstraction layer, you can design a system once and run it in any environment, whether that’s a security camera system, the manufacturing floor, or even onboard F-16 fighter jets.
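
As a small illustration of that portability, the sketch below uses the Kubernetes Python client to place the same inference container on either datacenter or edge nodes by swapping a node selector; the image name, node label, and namespace are hypothetical.

```python
# Sketch: run the same inference image in different environments by
# swapping a node selector. Image, labels, and namespace are hypothetical.
from kubernetes import client, config

def deploy_inference(location: str) -> None:
    config.load_kube_config()
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=f"inference-{location}"),
        spec=client.V1PodSpec(
            node_selector={"topology.example.com/tier": location},  # "edge" or "datacenter"
            containers=[client.V1Container(
                name="model",
                image="registry.example.com/fraud-model:1.4.2",
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="ml", body=pod)

deploy_inference("edge")  # same image, different placement
```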

Using AI/ML to help improve Cloud-Native

On the other hand, there are many ways that artificial intelligence could help manage and optimize cloud-native technology. “You can make an endless list,” says Knaup.

Using AI to automate root cause analysis. First, AI could help human operators diagnose issues with their cloud-native tools more efficiently. Kubernetes is quite complex and integrates with many other components, such as a service mesh for ingress control or OPA for policy management.

When a failure occurs in such a complex distributed system, it is often a challenge to reconstruct the root cause of the problem. Engineers must comb through metrics and logs from many sources to debug the problem. In doing so, they often follow a similar set of patterns when aggregating this data. Using AI to find these patterns could help human operators diagnose problems more effectively. This would speed up resolution time, which, in turn, would increase overall availability and security.
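
A small, hedged sketch of the underlying idea: flag the metric streams whose latest values deviate most sharply from their recent history, so an operator knows where to look first. Real systems would learn far richer patterns; this toy version uses a simple z-score over a sliding window, and the metric names are invented.

```python
# Toy sketch of AI-assisted root cause hints: score each metric stream by
# how far its latest value sits from its recent history (z-score), then
# surface the biggest outliers to the on-call engineer.
from statistics import mean, stdev

def anomaly_scores(metrics: dict[str, list[float]]) -> list[tuple[str, float]]:
    scores = []
    for name, window in metrics.items():
        history, latest = window[:-1], window[-1]
        spread = stdev(history) or 1e-9  # avoid division by zero on flat series
        scores.append((name, abs(latest - mean(history)) / spread))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical one-minute samples aggregated from many sources.
observed = {
    "ingress_p99_latency_ms": [210, 215, 208, 212, 980],  # sudden spike
    "pod_restarts_per_min":   [0, 0, 1, 0, 0],
    "opa_decision_errors":    [2, 3, 2, 2, 3],
}
for metric, score in anomaly_scores(observed):
    print(f"{metric}: z={score:.1f}")
```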

Using AI to predict and prevent problems. Another possibility is to use AI to detect and prevent problems altogether. In marketing, it is common to use end-user data to inform predictive analytics. But if we were to apply predictive analytics to cloud-native statistics, what valuable data could we discover? For example, suppose a monitoring tool can predict that, based on past usage, a specific disk will be at 80% capacity in four hours. Platform engineers could then make the appropriate changes in time to avoid any service interruptions. Such predictive service level indicators could become another useful benchmark for SRE.
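
The disk example reduces to a simple extrapolation. The sketch below fits a least-squares trend line to recent usage samples and estimates when the 80% threshold would be crossed; the sample numbers are illustrative.

```python
# Sketch: predict when disk usage will cross 80%, by fitting a straight
# line to recent (hour, percent-used) samples. Numbers are illustrative.
def hours_until(samples: list[tuple[float, float]], threshold: float = 80.0):
    n = len(samples)
    sx = sum(t for t, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(t * t for t, _ in samples)
    sxy = sum(t * u for t, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return None  # usage flat or shrinking; no crossing predicted
    return (threshold - intercept) / slope - samples[-1][0]

usage = [(0, 62.0), (1, 66.5), (2, 71.0), (3, 75.5)]  # ~4.5%/hour growth
eta = hours_until(usage)
if eta is not None:
    print(f"Disk predicted to reach 80% capacity in ~{eta:.1f} hours")
```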

Using AI for performance optimization. There is ample room for AI to suggest performance optimizations to tune how cloud-native infrastructure works. The results could inform which knobs to tweak to adjust computational efficiency or how to better schedule machine learning workloads.
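
As a hedged sketch of what "suggesting which knobs to tweak" can look like, the loop below random-searches two hypothetical knobs against a measured objective. In practice the objective would be real benchmark runs and the search a smarter optimizer (Bayesian, bandit-based, and so on); `benchmark()` here is an invented stand-in.

```python
# Toy sketch of AI-assisted tuning: randomly search two hypothetical knobs
# (batch size, CPU limit) and keep whichever setting minimizes a measured
# cost. benchmark() stands in for running a real workload and timing it.
import random

def benchmark(batch_size: int, cpu_limit: float) -> float:
    """Hypothetical stand-in: run the workload, return p99 latency in ms."""
    return (512 / batch_size) * 40 + abs(cpu_limit - 2.0) * 25 + random.uniform(0, 5)

best = None
for _ in range(50):
    candidate = {"batch_size": random.choice([32, 64, 128, 256, 512]),
                 "cpu_limit": round(random.uniform(0.5, 4.0), 1)}
    cost = benchmark(**candidate)
    if best is None or cost < best[1]:
        best = (candidate, cost)

print(f"Suggested knobs: {best[0]} (measured p99 ~ {best[1]:.0f} ms)")
```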

Final thoughts

When we consider AI/ML and cloud native, everyone wins. Cloud-native technology can support AI/ML goals in terms of elasticity, scalability, and performance. At the same time, there are many benefits that AI can bring to streamline cloud-native architecture maintenance.

AI is a burgeoning field, with thousands of algorithms now in the open source realm. TensorFlow Hub alone has hundreds of free and open source machine learning models for working with text, images, audio, and video. For this reason, Knaup recommends betting on an open source strategy for AI.

However, successfully working with AI will come down to discovering the right algorithm for your use case. While there are a relatively small number of algorithm classifications, finding which one is best for your problem and applying it to your situation requires domain expertise, Knaup explains. “You need to understand the problem space and how to apply the best AI algorithms,” he said.
