IT outage rate not slowing, power most common cause • The Register


Infrastructure operators are struggling to reduce the rate of IT outages despite improving technology and heavy investment in this area.

The Uptime Institute's 2022 Outage Analysis Report says progress toward reducing downtime has been mixed. Investment in distributed resiliency and cloud technologies has helped reduce the impact of failures at the site level, for example, but has also added complexity, and an increasing number of incidents are attributed to network, software, or system issues arising from that complexity.

The authors make it clear that critical IT systems are much more reliable than they once were, thanks to many decades of improvement. However, data covering 2021 and 2022 indicates that unscheduled downtime continues at a rate that is not significantly reduced from previous years.

Eighty percent of organizations surveyed have experienced an outage in the last three years, and about one in five said they had experienced a serious or severe outage during the same period.

“Serious” and “severe” are the two highest ratings in the Uptime Institute’s five-tier outage classification. “Serious” covers service disruption with potential financial loss or compliance breaches, while “severe” covers major and damaging service disruption with potentially large financial losses.

Based on the data it has collected, the Uptime Institute report suggests that there are likely to be at least 20 major IT outages around the world each year, causing significant financial loss, disruption to customers and businesses, and reputational damage.

When it comes to causes, the report notes that most outages have contributing factors in addition to a root cause. Power failures are the most common primary cause, cited as the top factor in 43 percent of incidents, followed by software, network, and cooling, each accounting for roughly 14 percent.

In the Uptime Institute’s annual resiliency survey, one of the data sources for the Outage Analysis Report, network issues were cited as the most common cause of end-to-end IT service outages overall, with power-related issues ranked second.

The Uptime Institute also found that third-party operators, such as cloud, hosting, and colocation providers, accounted for nearly 63 percent of all public outages over a five-year period, and that this share has risen year on year, reaching 71 percent in 2021.

The key words here, however, are “public outages.” The report’s authors note that the reliability of public cloud services has come under increased scrutiny in recent years, both because of some high-profile outages and because of growing interest in running critical services in the public cloud.

The survey also found that business IT managers are “somewhat concerned” about the resilience of public cloud services: only 13 percent of respondents said public cloud services are reliable enough to run all their workloads, and the number of “I don’t know” responses has increased since last year.

Delving into the causes, the Uptime Institute found that UPS failures are the most common reason for power-related outages, followed by generators, transfer switches, and power distribution units.

The most common causes of network-related outages are tied between change and configuration management errors and failures by third-party network providers. This is not surprising in modern network environments, the report states, where networks are constantly being upgraded to optimize performance or meet new requirements.

Another trend reported by the Uptime Institute is that the duration of outages also appears to be increasing, at least for publicly reported outages. This is concerning because an outage is likely to be more costly and disruptive the longer it lasts.

In 2021, outages lasting more than 48 hours made up 16 percent of publicly reported outages, compared with 4 percent in 2017, while those lasting 24 to 48 hours made up 12 percent, compared with 4 percent in 2017.
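For readers who want to check how sharply the longer outages have grown, here is a minimal Python sketch of the arithmetic. The percentages are the ones quoted above; the dictionary layout, band labels, and function names are hypothetical and chosen only for illustration. It assumes the two duration bands do not overlap, so their shares can be added.

```python
# Share of publicly reported outages by duration band, in percent.
# Figures are those quoted in the article; band labels are illustrative.
DURATION_SHARES = {
    2017: {"24-48h": 4, ">48h": 4},
    2021: {"24-48h": 12, ">48h": 16},
}


def growth_factor(band: str, start: int = 2017, end: int = 2021) -> float:
    """How many times larger a band's share was in `end` than in `start`."""
    return DURATION_SHARES[end][band] / DURATION_SHARES[start][band]


def share_over_24h(year: int) -> int:
    """Combined share of outages lasting more than 24 hours in a given year.

    Assumes the two bands are disjoint, so their shares can simply be summed.
    """
    return sum(DURATION_SHARES[year].values())


if __name__ == "__main__":
    for band in ("24-48h", ">48h"):
        print(f"{band}: {growth_factor(band):.1f}x the 2017 share")
    for year in (2017, 2021):
        print(f"{year}: {share_over_24h(year)} percent lasted more than 24 hours")
```

On these figures, the share of outages lasting more than 48 hours quadrupled and the 24 to 48 hour share tripled, so outages lasting more than a day went from 8 percent to 28 percent of those publicly reported.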

The cost of outages has also increased. In 2019, an estimated 60 percent of major outages cost less than $100,000, while 28 percent cost between $100,000 and $1 million. In 2021, only 39 percent cost less than $100,000, while 47 percent cost between $100,000 and $1 million. The share of outages costing more than $1 million grew from 11 percent to 15 percent.
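The shift between cost bands is easier to see as a change in percentage points. The following is a minimal Python sketch using the figures quoted above; the constant names and helper functions are hypothetical, for illustration only. The reported shares sum to 99 percent for 2019 and 101 percent for 2021, presumably because of rounding, so the sketch does not normalize them.

```python
# Share of major outages by cost band, in percent, as quoted in the article.
# Totals are 99 and 101 rather than 100, presumably because of rounding.
COST_BANDS = ["< $100k", "$100k-$1M", "> $1M"]
SHARES = {
    2019: [60, 28, 11],
    2021: [39, 47, 15],
}


def band_shift(before: list[int], after: list[int]) -> list[int]:
    """Change in percentage points for each cost band (after minus before)."""
    return [b - a for a, b in zip(before, after)]


def share_at_least(shares: list[int], start_index: int) -> int:
    """Sum of all bands from start_index upward, e.g. 'costing $100k or more'."""
    return sum(shares[start_index:])


if __name__ == "__main__":
    shift = band_shift(SHARES[2019], SHARES[2021])
    for band, delta in zip(COST_BANDS, shift):
        print(f"{band:>10}: {delta:+d} points")
    for year in (2019, 2021):
        print(f"{year}, $100k or more: {share_at_least(SHARES[year], 1)} percent")
```

Adding the two upper bands shows that outages costing $100,000 or more rose from 39 percent to 62 percent of major outages between 2019 and 2021.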

The data behind the Outage Analysis Report comes from four main sources, according to the Uptime Institute: a public outage database it maintains, a confidential system that lets members report abnormal incidents, its Global Survey of IT and Data Center Managers, and its Data Center Resiliency Survey. ®
