Organizations in DEJ’s recent study, Modernizing IT Operations for Digital Economy, reported that, on average, they experienced an 88% increase in processed metrics, events and alerts over the last 12 months. Additionally, 42% of organizations report that the technology solutions they purchased in the past are less effective at this volume and velocity of data.
This is the result of several technology trends. The most critical ones are:
1 – Increased complexity. An influx of new technologies and management concepts, such as microservices, continuous delivery and distributed applications, is causing data centers and IT environments as a whole to transform rapidly. This is driving an increase in the volume of events and alerts that have to be processed in real time and at scale.
2 – Mix of monitoring tools. Forty-one percent of organizations in DEJ’s research reported that they are using ten or more tools for monitoring IT performance. These solutions often specialize in specific aspects of IT services, such as network, application or system performance, and have their own alerting capabilities. However, there is very little correlation between the data collected by these solutions, and they often operate in separate silos. This causes a storm of alerts that are often: 1) missing the context needed to solve the problem; 2) overlapping, with multiple alerts issued for the same incident; 3) not actionable.
3 – No centralized platform. Only 36% of organizations reported that they have a “single pane of glass” solution for a centralized view and processing of all of their IT alerts and events. Interestingly, these organizations process 2.4 times more events and metrics than their peers, yet they outperform organizations that do not have this capability on a number of KPIs.
4 – Lack of context. Alerts and events generated by monitoring tools very often lack context when it comes to severity, patterns or correlation. As a result, organizations in DEJ’s upcoming study on IT Incident Management (ITIM) are reporting that, on average, 82% of help desk tickets are not actionable.
The inability to address these issues at scale impacts organizations on both operational and business levels. Organizations report that a lack of effectiveness in dealing with alert and event overload has a significant impact on areas such as productivity of IT staff (64% of organizations), mean time to repair per incident (52%) and root cause analysis (45%). These operational challenges are directly correlated with business problems: organizations report that alert overload is affecting some of their key business metrics, including operational cost (57% of organizations), customer satisfaction (46%) and cost due to unplanned downtime (44%).
Processing this volume and velocity of events and alerts requires a new generation of technology capabilities, as manually writing rules, which organizations relied on in the past, is no longer humanly possible at scale. Organizations are increasingly realizing this: 79% of participants in the ITIM study reported that adding more IT staff to address this problem is not an effective strategy. There are several technology approaches that focus on this problem, and the most effective ones rely on different flavors of machine learning, algorithmic event correlation and configuration, and automated pattern recognition delivered through a centralized platform.
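To make the idea of algorithmic event correlation more concrete, here is a minimal Python sketch. It is not taken from DEJ’s research or from any vendor’s product. The Alert and Incident types, their field names, the sample tool names and the 300-second window are all assumptions chosen for illustration. The sketch only shows the simplest form of correlation: alerts from different monitoring tools that point at the same host and check, and arrive close together in time, are collapsed into a single incident instead of being handled one by one.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # monitoring tool that raised the alert (hypothetical field)
    host: str         # affected host (hypothetical field)
    check: str        # what was measured, e.g. "latency" (hypothetical field)
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    host: str
    check: str
    first_seen: float
    last_seen: float
    alerts: list = field(default_factory=list)


def correlate(alerts, window_seconds=300.0):
    """Group alerts that share (host, check) and arrive within
    window_seconds of the previous alert for that same key."""
    latest = {}     # (host, check) -> most recent Incident for that key
    incidents = []  # all incidents, in order of first appearance
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        current = latest.get(key)
        if current is not None and alert.timestamp - current.last_seen <= window_seconds:
            # Same problem reported again, possibly by another tool: merge it.
            current.alerts.append(alert)
            current.last_seen = alert.timestamp
        else:
            # No recent incident for this key: open a new one.
            current = Incident(alert.host, alert.check,
                               alert.timestamp, alert.timestamp, [alert])
            latest[key] = current
            incidents.append(current)
    return incidents


# Illustrative data only: three tools report on two hosts.
sample_alerts = [
    Alert("network-monitor", "db01", "latency", 0.0),
    Alert("apm", "db01", "latency", 40.0),
    Alert("apm", "web02", "cpu", 60.0),
    Alert("system-monitor", "db01", "latency", 1000.0),
]
incidents = correlate(sample_alerts)
for incident in incidents:
    print(incident.host, incident.check, len(incident.alerts))
```

In this sample, the first two db01 latency alerts fall inside the window and become one incident, while the much later third one opens a new incident. Real platforms go well beyond a fixed key and window, for example by learning patterns from historical data, but the goal is the same: fewer, more contextual items for IT staff to act on.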
The research shows that event and alert overload has a significant impact, and organizations increasingly understand that putting more people on the problem is not a solution. As organizations look to transform their IT departments to create business value, allocating more valuable IT resources to processing millions of events means not only working against their transformation goals, but also playing a losing game.
One of the key messages of DEJ’s Modernizing IT Operations study is that IT Operations management is increasingly becoming an automation, data management and analytics game. IT Incident Management is a perfect example of that: deploying algorithmic correlation and machine learning, and automating incident notification and resolution, is a key part of the strategies that top-performing organizations are deploying.