Oct 7, 2021. By S V Aditya
Before the DevOps era, Development and Operations worked in silos, with each team having different priorities. Once DevOps became a common practice across industries, these silos dissolved. DevOps brought modern ideas of design and delivery, and with them a practice of continuous development and continuous improvement. This cycle relies on one critical element - effective feedback. That's where one of the key phases of DevOps comes in - monitoring. Throughout their history, DevOps teams have relied on effective monitoring to know what to improve and to test whether their improvements are having the right effect. But it isn't enough.
Drawbacks of Monitoring
The key problem with monitoring is that it's very static. It simply lacks the agility and speed required for modern DevOps. In the Known-Unknown matrix, monitoring only tackles the Known components. DevOps teams must know which KPIs and logs are important. They also have to know in advance what can go wrong, and they have to measure the right KPIs and logs to check for it. This a priori knowledge is never complete, so many errors go unanticipated and incidents take longer to fix. Ultimately, relying on monitoring alone leads to a long Mean Time to Resolve (MTTR) for incidents.
Secondly, it makes companies dependent on key employees. Think about how much the loss of an individual who knows the ins and outs of your software means to you. Replacing that level of expertise requires years of facing the same challenges to cultivate the same skills.
Thirdly, monitoring is inefficient. Even when all of the possible issues are known, it takes a large team to filter through hundreds of alerts to find the ones that matter. It's worse if the incident causes downtime, when the stakes are high and the losses astronomical.
Fourthly, monitoring does not always give enough information. Many alerts are unactionable because the engineers reading them do not understand what they mean for the system or which microservices are affected.
Modern DevOps needs more to fuel consistent growth and innovation without getting stuck in a constant cycle of fixes. For that, monitoring as it stands is not enough. In keeping with the DevOps philosophy, monitoring must also evolve - into Observability.
Observability in DevOps - Current State
Observability is defined in different ways by different DevOps teams, but all agree on one key aspect - it enables holistic insight into your IT stack and does this without requiring patterns and policies in advance. We like to define it as Monitoring++ - an evolution of monitoring using statistical and AI-based techniques to meet the demands of modern software architectures. It brings the DevOps concept of continuous improvement to the process of monitoring itself and helps DevOps engineers do their job more effectively. Observability is the next generation of monitoring.
Let's talk about how Observability works in DevOps today. DevOps engineers use tools to track metrics and logs across the entire IT stack. AIOps tools perform advanced KPI analytics to find meaningful alerts and events. Unlike static threshold-based alerts, these metric alerts are based on anomalous behavior of a KPI and truly track deviation from a norm. Similarly, for logs, Anomaly Detection identifies all standard patterns and their frequencies, then flags anomalies that depart completely from the established patterns and content. This way a DevOps engineer can home in on items worth inspecting and ignore over 80% of irrelevant information.
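To make the contrast between static thresholds and anomaly-based alerts concrete, here is a minimal sketch in Python. It is not how any particular AIOps product works: the rolling z-score is a stand-in for more advanced KPI analytics, and the function names, window size, limit, and latency values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def static_threshold_alerts(values, threshold):
    """Indices where the KPI exceeds a fixed threshold chosen in advance."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_zscore_alerts(values, window=5, z_limit=3.0):
    """Indices where a value deviates strongly from its own recent baseline."""
    alerts = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if values[i] != mu:
                alerts.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Hypothetical request latency samples in milliseconds.
latency_ms = [100, 102, 99, 101, 100, 98, 101, 160, 100, 99]

print(static_threshold_alerts(latency_ms, threshold=200))  # []
print(rolling_zscore_alerts(latency_ms))                   # [7]
```

The static rule misses the spike to 160 ms because nobody anticipated that 200 ms was the wrong limit for this service, while the baseline-relative rule flags it without any threshold being set in advance.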
However, it doesn't stop there. Advanced AI techniques like Incident Recognition automatically find new incidents by correlating KPIs and logs and by understanding the topology and dependencies of microservices. This identifies the root cause of events and further cuts down on alert noise. Simply by using AIOps-based services, a DevOps team can save 95% of its effort in managing incidents while an application is in production.
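As a rough illustration of how topology can cut alert noise, the sketch below applies one simple heuristic: among the services that are currently alerting, keep only those that have no alerting dependency of their own. This is a deliberately simplified assumption, not the Incident Recognition technique itself, and the service names and dependency map are hypothetical.

```python
def root_cause_candidates(depends_on, alerting):
    """Return alerting services that do not depend on another alerting service."""
    alerting = set(alerting)
    return sorted(
        svc
        for svc in alerting
        if not any(dep in alerting for dep in depends_on.get(svc, ()))
    )


# Hypothetical topology: each service maps to the services it calls.
depends_on = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments", "db"],
    "catalog": ["db"],
    "payments": [],
    "db": [],
}

# Four alerts fire at once, but three are downstream symptoms.
alerting = ["frontend", "checkout", "catalog", "db"]

print(root_cause_candidates(depends_on, alerting))  # ['db']
```

Four simultaneous alerts collapse into a single candidate to inspect, which is the kind of noise reduction the paragraph above describes.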
Both the anomalies and the incidents found this way are the "Unknowns" - information that users are not aware is important or worth looking into. AI helps tackle the unknown, and AIOps tools are built to create their own AI models so that any DevOps team (with or without AI knowledge) can use them. All of this is possible today.
Next-Gen Observability and DevOps - Auto-Instrumentation
So what does the immediate future hold for DevOps and Observability? What is Observability++? You may have noticed that we have talked a lot about the Ops half of DevOps and not enough about the Dev side.
With the evolution of new techniques and ideas, AIOps is heading this way - toward solutions that cater to Development itself. Instrumentation is a key part of enabling observation. It is done by developers while writing their code, and it shares one of the drawbacks of monitoring - it needs a priori knowledge of what is worth exporting and measuring. While experienced developers can get some of these elements right, a lot is simply missed and only found in production environments, where the cost of fixing problems is much higher.
AIOps-powered Auto-Instrumentation is made specifically to address this challenge. Here's how an Auto-Instrumentation tool will work. As developers test their services in pre-production environments, the tool monitors all runtime-specific KPIs and system-level metrics and collects complete log-stream data. Once it has a large enough volume of this data, it uses a range of machine learning techniques to find relevant information. One example is collecting request traces and runtime metrics and clustering the traces that affect runtime. Another is analyzing debug values to find the variables with the highest variation, then tracking the data of these variables to predict the likelihood of a fatal error. These are just a couple of examples from a wide repertoire. After the analysis, Auto-Instrumentation appends instrumentation code for the relevant KPIs, metrics, and logs to the services using OpenTelemetry standards.
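The debug-value example can be sketched in a few lines of Python. The snippet assumes that "highest variation" is measured with the coefficient of variation (standard deviation divided by the absolute mean), so that variables on different scales can be compared. That metric, the function name, and the sample values are all assumptions for illustration. The sketch stops at ranking and does not cover error prediction or the generation of OpenTelemetry instrumentation.

```python
from statistics import mean, pstdev


def rank_by_variation(samples):
    """Rank variables by coefficient of variation (stdev / |mean|), highest first."""
    scores = {}
    for name, values in samples.items():
        mu = mean(values)
        if mu == 0:
            # Coefficient of variation is undefined for a zero mean; skip.
            continue
        scores[name] = pstdev(values) / abs(mu)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical debug values captured during pre-production test runs.
debug_samples = {
    "queue_depth": [1, 40, 3, 55, 2],
    "retry_count": [0, 1, 0, 1, 1],
    "batch_size": [100, 100, 101, 100, 99],
}

ranking = rank_by_variation(debug_samples)
print(ranking)      # ['queue_depth', 'retry_count', 'batch_size']
print(ranking[:1])  # the variable most worth instrumenting in this sample
```

In this sample, `queue_depth` swings the most relative to its mean, so it would be the first candidate for instrumentation, while the nearly constant `batch_size` would add storage cost without adding insight.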
What does this mean for DevOps teams? First, they get a simulation of potential production errors hidden in their code, along with root cause analysis. Second, the instrumentation of their code is taken care of automatically, cutting down on the planning needed to choose important metrics and logs. Third, this brings the overall cost of Observability down severalfold. There is no need to collect irrelevant KPIs and logs that add to storage and instrumentation infrastructure costs but give no insight. DevOps teams automatically instrument code to collect what's important.
As AIOps grows and evolves, its importance to DevOps teams will drive the next generation of Software Design and Development. Teams that build an AIOps culture now stand to gain the most from this evolution. To learn more about Algomox AIOps, please visit our AIOps Platform page.