Remember the IBM Watson use case for x-ray medical image inspection? AI and ML helped to point out areas in x-ray images that doctors may want to look at, helping them focus their attention rather than scan every image manually.
In the world of IT Operations we have seen a strong increase in the number of systems and logs during the last decade.
In the cloud it is easy to spin up additional systems, so the number of systems under management has grown for all of us.
All systems write logs, and even a moderately complex environment easily produces terabytes per day.
When something is about to go wrong, or already has gone wrong, it will be documented in some log somewhere.
The challenge in a mission-critical situation is to find the few correlated needles in the haystack that help the experts resolve the situation, often under time pressure.
Historically we deployed monitoring and management systems with limited rule sets, mostly based on data from individual system logs, not correlating different logs, and definitely not taking trend information into account. We relied on humans to watch dashboards and correlate events to identify pro-actively where things were heading in the wrong direction. (As this is an overwhelming demand, we were often only able to react once things had gone wrong, and start the correlating and needle-in-the-haystack finding then.)
It is no longer humanly possible to screen all the logs and dashboards. The complexity and volume of the log data generated are too much, and simply adding more cheap human labor is not the answer.
We have to start leveraging automation to help us work smarter.
With automation and machine learning we can shift from reacting after things have gone wrong to an assistive, pro-active approach: “dear human, something may be going in the wrong direction here, you may want to focus your attention on analyzing this part”.
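To make the "dear human, look here" idea a little more concrete, here is a minimal sketch of trend-based flagging. It assumes we already have a per-minute count of error lines from some log. The function name, the window size and the threshold are illustrative choices, not taken from any particular product, and a real service would use far richer models than this simple rolling baseline.

```python
from statistics import mean, stdev

def flag_unusual_minutes(error_counts, window=5, threshold=3.0):
    """Return the indices of minutes whose error count is far above
    the average of the preceding `window` minutes.

    Hypothetical helper for illustration only.
    """
    flagged = []
    for i in range(window, len(error_counts)):
        baseline = error_counts[i - window:i]
        mu = mean(baseline)
        # Use at least 1.0 as the spread so a perfectly flat baseline
        # does not cause a division by zero or flag tiny changes.
        spread = max(stdev(baseline), 1.0)
        if (error_counts[i] - mu) / spread > threshold:
            flagged.append(i)
    return flagged

# Error lines per minute; minute 7 shows a sudden jump.
counts = [2, 3, 2, 4, 3, 3, 2, 15, 4, 3]
print(flag_unusual_minutes(counts))  # [7]
```

The point is not the statistics but the shift in roles: the machine watches every minute of every log, and the human only gets pulled in for the minutes that look unusual.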
What if the x-ray analogy is applied to IT operations? It would let humans focus their attention on situations that are about to go bad, and support smart decision-making.
Practical AI-enabled IT Ops services are emerging, such as preventive maintenance on Dell PCs with the SupportAssist service, and intelligent operations services, also known as AIOps, from Oracle.
In both cases processing power is used not merely to reflect on historic data, but to facilitate pro-activeness: predicting where things are starting to go wrong by correlating event information, and filtering out patterns for human analysts to focus their attention on.
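As a rough illustration of what "correlating event information" can mean, the sketch below groups warning events from different log sources into fixed time windows and keeps only the windows in which several sources complained at once. This is not how Dell or Oracle implement their services. The event format (seconds since some start time, plus a source name), the function name and the parameters are all assumptions made for the example.

```python
from collections import defaultdict

def correlate_events(events, window_seconds=60, min_sources=2):
    """Group (epoch_seconds, source) events into fixed time windows and
    return the windows in which at least `min_sources` different
    sources reported something.

    Hypothetical helper for illustration only. Fixed windows are a
    simplification: two events just either side of a window boundary
    end up in different windows.
    """
    buckets = defaultdict(set)
    for epoch_seconds, source in events:
        buckets[epoch_seconds // window_seconds].add(source)
    return [
        (bucket * window_seconds, sorted(sources))
        for bucket, sources in sorted(buckets.items())
        if len(sources) >= min_sources
    ]

# Warnings collected from three different logs.
events = [
    (1000, "db"),
    (1010, "app"),
    (1030, "lb"),
    (1300, "app"),
    (1900, "db"),
]
print(correlate_events(events))  # [(960, ['app', 'db'])]
```

Out of five warnings, only one window survives the filter: the one where the database and the application complained within the same minute. That is the needle a human analyst would want to look at first.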
