Kentik - Network Observability

The Rise of the Machines? Bring It On!

Stephen Collins, Principal Analyst, ACG Research

Summary

Applying AI and machine learning to network infrastructure monitoring allows for closed-loop automation. In this post, ACG analyst Stephen Collins discusses the benefits, including how real-time insights derived from streaming telemetry data are fed back into the orchestration stack to automatically reconfigure the network without operator intervention.


Tech luminary Elon Musk is notable for bold statements, and he shocked people when he opined that AI is probably humanity’s “biggest existential threat.” Musk voiced his concern that AI developers like Google might “produce something evil by accident”—perhaps “a fleet of artificial intelligence-enhanced robots capable of destroying mankind.” For the record, over at Facebook, which is also into AI in a big way, Mark Zuckerberg said that Musk’s concerns are “hysterical.”

My opinion is that AI, like any technology, can and will be abused for evil ends, just as it will be applied in myriad ways that are benign and beneficial. I harbor no fears that malevolent robots controlled by Skynet will one day be lording it over our species.

As businesses embrace digital transformation and move IT applications to hybrid multi-cloud environments, enterprise IT managers are looking to AI-powered automation tools to master the complexity of cloud-scale operations. Existing tools and techniques are inadequate, and network, IT, and security operations teams tend to adopt a siege mentality. “Alarm fatigue” is a common affliction: operators switch between multiple screens and dashboards to detect performance anomalies and failures while being overwhelmed by a flood of non-critical events and alerts.

Automation is critical for alleviating this crushing operational burden, which drives up operating costs, leads to employee burnout, and too often causes outages through human error. Automation can streamline operator workflows, taking humans out of the loop for tasks that are better suited to machines.

Robots never tire, don’t get distracted or confused, and if programmed properly by their human masters, don’t make stupid mistakes.

Hybrid multi-cloud networks are by necessity software-driven, incorporating orchestration software that enables network operators to automate common tasks that used to be performed manually. NETCONF and YANG are helping to streamline network configuration. Intent-based networking takes automation one step further, allowing operators to specify desired network outcomes using business rules which are then translated into configuration data and automatically pushed out to the underlying network infrastructure.
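To make the intent-based idea concrete, here is a minimal sketch of translating a business rule into per-device configuration data. Everything in it is an assumption made for illustration: the `Intent` fields, the `render_configs` helper, the device names, and the mapping from priority to DSCP marking are hypothetical and do not come from any particular product. A real system would also serialize the result and push it to the devices, for example over NETCONF, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed mapping from a business priority to a DSCP marking.
# 46 (EF), 0 (best effort) and 8 (CS1) are standard code points, but
# assigning them to these priority names is this sketch's own choice.
PRIORITY_TO_DSCP = {"high": 46, "normal": 0, "low": 8}


@dataclass(frozen=True)
class Intent:
    """A hypothetical business rule: treat one application with a given priority."""
    application: str
    priority: str
    devices: Tuple[str, ...]


def render_configs(intent: Intent) -> Dict[str, dict]:
    """Translate one intent into a configuration fragment per device."""
    if intent.priority not in PRIORITY_TO_DSCP:
        raise ValueError(f"unknown priority: {intent.priority!r}")
    dscp = PRIORITY_TO_DSCP[intent.priority]
    return {
        device: {"match_application": intent.application, "set_dscp": dscp}
        for device in intent.devices
    }


# The operator states the outcome; the per-device details are derived.
intent = Intent("video-conferencing", "high", ("edge-1", "edge-2"))
configs = render_configs(intent)
```

The point of the design is that the operator only states the desired outcome, while the per-device data is generated and can be regenerated whenever the intent changes.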

Automating infrastructure configuration is a big step forward, but hybrid multi-cloud environments are highly dynamic with frequently shifting workloads and changing network conditions. Performance bottlenecks may arise inside data centers or across wide area networks. Internet outages and DDoS attacks can impact application performance at any time. Operators need to scale up network capacity in response to increased demand.

The challenge of monitoring cloud-scale infrastructure is what keeps IT managers up late at night. Literally. The tools that have served a generation of IT managers well are not holding up in the hybrid multi-cloud era. Root cause analysis is too often tedious and time-consuming, particularly for insidious “gray failures” that are difficult to detect. It is no longer cost-effective or practical to rely on people to sift through vast amounts of data and mentally correlate multiple data points to determine the source of problems.

This is where AI and machine learning technology will shine. Data scientists and programmers are creating software robots that employ machine learning algorithms to tirelessly analyze vast data sets of streaming telemetry. These robots can immediately detect performance anomalies and incipient gray failures that human operators might miss altogether, or not discover until it is too late and a major outage has resulted. Machines are capable of ingesting the vast amounts of data this requires, and these large data sets serve as the basis for training machines to do a better job of identifying specific patterns over time.
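As a rough illustration of anomaly detection on a telemetry stream, the sketch below flags samples that deviate sharply from a rolling baseline. It deliberately swaps in a simple statistical technique (a rolling z-score) for a trained machine learning model, and the class name, window size, threshold, and sample latencies are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flag values that sit far from the mean of a recent window of samples."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)  # recent samples treated as normal
        self.threshold = threshold           # z-score above which we flag
        self.min_samples = min_samples       # baseline size needed before judging

    def observe(self, value: float) -> bool:
        """Return True if value looks anomalous against the current baseline."""
        is_anomaly = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Only normal-looking samples update the baseline, so a single
        # spike does not distort the statistics used for later samples.
        if not is_anomaly:
            self.samples.append(value)
        return is_anomaly


# Hypothetical latency samples in milliseconds, with one obvious spike.
latencies = [10.0, 11.0, 10.5, 9.8, 10.2, 10.1, 55.0, 10.3]
detector = RollingAnomalyDetector()
flags = [detector.observe(x) for x in latencies]
```

Here only the 55.0 ms sample is flagged. One limitation of keeping flagged samples out of the baseline is that a lasting shift in the normal level would keep being flagged, which is one reason production systems tend to use models that adapt over time.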

Applying AI and machine learning to network infrastructure monitoring allows for closed-loop automation, in which real-time insights derived from streaming telemetry data are fed back into the orchestration stack, which automatically reconfigures the network without operator intervention. At this time, in most cases, network operators will want to verify AI-based root cause analysis before taking action, but as the technologies are refined, the robots will get better and become capable of operating more autonomously.
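The closed loop described above, including the operator verification step, might be sketched as follows. This is a toy under stated assumptions: `Insight`, `Action`, `propose_action`, and `closed_loop_step` are hypothetical names, the utilization threshold is arbitrary, and the `approve` and `apply` callbacks stand in for a real operator review step and a real orchestration API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Insight:
    """A hypothetical finding derived from telemetry: utilization of one link."""
    link: str
    utilization: float  # fraction of capacity, 0.0 to 1.0


@dataclass(frozen=True)
class Action:
    """A hypothetical reconfiguration request for the orchestration stack."""
    link: str
    description: str


def propose_action(insight: Insight, threshold: float = 0.9) -> Optional[Action]:
    """Turn an insight into a proposed change, or None if nothing is needed."""
    if insight.utilization >= threshold:
        return Action(insight.link, f"shift traffic away from {insight.link}")
    return None


def closed_loop_step(
    insight: Insight,
    approve: Callable[[Action], bool],
    apply: Callable[[Action], None],
) -> str:
    """Run one pass of the loop: insight, proposal, approval gate, then apply."""
    action = propose_action(insight)
    if action is None:
        return "no_action"
    if not approve(action):  # the operator (or a policy) can veto the change
        return "rejected"
    apply(action)
    return "applied"


# Stand-ins for the orchestration stack and the approval step.
applied: List[Action] = []
results = [
    closed_loop_step(Insight("wan-1", 0.95), lambda a: True, applied.append),
    closed_loop_step(Insight("wan-2", 0.40), lambda a: True, applied.append),
    closed_loop_step(Insight("wan-3", 0.97), lambda a: False, applied.append),
]
```

Passing the approval step in as a callback reflects the progression the paragraph describes: today it can prompt a human, and as confidence in the analysis grows it can be replaced by an automatic policy without changing the rest of the loop.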

There are optimistic futurists who believe that AI will evolve to a point where it will be used to augment human cognitive functions, but not supersede or control mankind. Although perhaps still well over the horizon, this is a vision I can embrace. I think the current generation of beleaguered IT managers can as well.

So am I worried about the rise of the machines? Heck no. Bring it on!
