Network observability: Hype or reality?
Summary
If you haven’t heard of network observability, you soon will, and you’ll be hearing it a lot. Some say it is just marketing hype and that networks have always been observable. This post will explore why that’s not the case.
If you haven’t yet heard the term “network observability,” you will be hearing it soon. And I predict you’ll be hearing it a lot. Some say that network observability is just marketing hype from vendors. They say, “networks have always been observable, so there’s nothing new here.” I say network observability is not just vendor hype, and this blog will make the case.
What is observability?
Let’s back up for a minute and talk about what “observability” means. The concept of observability has taken hold in the DevOps, SRE and application performance monitoring (APM) space. Thought leaders, especially Honeycomb, do a great job teaching the industry why observability as a concept is unique and important. The term has a literal engineering definition which, in a nutshell, means that the internal state of a system can be known solely from external observation. Such a system is said to be “observable.”
In networking, this would mean that you can understand what’s going on in your network by interpreting the network’s telemetry data. In practice, that would mean answering questions such as: What is causing a drop in traffic? Why is my bandwidth bill so high? And what configuration change caused this behavior?
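To make the first of those questions concrete, here is a minimal sketch of how telemetry could be used to find what is behind a drop in traffic. It assumes flow records have already been collected into simple dictionaries; the field names (`src`, `bytes`), the sample addresses and the helper names are hypothetical and not taken from any particular product.

```python
from collections import Counter

def bytes_by_source(flows):
    """Sum bytes per source address across a list of flow records."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals

def biggest_drops(before, after, top=3):
    """Rank sources by how much their traffic fell between two time windows."""
    prev, curr = bytes_by_source(before), bytes_by_source(after)
    deltas = {src: curr.get(src, 0) - prev[src] for src in prev}
    # Most negative change first: these sources account for most of the drop.
    return sorted(deltas.items(), key=lambda item: item[1])[:top]

# Hypothetical flow records for two consecutive time windows.
before = [
    {"src": "10.0.0.1", "bytes": 5000},
    {"src": "10.0.0.2", "bytes": 4000},
    {"src": "10.0.0.3", "bytes": 1000},
]
after = [
    {"src": "10.0.0.1", "bytes": 4900},
    {"src": "10.0.0.2", "bytes": 300},
    {"src": "10.0.0.3", "bytes": 1100},
]

print(biggest_drops(before, after, top=2))
# [('10.0.0.2', -3700), ('10.0.0.1', -100)]
```

The point is not the arithmetic, which is trivial, but that the answer comes entirely from externally collected telemetry rather than from logging in to devices.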
But we’ve always been able to answer these questions, right?
Observability is not a binary attribute. There are degrees to which a network can be observable. Indeed, classic NPM tools have allowed some investigation of problems. And with yesterday’s relatively simple fixed-configuration networks, it was possible to explore and sometimes find the cause of unexpected problems, which amounts to some degree of network observability.
But the critical point is that the move to cloud networking, along with related trends such as SD-WAN, has changed the game. The network has lost a significant amount of observability, and the classic NPM (network performance monitoring) tools have not kept up. Here are some of the problems with classic NPM:
- They can’t handle cloud-scale. It’s typical for a cloud customer to produce many terabytes of VPC flow logs per month. A SaaS-based solution is required to achieve this scale.
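- Port numbers and IP addresses are less useful in traffic analytics, because cloud and container workloads are short-lived and their addresses are reassigned frequently.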
- Maps and data records built by NPM software can be highly inaccurate or completely wrong.
- Traditional NPM tools do not include metadata on network security and routing policy, or the cloud and container orchestration information that is critical to understanding how container instances are networked.
- The value of packet-capture technology diminishes significantly in the cloud.
So, at a minimum, networks are less observable than they used to be.
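The points above about IP addresses and missing orchestration metadata can be illustrated with a small sketch. It assumes a lookup table that maps each IP address to the workload currently using it; the table, the workload names and the flow fields (`dst`, `bytes`) are hypothetical, and in a real environment the mapping would have to be refreshed continuously from the orchestrator.

```python
from collections import defaultdict

# Hypothetical orchestration metadata: which workload currently owns each IP.
# In a real cluster this mapping changes as containers are rescheduled.
IP_TO_WORKLOAD = {
    "10.1.0.5": "checkout",
    "10.1.0.9": "checkout",
    "10.1.2.7": "search",
}

def bytes_by_workload(flows, ip_to_workload):
    """Aggregate flow bytes by workload label instead of by raw IP address."""
    totals = defaultdict(int)
    for flow in flows:
        # Addresses with no known owner are grouped under "unknown".
        label = ip_to_workload.get(flow["dst"], "unknown")
        totals[label] += flow["bytes"]
    return dict(totals)

# Hypothetical flow records keyed by destination address.
flows = [
    {"dst": "10.1.0.5", "bytes": 1200},
    {"dst": "10.1.0.9", "bytes": 800},
    {"dst": "10.1.2.7", "bytes": 500},
    {"dst": "10.1.9.9", "bytes": 50},
]

print(bytes_by_workload(flows, IP_TO_WORKLOAD))
# {'checkout': 2000, 'search': 500, 'unknown': 50}
```

Without the metadata, the same records would show four unrelated addresses. With it, the traffic reads as two services plus a small amount that still needs explaining.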
What has changed?
Let’s face it, the game for networking has changed. It has changed in two major ways:
- Cloud changes everything. Cloud networking is not simply a re-implementation of an existing network architecture within a cloud provider’s domain. Cloud-native applications are fundamentally different, and cloud computing creates new and unexpected challenges for networking. Commonly cited problems are a loss of visibility or understanding of the network topology, loss of control over network policies (because developers can now create network constructs on their own), and new networking tools from the cloud providers that are often siloed and shallow in features.
- Networking is intersecting the APM domain. Network practitioners have always known that the network plays a major role in application performance, but in the last couple of years this has become apparent to application developers as well. And the vendor community has responded. Cisco/AppDynamics acquired ThousandEyes, Splunk acquired Flowmill, and New Relic has partnered with Kentik as its network observability solution. Datadog, Dynatrace and other market leaders have added significant network observability capabilities to their platforms.
We need to reframe the networking problem/solution
So, now that application developers, SREs, DevOps and cloud infrastructure engineers all have a growing interest in networking, how are we networking veterans going to help and collaborate with our new “network-very-interested” co-workers? Pull out our 15-year-old textbooks on NPM? Teach them how SNMP works? Give them a training class on packet capture and analysis?
Maybe, but I wouldn’t try that. A lot of the underlying networking technology has not changed, except in the cloud, where the physical and data link layers are totally different. I don’t want to disrespect anything that we’ve done in networking in the last two decades, but in digitally evolved organizations, networking needs a new face. Networking needs a new context, and network observability is the right idea at the right time.
Networking in an application context
The most important context for network observability is application performance and the digital experience of users. And since observability tools are used to measure and diagnose problems with application performance, network observability makes perfect sense as the moniker for the networking part of the discipline.
As networkers, there are steps we can take to increase the importance of network observability as a critical part of the picture. One step is to always talk about network issues in the context of application performance. For example:
- Network latency will impact users’ digital experience and can have negative consequences, such as users closing a service or dumping their shopping cart.
- Network jitter can hurt or even disable the use of audio or video applications, for example, on a Zoom or WebEx conference call.
- Network security breaches, such as unauthorized access, put user data and confidential information at risk.
- Network-borne attacks, such as those launched from botnets, can cripple application performance and response times for users.
How does Kentik use network observability?
For Kentik, network observability is a theme that aligns with our current solution and our product roadmap. However, just as importantly, network observability is a reframing of the problem that better aligns with the new challenges seen, particularly in cloud networking. And, observability as a concept resonates with the way application, DevOps and SRE teams see their challenges.
As a business, it is important for us to help companies understand what problems we solve, how those problems affect their business and how Kentik’s solution can help. To us, network observability is not a marketing gimmick or just a new label on network monitoring. It represents a change in the way networking is understood as part of the application, and in the lens through which modern infrastructure and cloud teams see planning, running and fixing the network.
At Kentik, we won’t change our minds about wanting to better explain our solution and have that better resonate with customers. And…if you want to call this good marketing…thanks for the compliment!