The Network Also Needs to be Observable, Part 2: Network Telemetry Sources
Summary
In part 2 of the network observability series, we tackle the first key input needed for network observability: which networks and network elements to gather telemetry data from.
Introduction
In part 1 of this series, I talked about the importance of network observability as our customers define it: using advances in data platforms and machine learning to answer critical questions and enable teams to take the action needed to keep application traffic flowing.
For most of its history, network operations has relied on monitoring tools that are mostly standalone, closed systems. These tools typically see only one or a few network element and telemetry types, generally run on-prem on one or a few nodes, and lack modern, open data architectures.
Networkers running enterprise and critical service provider infrastructure need infrastructure-savvy analogs of the observability principles and practices that DevOps groups are deploying. We see these DevOps teams unifying logs, metrics, and traces into systems that can answer critical questions, supporting reliable operations and improved revenue flow.
We see network observability platforms, teams, and tool builders needing:
- Telemetry input from all critical networks and forwarding elements
- The key telemetry types to shine a light on network activity and health
- The critical context that enables teams to ask questions about users, applications, and customers (and not just IP addresses and ports)
In part 2 of this series, continuing our look at what it takes to make the network observable, we tackle the first of these inputs: which networks and network elements to collect telemetry from.
Consider the range of network telemetry sources and observation points
To achieve observability in modern networks, it is essential to gather the state of every network your application traffic traverses: overlay and underlay, physical and virtual, and both the ones you run and the ones you don't.
The network telemetry sources we see in modern networks span the components of network types such as:
- Cloud infrastructure: Cloud-specific elements such as service meshes and transit and ingress gateways
- Data center: Leaf and spine switches, and top-of-rack, modular, fixed, and stackable switches. API gateways for digital services.
- Internet and broadband infrastructure: The internet itself, which connects the clouds, applications, and users. Access and transit networks, edge and exchange points, CDNs.
- 4G, 5G: Including the evolved packet core ((v)EPC), multi-access edge computing (MEC), optical network and line terminals (ONT/OLT), and the radio access network (RAN)
- IoT: IoT endpoints, gateways, and industrial switches for consumer, smart city, and corporate deployments
- Campus: Ethernet switches, layer 2 and 3 switches, hubs, and network extenders. Wireless access points and controllers.
- Traditional WAN: WAN access switches, integrated services routers, cloud access routers
- SD-WAN: Access gateways, uCPE, vCPE, and composed SD-WAN services including their cloud overlays
- Service provider backbone: Edge and core routers, transport switches, optical switches, DC interconnects
- MSO: Cable Access Platforms (CAP) and cable modem termination systems (CMTS), Optical Distribution Network (ODN), Broadband Network Gateway (BNG), and their virtualized versions
It’s also critical to think about the forwarding and control elements and observation points:
- Network devices: Physical and virtual routers, switches, wireless access points, application delivery controllers and a myriad of possible on-prem or cloud-based devices
- Endpoints: Both eyeball and server/service endpoints, including physical, virtual, and overlay/tunnel interfaces
- Controllers: Software-defined network controllers, orchestrators and path computation applications that program network configuration
- TAPs, SPAN ports, and network packet brokers (NPBs): Observation points in the network that provide tapping, port mirroring, or packet brokering
- L4-7 network elements: Web servers and appliances, content delivery networks, and application delivery controllers that generate, route, shape, or control traffic
- Firewalls and other security appliances and services: As physical and logical (VM, VNF, CNF) gateways, policy enforcement points, and telemetry sources, the security layer is both part of the network and key to full-stack debugging of operational issues
- Application layer: ADCs, load balancers and service meshes
Nothing on these lists is likely to come as a surprise. What may be surprising is that most companies still can't see a unified view of these networks and key elements in one place.
This highlights one of the big challenges of making the network observable. Our networks have been built up from a wide range of devices from multiple vendors, old and new, physical and virtual, all working together. Network observability must cover most or all of them to answer the questions that are critical to keeping application and user traffic fast and available.
The good news is that, with modern data platforms and an inclusive upfront design, it's possible to get started, add value, and iterate toward complete coverage.
Conclusion
In the past, it may have been acceptable for the network to consist of interconnected islands, each with its own network monitoring tools. With the shift to DevOps and application-driven everything, we simply can't work in this fragmented way anymore. All of our operational concerns (planning, running, and fixing) need to be coordinated across the full variety of networks that affect our traffic.
In my next blog, the third in this series, I will discuss the types of network telemetry data that are generated across this wide range of network types and devices.