Using Rust for Kentik’s New Synthetic Monitoring Agent
Summary
Kentik software engineer Will Glozer gives us a peek under the hood of Kentik’s new synthetic monitoring solution, explaining how and why Kentik used the Rust programming language to build its network monitoring agent.
We recently launched Kentik Synthetic Monitoring. The industry-leading Kentik Network Intelligence Platform is now the only fully integrated network traffic and synthetic monitoring analytics solution on the market, and the only solution to enable autonomous testing for both cloud and hybrid networks.
A key component of our solution is the set of hosts running ksynth, Kentik’s software agent for synthetic monitoring. Used in both Global Agents and Private Agents and developed in Rust, ksynth generates synthetic network traffic in the form of ICMP pings, UDP-based traceroutes, and HTTP(S) requests. Performance, reliability, and security are the key reasons why we used Rust to develop the agent.
Managing agents in Kentik Synthetic Monitoring
Design
At any given moment, one of our global agents may be running thousands of traffic generation tasks, so aside from an initial setup process, ksynth execution is almost entirely asynchronous. Kentik has been running Rust in production since 2017 (more on that in this previous blog post), but this is our first serious use of async/await, which was stabilized in late 2019.
The asynchronous design allows us to model each ping, traceroute, or HTTP request assigned to an agent as a distinct task. These tasks are primarily I/O bound, waiting on packets or timeouts, so each agent can run many tasks without requiring significant resources. Results are exported at predefined intervals from another task, and the agent periodically polls the backend for new work and submits status reports from yet other tasks. We use the tokio async runtime with its work-stealing, multi-threaded scheduler to efficiently spread task load across N threads.
Ksynth uses raw sockets for ping and traceroutes, and supports both IPv4 and IPv6, which adds a considerable amount of complexity. IPv4 raw sockets are able to send the full packet starting with the IP header, so an IPv4 traceroute can probe each hop by setting the Time to Live field directly. IPv6 raw sockets, however, do not have direct access to the IP header; instead, values such as the hop limit must be passed to sendmsg as ancillary data. That’s why I’ve contributed an open-source crate, raw-socket, that provides this functionality and more for both synchronous and asynchronous raw sockets.
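To make the TTL mechanism concrete, here is a minimal sketch of the sending half of a UDP traceroute using only the Rust standard library. It is not how ksynth is implemented and it does not use the raw-socket crate: it relies on the ordinary TTL socket option exposed by UdpSocket::set_ttl, the helper names probe_socket and send_probes are hypothetical, and the base port 33434 is just the conventional traceroute default. Receiving the ICMP Time Exceeded replies still requires a raw socket and is left out.

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, UdpSocket};

/// Conventional first destination port for UDP traceroute probes
/// (an assumption for this sketch, not a documented ksynth setting).
const BASE_PORT: u16 = 33434;

/// Hypothetical helper: bind an IPv4 UDP socket whose outgoing packets
/// carry the given TTL.
fn probe_socket(ttl: u32) -> io::Result<UdpSocket> {
    let sock = UdpSocket::bind("0.0.0.0:0")?;
    sock.set_ttl(ttl)?;
    Ok(sock)
}

/// Hypothetical helper: send one probe per hop with TTL 1, 2, ..., max_hops.
/// `dst` must be an IPv4 address because the socket is bound to 0.0.0.0.
/// Returns the number of probes sent.
fn send_probes(dst: IpAddr, max_hops: u8) -> io::Result<usize> {
    let mut sent = 0;
    for hop in 1..=max_hops {
        let sock = probe_socket(u32::from(hop))?;
        // A distinct destination port per hop helps match replies to probes.
        let addr = SocketAddr::new(dst, BASE_PORT + u16::from(hop));
        sock.send_to(b"probe", addr)?;
        sent += 1;
    }
    Ok(sent)
}
```

Each router that decrements the TTL to zero drops the probe and answers with an ICMP Time Exceeded message, which is how the path is revealed one hop at a time.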
In addition to efficiency, the async/await model makes for very pleasant and readable code as demonstrated by the following snippet extracted from a very simple ping implementation (see https://docs.rs/crate/raw-socket/0.0.1/source/examples/ping.rs):
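// Open an ICMPv4 socket and build an echo request addressed to 1.1.1.1.
let mut sock = RawSocket::new(ip4, dgram, Some(icmp4))?;
let ping = IcmpPacket::echo_request(1, 2, b"asdf");
let dst = SocketAddr::new("1.1.1.1".parse()?, 0);
// Encode the request into a buffer and send it.
let mut buf = [0u8; 64];
let pkt = ping.encode(&mut buf);
sock.send_to(pkt, dst).await?;
// Wait for the reply and decode the bytes that were actually received.
let mut buf = [0u8; 64];
let (n, from) = sock.recv_from(&mut buf).await?;
let pong = IcmpPacket::decode(&buf[..n]);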
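For readers curious what an encode step like the one in the snippet involves, here is a minimal, standard-library-only sketch of building an ICMPv4 echo request with its Internet checksum. The function names checksum and encode_echo_request are hypothetical, and the raw-socket crate’s IcmpPacket may well be implemented differently; this only illustrates the wire format (type 8, code 0, checksum, identifier, sequence number, payload).

```rust
/// Internet checksum (RFC 1071): the ones' complement of the ones'
/// complement sum of the data taken as big-endian 16-bit words.
fn checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    let mut chunks = data.chunks_exact(2);
    for c in &mut chunks {
        sum += u32::from(u16::from_be_bytes([c[0], c[1]]));
    }
    // An odd trailing byte is treated as the high byte of a final word.
    if let [last] = chunks.remainder() {
        sum += u32::from(*last) << 8;
    }
    // Fold the carries back into the low 16 bits.
    while sum >> 16 != 0 {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    !(sum as u16)
}

/// Hypothetical helper: encode an ICMPv4 echo request.
fn encode_echo_request(id: u16, seq: u16, payload: &[u8]) -> Vec<u8> {
    let mut pkt = Vec::with_capacity(8 + payload.len());
    pkt.push(8); // type: echo request
    pkt.push(0); // code
    pkt.extend_from_slice(&[0, 0]); // checksum placeholder, zero while summing
    pkt.extend_from_slice(&id.to_be_bytes());
    pkt.extend_from_slice(&seq.to_be_bytes());
    pkt.extend_from_slice(payload);
    let sum = checksum(&pkt);
    pkt[2..4].copy_from_slice(&sum.to_be_bytes());
    pkt
}
```

A receiver can validate such a packet by running checksum over the whole message, checksum field included, and checking that the result is zero.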
Alternatives
Go and Rust are Kentik’s primary backend languages, and this design seems well-suited to Go’s concurrency model, so why did we choose Rust instead? Performance-wise, I’d expect the two languages to be quite similar as the agent is mostly waiting for network and timer events. Rust’s lack of garbage collection and attention to minimizing allocations probably allows the agent to run more tasks—using fewer OS resources—which is beneficial, but not a major factor. It’s security and cross-compilation where Rust really shines for this specific application.
Security
Private Agents run within customer networks, which makes security a key concern. Ksynth relies on raw sockets to send and receive ICMPv4 and ICMPv6 packets and UDP packets with specific hop limits. Traditionally, raw sockets require root permissions, but on Linux systems we are able to use capabilities(7) to grant the agent the CAP_NET_RAW capability while running as a non-root user.
Capabilities are a per-thread attribute so—aside from the well-known benefits of memory safety—Rust is also ideal for setting up this sort of sandboxing prior to starting the async runtime, which will manage spawning new threads as needed.
In comparison, the programmer has little control over when the Go runtime will spawn new threads, and threads are spawned before any user-level code is executed. This is not an issue if the binary is started with the correct capabilities, but we want to be able to drop all unnecessary capabilities from all threads, even if a customer runs ksynth as root with unbounded capabilities.
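As a small illustration of working with capabilities from Rust, the sketch below checks whether the calling thread currently holds CAP_NET_RAW by reading the CapEff bitmask from procfs. It is an assumption-laden example rather than ksynth’s code: the function names are hypothetical, it only inspects capabilities, and actually dropping them would require the capset system call through libc or a dedicated crate, which is not shown.

```rust
use std::fs;
use std::io;

/// Bit index of CAP_NET_RAW in Linux capability sets.
const CAP_NET_RAW: u32 = 13;

/// Hypothetical helper: extract the effective capability mask from the
/// text of a /proc/<pid>/status file (the "CapEff:" line holds a hex mask).
fn parse_cap_eff(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))
        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
}

/// True if bit `cap` is set in `mask`.
fn has_cap(mask: u64, cap: u32) -> bool {
    mask & (1u64 << cap) != 0
}

/// Hypothetical helper (Linux only): does the calling thread hold
/// CAP_NET_RAW in its effective set? Capabilities are per-thread, so this
/// reads /proc/thread-self (Linux 3.17+) rather than /proc/self.
fn can_open_raw_sockets() -> io::Result<bool> {
    let status = fs::read_to_string("/proc/thread-self/status")?;
    Ok(parse_cap_eff(&status).map_or(false, |mask| has_cap(mask, CAP_NET_RAW)))
}
```

Running a check like this before the async runtime starts, while the process is still single-threaded, means the result describes the state that later threads will inherit.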
Cross-compilation
We deliver ksynth binaries for a number of different architectures including x86_64, armv7, and aarch64. The Rust toolchain has excellent support for cross-compiling and, for the most part, you simply need to add a target with rustup and then pass the --target option to cargo build. Ksynth and its dependencies are mostly pure Rust; however, a few crates like zstd and ring contain C code and require a C compiler. To address this, we use the excellent cross tool, which acts as a drop-in replacement for cargo and performs builds inside a container pre-configured with the necessary C toolchain for each target.
Go also has excellent support for cross-compiling pure Go code: simply set the GOOS and GOARCH environment variables. However, when CGO code is introduced you then need the C cross-compiler toolchain, and I haven’t seen a Go equivalent of cross. Additionally, Rust crates that depend on C libraries frequently bundle a copy of the library source and build it at compile time rather than depending on system libraries, pkg-config, etc.
A final point in Rust’s favor is strong support for building statically-linked executables even when cross-compiling and linking to C libraries. We leverage this to minimize the complexity of shipping packages for many different architectures and operating system releases.
Next Steps
We’ve just made Kentik Synthetic Monitoring generally available to our customers and currently have agents running in more than 200 PoPs across 44 countries. By the end of the year, we expect to have hundreds more running and a number of additional synthetic test types. We’re eager to see how ksynth scales and how many tasks each agent can handle. Keep an eye out for future articles discussing this and more on how we’re using Rust at Kentik!
If you’d like to try Kentik Synthetic Monitoring for yourself, you can learn more and request a free trial here.