Lessons from Google on Converging Network Technologies with SD-WAN
Summary
Google is investing and innovating in SD-WAN. In this post, Kentik CTO Jonah Kowall highlights what Google has been up to and how Google’s SD-WAN work can apply to the typical organization.
Today, SD-WAN is an overlay. However, these systems are evolving, and as they do, a future is taking shape in which SD-WAN becomes increasingly intelligent and is driven by multiple signals. That future is, once again, being pioneered by today’s forward-thinking organizations.
Google is a perfect example of a company that is often 5 to 10 years ahead of the industry, including in SD-WAN innovation. After all, they operate one of the largest technology platforms on the planet.
The recent NANOG 78 conference, held in Kentik’s hometown of San Francisco, provided the opportunity to hear from Google’s networking leadership, along with other inspiring and interesting talks.
The first day of NANOG kicked off with a keynote from Amin Vahdat, Ph.D., engineering fellow and vice president at Google. This fascinating talk went into detail about how Google constructed and operates one of the most sophisticated and high-scale networks in the world. Dr. Vahdat also gave a deep retrospective of an outage in the summer of 2019, including what the team learned and how they evolved based on the outage.
In summary, Google has created multiple WANs: B2, the internet-facing backbone that carries user traffic, and B4, the software-defined WAN (SD-WAN) that carries computer-to-computer traffic between its data centers and that Google has been building over the past decade. Google has also augmented its peering with a sophisticated edge, which Dr. Vahdat explained. This platform, known as Espresso, takes the applications and their importance into account when making routing decisions to optimize performance. One of the key elements of Espresso is software-defined peering, driven by measurements of every TCP connection taken at each server, from which Google builds a sampled aggregate of performance.
Dr. Vahdat also explained that Google does not do this in real time, but rather every “single-digits minutes” (meaning it’s not every minute, but not every 10 minutes either). The data from that system is fed to the edge of the network to change the routing and paths. Google uses standard MPLS labels to select the egress so they can communicate with their peers. This is more easily conveyed by the high-level diagram of the system (featured above).
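To make the idea concrete, here is a minimal sketch of performance-driven egress selection. It is not Google's implementation: the sample format, the peer names, and the choice of median round-trip time as the metric are all assumptions made for illustration. It groups sampled per-connection measurements by destination prefix and egress peer, then picks the peer with the lowest median RTT for each prefix. A real system would rerun this every few minutes and weigh capacity, cost, and application priority as well.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sampled measurements: (destination prefix, egress peer, RTT in ms).
SAMPLES = [
    ("203.0.113.0/24", "peer-a", 40.0),
    ("203.0.113.0/24", "peer-a", 44.0),
    ("203.0.113.0/24", "peer-b", 30.0),
    ("203.0.113.0/24", "peer-b", 34.0),
    ("198.51.100.0/24", "peer-a", 12.0),
    ("198.51.100.0/24", "peer-b", 25.0),
]


def aggregate(samples):
    """Reduce sampled RTTs to one median value per (prefix, egress) pair."""
    groups = defaultdict(list)
    for prefix, egress, rtt_ms in samples:
        groups[(prefix, egress)].append(rtt_ms)
    return {key: median(rtts) for key, rtts in groups.items()}


def choose_egress(aggregated):
    """Pick, for each prefix, the egress peer with the lowest median RTT.

    Ties are broken by peer name so the result is deterministic.
    """
    best = {}
    for (prefix, egress), rtt_ms in aggregated.items():
        candidate = (rtt_ms, egress)
        if prefix not in best or candidate < best[prefix]:
            best[prefix] = candidate
    return {prefix: egress for prefix, (_, egress) in best.items()}


if __name__ == "__main__":
    for prefix, egress in sorted(choose_egress(aggregate(SAMPLES)).items()):
        print(f"{prefix} -> {egress}")
```

Batching the decision over an aggregation window, rather than reacting to individual connections, keeps routing changes from flapping on momentary noise.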
The second day started with a keynote from Bikash Koley, vice president of global networking at Google, who was formerly the CTO of Juniper. Koley’s talk highlighted how Google is building a true intent-based network. They use a modeling language to express policies, which are compiled into instructions that the network enforces. The key to this is extensive monitoring and streaming telemetry from the network devices. Google has been building this closed-loop system, but they do not have a self-driving network yet. They do, however, have very ambitious future plans to leverage the Tensor Processing Units (TPUs) designed for AI and machine learning to make more sophisticated decisions about the network.
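The closed-loop idea can be sketched in a few lines. This is an illustrative toy, not Google's modeling language or control system: the `Intent` type, the path names, and the instruction format are all hypothetical. An intent declares what a service needs, a compile step turns it into a concrete forwarding instruction, and a reconcile step compares telemetry against the intent and recompiles when the active path no longer complies.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Intent:
    """Declarative policy (hypothetical): what the service needs, not how to get it."""
    service: str
    max_latency_ms: float
    candidate_paths: tuple


def compile_intent(intent, path):
    """Translate the intent into a concrete (hypothetical) forwarding instruction."""
    return {"service": intent.service, "forward_via": path}


def reconcile(intent, active_path, telemetry):
    """One pass of the closed loop.

    telemetry maps path name -> measured latency in ms; a path with no
    measurement is treated as non-compliant.
    """
    if telemetry.get(active_path, INF) <= intent.max_latency_ms:
        return compile_intent(intent, active_path)  # intent already satisfied
    compliant = [
        p for p in intent.candidate_paths
        if telemetry.get(p, INF) <= intent.max_latency_ms
    ]
    if not compliant:
        # Nothing meets the intent; keep the current path (a real system would alert).
        return compile_intent(intent, active_path)
    return compile_intent(intent, min(compliant, key=lambda p: telemetry[p]))


if __name__ == "__main__":
    intent = Intent("video", 50.0, ("path-1", "path-2"))
    print(reconcile(intent, "path-1", {"path-1": 80.0, "path-2": 35.0}))
```

Running `reconcile` on every telemetry update is what closes the loop. A self-driving network would go further and learn the policy itself, which is where the TPU-based plans come in.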
How does Google’s work apply to the typical organization?
If we fast forward, a key construct for the network of the future is the incorporation of telemetry and monitoring data to drive the network and automate network-level decisions. SD-WAN is also a critical way to overlay across multiple vendors and technologies. Ultimately, making the self-driving network a reality requires a platform for real-time decisions. The beginnings of these components already exist, but it will take many partnerships, acquisitions, and new technologies before this becomes something folks can purchase or use.
This platform of the future and the concepts being forged by Google will trickle down into our networks over the coming decade. As a result, we will see much more intelligent WANs that leverage telemetry to improve reliability and security and, ultimately, application performance, driving better user experiences.