A Tale of Two BGP Leaks
Summary
Doug Madory investigates two large BGP leaks from August 28 and 29, 2023, and how RPKI ROV and other technologies can help mitigate the widespread internet disruptions that can result from incidents like these.
“It was the best of routes, it was the worst of routes.”
Earlier this week, the internet experienced another two BGP leaks. Monday saw a path leak emanating from Bangladesh and, on Tuesday, an origination leak from Brazil. While they were brief, they resulted in misdirected traffic and dropped packets, and as such are worthy of investigation.
In this blog post, I’ll look into these leaks using Kentik’s unique data and capabilities and see what we can learn from them.
What are BGP leaks?
“A route leak is the propagation of routing announcement(s) beyond their intended scope.”
That was the overarching definition of a BGP route leak introduced by RFC7908 in 2016. Border Gateway Protocol (BGP) enables the internet to function by providing a mechanism by which autonomous systems (e.g., telecoms, companies, universities) exchange information on how to forward packets based on their destination IP addresses.
In this context, the term “route,” when used as a noun, is shorthand for the prefix (range of IP addresses), AS_PATH, and other associated information relating to packet delivery. When routes are circulated farther than they are supposed to go, traffic can be misdirected, or even disrupted, as happens numerous times per year.
RFC7908 went on to define a taxonomy for BGP leaks by enumerating six common scenarios, half of which appear in the two leaks covered in this post. In my writing on route leaks, I like to group them into two broad categories: origination leaks and path leaks. As I described in my blog post earlier this year, A Brief History of the Internet’s Biggest BGP Incidents, this distinction is useful because the two types of error require different mitigation strategies.
With those definitions out of the way, let’s get into the two leaks.
Path leak on Monday
Beginning at 11:12 UTC, AS58715 leaked almost 30,000 BGP routes to its transit provider BTCL (AS17494), an incident first reported by our friends at Qrator. These routes were learned from both AS58715’s peers and its other transit providers, making the incident a combination of leak types 1 and 4 from RFC7908.
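To make the leak-type numbering concrete, here is a minimal sketch that maps the first four RFC7908 scenarios onto two facts about a route: the kind of neighbor it was learned from and the kind of neighbor it was exported to. The function name and the relationship labels are my own shorthand for illustration, not part of the RFC or of any tool.

```python
from typing import Optional

# (relationship of the neighbor the route was learned from,
#  relationship of the neighbor it was exported to) -> RFC7908 leak type.
# Types 5 and 6 involve re-origination or internal prefixes and are not
# determined by neighbor relationships alone, so they are left out here.
LEAK_TYPES = {
    ("provider", "provider"): 1,  # hairpin turn between two transit providers
    ("peer", "peer"): 2,          # lateral ISP-ISP-ISP leak
    ("provider", "peer"): 3,      # transit-provider routes leaked to a peer
    ("peer", "provider"): 4,      # peer routes leaked to a transit provider
}


def classify_export(learned_from: str, exported_to: str) -> Optional[int]:
    """Return the RFC7908 leak type (1-4), or None if the export is legitimate."""
    return LEAK_TYPES.get((learned_from, exported_to))


print(classify_export("peer", "provider"))      # 4: a peer route sent to a transit provider
print(classify_export("provider", "provider"))  # 1: one transit provider's route sent to another
print(classify_export("customer", "provider"))  # None: customer routes may go upstream
```

Routes learned from customers can be exported to anyone, which is why only the peer and provider rows appear in the table.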
Once circulated onto the internet, the leaked routes misdirected internet traffic from around the world through AS58715 in Bangladesh. Below is a visualization based on Kentik’s aggregate NetFlow data of the traffic that wasn’t destined for Bangladesh seen flowing to AS58715 via AS17494.
Let’s dig into a couple of problematic BGP announcements to illustrate the leak. These messages can be found in the Routeviews archive here.
In this first message, we see AS58715 passing an Amazon route (13.32.249.0/24) from its peering session with Amazon (AS16509) to its transit provider AS17494 (RFC7908 leak type 4). Although the leaked Amazon routes didn’t propagate very far, AS16509 was still the largest destination for misdirected packets (see graphic above) simply due to the large volume of traffic AWS handles.
TIME: 08/28/23 11:13:01.660163
TYPE: BGP4MP_ET/MESSAGE/Update
FROM: 43.226.4.1 AS63927
TO: 128.223.51.108 AS6447
ORIGIN: IGP
ASPATH: 63927 17494 58715 16509
NEXT_HOP: 43.226.4.1
COMMUNITY: 24115:17494 24115:65012 63927:106 63927:2101 63927:5201
LARGE_COMMUNITY: 24115:1000:1 24115:1001:1 24115:1002:1 24115:1003:40 24115:1004:17494
ANNOUNCE
13.32.249.0/24
In this second message, AS58715 passes a Microsoft route (20.46.144.0/20), learned from one transit provider AS9498, to another, AS17494 (RFC7908 leak type 1). These routes were circulated widely because they are not typically seen in the global routing table and, therefore, faced no competition with existing routes.
TIME: 08/28/23 11:13:06.020567
TYPE: BGP4MP_ET/MESSAGE/Update
FROM: 64.71.137.241 AS6939
TO: 128.223.51.108 AS6447
ORIGIN: IGP
ASPATH: 6939 1299 174 17494 58715 9498 59605 8075
NEXT_HOP: 64.71.137.241
ANNOUNCE
20.46.144.0/20
In fact, if we visualize the propagation of this route using Kentik’s BGP visualization, we can see 85.4% of our BGP sources would have chosen to send traffic to the IP addresses in 20.46.144.0/20 and other similarly leaked routes via AS58715 in Bangladesh.
Since the leak did not change the origins of the routes nor introduce more-specific routes, RPKI ROV wasn’t able to help. In fact, of the leaked routes, 17,173 were RPKI-unknown (without a ROA) and 12,588 were RPKI-valid. In either case, networks rejecting RPKI-invalid routes would not have rejected routes from this leak.
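A minimal sketch of route origin validation, in the spirit of RFC 6811, shows why ROV had nothing to object to here. The single ROA below is invented for illustration and is not taken from the real RPKI; the `Roa` type and `rov_state` function are likewise hypothetical names, not the API of any validator.

```python
import ipaddress
from typing import List, NamedTuple


class Roa(NamedTuple):
    prefix: str       # prefix covered by the ROA
    max_length: int   # longest prefix length the ROA authorizes
    asn: int          # authorized origin AS


def rov_state(prefix: str, origin: int, roas: List[Roa]) -> str:
    """Classify a route as 'valid', 'invalid', or 'not-found' (RPKI-unknown)."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical ROA set: assume only the Amazon prefix has a ROA.
roas = [Roa("13.32.249.0/24", 24, 16509)]

# The path leak kept the original origins, so nothing evaluates as invalid.
print(rov_state("13.32.249.0/24", 16509, roas))   # valid
print(rov_state("20.46.144.0/20", 8075, roas))    # not-found
# Only a changed origin (an origination leak) would be caught.
print(rov_state("13.32.249.0/24", 58715, roas))   # invalid
```

The check looks only at the prefix and the origin AS at the far end of the AS_PATH. The leaker sits in the middle of the path, so it is invisible to this test.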
Autonomous System Provider Authorization (ASPA) was designed to address path leaks like this one. ASPA works by allowing ASes to assert their transit relationships in RPKI, which enables other ASes to identify routes that violate the valley-free property and reject them. However, ASPA is still in its early stages and is not yet fully fielded.
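The idea can be sketched with the two AS paths from Monday's messages. This is a deliberately simplified valley check, not the full ASPA verification procedure from the IETF drafts. The `providers` table stands in for published ASPA records: the entry for AS58715 reflects the transit relationships described in this post, while the empty entry for AS16509 (an AS declaring no providers) is an assumption made only for illustration.

```python
from typing import Dict, List, Optional, Set

# Stand-in for ASPA records: customer AS -> set of its declared providers.
providers: Dict[int, Set[int]] = {
    58715: {17494, 9498},  # per this post: BTCL and AS9498 provide transit
    16509: set(),          # assumption: Amazon declares no providers
}


def classify_hop(cur: int, nxt: int, table: Dict[int, Set[int]]) -> str:
    """Classify the hop cur -> nxt, moving from the origin toward the collector."""
    if nxt in table.get(cur, set()):
        return "up"        # customer -> provider
    if cur in table.get(nxt, set()):
        return "down"      # provider -> customer
    if cur in table:
        return "not-up"    # cur has a record and nxt is not one of its providers
    return "unknown"       # no record to judge by


def find_leaker(as_path: List[int], table: Dict[int, Set[int]]) -> Optional[int]:
    """Return the AS that sent a route back uphill, or None if no valley is seen.

    as_path is ordered as in the BGP message: nearest AS first, origin last.
    """
    hops = list(reversed(as_path))
    # Collapse AS_PATH prepending so repeated ASNs are not treated as hops.
    hops = [a for i, a in enumerate(hops) if i == 0 or a != hops[i - 1]]
    past_apex = False
    for cur, nxt in zip(hops, hops[1:]):
        kind = classify_hop(cur, nxt, table)
        if kind in ("down", "not-up"):
            past_apex = True
        elif kind == "up" and past_apex:
            return cur
    return None


print(find_leaker([63927, 17494, 58715, 16509], providers))                   # 58715
print(find_leaker([6939, 1299, 174, 17494, 58715, 9498, 59605, 8075], providers))  # 58715
print(find_leaker([17494, 58715], providers))                                 # None
```

In both leaked paths the route has already stopped climbing by the time it reaches AS58715, which then hands it up to a provider. That uphill step after the apex is the valley a receiving network could use to reject the route.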
Origination leak on Tuesday
And then on Tuesday, August 29, AS266970 accidentally began originating nearly every prefix in the IPv6 global routing table. This lasted for 10 minutes and resulted in the misdirection of a significant amount of internet traffic, as observed in our aggregate NetFlow, pictured below.
Unlike the BGP leak on the previous day, this was an origination leak — the type of leak that RPKI ROV is supposed to help contain, limiting the disruption. So, how much did it help?
Last year, Job Snijders of Fastly and I explored the question of how much RPKI ROV reduces the propagation of RPKI-invalid routes. It is an important question because that ultimately is the objective of RPKI ROV: to reduce the propagation of problematic routes, thus reducing the disruption they cause.
We can estimate propagation by counting how many Routeviews BGP sources had each leaked route in their routing table during the leak. Then, we can separate these routes by their RPKI evaluation, as shown in the plots below. (Note: The two graphs below contain the same data, but the one on the left uses a logarithmic scale.)
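The counting step can be sketched as follows. The observations are toy data with made-up source names, not Routeviews measurements, and the function names are hypothetical. The two prefixes are the ones discussed later in this post: one has no ROA, and the other has a ROA for AS3573, which makes any route for it originated by AS266970 RPKI-invalid.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LEAKER = 266970

# Each observation: (BGP source, prefix, origin AS seen by that source).
Observation = Tuple[str, str, int]


def propagation(observations: Iterable[Observation], leaker: int) -> Dict[str, int]:
    """Count, per prefix, the distinct sources that saw the leaker as origin."""
    seen = defaultdict(set)
    for source, prefix, origin in observations:
        if origin == leaker:
            seen[prefix].add(source)
    return {prefix: len(sources) for prefix, sources in seen.items()}


def by_rpki_state(counts: Dict[str, int], states: Dict[str, str]) -> Dict[str, List[int]]:
    """Group the per-prefix counts by the RPKI evaluation of the leaked route."""
    groups = defaultdict(list)
    for prefix, n in counts.items():
        groups[states.get(prefix, "not-found")].append(n)
    return dict(groups)


# Toy data: three sources, two prefixes.
observations = [
    ("source-a", "2a02:ee80:4270::/48", 266970),
    ("source-b", "2a02:ee80:4270::/48", 266970),
    ("source-c", "2a02:ee80:4270::/48", 3573),
    ("source-a", "2801:1f0:4017::/48", 266970),
    ("source-b", "2801:1f0:4017::/48", 3573),
    ("source-c", "2801:1f0:4017::/48", 3573),
]
# RPKI evaluation of each prefix when originated by the leaker.
states = {"2a02:ee80:4270::/48": "not-found", "2801:1f0:4017::/48": "invalid"}

counts = propagation(observations, LEAKER)
print(by_rpki_state(counts, states))
```

Plotting the distribution of counts in each group gives the kind of comparison described above.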
Although numerous factors can influence the propagation of any individual route, it is clear that the RPKI-invalid routes propagated less, primarily due to networks rejecting RPKI-invalid routes.
To isolate the impact of RPKI ROV, let’s compare two routes originated by the same ASN (AS3573), one with a ROA and the other without. 2a02:ee80:4270::/48 lacked a ROA, and the Kentik BGP visualization below illustrates how it fared. During the peak of the leak, 60.7% of our BGP sources saw the leaker (AS266970) as the origin and would have directed traffic for this prefix there.
By contrast, 2801:1f0:4017::/48, which is also originated by AS3573 and was also caught up in the leak, has a ROA that asserts AS3573 as the valid origin. At its peak, only 2.4% of our BGP sources saw the leaker (AS266970) as the origin, so this route was hardly impacted by the leak.
Conclusion
Years ago, large routing leaks like these might have been the cause of widespread internet disruption. Not so much anymore.
Humans are still (for the time being) configuring routers and, being human, are prone to the occasional mistake. What has changed is that the global routing system has become better at containing the inevitable goof-ups. Route hygiene has improved due to efforts like MANRS and the hard work of network engineers around the world.
That progress is largely at the macro level, but there are many individual networks that have not deployed RPKI, and to them I would say the following:
- Creating ROAs will help to protect your inbound traffic by asserting to the rest of the internet which origin is the legitimate one, which is useful during an origination leak like the one that happened on Tuesday.
- Conversely, rejecting RPKI-invalids helps to protect your outbound traffic by rejecting leaked routes that might misdirect that traffic or lead to a disruption.
By reducing the impact of BGP leaks, we can focus on the harder problems left to be solved in routing security, such as the “determined adversary” scenario witnessed in last year’s attacks on cryptocurrency services. In that realm, there is still much work to be done.
“It is a far, far better ROA to create, than to never have done. It is a far, far better RPKI-invalid to reject than to forward on.”