
Summary
In part 2 of our tour of Kentik Data Engine, the distributed backend that powers Kentik Detect, we continue our look at the key features that enable extraordinarily fast responses to ad hoc queries, even over huge volumes of data. Querying KDE directly in SQL, we use actual query results to quantify KDE's response times while also showing the depth of insight that Kentik Detect can provide.

Picking up where part 1 left off, let's start with a week of traffic that neither originates nor terminates in the US, summed by source → destination country pair:
SELECT src_geo,
dst_geo,
Sum(both_bytes) AS f_sum_both_bytes
FROM big_backbone_router
WHERE src_geo <> 'US'
AND dst_geo <> 'US'
AND i_start_time > Now() - interval '1 week'
GROUP BY src_geo,
dst_geo
ORDER BY f_sum_both_bytes DESC
LIMIT 10
| src_geo | dst_geo | f_sum_both_bytes |
|---------|---------|------------------|
| HK | BR | 27568963549063 |
| GB | BR | 8594666838327 |
| NL | DE | 6044367035356 |
| HK | GB | 6004897386415 |
| HK | SG | 5305439621766 |
| HK | CO | 4893091337832 |
| NL | BR | 4330923877223 |
| HK | JP | 4086102823771 |
| HK | PL | 3833512917644 |
| HK | TR | 3501243783418 |
SELECT 10
Time: 0.675s
The first row, HK (Hong Kong) → BR (Brazil), is an interesting and somewhat unlikely pairing. Let's filter on it and see who was talking to whom from a network/ASN perspective:
SELECT src_as,
i_src_as_name,
dst_as,
i_dst_as_name,
Sum(both_bytes) AS f_sum_both_bytes
FROM big_backbone_router
WHERE src_geo = 'HK'
AND dst_geo = 'BR'
AND i_start_time > Now() - interval '1 week'
GROUP BY src_as,
i_src_as_name,
dst_as,
i_dst_as_name
ORDER BY f_sum_both_bytes DESC
LIMIT 10
| src_as | i_src_as_name | dst_as | i_dst_as_name | f_sum_both_bytes |
|--------|---------------|--------|---------------|------------------|
| 65001 | Global Transit Net | 65101 | Eyeball Net A | 11686340849432 |
| 65001 | Global Transit Net | 65201 | Eyeball Net B ASN1 | 5076468451751 |
| 65001 | Global Transit Net | 65301 | CDN Net A | 3337948976347 |
| 65001 | Global Transit Net | 65102 | Eyeball Net C | 1261908657743 |
| 65001 | Global Transit Net | 65103 | Eyeball Net D | 1234101190857 |
| 65001 | Global Transit Net | 65104 | Eyeball Net E | 1211922009485 |
| 65001 | Global Transit Net | 65105 | Eyeball Net F | 334959552542 |
| 65001 | Global Transit Net | 65202 | Eyeball Net B ASN2 | 247936925394 |
| 65001 | Global Transit Net | 65106 | Eyeball Net G | 229671528291 |
| 65001 | Global Transit Net | 65401 | Enterprise Net A | 209961848484 |
SELECT 10
Time: 0.458s
Substring and regex matching

If we wanted to drill down on the first row, we could additionally filter on the specific source/dest ASNs. But let's filter on the ASN names instead, so we can see how KDE supports SQL substring and regular expression matching on text columns. Substring/regex matching also works on other strings such as interface names and descriptions, AS_PATHs, and user-defined flow tags (a rough sketch of that follows the results below).
SELECT ipv4_src_addr AS f_cidr24_ipv4_src_addr,
ipv4_dst_addr AS f_cidr24_ipv4_dst_addr,
Sum(both_bytes) AS f_sum_both_bytes
FROM big_backbone_router
WHERE i_src_as_name ~ 'Peer|Transit'
AND i_dst_as_name LIKE '%Eyeball Net A%'
AND i_start_time > Now() - interval '1 week'
GROUP BY f_cidr24_ipv4_src_addr,
f_cidr24_ipv4_dst_addr
ORDER BY f_sum_both_bytes DESC
LIMIT 10
| f_cidr24_ipv4_src_addr | f_cidr24_ipv4_dst_addr | f_sum_both_bytes |
|------------------------|------------------------|------------------|
| 10.129.204.0 | 10.156.25.0 | 115419904954801 |
| 10.221.155.0 | 10.141.201.0 | 78651524382556 |
| 10.156.25.0 | 10.129.204.0 | 62329500664567 |
| 10.254.246.0 | 10.31.38.0 | 39162753399340 |
| 10.117.35.0 | 10.31.38.0 | 39073550458830 |
| 10.144.99.0 | 10.254.210.0 | 28582936121869 |
| 10.31.73.0 | 10.254.244.0 | 27632400104976 |
| 10.31.75.0 | 10.17.153.0 | 26265050083173 |
| 10.144.99.0 | 10.254.244.0 | 25763076333705 |
| 10.17.100.0 | 10.93.63.0 | 23713868194889 |
SELECT 10
Time: 0.980s
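As promised, the same substring and regex operators can be applied to other string dimensions. Here's a rough sketch of what filters on an interface description, an AS_PATH, and a flow tag might look like; the column names i_input_interface_description, i_dst_as_path, and i_flow_tags are illustrative assumptions rather than confirmed KDE schema names:

SELECT i_input_interface_description,
Sum(both_bytes) AS f_sum_both_bytes
FROM big_backbone_router
-- NOTE: the three i_* string columns below are hypothetical names, used only to illustrate the filter syntax
WHERE i_input_interface_description ~ 'transit|peering'
AND i_dst_as_path LIKE '% 65001 %'
AND i_flow_tags ~ 'VIDEO|CDN'
AND i_start_time > Now() - interval '1 day'
GROUP BY i_input_interface_description
ORDER BY f_sum_both_bytes DESC
LIMIT 10

The matching syntax is the same as in the query we just ran: ~ for POSIX regular expressions and LIKE for substring wildcards.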
As those results show, KDE can aggregate IP address columns by arbitrary subnet masks: in the Eyeball Net A query above we grouped the data by source → dest /24 subnet pairs (a sketch of a possible /16 roll-up follows the next set of results). KDE also natively understands IP addresses and CIDR notation in filters, so we can drill down on the top subnet pair and look at pairs of individual IPs:
SELECT ipv4_src_addr,
ipv4_dst_addr,
Sum(both_bytes) AS f_sum_both_bytes
FROM big_backbone_router
WHERE ipv4_src_addr LIKE '10.129.204.0/24'
AND ipv4_dst_addr LIKE '10.156.25.0/24'
AND i_start_time > Now() - interval '1 week'
GROUP BY ipv4_src_addr,
ipv4_dst_addr
ORDER BY f_sum_both_bytes DESC
LIMIT 10
| ipv4_src_addr | ipv4_dst_addr | f_sum_both_bytes |
|---------------|---------------|------------------|
| 10.129.204.41 | 10.156.25.5 | 101922511168965 |
| 10.129.204.34 | 10.156.25.4 | 16534277019052 |
| 10.129.204.69 | 10.156.25.79 | 12821801454 |
| 10.129.204.85 | 10.156.25.79 | 12408606234 |
| 10.129.204.116 | 10.156.25.79 | 11170668135 |
| 10.129.204.110 | 10.156.25.79 | 11078339112 |
| 10.129.204.76 | 10.156.25.79 | 10895308401 |
| 10.129.204.84 | 10.156.25.79 | 10497115055 |
| 10.129.204.115 | 10.156.25.79 | 10361345421 |
| 10.129.204.75 | 10.156.25.79 | 9923494659 |
SELECT 10
Time: 0.660s
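As an aside, here's the /16 sketch promised earlier. The /24 grouping above appears to be driven by the f_cidr24_ prefix on the output-column aliases; assuming that convention extends to other mask widths (an assumption on our part, since the queries here only demonstrate /24), a /16 roll-up of the same Eyeball Net A traffic might look like this:

SELECT ipv4_src_addr AS f_cidr16_ipv4_src_addr,
ipv4_dst_addr AS f_cidr16_ipv4_dst_addr,
Sum(both_bytes) AS f_sum_both_bytes
FROM big_backbone_router
-- f_cidr16_ alias prefix: assumed (by analogy with f_cidr24_) to request /16 aggregation
WHERE i_src_as_name ~ 'Peer|Transit'
AND i_dst_as_name LIKE '%Eyeball Net A%'
AND i_start_time > Now() - interval '1 week'
GROUP BY f_cidr16_ipv4_src_addr,
f_cidr16_ipv4_dst_addr
ORDER BY f_sum_both_bytes DESC
LIMIT 10

Each result row would then represent a source → dest /16 pair rather than a /24 pair.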
Time-series data

Summary tables are great, but often we want time-series data to build visualizations. In KDE, this is as simple as adding a time column to the SELECT and GROUP BY clauses. Let's take the top IP pair from the results above and get time-series data (bytes and packets) over the last week:
SELECT i_start_time,
Max(i_duration),
Sum(both_bytes) AS f_sum_both_bytes,
Sum(both_pkts) AS f_sum_both_pkts
FROM big_backbone_router
WHERE ipv4_src_addr LIKE '10.129.204.41'
AND ipv4_dst_addr LIKE '10.156.25.5'
AND i_start_time > Now() - interval '1 week'
GROUP BY i_start_time
ORDER BY i_start_time ASC
| i_start_time | max(i_duration) | f_sum_both_bytes | f_sum_both_pkts |
|--------------|-----------------|------------------|-----------------|
| 2016-03-10 23:00:00+00:00 | 3600 | 475231866603 | 558646000 |
| 2016-03-11 00:00:00+00:00 | 3600 | 141987820665 | 180911990 |
| 2016-03-11 01:00:00+00:00 | 3600 | 85119841990 | 130098569 |
| 2016-03-11 02:00:00+00:00 | 3600 | 102749092833 | 124245217 |
| 2016-03-11 03:00:00+00:00 | 3600 | 40349266424 | 74404852 |
| 2016-03-11 04:00:00+00:00 | 3600 | 47615668871 | 80084659 |
| 2016-03-11 05:00:00+00:00 | 3600 | 39601556357 | 71966274 |
| 2016-03-11 06:00:00+00:00 | 3600 | 44595721100 | 55644084 |
| 2016-03-11 07:00:00+00:00 | 3600 | 36984645947 | 73379683 |
| 2016-03-11 08:00:00+00:00 | 3600 | 57309415120 | 86561840 |
| 2016-03-11 09:00:00+00:00 | 3600 | 221576669330 | 219835996 |
SELECT 168
Time: 0.430s
It doesn't make sense to return 10,080 one-minute intervals for a week-long time-series query; you can't display that many data points in a visualization that fits in your average web browser. So KDE auto-selects the interval width to return an appropriate number of points/rows, based on the overall time range covered by the query. In this case, for a query covering a week, we get 168 one-hour intervals. For more information, see Time Rounding. KDE can also return a column showing the width (in seconds) of each interval so we (or the Kentik Detect frontend) can easily calculate rates like bits/sec or packets/sec (a rough sketch of that calculation follows the last query below).

Maintaining full granularity

None of the per-interval data above was pre-calculated; it was generated on the fly, at query time, from the individual records that we saw in the response to the first query in part 1 of this blog series. That means that when we spot an area of interest, it's easy to narrow the time range or apply filters to drill down. We can get full-resolution, one-minute granularity for any historical period within the data that KDE stores, such as this one-hour time range from 90 days ago:
SELECT i_start_time,
Max(i_duration) AS i_duration,
Sum(both_bytes) AS f_sum_both_bytes,
Sum(both_pkts) AS f_sum_both_pkts
FROM big_backbone_router
WHERE i_start_time > Now() - interval '2159 hours'
AND i_start_time < Now() - interval '2158 hours'
GROUP BY i_start_time
ORDER BY i_start_time ASC
| i_start_time | i_duration | f_sum_both_bytes | f_sum_both_pkts |
|--------------|------------|------------------|-----------------|
| 2015-12-15 22:29:00+00:00 | 60 | 179245157376 | 189853696 |
| 2015-12-15 22:30:00+00:00 | 60 | 181873404928 | 192246784 |
| 2015-12-15 22:31:00+00:00 | 60 | 183132584960 | 193918976 |
| 2015-12-15 22:32:00+00:00 | 60 | 180520254464 | 191270912 |
| 2015-12-15 22:33:00+00:00 | 60 | 179917988864 | 190438400 |
| 2015-12-15 22:34:00+00:00 | 60 | 175917901824 | 185893888 |
| 2015-12-15 22:35:00+00:00 | 60 | 174799783936 | 184879104 |
| 2015-12-15 22:36:00+00:00 | 60 | 175613580288 | 185396224 |
| 2015-12-15 22:37:00+00:00 | 60 | 173256493056 | 182279168 |
| 2015-12-15 22:38:00+00:00 | 60 | 170268498944 | 179223552 |
| 2015-12-15 22:39:00+00:00 | 60 | 169344593920 | 178819072 |
| 2015-12-15 22:40:00+00:00 | 60 | 169141132288 | 178192384 |
| 2015-12-15 22:41:00+00:00 | 60 | 169238467584 | 178177024 |
SELECT 60
Time: 1.293s
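Finally, here's the rate calculation mentioned earlier, sketched against the full-granularity query we just ran. Dividing each interval's byte and packet sums by the interval width in seconds (the i_duration column) gives average bits/sec and packets/sec. The arithmetic below is ordinary SQL that we're assuming KDE accepts in the SELECT list; it's not a documented KDE function:

SELECT i_start_time,
Max(i_duration) AS i_duration,
-- average rates over each interval: bytes * 8 bits / seconds, packets / seconds
Sum(both_bytes) * 8 / Max(i_duration) AS avg_bits_per_sec,
Sum(both_pkts) / Max(i_duration) AS avg_pkts_per_sec
FROM big_backbone_router
WHERE i_start_time > Now() - interval '2159 hours'
AND i_start_time < Now() - interval '2158 hours'
GROUP BY i_start_time
ORDER BY i_start_time ASC

For the first row above, for example, that works out to roughly 179,245,157,376 bytes × 8 / 60 seconds, or about 24 Gbits/sec.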
In summary, the Kentik Data Engine provides a highly performant, scalable, and flexible platform that enables Kentik Detect to analyze network traffic data without compromising on the granularity or specificity of the results. And that's just the tip of the iceberg: we're excited about the other types of datasets that KDE will consume, and about the additional applications we'll be able to build on top of it in the areas of network performance, security, and business intelligence. Want to try KDE with your own network data? Start a free trial of Kentik Detect and experience KDE's performance firsthand. Or maybe you'd like to help us take KDE to the next level? We're hiring.