Looking ahead to tomorrow’s economy, today’s savvy companies are transitioning into the world of digital business. In this post — the second of a three-part series — guest contributor Jim Metzler examines the key role that Big Data can play in that transformation. By revolutionizing how operations teams collect, store, access, and analyze network data, a Big Data approach to network management enables the agility that companies will need to adapt and thrive.
In part 2 of our tour of Kentik Data Engine, the distributed backend that powers Kentik Detect, we continue our look at some of the key features that enable extraordinarily fast response to ad hoc queries even over huge volumes of data. Querying KDE directly in SQL, we use actual query results to quantify the speed of KDE’s results while also showing the depth of the insights that Kentik Detect can provide.
Kentik Detect’s backend is Kentik Data Engine (KDE), a distributed datastore that’s architected to ingest IP flow records and related network data at backbone scale and to execute exceedingly fast ad-hoc queries over very large datasets, making it optimal for both real-time and historical analysis of network traffic. In this series, we take a tour of KDE, using standard Postgres CLI query syntax to explore and quantify a variety of performance and scale characteristics.
NetFlow and IPFIX use templates to extend the range of data types that can be represented in flow records. sFlow addresses some of the downsides of templating, but in so doing takes away the flexibility that templating allows. In this post we look at the pros and cons of sFlow, and consider what the characteristics might be of a solution can support templating without the shortcomings of current template-based protocols.
As the first widely accessible distributed-computing platform for large datasets, Hadoop is great for batch processing data. But when you need real-time answers to questions that can’t be fully defined in advance, the MapReduce architecture doesn’t scale. In this post we look at where Hadoop falls short, and we explore newer approaches to distributed computing that can deliver the scale and speed required for network analytics.
NetFlow and its variants like IPFIX and sFlow seem similar overall, but beneath the surface there are significant differences in the way the protocols are structured, how they operate, and the types of information they can provide. In this series we’ll look at the advantages and disadvantages of each, and see what clues we can uncover about where the future of flow protocols might lead.
The plummeting cost of storage and CPU allows us to apply distributed computing technology to network visibility, enabling long-term retention and fast ad hoc querying of metadata. In this post we look at what network metadata actually is and how its applications for everyday network operations — and its benefits for business — are distinct from the national security uses that make the news.
In part 2 of this series, we look at how Big Data in the cloud enables network visibility solutions to finally take full advantage of NetFlow and BGP. Without the constraints of legacy architectures, network data (flow, path, and geo) can be unified and queries covering billions of records can return results in seconds. Meanwhile the centrality of networks to nearly all operations makes state-of-the-art visibility essential for businesses to thrive.
Clear, comprehensive, and timely information is essential for effective network operations. For Internet-related traffic, there’s no better source of that information than NetFlow and BGP. In this series we’ll look at how we got from the first iterations of NetFlow and BGP to the fully realized network visibility systems that can be built around these protocols today.