Telemetry Now | Season 2 - Episode 20 | November 7, 2024

Navigating the Transition from Traditional Networking to Cloud Networking

In this episode, Philip is joined by Charlie O’Riordan to discuss the transition from traditional enterprise networking to cloud networking. They explore fundamental differences in network designs between AWS and Azure, the use of overlay technologies like GENEVE and VXLAN, and the integration of firewalls and stateful devices. Tune in to learn about some of the limitations of cloud-native tools, the advantages of third-party solutions, and practical advice on using Terraform for infrastructure automation.

Transcript

So many cloud engineers that I've talked to over recent years are actually former network engineers who had to transition into a cloud networking role and add that skill to their repertoire, or maybe they're even still traditional network engineers but had to add that cloud networking function to their current job role. But what does that even mean? Is cloud networking really that different from traditional networking?

In today's episode, my friend Charlie O'Riordan joins me to give us insights into how networking works in the cloud.

We go over some considerations we have to make that may be a little different than on prem networking.

We talk about the efficacy of cloud native versus third party tools, and we even get some practical guidance on using Terraform for infrastructure automation.

We're trying to bridge the gap between traditional networking practices and modern cloud methodologies here, so stick around. I'm Philip Gervasi, and this is Telemetry Now.

Charlie, it's really great to have you on today. I've had the pleasure of meeting you several times at various network and user groups, mostly in the Northeast where you and I both live. So it's really cool to have you on the podcast to talk about some cloud technology, some networking technology, all that kind of stuff. So thanks so much for joining me today.

Hey. My pleasure. Yeah. Thanks for having me, Phil.

Alright. So before we get started, why don't you give us a little bit of background? Now I know who you are, and I know what you've been working on lately. But, for our audience's sake, what's your professional experience? You could talk about personal stuff as well, of course. And especially, like, your own personal journey from being very, very focused on, I guess you can call it, more traditional networking to now being very, very much in cloud technology, which I know takes up the vast majority of your workday.

Right. Yeah.

Yeah. So, you know, Charlie O'Riordan. I've been doing this for a little over a decade now in the networking space, and I started off as a traditional, you know, route switch enterprise network engineer.

I did that for about five or six years until I moved into the data center connectivity and fabric space and got involved with things like VXLAN and EVPN, doing, like, large scale data center topologies.

And I was sort of poached by a cloud vendor. At this point, I had dabbled in things like ExpressRoute and Direct Connect for some of the basic Azure and AWS cloud connectivity concepts.

And I was poached by Microsoft and joined them as a consultant on their Azure advanced networking team, supporting, you know, much larger scale cloud deployments for a variety of customers in the commercial and federal space. And that sort of, you know, launched my pivot into cloud network engineering.

I went on to work for a company called Aviatrix, which was doing multicloud networking, and supported way more AWS. At this point, I think I've done a lot more AWS networking than Azure, and a little bit of GCP here and there. And I just basically dove in headfirst and hyper focused on networking concepts and hybrid cloud connectivity, you know, building those pathways between traditional enterprise and data center environments into modern cloud networking architectures and tying it all together for customers in, you know, all sorts of verticals. Right?

Would you say then that the modern environment really is that hybrid approach? Because you did call one traditional and then cloud modern, and then tying it all together. So would you say, moving forward, that is a modern network. Right?

You know, I think it is. I think what defines a modern network, you're really at the mercy of what your application developers and owners want to do. You know, the people provisioning or designing or building applications for your enterprise, ultimately, you're sort of at the mercy of where they wanna host those apps. And some environments are better for some apps than for others. Right? Just depending on the platform and what you're trying to run. And for a long time, that was on prem enterprise environments, and it still is for many applications and workloads.

But now, you know, we have a multitude of cloud providers that are offering, you know, specialized environments that are better for SQL or better for machine learning or just better for compute.

And your application or your development teams have the choice to say, you know, hey, we wanna be able to deploy these apps in a scalable and agile way into this environment. And, you know, depending on how you look at it, fortunately or unfortunately, network teams need to be able to support that connectivity to avoid getting that finger pointed back at them as to why, you know, app teams can't deploy things where they want to deploy them. Right?

Yeah. Yeah. So really, I mean, your background is not necessarily, hey, I'm a cloud engineer, or, hey, I'm a network engineer. You're a person who does focus very much on the network, but it's in the context of the modern network, which is part and parcel of the cloud and on premises and the data center and all these things. And so it almost sounds like, correct me if I'm wrong, I don't wanna put words in your mouth, to be a network engineer in 2024 and then into 2025, it really does mean having an understanding of, yeah, the foundational principles of networking, route switch, and how networking works in the cloud. That's just how it is.

Yeah. I think that's a really big part of it. And that's not true for all verticals, you know. Depending on what type of environment or what type of service you're supporting, you may very much be an on prem engineer. Right?

Yeah. Supporting data centers, you know, colocations, what have you. But for many of the enterprises I'm seeing at the medium and large scale, you know, nine times out of ten now, they have a cloud presence. And there is either a network engineering team that supports all of it or, depending on the size, a dedicated cloud team that may support the cloud components as well.

Yeah. Yeah. For sure. And in my experience, you know, only the very, very largest organizations are gonna have very hyper siloed teams like that, where it's just, I touch only the firewalls, and not only that, but only this vendor's firewall. Like, we have only the Cisco firewall team, and then we have the Palo Alto firewall team. Like, I've been in those kind of organizations that are that siloed.

And sometimes for, you know, regulatory reasons and sometimes because that's just how big they are, and they have specialists. You know?

But in this case, I really do believe that, I don't know what the percentage you just said is, but the vast majority of organizations that are operating the things that we use day to day, even at a very large scale, are not these gigantic organizations that are hyper siloed like that. I'm sure there's exceptions.

We'd like to call them small and medium sized businesses. Right? SMB. But I remember when I was, like, eyeball deep into Cisco certifications, you know, years and years ago.

And, you know, I'm reading a Cisco textbook, and then, like, in the textbook, it would say, like, a medium sized organization with approximately ten thousand employees. And I'm like, that's a medium organization? Like, that's an SMB? Yeah. Okay.

Fair enough. I have worked in those kind of organizations, like a health care facility here in the Northeast where I live, with twelve thousand, fifteen thousand employees. And, my goodness, there's a lot of overlap, and you don't have those silos. I think it's almost scary.

Maybe this is not your experience, but it's almost scary that some of these, what I consider larger organizations, mission critical, like health care, have, you know, a scarcity of good engineers on staff and folks wearing so many hats.

And so maybe that's just the way it is across the board, but that's been my experience for sure.

No. I think, you know, I think that's especially true in the government and public sector space as well. I'm finding, more often than not in those environments, there's one person who manages, you know, an extensive infrastructure and has been for the past decade or two. Right?

And sometimes that's not entirely visible to the end customer who's contracting out those services to manage their network, and that just comes down to, like, legacy support of an environment. Right? Yeah.

Yeah.

Yeah. Yeah. So you're currently, among other things, focused very much on multicloud connectivity, hybrid cloud connectivity. That's fine. And I do wanna talk about some of the fundamental differences between the major clouds like AWS and Azure. But prior to getting into that, what are some of the fundamental differences that you'd like to call out between networking on prem, traditional networking, and networking in the cloud? I mean, I assume a lot of the principles, the networking constructs, they're the same.

You know, you'd think. At the end of the day, you know, taking Azure and AWS, which are the areas I operate in most often. Right?

These cloud providers, it's the same network you and I are used to under the hood.

So the major cloud providers that I deal with, like AWS and Azure, under the hood, they're a regular network, but regular in the sense of what we're seeing in the large scale data center space. So these are multistage Clos, typically, which, for anyone who's unfamiliar, is like a two or three or four stage spine leaf topology, pretty much, where you can have spine and super spine layers, and you scale up from there. And these are highly efficient IP fabrics with, you know, massive backbones and global connectivity.

And on top of that, they're running a tunneling mechanism. And in Azure's case, that's VXLAN, which is a MAC-in-IP/UDP encapsulation method. Man, that's a mouthful.

And on the AWS side, it's GENEVE, which is similar. It's also a UDP based encapsulation method. And the nice thing about tunnels, you know, if you've ever worked with them before, is that tunnels are generally underlay agnostic. So what you end up with is a highly efficient IP backbone underneath, and then a tunneling protocol that provides a data plane and then generally a control plane overlay on top of that. Now the control plane part of that for AWS and Azure is this mess of extremely complex and diverse stacks of software that are operating this infrastructure.

And they're controlling, you know, how these different tunnel endpoints talk to each other, and they're segmenting traffic based on these different tags and IDs. And they have, you know, different mechanisms of doing that. But at the end of the day, it's a giant data center underneath, just like the ones you might deploy at a small or medium size if you were doing, like, a spine leaf, you know, architecture for your environment on prem. Right?

Now the thing to understand as a network engineer coming into a multicloud environment, or even just a single cloud, and needing to build those initial connections is it's mainly done through the UI.

Working in Azure or working in AWS, it's all software. It's all click ops.

So even though it's a traditional network underneath, AWS and Azure have abstracted those concepts of networking so far away from what you're used to as maybe, you know, a Cisco or Arista network engineer on prem that it's going to look nothing like traditional networking.

They have concepts of routers in the cloud. You know, in AWS, that's a transit gateway. And in Azure, that could be an Azure Route Server, or it could be a vWAN hub, right, depending on your use case.

But the way it's implemented is wildly abstracted and very software based. So, generally, working with the cloud coming from exclusively on prem network engineering is incredibly overwhelming.

But you're still just providing point to point connectivity between devices on prem and in the cloud. And within the cloud, the layer two and layer three, you know, network forwarding operations are mostly the same, albeit with some key differences in Azure and AWS. But the idea that devices that are in the same subnet can talk to each other, and devices across subnets need to route to some gateway somewhere, those constructs are relatively the same in clouds, with some small caveats here and there.
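As a rough illustration of that same-subnet versus cross-subnet idea, here is a minimal Python sketch using only the standard library. The function name, the addresses, and the gateway IP are made up for the example, and it deliberately ignores the cloud-specific caveats mentioned above.

```python
import ipaddress


def next_hop(src_ip, dst_ip, subnet_cidr, gateway_ip):
    """Decide where a host sends a packet: straight to the peer or to its gateway.

    All names and addresses here are hypothetical, chosen for illustration.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)

    if src not in subnet:
        raise ValueError(f"{src_ip} is not inside {subnet_cidr}")

    # Same subnet: the two hosts can talk to each other directly.
    if dst in subnet:
        return "direct"

    # Different subnet: hand the packet to the gateway and let it route.
    return gateway_ip


print(next_hop("10.0.1.10", "10.0.1.20", "10.0.1.0/24", "10.0.1.1"))
print(next_hop("10.0.1.10", "10.0.2.20", "10.0.1.0/24", "10.0.1.1"))
```

The first call stays inside 10.0.1.0/24 and the second one crosses subnets, so it is handed to the gateway.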

Right. And then, of course, there's the understanding that we consume these resources differently than we do on prem. Though certainly in larger organizations that are cost conscious, they're going to look at cost per port and what are the least expensive transit paths and things like that. But when you're talking about cloud networking, and you mentioned transit gateways, you know, there is a formula on the AWS site about how, you know, they calculate my bill, partly. You know, obviously, there's other resources, but the networking piece is based on the egress that traverses that transit gateway, among other things as well. So there's traffic engineering that we're used to. Right?

Yeah. From the traditional route and switch world. And then there's traffic engineering in the cloud that has a different dynamic, a different component to it. Trying to say, alright, what is my cheapest path? You know, you know what? Now that I say this out loud, I guess in the service provider space, we also have that concept. Right?

We are looking for the cheaper transit paths, but that's certainly part of it.

So understanding that AWS bill becomes a really core component of actually designing the infrastructure.

Similar to when you're spec'ing out, like, a new DC or an office space and you're tasked with putting a BOM together and researching the right equipment, making sure it's cost effective and it meets the needs of the business, you're going through that same process in the cloud, but it's done really every time you provision a resource. And it's tough when you're being asked to build some basic cloud connectivity with, you know, really no direction as to how much it's going to scale and how quickly.
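To make the "cost it every time you provision" point concrete, here is a back-of-the-envelope Python sketch. It assumes a pricing shape with an hourly charge per attachment plus a per-gigabyte data processing charge. The function name and both rates are placeholders for illustration, not current AWS prices, so check the provider's pricing page for real numbers.

```python
def transit_gateway_monthly_cost(
    attachments,
    gb_processed,
    hourly_rate_per_attachment,
    rate_per_gb,
    hours_in_month=730,
):
    """Rough monthly estimate for a cloud transit hub.

    Assumed model (not an official formula): every attachment is billed per
    hour, and every gigabyte that crosses the hub is billed once.
    """
    attachment_cost = attachments * hourly_rate_per_attachment * hours_in_month
    data_cost = gb_processed * rate_per_gb
    return attachment_cost + data_cost


# Placeholder rates, purely illustrative.
small = transit_gateway_monthly_cost(3, 1_000, 0.05, 0.02)
grown = transit_gateway_monthly_cost(30, 500_000, 0.05, 0.02)
print(f"3 attachments, 1 TB/month:    ${small:,.2f}")
print(f"30 attachments, 500 TB/month: ${grown:,.2f}")
```

Running the same function against a "day one" footprint and a "two years later" footprint is a cheap way to see how a design that looks inexpensive at first scales on the bill.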

And normally, that's the approach I see a lot of engineering teams having to take. They're being asked to build some basic connectivity into the cloud because team A, B, or C wants to deploy an app and run it there, and it needs to talk to some database server on prem, or they need to manage it privately, whatever the use case.

And that's the immediate use case. And because, you know, deploying in a cloud environment is so convenient and quick to do, we can kinda miss that step of building out an architecture that's going to be scalable, because it's just so easy to consume. It's so quick to deploy. You don't have to go through that whole process of a bill of materials, of researching your equipment, of building out a network design. You could just click a few buttons, and you could have a functioning network infrastructure, isolated and available for you, the customer, in just a matter of minutes. Right?

What are then the, not the differences, but, like, the use cases for someone that's preferring AWS over Azure or vice versa, or Google for that matter. I mean, I know most of your experience is in AWS and Azure, but why would somebody choose one cloud over another? I wanna say political reasons, I guess. Right?

No. No. That's fair. Really, what I'm seeing is mostly application teams driving that, and it's been changing year by year based on the services available in each cloud.

And most of those go beyond what anyone would have to support, you know, as a network engineer. But what I'm seeing, at least, is application teams coming to me or my team and saying, hey, we want to deploy in AWS because they are offering this service. They are offering this private, you know, SaaS solution or PaaS solution in AWS that meets our needs. And sometimes, you know, that's some OpenAI based workload in Azure, or some SQL based workload that operates particularly efficiently or is more cost effective in Azure. And, again, that's one of the huge benefits of having such a widely available cloud platform: you can look, at a moment's glance, and just make a decision about which part of the world and which cloud provider is going to be cheapest to run this in.

And you could make a decision that way. And a lot of the time, that's just coming from the team who wants to deploy the application. So, you know, the fun of it is, our job as engineers, we just kinda gotta shut up and support it. Right?

Mhmm. Yeah. Sometimes. Or most of the time. But that's from a cost perspective.

Right? And then, you know, looking at it from various geos and things like that. But from a networking perspective, are there factors to consider as far as performance? Obviously, with geos, you can start, you know, factoring in latency because you have to go eighteen hundred miles to get to whatever region or not.

But isn't that part of it? I mean, again, the difference between AWS and Azure, how does network performance factor into it when you're designing, you know, connectivity to the cloud or perhaps hybrid or multicloud connectivity among those clouds?

Interesting question. I don't know that there's really... I haven't seen a noticeable performance difference in the way network traffic flows between clouds, but I really don't have any telemetry data to back that up. I'm not quite sure.

Okay. Yeah.

Yeah. There's some substantial integration differences in how you do things like, you know, centralized security, for example, between AWS and Azure. But as far as actual network performance, it's really per region. And a lot of the time, you know, customers are deploying in colocations that, very commonly, are in the same meet-me point as, like, an Azure MSEE. Right? Microsoft Enterprise Edge, typically a Juniper device that sits in a cage at Equinix, in a big colocation. And customers can request a cross connect into that cage, and that's how you provision the ExpressRoute. Right?

Generally, those providers like AWS and Azure, they're all arriving in the same colocation, and you may very well have the same latency to each provider, just depending on which circuit you use. Right? So sometimes it's all there, right in the same place.

But the underlying constructs in both AWS and Azure, focusing on those two, are similar enough.

Granted they're different, and there's gonna be differences in perhaps the way that they abstract the underlying infrastructure, but certainly they're gonna perform in similar ways and...

From what I've seen, yes.

...at a similar service. Yeah. Yeah. I mean, the thing is though that there are differences, I know from what I do for a living. There are differences in the information that you can get. Like, I know what an NSG flow log looks like versus a VPC flow log, and they're different things.

Oh, yeah. And the exposed APIs are, like, completely different between providers. Exactly.

So there are considerations then beyond just cost. There are considerations beyond just pure networking. What is your, you know, your team already familiar with right now? And what kind of data do you wanna ingest from public clouds? And, frankly, a lot of the time, it's lacking anyway, which kinda leads me into my next question about third party tools, third party approaches, not just to cloud networking, but to cloud management, to managing cost, managing, you know, visibility, things like that. I mean, there's a few different companies that you mentioned here in the show notes.

So it kind of begs the question, what's inherently lacking in the cloud native tools and services that they're offering?

Yeah. I think more holdback.

No. Yeah. That's a big topic. I think the big one really that I hear every day is visibility.

Okay. You don't own the underlying network infrastructure anymore. You can't span a port. There are some abstracted ways of doing that in Azure, for instance, where you can kinda mirror data off a particular NIC, but it's not what you're used to coming from an on prem environment.

Right? Network visibility is the big thing. And mainly, as a cloud network architecture grows and you get to the point where you're saying, hey, we need to support this long term, and we need to build this to scale.

And you're looking at your Azure environment, you're looking at your AWS environment, generally, the model you go with is hub and spoke. That's kind of it. You're tying these virtual cloud networks together. Every vendor has a different definition of what that is, but, fundamentally, it's this bubble of address space that you're tying to another bubble of address space, similar to a hub and spoke network in, like, an SD-WAN deployment.

And what you don't have inherently in the cloud is any device that's going to provide you, you know, per packet analytics or inspection or enforcement.

You do have some controls at the network interface level, the virtual network interface level, I should say, in most cloud providers, where you can provide individual isolated security policies.

But deploying those at scale is complicated. Managing those to be consistent across environments is complicated. Across cloud providers is complicated. So normally, I think the first direction folks go is trying to get some intelligent, inspection capable device in the cloud.

Generally, the idea is let's put a firewall VM somewhere in the mix here and make sure our traffic's going through a firewall, you know, either between different cloud environments or between different VNets or VPCs within the cloud, within Azure or AWS. Let's just redirect that traffic through a centralized appliance so we can get all those, you know, nice visibility bells and whistles that we're used to and do that packet inspection, what have you. Right?

And Azure and AWS both have different ways of doing that.

Generally, it's pretty standard, like, route redirection. You have traffic from, you know, one virtual cloud network that needs to reach another. And instead of going to the hub and the hub saying, okay, go to this other attachment here, there's some route in the mix that says, actually, go to the firewall first. And the firewall inspects it, punts it back out the interface if it's approved, and it forwards on to its destination. And you may have that firewall behind the AWS or Azure equivalent of a load balancer.
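The route redirection described here can be sketched as a most-specific-prefix lookup in Python. This is a simplified model under stated assumptions: the prefixes, the table names, and the next-hop labels ("local", "hub", "firewall") are invented for the example, and real cloud route tables have extra precedence rules that are not modeled.

```python
import ipaddress


def lookup(route_table, dst_ip):
    """Return the next hop of the most specific route matching dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    matches = []
    for prefix, hop in route_table:
        network = ipaddress.ip_network(prefix)
        if dst in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # Longest prefix wins, which is how the more specific override takes effect.
    return max(matches)[1]


# Hypothetical spoke A (10.1.0.0/16) that reaches everything else via the hub.
SPOKE_A_DEFAULT = [
    ("10.1.0.0/16", "local"),
    ("10.0.0.0/8", "hub"),
]

# Same table plus one override: traffic to spoke B goes to the firewall first.
SPOKE_A_INSPECTED = SPOKE_A_DEFAULT + [
    ("10.2.0.0/16", "firewall"),
]

print(lookup(SPOKE_A_DEFAULT, "10.2.5.9"))    # hub
print(lookup(SPOKE_A_INSPECTED, "10.2.5.9"))  # firewall
print(lookup(SPOKE_A_INSPECTED, "10.1.0.5"))  # local
```

The only change between the two tables is the extra, more specific route, which is the "actually, go to the firewall first" entry from the discussion.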

And you can have them in a scale set, and you can scale your firewall instances out horizontally. So great. You know, we can support a ton of traffic, and we can scale that out, you know, infinitely, right, to some extent. There's some limitation somewhere, probably eight nodes or something.

And that's great. That provides some central inspection. AWS actually, as an aside, does a pretty cool thing with their firewall integration behind a load balancer. I mentioned earlier that each cloud platform has kind of a different idea of providing a tunneling mechanism as a data plane overlay.

In Azure, that's VXLAN; in AWS, that's GENEVE. The VXLAN header is kinda dumb. It's pretty locked in. There's some reserved bits, but it really just specifies a VNI, which is a VXLAN network identifier. GENEVE is kinda cool. GENEVE has TLVs, which, if you're not familiar, is a type length value.

And it's a great way where, between different platforms and vendors, you can kind of specify whatever you want. You can make packets do funny things that aren't normal. You know? They can ignore routing and make these per packet decisions, right, based on that GENEVE header being read by an appliance on the other side and being interpreted in some other way.
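The contrast between the fixed VXLAN header and GENEVE's TLV options can be sketched in Python with the standard `struct` module. The field layouts follow the published header formats (RFC 7348 for VXLAN, RFC 8926 for GENEVE) as best understood here. The function names, the option class 0xFFFF, the option type, and the 4-byte payload are hypothetical values picked for illustration, not what any cloud provider actually sends.

```python
import struct


def vxlan_header(vni):
    """Fixed 8-byte VXLAN header: flags with the I bit set, then a 24-bit VNI."""
    return struct.pack("!II", 0x08 << 24, vni << 8)


def geneve_option(opt_class, opt_type, data):
    """One GENEVE TLV: 16-bit class, 8-bit type, 5-bit length in 4-byte words."""
    if len(data) % 4 or len(data) > 124:
        raise ValueError("option data must be a multiple of 4 bytes, max 124")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def geneve_header(vni, options=b"", protocol=0x6558):
    """8-byte GENEVE base header followed by variable-length options."""
    opt_len_words = len(options) // 4  # the base header counts options in words
    if opt_len_words > 63:
        raise ValueError("too many option bytes for the 6-bit length field")
    first_byte = (0 << 6) | opt_len_words  # version 0 in the top two bits
    base = struct.pack("!BBH", first_byte, 0, protocol)
    return base + struct.pack("!I", vni << 8) + options


def parse_geneve_options(packet):
    """Walk the TLVs the way a receiving appliance might."""
    total = (packet[0] & 0x3F) * 4
    blob, offset, found = packet[8:8 + total], 0, []
    while offset < len(blob):
        opt_class, opt_type, length = struct.unpack_from("!HBB", blob, offset)
        size = (length & 0x1F) * 4
        found.append((opt_class, opt_type, blob[offset + 4:offset + 4 + size]))
        offset += 4 + size
    return found


# Hypothetical metadata option carrying a 4-byte value.
option = geneve_option(0xFFFF, 0x01, (42).to_bytes(4, "big"))
print(len(vxlan_header(5000)), len(geneve_header(5000, option)))
print(parse_geneve_options(geneve_header(5000, option)))
```

The VXLAN header has nowhere to put extra metadata, while the GENEVE header simply grows by one TLV, which is the flexibility being described.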

And you can pack that right into the tunnel encapsulation, that GENEVE header. AWS uses GENEVE. So what happens when you have a centralized firewall in AWS and you put it behind a gateway load balancer is traffic will arrive, you know, at your gateway load balancer, potentially from an endpoint, and it'll get redirected from the front end of the load balancer to the back end, which is your firewall, maybe several firewalls, say two in this case, and it gets load shared to one particular firewall. Maybe it's pinned to an availability zone. This concept of an availability zone in Azure and AWS is really like an isolated data center.

So it's the idea that if you have, you know, two virtual devices in the same availability zone, and there's, say, a regional outage for that physical data center supporting that underlying infrastructure, because, again, there's a real IP network under this, both those things in that same AZ, availability zone, could be impacted. So there's this idea that you put, you know, potentially one firewall in each availability zone.
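The "load shared to one particular firewall" step matters for stateful devices, because both directions of a flow need to land on the same instance. Here is a toy Python sketch of that idea using a direction-independent hash of the 5-tuple. It is only an illustration: the function name and firewall names are made up, and this is not the hashing any cloud load balancer actually uses.

```python
import hashlib


def pick_firewall(src_ip, src_port, dst_ip, dst_port, protocol, firewalls):
    """Map a flow to one firewall so that both directions pick the same one."""
    # Sort the two endpoints so A->B and B->A produce the same key.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{protocol}|{low[0]}:{low[1]}|{high[0]}:{high[1]}"
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(firewalls)
    return firewalls[index]


# One hypothetical firewall per availability zone.
FIREWALLS = ["fw-az1", "fw-az2"]

forward = pick_firewall("10.1.0.5", 44321, "10.2.0.9", 443, "tcp", FIREWALLS)
reverse = pick_firewall("10.2.0.9", 443, "10.1.0.5", 44321, "tcp", FIREWALLS)
print(forward, reverse, forward == reverse)
```

Because the key is built from the sorted endpoints, the request and the reply hash to the same firewall, so its connection state table sees the whole conversation.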

When the encapsulation happens, the GENEVE header is put on the packet at the gateway load balancer, and it gets sent to the firewall. The firewall, say a Palo Alto instance in AWS, is smart enough to read and interpret that GENEVE header, and it will make sure that that packet goes back out the same interface it was received on. So it can be returned right to the same load balancer, or potentially the same load balancer endpoint if you're doing something funky with that.

The way I came across this was trying to tell a firewall in AWS, hey, behave differently. I wanna receive traffic from my load balancer, and I wanna egress it out a different interface for a different use case.

And you can put as many static routes as you want on that box, but it's not gonna listen to you because it's interpreting a header.

It's interpreting a GENEVE header that it's pulling off of the IP packet underneath, and you don't get a say in what happens. Right.

You know? Interesting.

It's happening before routing decisions. So that's an example of some of the funkiness that happens behind the scenes in a cloud environment. And the result is you can't always do the same thing with your appliance that you would expect to, based on your experience on prem redirecting and routing traffic, you know, to different DMZs or what have you. So it may look similar, but there's some interesting characteristics that the cloud providers have put on that to make their software work in their environment with their networking constructs. Right? Right. Right.

Just yeah.

So you mentioned, as far as some inherent limitations, you mentioned visibility. We didn't really get into that. We can or not.

It's up to you. And then, of course, you mentioned just now the, not the gotchas, I don't wanna call them anomalies because they're very much part of the underlying infrastructure, but these differences from traditional networking, things that you would expect to happen which don't because of the underlying Azure or AWS networking. Yeah. What you just mentioned was AWS.

But what are some other areas that you would identify as inherent limitations? You know, I can see our outline. I can see a couple of them. And, you know, the first one I wanna mention, I'm asking you, but I'm already answering for you as well, is how you handle security in the public cloud.

Right. So yeah. Yeah. Perfect segue. And and thanks for reminding me because I'm forgetting where I am in reality right now.

So on that topic of what people are doing with centralized inspection and realizing they need a firewall and they want to see their traffic. That's kind of the on prem network engineer brain approach to solving this problem in the cloud. That's my approach. And it's mine as well, and a lot of my customers are still asking for that, and that's a perfectly viable solution. And, you know, so anybody listening, don't listen to the vendors that are telling you you can't or shouldn't do that. It's a good scalable solution, but there are always other approaches you can take, and there are some interesting things that other providers are offering that I think are really worth looking at.

When really the big rush to cloud happened, probably, what, like, six or seven years ago now, where cloud was kind of a hot shiny thing like AI is today, everyone, you know, at the leadership level was really being pressured to, let's do something with this for our infrastructure, for our enterprise, for our product.

A lot of vendors joined in the mix and really took a swing at providing, like, consistent and scalable network infrastructure across the cloud, because the concepts were so wildly different between vendors and cloud platforms.

So you had some you had some weird ones. Like, I I remember, Arista had this idea where they were gonna do EVPN in the cloud.

ACI has some some similar product, Cisco's ACI solution where they're like, yeah. You can go ahead and deploy these virtual appliances in the cloud, and we'll manage them through the same ACI platform, and you can encrypt and tunnel your traffic up. And, again, these were all kind of like on prem approaches to ACI, and and you'll see some other, on prem approaches to, to to cloud networking. And you'll see some, some other vendors, sort of taking a different approach. Like, Aviatrix was a a company that I worked for for a couple years, and they took a they took an approach to cloud networking where they wanted to provide, provide a consistent framework that you could apply to multiple cloud platforms.

So the idea was you spin up these virtual machines in every single virtual network, or virtual private cloud, or virtual cloud network, depending on the provider: Google, AWS, or Azure.

And they build IPsec tunnels with each other, and then because traffic has to pass through multiple of these VMs and get encrypted and decapsulated, you have all that visibility back. Anytime any of your packets want to go somewhere, they go from the host in the VNet, say in Azure, for example, directly to an appliance. And then in that appliance, the traffic gets tunneled across the Azure underlay.

So your cloud overlay kind of becomes an underlay to another overlay, and so on and so forth. Right? You're further abstracting the cloud madness, which I think is a good approach if it's cost effective, because you're paying for these VMs and you're paying for more licenses. Another vendor that does this well is Alkira. I've seen them in play a few times, and they have a very similar idea. The benefit to an approach like that is that, by doing this and getting all this visibility back, you can do kind of two things. One, you can see all your traffic, which is great. Good for telemetry, good for inspection. And two, because you already have the traffic basically next-hopping to a VM anytime it needs to go anywhere, as close to the source as possible, you can apply inspection and policy directly at that first hop versus waiting for it to get to some centralized appliance in your hub VNet.

And I think that's great because you can distribute your approach to firewalling. Right? No longer are you having to pigeonhole everything into some centralized appliance and do this hub-and-spoke model. The cloud as a platform is wildly distributed. So the idea is this supports that model of something that's widely distributed.

And I think that's a good approach. But, again, you have to weigh the cost and the licensing fees and everything else that comes along with locking yourself into another vendor.

But you'll see there are other players in the cloud space that are taking a swing at this to try to simplify that. And I think it's a good thing.

Okay. Alright. So then, you mentioned visibility here, and that's not necessarily the primary concern of these security-focused vendors.

But a side effect of having all these packets traverse your devices is that you have visibility into traffic.

You have all the data. Yeah. All the data. Exactly. Close to the host versus waiting for it to traverse the entire cloud backbone before you can do some inspection or collect some data on it. Right?

Yeah. And that's one of the things that we're focused on as a company at Kentik: the inherent weaknesses in the cloud-native visibility tools. There are certainly some strengths, and a lot of folks are doing just fine, and I get it.

But as soon as you go into a hybrid approach, into a multicloud environment, which a lot of people do without even realizing it or planning for it, because all of a sudden there's a merger or there's some new workload you need to spin up over there for some bespoke service that's running, like you said, then lo and behold, you have IPsec tunnels everywhere. You have a multicloud environment that is much more difficult to get a grasp of, especially when you're trying to track application activity and then match that with your AWS bill and ask why your cost is where it is. Yep.

Backbilling is huge. I mean, we did that for years on prem. Right? And I feel like a lot of customers are finally getting to that place with a fabric where they can isolate out their network infrastructure per customer, do that internal backbilling, and pull back that revenue for their networking teams.

And now you're faced with that same problem all over again with Azure and AWS, where you're sometimes charged just flat egress. Right? And that's for all your traffic. That's not for a particular customer in a particular VNet.

So that is something I'm seeing as well with a lot of these kind of holistic multicloud platforms: you'll hear backbilling support in a lot of their marketing. And it's true. Having that visibility is huge. And when you're concerned about cost, being able to bill some of that back to other departments does help offset that and justify the purchase of that service for the networking team, and that is a win.

Right?

Yeah. Absolutely. And we do that by adding custom tags in an enrichment process. Basically just adding to the flow logs that we ingest and any other type of telemetry that we ingest that's relevant.

And then being able to parse out different departments or business units or whatever. And usually the use case is chargeback, like you said. It really is just the propagation and the analysis after the fact, and being able to query on those custom tags.
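As a rough illustration of the tagging-for-chargeback idea described here, the following is a minimal Python sketch. It assumes flow records are simple dictionaries with `src`, `dst`, and `bytes` keys; the `TAG_RULES` subnet-to-department mapping, the sample flows, and every function name are hypothetical and are not taken from any particular product.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical mapping of address space to a department tag.
TAG_RULES = [
    (ip_network("10.10.0.0/16"), "finance"),
    (ip_network("10.20.0.0/16"), "engineering"),
]


def tag_for(src_ip):
    """Return the department tag for a source IP, or 'untagged'."""
    addr = ip_address(src_ip)
    for network, tag in TAG_RULES:
        if addr in network:
            return tag
    return "untagged"


def enrich(flows):
    """Return copies of the flow records with a 'department' tag added."""
    return [{**flow, "department": tag_for(flow["src"])} for flow in flows]


def bytes_by_department(flows):
    """Sum the byte counts of enriched flow records per department tag."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow["department"]] += flow["bytes"]
    return dict(totals)


# Made-up sample flow records for illustration only.
SAMPLE_FLOWS = [
    {"src": "10.10.1.5", "dst": "203.0.113.9", "bytes": 1200},
    {"src": "10.20.4.7", "dst": "203.0.113.9", "bytes": 800},
    {"src": "10.10.9.9", "dst": "198.51.100.2", "bytes": 300},
    {"src": "192.168.1.1", "dst": "198.51.100.2", "bytes": 50},
]

if __name__ == "__main__":
    print(bytes_by_department(enrich(SAMPLE_FLOWS)))
```

Once every record carries a tag, the per-department totals can be multiplied by whatever egress rate applies to produce the chargeback figure.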

Yeah. So Absolutely.

Let's say here we are. I've been a network engineer for ten years, fifteen years, and whether I like it or not, I'm also managing some cloud environment. Now, I'm not a cloud engineer. I have my foot in both camps, which is, again, like we started out our podcast today, probably a very common scenario for a lot of engineers.

What are some tools, or is there one particular tool that you wanna talk about, that will allow me to programmatically manage cloud environments, and that will also allow me to manage my on-prem environments in the same tool and get a handle on all that without having to manage multiple?

Yeah. I'll say this coming from the perspective of someone who started his journey in network engineering because I tried software development and hated it, which I think is a really common theme with a lot of network engineers I've talked to. Right? It's like, oh, man, I'm doing this JavaScript class, and I'm just having the worst time ever, but routers seem cool. Right?

I really never wanted to do any programming. And with the advent of cloud, and folks needing to find a way to automate that consistently, I didn't really have a choice.

So, eventually, I got pulled into the Terraform world, and this was around the time I was with Aviatrix and Microsoft. Okay.

Having to support these larger topologies. Right? A lot of the customers were already managing their environments with Terraform. And what Terraform is, it's kinda like the easy button for automation, I think.

And you could take it as far as you wanna go. So I've seen Terraform deployments that are plain-text simple, and I've seen Terraform deployments that are incredibly complex. And both of them accomplish the same exact task with varying degrees of efficiency, which is great for someone like me who's trying to come in and doesn't want to learn. I don't wanna learn.

I don't wanna be a Python expert, and I never did a lot of object-oriented programming, and it's just not something I particularly enjoy.

So if you're like me, I think Terraform is a great place to start. The reason is it's basically a plain-text language where you're defining the resources you wanna provision. And all these big vendors like AWS and Azure have these Terraform providers that are publicly available in the Terraform Registry, where they write up their own Terraform code. What it is underneath is really just Golang looping through a bunch of API calls. Golang's kinda like a Python alternative, as I understand it as, again, a non-software engineer. Mhmm.

And what it's doing is just looping through a bunch of API calls in an intelligent way. So by the time you're writing Terraform code and you say, you know, resource, VPC, you're deploying something in AWS. It's as simple as just saying that you want a VPC, defining the CIDR for that VPC, the address space for it, maybe some tags, a couple of other required characteristics, and then, you know, seven other optional fields that you can define if you want them. You click go, and as long as you have access to the API and you have the keys to manage your environment, that resource gets deployed.

And you can list out every single resource in a one-by-one fashion and deploy your entire cloud infrastructure that way. You can also get very detailed with your Terraform deployment. You can start creating modules. You can start writing looping mechanisms that will deploy multiple instances of the same resource over and over again with varying IDs and inputs, and you can start creating variables for all these objects.
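To make the "same resource, varying inputs" idea concrete, here is a minimal sketch. In Terraform itself this is normally written in HCL with `for_each` or `count`; because Terraform also accepts configuration in JSON syntax (`.tf.json` files), the sketch below uses Python to generate an equivalent set of `aws_vpc` resources. The VPC names, CIDR blocks, tag values, and the `build_config` helper are all hypothetical.

```python
import json

# Hypothetical inputs: one entry per VPC to create.
VPCS = {
    "app": "10.1.0.0/16",
    "data": "10.2.0.0/16",
    "shared": "10.3.0.0/16",
}


def build_config(vpcs, environment):
    """Build a Terraform JSON-syntax document with one aws_vpc per entry."""
    resources = {
        name: {
            "cidr_block": cidr,
            "tags": {"Name": f"{environment}-{name}", "Environment": environment},
        }
        for name, cidr in vpcs.items()
    }
    return {"resource": {"aws_vpc": resources}}


if __name__ == "__main__":
    # Redirect this output to a file such as vpcs.tf.json to use it with Terraform.
    print(json.dumps(build_config(VPCS, "lab"), indent=2))
```

Adding a fourth VPC then means adding one line to the input mapping rather than copying a whole resource definition.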

And you could just keep running with it that way. You could do ternary expressions. Right? You could feed in Python.

There are ways to attach other bits of code into your Terraform if you wanna do something a little more complex, if you need to maybe loop through some API calls that aren't available in the provider registry.

Or, like I said, you could just flat out list, in plain-text English, a bunch of resources that you want deployed. And what Terraform gives you is this concept I was introduced to called idempotency.

It's the idea that you can deploy your infrastructure, and if you attempt to deploy it again and there haven't been any changes in the code base, nothing will change. So it won't try to redeploy your infrastructure.

Yeah.

Yeah, in the sense of network engineering, it's stateful versus stateless. Right? You're aware of an active session. That's kind of the concept, but it's the programmatic version of that.

Mhmm. Right. So you can deploy your infra. And if you haven't made any changes and you click the deploy button again, unless someone went in and made changes outside of the code base, it will not change anything.

Nothing will get redeployed. There will be zero changes, and it allows you to visualize those changes before you push them out too. So if you make a couple of changes, maybe you change some tags on a VPC, maybe you add an additional peering between VPCs, something like that, you can do a terraform plan, and it will spit out exactly, hey, based on how your infrastructure is currently deployed, this is Terraform talking to you:

This is what I think is going to change, or this is what I know is going to change. Right? And you can review that, make sure it makes sense, and then you can apply that configuration. It will make exactly those changes and nothing else.
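The plan-then-apply behavior described here can be modeled in a few lines. The following is a toy Python sketch of the idea only, not how Terraform is implemented: state is a plain dictionary, and the `plan` and `apply` functions and the `DESIRED` resources are hypothetical names chosen for illustration.

```python
def plan(desired, current):
    """Return the list of changes needed to turn `current` into `desired`."""
    changes = []
    for name, attrs in desired.items():
        if name not in current:
            changes.append(("create", name))
        elif current[name] != attrs:
            changes.append(("update", name))
    for name in current:
        if name not in desired:
            changes.append(("delete", name))
    return changes


def apply(desired, current):
    """Carry out exactly the planned changes and return the new state."""
    new_state = dict(current)
    for action, name in plan(desired, current):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state


# Hypothetical desired configuration.
DESIRED = {
    "vpc_app": {"cidr": "10.1.0.0/16", "tags": {"team": "apps"}},
    "vpc_data": {"cidr": "10.2.0.0/16", "tags": {"team": "data"}},
}

if __name__ == "__main__":
    state = {}
    print("first plan:", plan(DESIRED, state))
    state = apply(DESIRED, state)
    # With no changes to DESIRED, a second plan has nothing left to do.
    print("second plan:", plan(DESIRED, state))
```

Running `apply` a second time against an unchanged configuration returns the same state, which is the idempotency property in miniature. A resource that exists in the state but not in the configuration shows up in the plan as a delete.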

And the same thing with deleting and removing resources as well. So you're able to maintain this very stateful environment where it's extremely efficient. It's a very risk-averse approach to managing infrastructure that allows you to preview everything you do before you make changes, because it's just so easy to botch infrastructure in the cloud with a click of the button, with how wildly complex and sometimes poorly documented these vendors like Azure and AWS make their gear. Right?

Yeah. Yeah. And on prem for that matter.

Yeah. And on prem. Absolutely.

I mean, that actually, for me, begs the question: if I'm using Terraform to more programmatically manage my cloud environment, can I use that in any way, shape, or form for on prem?

I mean, is that... It's a good question, and I think all of us have that burning desire to manage as much of our infrastructure as possible through a single point.

Right? Yeah. Exactly.

That's what I mean.

Ideally, fewer tools is better. Right?

The downside to Terraform is I don't see a ton of on-prem infrastructure support. What you can do, if you have a particular platform in mind that you're interested in managing with Terraform, is go on the Terraform provider registry and search for that vendor. And you can just kinda take a look at the available code base there and see if it's something that's pretty well thought out, if it looks reasonably maintained.

It's really kind of this optional thing that vendors can do. They need to dedicate engineering time to turning all this Golang script into Terraform code and then publishing it and maintaining it every time they update something. And that is absolutely something the cloud providers do. I don't see that as much on prem. But you will occasionally see providers; Cisco ACI has a Terraform provider, for example. I'm not sure that I would want to use that a hundred percent of the time. Okay.

I've seen some SD-WAN vendors, like VeloCloud, which is VMware's SD-WAN product. They have a Terraform provider, but there's, like, four resources in there. And I think it's just for initially deploying maybe the first controller and nothing else, not actually making changes to your environment. And with some of the vendors that already have some centralized manager appliance, what I've noticed is you won't really see them write Terraform code for that, because they want you to actually log into their appliance and manage things that way.

The workaround for that is really, like I said, under the hood, Terraform is just stringing together, in a complex way, a bunch of API calls. And what most vendors do have is a published list of APIs that are available. So if you have a particular use case you want to do on prem, and you're getting comfortable with Terraform and the idea of working with an API, what I would suggest is take a look at the API documentation for that particular product or provider. What you may find is that it is very well thought out and published, and that's been my experience as well.

And you could write a pretty straightforward script that just strings together a couple of particular POST or GET messages and accomplishes some operational task that you wanna do. Maybe it's changing a switch port. Right? And I think that's the right way to get started with automation if you're not coming from a software engineering background.
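A script of the kind described here might look like the following minimal sketch, using only the Python standard library. Everything vendor-specific is an assumption for illustration: the controller URL, the `/switches/.../ports/...` path, the `access_vlan` and `enabled` fields, and the bearer-token authentication are hypothetical and would need to be replaced with whatever the real product's API documentation specifies.

```python
import json
import urllib.request

# Hypothetical controller address and credential; replace with real values.
BASE_URL = "https://controller.example.com/api/v1"
TOKEN = "replace-me"


def build_request(method, path, payload=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


def switch_port_requests(switch, port, vlan):
    """Return the GET, POST, GET sequence for changing one switch port."""
    path = f"/switches/{switch}/ports/{port}"
    return [
        build_request("GET", path),  # read the current settings
        build_request("POST", path, {"access_vlan": vlan, "enabled": True}),
        build_request("GET", path),  # read back to verify the change
    ]


def run(requests, send=urllib.request.urlopen):
    """Send each request in order and return the decoded JSON responses."""
    results = []
    for req in requests:
        with send(req) as resp:
            results.append(json.loads(resp.read().decode("utf-8")))
    return results


if __name__ == "__main__":
    # Dry run: show what would be sent without contacting any device.
    for req in switch_port_requests("sw1", "eth10", 120):
        print(req.get_method(), req.full_url)
```

Building the requests separately from sending them keeps a preview step in the workflow, similar in spirit to reviewing a plan before applying it.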

Yeah. Terraform has been such a great intro you know, introduction into that world for me, that I think got me over the hump of, like, this stuff is scary, and I don't wanna deal with it.

They do really make it very simple in the cloud. The big hurdle for me was learning how to work with Git for the first time and understanding how to generate keys so I can log into the right environments. That operational headache, that was the hard part. Actually writing stuff with Terraform, it's so well documented for the cloud providers. And if you have an opportunity to do some small version of that in your environment, I would recommend everyone give it a shot.

Yeah. Especially for network engineers that have embraced or are embracing network automation. I mean, it goes without saying that you wanna apply some of those same skills and that mindset to what you're doing in the cloud, considering that many of us do have a foot in both of those camps now, traditional networking and cloud networking, whether we like it or not. So, Charlie, before we close out, are there any recommendations that you could give to more traditional network engineers who find themselves now working in the cloud world as well, things they can do to increase their chances of success as they straddle both camps?

Yeah. I would say, in AWS and Azure's case, take a look at some of the well-architected frameworks that they have available. Okay. There are some really good starting topologies that you can work with to deploy a simple cloud environment.

And, and don't be afraid to push back on people who just wanna throw money at the problem. You don't always need an ExpressRoute or Direct Connect circuit. You don't need a ten gigabit circuit into the cloud. And what I would suggest is avoid making the mistake of treating the cloud like an extension of your data center.

It's not your on-prem data center. It's a service-driven platform that is great at a ton of things, but it's not an extension of on prem.

Okay.

And what I found in most of my experiences with the cloud, dating back to my first attempts at ExpressRoute in Azure, is that that is a lot of times the impression that folks in leadership may get about how to handle the cloud. Oh, yeah. And I think it's our job as engineers to try to understand and explain the appropriate use case for each technology. Right? And really advocate for that.

Otherwise, you get into a scenario where you're doing massive amounts of data replication into a cloud environment that really doesn't need it. And you'll see that egress bill. It is not pretty. Right?

So, yeah, take your time and look at Terraform. It's a great tool. Right?

Yeah. Yeah. And I love the recommendation to look at those reference architectures and just get a sense of what a good design looks like. You know?

That's one of the ways that I learned over the years in traditional networking, just seeing how somebody solved a particular problem in a network diagram, if they had one, or in the various courses. So that's a great recommendation. Thank you. So, Charlie, this has been a great conversation.

I would love to have you back one day to talk a lot more in depth about GENEVE and also about your experience with EVPN, VXLAN, BGP, all of that overlay.

Oh, I'd love to. Yeah. I don't get to do enough of it anymore, but, really, that's where my heart's at.

Data center fabrics, I think, are so fascinating. And we're to the point now that cloud adoption has become so prevalent that those two technologies are starting to bleed together a little bit, or starting to have to interoperate. And that brings in some really fun architecture. So, yeah.

Yeah. I would love to chat about it.

Yeah. Yeah. We'll set it up. So thanks again for joining me, Charlie. And for our audience, if you have an idea for an episode, I'd love to hear from you.

You can reach out at telemetrynow@kentik.com. Or if you have an idea for a show, please reach out, and we'll have a conversation. So for now, thanks so much for listening. Bye bye.

About Telemetry Now

Tired of network issues and finger-pointing? Do you know deep down that, yes, it probably is DNS? Well, you're in the right place. Telemetry Now is the podcast that cuts through the noise. Join host Phil Gervasi and his expert guests as they demystify network intelligence, observability, and AIOps. We dive into emerging technologies, analyze the latest trends in IT operations, and talk shop about the engineering careers that make it all happen. Get ready to level up your understanding and let the packets wash over you.
