Solving the SD-WAN problem: a different approach

In this episode of Tech Deep Dive, Max Clark talks with CloudGenix Systems Engineering Leader, Yash Bajpai, about the technical details of deployment and how CloudGenix differs from other SD-WAN products.

CloudGenix tracks application init failures, transaction failures, and the round-trip time from the user to the server, and then takes tangible action: transaction failures on path one, no transaction failures on path two, so steer that application to path two. This goes beyond link-layer latency, jitter, and packet loss, giving end-to-end visibility and tangible, actionable analytics.

Episode Transcript:

Max: [00.00] Hi I’m Max Clark and today I’m with Yash Bajpai with CloudGenix, and we’re about to do a pretty deep dive into the technical deployment and what CloudGenix is and does. Yash, thank you very much.

Yash: [00.14] Thank you Max, I’m super excited to be here with you.

Max: [00.18] So Yash, when I think of SD-WAN, I have some really crude buckets I put SD-WAN into. Starting at the beginning, I see internet aggregation and quality of service or optimization as a category, I think of MPLS replacement or augmentation as a category, and then I think of WAN acceleration or traffic de-duplication as a category. And people kind of flow up and down these stacks, and cross over a little bit, but how would you classify CloudGenix? What is the problem that you’re solving, what is the use case that you’re built around?

Yash: [01.00] Yeah, thanks for that question, Max. So, it’s very natural, a very human tendency, to relate to what you know and say, “Okay, this is similar to what I did in the nineties.” In fact, some of the SD-WAN solutions out there are basically taking the problem of twenty years ago, applying a software-defined orchestration layer on top of it, and calling it SD-WAN, whether that’s modifying the routing protocol, or taking their WAN acceleration IP and calling that SD-WAN… CloudGenix is a pure-play SD-WAN. Where everyone else has taken the bottom-up approach – on the OSI stack they’ve taken the network layer up and solved the SD-WAN problem – we’ve taken the user and the application layer as our unit of operations. So we’re looking not at packets, but at sessions, at flows, and at the user experience using those applications, and then we’ve built our SD-WAN solution.

Max: [02.04] Let’s dig into this and talk about this in terms of human words, right? So… When we – we look at SD-WAN options, you know, some vendors are taking and looking at network performance, or latency, or ping times on circuit A versus circuit B versus circuit C, and they’re making a decision on, you know, “This is our preferred path, and it has the lowest latency to whatever we’re testing against, so use this path,” or “This circuit just went down, so don’t use that path,” or, “We want to have a really good user experience, so duplicate traffic and send traffic down multiple paths at the same time and whichever way gets there the fastest, discard everything else.” What I’ve heard from CloudGenix and what I’m hearing you say right now is that you’re really different in that regard. So, how are you different and what does it actually mean for somebody when they’re looking at, you know, a CloudGenix – they’re evaluating your product?

Yash: [02.55] So, first of all I do want to say – and I’m not trying to say that we don’t do what the other guys do, I’m saying we do something that the other guys don’t – so let me talk about that. For example, link-layer latency, jitter and packet loss: everyone else tracks that and acts on it, and we do too. We also track latency, jitter and packet loss and are able to act on it. However, that is where everyone else’s story stops, and that is where our story just begins. For example, let’s say we’re using Zoom as an application, and a user is saying, “I’m not able to connect,” or “I was logged into Zoom and I got disconnected,” and this has nothing to do with link-layer latency, jitter and packet loss; these are application initialization failures, or transaction failures. Nobody in the industry can do something tangible with that. There are some bolted-on solutions that can probably give you some of that visibility, but with us, we are tracking the init failures, the transaction failures, the round-trip time from the user to the server, and then not just tracking it, but taking tangible action: transaction failures on path one, no transaction failures on path two, so that application is now automatically going to gravitate to path two – just that application. These are the things which go beyond, which transcend link-layer latency, jitter and packet loss – the first-mile problem that people solve with packet duplication or WAN acceleration. We’re able to have this end-to-end visibility and tangible, actionable analytics, as we call it.
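
For the technically inclined, here is a minimal sketch of the idea described above. It is purely illustrative, not CloudGenix’s implementation: it times the TCP handshake of a small probe per WAN path and steers an application away from a path whose transaction failures cross a threshold. The path names, source addresses and thresholds are assumptions, and it presumes policy routing already maps each source address to its circuit.

```python
import socket
import time

# Hypothetical WAN paths: name -> local source IP bound to that circuit.
PATHS = {"mpls": "10.1.1.2", "broadband": "192.0.2.10"}


def probe_transaction(src_ip, host, port=443, timeout=2.0):
    """Time a TCP handshake (SYN -> SYN/ACK) sourced from a given circuit.

    Returns round-trip time in seconds, or None if the connection failed
    (an init/transaction failure on this path).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.bind((src_ip, 0))          # force the probe out a specific circuit
        start = time.monotonic()
        s.connect((host, port))      # completes the three-way handshake
        return time.monotonic() - start
    except OSError:
        return None
    finally:
        s.close()


def pick_path(app_host):
    """Prefer the path with the fewest failures, then the lowest RTT."""
    results = {}
    for name, src in PATHS.items():
        samples = [probe_transaction(src, app_host) for _ in range(5)]
        failure_rate = samples.count(None) / len(samples)
        rtts = [s for s in samples if s is not None]
        results[name] = (failure_rate, min(rtts) if rtts else float("inf"))
    best = min(results, key=lambda n: results[n])   # sort by (failures, RTT)
    return best, results


if __name__ == "__main__":
    path, stats = pick_path("zoom.us")
    print(f"steering Zoom toward {path}: {stats}")
```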

Max: [04.46] So I mean, as a network engineer, you know, I’ve never heard in my life, “The network’s the problem, the server is fine” – that’s never happened to me. So if you think about it as a network engineer, your troubleshooting path is usually like, “Oh, can I ping the remote site? What does the latency on a traceroute look like? Am I seeing asymmetric jitter? Can I do an iperf?” You know, there’s a usual toolkit that people walk through to troubleshoot these things, and often that last piece is never really tested: “Oh, can you telnet into port 80, and does the server respond to you, what is the actual thing?” So, it’s really interesting that CloudGenix is actually looking at – if we send a SYN, how long does it take for that SYN/ACK to come back on this path, and you know, is it a path issue or is it an application server issue, and can we make decisions and do things about it? I don’t think you guys really talk about this enough, or maybe it’s that people aren’t hearing what you’re saying, because – again, as a network engineer – this is a very interesting thing to me. Why wouldn’t you want this? Of course I want this. It’s a very cool feature.

Yash: [05.56] Yes, and we lovingly call this – you would have heard this from us before – the mean-time-to-innocence chart. What we give our customers are tools beyond ping and trace and TCP pings and packet captures – by the way, we support all of that on every single port of CloudGenix – but beyond that… Today, when a SaaS application has a problem, a network engineer can no longer just go and ping. What is he pinging? A SaaS application is plugged together from many, many different widgets, many, many different servers that form this dashboard on the UI, in the browser. So, what exactly is he supposed to ping? The network engineer typically has no clue, and neither does the SysAdmin in most cases, for cloud applications. So, we’re able to automagically not just track that, but give you a very clear picture: on this path, this application is having a brownout – an application brownout, only this application – and then we’re able to steer traffic for that affected application around that path, to another path which is allowed by policy.

Max: [07.10] So really we’re talking about CloudGenix’s SD-WAN as it relates to application performance and application access. Right now, a lot of organizations are in this transition, right? We see some applications still internally hosted, some applications in a datacenter, some applications transitioned into a cloud or SaaS model, whether it’s the enterprise’s own infrastructure in a cloud, or whether it’s a Salesforce or an Office 365. So, does the approach change at all within the CloudGenix stack depending on where the application is on the remote side? Does it matter if it’s at a corporate HQ, or at a colocation facility, or in a cloud – in a hyperscale environment? Or is it completely agnostic to that?

Yash: [07.56] So we are agnostic to it; we support any model the customer may have. A lot of our financial customers, for example – all of our banks – are still on the traditional model where most of the enterprise applications are in their physical datacenters, and we also have a very large set of Fortune 1000 customers who have rapidly moved to the cloud. So, from a CloudGenix perspective – just to break it down a bit – we have the branch devices, a couple of hardware models, and this hardware is nothing but an x86-based platform. It’s just a server with a CloudGenix sticker on top; you can put that in the branch. At the datacenter, we have a datacenter-grade appliance, with redundant power supplies, fan trays and such. And then we also have deployments on all the major public clouds, and one more thing – Equinix has a marketplace, so CloudGenix is there in the Equinix colocations as well. So whether it’s private cloud, public cloud, a physical datacenter or a branch, it’s very easy to deploy CloudGenix. And I’ll make one last point there, Max – this ties back to the approach that we have – a majority of applications are now being delivered in a SaaS model. In this model, we are the only solution that doesn’t require a bookend. It doesn’t matter that there is a CloudGenix at the branch but nothing on the other end, because the destination is neither in the physical datacenter, nor in Equinix, nor in AWS, Azure or GCP; it is somewhere out there. It is SharePoint, it is Office 365; whatever it may be, a single CloudGenix device can still enforce prioritization, can still enforce the app performance thresholds like round-trip time and transaction successes and failures, and give customers the best path. Again, nobody else can do that, because we’re application-defined; we’re the only SD-WAN solution that can do that.

Max: [10.06] Let’s talk about deployment for a little bit; we’ll start with the branch and work our way up from there. What does a branch deployment look like with CloudGenix?

Yash: [10.12] So we have two models at the branch. The first is a seamless insertion, what is typically known as a bump-in-the-wire deployment. In that model, the customer says that for historic or legacy reasons their MPLS router must stay – maybe they are running some proprietary VoIP stuff, maybe it’s a non-ethernet handoff – and we can go in seamlessly as a bump in the wire. We say to the customer, “You don’t even have to change the IP address, you don’t have to change the routing protocol. We’ll go in as a bump in the wire, and then we will terminate all the new internet connections in addition to the MPLS connection, where we are a bump in the wire,” and now we are able to use all the paths actively, build the VPNs automatically to all the hubs, and we are doing everything an SD-WAN is supposed to do: active-active pathing, steering traffic to the best path, everything else. That is our deployment at the branch. If the customer wants to deploy a firewall at the branch – which many of our customers still do, deploy a physical firewall – typically the firewall or the WAN acceleration device will go south of CloudGenix. So, we hold the steering wheel for all the WAN paths, and the other devices, like a firewall or WAN acceleration, go south of us. And I should also mention the second deployment – I talked about the bump-in-the-wire deployment – the second deployment, which is the majority of our deployments, is a typical layer three deployment, where we become the default gateway for all north-south traffic. People can either dynamically or statically route traffic through CloudGenix, and now we are a layer three hub, and we do the same steering across all WAN paths, actively and smartly.

Max: [12.08] Okay, we’re going to geek out here for a minute. So, when we look at these things and we talk about physical insertion models, we talk about the firewall being south of the device, right? When I hear that and you say bump in the wire as an insertion model – a seamless insertion model – what I’m hearing in my head is that you are effectively bridging the primary circuit’s IP addresses through the CloudGenix, then landing other circuits on CloudGenix and providing either a double NAT or a proxy NAT or something along those lines, so that you can say, “We’re still using the designated primary circuit and the IP addresses persist,” or “We need to use a secondary circuit and the IP address obviously can’t persist, since it’s not on the network path.” So, is that an accurate assumption on my part?

Yash: [13.00] Yeah, so let’s talk about that – the bump in the wire typically goes in front of your private WAN circuit. When we go in as a bump in the wire, the closest analogy I can think of is that Palo Alto Networks for the longest time had a virtual wire, where the firewall could be a bump in the wire – so just like that. It’s not truly a switch; we are able to intercept the traffic, and based on policy we don’t have to bridge it – we can route it, we can encapsulate it, and we can send it along. But the bump in the wire typically gives customers two big advantages. One is their deployment becomes simple, without rocking the boat too much, because they’ll say, “Oh well, I have an MPLS router managed by the provider, I can’t really change the IP or change the routing protocol,” so that makes it super simple for us to get deployed without making any changes. That’s number one. But the even bigger advantage, which I suppose the business leaders will appreciate, is that even without HA – and I’ll talk about HA if the question comes to that – in a bump-in-the-wire model we give customers fault tolerance, in the sense that if the MPLS router north of us fails, we will automagically take over the IP address and the MAC address of the MPLS router. So now, without any changes on the LAN side – maybe there is a legacy printer that hates it when a MAC address changes, right, we all must have seen that in our past networking life – none of that problem happens, and with the same MAC and the same IP, CloudGenix becomes the default gateway when the router fails. And what happens if the CloudGenix device fails? Even if somebody powers off the CloudGenix device, the bump in the wire fails to wire. The circuit closes and the branch still stays up on the MPLS. Now granted, if the CloudGenix fails, there is no internet, there is no VPN, because you didn’t have HA, but the branch is still up. So, even with a single device, there is fault tolerance: if the router fails, we’ve got your back; if we fail, we still fail in a way that the branch stays up on MPLS. This is unprecedented, and customers love us for this fail-to-wire technology that we have at the branch.
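
To make the router-takeover idea concrete, here is a rough, purely illustrative sketch of the logic (not CloudGenix code, which does this in its own data plane): monitor the upstream MPLS router, and if it stops responding, announce its IP and MAC on the LAN with a gratuitous ARP so hosts keep using the same default gateway. The addresses and interface name are hypothetical, and Scapy is used only for illustration.

```python
import subprocess
import time

from scapy.all import ARP, Ether, sendp  # pip install scapy

# Hypothetical values for the MPLS router we sit behind as a bump in the wire.
ROUTER_IP = "10.1.1.1"
ROUTER_MAC = "00:11:22:33:44:55"
LAN_IFACE = "eth1"


def router_alive(ip, count=3, timeout=1):
    """Return True if the upstream router answers ICMP echo."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout), ip],
        capture_output=True,
    )
    return result.returncode == 0


def take_over_gateway():
    """Announce the router's IP/MAC on the LAN via gratuitous ARP.

    LAN hosts (including MAC-sensitive legacy devices) keep the same
    default gateway address; we simply start answering for it.
    """
    garp = Ether(src=ROUTER_MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
        op=2, hwsrc=ROUTER_MAC, psrc=ROUTER_IP,
        hwdst="ff:ff:ff:ff:ff:ff", pdst=ROUTER_IP,
    )
    sendp(garp, iface=LAN_IFACE, verbose=False)


if __name__ == "__main__":
    while True:
        if not router_alive(ROUTER_IP):
            take_over_gateway()   # become the default gateway seamlessly
        time.sleep(5)
```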

Max: [15.22] So because we’re not sitting in front of a whiteboard I’m going to try and keep these scenarios somewhat simple, right? If we talk about a north-south architecture with a firewall behind CloudGenix, that would become the ideal deployment – you know, the CloudGenix box, the circuits landing in CloudGenix, and then the firewall behind CloudGenix. So, the primary circuit IP addresses are passed through CloudGenix to a firewall, and then you can determine which network link you’re sending traffic to. In that scenario, let’s talk about high availability, right? A lot of this comes into how you fail over between circuits – we’ve kind of talked about that – but now you’ve got a physical appliance, there’s another failure point. What are your high availability options, and what do people have to consider or think about?

Yash: [16.12] Yeah, so for this, and just to keep in line with keeping it simple, what I will say is the majority of customers deploy us in the layer three mode, and not as a bump in the wire, and that might be simpler for our audience to follow also. So, everything that you’ve described is the same. Let’s assume it’s HA with CloudGenix, deployed at layer three – as a layer three hub, not as a bump in the wire. The story becomes easier when it’s a bump in the wire in some sense, but the majority of our deployments are layer three, so I don’t want to ignore that use case. The advantage is that the fail-to-wire I speak of gives the customer something they’re not used to, and I’ll tell you what that is. Traditionally, when you have two devices – two routers, or two SD-WAN appliances – and let’s say you have two physical hand-offs, an MPLS Cat6 cable drop and a broadband ethernet drop, you typically connect one connection to one appliance and the other connection to the other appliance. If any one of those appliances fails – a non-CloudGenix router or SD-WAN appliance – that means the end of that WAN circuit. That WAN circuit is going to go down, because that appliance went down. But remember, with CloudGenix, we have fail-to-wire. What that means is that in this deployment, if a CloudGenix SD-WAN appliance fails – even if somebody powers it off or there is a catastrophic software failure – that device is going to fail to wire its WAN circuit over to the other appliance, which means that when a node failure happens, the branch keeps full WAN forwarding capacity. So CloudGenix HA gives customers an advantage that nobody else gives, which is that even when an appliance fails, the customer has no loss in bandwidth; they get the full MPLS and full internet, or dual internet. Whatever the WAN circuits might be, everything is available to them.

Max: [18.20] The only place I really see infrastructure deployed fail-to-wire in this manner is transparent network TAPs, where, you know, you don’t want to insert a TAP in your network and have the TAP go down and take your network down with it, so they have this capability built in. And I was very surprised when I heard that CloudGenix was doing this, because I don’t know of anybody else in this space that offers this as a feature.

Yash: [18.44] That’s right, so that is a unique point, absolutely correct.

Max: [18.49] You know, when I look at this, and we’re talking a little bit about bump in the wire and different insertion models, and layer three versus not… A lot of these decisions come down to: how do you insert a CloudGenix appliance into an existing network infrastructure? I mean, if you’re dealing with hundreds of sites, plus some permutation of circuits at those sites, and, you know, different amounts of technical resources at different, dispersed locations – maybe it’s just across the country or across multiple continents… Being able to roll a new network architecture out in as painless a way as possible becomes a real consideration. So I mean, are you actually looking at this and going through deployment with your customer saying, “This is what you really have here, and in the ideal state maybe you’d want to do this architecture, but given what you have and how we’ll have to roll this thing out, maybe we’ll do this other model instead because it’s going to result in less energy and work, and faster time to market…” What does that conversation look like?

Yash: [19.49] Yeah, when we talk to our customers, I’m going to use a very oft-quoted acronym, which is ZTP. And that in itself may not sound like a big differentiating factor – other SD-WAN vendors also tout the same thing, “Okay, we have ZTP” – but zero touch provisioning basically changes the game. I remember one Pacific Northwest customer, a bank, fully deployed now… Their main network architect, she would tell me that she hates pizza parties, and I said, “What is that?” And she said, “These are the deployment weekends where we would take the incumbent routers and hand-code all the routers that needed to be shipped to our three hundred and forty-eight locations, and then the ATM would get a smaller incumbent router, and we would have to configure them, and all the weekends would go to that.” Compare that with CloudGenix: we shipped all the devices to their warehouses. Now, we could have shipped them directly to their end locations, but they preferred to asset-tag them, so we shipped to the warehouse, they asset-tagged them and shipped them to the branches, and the branch manager – without a CCIE or any other network engineering team there – plugged the devices in, and the devices became part of the network. So, their deployment becomes pretty simple when you compare it with the routed model. But at this point, I will also say that this is not so different from other SD-WAN solutions; they are also, to a large extent, plug and play. The place where we shine more than anyone else when it comes to deployment is our automation story. Customers who are going on a DevOps or NetOps journey love the fact that CloudGenix has a fully exposed Python SDK; we also have C#, JavaScript and NodeJS, but for most customers the de facto standard for DevOps is Python. So they use the Python SDK, and we have fully exposed REST APIs as well, if they want to work with REST directly. What that means is the customer can now have a full stack on the SD-WAN – meaning not just deployment, but if they want to make a massive change, say change the DNS or change the syslog server: try doing that on a massive, router-based wide area network. It is not super hard, but it is not trivial either. With CloudGenix, it is super, super trivial to make these adds, moves and changes, and to handle RMAs. Customers would say that every time a device failed, they would have to configure the replacement from their archived config – snapshot the latest config, then pass it along to remote hands. With us, the replacement device ships directly from the factory to the branch where the failed device is, they just plug it in, and because the config is stored in the cloud, they follow a three-click automated wizard, all of the config gets pushed to this new device, and voila – it’s part of the network again, with the same config. So, it’s super simple operationally to handle these things as well.
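
As a rough illustration of what that kind of automation can look like, here is a sketch of a bulk change such as pointing every site at a new syslog server. This is not CloudGenix’s actual SDK or REST API; the controller URL, endpoint paths, token handling and payload fields are hypothetical placeholders, and a real deployment would use the vendor’s documented SDK instead.

```python
import os

import requests

# Hypothetical controller URL and token; placeholders only.
CONTROLLER = "https://controller.example.com/api/v1"
TOKEN = os.environ["SDWAN_API_TOKEN"]
NEW_SYSLOG_SERVER = "10.20.30.40"

session = requests.Session()
session.headers.update({"Authorization": f"Bearer {TOKEN}"})


def all_sites():
    """Fetch every site object from the (hypothetical) controller API."""
    resp = session.get(f"{CONTROLLER}/sites", timeout=30)
    resp.raise_for_status()
    return resp.json()["items"]


def update_syslog(site):
    """Push the new syslog server into one site's config."""
    config = site.get("config", {})
    config["syslog_server"] = NEW_SYSLOG_SERVER
    resp = session.put(
        f"{CONTROLLER}/sites/{site['id']}/config", json=config, timeout=30
    )
    resp.raise_for_status()


if __name__ == "__main__":
    for site in all_sites():
        update_syslog(site)
        print(f"updated syslog server for site {site['id']}")
```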

Max: [23.07] So you’re talking about like… Using Napalm with Ansible or SaltStack as an option to be able to configure and manage CloudGenix deployments at a large scale?

Yash: [23.19] So we don’t have a religion there as to whether it’s Napalm – we do have many customers who use Ansible – but whether it’s Napalm, whether it’s a GitHub repository, whether it’s some CI/CD pipeline, what I’m seeing is that customers have choice. We have many sample scripts that enable customers, and we also have, like I said, a full SDK, so if they don’t want to use our scripts, they can use the SDK and do whatever they want with it, in whatever platform, DevOps model or framework of choice they may have.

Max: [23.48] CloudGenix has a lot of financial services and banking customers; why is that? What is it about CloudGenix that has resulted in that for you guys?

Yash: [24.00] In large part, it is our security posture. I can’t name this customer, but one of the largest global banks – they’re deployed in four thousand locations now – had a fourteen-month security cycle. Their GIS, their Global Infosec team, did a fourteen-month trial on us and another SD-WAN vendor, and we passed. With the amount of scrutiny that we had at the device level, at the controller level, at the overall solution’s security level, we are the only vendor that passed all their pen testing, ethical hacking, all their audits, and everything else. So, case in point, I’ll give you a few simple examples. I’ll start with the VPN. The IPSec VPNs that CloudGenix builds are, by default, banking-grade, AES 256-bit encrypted, and not only that, the crypto ciphers that we use between the VPNs rotate every hour. There are some solutions out there which default to 128-bit encryption and IKEv1 for phase one – compare that with what we do on the VPN side, and that’s just the data plane side of it. There are many other things. For example, we don’t have a shared gateway model, so we don’t send your traffic to an unknown black box, which is another attack vector; the customer doesn’t have a shared gateway model with CloudGenix. Their traffic always stays in their network, which matters for many banking customers – they don’t want their traffic to go to any other third-party devices. There is a lot more that I can say, but I’m going to pause here and see if there are any questions.
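
To put numbers on the “256-bit keys, rotated hourly” point, here is a conceptual sketch only. Real IPsec rekeying is negotiated via IKE inside the tunnel software, not application code like this; the interval and class names here are assumptions for illustration.

```python
import os
import time

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

REKEY_INTERVAL = 3600  # rotate the data-plane key every hour


class RotatingTunnelKey:
    """Hold a 256-bit AES-GCM key and replace it once it ages past an hour."""

    def __init__(self):
        self._rotate()

    def _rotate(self):
        self.key = AESGCM.generate_key(bit_length=256)   # 256-bit key size
        self.created = time.monotonic()

    def encrypt(self, plaintext: bytes, aad: bytes = b"") -> bytes:
        if time.monotonic() - self.created > REKEY_INTERVAL:
            self._rotate()                                # hourly rekey
        nonce = os.urandom(12)
        return nonce + AESGCM(self.key).encrypt(nonce, plaintext, aad)


if __name__ == "__main__":
    tunnel = RotatingTunnelKey()
    print(len(tunnel.encrypt(b"hello over the overlay")))
```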

Max: [25.51] No, no, no. So, you mentioned shared gateway models, right? The shared gateway, in a lot of cases, is a technique to allow you to pin traffic to a central location, to deal with, for instance, UCaaS, for voice. You don’t want to drop sessions if you drop a network path, and a shared gateway gives you the ability to have that traffic transit a specific location, so that VoIP traffic doesn’t drop and come back up. You know, if you’ve got an office with a couple hundred phones and you lose a circuit, you don’t want every single phone basically rebooting on you, right? So with CloudGenix, because you don’t have a shared gateway, how do you handle and approach those situations where you have, you know, more sensitive SaaS applications that don’t like the IP address to change?

Yash: [26.40] That’s a fair question, and the problem you speak of is the change in the NAT boundary. When a customer has two disparate circuits and you try to steer the traffic to another circuit, that boundary changes, the source IP changes, and the phone reboots or resets. That is true for a lot of voice applications, but that is changing too – so I’ll first talk about the application enhancements, which have nothing to do with CloudGenix, I cannot take credit for that. An application like Zoom, for example: when you have it on two different internets and it’s going over the primary internet, and you pull the cable out of the primary internet, the Zoom session stays up. They’ve figured out how not to break the session. But there are many other applications that will break, and customers – to your point – don’t want all the phones resetting when that happens. So when we have a requirement like this, we offer the customer two very simple solutions. The solution doesn’t change the overall architecture; it solves the problem at hand, and the problem at hand is just the voice traffic. So, we say, “Look, lucky for you, Mr. Customer, we are completely app-defined, app-routed, so we can selectively take those apps which you want to preserve against a NAT boundary change – the voice calls and SIP signalling or whatever it might be – and we can do one of two things. We can steer them through your datacenter: if the branch has dual internet, and on those dual internets they have, let’s say, two VPNs to the datacenter…” – I’m taking a simple scenario here – so now, instead of sending that traffic directly out on the internet, it goes through the datacenter. Now, the customer may say, “Well, my datacenter is too far away. What you’re saying makes sense to me, but I wish there were a closer place where I could pin the traffic.” If that’s the use case, then we can figure out the nearest public cloud or Equinix location, whatever the customer fancies, and spin up a virtual CloudGenix – like I said, we are available in all major public clouds and in Equinix. We spin up a small instance of CloudGenix whose only job is to hold the VPNs for this particular branch, and just the voice traffic from this branch is steered to this nearby gateway.

Yash: [29.25] So this gateway is not a CloudGenix-operated gateway, mind you. It is completely managed and controlled by the customer. So, my point about us giving customers complete control still applies here, because whether it’s their datacenter or their public cloud gateway, it’s always the customer who is in control. It’s not a vendor black box where the customer has no idea what’s happening or who’s sharing what resources.

Max: [29.51] We talk about preserving MPLS or firewalls at the branch… Do they have to maintain a firewall at the branch? Can they get rid of the firewall and just run CloudGenix?

Yash: [30.05] We’re seeing a lot of that, so yes – absolutely yes, Max. The trend that we’re seeing in the industry is that people want to reduce the hardware footprint at the edges, because the more hardware footprint you have, the more opex there is, and every time there is a problem you need a bunch of engineers going out there. So, we give customers the choice, and where the industry is moving is towards the SASE model, where the security services are also cloud-delivered. We have very tight integrations with a lot of the SASE vendors, and I’ll pick on Prisma for example. We have a one-click integration with Palo Alto Prisma, and what that allows our customers to do is, instead of having a physical firewall at each of their, you know, twenty or two hundred or two thousand locations, they now have the ability to scrub that traffic against threat prevention – malware, URL filtering, what have you. They can do the whole firewalling in the cloud with the likes of Prisma, and that is a trend that we’re seeing more and more. And we have a beautiful story there with our CloudBlades feature.

Max: [31.23] I’m really excited about this model, and I mean, for me the idea is that any time you can take a physical box out of a location, you’ve won, you know? Nothing makes me smile more than taking a physical box out of a deployment, and it’s also just more manageable – I mean, I don’t understand why more enterprises haven’t shifted to this already. Well, I understand why, but I really do see that this is going to be the only way things are deployed. And as I understand it, when you’re talking about Prisma, if you have a branch where everything is not being pinned to Prisma, you’re really just talking about traffic that needs to transit to the internet. So application traffic, datacenter traffic, branch-to-branch traffic, east-west traffic – all these different models would stay within the CloudGenix infrastructure and not touch Prisma, but things that were going to and from the internet would transit and touch Prisma. Am I stating that correctly?

Yash: [32.23] You are, but again – with CloudGenix, we give the customer that choice. We have some customers who say every bit of traffic coming out of that branch needs to hit a firewall, so all traffic goes through Prisma Cloud, for example. There are many customers who say, “You know what, I don’t want to send my iTunes traffic and my Google traffic or my real-time traffic through Prisma.” So we have – and I didn’t say this before, so maybe this is the time to say it – full app-defined, zone-based firewalling. Right at the edge, with a single pane of glass, with the CloudGenix orchestration, you can have security policies that filter out the applications you don’t want – that covers blacklisted applications. We can whitelist applications and say, “These applications I trust, and I don’t see the need or value in sending them to a scrubbing engine in the cloud, so send them directly out on the internet. And then these other applications” – this is what you were saying, Max – “live in my datacenter, so I want to use the CloudGenix VPNs, or AppFabric as we call it, and send that traffic branch to branch or through the datacenter, rather than directly out.” And what remains is the gray-list traffic, which is all the users at that branch accessing the internet; those user-to-internet applications which are not whitelisted are called gray-list traffic, and that gray-list traffic can go to a SASE model to get scrubbed in the cloud. So, we support all of those models; it can be all or none, or any combination thereof.
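
A loose sketch of that whitelist / blacklist / gray-list decision follows. It is purely illustrative; the app names, categories and path labels are hypothetical examples, not CloudGenix policy objects.

```python
from enum import Enum


class Action(Enum):
    DROP = "drop"                    # blacklisted apps
    DIRECT = "direct-internet"       # trusted SaaS, straight out of the branch
    FABRIC = "vpn-fabric"            # private apps, over the SD-WAN overlay
    CLOUD_SECURITY = "cloud-scrub"   # gray-list traffic, via cloud security


# Hypothetical policy tables.
BLACKLIST = {"bittorrent"}
WHITELIST = {"zoom", "office365"}            # trusted, go direct
PRIVATE_APPS = {"erp", "fileshare"}          # live in the datacenter


def classify(app_name: str) -> Action:
    """Map an identified application to a forwarding action."""
    app = app_name.lower()
    if app in BLACKLIST:
        return Action.DROP
    if app in WHITELIST:
        return Action.DIRECT
    if app in PRIVATE_APPS:
        return Action.FABRIC
    return Action.CLOUD_SECURITY     # everything else is gray-list


if __name__ == "__main__":
    for app in ["zoom", "erp", "bittorrent", "random-web-app"]:
        print(f"{app:15s} -> {classify(app).value}")
```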

Max: [34.08] So, with the public cloud, you spin up an instance, the public cloud is providing connectivity, you’ve got resources – you’re fine, right? But when you start talking about legacy datacenters, a legacy datacenter is typically going to be connected with point-to-point circuits or an MPLS circuit, plus internet. So let’s just take that as a deployment model. How does CloudGenix insert into the datacenter? Where do you sit in a datacenter, and how do you aggregate massive numbers of branches? I mean, if you have a financial services customer that’s got four thousand-plus branches, that is a lot of capacity coming back to a relatively small number of centralized locations. What does your scale-out model look like there?

Yash: [34.55] Yeah, and it actually becomes more exaggerated for these financial customers, who are backhauling all the traffic, even today. I say that because for a lot of our high-tech customers – and we have a lot of wins there too, a lot of the who’s-who in the high-tech industry are fully deployed on CloudGenix – they have bigger, fatter pipes at the branches, and looking at the aggregation of all of that at the datacenter, you would think it is impossible for any VPN device to aggregate that much bandwidth. But the fact is, a lot of these customers send traffic directly out on the internet, so it never really comes to the datacenter. Let me answer the question of where we are deployed in the datacenter and how we scale it. At the datacenter, ours is an off-path deployment… Off-path. What that means is – we don’t have a whiteboard, so I’m going to say this as simply as I can – we sit on the side, peering over BGP with the datacenter core and, optionally, if the customer has an MPLS edge, with the MPLS edge. So think of it like a triangle, where CloudGenix is one point, with a peering relationship to the WAN edge and a peering relationship to the datacenter core. Now – and I’m going to talk about scale and HA in a bit – in this triangle relationship, because we are sitting off-path… Think about this: in a lot of other SD-WAN solutions, when the datacenter SD-WAN goes down, it takes down all the branches that are on that SD-WAN. With CloudGenix, because we are an off-path model, if a single CloudGenix at the datacenter dies – let’s say there is no HA – all the branches still have their existing MPLS connection, because the WAN edge is still there and the datacenter core is still there, so they can still communicate and the business is still up. So, our architects have given a lot of thought to how we deploy at the branch and give customers resiliency, which we talked about, and also to how we get deployed at the datacenter. When we were a startup – well, we still are a startup – but when we were brand new four years ago, we would go to a customer and say, “Hello, we’re CloudGenix, do you want to install this appliance in your datacenter?” They would say, “Get lost, we don’t want you to be in the center of our universe!” But when we offered them a model where we sit on the side, and we proved to them that even if you fail the device, your branches still stay up on the legacy MPLS – that’s where they said, “Nobody does that, this is great.” Now, the scale part of it: each device at the datacenter can do about five and a half gigs of encryption – that may or may not be impressive, depending on the customer listening to this right now – but we have a horizontally scalable model. Customers who have larger throughput requirements – twenty gig, forty gig, whatever – can deploy stacks of these datacenter appliances in parallel, just like a web farm, and horizontally scale the throughput they need.

Max: [38.22] And what’s managing that horizontal scale? Are you doing ECMP [unintelligible 38.25], or are you actually doing this within the orchestration layer – these are endpoints, and the branch devices become aware of multiple endpoints and manage for themselves what to connect to and how to scale out?

Yash: [38.39] Yeah, no – that’s a great question. So, we’re not doing ECMP, it’s pretty smart, and I’ll try to explain this simply without the whiteboard again. Let’s say that you have a hundred branches, and you had two datacenter appliances in that one datacenter; a hundred branches, aggregating to two datacenter devices at the datacenter. The way our HA works, the active-active HA works, is statistically, roughly fifty branches would be active on one device, and the other fifty would be active on the second device. Now, each of the hundred branches would have a VPN to both the datacenter devices. When I say active, what I mean specifically is which device is going to be advertising the BGP to the datacenter. So, for fifty branches, only one device advertises the BGP route, and for the other fifty, the second one advertises the BGP route, so we’re not relying on BGP ECMP; from a BGP standpoint there is only one source of truth, there is only one place they should be going, because if you did ECMP, we would have all sorts of asymmetry, and a whole bunch of other problems based on the hashing that the vendor uses on ECMP… So no, we don’t use that – we use the model that I described. 
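
As a toy illustration of that active-active split (not the actual CloudGenix election mechanism; the hashing scheme here is an assumption for the sketch), you could deterministically assign each branch a primary datacenter device, so roughly half the branches advertise their routes through each appliance while both appliances keep VPNs to every branch.

```python
import hashlib

DC_DEVICES = ["dc-ion-1", "dc-ion-2"]   # hypothetical datacenter appliances
BRANCHES = [f"branch-{i:03d}" for i in range(100)]


def primary_dc_device(branch_id: str) -> str:
    """Deterministically pick which DC appliance advertises this branch's
    routes into BGP. Both appliances still terminate a VPN to the branch;
    only the primary is 'active' from a routing standpoint."""
    digest = hashlib.sha256(branch_id.encode()).digest()
    return DC_DEVICES[digest[0] % len(DC_DEVICES)]


if __name__ == "__main__":
    assignment = {b: primary_dc_device(b) for b in BRANCHES}
    for device in DC_DEVICES:
        count = sum(1 for d in assignment.values() if d == device)
        print(f"{device}: active for {count} branches")
```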

Max: [40.04] So for an enterprise that’s acquiring a few cabinets of datacenter space to host critical applications, right? Maybe that’s their ERP installation, you know, accounting, whatever it actually is. You know, if an organization does not have expertise around BGP, will CloudGenix actually help them configure this? How deep will you go into the customer’s gear to get this stuff working?

Yash: [40.30] We give our customers a white-glove service. We absolutely support our customers with their deployment, and we have a deployment team as well, so right from pre-sales to the actual deployment, we are in lockstep with our customers, shoulder to shoulder, during the planning and deployment phases.

Max: [40.53] Deployment timeline – I mean, when you get engaged, obviously there’s a pretty big variety in what the customer’s going to look like: how many branches they have, how dispersed they are, the existing MPLS setup, right? That’s all going to impact decisions. But if you’re talking about a simple deployment – let’s say it’s 20 sites, and we’re just looking at internet-based optimization and aggregation – what should somebody expect or anticipate in terms of a project plan? What does that cycle look like, what’s the engagement, what should they be prepared for – can you walk me through that?

Yash: [41.27] So realistically, the first branch – maybe the first two branches – is usually the slowest, because the customer is still getting used to the UI. Our UI is one of the simplest – I don’t know if you’ve had, I should say, the pleasure of looking at the product demo, but customers rave about how simple it is – yet it is still different from whatever they may have been used to in the past. So, the first one or two sites may take a bunch of hours – it can be two hours, it can be four hours – but once the customer gets used to it… I talked about the white-glove deployment; I lead an SE team here, and I always ask my SEs to be with the customer when they deploy. After two sites, maybe three, the customer doesn’t bother calling us; they’re like, “Oh yeah, by the way, I deployed these three sites over the weekend,” and we don’t even need to be there – the device is plugged in and within a few minutes the device is online. So, there is definitely a curve, a technical ramp initially, because it’s so different, so new. But all said and done, operationally, one of the tangible savings in terms of ROI is how much time it reduces – how much time our customers get back when they deploy a solution like ours. One fact and figure I’ll give you: one of the retailers based in the Pacific Northwest, one of our earliest customers – it used to take them three days, so whatever that comes out to, fifty-plus hours, to do a full-stack retail deployment, which included the wireless, the firewall, the routers, the SD-WAN, all of that. With CloudGenix, and of course their own automation – because it’s not just CloudGenix they’re automating, so kudos to their automation team as well – they’ve now reduced that to less than four hours. In less than four hours, the entire retail store, the entire IT stack comes up. They call it store-in-a-box, and they’re able to roll that out in four-hour windows, and they have multiple deployment teams that can spin up multiple dozens of retail stores – or if they’re updating or changing hardware, they can do multiple dozens a night. So that is the kind of benefit our customers can expect.

Max: [43.56] I mean, it’s hard to quantify that in terms of, you know, efficiency and flexibility for an organization to make that kind of change. It’s mind-boggling when you actually see it in practice, what you can accomplish. There’s one more thing that we didn’t talk about and should get into… For the enterprise that still has, you know, an HQ model, with centrally-hosted resources at an HQ and some number of remote locations – I think manufacturing comes to mind right away, right? So there’s not a datacenter in play, but they’re hosting ERP or manufacturing automation controls, or there’s an ETL process or something like that for orders, for their warehouses, coming in and out… Now you have things that have to be publicly surfaced from the HQ, not necessarily just to the branches, but to customers and to partners, and there’s a requirement for high availability in that world. If they’re deploying CloudGenix for their HQ and for their branches and their different locations, what should they expect in terms of those public resources that are actually being served and hosted to the internet?

Yash: [45.04] Yeah, so I’ll talk about the most important feature for this question, which is NAT, but before I do that – on every CloudGenix device, when you configure a port to be an internet port, it’s a completely locked-down port. I mean, I’m stating the obvious here, but I do want to make the point that by default, nothing gets in from the internet – you can’t even ping the internet interface – so everything is blocked. But then you often have these situations like you described, where you have a service inside the LAN that needs to be exposed. It could be this manufacturing service; a very common use case is remote access – GP – and they need to expose the GP service, because remember I said the firewall sits south of us in a branch deployment. So, we support a very flexible NAT; NAT is the underpinning technology that allows this to happen. With the flexible NAT that we have, anything you can do on a modern firewall when it comes to NAT’ing – NAT on the ingress, NAT on the egress, source NAT, DNAT, no NAT, static NAT – all of this is fully supported on CloudGenix, with a very simple, flexible policy. So again, it’s not a device-by-device config; you create a policy and you can say, “You know what, all my manufacturing sites have this service exposed,” and obviously we support locally significant config, so these local prefix lists are only relevant to those local branches… I don’t know how much detail we want to go into, but what you’re asking is a very common ask in some sets of our customers, and we are able to deliver it with our flexible NAT policy.
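
A very rough sketch of what a policy-driven DNAT rule could look like conceptually follows. It is illustrative only; the rule fields, site tags and addresses are hypothetical, not CloudGenix’s policy schema.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass
class DnatRule:
    """One policy-level destination-NAT rule, applied by site tag rather
    than per device, so every matching branch gets the same behavior."""
    site_tag: str          # e.g. all manufacturing sites
    wan_port: int          # port exposed on the internet circuit
    internal_ip: str       # LAN host providing the service
    internal_port: int


RULES = [
    DnatRule("manufacturing", 8443, "10.50.0.21", 443),   # hypothetical ERP UI
    DnatRule("hq", 443, "10.10.0.5", 443),                # hypothetical GP portal
]


def translate(site_tag: str, dst_port: int):
    """Return (internal_ip, internal_port) for an inbound connection,
    or None if no rule matches (default: locked down, drop)."""
    for rule in RULES:
        if rule.site_tag == site_tag and rule.wan_port == dst_port:
            return str(ip_address(rule.internal_ip)), rule.internal_port
    return None


if __name__ == "__main__":
    print(translate("manufacturing", 8443))   # -> ('10.50.0.21', 443)
    print(translate("manufacturing", 22))     # -> None (blocked by default)
```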

Max: [46.49] So, you know, multiple circuits, different IP addresses on each circuit, you define the NAT policy for each circuit and each IP address coming in, fronting whatever service you want published to the internet… And then the enterprise is using DNS-based traffic distribution with health checks, and something is saying, “Prefer this IP address, and if it goes down, fail over here.” Is that the deployment model?

Yash: [47.15] It is, it is. And to be fair, that failover is triggered by DNS, or by the user agent, right? If it’s GP, then the GP client automatically connects to the second IP – but yes, that part is not controlled by us, because we have no control over what is acceptable and what is not from a service standpoint. We give you the plumbing, we give you the access, we – what’s the right word – open the right pinholes, with the security measures, for the service to be exposed over the disparate internet circuits; going from one path to another is controlled more by the application and the agents.
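
To make the DNS side of that concrete, here is a minimal, illustrative health-check sketch. The IPs and port are placeholders, and a real deployment would typically rely on a managed DNS service’s health checks rather than hand-rolled code like this.

```python
import socket

# Hypothetical public IPs of the same published service, one per circuit.
CIRCUIT_IPS = ["198.51.100.10", "203.0.113.10"]
SERVICE_PORT = 443


def healthy(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_answer() -> str:
    """Return the first healthy IP in preference order; this is the address
    the DNS layer would hand out for the published service."""
    for ip in CIRCUIT_IPS:
        if healthy(ip, SERVICE_PORT):
            return ip
    return CIRCUIT_IPS[0]   # nothing healthy: fall back to the primary


if __name__ == "__main__":
    print("DNS should answer with:", pick_answer())
```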

Max: [47.58] Do you guys ever see models where people are deploying small CloudGenix appliances for, you know, remote workers, a distributed workforce? So, like myself – I would sit at my house, have a box at my house, and use that to then, you know, aggregate and connect back?

Yash: [48.15] So now, in this unique situation that we’re in with the pandemic, we’re seeing that. This is not unprecedented for us – I mean, the pandemic is – but the ask for a home office device… In fact, we’re just closing a big deal in Phoenix; they have these agents who take your travel reservations, and those agents tried the softphone and soft agent, and the business hated it, so now they’re basically going to go with a small CloudGenix device, a small, fanless device. The agent just plugs into the port, the VPN comes up, and the business says they also now have the ability to put an SLA on the application and on the network uptime; they’re able to get all of that with the CloudGenix orchestration, with the UI. So, we hadn’t seen a lot of that before, but this is definitely a use case that is on the rise recently.

Max: [49.14] I mean, you lose a lot of things, because the assumption would be that a home user does not have more than one internet circuit – it’s probably a broadband circuit – so they’re not failing over between different carriers, and you’re not necessarily routing around carrier issues. But you are getting management, reporting and prioritization, these sorts of things, in a very easy plug-in manner, right? You’re not trying to configure a software VPN; you just plug this box in and the box does its thing.

Yash: [49.39] Exactly right, so it’s not about active-active or anything like that, it’s more about everything you said: prioritization, classification, visibility… And sometimes, even if you have one path – one broadband path, and this might be interesting to you on the technical side – you might think, “Well, what can an SD-WAN solution do? It’s only one path at the end of the day.” Well, what if this particular customer has two datacenters, one on the east coast and one on the west coast? This has happened with many of our enterprise customers: what if this agent is using her corporate email – I’m just going to pick on Google for now, let’s say it’s Gmail, the corporate email – and Gmail is having a problem, an app brownout on the local internet this agent has? She’s trying her best to be effective, to be productive, but her email doesn’t work because her local broadband provider has a peering problem with Google – that’s an application brownout. No other SD-WAN can do anything about it. We’re going to detect that it’s happening, and then backhaul just the Google, Gmail traffic over the VPN to either the east coast or the west coast, based on wherever she is and whichever is the active or primary datacenter for her. The datacenter’s internet would not be having this Google brownout situation, so the problem is most probably resolved. So again, I’m tying it back to the app brownout, the app health – even on a single physical path, you have multiple logical paths, an underlay path and two overlay paths, and we’re still able to do path selection on that single path.

Max: [51.15] That’s awesome. So, really what you’re saying is, the network team would have to configure this within CloudGenix – this isn’t an automated thing – but you would say, you know, for Gmail, you can connect directly, or here are our datacenter paths, or here’s our cloud-hosted appliance path – appliance isn’t the right word – cloud-hosted instance path, and if Gmail isn’t working on the direct path, send it somewhere else.

Yash: [51.41] You’re right, but it’s not even that much, meaning that by default, in our policies, you can say something like, “You know what, I want all SaaS traffic to go directly out on the internet, but as a backup – just as a backup – if things go south, use the VPN.” That’s all you have to do: provide the intent, and the intent doesn’t have to be per app. You can make it per app, you can make it as granular as, “You know what, Google Docs is not allowed to be backhauled, but Gmail is,” so you can go that granular, but you don’t have to. Our customers can do as much or as little configuration work as they desire.

Max: [52.19] And this is something you guys are going to do with the customer as you do the deployment – helping them apply these rules, figure out the details and say, “Hey, we really think you should do these sorts of things, save yourself some aggravation down the road.”

Yash: [52.30] Which is why also the first one or two sites take a few hours, like I said, because these are the things we discover, and we give them feedback, and they’ll ask us questions, and we say, “Okay, here’s why,” and so, you’re exactly right; we do help them out with all this.

Max: [52.47] Yash, thank you very much, I’ll leave the last question to you. Is there anything that we haven’t touched on that you think we should hit before we wrap?

Yash: [52.56] What I would ask of the audience, if they are interested in what they heard so far – and some of them might even be thinking, “Ah, this certainly cannot be true” – what I would ask, or even challenge them to do, is take a peek at our solution and request a demo. Because I can say whatever I want here, Max; the reality is that unless the customer sees it live, either in a demo or in their own environment, that’s when they become a believer, right? That’s why we have twenty-seven percent of the Fortune 1000 as customers fully deployed on CloudGenix, and we are very, very proud of our logos and our customer wins.

Max: [53.34] Awesome, thank you very much. 

Yash: [53.36] Thank you, Max.