Best Practices Learned from 1,000 AWS VPC Configurations

Missed our webinar, “What We Learned from 1,000 Amazon Web Services (AWS) VPC Configurations”? No worries. Watch the recording, access the slides, and read below for a summary of the webinar.

In this webinar, we covered what we learned from 1,000 Amazon VPC configurations that we’ve configured for companies of all sizes: small businesses, Fortune 100 companies and everything in between.

You can jump to different sections by clicking the hyperlinks below:

  1. What is an AWS VPC?
  2. AWS VPC Topology
  3. Accessing the AWS VPC
  4. AWS VPC Packet Flow
  5. Lessons Learned: AWS VPC Best Practices
  6. SoftNAS and AWS VPC
  7. Common AWS VPC Mistakes
  8. SoftNAS Cloud NAS Overview
  9. AWS VPC Q&A – Questions from Webinar Attendees
  10. Claim my $100 AWS Credit

Watch the recording:

See the slides on Slideshare:  What We Learned from 1,000 AWS VPC Configurations

Amazon VPC Configuration with SoftNAS Cloud NAS: See the guide here

SoftNAS Cloud NAS on the AWS Marketplace: Visit SoftNAS on the AWS Marketplace

What is an AWS VPC?

aws vpc overview


Let’s talk about the AWS VPC, or the Amazon Virtual Private Cloud (VPC). What exactly is an Amazon VPC? It’s a virtual network that’s specific to your environment. Think of it as your own private data center or private network within your AWS account. It provides you with a location for launching resources, such as EC2 instances, that you want to logically isolate into your own private environment.

It gives you some configuration options that are not generally available on the EC2-Classic platform. Examples include being able to configure your private IP address ranges, set up private subnets, control and manipulate routing via route tables, set up different networking gateways (we’ll cover the different types of gateways that are available), and get granular with security settings at both the network ACL level and the security group level.

aws vpc overview2

When you talk about the main features that are available, you talk about control, right? You configure what your IP address range is going to be, how the routing works, whether or not you’re going to allow VPN access, and what the architecture of the different subnets is going to be within the VPC. You have security options such as security groups, network ACLs and specific routing rules that can be configured. You also get some different features, such as running multiple network interfaces and static private IP addresses, and some EC2 instance types, such as the T2s, can only be launched in a VPC.

You can use a VPC to create an AWS hybrid cloud by leveraging the AWS Direct Connect service. It allows you to extend your on-premises network into the AWS cloud over a high-bandwidth, low-latency connection. There are some network advantages as well, such as VPC peering, where you connect your VPC to another VPC.

This could be done within your organization, or you could use it to connect to other organizations for specific services or specific access if that were required. Plus you have things like endpoints and flow logs, which can help you troubleshoot connectivity issues or problems you may be having gaining access to specific services within the VPC itself.

AWS VPC Topology

aws vpc topology

Just a couple of notes on AWS VPC topology. A VPC lives in a single region, but it spans multiple availability zones, which basically means that each subnet you create can live in a different availability zone. Or you can put them all in a single availability zone if you want. All of the subnets that you create within a VPC can route to each other by default. The overall network size for a single VPC can be anywhere between a /16 and a /28 for the overall CIDR of the VPC, and that’s also configurable for each of the subnets that you want to set up within the AWS VPC.
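The sizing rule above can be checked before you create anything. The following is a minimal sketch using Python's standard `ipaddress` module; the function names and the example address ranges are assumptions made for illustration and do not come from the webinar.

```python
import ipaddress

# Prefix-length bounds for a VPC and its subnets, as described in the talk.
MIN_PREFIX = 16  # largest allowed block: /16
MAX_PREFIX = 28  # smallest allowed block: /28


def is_valid_vpc_cidr(cidr: str) -> bool:
    """Return True if the CIDR block is between /16 and /28 inclusive."""
    network = ipaddress.ip_network(cidr, strict=True)
    return MIN_PREFIX <= network.prefixlen <= MAX_PREFIX


def subnets_fit(vpc_cidr: str, subnet_cidrs: list[str]) -> bool:
    """Return True if every subnet has a valid size and lies inside the VPC."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return all(
        is_valid_vpc_cidr(s) and ipaddress.ip_network(s).subnet_of(vpc)
        for s in subnet_cidrs
    )


if __name__ == "__main__":
    # Hypothetical plan: one /16 VPC carved into /24 subnets.
    print(subnets_fit("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"]))
```

Running a check like this against a planned layout catches a subnet that falls outside the VPC range before it shows up as a failed create call.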

It also gives you the ability to choose your own IP prefix, so if you want the 10 network or the 50 network or whatever you’d like your private IP address to be, that’s going to be configurable within your AWS VPC topology.

How can you gain access to this VPC, and how do the resources within it gain access out? There are several different types of gateways, and one of the things that we get asked quite frequently is, what does each of these gateways do? How do they work? There’s this IGW and VGW and CGW, and what do they mean and how do they function? Hopefully this gives you a pretty brief and easy explanation.

The internet gateway (IGW) is exactly what it sounds like: you point specific resources within your VPC at it via route tables to gain access to the outside world. Alternatively, you can leverage a NAT instance; more about NAT later. Then there’s VPN access into your VPC, whether that’s done via, say, Direct Connect, where you have dedicated bandwidth to connect to your VPC, or via a hardware-based VPN.

There are two parts to a VPN connection: the virtual private gateway (VGW), which is the AWS side, and the customer gateway (CGW), which is the customer side. Most of the major VPN hardware vendors have supported template configurations that you can download directly from the virtual private gateway interface within your VPC via the AWS console.

How do the packets actually flow within a VPC? Let’s take a sample setup in your VPC and talk about how the packets flow and how you would connect. In our example here, we have three subnets and three instances: instance C is connected to subnet 3, instance A is connected to subnet 1, and instance B has two elastic network interfaces, or ENIs, that are connected to two different subnets, subnet 1 and subnet 2. Understanding the logical flow of packets within an AWS VPC was an eye-opening and enlightening experience for me and my team, and it allowed us to troubleshoot and deploy environments a lot better.

Accessing the AWS VPC

aws vpc access

Let’s talk about how instance A and instance B connect to each other over subnet 1. They’re both living in the same subnet, so the routing table is the first thing the packet hits, and that routing table automatically has a default route associated with it to route all traffic within the overall CIDR of the VPC. Next, it hits the ARP table and the outbound portion of the firewall, and then a source and destination check occurs, which is a configurable option within AWS.

Then it hits the outbound security group, which by default is wide open: all traffic is allowed out. It then goes over to the other instance, where it passes the inbound security group and a second source and destination check, and then hits the firewall before the packet flows into instance B.

People will say, “I can SSH but I can’t ping,” or the reverse, and a lot of the problems that people experience in troubleshooting connectivity here are primarily around the security groups, and primarily on the inbound side, because the outbound side is open by default. This is usually the first place to check when you’re having some type of connectivity issue within the AWS VPC: make sure that the security group is not blocking the type of traffic, based on source or destination IP or port number, for example, in a way that impairs your connectivity.
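To make the inbound check concrete, here is a minimal sketch of how an allow-list of rules decides whether a packet gets in. The `Rule` class, the `inbound_allowed` function and the addresses are hypothetical and only model the idea; they are not the AWS API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One allow rule: protocol, inclusive port range, and permitted source CIDR."""
    protocol: str      # "tcp", "udp" or "icmp"
    from_port: int     # this simplified model uses -1 for ICMP, which has no ports
    to_port: int
    source_cidr: str


def inbound_allowed(rules, protocol, port, source_ip):
    """Security groups are allow-lists: traffic passes only if some rule matches."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol != protocol:
            continue
        if not (rule.from_port <= port <= rule.to_port):
            continue
        if src in ipaddress.ip_network(rule.source_cidr):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical group that allows SSH from one subnet and nothing else.
    rules = [Rule("tcp", 22, 22, "10.0.1.0/24")]
    print(inbound_allowed(rules, "tcp", 22, "10.0.1.15"))   # SSH is allowed
    print(inbound_allowed(rules, "icmp", -1, "10.0.1.15"))  # ping is not
```

With only the SSH rule present, the same source address can SSH but cannot ping, which is the symptom described above.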

AWS VPC Packet Flow

aws vpc packet flow

So how would the packets flow between instance B and instance C? Let’s quickly go back to make sure we understand the setup: instance B is living in two subnets, and instance C is living in subnet 3. If instance B wanted to talk to instance C, it can go one of two ways. It could go out subnet 1 or subnet 2, but the same rules apply: it’s going to hit the route table, go to the firewall, source and destination check, security group out.

aws vpc packet flow2

It’s going to check the route table to make sure it has a route to that destination network, and then, because it’s going to a different network, it’s going to check the network ACL out. On the reverse side, coming back in, it’s going to check the network ACL in before it checks the security group. So there are additional checks for instances that happen to live in a different subnet. This is some very important information, and hopefully you’ll find it useful. From my perspective and my team’s perspective, once we really understood how the packets flow, where they were going and how everything was being checked, it allowed us to do a much better job of troubleshooting different connectivity issues.
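The two walkthroughs can be summarized as ordered lists of checks. This is only a sketch of the order as described in the talk, with hypothetical names; it is not a model of AWS internals.

```python
# Check names follow the order described in the talk (illustrative only).
OUTBOUND = [
    "route table",
    "firewall (out)",
    "source/destination check (out)",
    "security group (out)",
]
INBOUND = [
    "security group (in)",
    "source/destination check (in)",
    "firewall (in)",
]
# Extra steps when the packet crosses from one subnet to another.
CROSS_SUBNET = [
    "route table (destination network)",
    "network ACL (out)",
    "network ACL (in)",
]


def packet_checks(src_subnet: str, dst_subnet: str) -> list[str]:
    """Return the ordered checks a packet passes between two subnets."""
    if src_subnet == dst_subnet:
        return OUTBOUND + INBOUND
    return OUTBOUND + CROSS_SUBNET + INBOUND


if __name__ == "__main__":
    for step in packet_checks("subnet-2", "subnet-3"):
        print(step)
```

When troubleshooting, walking this list in order gives a checklist: if traffic works within a subnet but not across subnets, the network ACLs and the route to the destination network are the extra places to look.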

aws vpc packet flow3

AWS VPC Best Practices

Let’s talk about some of the lessons that we’ve learned in all of these different AWS VPC scenarios that we’ve seen. I’ve personally put my hands on 85% of these 1,000-plus AWS VPCs, either from different tests that I’ve run and created myself or from engaging with customers who are deploying SoftNAS Cloud NAS in their environment, troubleshooting SoftNAS Cloud NAS in their environment, etc.

aws vpc best practice

The first best practice is to organize your AWS environment. We recommend that you use tags. As you continue to add instances and create route tables and subnets, it’s nice to know what connects with what, and the simple use of tags will make life so much easier when it comes to troubleshooting. Make sure you plan your CIDR block very carefully. We would suggest that you go a little bit bigger than you think you need, not smaller.

Remember that for every subnet that you create, AWS reserves five of its IP addresses. So when you create a subnet, know that off the top there’s a five-IP overhead. Avoid using overlapping CIDR blocks. You may not want to do it today, but at some point down the road you may want to peer this VPC with another VPC, and if you have overlapping CIDR blocks, the VPC peering will not function correctly and you’re going to find yourself in a configuration nightmare trying to get those VPCs to peer.

Try to avoid using overlapping CIDRs, and always save a little bit of space for future expansion. There’s no cost associated here with using a bigger CIDR block, so don’t undersize what you think you may need from an IP’s perspective just to try to make it clean and easy.
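Both points are easy to verify with a few lines of arithmetic. The following is a minimal sketch using Python's standard `ipaddress` module; the function names and address ranges are illustrative assumptions.

```python
import ipaddress

# AWS reserves five addresses in every subnet, as noted in the talk.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(subnet_cidr: str) -> int:
    """Number of addresses left for your own resources in a subnet."""
    total = ipaddress.ip_network(subnet_cidr).num_addresses
    return total - AWS_RESERVED_PER_SUBNET


def can_peer(vpc_a_cidr: str, vpc_b_cidr: str) -> bool:
    """Two VPCs are peering candidates only if their CIDR blocks do not overlap."""
    a = ipaddress.ip_network(vpc_a_cidr)
    b = ipaddress.ip_network(vpc_b_cidr)
    return not a.overlaps(b)


if __name__ == "__main__":
    print(usable_addresses("10.0.1.0/24"))            # 256 - 5 = 251
    print(usable_addresses("10.0.1.0/28"))            # 16 - 5 = 11
    print(can_peer("10.0.0.0/16", "10.1.0.0/16"))     # True
    print(can_peer("10.0.0.0/16", "10.0.128.0/17"))   # False
```

The /28 case shows why undersizing hurts: almost a third of the smallest subnet is gone before you launch anything.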

aws vpc subnet

You can subnet your way to success, right? So understand what your subnet strategy is going to be. I would suggest that you align your subnets to different tiers as much as humanly possible, such as a DMZ/proxy layer, an ELB layer if you’re going to be using load balancers, and an application or database layer. Remember, if a subnet is not associated with a specific route table, then by default it uses the main route table. This has tripped up a lot of people in my dealings: they created a route table and they’ve got a subnet, but they haven’t associated the subnet with the route table even though they thought they did. So the packets aren’t flowing where they think they are.
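The fallback rule is simple enough to express as a lookup. In this minimal sketch the subnet and route table IDs are hypothetical, and the association data is assumed to have been gathered already (for example from the console); the code does not call AWS.

```python
def effective_route_table(subnet_id, associations, main_route_table_id):
    """Return the route table a subnet actually uses.

    `associations` maps subnet ID -> explicitly associated route table ID.
    A subnet with no explicit association falls back to the main route table.
    """
    return associations.get(subnet_id, main_route_table_id)


def subnets_on_main_table(subnet_ids, associations):
    """List the subnets that silently fall back to the main route table."""
    return [s for s in subnet_ids if s not in associations]


if __name__ == "__main__":
    # Hypothetical IDs: subnet-b was never associated with the custom table.
    associations = {"subnet-a": "rtb-custom"}
    print(effective_route_table("subnet-b", associations, "rtb-main"))
    print(subnets_on_main_table(["subnet-a", "subnet-b"], associations))
```

Listing the subnets that fall back to the main table is a quick way to spot the association that someone thought they had made.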

I would suggest that you put everything in a private subnet by default and use either ELBs or filtering and monitoring type services in your public subnet. You can use NAT to gain access to public networks. I would highly recommend, and you’ll see this later, that you use a dual NAT configuration for redundancy. There are some great CloudFormation templates available to set up highly available NAT instances. Make sure that you size those instances properly for the amount of traffic you’re going to push through them.

You can go ahead and set up VPC peering for access to other VPCs within your environment, or maybe in a customer or partner environment. I would also highly suggest leveraging VPC endpoints for access to services like S3, instead of going out over a NAT instance or an internet gateway to reach services that don’t live within the VPC. Endpoints are very easy to configure, and they’re much more efficient, with lower latency, than going out over a NAT or an internet gateway to reach something like S3 from your instance.

aws vpc access

Control your access. Don’t be lazy and use a default route to the internet gateway. I see a lot of people that do this, and it comes back to cause them problems later on. I mentioned using redundant NAT instances. There are some great CloudFormation templates available from Amazon for creating a highly available, redundant NAT instance.

The default NAT instance size is an m1.small, which may or may not suit your needs depending upon the amount of traffic you’re going to push. I would also highly recommend that you use IAM for access control, especially assigning IAM roles to instances. Remember that IAM roles cannot be assigned to running instances; the role has to be set at instance creation time. Using those IAM roles means you don’t have to keep populating AWS keys within specific products in order to gain access to some of those API services.

SoftNAS Cloud NAS and AWS VPCs

softnas aws vpc

How does SoftNAS Cloud NAS fit into AWS VPCs? We have a highly available architecture from a storage perspective, leveraging our SNAP HA capability, which allows us to provide high availability across multiple availability zones. We leverage our underlying secure block replication with SnapReplicate, and we highly recommend using SNAP HA in a high-availability mode, which gives you a no-downtime guarantee plus five-nines uptime. It’s also important to remember that Amazon provides no SLA unless you run in a multi-zone deployment, right? So a single-AZ deployment has no SLA within AWS.

We have two methods of deploying our cross-zone high availability here at SoftNAS. The first is to leverage elastic IPs, where you have two separate controllers, each in its own availability zone. They’re in the public subnet, and we assign each node an elastic IP address. We use a third elastic IP address as our VIP, or virtual IP.

You configure SnapReplicate between the two instances, which provides the underlying block replication. The elastic IP address that’s considered to be the VIP is assigned to whichever controller is the primary, and whatever clients you have from an NFS, CIFS or iSCSI perspective will mount or map drives to that elastic IP address.

If there is a failure of the storage instance, the elastic IP address moves over from the primary controller to the secondary controller. This happens whenever anything triggers our HA monitor, which looks at things like the health of the file system and the health of the network at multiple different levels. This is applicable for doing things like backing EBS with SoftNAS or using S3 with SoftNAS.

The second mode is to use a private virtual IP address, where both SoftNAS Cloud NAS instances live within a private subnet and don’t have any access out. It’s the same underlying SnapReplicate technology and monitoring technology. However, here you pick a virtual IP address that is outside of the CIDR block of your AWS VPC and your clients map to it. An entry is automatically placed into the route table, and should a failover occur, we’ll update the route table automatically in order to route the traffic to the controller that should be the primary at the time. This is probably the more common way of deploying SoftNAS in a highly available architecture.

Common AWS VPC Mistakes

aws vpc mistakes

And so just a couple of common mistakes, and these come from what our support team sees customers do. Each instance in these deployments requires two ENIs, or network interfaces, and both of those NICs need to be in the same subnet. You need to check this when you’re creating your instances or adding the ENIs, and make sure that both NICs are in the same subnet.

The other common mistake involves the health checks. One of the health checks we perform is a ping between the two instances, and the security group isn’t always open to allow that ICMP health check, which will cause an automatic failover if we can’t reach the other instance. We also leverage an S3 bucket in our HA deployment as a third party witness, so if you deploy SoftNAS in a private subnet, we do need to gain access to S3, either via NAT or via the configuration of an S3 endpoint within the VPC.

And again, as I mentioned just a few moments ago, for private HA, the virtual IP address must not be in the CIDR of the AWS VPC. So whatever your VPC CIDR is, you need to pick a virtual IP address that doesn’t fit within it. Pick whatever works best for you, but it cannot fall within the CIDR block of the AWS VPC, or the route failover mechanism that we’re leveraging will not function properly.
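A quick way to validate a candidate virtual IP is to test it against the VPC CIDR. This is a minimal sketch using Python's standard `ipaddress` module; the function names and every address shown are hypothetical examples, not values from the webinar.

```python
import ipaddress


def is_valid_private_vip(vip: str, vpc_cidr: str) -> bool:
    """A private HA virtual IP must fall outside the VPC's CIDR block."""
    return ipaddress.ip_address(vip) not in ipaddress.ip_network(vpc_cidr)


def host_route(vip: str) -> str:
    """Destination of a host route for the VIP (a single-address /32 block)."""
    return str(ipaddress.ip_network(vip))


if __name__ == "__main__":
    vpc_cidr = "10.0.0.0/16"                                  # hypothetical VPC range
    print(is_valid_private_vip("192.168.100.10", vpc_cidr))   # outside the VPC: usable
    print(is_valid_private_vip("10.0.5.10", vpc_cidr))        # inside the VPC: not usable
    print(host_route("192.168.100.10"))                       # 192.168.100.10/32
```

The `/32` destination is what a host route entry in the route table looks like, which is the kind of entry the failover mechanism relies on.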

SoftNAS Cloud NAS Overview

SoftNAS Cloud NAS is a powerful, enterprise-class virtual storage appliance that works for public, private and hybrid clouds. It’s easy to try, easy to buy, and easy to learn and use. You have freedom from platform lock-in, and it works with the most popular cloud computing platforms, including Amazon EC2, VMware vSphere, CenturyLink and Microsoft Azure. Our mission is to be the data fabric for businesses across all types of cloud, whether private, public or hybrid.

We have a couple of different products that we can leverage. The first is our SoftNAS Cloud NAS product, which is a NAS filer for public clouds. We have our cloud file gateway, which is for on-premises use to connect to cloud-based storage. We also have SoftNAS for service providers, which is our multi-tenant NAS replacement for service providers that leverage iSCSI and object storage.

VPC Q&A – Questions from Webinar Attendees

  • We use VLANs in our data centers for isolation purposes today. What VPC construct do you recommend to replace VLANs in AWS?
    • That would be subnets. You could either leverage subnets or, if you really wanted a different isolation mechanism, create another VPC to isolate those resources further and then peer the VPCs together using VPC peering.
  • You said to use IAM for access control, so what do you see in terms of IAM best practices for AWS VPC security?
    • So the biggest thing is when you deal with either third party products or customized software that you’ve built on your web server. Anything that requires use of AWS API resources needs to use a secret key and an access key. So you can, a, store that secret key and access key in some type of text file and have the software reference it, or, b, the easier way, set the minimum level of permissions that you need in an IAM role, create this role and attach it to your instance at start time. Now, the role itself can only be assigned at start time. However, the permissions of the role can be modified on the fly, so you can add or subtract permissions should the need arise.
  • So when you’re troubleshooting complex VPC networks, what approaches and tools have you found to be the most effective?
    • We love to use traceroute.  I love to use ICMP when it’s available, but I also like to use the AWS Flow Logs which will actually allow me to see what’s going on in a much more granular basis, and also leveraging some tools like CloudTrail to make sure that I know what API calls were made by what user in order to really understand what’s gone on.
  • What do you recommend for VPN and intrusion detection?
    • There’s a lot of them that are available. We’ve got some experience with Cisco, Juniper and Fortinet for things like VPN, or whoever you have, and as far as IDS goes, Alert Logic is a popular solution. I see a lot of customers that use that particular product. Some people like some of the open source tools like Snort and things like that as well.
  • Any recommendations around secure jump box configurations within AWS VPC?
    • If you’re going to deploy a lot of your resources within a private subnet and you’re not going to use a VPN, one of the ways that a lot of people do this is to just configure a quick jump box. What I mean by that is you take a server, whether it be Windows or Linux, depending upon your preference, put it in the public subnet, and only allow access from a limited set of IP addresses over either SSH from a Linux perspective or RDP from a Windows perspective. It puts you inside of the network and allows you to gain access to the resources within the private subnet.
  • And do jump boxes sometimes also work with VPNs? Are people using VPNs to access the jump box too for added security?
    • Some people do that. Sometimes they’ll just put a jump box inside of the VPN and you VPN into that. It’s just a matter of your organization’s security policies.
  • Any performance or further considerations when designing the VPC?
    • It’s important to understand that each instance has its own available amount of resources, not only from a network IO perspective but from a storage IO perspective. It’s also important to understand what a 10 Gb instance means. Take the c3.8xlarge, which is a 10 Gb instance. That’s not 10 Gb worth of network bandwidth plus 10 Gb worth of storage bandwidth. That’s 10 Gb for the instance, right? So if you’re pushing a high amount of IO from both a network and a storage perspective, that 10 Gb is shared between the network traffic and access to the underlying EBS storage network. This confuses a lot of people: it’s 10 Gb for the instance, not a dedicated 10 Gb network pipe.
  • Why would you use an elastic IP instead of the virtual IP?
    • What if you had some people that wanted to access this from outside of AWS? We do have some customers whose servers are primarily within AWS, but who want access to files from machines that are not inside of the AWS VPC. So you could leverage it that way. This was also the first way that we created HA, to be honest, because it was the only method at first that allowed us to share an IP address and work around some of the public cloud limitations, like no layer 2 broadcast and things like that.
  • Looks like this next question is around AWS VPC tagging. Any best practices, for example?
    • Yeah, so I see people that basically take different services, like web and database or application, and they tag everything within the security groups and everything with that particular tag.  For people that are deploying SoftNAS, I would recommend just using the name SoftNAS as my tag.  It’s really up to you, but I do suggest that you use them.  It will make your life a lot easier.
  • Is storage level encryption a feature of SoftNAS Cloud NAS or does the customer need to implement that on their own?  
    • So as of our version that’s available today which is 3.3.3, on AWS you can leverage the underlying EBS encryption. We provide encryption for Amazon S3 as well, and coming in our next release which is due out at the end of the month we actually do offer encryption, so you can actually create encrypted storage pools which encrypts the underlying disk devices.
  • Virtual IP for HA: does the subnet this VIP would be part of get added to the AWS VPC routing table?
    • It’s automatic. When you select that VIP address in the private subnet, it will automatically add a host route into the routing table, which allows clients to route that traffic.
  • Can you clarify the requirement on an HA pair with two NICs, that both have to be in the same subnet?
    • So each instance needs two NIC ENIs, and each of those ENIs needs to be in the same subnet.
  • Do you have HA capability across regions? What options are available if you need to replicate data across regions? Is the data encrypted at rest, in flight, etc.?
    • We cannot do HA with automatic failover across regions.  However, we can do SnapReplicate across regions. Then you can do a manual failover should the need arise. The data you transfer via SnapReplicate is sent over SSH and across regions. You could replicate across data centers. You could even replicate across different cloud markets. 
  • Can AWS VPC peering span across regions?
    • The answer is no, it cannot.
  • Can we create an HA endpoint to AWS for use with direct connect?
    • Absolutely. You could go ahead and create an HA pair of SoftNAS Cloud NAS, leverage direct connect from your data center and access that highly available storage.
  • When using S3 as a backend and a write cache, is it possible to read the file while it’s still in cache?
    • The answer is, yes, it is. I’m assuming that you’re speaking about the eventual consistency challenges of the AWS standard region; with the manner in which we deal with S3 where we treat each bucket as its own hard drive, we do not have to deal with the S3 consistency challenges.
  • Regarding subnets, the example where a host lives in two subnets, can you clarify both these subnets are in the same AZ?
    • In the examples that I’ve used, each of these subnets is within the same VPC, and I’m assuming each has its own availability zone. So, again, each subnet is in its own separate availability zone, and if you want to discuss more, please feel free to reach out and we can discuss that.
  • Is there a white paper on the website dealing with the proper engineering for SoftNAS Cloud NAS for our storage pools, EBS vs. S3, etc.?
    • Click here to access the white paper, which is our SoftNAS architectural paper, co-written by SoftNAS and Amazon Web Services, covering proper configuration settings, options, etc. We also have a pre-sales architectural team that can help you out with best practices, configurations, and those types of things from an AWS perspective. Please contact us and someone will be in touch.
  • How do you solve the HA and failover problem?
    • We actually do a couple of different things here. When we set up HA, we create an S3 bucket that acts as a third party witness. Before anything takes over as the master controller, it queries the S3 bucket and makes sure that it’s able to take over. The other thing that we do is, after a take-over, the old source node is shut down. You don’t want to have a situation where the node is flapping up and down and it’s kind of up but kind of not and it keeps trying to take over, so if there’s a take-over that occurs, whether it’s manual or automatic, the old source node in that particular configuration is shut down. That information is logged, and we’re assuming that you’ll go out and investigate why the failover took place. If there are questions about that in a production scenario, our support team is always available.
  • Can we monitor SoftNAS logs using Splunk or Sumo, and if so, which log files should we monitor?
    • Absolutely, but we also provide some built-in log monitoring. The key logs here are going to be the SnapReplicate.log, which covers all of your SnapReplicate and HA functionality, and the snserv.log, which is the SoftNAS server log and covers all things done via StorageCenter. And because this is a Linux operating system, monitoring log messages is a good idea. That’s just a smattering of those.

Looks like that’s all the questions. I’d like to thank everyone for taking the time to attend today’s webinar.  Our goal here was to pass on some of the lessons that we’ve learned from configuring AWS VPC deployments for our customers. As you’re making that journey to deploying in the cloud or you’re already operational in the cloud, maybe this webinar saved you time from tripping over some of the things that other customers have tripped over.

We’d like to invite you now to try SoftNAS Cloud NAS on AWS. We do have a 30-day trial. If you click the blue button below, you can try SoftNAS Cloud NAS on the AWS platform with a $100 AWS credit. There are also some links there about how you can contact us if you have any more questions and you’d like to get more information.

Claim my $100 AWS Credit

softnas aws vpc credit