Containerization vs. FaaS

September 18, 2017 · Shirish Kamath · 2 min read

Containerization is a major leap from using fat VMs to run isolated environments for your app. But I'm not entirely sold on "Dockerize all the things".

I <3 the workflow of coding, testing, and deploying Docker containers to AWS Elastic Beanstalk. If you are in the business of building and deploying microservices, this is a workflow you should definitely consider. And if you're on GitLab, you can set up your microservice's tests to run inside the Docker container on GitLab's CI.

In production, you can easily auto-scale each microservice independently, depending on which components are experiencing traffic spikes and need to scale horizontally.

All of the above discusses how useful Docker is for shipping code with a stable, reproducible runtime environment. From a development perspective, Docker has been more or less painless. From a "unit of deployment" perspective, Docker has been painless. From a performance perspective (in production), Docker's effect has been, quite frankly, unknown.

A lot of tools have been developed to capture and analyze webapp performance metrics over the years. When VMs came along, most of these tools could simply be installed in VMs instead of the base OS and they'd continue doing their jobs. Containerization sometimes requires specialized tools for monitoring and capturing metrics. There are tons of new tools for this new space that are effectively reinventing stuff that worked at the OS/hypervisor level.

Or, you know, buy into AWS's stack and use EB or ECS + CloudWatch...

An alternative is to build services on a Function-as-a-Service (FaaS) platform such as AWS Lambda or Google Cloud Functions, but I believe we aren't there yet. For instance, connecting to RDS from AWS Lambda isn't as smooth as connecting from EC2. Unless you make your RDS instance publicly accessible (terrible idea), Lambda functions need to run inside the RDS VPC, which means a cold start takes a few extra seconds because the VPC network interface (NIC) has to be configured first. A workaround is to keep the Lambda functions warm with regular "heartbeat" requests, but this kind of defeats the purpose of Lambda. Even if you solve that issue, you'd also have to set up a VPC NAT Gateway for your Lambda functions to access the internet. You also run the risk of exhausting database connections to RDS when many concurrently running Lambda instances start and stop quickly.
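To make the heartbeat and connection concerns concrete, here is a minimal sketch of a Lambda-style handler that returns early on warm-up pings and opens its database connection once per container instead of once per request. The `warmup` event key and the helper names are assumptions made up for illustration, and `sqlite3` is only a stand-in so the sketch runs anywhere; a real function would open its RDS connection at that point.

```python
import sqlite3

# Module-level state survives between invocations while the container stays warm.
_connection = None


def get_connection():
    """Open the database connection once per container and reuse it afterwards."""
    global _connection
    if _connection is None:
        # Stand-in only: a real function would connect to RDS here.
        _connection = sqlite3.connect(":memory:")
    return _connection


def handler(event, context):
    # Hypothetical convention: the scheduled heartbeat sends {"warmup": true}.
    if event.get("warmup"):
        # Return immediately so warm-up pings never touch the database.
        return {"warmed": True}

    conn = get_connection()
    row = conn.execute("SELECT 1").fetchone()
    return {"warmed": False, "result": row[0]}
```

Reusing the connection caps each warm container at one connection, but it does not limit how many containers Lambda spins up, so the exhaustion risk under heavy concurrency remains.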

So for a balance of performance and uptime, one option is to have Lambda functions that require database access talk to an EC2 instance via pub-sub/HTTP/whatever, and have that instance connect to RDS with a connection pool. Is that really worth it though? The bottom line is that there are major tradeoffs involved, and Lambda at the moment doesn't solve all problems magically.
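As a rough sketch of what the pooling side of that intermediary might look like, the snippet below keeps a fixed number of open connections and hands them out one at a time. `ConnectionPool`, `make_connection`, and `run_query` are hypothetical names, and `sqlite3` again stands in for the real RDS driver. In practice you would more likely lean on the pool that ships with your driver or ORM.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers borrow a connection and always give it back."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection even if the caller's query failed.
            self._idle.put(conn)


def make_connection():
    # Stand-in for opening a connection to RDS.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = ConnectionPool(size=2, connect=make_connection)


def run_query(sql):
    """What the HTTP or pub-sub endpoint would call on behalf of a Lambda function."""
    with pool.connection() as conn:
        return conn.execute(sql).fetchall()
```

The database never sees more than `size` connections, no matter how many Lambda invocations are in flight; the extra requests wait in line or time out instead.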

We use Lambda with DynamoDB (key/value store) in production, and I must say, it's a breeze. DynamoDB has a somewhat unintuitive pricing model, so costs aren't very clear upfront. When traffic picks up, you may have to shell out more or risk degraded performance. But from an uptime perspective, the two make a very reliable pair. DynamoDB is by no means an impressive data store and it severely lacks advanced querying capabilities, but for simple CRUD use-cases, it's good enough.
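For a sense of what "simple CRUD" means here, the sketch below wraps the three key-based calls a boto3 DynamoDB `Table` exposes (`put_item`, `get_item`, `delete_item`). The table name `notes` and the `id` key are assumptions for illustration, not our actual schema, and `InMemoryTable` is a hypothetical stand-in so the example runs without AWS credentials.

```python
def create_item(table, item):
    table.put_item(Item=item)
    return item


def read_item(table, item_id):
    response = table.get_item(Key={"id": item_id})
    # DynamoDB omits the "Item" key entirely when nothing matches.
    return response.get("Item")


def delete_item(table, item_id):
    table.delete_item(Key={"id": item_id})


class InMemoryTable:
    """Local stand-in that mimics the call shapes of a boto3 Table."""

    def __init__(self):
        self._items = {}

    def put_item(self, Item):
        self._items[Item["id"]] = dict(Item)
        return {}

    def get_item(self, Key):
        item = self._items.get(Key["id"])
        return {"Item": dict(item)} if item is not None else {}

    def delete_item(self, Key):
        self._items.pop(Key["id"], None)
        return {}


# Inside Lambda, the real table would come from boto3 instead, e.g.:
#   import boto3
#   table = boto3.resource("dynamodb").Table("notes")  # "notes" is a made-up name
table = InMemoryTable()
create_item(table, {"id": "42", "title": "hello"})
```

Everything goes through the primary key, which is exactly where DynamoDB is comfortable. Anything resembling an ad hoc query is where it starts to hurt.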

Apart from this, FaaS changes the way apps are developed. You have to consider using a framework such as Serverless or Apex or something else for structuring your code. There are also some neat ways to build REST APIs with Swagger defs on Lambda. But if you choose this path, know that things are going to be different from your typical monolithic Rails/Django app. You may have to reorganize your code for it to effectively run on Lambda. Simple Sinatra/Express/Restify/Flask apps are decent candidates for Lambda provided you are okay with the datastore performance and cost considerations.
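To illustrate the kind of reorganization involved, here is a minimal sketch of a single Lambda handler doing the routing a small Flask or Express app would normally do, assuming the event shape of an API Gateway REST proxy integration (`httpMethod` and `path` in, `statusCode` and `body` out). The routes and function names are made up for the example. Frameworks like Serverless typically push you toward one function per route instead.

```python
import json


def health(event):
    return 200, {"status": "ok"}


def list_notes(event):
    # Placeholder payload; a real handler would read from the datastore.
    return 200, {"notes": []}


# Hypothetical route table keyed by (HTTP method, path).
ROUTES = {
    ("GET", "/health"): health,
    ("GET", "/notes"): list_notes,
}


def handler(event, context):
    route = ROUTES.get((event.get("httpMethod"), event.get("path")))
    if route is None:
        status, payload = 404, {"error": "not found"}
    else:
        status, payload = route(event)
    # The proxy integration expects the body as a string, not a dict.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

There is no server loop and no long-lived process here: each request arrives as a plain dict, and whatever state you need has to live in the datastore.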