Avoiding Serverless Cost Gotchas – Cost of Serverless Operation


In the next section, we’ll explore the tools and techniques available for estimating serverless costs. Even with those tools, the cost of a serverless application that is distributed across many managed services, interacting constantly as traffic flows into and through your system, can be difficult to predict.

Many of these hard-to-predict costs are secondary costs of an application. Primary costs are incurred by usage of services that are core to your application, such as Lambda function invocations or DynamoDB storage. Secondary costs are typically incurred by service integrations: services may make API requests to integrate with other services, such as retrieving KMS encryption keys to encrypt or decrypt data at rest or in transit, or to carry out operational tasks, such as sending logs to CloudWatch.

In Chapter 6, we recommended that you deploy to production as soon as possible to truly understand the behavior of your application. The same applies to understanding your costs (cost monitoring is covered in the last section of this chapter): estimation will only get you so far, and for an accurate picture you’ll need to run your application in production. But not everything needs to be learned the hard way through billing surprises!

The following list describes some of the most common billing gotchas to be aware of when designing and operating your serverless application:

CloudWatch costs

CloudWatch is usually at or near the top of any serverless bill. You will be charged per GB of data ingested by CloudWatch Logs and per GB of data stored. Keep logging to a minimum (see Chapter 4 for how this can also improve security and reduce the risk of logging sensitive data) and prefer tracing with X-Ray where viable (see Chapter 8 for more information about tracing). Always set a retention period for your log groups (logs are kept indefinitely by default) and only retain log data for as long as it is useful. Consider moving log data to a cheaper alternative, such as S3 Glacier, if you require a long-term archive. CloudWatch alarms can also become expensive, as you are charged per metric listed in the alarm for every hour that the alarm is active during the month. Per the advice in Chapter 8, any alarms without a clear purpose should be removed.
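To act on the retention advice, a one-off script can find log groups that still keep data forever and give them a retention period. The following is a minimal sketch, assuming the client passed in behaves like the boto3 CloudWatch Logs client (its `describe_log_groups` paginator and `put_retention_policy` call). The function name `apply_default_retention` and the 30-day default are our own illustrative choices, not recommendations from the text. Log groups without a retention policy omit the `retentionInDays` field, which is how the sketch spots them.

```python
from typing import Any, List


def apply_default_retention(logs_client: Any, retention_days: int = 30) -> List[str]:
    """Set a retention period on every log group that keeps logs indefinitely.

    `logs_client` is assumed to behave like boto3.client("logs"); it is passed
    in so the function can be exercised without AWS credentials.
    Returns the names of the log groups that were updated.
    """
    updated: List[str] = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            # No "retentionInDays" key means the group never expires (the default).
            if "retentionInDays" not in group:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"],
                    retentionInDays=retention_days,
                )
                updated.append(group["logGroupName"])
    return updated
```

For example, `apply_default_retention(boto3.client("logs"), retention_days=14)`. The value must be one of the retention periods the CloudWatch Logs API accepts (14 and 30 both are). Setting retention in your infrastructure-as-code templates as well keeps new log groups from starting out unbounded.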

Transfer-out costs

A common operations strategy is to transfer logs and metrics from CloudWatch to other systems for aggregation and analysis. If you do this for logs, you will be charged per GB of data transferred out from AWS to the internet. For metrics, understand your options before choosing one. Be wary of third-party tools that use the CloudWatch API to poll for metrics: you pay for the API requests on every poll, which becomes expensive at scale. Prefer CloudWatch metric streams where possible, as they push metric updates to a destination instead of relying on repeated polling.
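Polling gets expensive because the bill scales with both the number of metrics and how often they are fetched. A rough estimate might look like the following sketch, assuming every poll fetches every metric and that the API is billed per 1,000 metrics requested (as `GetMetricData` is at the time of writing; check the CloudWatch pricing page). `monthly_polling_cost` is a hypothetical helper, and the price is an input rather than a quoted rate.

```python
def monthly_polling_cost(
    metric_count: int,
    poll_interval_minutes: float,
    price_per_1000_metrics: float,
    days_in_month: int = 30,
) -> float:
    """Rough monthly cost of polling CloudWatch metrics through the API.

    Assumes each poll requests every metric once and that requests are billed
    per 1,000 metrics. The price is an input; look up the current rate.
    """
    if poll_interval_minutes <= 0:
        raise ValueError("poll_interval_minutes must be positive")
    polls_per_month = days_in_month * 24 * 60 / poll_interval_minutes
    metrics_requested = metric_count * polls_per_month
    return metrics_requested / 1000 * price_per_1000_metrics
```

With 1,000 metrics polled every 5 minutes over a 30-day month, the tool requests 8.64 million metrics, so the cost is 8,640 times the per-1,000 price. Halving the interval or doubling the metric count doubles that figure.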

Expensive caching

Caching is often viewed as a cost-saving technique: fewer requests mean less compute and therefore lower cost. However, this is not always true with serverless compute on Lambda, so you should always estimate costs based on volume with and without a cache. For example, in some high-volume scenarios applying API Gateway caching in front of your Lambda functions can get very expensive, as the cache is billed for every hour it is provisioned, at an hourly rate determined by its size. Consider applying a caching strategy in future iterations once you have validated the need for it in production and confirmed it will reduce costs. In addition to API Gateway caching, you should also evaluate the other caching options available for your architecture, including Amazon Elastic File System (EFS), Amazon ElastiCache, Amazon ElastiCache Serverless, and Amazon CloudFront.
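Estimating costs with and without a cache can be as simple as comparing two sums. The sketch below assumes that every request pays a gateway fee, that only cache misses reach the backend (for example, your Lambda function), and that the cache adds a fixed hourly charge for as long as it is provisioned. The function name, its parameters, and the 730-hour month are illustrative assumptions; plug in current prices for your Region.

```python
def monthly_cost(
    requests: float,
    gateway_cost_per_request: float,
    backend_cost_per_request: float,
    cache_hit_rate: float = 0.0,
    cache_hourly_rate: float = 0.0,
    hours_per_month: float = 730.0,
) -> float:
    """Estimate a month's request-path cost, with or without a cache.

    Every request pays the gateway fee; only cache misses reach the backend.
    The cache itself adds a fixed hourly charge. Leave cache_hit_rate and
    cache_hourly_rate at 0 for the no-cache case.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    gateway = requests * gateway_cost_per_request
    backend = requests * (1.0 - cache_hit_rate) * backend_cost_per_request
    cache = cache_hourly_rate * hours_per_month
    return gateway + backend + cache
```

Calling it once with the defaults and once with your expected hit rate and cache rate gives the comparison. The cache only pays for itself when the backend cost it avoids, `requests * cache_hit_rate * backend_cost_per_request`, exceeds the fixed `cache_hourly_rate * hours_per_month`.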

Services calling other services

Many AWS services make use of other services. The associated costs of these underlying operations can always be found on the pricing page for a service, but they’re not always obvious when making architectural choices, especially without applying the context of volume to your estimates. For example, Amazon Athena can be a very inexpensive service for medium-volume SQL queries when taken at face value ($5.00 per TB of data scanned, at the time of writing). However, Athena will incur costs for other services, such as when making requests to the Amazon S3 API to query data, using the AWS Glue Data Catalog to model data, and retrieving encryption keys from AWS KMS if the source data in S3 is encrypted. Be careful with estimates and always monitor costs in production when releasing new pieces of infrastructure.
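Applying volume to an Athena estimate means adding the secondary services alongside the headline scan price. The following is a minimal sketch that uses the $5.00 per TB figure from the text and treats each secondary service as a request count multiplied by a per-request price. `athena_monthly_cost` is a hypothetical helper, and the request counts and prices are inputs you would need to estimate yourself from the S3, Glue, and KMS pricing pages.

```python
from typing import Dict, Tuple

# Figure quoted in the text at the time of writing; verify current pricing.
ATHENA_PRICE_PER_TB_SCANNED = 5.00


def athena_monthly_cost(
    queries_per_month: int,
    tb_scanned_per_query: float,
    secondary: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Break a month of Athena usage into primary and secondary costs.

    `secondary` maps a service name to (requests_per_query, price_per_request),
    for example S3 requests, Glue Data Catalog requests, or KMS calls.
    """
    costs = {
        "athena": queries_per_month * tb_scanned_per_query * ATHENA_PRICE_PER_TB_SCANNED
    }
    for service, (requests_per_query, price_per_request) in secondary.items():
        costs[service] = queries_per_month * requests_per_query * price_per_request
    # Sum the per-service entries before the "total" key is added.
    costs["total"] = sum(costs.values())
    return costs
```

For example, `athena_monthly_cost(1000, 0.001, {"s3": (10, 0.001), "kms": (2, 0.01)})` uses illustrative rather than real prices, and its breakdown shows the secondary services (about 10 and 20) outweighing the Athena scan cost (about 5). The shape of the breakdown matters more than the numbers.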

Infinite Lambda loops

It is possible to create an infinite loop of Lambda function invocations when a function writes to the same resource that triggered it. For example, a function may be recursively invoked if it puts a message on the SQS queue that invoked it. Lambda’s recursive loop detection feature can automatically detect and stop such recursive invocations for supported event sources, including SQS.
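Built-in loop detection covers supported services, and some teams also bound loops in application code as an extra guard. The sketch below shows one hypothetical way to do that for an SQS-triggered function that enqueues follow-up work: each message carries a `hops` counter, and messages that have already been forwarded `MAX_HOPS` times are dropped. The names `process_sqs_event`, `MAX_HOPS`, and the `hops` field are our own, and `send` stands in for whatever publishes to the queue.

```python
import json
from typing import Any, Callable, Dict

MAX_HOPS = 3  # Hypothetical limit chosen for this sketch.


def process_sqs_event(event: Dict[str, Any], send: Callable[[str], None]) -> int:
    """Handle an SQS batch, re-enqueueing follow-up work with a hop counter.

    Messages already forwarded MAX_HOPS times are not re-enqueued, so a bug
    cannot produce an unbounded loop. Returns the number of messages forwarded.
    """
    forwarded = 0
    for record in event.get("Records", []):
        message = json.loads(record["body"])
        hops = int(message.get("hops", 0))
        if hops >= MAX_HOPS:
            # Break the chain; in practice you might log or alert here.
            continue
        # Real processing for the message would happen here.
        message["hops"] = hops + 1
        send(json.dumps(message))
        forwarded += 1
    return forwarded
```

In a handler you might pass `lambda body: sqs.send_message(QueueUrl=queue_url, MessageBody=body)`, where `sqs` is `boto3.client("sqs")` and `queue_url` is your (hypothetical) queue URL.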

Non-production costs

Be careful with pay-per-use in non-production environments. If you have services that are continually called, for example via scheduled jobs, continuous integration pipelines, or third-party webhooks, the costs can add up. As recommended in Chapter 6, keep pre-production environments to an absolute minimum. Reducing non-production costs is also closely associated with the first point in this list: CloudWatch costs. CloudWatch alarms will usually only be useful in your production environment, and you can reduce the amount of log data ingested in non-production environments by adjusting logging levels to suit your needs there.
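One way to apply logging levels per environment is to derive the level from configuration when the function starts. The following minimal sketch assumes two hypothetical environment variables, `STAGE` and `LOG_LEVEL`, and defaults non-production stages to `WARNING` so they ingest less log data, while still letting you raise the level temporarily when debugging.

```python
import logging
import os
from typing import Mapping, Optional

# Hypothetical per-stage defaults: less verbose outside production to cut
# CloudWatch Logs ingestion.
DEFAULT_LEVELS = {"prod": "INFO", "staging": "WARNING", "dev": "WARNING"}


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Set the root logger's level from STAGE and LOG_LEVEL and return it."""
    env = os.environ if env is None else env
    stage = env.get("STAGE", "dev")
    # An explicit LOG_LEVEL wins; otherwise fall back to the stage default.
    level_name = env.get("LOG_LEVEL", DEFAULT_LEVELS.get(stage, "WARNING"))
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        # Unknown names fall back to WARNING rather than failing at startup.
        level = logging.WARNING
    logging.getLogger().setLevel(level)
    return level
```

Calling `configure_logging()` once at module load in your handler file applies the stage default; setting `LOG_LEVEL=DEBUG` on a development function turns verbose logging back on without a code change.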
