Monday, September 14, 2020

How I reduced AWS expenses by 95%

This is a short tale of how I cut 95% of my AWS expenses by using CloudTrail and Athena to find and fix the cause of the high costs.
My case is unique. So is yours. This is not general advice on how to cut costs.

It started when I opened My Billing Dashboard and looked at the Monthly costs by service report. I had recently added functionality to my service and wanted to see how the new resources affected my monthly bill.
It had been a while since I last reviewed the monthly bill.
I noticed that the cost of one service had increased sharply, and so had my total monthly cost.

It was the Key Management Service (kms).



I didn't remember using this service explicitly anywhere in my code.
I started searching CloudTrail to see who, or what, was using this service.



This provided a lot of information, but it was not easy to analyze. To help with that, CloudTrail offers an option to create an Athena table, which you can see in the top right corner of the previous image.
Athena is a query service that makes it easy to analyze data in S3 using standard SQL. It is a very useful tool.
I started grouping and counting the events in Athena and found that most of the kms Decrypt events were made by my QuickSight user.
Amazon QuickSight is the AWS Business Intelligence service. I use it to create and share business reports.
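The "who is calling Decrypt" grouping can also be sketched outside Athena. This is a minimal sketch, assuming the CloudTrail records have already been parsed into dictionaries that use CloudTrail's `eventSource`, `eventName` and `userIdentity.arn` fields; the function name and the sample records are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def top_kms_decrypt_callers(records: Iterable[Mapping], n: int = 5) -> list[tuple[str, int]]:
    """Count kms Decrypt events per calling principal, most frequent first."""
    counts = Counter(
        # userIdentity.arn identifies who made the call; use a placeholder if absent
        record.get("userIdentity", {}).get("arn", "<unknown>")
        for record in records
        if record.get("eventSource") == "kms.amazonaws.com"
        and record.get("eventName") == "Decrypt"
    )
    return counts.most_common(n)


# Hypothetical, trimmed-down CloudTrail records for illustration only
sample = [
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/quicksight-user"}},
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/quicksight-user"}},
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:role/ingest-task"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:role/ingest-task"}},
]

print(top_kms_decrypt_callers(sample))
```

Non-kms events (the `GetObject` record here) are filtered out before counting, so only Decrypt calls contribute to the ranking.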
This query shows which S3 files were decrypted by kms, and how many times each one was decrypted per day, since August 1st:

```sql
-- Count kms Decrypt calls per S3 object per day since August 1st
SELECT
  count(*) AS cnt,
  json_extract_scalar(requestparameters, '$.encryptionContext["aws:s3:arn"]') AS s3arn,
  month(from_iso8601_timestamp(eventtime)) AS m,
  day(from_iso8601_timestamp(eventtime)) AS d
FROM
  default.<cloudtrail events table name>
WHERE
  eventsource = 'kms.amazonaws.com'
  AND eventname = 'Decrypt'
  -- only Decrypt calls made by S3 on behalf of an object carry this encryption context key
  AND json_extract_scalar(requestparameters, '$.encryptionContext["aws:s3:arn"]') IS NOT NULL
  -- eventtime is an ISO 8601 string, so string comparison is chronological
  AND eventtime > '2020-08-01T00:00:00Z'
GROUP BY
  json_extract_scalar(requestparameters, '$.encryptionContext["aws:s3:arn"]'),
  month(from_iso8601_timestamp(eventtime)),
  day(from_iso8601_timestamp(eventtime))
ORDER BY m DESC, d DESC
```

In my case, a scheduled task fetches events from event sources every 30 minutes and stores them in S3. Each run writes the new event records to a new file, so every day adds 48 files.
Athena queries the S3 files, and QuickSight uses Athena as its data source.
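To make the file count concrete, here is a minimal sketch of how a half-hourly task might name its output objects. The `events/env=<env>/dt=YYYY-MM-DD/` key layout and the function names are hypothetical, since the post does not describe the actual layout; the point is only that one run every 30 minutes produces 48 objects per day.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

RUN_INTERVAL = timedelta(minutes=30)


def object_key(env: str, run_time: datetime) -> str:
    """Hypothetical S3 key for the file written by one scheduled run.

    The dt=YYYY-MM-DD prefix would act as a daily Athena partition.
    """
    return f"events/env={env}/dt={run_time:%Y-%m-%d}/{run_time:%H%M}.json"


def keys_for_day(env: str, day: datetime) -> list[str]:
    """All keys one environment produces during a single day."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    runs_per_day = int(timedelta(days=1) / RUN_INTERVAL)  # 24 h / 30 min = 48 runs
    return [object_key(env, start + i * RUN_INTERVAL) for i in range(runs_per_day)]


keys = keys_for_day("prod", datetime(2020, 9, 1, tzinfo=timezone.utc))
print(len(keys), keys[0], keys[-1])
```

Every one of these objects is a separate thing to decrypt when Athena reads the day's partition.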

I had forgotten that, when I set this up, I chose to encrypt the data with kms for security reasons.

kms pricing, at the time of writing, is $0.03 per 10,000 requests.
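Every file that needs to be decrypted counts as one request, so each time Athena reads a file, for a query or a QuickSight refresh, kms bills one Decrypt request. The number of files read, not the amount of data in them, drives the cost.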
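A worked example of that pricing, using the one-request-per-file rule above. The workload (1,000 queries a day, each reading 30 days of data) is a hypothetical illustration, not a measurement from my account; only the price comes from the text.

```python
# kms price quoted above (at the time of writing, September 2020)
PRICE_PER_REQUEST_USD = 0.03 / 10_000


def kms_cost(decrypt_requests: int) -> float:
    """Cost in USD of a given number of kms Decrypt requests."""
    return decrypt_requests * PRICE_PER_REQUEST_USD


def decrypts_per_query(days_in_range: int, files_per_day: int) -> int:
    """One Decrypt request per file the query reads."""
    return days_in_range * files_per_day


# Hypothetical workload: 1,000 queries a day, each reading 30 days of data.
QUERIES_PER_DAY = 1_000
DAYS_IN_RANGE = 30

before = decrypts_per_query(DAYS_IN_RANGE, files_per_day=48) * QUERIES_PER_DAY
after = decrypts_per_query(DAYS_IN_RANGE, files_per_day=1) * QUERIES_PER_DAY

print(f"before: {before:,} requests, ${kms_cost(before):.2f}/day")
print(f"after:  {after:,} requests, ${kms_cost(after):.2f}/day")
```

Whatever the real workload, reading one file per day instead of 48 cuts the Decrypt requests for that data by a factor of 48. Working backwards, $100 a day at this price corresponds to roughly 33 million Decrypt requests a day.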

My solution was to merge event files from the same day into one daily file. I merged the event files of past days so that each day now holds a single file instead of 48.
The merge keeps my Athena partitioning in place, and no data is lost in the process: all the events are still there.
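A minimal sketch of the merge step, assuming each half-hourly file holds newline-delimited JSON records (the post does not state the file format). `merge_daily_files` is a hypothetical helper; in practice the merged body would be written back under the same day's partition prefix, and the original objects deleted only after that write succeeds, which is what keeps the Athena partitioning intact.

```python
from __future__ import annotations

import json


def merge_daily_files(file_bodies: list[str]) -> str:
    """Concatenate one day's newline-delimited JSON files into a single body.

    Records are copied verbatim and in order; only the number of files changes.
    """
    lines: list[str] = []
    for body in file_bodies:
        for line in body.splitlines():
            if not line.strip():
                continue  # skip blank lines left by the writer
            json.loads(line)  # fail fast on a corrupt record before anything is deleted
            lines.append(line)
    return "\n".join(lines) + "\n"


# Two hypothetical half-hourly files from the same day
half_hourly = [
    '{"id": 1, "type": "signup"}\n{"id": 2, "type": "login"}\n',
    '{"id": 3, "type": "login"}\n',
]
merged = merge_daily_files(half_hourly)
print(merged, end="")
```

Comparing the record count before and after the merge is a cheap way to confirm that nothing was dropped.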
In addition, the data showed me that I had many files in all of my environments (dev, CI, staging and prod), while I only needed to keep historical files for prod. I changed the S3 retention so that prod stores event records for a year, while the other environments store files for a week.
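The retention change can be expressed as S3 lifecycle expiration rules. This sketch assumes, hypothetically, that all environments share one bucket under an `events/env=<env>/` prefix; with one bucket per environment you would apply a single rule to each bucket instead. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, and the bucket name in the comment is a placeholder.

```python
from __future__ import annotations

# Retention in days per environment, as described above
RETENTION_DAYS = {"prod": 365, "staging": 7, "ci": 7, "dev": 7}


def lifecycle_configuration(retention_days: dict[str, int]) -> dict:
    """Build an S3 lifecycle configuration with one expiration rule per environment."""
    return {
        "Rules": [
            {
                "ID": f"expire-{env}-events",
                "Filter": {"Prefix": f"events/env={env}/"},  # hypothetical key layout
                "Status": "Enabled",
                "Expiration": {"Days": days},  # objects are expired this many days after creation
            }
            for env, days in retention_days.items()
        ]
    }


config = lifecycle_configuration(RETENTION_DAYS)
# With AWS credentials configured, this could be applied with boto3:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-events-bucket", LifecycleConfiguration=config)
print(config)
```

Letting S3 expire old objects means the non-prod files never accumulate in the first place, so there is nothing extra for Athena to scan or kms to decrypt.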

Merging files and expiring unneeded ones had a dramatic effect on kms costs, reducing them by 95%: the daily kms cost dropped from over $100 to under $5.
This saves about $35,000 a year.
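A quick check of the yearly figure, taking the daily costs quoted above as bounds: since the cost went from over $100 to under $5 a day, the saving is at least $95 a day, which makes the yearly number a conservative estimate.

```python
DAYS_PER_YEAR = 365

daily_before = 100.0  # "over $100" a day, from the text (lower bound)
daily_after = 5.0     # "under $5" a day, from the text (upper bound)

daily_saving = daily_before - daily_after
yearly_saving = daily_saving * DAYS_PER_YEAR
reduction = daily_saving / daily_before

print(f"at least ${yearly_saving:,.0f} per year, {reduction:.0%} reduction")
```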

The image below shows the effect of these actions on my kms cost.
On September 4th I started merging files, and on September 10th I removed the unneeded files and updated the file retention rules.