Richard C Valente, 9 February 2021
Amazon offers several solutions for data storage, each with its own trade-offs and use cases. The three we will focus on are Redshift, Aurora and Athena. Although each of these services allows SQL to be used on the dataset, the underlying infrastructure is very different. We will discuss these services in terms of a few key metrics: response time, scaling and cost.
Amazon Redshift has an underlying infrastructure that integrates with Amazon S3. The underlying data is kept in S3-backed storage and paired with dedicated high-performance compute nodes. This combination yields a highly optimized configuration: the data can be replicated across multiple zones, and a cluster can scale up to 128 processing nodes, making it one of the quickest data access options available today.
Redshift is useful for projects that have high traffic workloads, a large data volume and need low latency. It is also the most expensive of the three solutions.
Response Time: 1st place
less than 1 second
Scaling: 1st place
Petabytes
Cost: 3rd place
Minimum $180/month
Amazon Aurora is essentially a customized relational database engine that is compatible with PostgreSQL (and MySQL). It is a SQL server that runs on a type of EC2 server instance. The use cases for SQL servers are more limited than those of the more recent NoSQL services, which do not impose restrictions as strict as SQL's. The server can be moved to more powerful instances depending on needs and can be replicated up to 15 times to support low-latency access from different regions.
Aurora is useful for projects that have high-traffic workloads, a low data volume and need low latency. It is an order of magnitude cheaper than Redshift but still two orders of magnitude more expensive than Athena.
Response Time: 2nd place
less than 1 second (small queries)
Scaling: 3rd place
terabytes
Cost: 2nd place
Minimum $15/month
Amazon Athena is a serverless technology that is similar to Redshift in that it uses S3 as the underlying main storage medium. The main difference is that Athena does not keep a dedicated server. Instead, Athena is given the format (schema) of the data in the S3 bucket in advance and assumes the data follows that structure when it runs a query. This lets it scale to large dataset sizes, but at the price of higher latency, which stems from its open-ended storage formats and the lack of dedicated servers.
Athena is useful for projects that have low-traffic workloads, a high data volume and can handle higher latency. It is orders of magnitude cheaper for datasets that are infrequently changed and updated.
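The idea of declaring the data format in advance and applying it only at query time can be illustrated with a small sketch. This is not Athena's API. It is a plain Python analogy, and the names `RAW_OBJECT`, `SCHEMA`, `scan` and `total_amount_over` are hypothetical, with a string standing in for a CSV file in an S3 bucket.

```python
import csv
import io

# Hypothetical raw object contents, standing in for a CSV file in an S3 bucket.
RAW_OBJECT = "1,alice,34.5\n2,bob,12.0\n3,carol,100.0\n"

# Schema declared ahead of time, separately from the data: (column name, parser).
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def scan(raw_text, schema):
    """Yield one dict per row, applying the schema only while reading."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: parse(value) for (name, parse), value in zip(schema, fields)}


def total_amount_over(raw_text, schema, threshold):
    """Sum the amounts above a threshold by rescanning the raw data."""
    return sum(
        row["amount"]
        for row in scan(raw_text, schema)
        if row["amount"] > threshold
    )


if __name__ == "__main__":
    print(total_amount_over(RAW_OBJECT, SCHEMA, 20.0))
```

Because nothing is parsed or indexed ahead of time, every query pays the cost of reading the raw data again, which is one way to picture where the extra latency comes from.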
Response Time: 3rd place
multiple seconds
Scaling: 2nd place
Petabytes
Cost: 1st place
Minimum $0.05/month
(scales depending on usage)
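As a quick sanity check on the cost comparison, the minimum monthly prices quoted in this article can be compared by powers of ten. The sketch below uses only those quoted figures; actual AWS pricing varies by region, instance type and usage, and the helper name `orders_of_magnitude` is hypothetical.

```python
import math

# Minimum monthly prices quoted in this article (USD); real pricing varies.
MIN_MONTHLY_USD = {"Redshift": 180.0, "Aurora": 15.0, "Athena": 0.05}


def orders_of_magnitude(higher, lower):
    """Return how many whole powers of ten separate two prices."""
    return math.floor(math.log10(higher / lower))


if __name__ == "__main__":
    prices = MIN_MONTHLY_USD
    print("Redshift vs Aurora:", orders_of_magnitude(prices["Redshift"], prices["Aurora"]))
    print("Aurora vs Athena:", orders_of_magnitude(prices["Aurora"], prices["Athena"]))
```

With these figures Redshift's minimum is 12 times Aurora's (about one order of magnitude), and Aurora's is roughly 300 times Athena's (about two orders of magnitude).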
When choosing between these solutions it's important to understand the long-term use cases of your data and application. This includes assessing major factors such as cost, scaling and latency.
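The rules of thumb given for each service can be condensed into a small helper. This is a minimal sketch of the guidance in this article, not an official AWS recommendation. The function name `suggest_service` and its three boolean inputs are hypothetical simplifications, and combinations the article does not cover return `None`.

```python
def suggest_service(high_traffic, large_data, needs_low_latency):
    """Map the article's three rules of thumb to a service name.

    Returns None for combinations the article does not address.
    """
    if high_traffic and large_data and needs_low_latency:
        return "Redshift"  # high traffic, large data volume, low latency
    if high_traffic and not large_data and needs_low_latency:
        return "Aurora"    # high traffic, low data volume, low latency
    if not high_traffic and large_data and not needs_low_latency:
        return "Athena"    # low traffic, high data volume, latency tolerated
    return None


if __name__ == "__main__":
    print(suggest_service(high_traffic=False, large_data=True, needs_low_latency=False))
```

A real decision would also weigh cost, since the three services differ by orders of magnitude in minimum monthly price.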