AWS-Redshift Performance Tuning
Performance Tuning Techniques for Amazon Redshift
Amazon Redshift uses a variety of innovations to achieve up to ten times better performance than traditional databases for data warehousing and analytics workloads, they include the following:
Amazon Redshift uses a variety of innovations to achieve up to ten times better performance than traditional databases for data warehousing and analytics workloads, they include the following:
Columnar Data Storage: Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.
Advanced Compression: Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
Massively Parallel Processing (MPP): Amazon Redshift automatically distributes data and query load across all nodes. Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
Redshift Spectrum: Redshift Spectrum enables you to run queries against exabytes of data in Amazon S3. There is no loading or ETL required. Even if you don’t store any of your data in Amazon Redshift, you can still use Redshift Spectrum to query datasets as large as an exabyte in Amazon S3. When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates the query plan. Amazon Redshift determines what data is local and what is in Amazon S3, generates a plan to minimize the amount of Amazon S3 data that needs to be read, requests Redshift Spectrum workers out of a shared resource pool to read and process data from Amazon S3, and pulls results back into your Amazon Redshift cluster for any remaining processing.
Materialized views: Materialized views provide significantly faster query performance for repeated and predictable analytical workloads such as dashboards, queries from business intelligence (BI) tools, and ELT (Extract, Load, Transform) data processing. Using materialized views, you can store the pre-computed results of queries and efficiently maintain them by incrementally processing the latest changes made to the source tables. Subsequent queries referencing the materialized views use the pre-computed results to run much faster, and automatic refresh and query rewrite capabilities to simplify and automate the usage of materialized views. Materialized views can be created based on one or more source tables using filters, projections, inner joins, aggregations, grouping, functions, and other SQL constructs.
Scalability: The compute and storage capacity of on-premises data warehouses are limited by the constraints of the on-premises hardware. Redshift gives you the ability to scale compute and storage as needed to meet changing workloads.
Automatic Table Optimization (ATO) is a self-tuning capability helps you achieve the performance benefits of sort and distribution keys without manual effort. ATO continuously observes how queries interact with tables, and uses machine learning to select the best sort and distribution keys to optimize performance for the cluster’s workload. If Redshift determines that applying a key will improve cluster performance, tables will be automatically altered within hours without requiring administrator intervention. Optimizations made by the ATO feature have shown to increase cluster performance by 24% and 34% using the 3TB and 30TB TPC-DS benchmark, respectively, versus a cluster without ATO. Additional features like Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze eliminate the need for manual maintenance and tuning of Redshift clusters to get the best performance for new clusters and production workloads.
Amazon Redshift Advisor develops customized recommendations to increase performance and optimize costs by analyzing your workload and usage metrics for your cluster. Sign in to the Amazon Redshift console to view Advisor recommendations. For more information, see Working with recommendations from Amazon Redshift Advisor.