AWS-Redshift Best practices

-----------------------------------------------------------------------------------------------------------------------------------------

     Which in turn affects query performance by reducing the number of I/O operations and minimizing the memory required to process queries.

   

 Query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. Designate both the dimension table's primary key and the fact table's corresponding foreign key as the DISTKEY.

Types of Distribution Styles

KEY Distribution Style :Distributes table rows based on the values in one specified column, the distribution key (DISTKEY). 

CREATE TABLE orders (

    order_id INT,

    customer_id INT,

    order_date DATE,

    amount DECIMAL(10, 2)

)

DISTKEY (customer_id);er_id); 

EVEN Distribution Style :Distributes table rows evenly across all nodes in the cluster. 

CREATE TABLE table_name (

    column1 data_type,

    column2 data_type,

    ...

)

DISTSTYLE EVEN; 

ALL Distribution Style :Distributes a full copy of the table to every node. 

         Dimension tables to use ALL distribution.

DISTSTYLE ALL is a table distribution style that stores a full copy of the entire table on each node in the Redshift cluster. 

When to Use DISTSTYLE ALL

Disadvantages

Advantages

Here’s an example of creating a small dimension table with DISTSTYLE ALL: 

CREATE TABLE products (

    product_id INT,

    product_name VARCHAR(100),

    category VARCHAR(50)

) DISTSTYLE ALL; 

          Amazon Redshift choose the appropriate distribution style, specify AUTO for the distribution style.

A SORTKEY (sort key) is a feature used to optimize the performance of queries by defining how the data in a table is sorted. Amazon Redshift stores your data on disk in sorted order according to the sort key, which can significantly speed up query execution, especially for large tables. 

There are two types of sort keys: compound and interleaved. 

Compound Sort Key

Interleaved Sort Key

Compound Sort Key Example

CREATE TABLE sales (

    sale_id INT,

    sale_date DATE,

    customer_id INT,

    amount DECIMAL(10, 2)

) COMPOUND SORTKEY (sale_date, customer_id); 


Interleaved Sort Key Example


CREATE TABLE sales (

    sale_id INT,

    sale_date DATE,

    customer_id INT,

    amount DECIMAL(10, 2)

) INTERLEAVED SORTKEY (sale_date, customer_id); 

Best Practices



CREATE TABLE region (

  R_REGIONKEY bigint NOT NULL,

  R_NAME varchar(25),

  R_COMMENT varchar(152))

diststyle all;


CREATE TABLE nation (

  N_NATIONKEY bigint NOT NULL,

  N_NAME varchar(25),

  N_REGIONKEY bigint,

  N_COMMENT varchar(152))

diststyle all;


create table customer (

  C_CUSTKEY bigint NOT NULL,

  C_NAME varchar(25),

  C_ADDRESS varchar(40),

  C_NATIONKEY bigint,

  C_PHONE varchar(15),

  C_ACCTBAL decimal(18,4),

  C_MKTSEGMENT varchar(10),

  C_COMMENT varchar(117))

diststyle all;


create table orders (

  O_ORDERKEY bigint NOT NULL,

  O_CUSTKEY bigint,

  O_ORDERSTATUS varchar(1),

  O_TOTALPRICE decimal(18,4),

  O_ORDERDATE Date,

  O_ORDERPRIORITY varchar(15),

  O_CLERK varchar(15),

  O_SHIPPRIORITY Integer,

  O_COMMENT varchar(79))

distkey (O_ORDERKEY)

sortkey (O_ORDERDATE);


create table part (

  P_PARTKEY bigint NOT NULL,

  P_NAME varchar(55),

  P_MFGR  varchar(25),

  P_BRAND varchar(10),

  P_TYPE varchar(25),

  P_SIZE integer,

  P_CONTAINER varchar(10),

  P_RETAILPRICE decimal(18,4),

  P_COMMENT varchar(23))

diststyle all;


create table supplier (

  S_SUPPKEY bigint NOT NULL,

  S_NAME varchar(25),

  S_ADDRESS varchar(40),

  S_NATIONKEY bigint,

  S_PHONE varchar(15),

  S_ACCTBAL decimal(18,4),

  S_COMMENT varchar(101))

diststyle all;                                                              


create table lineitem (

  L_ORDERKEY bigint NOT NULL,

  L_PARTKEY bigint,

  L_SUPPKEY bigint,

  L_LINENUMBER integer NOT NULL,

  L_QUANTITY decimal(18,4),

  L_EXTENDEDPRICE decimal(18,4),

  L_DISCOUNT decimal(18,4),

  L_TAX decimal(18,4),

  L_RETURNFLAG varchar(1),

  L_LINESTATUS varchar(1),

  L_SHIPDATE date,

  L_COMMITDATE date,

  L_RECEIPTDATE date,

  L_SHIPINSTRUCT varchar(25),

  L_SHIPMODE varchar(10),

  L_COMMENT varchar(44))

distkey (L_ORDERKEY)

sortkey (L_RECEIPTDATE);


create table partsupp (

  PS_PARTKEY bigint NOT NULL,

  PS_SUPPKEY bigint NOT NULL,

  PS_AVAILQTY integer,

  PS_SUPPLYCOST decimal(18,4),

  PS_COMMENT varchar(199))

diststyle even;

Loading Data

https://catalog.us-east-1.prod.workshops.aws/event/dashboard/en-US/workshop/gettingstarted/lab2#loading-data

COPY region FROM 's3://redshift-immersionday-labs/data/region/region.tbl.lzo'

iam_role default

region 'us-west-2' lzop delimiter '|' COMPUPDATE PRESET;


Note: A few key takeaways from the above COPY statements.