Basic SQL

Some basic SQL commands (mostly in sqldf syntax which is an R package that allows you to write SQL queries on data frames)

1) GUI and clients

SequelPro and PgAdmin

Install PostgreSQL from link

2) Distinct elements

sqldf("select distinct(container_type) from imports_customs_short")

3) Group by

df = sqldf("select sum(fob_amt_aud) as sum_fob_amt_aud, intended_export_date from exports_customs_short where transport_mode_type_code = 'S' group by intended_export_date")

4) Order by (after group by)

df = sqldf("select sum(fob_amt_aud) as sum_fob_amt_aud, intended_export_date from exports_customs_short where transport_mode_type_code = 'S' and loading_port_code = 'AUSYD' and cargo_id_type_code = 'N' group by intended_export_date order by intended_export_date")

5) Inner join operation

sqldf("select sum(exports_quarterly_raw.Weight) as sum_weight, exports_quarterly_raw.code2, lookup_code.description from exports_quarterly_raw inner join lookup_code on exports_quarterly_raw.code2 = lookup_code.tariff_code and exports_quarterly_raw.POrt_of_loading in ('Boston') and exports_quarterly_raw.mode_of_transport = 'SEA' group by exports_quarterly_raw.code2 order by sum_weight desc")

# Sum over some variable

# Join to determine description from a lookup table

6) Min, max

sqldf("select max(gross_weight), min(gross_weight) from imports_customs_short")

7) More group by dates in R

# Recast datetime columns as R Date object (makes SQL queries easier)

imports_customs_short$search_arrival_date <- (as.Date(imports_customs_short$search_arrival_date, "%d/%m/%y"))

df= sqldf("select sum(gross_weight) as sum_gross_weight, search_arrival_date from imports_customs_short group by search_arrival_date order by search_arrival_date ")

8) Case statement in SQL

sqldf("select

case

when gross_weight_unit_code = 'T' then gross_weight * 1000

when gross_weight_unit_code = 'KG' then gross_weight

when gross_weight_unit_code = 'G' then gross_weight / 1000

end

as fob_amt_aud

from df_short_cont ")

9) bool_and bool_or for any opeartion that does any or does all (link)

SELECT category

, bool_or(express_delivery) as ever_expressed

FROM orders

GROUP BY category

10) PostGres commands from command line (courtesy Claudio Aracena)

# Connecting from the PostGres SQL terminal client

psql -h <server_url> -p 5432 -d <database_name> -U <username>

# List all tables

\dt

# Select command

select * from traffic_times_friday_58_data limit 10;

# Copy result of query into csv file

\copy traffic_times_friday_58_data TO 'traffic_times_data_friday.csv' DELIMITER ',' CSV HEADER;

11) List of PostgreSQL commands and documentation (link)

12) CASE and IF statements in SQL (link)

SELECT *,

CASE

WHEN drug_table.drug in ('Paracetamol','paracetamol') THEN 1

ELSE 0

END

AS flag_paracetamol

FROM drug_table