Basic SQL
Some basic SQL commands (mostly in sqldf syntax which is an R package that allows you to write SQL queries on data frames)
1) GUI and clients
SequelPro and PgAdmin
Install PostgreSQL from link
2) Distinct elements
sqldf("select distinct(container_type) from imports_customs_short")
3) Group by
df = sqldf("select sum(fob_amt_aud) as sum_fob_amt_aud, intended_export_date from exports_customs_short where transport_mode_type_code = 'S' group by intended_export_date")
4) Order by (after group by)
df = sqldf("select sum(fob_amt_aud) as sum_fob_amt_aud, intended_export_date from exports_customs_short where transport_mode_type_code = 'S' and loading_port_code = 'AUSYD' and cargo_id_type_code = 'N' group by intended_export_date order by intended_export_date")
5) Inner join operation
sqldf("select sum(exports_quarterly_raw.Weight) as sum_weight, exports_quarterly_raw.code2, lookup_code.description from exports_quarterly_raw inner join lookup_code on exports_quarterly_raw.code2 = lookup_code.tariff_code and exports_quarterly_raw.POrt_of_loading in ('Boston') and exports_quarterly_raw.mode_of_transport = 'SEA' group by exports_quarterly_raw.code2 order by sum_weight desc")
# Sum over some variable
# Join to determine description from a lookup table
6) Min, max
sqldf("select max(gross_weight), min(gross_weight) from imports_customs_short")
7) More group by dates in R
# Recast datetime columns as R Date object (makes SQL queries easier)
imports_customs_short$search_arrival_date <- (as.Date(imports_customs_short$search_arrival_date, "%d/%m/%y"))
df= sqldf("select sum(gross_weight) as sum_gross_weight, search_arrival_date from imports_customs_short group by search_arrival_date order by search_arrival_date ")
8) Case statement in SQL
sqldf("select
case
when gross_weight_unit_code = 'T' then gross_weight * 1000
when gross_weight_unit_code = 'KG' then gross_weight
when gross_weight_unit_code = 'G' then gross_weight / 1000
end
as fob_amt_aud
from df_short_cont ")
9) bool_and bool_or for any opeartion that does any or does all (link)
SELECT category
, bool_or(express_delivery) as ever_expressed
FROM orders
GROUP BY category
10) PostGres commands from command line (courtesy Claudio Aracena)
# Connecting from the PostGres SQL terminal client
psql -h <server_url> -p 5432 -d <database_name> -U <username>
# List all tables
\dt
# Select command
select * from traffic_times_friday_58_data limit 10;
# Copy result of query into csv file
\copy traffic_times_friday_58_data TO 'traffic_times_data_friday.csv' DELIMITER ',' CSV HEADER;
11) List of PostgreSQL commands and documentation (link)
12) CASE and IF statements in SQL (link)
SELECT *,
CASE
WHEN drug_table.drug in ('Paracetamol','paracetamol') THEN 1
ELSE 0
END
AS flag_paracetamol
FROM drug_table