An ETL pipeline implemented with PostgreSQL on AWS RDS, Amazon S3, and Python. This project demonstrates how to create a database, define tables, and populate them programmatically with Python, showcasing my proficiency in cloud services and database management.
CORE FEATURES
Cloud-hosted PostgreSQL database setup on AWS RDS.
Python-based automation for creating and managing database tables.
Database schema design for structured data storage.
TECHNOLOGIES USED
Python, PostgreSQL, AWS RDS, AWS S3
PROJECT WORKFLOW
Creating or acquiring a sample dataset.
Setting up S3 buckets.
Setting up AWS RDS.
Writing the Python scripts (sketches for each are under CODE SNIPPETS below).
upload_to_s3.py
create_table.py
transform_data.py
load_to_postgres.py
Querying the filtered dataset in the database.
CODE SNIPPETS
# SQL to create the target table in PostgreSQL
create_table_query = """
CREATE TABLE IF NOT EXISTS sales_data (
    order_id SERIAL PRIMARY KEY,
    order_date DATE,
    customer_name TEXT,
    price NUMERIC
);
"""
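A minimal sketch of how create_table.py might run this statement against the RDS instance with psycopg2; the endpoint, database name, and credentials below are placeholders, not the project's actual values.
import psycopg2

# Placeholder connection details for the RDS instance (not the real values)
conn = psycopg2.connect(
    host='your-rds-endpoint.amazonaws.com',
    dbname='etl_db',
    user='postgres',
    password='your-password',
    port=5432
)

with conn.cursor() as cur:
    cur.execute(create_table_query)  # create sales_data if it does not exist
conn.commit()
conn.close()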
import boto3
# Create an S3 client
s3 = boto3.client('s3')
# Parameters
bucket_name = 'etl-project-bucket-final'
s3_file_name = 'sales_data.csv'
download_path = 'downloaded_sales_data.csv'
# Download the file
s3.download_file(bucket_name, s3_file_name, download_path)
print('File downloaded successfully')
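upload_to_s3.py is not reproduced above; a minimal sketch of what uploading the raw CSV to the project bucket could look like with boto3 (the local file name is an assumption):
import boto3

# Upload the raw dataset to the project bucket
s3 = boto3.client('s3')
local_file = 'sales_data.csv'  # assumed local file name
bucket_name = 'etl-project-bucket-final'
s3_key = 'sales_data.csv'

s3.upload_file(local_file, bucket_name, s3_key)
print('File uploaded successfully')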
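transform_data.py filters the downloaded dataset before loading. A sketch with pandas, assuming a simple price-based filter; the rule, file names, and column names are assumptions based on the table schema:
import pandas as pd

# Read the file downloaded from S3 and apply a simple filter
df = pd.read_csv('downloaded_sales_data.csv', parse_dates=['order_date'])

# Example transformation: keep rows with a positive price (assumed rule)
filtered = df[df['price'] > 0]

filtered.to_csv('filtered_sales_data.csv', index=False)
print(f'{len(filtered)} rows written to filtered_sales_data.csv')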
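A sketch of how load_to_postgres.py could insert the filtered rows into sales_data and how the final workflow step (querying the filtered dataset) might check the result; connection details are placeholders as above:
import csv
import psycopg2

conn = psycopg2.connect(
    host='your-rds-endpoint.amazonaws.com',  # placeholder endpoint
    dbname='etl_db', user='postgres', password='your-password', port=5432
)

insert_query = """
INSERT INTO sales_data (order_date, customer_name, price)
VALUES (%s, %s, %s);
"""

with conn.cursor() as cur:
    with open('filtered_sales_data.csv', newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            cur.execute(insert_query, (row['order_date'], row['customer_name'], row['price']))

    # Verify the load by querying the filtered dataset back from the database
    cur.execute("SELECT order_id, customer_name, price FROM sales_data LIMIT 5;")
    for record in cur.fetchall():
        print(record)

conn.commit()
conn.close()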
SCREENSHOT
FULL PROJECT
The full project can be found on GitHub here