An ETL pipeline implemented with PostgreSQL on AWS RDS, Amazon S3, and Python. This project demonstrates how to create a database, define tables, and populate them programmatically with Python, showcasing my proficiency in cloud services and database management.
CORE FEATURES
Cloud-hosted PostgreSQL database setup on AWS RDS.
Python-based automation for creating and managing database tables.
Database schema design for structured data storage.
TECHNOLOGIES USED
Python, PostgreSQL, AWS RDS, AWS S3
PROJECT WORKFLOW
Creating or acquiring a sample dataset.
Setting up S3 buckets.
Setting up AWS RDS.
Writing the Python scripts (sketches for each are under CODE SNIPPETS below).
upload_to_s3.py
create_table.py
transform_data.py
load_to_postgres.py
Querying the filtered dataset in the database.
CODE SNIPPETS
# SQL to create the target table in PostgreSQL
create_table_query = """
CREATE TABLE IF NOT EXISTS sales_data (
    order_id SERIAL PRIMARY KEY,
    order_date DATE,
    customer_name TEXT,
    price NUMERIC
);
"""
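A minimal sketch of how create_table.py might run this statement against the RDS instance with psycopg2; the endpoint, database name, and credentials below are placeholders, not the project's actual values.
import psycopg2

# Placeholder connection details for the RDS instance (not the real values)
conn = psycopg2.connect(
    host='your-rds-endpoint.amazonaws.com',
    dbname='etl_db',
    user='postgres',
    password='your-password',
    port=5432
)

with conn.cursor() as cur:
    cur.execute(create_table_query)  # create sales_data if it does not exist
conn.commit()
conn.close()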
import boto3
# Create an S3 client
s3 = boto3.client('s3')
# Parameters
bucket_name = 'etl-project-bucket-final'
s3_file_name = 'sales_data.csv'
download_path = 'downloaded_sales_data.csv'
# Download the file
s3.download_file(bucket_name, s3_file_name, download_path)
print('File downloaded successfully')
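upload_to_s3.py is not reproduced above; a minimal sketch of what uploading the raw CSV to the project bucket could look like with boto3 (the local file name is an assumption):
import boto3

# Upload the raw dataset to the project bucket
s3 = boto3.client('s3')
local_file = 'sales_data.csv'  # assumed local file name
bucket_name = 'etl-project-bucket-final'
s3_key = 'sales_data.csv'

s3.upload_file(local_file, bucket_name, s3_key)
print('File uploaded successfully')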
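transform_data.py filters the downloaded dataset before loading. A sketch with pandas, assuming a simple price-based filter; the rule, file names, and column names are assumptions based on the table schema:
import pandas as pd

# Read the file downloaded from S3 and apply a simple filter
df = pd.read_csv('downloaded_sales_data.csv', parse_dates=['order_date'])

# Example transformation: keep rows with a positive price (assumed rule)
filtered = df[df['price'] > 0]

filtered.to_csv('filtered_sales_data.csv', index=False)
print(f'{len(filtered)} rows written to filtered_sales_data.csv')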
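A sketch of how load_to_postgres.py could insert the filtered rows into sales_data and how the final workflow step (querying the filtered dataset) might check the result; connection details are placeholders as above:
import csv
import psycopg2

conn = psycopg2.connect(
    host='your-rds-endpoint.amazonaws.com',  # placeholder endpoint
    dbname='etl_db', user='postgres', password='your-password', port=5432
)

insert_query = """
INSERT INTO sales_data (order_date, customer_name, price)
VALUES (%s, %s, %s);
"""

with conn.cursor() as cur:
    with open('filtered_sales_data.csv', newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            cur.execute(insert_query, (row['order_date'], row['customer_name'], row['price']))

    # Verify the load by querying the filtered dataset back from the database
    cur.execute("SELECT order_id, customer_name, price FROM sales_data LIMIT 5;")
    for record in cur.fetchall():
        print(record)

conn.commit()
conn.close()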
SCREENSHOT
FULL PROJECT
The full project can be found on GitHub here