Biruk Getaneh Gezmu

Project Highlights

Data Warehouse Tech Stack with Postgres, dbt, and Airflow

In this project, I built an ELT pipeline covering the process from data extraction through to presentation. The workflow starts by loading the raw CSV data from the source into a Postgres database, transforming it with dbt, and presenting the output with Redash. Airflow orchestrates the workflow and schedules a daily job to sync the data from the source to the Postgres data warehouse.
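A minimal sketch of what such a daily DAG can look like, assuming Airflow 2.x, a local Postgres instance, and hypothetical file paths, connection strings, and table names; this is illustrative, not the project's actual code.

```python
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from sqlalchemy import create_engine


def load_raw_csv():
    """Read the source CSV and load it into the raw schema in Postgres."""
    df = pd.read_csv("/data/source/raw_trips.csv")  # hypothetical source file
    engine = create_engine("postgresql://user:pass@localhost:5432/warehouse")
    df.to_sql("raw_trips", engine, schema="raw", if_exists="replace", index=False)


with DAG(
    dag_id="daily_elt_sync",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",  # daily sync from source to the warehouse
    catchup=False,
) as dag:
    # EL: pull raw CSV data into Postgres
    extract_load = PythonOperator(task_id="load_raw_csv", python_callable=load_raw_csv)

    # T: dbt runs the transformations inside the warehouse
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run --profiles-dir .",
    )

    extract_load >> transform
```

Keeping extraction and transformation as separate tasks lets Airflow retry the dbt run on failure without re-loading the raw data.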


Speech-to-Text Data Collection

In this group project, my team and I created a web app that receives audio recordings of a given text displayed to users on our front-end application. From an earlier project we had clearly seen that the amount of data is a crucial factor behind the effectiveness of deep learning models. Therefore, in this project we built a data collection system by integrating three Apache tools: Kafka, Airflow, and Spark. Kafka serves as the message broker, Airflow as the event listener and workflow initiator, and Spark handles the data transformation and cleaning.
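A minimal sketch of the broker step, assuming the kafka-python client and a hypothetical "audio.submissions" topic; the real topic names and payload schema belong to the project, not this example.

```python
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each submission event points at the uploaded audio and the prompt text it
# was recorded for; downstream, Airflow listens for these events and triggers
# the Spark transformation and cleaning job.
event = {
    "audio_path": "s3://bucket/recordings/rec_0001.wav",  # hypothetical path
    "prompt_id": "sentence_42",
    "duration_sec": 6.3,
}
producer.send("audio.submissions", value=event)
producer.flush()
```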

Scalable Data Migration from PostgreSQL to MySQL Database

In this project, I migrated data from a PostgreSQL database to MySQL. The resulting data can be explored, queried, and visualized using Superset, while Airflow takes care of task scheduling and workflow management.
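A minimal sketch of a chunked table copy from Postgres to MySQL, assuming pandas and SQLAlchemy; the connection strings and table name are placeholders, and in the project this kind of step would run inside an Airflow task.

```python
import pandas as pd
from sqlalchemy import create_engine

pg = create_engine("postgresql://user:pass@localhost:5432/source_db")
my = create_engine("mysql+pymysql://user:pass@localhost:3306/target_db")

# Stream the table in chunks so large tables never have to fit in memory,
# which is what keeps the migration scalable.
for chunk in pd.read_sql_table("trips", pg, chunksize=50_000):
    chunk.to_sql("trips", my, if_exists="append", index=False)
```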


Advertisement Data Analysis

In this project, I used data registered at different steps of the creative creation and ad placement process to perform data engineering and machine learning prediction. The data elements coming from the different steps were linked accordingly. After ingesting the data into a data lake, I modeled and merged it into a single unit in the data warehouse and exposed that interface for the machine learning task.
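A minimal sketch of the linking-and-merging step, assuming pandas and hypothetical file names and join keys; the actual pipeline's tables and keys differ, but the idea is to join the per-step extracts into one warehouse unit the model reads from.

```python
import pandas as pd

# Each frame stands in for a data-lake extract from one step of the
# creative creation and ad placement process.
briefs = pd.read_parquet("lake/briefs.parquet")          # campaign briefs
creatives = pd.read_parquet("lake/creatives.parquet")    # designed assets
placements = pd.read_parquet("lake/placements.parquet")  # ad performance

# Link the steps on their shared keys to form the single unit that is
# exposed to the machine learning task.
merged = (
    briefs.merge(creatives, on="campaign_id", how="left")
          .merge(placements, on="creative_id", how="left")
)
merged.to_parquet("warehouse/ad_features.parquet", index=False)
```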