Biruk Getaneh Gezmu
Addis Ababa, Ethiopia
About me
As a Data Engineer and Analyst with an M.Sc. in Computer Science, I possess expertise in SQL, data preprocessing, transformation, visualization, analytics, feature engineering, and modeling. I have experience building fault-tolerant, distributed, and scalable end-to-end data pipelines, and am committed to delivering effective solutions that help organizations leverage their data to drive meaningful insights and achieve their strategic goals.
Programming Languages
Python
JavaScript
Java
Scala
Automation Tools
GitHub Actions
Docker
MLflow, DVC, CML
Unit-testing
Data Engineering Tools
SQL
Python
Fivetran
dbt
Spark
Airflow
Snowflake
PostgreSQL
Salesforce Marketing Cloud
Data Analytics Tools
Microsoft Excel
Looker
Microsoft Power BI
Tableau
Spark
Machine Learning Tools
Matplotlib, Seaborn, Plotly
Scikit-learn
CNN
TensorFlow
Work Experience
Big Data Analytics Engineer
Sep 2022 - Dec 2023 | Safaricom, Ethiopia
Led extensive data management at Safaricom, a major African telecom company, during its strategic expansion into Ethiopia. I worked on overcoming data volume challenges, designing and implementing robust ETL pipelines, and ensuring a seamless process from extracting data from the data lake to loading it efficiently into the warehouse. Using Apache Airflow for workflow orchestration, I leveraged Spark (Scala) for diverse data transformations, enhancing analytical insights. I also used Superset to craft insightful reports and dynamic dashboards that support informed decision-making.
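As an illustration only, a minimal sketch of how such an Airflow DAG could submit a Spark (Scala) transformation job on its way from the data lake to the warehouse; the DAG id, job artifact, and entry class below are hypothetical placeholders, not the production names.

from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="datalake_to_warehouse_etl",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the packaged Scala transformation job to the Spark cluster.
    transform = SparkSubmitOperator(
        task_id="transform_usage_records",            # hypothetical task name
        application="/jobs/usage-transform.jar",      # hypothetical job artifact
        java_class="com.example.etl.UsageTransform",  # hypothetical entry class
        conn_id="spark_default",
    )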
Data Engineer
Jan 2023 - Apr 2024 | Exxon Mobil, US
In this remote role, I actively engaged in a customer relationship management initiative leveraging Salesforce Marketing Cloud and the Snowflake data warehouse. My core responsibilities encompassed data processing, analysis, automated workflow development, scheduled email sends and tracking, and ensuring seamless data processes. I collaborated with cross-functional teams and supported data-driven decision-making while adhering to industry best practices.
Junior Data Engineer
Jun 2022 - Oct 2022 | 10 Academy, Ethiopia
Worked on diverse Machine Learning, Data Engineering, and Web3 Engineering projects.
Mentored by industry professionals to grasp state-of-the-art tools and processes.
Completed 11 impactful projects, demonstrating practical problem-solving skills.
Collaborated effectively with cross-functional teams on multidisciplinary initiatives and gained hands-on experience with cutting-edge technologies.
Data Scientist
Oct 2021 - Jan 2022 | EthioAI, Ethiopia
I have contributed to projects aimed at addressing different community issues. Specifically, I have worked on the following two projects:
Utilizing deep learning techniques to develop an Amharic speech synthesis system.
Employing deep convolutional neural networks to classify plant leaf diseases.
Lecturer
Nov 2015 - Sep 2022 | Haramaya University, Ethiopia
Four years of IT teaching, fostering interactive learning, guiding students in innovative research projects, advising and evaluating graduating students’ practical projects for industry collaboration, leading external training sessions, and showcasing strong organizational skills.
Courses taught include: Fundamental and Advanced Database Systems, Data Structure and Algorithm, Fundamental and Advanced Programming (with Java), and Network Design and Configuration
Education
10 Academy (May 2022 - July 2022)
Machine Learning, Data Engineering, and Web3 Engineering Training
Key Courses:
Designing and building data pipelines (ELT and ETL)
Building machine learning models and deployment
MLOps and CI/CD
Dashboard and data visualization
Community building and career thinking
Technical writing and blogging
Haramaya University (Oct 2019 - Oct 2021)
M.Sc. in Computer Science
Key Courses:
Artificial Intelligence,
Natural Language Processing,
Image Processing,
Big Data,
Algorithm Analysis and Design,
Research Methods
Debre Berhan University (Nov 2011 - Jul 2015)
B.Sc. in Information Technology
Key Courses:
Applied and Discrete Mathematics,
Probability and Statistics,
Fundamentals and Advanced Programming,
Object Oriented Programming,
Data Structure and Algorithm,
Fundamentals and Advanced Database
Projects
Data Warehouse Tech Stack with Postgres, dbt, and Airflow
In this project I built an ELT pipeline covering the full process from data extraction to presentation. The workflow reads the raw CSV data from the source into a PostgreSQL database, transforms it using dbt, and presents the output using Redash. Airflow orchestrates the workflow and schedules a daily job to sync the data from the source into the Postgres data warehouse.
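A minimal sketch of such a daily sync DAG, assuming hypothetical file paths, connection strings, table names, and dbt project directory (none taken from the original project):

from datetime import datetime
import pandas as pd
from sqlalchemy import create_engine
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def load_raw_csv():
    # Hypothetical source file, Postgres connection string, and raw table name.
    engine = create_engine("postgresql://user:pass@localhost:5432/warehouse")
    df = pd.read_csv("/data/raw/trips.csv")
    df.to_sql("raw_trips", engine, if_exists="replace", index=False)

with DAG(
    dag_id="elt_postgres_dbt",          # hypothetical DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Extract-load step: copy the raw CSV into Postgres.
    extract_load = PythonOperator(task_id="load_raw_csv", python_callable=load_raw_csv)
    # Transform step: run the dbt models against the warehouse.
    transform = BashOperator(task_id="dbt_run", bash_command="cd /opt/dbt_project && dbt run")
    extract_load >> transform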
Speech-to-Text Data Collection
In this group project, my team and I created a web app that receives audio recordings of a given text displayed to users on our front-end application. From an earlier project we had clearly noticed that the amount of data was a crucial factor behind the effectiveness of deep learning models. Therefore, in this project we built a data collection system by integrating three Apache tools: Kafka, Airflow, and Spark. Kafka serves as the message broker, Airflow acts as the event listener and initiator, and Spark handles the data transformation and cleaning.
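As a sketch of the ingestion side only, assuming a hypothetical topic name and message shape, the web backend could publish each submitted recording to Kafka like this, with Spark consuming the topic downstream for cleaning:

import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_recording(text_id: str, audio_path: str) -> None:
    # Each message links the displayed text to the uploaded audio file.
    producer.send("audio_submissions", {"text_id": text_id, "audio_path": audio_path})
    producer.flush()

publish_recording("text_0042", "s3://bucket/recordings/text_0042.wav")  # hypothetical values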
Scalable Data Migration from PostgreSQL to MySQL Database
In this project I migrated data from a PostgreSQL database to a MySQL database. The resulting data is explored, queried, and visualized using Apache Superset, while Airflow handles task scheduling and workflow management.
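A minimal sketch of one migration task, with hypothetical connection strings and table name, streaming a table from PostgreSQL into MySQL in chunks:

import pandas as pd
from sqlalchemy import create_engine

pg = create_engine("postgresql://user:pass@localhost:5432/source_db")     # hypothetical source
my = create_engine("mysql+pymysql://user:pass@localhost:3306/target_db")  # hypothetical target

def migrate_table(table: str, chunksize: int = 50_000) -> None:
    # Stream the table in chunks so large tables never have to fit in memory.
    for i, chunk in enumerate(pd.read_sql_table(table, pg, chunksize=chunksize)):
        chunk.to_sql(table, my, if_exists="append" if i else "replace", index=False)

migrate_table("tweets")  # hypothetical table name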
Advertisement Data Analysis
In this project, I used data registered at different steps of the creative creation and ad placement process for both data engineering and machine learning prediction. The data elements coming from the different steps were linked accordingly. After ingesting the data into a data lake, I modeled and merged it into a single unit in the data warehouse and exposed an interface for the machine learning task.
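A minimal sketch of the linking step, assuming hypothetical lake paths, join keys, and output location:

import pandas as pd

# Hypothetical data lake extracts for each step of the process.
briefs = pd.read_parquet("lake/briefs.parquet")
designs = pd.read_parquet("lake/designs.parquet")
placements = pd.read_parquet("lake/placements.parquet")

# Link the steps into one analysis-ready table exposed to the ML task.
merged = (
    briefs.merge(designs, on="campaign_id", how="left")
          .merge(placements, on="creative_id", how="left")
)
merged.to_parquet("warehouse/ad_performance.parquet", index=False)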