Projects

Birhanu Gebisa Muleta

Highlights

Build a data engineering pipeline that records Amharic and Swahili speakers reading digital texts on in-app and web platforms. The end-to-end ETL pipeline uses Apache Kafka, Apache Spark, and Apache Airflow to receive users' voice audio files, transform them, and load them into a data warehouse for later use in text-to-speech machine learning projects.
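The ingestion step above can be sketched as a function that packages one uploaded recording into the kind of JSON payload a Kafka producer would publish. The field names and language codes here are illustrative assumptions, not the project's actual schema:

```python
import hashlib
import json
import time

def build_recording_record(speaker_id, language, audio_path, text):
    """Package one uploaded recording as a JSON-serializable record.

    In the pipeline this payload would be published to a Kafka topic
    (e.g. with a KafkaProducer); the schema here is a hypothetical
    stand-in for the project's real message format.
    """
    if language not in ("am", "sw"):  # Amharic / Swahili ISO 639-1 codes
        raise ValueError(f"unsupported language: {language}")
    return {
        "speaker_id": speaker_id,
        "language": language,
        "audio_path": audio_path,
        "text": text,
        # A content hash lets downstream stages deduplicate re-uploads.
        "text_sha1": hashlib.sha1(text.encode("utf-8")).hexdigest(),
        "ingested_at": time.time(),
    }

record = build_recording_record("spk-001", "am", "uploads/0001.wav", "hello")
payload = json.dumps(record).encode("utf-8")  # bytes a producer would send
```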

Approach

  • Building a data pipeline can be challenging, especially when portability, flexibility, and scalability all have to be taken into account.

  • Apache Kafka is a distributed event-streaming platform, Apache Airflow is a workflow orchestrator, and Apache Spark is a distributed data-processing engine; together these open-source tools let teams build, execute, and monitor reliable data workflows.

  • To overcome these challenges, Docker is a well-known solution: the whole multi-service stack is containerized and launched from a single docker-compose file, which keeps the pipeline portable and reproducible.

  • Finally, the cleaned audio-text pairs are stored in an Amazon S3 bucket.

  • You can find the implementation of this project here.
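The cleaning step in the list above can be sketched as a small transform run before the pairs are loaded to S3. In the real pipeline this would be a Spark transformation followed by an upload (e.g. via boto3); the cleaning rules below (whitespace collapse, dropping empty or audio-less rows) are an illustrative assumption:

```python
import re

def clean_pairs(pairs):
    """Normalize (audio_path, transcript) pairs before warehouse/S3 load.

    Illustrative stand-in for the pipeline's Spark transformations:
    collapse runs of whitespace in the transcript and drop rows with a
    missing audio file or an empty transcript.
    """
    cleaned = []
    for audio_path, text in pairs:
        text = re.sub(r"\s+", " ", text).strip()
        if not audio_path or not text:
            continue  # skip unusable rows
        cleaned.append((audio_path, text))
    return cleaned

pairs = [("a.wav", "  hello   world "), ("", "orphan"), ("b.wav", "")]
print(clean_pairs(pairs))  # only the usable ("a.wav", "hello world") pair survives
```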



Design and build a robust, reliable, large-scale trading data pipeline for both crypto and stock-market trading that can run various backtests and store useful artifacts in a data warehouse. Users are prompted with several stock and crypto trading options and parameters.
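A single backtest run from the pipeline above can be sketched with a toy long-only strategy. The real system runs many strategies and persists artifacts to a warehouse; the SMA-crossover rule here is an illustrative stand-in, not the project's actual strategy:

```python
def sma(prices, window):
    """Simple moving average; None until the window has filled."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out

def backtest_sma_crossover(prices, fast=3, slow=5):
    """Toy long-only SMA-crossover backtest.

    Starts from equity 1.0 and holds the asset whenever the fast SMA is
    above the slow SMA; returns final equity. A hypothetical stand-in
    for one strategy run in the project's backtesting infrastructure.
    """
    f, s = sma(prices, fast), sma(prices, slow)
    equity, holding = 1.0, False
    for i in range(1, len(prices)):
        if holding:
            equity *= prices[i] / prices[i - 1]  # apply the day's return
        if f[i] is not None and s[i] is not None:
            holding = f[i] > s[i]  # position for the next step
    return equity

final = backtest_sma_crossover([1, 2, 3, 4, 5, 6, 7, 8])
```

On a steadily rising series the strategy enters as soon as both averages exist and simply rides the trend, so final equity exceeds 1.0.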

Approach

  • Building a data pipeline can be challenging, especially when you have to take portability, flexibility, and scalability into account. Apache Kafka and Airflow are open-source streaming and workflow-orchestration platforms, and Docker keeps the stack portable; together they help create seamlessly functioning workflows to organize, execute, and monitor tasks.

  • We concentrated on the data engineering and machine learning tracks this week in order to scale the backtesting infrastructure. We integrated MLflow to track experiments and built an LSTM model for trade price prediction.

  • Furthermore, we built the data pipeline, the machine learning workflow, and the model itself using a variety of tools, including Kafka, Airflow, and Docker Compose, with backtesting to improve the effectiveness of the strategies.
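The data preparation for the LSTM price predictor mentioned above can be sketched framework-agnostically as sliding-window construction: each training example pairs a lookback window of prices with the next price. The actual training loop and MLflow run logging are not shown; this helper is an illustrative assumption:

```python
def make_windows(series, lookback):
    """Build (input_window, next_value) pairs for a sequence model.

    Illustrative preprocessing for an LSTM price predictor: slide a
    window of `lookback` prices over the series and pair each window
    with the price that immediately follows it.
    """
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return X, y

X, y = make_windows([10, 11, 12, 13, 14], lookback=3)
# X = [[10, 11, 12], [11, 12, 13]], y = [13, 14]
```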