Elias Andualem
A research project aiming to create an application that takes Amharic sentences in text or audio form. If the input is audio, the application first transcribes it into Amharic text. The transcribed sentences are then translated into the native grammar of Ethiopian Sign Language, and the final result is displayed as a graphical animation. The project was carried out by a team of four.
Collected and preprocessed training data.
Built an application that displays the translated sentences as graphical animation.
Deployed the grammar-translation and speech-recognition models.
A pre-trained model for word lemmatization.
A speech recognition system.
A grammar-translation system from Amharic to Ethiopian Sign Language.
Python, C#, TensorFlow, Keras, Flask, Unity game engine, and Blender.
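A minimal sketch of how the end-to-end flow could be wired together; every function name below is a hypothetical placeholder standing in for one of the project's components, not its actual API.

```python
# Hypothetical wiring of the Amharic-to-sign-language pipeline. Each function
# is an illustrative stub for a trained model or the Unity/Blender renderer,
# not the project's actual API.

def transcribe(audio_signal):           # speech recognition: audio -> Amharic text
    raise NotImplementedError

def lemmatize(text):                    # pre-trained word lemmatization model
    raise NotImplementedError

def translate_to_esl_grammar(lemmas):   # Amharic -> Ethiopian Sign Language grammar
    raise NotImplementedError

def render_animation(esl_sentence):     # graphical animation in Unity/Blender
    raise NotImplementedError

def amharic_to_sign_animation(input_data, is_audio: bool):
    """Full pipeline: (audio | text) -> lemmas -> ESL grammar -> animation."""
    text = transcribe(input_data) if is_audio else input_data
    return render_animation(translate_to_esl_grammar(lemmatize(text)))
```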
The project explores the impact of COVID-19 on people's livelihoods via a dashboard built on Twitter data. The system provides insight into the impact COVID-19 has had on people's livelihoods and aids in understanding people's knowledge, attitudes, and perceptions towards COVID-19.
Analyzed COVID-19 data scraped from Twitter, following the CRISP-DM methodology.
Topic modeling (see the sketch after this list)
Sentiment analysis
Python, MySQL, NLP libraries, Streamlit, Docker, and Heroku
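A minimal sketch of the topic-modeling step, assuming gensim stands in for the unspecified NLP libraries and that the tweets have already been cleaned and tokenized.

```python
# Minimal topic-modeling sketch with gensim LDA; the tokenized tweets below
# are placeholders, and gensim stands in for the unspecified "NLP libraries".
from gensim import corpora
from gensim.models import LdaModel

tokenized_tweets = [
    ["lockdown", "jobs", "income", "lost"],
    ["vaccine", "trust", "government", "rollout"],
    ["masks", "school", "children", "safety"],
]

dictionary = corpora.Dictionary(tokenized_tweets)
corpus = [dictionary.doc2bow(doc) for doc in tokenized_tweets]

# Fit an LDA model and print the top words per topic.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```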
Deep learning has changed the game in speech recognition by introducing end-to-end models that take in an audio signal and directly output transcriptions. Working as a team, we built an automatic end-to-end speech recognition pipeline for Amharic.
Preprocessed audio data: resampling, normalization, and noise removal.
Performed data augmentation by adding noise and changing the speed and pitch of the audio.
Implemented audio feature extraction using log-Mel spectrograms.
Built a model with two main neural network modules: three layers of residual convolutional neural networks to learn relevant audio features, followed by a stack of bidirectional recurrent neural networks that leverage the learned features (see the sketch after this list).
Python, TensorFlow, Keras, NLP libraries, DVC, MLflow, and Streamlit
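A minimal Keras sketch of the two-module architecture: residual CNN blocks over log-Mel spectrograms, followed by bidirectional RNNs. The layer widths, number of Mel bins, and vocabulary size are illustrative assumptions, not the project's actual hyperparameters.

```python
# Sketch of the described architecture: residual CNNs to learn audio features
# from log-Mel spectrograms, then bidirectional RNNs. All sizes illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=32):
    """Two same-padded convolutions with a skip connection."""
    skip = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    return layers.ReLU()(layers.Add()([x, skip]))

n_mels, vocab_size = 80, 224                       # assumed values
inputs = layers.Input(shape=(None, n_mels, 1))     # (time, Mel bins, channels)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
for _ in range(3):                                 # three residual CNN layers
    x = residual_block(x)
# Collapse the frequency axis so each time step becomes one feature vector.
x = layers.Reshape((-1, n_mels * 32))(x)
for _ in range(2):                                 # bidirectional RNN stack
    x = layers.Bidirectional(layers.GRU(256, return_sequences=True))(x)
outputs = layers.Dense(vocab_size + 1, activation="softmax")(x)  # +1 CTC blank
model = tf.keras.Model(inputs, outputs)
model.summary()
```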
Took part in a Kaggle analytics competition to determine the state of digital learning in 2020, collaborating with a team of three to identify digital learning trends. To that end, we examined how engagement with digital learning relates to factors such as district demography, broadband access, and state/national policies and events.
Applied data wrangling techniques to the districts' digital learning engagement data.
Performed exploratory data analysis.
Conducted univariate and multivariate graphical analyses using Seaborn and Plotly.
Applied time-series clustering with the tslearn library to find out which communities responded similarly (see the sketch after this list).
Used the CausalImpact library to analyze the impact of school closures on the states with the highest and lowest pct_black/Hispanic, checking whether COVID-19 disproportionately impacted student engagement with online learning platforms in areas with more Black or Hispanic students.
Python, Pandas, Seaborn, Plotly, tslearn, and CausalImpact
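A minimal sketch of the time-series clustering step with tslearn; the random series below are placeholders for the districts' engagement curves, and the cluster count is an assumption.

```python
# Cluster communities by the shape of their engagement time series using
# tslearn's k-means with Dynamic Time Warping. Data is a random placeholder.
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

rng = np.random.default_rng(42)
series = rng.random((20, 30, 1))           # 20 communities, 30 time steps

series = TimeSeriesScalerMeanVariance().fit_transform(series)
km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=42)
labels = km.fit_predict(series)
print(labels)                              # communities that responded similarly
```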
Causal inference is a technique for determining whether a causal explanation is correct. It works by controlling for confounding variables, giving us a better understanding of causes and effects and allowing more informed decisions.
Applied data wrangling techniques to the Breast Cancer Wisconsin (Diagnostic) dataset available on Kaggle.
Carried out exploratory analysis to identify features with a higher correlation to the diagnosis.
Performed feature extraction and scaling.
Performed feature discretization using a supervised learning approach.
Used CausalNex to develop Bayesian network models that go beyond correlation and consider causal relationships (see the sketch after this list).
Trained a logistic regression model on the entire dataset with all features.
Python, CausalNex, and scikit-learn
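A minimal CausalNex sketch, assuming NOTEARS structure learning followed by a Bayesian network fit; the tiny binary table stands in for the discretized diagnostic features.

```python
# Learn a causal structure with NOTEARS, then fit a Bayesian network on it.
# The data is a small binary placeholder for the discretized Kaggle features.
import pandas as pd
from causalnex.network import BayesianNetwork
from causalnex.structure.notears import from_pandas

df = pd.DataFrame({
    "radius":    [0, 1, 1, 0, 1, 0, 0, 1],
    "texture":   [1, 1, 0, 0, 1, 0, 1, 1],
    "diagnosis": [0, 1, 1, 0, 1, 0, 1, 1],
})

sm = from_pandas(df)                    # NOTEARS structure learning
sm.remove_edges_below_threshold(0.1)    # prune weak edges
sm = sm.get_largest_subgraph()          # BayesianNetwork needs one component

bn = BayesianNetwork(sm)
df_fit = df[list(sm.nodes)]             # keep only columns present in the graph
bn = bn.fit_node_states(df_fit).fit_cpds(df_fit, method="BayesianEstimator",
                                         bayes_prior="K2")
print(sm.edges)                         # learned causal edges
```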
In any business, there is a strong desire to analyze performance and predict future sales. By collecting historical data on previous sales, businesses can analyze their performance and forecast their future. This matters for delivering the best customer experience and avoiding losses, ensuring the business remains sustainable.
Explored the data using Pandas, Matplotlib, and NumPy, with modular code.
Created new features.
Applied a random forest model that takes multiple variables as input and predicts sales.
Applied an LSTM recurrent neural network that takes six weeks of historical sales data and predicts future sales (see the sketch after this list).
Calculated feature importances to see which variables most affect the number of sales and customers.
Deployed the TensorFlow model in a production environment with a Streamlit dashboard.
Python, scikit-learn, TensorFlow, DVC, MLflow, Streamlit, and Docker
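A minimal Keras sketch of the LSTM forecaster: a six-week window of sales in, the next period's sales out. The synthetic series, 42-day daily window, and layer width are illustrative assumptions.

```python
# LSTM that maps a six-week (42-day, assumed daily) sales window to the next
# day's sales. The sine-wave series is a placeholder for real store sales.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW = 42  # six weeks of daily sales (assumed granularity)

sales = np.sin(np.linspace(0, 20, 500)) + np.random.rand(500) * 0.1
X = np.stack([sales[i:i + WINDOW] for i in range(len(sales) - WINDOW)])
y = sales[WINDOW:]

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, 1)),
    layers.LSTM(64),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[..., None], y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[-1:][..., None]))  # forecast for the next day
```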
A/B testing is a user experience research methodology that compares two or more versions of a given service against each other to find out which variation performs better.
Invariant metrics: used to ensure that the experiment setup (how we presented a change to part of the population) is not inherently flawed, e.g., the number of users in each group.
Evaluation metrics: metrics we expect to change and that are relevant to the goals we aim to achieve, e.g., brand awareness.
Hypothesis testing for A/B testing: we used hypothesis testing to test two hypotheses (see the sketch after this list):
Null Hypothesis: There is no difference in brand awareness between the exposed and control groups in the current case.
Alternative Hypothesis: There is a difference in brand awareness between the exposed and control groups in the current case.
Carried out three types of classification analysis to predict whether a user responds yes to brand awareness, namely logistic regression, decision trees, and XGBoost, then compared the models to assess the best-performing one(s).
Python, scikit-learn, XGBoost, SciPy, DVC, MLflow, Streamlit, Docker, and Heroku
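A minimal sketch of the hypothesis test: comparing the yes-response rate for brand awareness between the exposed and control groups with a two-proportion z-test. The counts are placeholders, and statsmodels is an assumption (the tools list names SciPy).

```python
# Two-proportion z-test on placeholder counts; statsmodels is assumed here,
# though any two-proportion test (e.g. a SciPy chi-square) would work.
from statsmodels.stats.proportion import proportions_ztest

yes_counts = [620, 540]     # exposed vs. control users answering "yes"
group_sizes = [4000, 4000]  # users per group (an invariant metric to verify)

# H0: no difference in brand awareness; H1: there is a difference.
z_stat, p_value = proportions_ztest(count=yes_counts, nobs=group_sizes)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: brand awareness differs between the groups.")
else:
    print("Fail to reject H0 at the 5% level.")
```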
PythonLidar is a Python package for fetching, manipulating, and visualizing point cloud data. The package accepts boundary polygons as a Pandas DataFrame and returns a Python dictionary with all available years of data and a GeoPandas grid-point file with elevations encoded in the requested CRS.
Downloads point cloud data from the EPT resource on AWS cloud storage (see the sketch after this list).
Terrain visualization
Data transformation
Python, PDAL, laspy, GeoPandas, Pydocs, and Heroku
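A minimal sketch of fetching point cloud data from a public EPT resource with PDAL's Python bindings; the bucket URL, bounds, and output filename are illustrative, not PythonLidar's actual interface.

```python
# Fetch ground points from a public EPT dataset on AWS with a PDAL pipeline.
# The dataset URL and bounds are illustrative examples, not the package's API.
import json
import pdal

pipeline_def = {
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/IA_FullState/ept.json",
            "bounds": "([-10425171, -10423171], [5164494, 5166494])",
        },
        {"type": "filters.range", "limits": "Classification[2:2]"},  # ground only
        {"type": "writers.las", "filename": "ground_points.las"},
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_def))
count = pipeline.execute()  # runs the pipeline, returns the point count
print(f"Fetched {count} ground points")
```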
Prior to investing in a new company, a thorough analysis of the data behind the company, and above all the identification of opportunities to boost profitability, is essential. The main goal of this project is to analyze Telco's data to determine whether the company is worth buying or selling.
Applied data wrangling techniques to telecommunication data.
Carried out exploratory analysis to observe customer behavior in the telecommunication industry.
Performed dimensionality reduction using PCA.
Produced self-explanatory visualizations with Plotly, Seaborn, and Matplotlib to derive rich insights for improving customer experience and reducing the churn rate.
Computed engagement, experience, and satisfaction metrics.
Provided a comprehensive report on the analysis to management for decision-making.
Performed customer clustering using the k-means algorithm (see the sketch after this list).
Python, scikit-learn, SciPy, Streamlit, Docker, and Heroku
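A minimal scikit-learn sketch of the PCA and k-means steps on synthetic stand-in data; the component and cluster counts are assumptions.

```python
# Scale features, reduce dimensionality with PCA, then segment customers with
# k-means. The random matrix stands in for the real telecom customer metrics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 10))  # 200 customers, 10 engagement/experience features

X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X_2d)
print(np.bincount(segments))  # customers per segment
```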