Elias Andualem
A research project aiming to create an application that takes Amharic sentences in text or audio form. If the input is audio, the application first transcribes it into Amharic text. The transcribed sentences are then translated into the native grammar of Ethiopian Sign Language, and the final result is displayed as a graphical animation. The project was carried out by a team of four.
Collected and preprocessed training data.
Built an application that displays the translated sentences as graphical animation.
Deployed the grammar-translation and speech-recognition models.
A pre-trained model for word lemmatization.
A speech recognition system.
A grammar-translation system from Amharic to Ethiopian Sign Language.
Python, C#, TensorFlow, Keras, Flask, Unity game engine, and Blender.
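A minimal sketch of how the end-to-end flow could be wired together; every function name below is a hypothetical placeholder standing in for one of the project's components, not its actual API.

```python
# Hypothetical wiring of the Amharic-to-sign-language pipeline. Each function
# is an illustrative stub for a trained model or the Unity/Blender renderer,
# not the project's actual API.

def transcribe(audio_signal):           # speech recognition: audio -> Amharic text
    raise NotImplementedError

def lemmatize(text):                    # pre-trained word lemmatization model
    raise NotImplementedError

def translate_to_esl_grammar(lemmas):   # Amharic -> Ethiopian Sign Language grammar
    raise NotImplementedError

def render_animation(esl_sentence):     # graphical animation in Unity/Blender
    raise NotImplementedError

def amharic_to_sign_animation(input_data, is_audio: bool):
    """Full pipeline: (audio | text) -> lemmas -> ESL grammar -> animation."""
    text = transcribe(input_data) if is_audio else input_data
    return render_animation(translate_to_esl_grammar(lemmatize(text)))
```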
The project explores the impact of COVID-19 on people's livelihoods via a dashboard built on Twitter data. The system provides insight into the impact COVID-19 has had on people's livelihoods and aids in understanding people's knowledge, attitudes, and perceptions towards COVID-19.
Analyzed COVID-19 data scraped from Twitter, following the CRISP-DM methodology.
Topic modeling (see the sketch after this list)
Sentiment analysis
Python, MySQL, NLP libraries, Streamlit, Docker, and Heroku
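A minimal sketch of the topic-modeling step, assuming gensim stands in for the unspecified NLP libraries and that the tweets have already been cleaned and tokenized.

```python
# Minimal topic-modeling sketch with gensim LDA; the tokenized tweets below
# are placeholders, and gensim stands in for the unspecified "NLP libraries".
from gensim import corpora
from gensim.models import LdaModel

tokenized_tweets = [
    ["lockdown", "jobs", "income", "lost"],
    ["vaccine", "trust", "government", "rollout"],
    ["masks", "school", "children", "safety"],
]

dictionary = corpora.Dictionary(tokenized_tweets)
corpus = [dictionary.doc2bow(doc) for doc in tokenized_tweets]

# Fit an LDA model and print the top words per topic.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```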
Deep learning has changed the game in speech recognition by introducing end-to-end models that take in an audio signal and directly output transcriptions. Working as a team, we built an automatic end-to-end speech recognition pipeline for Amharic.
Preprocessed audio data: resampling, normalization, and noise removal.
Performed data augmentation by adding noise and changing the speed and pitch of the audio.
Implemented audio feature extraction using log-Mel spectrograms.
Built a model with two main neural network modules: three layers of residual convolutional neural networks to learn relevant audio features, followed by a stack of bidirectional recurrent neural networks that leverage the learned features (see the sketch after this list).
Python, TensorFlow, Keras, NLP libraries, DVC, MLflow, and Streamlit
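A minimal Keras sketch of the two-module architecture: residual CNN blocks over log-Mel spectrograms, followed by bidirectional RNNs. The layer widths, number of Mel bins, and vocabulary size are illustrative assumptions, not the project's actual hyperparameters.

```python
# Sketch of the described architecture: residual CNNs to learn audio features
# from log-Mel spectrograms, then bidirectional RNNs. All sizes illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=32):
    """Two same-padded convolutions with a skip connection."""
    skip = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    return layers.ReLU()(layers.Add()([x, skip]))

n_mels, vocab_size = 80, 224                       # assumed values
inputs = layers.Input(shape=(None, n_mels, 1))     # (time, Mel bins, channels)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
for _ in range(3):                                 # three residual CNN layers
    x = residual_block(x)
# Collapse the frequency axis so each time step becomes one feature vector.
x = layers.Reshape((-1, n_mels * 32))(x)
for _ in range(2):                                 # bidirectional RNN stack
    x = layers.Bidirectional(layers.GRU(256, return_sequences=True))(x)
outputs = layers.Dense(vocab_size + 1, activation="softmax")(x)  # +1 CTC blank
model = tf.keras.Model(inputs, outputs)
model.summary()
```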
Took part in a Kaggle analytics competition to determine the state of digital learning in 2020, collaborating with a team of three to identify digital learning trends. To that end, we examined how engagement with digital learning relates to factors such as district demography, broadband access, and state/national policies and events.
Applied data wrangling techniques to the districts' digital learning engagement data.
Performed exploratory data analysis.
Conducted univariate and multivariate graphical analyses using Seaborn and Plotly.
Applied time-series clustering with the tslearn library to find out which communities responded similarly (see the sketch after this list).
Used the CausalImpact library to analyze the impact of school closures on the states with the highest and lowest pct_black/Hispanic, checking whether COVID-19 disproportionately impacted student engagement with online learning platforms in areas with more Black or Hispanic students.
Python, Pandas, Seaborn, Plotly, tslearn, and CausalImpact
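A minimal sketch of the time-series clustering step with tslearn; the random series below are placeholders for the districts' engagement curves, and the cluster count is an assumption.

```python
# Cluster communities by the shape of their engagement time series using
# tslearn's k-means with Dynamic Time Warping. Data is a random placeholder.
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

rng = np.random.default_rng(42)
series = rng.random((20, 30, 1))           # 20 communities, 30 time steps

series = TimeSeriesScalerMeanVariance().fit_transform(series)
km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=42)
labels = km.fit_predict(series)
print(labels)                              # communities that responded similarly
```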
Causal inference is a technique for determining whether a causal explanation is correct. It works by controlling for confounding variables, giving us a better understanding of causes and effects and allowing more informed decisions.
Applied data wrangling techniques to the Breast Cancer Wisconsin (Diagnostic) dataset available on Kaggle.
Carried out exploratory analysis to identify features with a higher correlation to the diagnosis.
Performed feature extraction and scaling.
Performed feature discretization using a supervised learning approach.
Used CausalNex to develop Bayesian network models that go beyond correlation and consider causal relationships (see the sketch after this list).
Trained a logistic regression model on the entire dataset with all features.
Python, CausalNex, and scikit-learn
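A minimal CausalNex sketch, assuming NOTEARS structure learning followed by a Bayesian network fit; the tiny binary table stands in for the discretized diagnostic features.

```python
# Learn a causal structure with NOTEARS, then fit a Bayesian network on it.
# The data is a small binary placeholder for the discretized Kaggle features.
import pandas as pd
from causalnex.network import BayesianNetwork
from causalnex.structure.notears import from_pandas

df = pd.DataFrame({
    "radius":    [0, 1, 1, 0, 1, 0, 0, 1],
    "texture":   [1, 1, 0, 0, 1, 0, 1, 1],
    "diagnosis": [0, 1, 1, 0, 1, 0, 1, 1],
})

sm = from_pandas(df)                    # NOTEARS structure learning
sm.remove_edges_below_threshold(0.1)    # prune weak edges
sm = sm.get_largest_subgraph()          # BayesianNetwork needs one component

bn = BayesianNetwork(sm)
df_fit = df[list(sm.nodes)]             # keep only columns present in the graph
bn = bn.fit_node_states(df_fit).fit_cpds(df_fit, method="BayesianEstimator",
                                         bayes_prior="K2")
print(sm.edges)                         # learned causal edges
```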
In any business, there is a strong desire to analyze performance and predict future sales. By collecting historical data on previous sales, businesses can analyze their performance and forecast their future. This matters for delivering the best customer experience and avoiding losses, ensuring the business remains sustainable.
Explored the data using Pandas, Matplotlib, and NumPy, with modular code.
Created new features.
Applied a random forest model that takes multiple variables as input and predicts sales.
Applied an LSTM recurrent neural network that takes six weeks of historical sales data and predicts future sales (see the sketch after this list).
Calculated feature importances to see which variables most affect the number of sales and customers.
Deployed the TensorFlow model in a production environment with a Streamlit dashboard.
Python, scikit-learn, TensorFlow, DVC, MLflow, Streamlit, and Docker
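A minimal Keras sketch of the LSTM forecaster: a six-week window of sales in, the next period's sales out. The synthetic series, 42-day daily window, and layer width are illustrative assumptions.

```python
# LSTM that maps a six-week (42-day, assumed daily) sales window to the next
# day's sales. The sine-wave series is a placeholder for real store sales.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW = 42  # six weeks of daily sales (assumed granularity)

sales = np.sin(np.linspace(0, 20, 500)) + np.random.rand(500) * 0.1
X = np.stack([sales[i:i + WINDOW] for i in range(len(sales) - WINDOW)])
y = sales[WINDOW:]

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, 1)),
    layers.LSTM(64),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[..., None], y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[-1:][..., None]))  # forecast for the next day
```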
A/B testing is a user experience research methodology that compares two or more versions of a given service against each other to find out which variation performs better.
Invariant metrics: used to ensure that the experiment setup (how we presented a change to part of the population) is not inherently flawed, e.g., the number of users in each group.
Evaluation metrics: metrics we expect to change and that are relevant to the goals we aim to achieve, e.g., brand awareness.
Hypothesis testing for A/B testing: we used hypothesis testing to test two hypotheses (see the sketch after this list):
Null Hypothesis: There is no difference in brand awareness between the exposed and control groups in the current case.
Alternative Hypothesis: There is a difference in brand awareness between the exposed and control groups in the current case.
Carried out three types of classification analysis to predict whether a user responds yes to brand awareness, namely logistic regression, decision trees, and XGBoost, then compared the models to assess the best-performing one(s).
Python, scikit-learn, XGBoost, SciPy, DVC, MLflow, Streamlit, Docker, and Heroku
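A minimal sketch of the hypothesis test: comparing the yes-response rate for brand awareness between the exposed and control groups with a two-proportion z-test. The counts are placeholders, and statsmodels is an assumption (the tools list names SciPy).

```python
# Two-proportion z-test on placeholder counts; statsmodels is assumed here,
# though any two-proportion test (e.g. a SciPy chi-square) would work.
from statsmodels.stats.proportion import proportions_ztest

yes_counts = [620, 540]     # exposed vs. control users answering "yes"
group_sizes = [4000, 4000]  # users per group (an invariant metric to verify)

# H0: no difference in brand awareness; H1: there is a difference.
z_stat, p_value = proportions_ztest(count=yes_counts, nobs=group_sizes)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: brand awareness differs between the groups.")
else:
    print("Fail to reject H0 at the 5% level.")
```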
PythonLidar is a Python package for fetching, manipulating, and visualizing point cloud data. The package accepts boundary polygons as a Pandas DataFrame and returns a Python dictionary with all available years of data and a GeoPandas grid-point file with elevations encoded in the requested CRS.
Downloads point cloud data from the EPT resource on AWS cloud storage (see the sketch after this list).
Terrain visualization
Data transformation
Python, PDAL, laspy, GeoPandas, Pydocs, and Heroku
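A minimal sketch of fetching point cloud data from a public EPT resource with PDAL's Python bindings; the bucket URL, bounds, and output filename are illustrative, not PythonLidar's actual interface.

```python
# Fetch ground points from a public EPT dataset on AWS with a PDAL pipeline.
# The dataset URL and bounds are illustrative examples, not the package's API.
import json
import pdal

pipeline_def = {
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/IA_FullState/ept.json",
            "bounds": "([-10425171, -10423171], [5164494, 5166494])",
        },
        {"type": "filters.range", "limits": "Classification[2:2]"},  # ground only
        {"type": "writers.las", "filename": "ground_points.las"},
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_def))
count = pipeline.execute()  # runs the pipeline, returns the point count
print(f"Fetched {count} ground points")
```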
Prior to investing in a new company, a thorough analysis of the data behind the company, and above all the identification of opportunities to boost profitability, is essential. The main goal of this project is to analyze Telco's data to determine whether the company is worth buying or selling.
Applied data wrangling techniques to telecommunication data.
Carried out exploratory analysis to observe customer behavior in the telecommunication industry.
Performed dimensionality reduction using PCA.
Produced self-explanatory visualizations with Plotly, Seaborn, and Matplotlib to derive rich insights for improving customer experience and reducing the churn rate.
Computed engagement, experience, and satisfaction metrics.
Provided a comprehensive report on the analysis to management for decision-making.
Performed customer clustering using the k-means algorithm (see the sketch after this list).
Python, scikit-learn, SciPy, Streamlit, Docker, and Heroku
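A minimal scikit-learn sketch of the PCA and k-means steps on synthetic stand-in data; the component and cluster counts are assumptions.

```python
# Scale features, reduce dimensionality with PCA, then segment customers with
# k-means. The random matrix stands in for the real telecom customer metrics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 10))  # 200 customers, 10 engagement/experience features

X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X_2d)
print(np.bincount(segments))  # customers per segment
```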