Euel Fantaye
Hypothesis testing is the cornerstone of evidence-based decision making. A/B testing is one of the most widely used statistical frameworks for making gradual but important changes across today’s businesses.
A/B testing is a user experience research methodology. An A/B test is a randomized experiment with two variants, A and B, which are identical except for one variation that might affect a user's behavior. It applies statistical hypothesis testing, specifically "two-sample hypothesis testing" as used in the field of statistics.
Invariant metrics: used to check that the experiment setup (the way the change was presented to part of the population) is not inherently flawed, e.g. the number of users in each group.
Evaluation metrics: metrics we expect to change and that are relevant to the goals we aim to achieve, e.g. brand awareness.
Hypothesis testing for A/B testing
We use hypothesis testing to test two hypotheses:
Null hypothesis: there is no difference in brand awareness between the exposed and control groups.
Alternative hypothesis: there is a difference in brand awareness between the exposed and control groups.
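As a rough illustration of the two-sample test described above, the sketch below runs a two-sided z-test on the "yes" rates of a control and an exposed group. The function name, the counts, and the 0.05 significance level are assumptions made for the example; they are not the project's code or data.

```python
import math


def two_proportion_z_test(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z statistic, p-value). The standard error uses the pooled
    proportion, which is the shared 'yes' rate under the null hypothesis.
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up counts for illustration only (control = a, exposed = b).
z, p = two_proportion_z_test(yes_a=100, n_a=1000, yes_b=150, n_b=1000)
reject_null = p < 0.05  # 0.05 is an assumed significance level
```

A small p-value is evidence against the null hypothesis; a large one means the data do not show a difference, which is not the same as showing there is none.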
Carried out three types of classification analysis to predict whether a user responds yes to the brand awareness question: Logistic Regression, Decision Trees, and XGBoost. The models were then compared to identify the best performing one(s).
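One way such a comparison could be set up is to score every model's predictions on the same held-out labels with the same metrics. The sketch below assumes accuracy and F1 as the metrics and uses invented labels and predictions under neutral placeholder names, so it says nothing about how the actual models performed.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented held-out labels and predictions (placeholders, not real results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 1, 0, 1, 0, 1, 0, 0],
    "model_c": [1, 0, 1, 1, 0, 1, 1, 0],
}

scores = {
    name: {"accuracy": accuracy(y_true, y_pred), "f1": f1_score(y_true, y_pred)}
    for name, y_pred in predictions.items()
}
# Rank by F1 here; the choice of ranking metric is an assumption.
best = max(scores, key=lambda name: scores[name]["f1"])
```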
User analytics for a telecom product: customer overview, engagement, experience, and satisfaction analysis. The main goal of the project is to identify opportunities for growth and to drive profitability by changing the focus of which products or services are offered.
The key skill acquired is data visualization using seaborn and matplotlib in Python.
APPROACH
1. Understand the dataset and identify missing values and outliers, if any, using visual and quantitative methods to get a sense of the story it tells.
2. Identify the top 10 handsets used by customers, then the top 3 handset manufacturers, then the top 5 handsets for each of the top 3 manufacturers.
3. Aggregate per user: the number of xDR sessions, the session duration, the total download (DL) and upload (UL) data, and the total data volume (in bytes) per session for each application.
4. Analysis
Univariate, Bivariate, and Multivariate analysis
Correlation Analysis
User engagement Analysis
5. Use the k-means clustering algorithm to group users into k engagement clusters based on the engagement metrics.
6. Experience and satisfaction analytics.
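The handset ranking, per-user aggregation, and clustering steps above can be sketched with the standard library alone. The record fields (`user`, `handset`, `duration_ms`, `dl_bytes`, `ul_bytes`), the sample rows, and the choice of session count, duration, and total traffic as engagement metrics are all assumptions for illustration; a real analysis would more likely use pandas and a library k-means.

```python
from collections import Counter, defaultdict

# Hypothetical xDR session records; field names and values are invented.
sessions = [
    {"user": "u1", "handset": "Handset A", "duration_ms": 100, "dl_bytes": 1000, "ul_bytes": 100},
    {"user": "u2", "handset": "Handset B", "duration_ms": 900, "dl_bytes": 9000, "ul_bytes": 900},
    {"user": "u2", "handset": "Handset B", "duration_ms": 800, "dl_bytes": 8000, "ul_bytes": 800},
    {"user": "u3", "handset": "Handset A", "duration_ms": 120, "dl_bytes": 1200, "ul_bytes": 100},
    {"user": "u4", "handset": "Handset A", "duration_ms": 1000, "dl_bytes": 9500, "ul_bytes": 500},
    {"user": "u4", "handset": "Handset A", "duration_ms": 700, "dl_bytes": 7000, "ul_bytes": 1000},
    {"user": "u2", "handset": "Handset B", "duration_ms": 850, "dl_bytes": 8500, "ul_bytes": 500},
]


def top_handsets(records, n=10):
    """Most frequent handsets, counted per session record."""
    return Counter(r["handset"] for r in records).most_common(n)


def aggregate_per_user(records):
    """Sum session count, duration, and DL + UL traffic for each user."""
    totals = defaultdict(lambda: {"sessions": 0, "duration_ms": 0, "total_bytes": 0})
    for r in records:
        agg = totals[r["user"]]
        agg["sessions"] += 1
        agg["duration_ms"] += r["duration_ms"]
        agg["total_bytes"] += r["dl_bytes"] + r["ul_bytes"]
    return dict(totals)


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no single metric dominates distances."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1
        scaled_cols.append([(v - lo) / span for v in col])
    return [tuple(row) for row in zip(*scaled_cols)]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


per_user = aggregate_per_user(sessions)
users = list(per_user)
metrics = [
    (per_user[u]["sessions"], per_user[u]["duration_ms"], per_user[u]["total_bytes"])
    for u in users
]
labels, centroids = kmeans(min_max_scale(metrics), k=2)
clusters = dict(zip(users, labels))
```

Seeding with the first k points keeps the example reproducible; in practice a randomized or k-means++ initialization with several restarts is the more common choice.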
Speech-to-text technology for the Swahili language: building a deep learning model capable of transcribing speech to text.
I contributed by setting up the team's Git repository and project assignments; integrating DVC, MLflow, and CML; preparing the metadata in an easily accessible JSON format; helping the team set up the AWS server; and assisting with part of the modeling and testing process.
APPROACH
1. Load audio files: the input data are audio files of spoken speech in “.wav” format.
2. Resample the audio files so that every item has the same sample rate.
3. Convert all the items to have the same number of channels. The channels could be mono (one channel) or stereo (two channels). Our data is mono.
4. Convert all items to have the same duration, which involves padding the shorter audio files and truncating the longer ones.
5. Augment the data: time-shift the audio left or right randomly by a small percentage, or change the pitch or the speed of the audio by a small amount, to add variation to the audio files.
6. Convert the raw audio to Mel spectrograms, which represent the audio as images by decomposing it into its frequency components over time.
7. Convert the Mel spectrograms to Mel Frequency Cepstral Coefficients (MFCCs), which are useful when dealing with human speech because they compactly summarize the spectral shape on the Mel scale, a frequency scale that approximates how humans perceive pitch.
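Some of the preprocessing steps above are simple enough to sketch on plain lists of samples. The helper names below are hypothetical, the time shift is implemented as a circular shift (one possible choice), and the Mel conversion uses the common HTK-style formula; resampling, pitch and speed changes, and the spectrogram and MFCC computation are left out because in practice they would come from an audio library.

```python
import math
import random


def to_mono(frames):
    """Average the channels of each frame (a frame is a tuple of channel samples)."""
    return [sum(frame) / len(frame) for frame in frames]


def pad_or_truncate(samples, target_len):
    """Zero-pad short clips at the end and cut long clips to target_len."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


def time_shift(samples, max_fraction, rng):
    """Circularly shift left or right by a random offset of at most
    max_fraction of the clip length."""
    if not samples:
        return []
    limit = int(len(samples) * max_fraction)
    offset = rng.randint(-limit, limit) % len(samples)
    return samples[-offset:] + samples[:-offset]


def hz_to_mel(freq_hz):
    """HTK-style Mel scale: equal Mel steps approximate equal perceived pitch steps."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Tiny made-up stereo clip: three frames of (left, right) samples.
rng = random.Random(0)  # seeded so the augmentation is reproducible
clip = to_mono([(0.5, 0.25), (1.0, 0.0), (-0.5, 0.0)])
fixed = pad_or_truncate(clip, 5)
augmented = time_shift(fixed, 0.4, rng)
```

On the Mel scale, 1000 Hz maps to roughly 1000 Mel, and higher frequencies are compressed progressively more.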
The finance team wants to forecast sales in all their stores across several cities six weeks ahead of time. Managers in individual stores rely on their years of experience as well as their personal judgment to forecast sales.
Time series analysis is performed to check for trends and seasonality in sales over time across the Rossmann pharmaceutical stores. A predictive model, Prophet, is used to forecast sales in all stores across several cities six weeks ahead of time.
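A check for trend and weekly seasonality of the kind mentioned above might look roughly like the sketch below. It uses only the standard library and a synthetic daily series (a slow upward trend with zero sales on Sundays); it is a way to eyeball the data before modelling, not the Prophet model itself, and the function names are made up for the example.

```python
from datetime import date, timedelta
from statistics import mean


def weekday_profile(daily_sales):
    """Mean sales per weekday (0 = Monday) from a {date: sales} mapping."""
    buckets = {d: [] for d in range(7)}
    for day, sales in daily_sales.items():
        buckets[day.weekday()].append(sales)
    return {d: mean(values) for d, values in buckets.items() if values}


def moving_average(values, window=7):
    """Trailing moving average; a 7-day window averages out a weekly cycle."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Synthetic four-week series, for illustration only.
start = date(2015, 1, 5)  # a Monday
daily_sales = {}
for i in range(28):
    day = start + timedelta(days=i)
    daily_sales[day] = 0 if day.weekday() == 6 else 100 + i

profile = weekday_profile(daily_sales)            # weekly seasonality
trend = moving_average(list(daily_sales.values()))  # smoothed trend
```

Large differences between weekdays in `profile` point to weekly seasonality, and a steadily rising or falling `trend` points to a trend that a six-week (42-day) forecast should account for.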
Water is very important for crop growth and health. We can better predict maize harvest if we better understand how water flows through a field, and which parts are likely to be flooded or too dry. One important ingredient in understanding water flow in a field is measuring the elevation of the field at many points. The USGS recently released high-resolution elevation data as a lidar point cloud called USGS 3DEP in a public dataset on Amazon. This dataset is essential for building models of water flow and predicting plant health and maize harvest.
A Python module that fetches, visualizes, and transforms publicly available satellite and lidar data, and interfaces with USGS 3DEP to fetch data through its API.
WORK IN PROGRESS...