Model: Linear Regression, Linearizable Regression
Model: Generalized Linear Models
The effectiveness of sales promotions can be assessed using store-level aggregated sales data or individual household-level scanner panel data. In this project, I use household scanner panel data to assess the profit implications of price cuts, controlling for other sales promotions, on individual consumers' purchase behavior: specifically, their category purchase incidence, brand choice, and purchase quantity decisions.
Model: Multinomial Logit Model, Binary Logit Model, Semi-log, log-log, Poisson regression
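As a minimal illustration of two of the models listed above (not the original estimation code), the sketch below fits a multinomial logit for brand choice and a Poisson regression for purchase quantity on synthetic scanner-panel-style data; the covariates (log price, a promotion dummy) are hypothetical stand-ins.

```python
# Sketch: multinomial logit for brand choice + Poisson regression for
# purchase quantity, on synthetic scanner-panel-style data.
import numpy as np
from sklearn.linear_model import LogisticRegression, PoissonRegressor

rng = np.random.default_rng(0)
n = 500
# Hypothetical covariates: log price and a promotion (display/feature) dummy.
log_price = rng.normal(0.0, 0.3, size=n)
promo = rng.integers(0, 2, size=n)
X = np.column_stack([log_price, promo])

# Brand choice among 3 brands: lower price and promotion raise brand 0's utility.
utility = np.column_stack([-2.0 * log_price + 0.8 * promo,
                           np.zeros(n),
                           rng.normal(0, 0.1, size=n)])
brand = utility.argmax(axis=1)
choice_model = LogisticRegression(max_iter=1000).fit(X, brand)

# Purchase quantity: Poisson regression on the same covariates.
quantity = rng.poisson(np.exp(0.5 - 1.0 * log_price + 0.3 * promo))
qty_model = PoissonRegressor().fit(X, quantity)

print("brand-choice accuracy:", choice_model.score(X, brand))
print("price coefficient (quantity):", qty_model.coef_[0])
```

The negative price coefficient in the quantity equation reproduces the expected pattern: price cuts raise purchase quantities.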
The goal of this project is to identify market segments based on drivers of store image and choose the entry strategy for a particular food retailer. Retailers achieve a distinctive positioning through the development of a unique store image. Store image has been found to relate to key indicators of retail success such as store visit frequency, store loyalty, and share-of-wallet.
Model: Mixture Cluster Model, Mixture Regression Model
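A minimal sketch of model-based segmentation in the spirit of the mixture cluster model: a Gaussian mixture fit to synthetic store-image ratings. The two segments and attribute names (price rating, service rating) are hypothetical.

```python
# Sketch: segment shoppers with a Gaussian mixture on synthetic
# store-image ratings (columns: [price rating, service rating]).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two hypothetical segments: price-driven vs. service-driven shoppers.
price_seg = rng.normal([4.5, 2.0], 0.4, size=(150, 2))
service_seg = rng.normal([2.0, 4.5], 0.4, size=(150, 2))
ratings = np.vstack([price_seg, service_seg])

gmm = GaussianMixture(n_components=2, random_state=0).fit(ratings)
labels = gmm.predict(ratings)
print("segment sizes:", np.bincount(labels))
print("segment means:\n", gmm.means_)
```

The recovered component means indicate which store-image drivers define each segment, which is the input a retailer needs to choose an entry positioning.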
Companies spend enormous amounts of resources on new product development. Much of the cost, such as that of research and development (R&D), engineering, and test marketing, is incurred before a new product is actually launched. Since a great deal of money is spent on products that never reach the market, it is very important for companies to conduct adequate research into the potential of new products before they are launched, and to predict the market shares of new products and their impact on a company's and competitors' existing products. Conjoint analysis is a powerful research technique for identifying new product opportunities and for predicting market shares of new and existing products.
Model: Conjoint Analysis, Mixture Multinomial Logit Model
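To make the conjoint idea concrete, here is a hedged sketch of ratings-based conjoint analysis: profile ratings are regressed on dummy-coded attribute levels, and the coefficients are the part-worths. The attributes (brand, price) and levels are hypothetical, not from the actual study.

```python
# Sketch: estimate conjoint part-worths by regressing synthetic profile
# ratings on dummy-coded attribute levels.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
profiles = pd.DataFrame({
    "brand": rng.choice(["A", "B"], size=200),
    "price": rng.choice(["low", "high"], size=200),
})
X = pd.get_dummies(profiles, drop_first=True)  # columns: brand_B, price_low

# Synthetic ratings: brand B adds 1 point, a low price adds 2 points.
y = 5 + 1.0 * X["brand_B"] + 2.0 * X["price_low"] + rng.normal(0, 0.3, size=200)

model = LinearRegression().fit(X, y)
partworths = dict(zip(X.columns, model.coef_))
print(partworths)
```

With part-worths in hand, predicted utilities for new and existing profiles can be converted into market-share predictions (e.g., via a logit share rule), which is how conjoint supports the share forecasts described above.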
Divvy is Chicagoland's bike share system across Chicago and Evanston. Divvy provides residents and visitors with a convenient, fun and affordable transportation option for getting around and exploring Chicago.
For this case study, suppose I am currently working as a junior data analyst at Divvy. The director of marketing believes the company's future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Divvy bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Divvy executives must approve my recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Language and Packages : R, ggplot2, tidyverse, dplyr, leaflet, ggmap, tidyr, forcats
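The analysis itself is done in R with the packages above; purely as a language-neutral illustration of the core comparison, here is the same member-vs-casual summary sketched in pandas on a tiny hypothetical trip table (column names mirror Divvy trip data but are assumptions).

```python
# Sketch: compare ride behavior of casual riders vs. annual members
# on hypothetical Divvy-style trip records.
import pandas as pd

trips = pd.DataFrame({
    "member_casual": ["member", "casual", "member", "casual", "casual", "member"],
    "ride_length_min": [8, 25, 12, 40, 31, 10],
    "day_of_week": ["Mon", "Sat", "Tue", "Sun", "Sat", "Wed"],
})

# How do the two rider types differ in trip duration?
summary = trips.groupby("member_casual")["ride_length_min"].agg(["mean", "median", "count"])
print(summary)
```

Even this toy table shows the pattern the real analysis looks for: casual riders take longer, weekend-heavy rides, while members take short weekday commutes, which informs the conversion strategy.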
The Most Valuable Player (MVP) award is one of the most prestigious accolades an NBA player can receive in his career. At the end of each regular season (typically during April/May), the MVP award is presented to the single player deemed most worthy of the title. In this project, I use the BeautifulSoup package in Python to scrape NBA stats from www.basketball-reference.com. It is a preparation step for a future machine learning model that predicts who has the best chance to win the current year's NBA MVP based on historic stats. I created three datasets from the scraping to feed the machine learning model: MVP voting per year, all players' stats per year, and team standings per year. I assume these three datasets together yield better predictions than alternatives. All datasets are saved in CSV format.
Language and Packages : Python, BeautifulSoup, pandas, requests, selenium
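The scraping step can be sketched as follows. The HTML below is a tiny inline stand-in for the stats tables on www.basketball-reference.com (which the project fetches with requests/selenium); the table id and columns are illustrative, not the site's actual markup.

```python
# Sketch: parse a stats table with BeautifulSoup into a list of records.
from bs4 import BeautifulSoup

html = """
<table id="per_game_stats">
  <thead><tr><th>Player</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>30.1</td></tr>
    <tr><td>Player B</td><td>27.4</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("#per_game_stats tbody tr"):
    cells = [td.get_text() for td in tr.find_all("td")]
    rows.append({"Player": cells[0], "PTS": float(cells[1])})
print(rows)
```

Records parsed this way can be loaded straight into a pandas DataFrame and written to CSV, producing the per-year datasets described above.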
This is a crucial step for the subsequent analysis and machine learning. I combined all the data extracted from HTML into one CSV file. I also did some data exploration and produced some interesting data visualizations, e.g., the highest scorers' names overall and the highest scorers' points per year.
Language and Packages : Python, pandas
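The consolidation step amounts to concatenating the per-year tables into one frame and writing a single CSV; here is a minimal sketch with hypothetical stand-in data.

```python
# Sketch: combine per-year player tables into one DataFrame and one CSV.
import pandas as pd

per_year = [
    pd.DataFrame({"Player": ["A", "B"], "PTS": [30.1, 27.4], "Year": 2021}),
    pd.DataFrame({"Player": ["A", "C"], "PTS": [29.5, 26.0], "Year": 2022}),
]
combined = pd.concat(per_year, ignore_index=True)
combined.to_csv("players.csv", index=False)
print(combined.shape)
```

Keeping a `Year` column in the combined file is what makes the later per-year exploration (e.g., highest scorer per year via `groupby("Year")`) straightforward.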
This is the final machine learning step. The algorithms I used in this part include Ridge regression and Random Forest.
To test the accuracy of the algorithm, I used "average precision" to measure the difference between the actual rank of a player and the predicted rank. I also used a backtesting method to evaluate the model's accuracy on historic data, so that I can apply it to future seasons with confidence.
Language and Packages : Python, pandas, scikit-learn (Ridge, Random Forest)
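The backtesting idea can be sketched as: train on all seasons before a target year, predict that year's vote share, and compare predicted vs. actual rankings. The data below is a synthetic stand-in (columns `PTS`, `WinPct`, `Share` are hypothetical simplifications of the real feature set).

```python
# Sketch: backtest a Ridge model on a held-out season and compare
# predicted vs. actual MVP-share rankings.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
stats = pd.DataFrame({
    "Year": np.repeat([2019, 2020, 2021], 20),
    "PTS": rng.normal(20, 5, size=60),
    "WinPct": rng.uniform(0.2, 0.8, size=60),
})
# Synthetic target: vote share rises with scoring and team success.
stats["Share"] = 0.02 * stats["PTS"] + 0.5 * stats["WinPct"] + rng.normal(0, 0.05, 60)

predictors = ["PTS", "WinPct"]
train = stats[stats["Year"] < 2021]           # past seasons only
test = stats[stats["Year"] == 2021].copy()    # held-out season

model = Ridge(alpha=0.1).fit(train[predictors], train["Share"])
test["Predicted"] = model.predict(test[predictors])
test["actual_rank"] = test["Share"].rank(ascending=False)
test["pred_rank"] = test["Predicted"].rank(ascending=False)
rank_corr = test[["actual_rank", "pred_rank"]].corr().iloc[0, 1]
print("rank correlation on held-out season:", rank_corr)
```

Repeating this loop for every past season, and scoring each season's ranking agreement, is what gives the backtested accuracy estimate described above.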
This is a dataset containing 850 Starbucks reviews from all over the world. In this project, I experimented with three NLP tools for sentiment analysis: VADER, the pretrained RoBERTa model, and Hugging Face's transformers pipeline. I compared the tools' advantages and disadvantages by examining specific examples and plotting bar charts of each tool's results. VADER uses a bag-of-words method that lacks an understanding of sentence context. The RoBERTa approach takes a model pretrained on tweets and applies it to measure the reviews' sentiment. The transformers pipeline turned out to be the easiest to implement: I simply downloaded a pretrained model and called the function directly, which greatly improves efficiency.
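To see concretely why a bag-of-words scorer misses context, here is a toy scorer (not VADER itself, and the tiny lexicon is made up) that sums per-word sentiment scores and therefore cannot handle negation:

```python
# Toy bag-of-words sentiment scorer illustrating the context limitation:
# word scores are summed, so word order and negation are invisible.
LEXICON = {"great": 1, "love": 1, "bad": -1, "terrible": -1}

def bow_score(text):
    # Sum per-word scores; "not" has no entry, so it changes nothing.
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

print(bow_score("the coffee was great"))       # positive, as expected
print(bow_score("the coffee was not great"))   # still positive: negation missed
```

Transformer models such as RoBERTa score the whole sentence rather than individual words, which is why they handle examples like the second one better.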