Model: Linear Regression, Linearizable Regression
Model: Generalized Linear Models
The effectiveness of sales promotions can be assessed using store-level aggregated sales data or individual household-level scanner panel data. In this project, I use household scanner panel data to assess the profit implications of price cuts, controlling for other sales promotions, on individual consumers' purchase behavior: specifically, their category purchase incidence, brand choice, and purchase quantity decisions.
Model: Multinomial Logit Model, Binary Logit Model, Semi-log, log-log, Poisson regression
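As a minimal illustration of two of the models listed above (not the original estimation code), the sketch below fits a multinomial logit for brand choice and a Poisson regression for purchase quantity on synthetic scanner-panel-style data; the covariates (log price, a promotion dummy) are hypothetical stand-ins.

```python
# Sketch: multinomial logit for brand choice + Poisson regression for
# purchase quantity, on synthetic scanner-panel-style data.
import numpy as np
from sklearn.linear_model import LogisticRegression, PoissonRegressor

rng = np.random.default_rng(0)
n = 500
# Hypothetical covariates: log price and a promotion (display/feature) dummy.
log_price = rng.normal(0.0, 0.3, size=n)
promo = rng.integers(0, 2, size=n)
X = np.column_stack([log_price, promo])

# Brand choice among 3 brands: lower price and promotion raise brand 0's utility.
utility = np.column_stack([-2.0 * log_price + 0.8 * promo,
                           np.zeros(n),
                           rng.normal(0, 0.1, size=n)])
brand = utility.argmax(axis=1)
choice_model = LogisticRegression(max_iter=1000).fit(X, brand)

# Purchase quantity: Poisson regression on the same covariates.
quantity = rng.poisson(np.exp(0.5 - 1.0 * log_price + 0.3 * promo))
qty_model = PoissonRegressor().fit(X, quantity)

print("brand-choice accuracy:", choice_model.score(X, brand))
print("price coefficient (quantity):", qty_model.coef_[0])
```

The negative price coefficient in the quantity equation reproduces the expected pattern: price cuts raise purchase quantities.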
The goal of this project is to identify market segments based on drivers of store image and choose the entry strategy for a particular food retailer. Retailers achieve a distinctive positioning through the development of a unique store image. Store image has been found to relate to key indicators of retail success such as store visit frequency, store loyalty, and share-of-wallet.
Model: Mixture Cluster Model, Mixture Regression Model
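A minimal sketch of model-based segmentation in the spirit of the mixture cluster model: a Gaussian mixture fit to synthetic store-image ratings. The two segments and attribute names (price rating, service rating) are hypothetical.

```python
# Sketch: segment shoppers with a Gaussian mixture on synthetic
# store-image ratings (columns: [price rating, service rating]).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two hypothetical segments: price-driven vs. service-driven shoppers.
price_seg = rng.normal([4.5, 2.0], 0.4, size=(150, 2))
service_seg = rng.normal([2.0, 4.5], 0.4, size=(150, 2))
ratings = np.vstack([price_seg, service_seg])

gmm = GaussianMixture(n_components=2, random_state=0).fit(ratings)
labels = gmm.predict(ratings)
print("segment sizes:", np.bincount(labels))
print("segment means:\n", gmm.means_)
```

The recovered component means indicate which store-image drivers define each segment, which is the input a retailer needs to choose an entry positioning.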
Companies spend enormous amounts of resources on new product development. Much of the cost, such as that of research and development (R&D), engineering, and test marketing, is incurred before a new product is actually launched. Since a great deal of money is spent on products that never reach the market, it is very important for companies to conduct adequate research into the potential of new products before they are launched, and to predict the market shares of new products and their impact on a company's and competitors' existing products. Conjoint analysis is a powerful research technique for identifying new product opportunities and for predicting market shares of new and existing products.
Model: Conjoint Analysis, Mixture Multinomial Logit Model
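To make the conjoint idea concrete, here is a hedged sketch of ratings-based conjoint analysis: profile ratings are regressed on dummy-coded attribute levels, and the coefficients are the part-worths. The attributes (brand, price) and levels are hypothetical, not from the actual study.

```python
# Sketch: estimate conjoint part-worths by regressing synthetic profile
# ratings on dummy-coded attribute levels.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
profiles = pd.DataFrame({
    "brand": rng.choice(["A", "B"], size=200),
    "price": rng.choice(["low", "high"], size=200),
})
X = pd.get_dummies(profiles, drop_first=True)  # columns: brand_B, price_low

# Synthetic ratings: brand B adds 1 point, a low price adds 2 points.
y = 5 + 1.0 * X["brand_B"] + 2.0 * X["price_low"] + rng.normal(0, 0.3, size=200)

model = LinearRegression().fit(X, y)
partworths = dict(zip(X.columns, model.coef_))
print(partworths)
```

With part-worths in hand, predicted utilities for new and existing profiles can be converted into market-share predictions (e.g., via a logit share rule), which is how conjoint supports the share forecasts described above.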
Divvy is Chicagoland's bike share system across Chicago and Evanston. Divvy provides residents and visitors with a convenient, fun and affordable transportation option for getting around and exploring Chicago.
For this case study, suppose I am currently working as a junior data analyst at Divvy. The director of marketing believes the company's future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Divvy bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Divvy executives must approve my recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Language and Packages : R, ggplot2, tidyverse, dplyr, leaflet, ggmap, tidyr, forcats
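The analysis itself is done in R with the packages above; purely as a language-neutral illustration of the core comparison, here is the same member-vs-casual summary sketched in pandas on a tiny hypothetical trip table (column names mirror Divvy trip data but are assumptions).

```python
# Sketch: compare ride behavior of casual riders vs. annual members
# on hypothetical Divvy-style trip records.
import pandas as pd

trips = pd.DataFrame({
    "member_casual": ["member", "casual", "member", "casual", "casual", "member"],
    "ride_length_min": [8, 25, 12, 40, 31, 10],
    "day_of_week": ["Mon", "Sat", "Tue", "Sun", "Sat", "Wed"],
})

# How do the two rider types differ in trip duration?
summary = trips.groupby("member_casual")["ride_length_min"].agg(["mean", "median", "count"])
print(summary)
```

Even this toy table shows the pattern the real analysis looks for: casual riders take longer, weekend-heavy rides, while members take short weekday commutes, which informs the conversion strategy.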
The Most Valuable Player (MVP) award is one of the most prestigious accolades an NBA player can receive in his career. At the end of each regular season (typically during April/May), the MVP award is presented to the single player deemed most worthy of the title. In this project, I use the BeautifulSoup package in Python to scrape NBA stats from www.basketball-reference.com. It is a preparation step for a future machine learning model that predicts who has the best chance to win the current year's NBA MVP based on historic stats. I created three datasets from the scraping to feed the machine learning model: MVP voting per year, all players' stats per year, and team standings per year. I assume these three datasets together yield better predictions than alternatives. All datasets are saved in CSV format.
Language and Packages : Python, BeautifulSoup, pandas, requests, selenium
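The scraping step can be sketched as follows. The HTML below is a tiny inline stand-in for the stats tables on www.basketball-reference.com (which the project fetches with requests/selenium); the table id and columns are illustrative, not the site's actual markup.

```python
# Sketch: parse a stats table with BeautifulSoup into a list of records.
from bs4 import BeautifulSoup

html = """
<table id="per_game_stats">
  <thead><tr><th>Player</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>30.1</td></tr>
    <tr><td>Player B</td><td>27.4</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("#per_game_stats tbody tr"):
    cells = [td.get_text() for td in tr.find_all("td")]
    rows.append({"Player": cells[0], "PTS": float(cells[1])})
print(rows)
```

Records parsed this way can be loaded straight into a pandas DataFrame and written to CSV, producing the per-year datasets described above.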
This is a crucial step for the subsequent analysis and machine learning. I combined all the data extracted from HTML into one CSV file. I also did some data exploration and produced some interesting data visualizations, e.g., the highest scorers' names overall and the highest scorers' points per year.
Language and Packages : Python, pandas
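The consolidation step amounts to concatenating the per-year tables into one frame and writing a single CSV; here is a minimal sketch with hypothetical stand-in data.

```python
# Sketch: combine per-year player tables into one DataFrame and one CSV.
import pandas as pd

per_year = [
    pd.DataFrame({"Player": ["A", "B"], "PTS": [30.1, 27.4], "Year": 2021}),
    pd.DataFrame({"Player": ["A", "C"], "PTS": [29.5, 26.0], "Year": 2022}),
]
combined = pd.concat(per_year, ignore_index=True)
combined.to_csv("players.csv", index=False)
print(combined.shape)
```

Keeping a `Year` column in the combined file is what makes the later per-year exploration (e.g., highest scorer per year via `groupby("Year")`) straightforward.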
This is the final machine learning step. The algorithms I used in this part include Ridge regression and Random Forest.
To test the accuracy of the algorithm, I used "average precision" to measure the difference between the actual rank of a player and the predicted rank. I also used a backtesting method to evaluate the model's accuracy on historic data, so that I can apply it to future seasons with confidence.
Language and Packages : Python, pandas, scikit-learn (Ridge, Random Forest)
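The backtesting idea can be sketched as: train on all seasons before a target year, predict that year's vote share, and compare predicted vs. actual rankings. The data below is a synthetic stand-in (columns `PTS`, `WinPct`, `Share` are hypothetical simplifications of the real feature set).

```python
# Sketch: backtest a Ridge model on a held-out season and compare
# predicted vs. actual MVP-share rankings.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
stats = pd.DataFrame({
    "Year": np.repeat([2019, 2020, 2021], 20),
    "PTS": rng.normal(20, 5, size=60),
    "WinPct": rng.uniform(0.2, 0.8, size=60),
})
# Synthetic target: vote share rises with scoring and team success.
stats["Share"] = 0.02 * stats["PTS"] + 0.5 * stats["WinPct"] + rng.normal(0, 0.05, 60)

predictors = ["PTS", "WinPct"]
train = stats[stats["Year"] < 2021]           # past seasons only
test = stats[stats["Year"] == 2021].copy()    # held-out season

model = Ridge(alpha=0.1).fit(train[predictors], train["Share"])
test["Predicted"] = model.predict(test[predictors])
test["actual_rank"] = test["Share"].rank(ascending=False)
test["pred_rank"] = test["Predicted"].rank(ascending=False)
rank_corr = test[["actual_rank", "pred_rank"]].corr().iloc[0, 1]
print("rank correlation on held-out season:", rank_corr)
```

Repeating this loop for every past season, and scoring each season's ranking agreement, is what gives the backtested accuracy estimate described above.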
This is a dataset containing 850 Starbucks reviews from all over the world. In this project, I experimented with three NLP tools for sentiment analysis: VADER, the pretrained RoBERTa model, and Hugging Face's transformers pipeline. I compared the tools' advantages and disadvantages by examining specific examples and plotting bar charts of each tool's results. VADER uses a bag-of-words method that lacks an understanding of sentence context. The RoBERTa approach takes a model pretrained on tweets and applies it to measure the reviews' sentiment. The transformers pipeline turned out to be the easiest to implement: I simply downloaded a pretrained model and called the function directly, which greatly improves efficiency.
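To see concretely why a bag-of-words scorer misses context, here is a toy scorer (not VADER itself, and the tiny lexicon is made up) that sums per-word sentiment scores and therefore cannot handle negation:

```python
# Toy bag-of-words sentiment scorer illustrating the context limitation:
# word scores are summed, so word order and negation are invisible.
LEXICON = {"great": 1, "love": 1, "bad": -1, "terrible": -1}

def bow_score(text):
    # Sum per-word scores; "not" has no entry, so it changes nothing.
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

print(bow_score("the coffee was great"))       # positive, as expected
print(bow_score("the coffee was not great"))   # still positive: negation missed
```

Transformer models such as RoBERTa score the whole sentence rather than individual words, which is why they handle examples like the second one better.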