C/C++, Python (Pandas, SciPy, scikit-learn, Seaborn, Matplotlib), Java, JavaScript, HTML5
Data Visualization: Tableau
Database and server-side languages: SQL, JSP
Machine learning: Regression, Classification, Clustering, Random Forest, XGBoost, SVM, PCA
Git, GCP, Flask, Spark, Linux/Unix
Dashboards, customer segmentation, anomaly detection, and churn analysis in the energy, e-commerce, and fintech sectors
Employee Attrition Predictor
Developed an Employee Churn Predictor to identify employees at high risk of leaving, uncover the underlying factors, and inform a retention plan, mitigating the impact of attrition on company operations.
Conducted EDA on the employee churn dataset and segmented employees based on demographic and work-related data.
Trained classification models and selected Random Forest as the best model, with the highest precision (0.96) and recall (0.93) on the test dataset.
Proposed retention strategies for optimizing labor cost based on a visualized Decision Tree, saving the company $262k annually.
Wrapped the Random Forest model as an API using Flask and deployed it on GCP.
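For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how precision and recall are computed from test-set labels. The function name `precision_recall` and the sample labels are hypothetical, chosen only for illustration; they are not taken from the project.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In a churn setting, precision reflects how many flagged employees actually leave, while recall reflects how many leavers were flagged.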
Credit Card Default Detection
Predicted the likelihood of default for credit card customers using customer transaction data.
Performed data quality checks and data processing, including formatting, cleaning, and sampling.
Built ML classification models (Logistic Regression, Random Forest, KNN, Voting Classifier).
Improved model performance through hyperparameter tuning and model ensembling.
Food Distribution Service Analysis and Prediction
Predicted which customers would benefit the most from the enhanced service (daily food delivery).
Visualized and preprocessed the dataset, including sanity checks and data cleaning.
Conducted EDA and segmented 10,000 customers based on transaction and survey data.
Built ML classification models (Logistic Regression, Random Forest, XGBoost) and used SHAP values to interpret model outputs and recommend actions to reduce the company's cost to serve.
Customer Reviews Analysis using NLP
Developed an NLP-based system to detect “bad” reviews in text data from Amazon Alexa users.
Conducted EDA with data visualization, cleaned the data with NLTK, vectorized the text with TF-IDF, trained multi-class classifiers (Naïve Bayes, Random Forest, Gradient Boosting), and built an end-to-end pipeline for review classification.
Evaluated model performance using Normalized Discounted Cumulative Gain (NDCG) and selected Random Forest as the best model (ROC-AUC: 0.93).
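As a reference for the NDCG metric mentioned above, here is a minimal sketch of the computation. It assumes linear gain with a log2 position discount, which is one common convention, and that reviews are ranked by a predicted score. The function names and sample values are hypothetical and not drawn from the project.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: linear gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(true_relevance, predicted_scores, k=None):
    """NDCG@k: DCG of the predicted ordering divided by DCG of the ideal ordering."""
    # Sort item indices by descending predicted score.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order][:k]
    ideal = sorted(true_relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # With no relevant items at all, the ratio is undefined; return 0.0 here.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: 1 = review is actually "bad", scores = predicted probability.
true_relevance = [1, 0, 1, 0]
predicted_scores = [0.9, 0.8, 0.7, 0.1]
score = ndcg(true_relevance, predicted_scores)
```

A value of 1.0 means every truly bad review is ranked ahead of every other review. Lower values mean relevant items were pushed down the ranking.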
E-commerce Customer Shopping Behaviors Analysis and Prediction
Provided detailed insights into customers' online shopping behavior and marketing suggestions by analyzing market trends and customer lifetime value (CLTV).
Visualized and preprocessed the dataset, including sanity checks and data cleaning.
Explored the website's sales trends, product buckets, and sales funnel, including customer churn, conversion, and retention rates; provided recommendations to improve the customer retention rate by 10%.
Segmented customers into 5 groups using K-means on RFM (recency, frequency, monetary) features and provided a marketing strategy for each group to improve customer retention and drive sales growth.
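The RFM features behind this kind of segmentation can be sketched as follows. This is a minimal illustration, assuming transactions arrive as (customer ID, purchase date, amount) tuples. The function name `rfm_table`, the reference date, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Track the most recent purchase date per customer.
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical transactions: (customer_id, purchase_date, amount)
sample = [
    ("A", date(2023, 1, 5), 40.0),
    ("A", date(2023, 3, 1), 60.0),
    ("B", date(2022, 11, 20), 15.0),
]
rfm = rfm_table(sample, as_of=date(2023, 3, 31))
```

Features like these would typically be scaled and then passed to a clustering routine such as scikit-learn's `KMeans` with `n_clusters=5`.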
Ph.D. in Chemistry, Tamkang University, Taiwan (2015)