Visa Inc. (12/2021 - Present)
2022.10 - Present: Data engineering and pipeline automation
2021.12 - 2022.06: Analytics and insights
Peet's Coffee (06/2021 - 12/2021)
Data engineering and ETL pipelines
Python, JSON, SQL, Azure
Sphere Institute (affiliated with Acumen, LLC) (06/2019 - 06/2021)
Analytics
SQL
Achieved $13 RMSE on out-of-sample data; compared model performance with and without PCA pipelines and cross-validation
Categorized users by total revenue and sample size for each combination of installation platforms; encoded event variables according to their relationship with payments; created new features as products of payment-event and general-event encodings; optimized a model for each group
Code: https://github.com/QingchuanLyu/Customer-Payment-Prediction
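A minimal sketch of the with/without-PCA comparison described above, using synthetic data and a Ridge regressor as stand-ins (the real project's dataset and model choices live in the linked repo):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the payment dataset (not the real data)
X, y = make_regression(n_samples=300, n_features=20, noise=5.0, random_state=0)

# Same estimator with and without a PCA step, both scored by cross-validated RMSE
pca_model = make_pipeline(StandardScaler(), PCA(n_components=10), Ridge())
plain_model = make_pipeline(StandardScaler(), Ridge())

def cv_rmse(model):
    # cross_val_score returns negated RMSE, so flip the sign
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

rmse_pca = cv_rmse(pca_model)
rmse_plain = cv_rmse(plain_model)
print(f"RMSE with PCA: {rmse_pca:.2f}, without PCA: {rmse_plain:.2f}")
```

Putting PCA inside the pipeline matters: it is refit on each training fold, so no information from the held-out fold leaks into the components.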
Achieved 0.03 cross-entropy loss on unseen data by creating a resampling method for this multi-label classification problem and building pipelines with PCA and cross-validation to reduce multicollinearity and overfitting; trained Random Forest, K-Nearest Neighbors, and Multi-label K-Nearest Neighbors (ML-kNN) models
Investigated outlier labels and changes in feature distributions at different aggregation levels; confirmed 30% of features were highly correlated; removed a redundant feature
Code: https://github.com/QingchuanLyu/Multilabel-Classification
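A minimal sketch of scoring a multi-label classifier by average binary cross-entropy, on synthetic data with a plain kNN model (the project's resampling step and ML-kNN variant are omitted here and are in the repo):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic multi-label data: each sample can carry several of 4 labels
X, Y = make_multilabel_classification(n_samples=400, n_features=15,
                                      n_classes=4, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# KNeighborsClassifier accepts a 2-D indicator matrix as a multi-label target
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, Y_tr)
probas = knn.predict_proba(X_te)  # list of (n_samples, 2) arrays, one per label

# Average the binary cross-entropy over the labels
losses = [log_loss(Y_te[:, j], p[:, 1], labels=[0, 1])
          for j, p in enumerate(probas)]
mean_loss = float(np.mean(losses))
print(f"mean cross-entropy: {mean_loss:.3f}")
```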
Investigated the impact of holidays on daily sales at different times of the year; explored sales trends across states, item categories, and specific stores over a year
Used the Dickey-Fuller test to decide the order of differencing and ACF/PACF plots to decide the orders of the moving-average and autoregressive terms; verified the selected orders with p-values
Trained Decision Tree, Random Forest, and Light Gradient Boosting Machine (LightGBM) models; achieved 0.08 mean absolute percentage error (MAPE) on out-of-sample data
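A minimal sketch of the tree-model evaluation on a toy sales series, assuming scikit-learn: lag features feed a Random Forest, and MAPE is computed on a chronological hold-out (the real project's features and LightGBM model are in the repo):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)
# Toy daily-sales series with a trend and weekly seasonality
t = np.arange(600)
sales = 100 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, t.size)

# Lag features: the previous 7 days predict today
lags = 7
X = np.column_stack([sales[i : i + t.size - lags] for i in range(lags)])
y = sales[lags:]

# Chronological split: never let future data leak into training
split = int(0.8 * len(y))
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
mape = mean_absolute_percentage_error(y_te, rf.predict(X_te))
print(f"out-of-sample MAPE: {mape:.3f}")
```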
Achieved 0.03 RMSE with Random Forest and 0.02 RMSE with 10-fold cross-validated Elastic Net in a PCA pipeline, both on out-of-sample data
Checked whether missing values were missing at random and imputed or dropped them accordingly; applied one-hot encoding before training linear models and label encoding before training tree models
Code: https://github.com/QingchuanLyu/Predicting-House-Prices
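A minimal sketch of the two encoding choices above, with a made-up `neighborhood` column: one-hot encoding avoids imposing a spurious ordering on categories for linear models, while integer label codes are fine for tree splits:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical feature, not from the actual housing data
df = pd.DataFrame({"neighborhood": ["A", "B", "A", "C"],
                   "price": [100, 150, 110, 200]})

# One-hot encoding for linear models: one binary column per category
one_hot = pd.get_dummies(df["neighborhood"], prefix="nbhd")

# Label encoding for tree models: categories become integer codes
df["nbhd_code"] = LabelEncoder().fit_transform(df["neighborhood"])
print(one_hot.shape, df["nbhd_code"].tolist())
```

A linear model would read label codes as "C is twice B", which is meaningless; trees only ask threshold questions, so the codes do no harm there.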
Achieved a 0.7 Jaccard index on unseen data by training a 4-layer convolutional network with bidirectional encoders; trained an 8-iteration named-entity recognition model for each sentiment separately
Investigated the most common words, special characters, and the varying lengths of texts and support phrases; customized cleaning to remove a subset of stop words, special characters, and punctuation
Code: https://github.com/QingchuanLyu/Tweet-Sentiment-Extraction
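A minimal sketch of the evaluation metric above: the word-level Jaccard index between an extracted phrase and the reference phrase (the standard metric for this Kaggle task; the models themselves are in the repo):

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two phrases."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty phrases count as a perfect match
    return len(sa & sb) / len(sa | sb)

# 2 shared words out of 3 distinct words -> 2/3
print(jaccard("so happy today", "happy today"))
```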