Hey, I'm
-Hangyu (Cedric) Liu-
A growing data scientist!
Biography
I am currently a final-year Master's student at Brown University, majoring in Data Science.
I'm currently working as a Data Science Co-op on Wayfair's MAD science team, where I focus on developing a Bayesian hierarchical Marketing Mix Model to optimize the allocation of advertising expenditure.
This summer I worked at Biogen Inc. as a Data Mining Intern, where I focused on NLP tasks such as topic modeling and building a text similarity search engine. For the topic modeling work, I collaborated with Prof. Tracy Ke at Harvard University to extend her new SVD-based topic modeling method using an autoencoder. I also developed a web application with Flask and Dash to make these tools easier to use. (Web Application Demo)
I received my B.S. degrees in Management Science and Information System & Mathematical Economics from Xiamen University (2014-2018).
Areas of Interest:
My areas of interest are: Advanced Data Visualization (Dash and Plotly), Machine Learning, Computer Vision, Natural Language Processing, Recommendation Systems, Reinforcement Learning, Causal Inference, A/B Testing and Data Mining.
Programming and Software Skills:
Data Collection and Wrangling: Python: BeautifulSoup, pandas, NumPy, SciPy, json; R: rvest, dplyr, jsonlite; RegEx
Data Visualization: Python: seaborn, matplotlib, Plotly, Dash, TabPy; R: ggplot2, lattice, htmlwidgets
Data Modeling and Machine Learning: Python: scikit-learn, XGBoost, mlens.ensemble; R: e1071, nnet, caret
Hyperparameter Optimization: Grid Search, Random Search and Bayesian Optimization in Python and R
Version Control: GitHub, ReviewNB
Deep Learning Frameworks: TensorFlow, Keras
Big Data & Scalability: PySpark, Airflow
Other Skills: Julia Programming (built SVM, neural networks, etc. from scratch), Unix Shell (Bash), SQL, C, Jupyter Notebook, A/B Testing, Python-driven Web Applications, Time Series Analysis, NLTK and gensim (NLP)
Languages:
English (Full Professional Proficiency); Mandarin (Native)
Professional Experience
Wayfair Inc., Data Science Co-op, Boston MA Aug. 2019-Present
- Focused on developing an end-to-end Marketing Mix Modeling workflow to optimize advertising spend
- Reproduced the Adstock and Hill functions from Google’s research paper in Python and built a customized MLE optimizer with an L2 regularizer to estimate parameters
- Designed a matrix algorithm to reduce the model training time from 7000s to 100s (about 70x faster, a reduction of roughly 98.6%)
- Established causality by separating out confounders, and proposed a methodology for validating the model against A/B test results
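To illustrate the Adstock and Hill transformations mentioned in the bullets above, here is a minimal sketch in plain Python. It is not the code used at Wayfair: the geometric decay weights, the weight normalization, and every name in it (`geometric_adstock`, `hill`, `decay`, `max_lag`, `half_saturation`, `slope`) are assumptions chosen for illustration only.

```python
def geometric_adstock(spend, decay, max_lag):
    """Carry-over effect: normalized weighted average of current and past spend.

    Assumption: geometric weights decay**lag over the last `max_lag` periods.
    """
    weights = [decay ** lag for lag in range(max_lag)]
    total = sum(weights)
    out = []
    for t in range(len(spend)):
        acc = 0.0
        for lag, w in enumerate(weights):
            if t - lag >= 0:  # ignore periods before the series starts
                acc += w * spend[t - lag]
        out.append(acc / total)
    return out


def hill(x, half_saturation, slope):
    """Diminishing returns: maps non-negative x into [0, 1)."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / half_saturation) ** (-slope))


# Hypothetical example: one burst of spend followed by three quiet periods.
spend = [100.0, 0.0, 0.0, 0.0]
adstocked = geometric_adstock(spend, decay=0.5, max_lag=3)
response = [hill(x, half_saturation=50.0, slope=2.0) for x in adstocked]
```

In this sketch the single burst of spend keeps contributing for `max_lag` periods with shrinking weight, and the Hill curve returns exactly 0.5 when the adstocked spend equals `half_saturation`.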
Biogen Inc., Data Mining Intern (Demo), Cambridge MA May 2019-Aug. 2019
- Automated the data cleaning process by developing a customized Python module with NLTK, spaCy and RegEx
- Tackled category selection bias through topic modeling: implemented LDA models with both gensim and MALLET, and collaborated with Prof. Tracy Ke at Harvard University to extend her new SVD-based topic modeling method using an autoencoder
- Reduced potentially redundant work and supported data quality cross-checks by developing a customized text similarity search engine with NLP techniques (TF-IDF, Word2Vec, FastText, Google’s Universal Sentence Encoder, etc.)
- Improved ease of use by converting Python scripts into a web-based user interface with Flask and Dash
Xiamen Yucheng Limited, Co-Founder, Xiamen China Apr. 2016-Jun. 2018
- Co-founded a tech start-up with 12 people, dedicated to developing an app that translates Mandarin into Uighur
- Was responsible for fundraising, and explored using a Seq2Seq network with an attention mechanism to improve the translation task
Relevant Projects
Toxic Comment Classification (Demo) Apr. 2019-May 2019
- Used an RNN with LSTM cells to tackle the toxic comment classification problem, reaching 0.96 accuracy on the test set
- Mitigated overfitting by adding an L2 regularizer and an early-stopping callback
- Mitigated the vanishing gradient problem and achieved faster training by adding batch normalization layers
One-Shot Learning & CV: Face Verification System (Demo) Dec. 2018-Jan. 2019
- Built a Siamese network using Google FaceNet (NN4 architecture) as three side-by-side CNN components, trained with triplet loss
- Developed the network with Keras, trained it on Google Colab and evaluated it on the LFW face dataset, achieving 95.3% accuracy
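For readers unfamiliar with triplet loss, the idea behind the face verification project above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the project's Keras code: the squared Euclidean distance, the margin of 0.2, the toy two-dimensional embeddings, and the function names are all assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize triplets where the negative is not at least `margin`
    farther from the anchor than the positive (in squared distance)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Hypothetical 2-D embeddings: same person (positive) vs. different person (negative).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]   # far from the anchor, so no penalty
hard_negative = [1.0, 0.0]   # as close as the positive, so penalized

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
```

The loss is zero once the negative sits comfortably farther away than the positive, so training effort concentrates on the hard triplets.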
Kaggle Competition: Predict the Housing Price with High-dimensional Features (Top 15%) Nov. 2018-Dec. 2018
- Conducted detailed EDA and visualization (t-SNE) and used PCR, Lasso Regression and XGBoost, achieving an RMSE of 0.078
Data Acquisition & Deployment: Spotify and Billboard Text Data Analysis Oct. 2018-Nov. 2018
- Used BeautifulSoup in Python to build a web scraper to obtain data, and converted nested JSON data into a pandas DataFrame
- Utilized Google SDK to deploy my data cleaning function and fuzzy matching algorithm on Google Cloud Platform
JDD-2017 Global Data Challenge – Transaction Risk Detection Oct. 2017-Dec. 2017
- Carried out feature engineering, and used SMOTE and under-sampling to address class imbalance
- Used ensemble learning algorithms (AdaBoost, XGBoost and Random Forest) and achieved a top 5% final ranking
Awards
- National Academic Excellence Scholarship (top 1) in 2015 at the School of Management
- Dean’s List for three consecutive years (2015-2017) at The Wang Yanan Institute for Studies in Economics
- National 1st Prize in China National College Student "Innovation, Originality and Entrepreneurship" Challenge (2017)
Academic Dissertations
- Research on Optimal Portfolio Efficiency Based on Improved Particle Swarm Optimization Algorithm
- Forecasting Model of Bitcoin Market Based on Bayesian Structural Time Series
- China’s Saving Rate: The Impact of the Age Structure of the Population
Contact:
E-mail: hangyu_liu@brown.edu
Phone Number: (+1) 401-601-1828
LinkedIn: www.linkedin.com/in/hliu5
GitHub: https://github.com/Cedric-Liu/Coding-Journal/tree/master/Model%20Building%20From%20Scratch
Projects Writing Sample: https://xinyanhe1.wixsite.com/2040finalproject
Address:
Greater Boston Area