Hey, I'm Hangyu (Cedric) Liu

A growing data scientist!

Biography


I am a final-year Master's student in Data Science at Brown University.

I am currently working as a Data Science Co-op on Wayfair's MAD science team, focusing on developing a Bayesian hierarchical Marketing Mix Model to optimize the allocation of advertising spend.

This past summer, I worked at Biogen Inc. as a Data Mining Intern, focusing on NLP tasks such as topic modeling and building a text similarity search engine. For the topic modeling work, I collaborated with Prof. Tracy Ke at Harvard University to extend her new SVD-based topic modeling method with an autoencoder. I also developed a web application with Flask and Dash to make these tools easier to use (Web Application Demo).

I received B.S. degrees in Management Science and Information Systems and in Mathematical Economics from Xiamen University (2014-2018).

Areas of Interest:

My areas of interest are: Advanced Data Visualization (Dash and Plotly), Machine Learning, Computer Vision, Natural Language Processing, Recommendation Systems, Reinforcement Learning, Causal Inference, A/B Testing and Data Mining.

Programming and Software Skills:

Data Collection and Wrangling: Python: Beautiful Soup, pandas, NumPy, SciPy, json; R: rvest, dplyr, jsonlite; RegEx

Data Visualization: Python: seaborn, matplotlib, Plotly, Dash, TabPy; R: ggplot2, lattice, htmlwidgets

Data Modeling and Machine Learning: Python: scikit-learn, XGBoost, mlens.ensemble; R: e1071, nnet, caret

Hyperparameter Optimization: Grid Search, Random Search and Bayesian Optimization in Python and R

Version Control: GitHub, ReviewNB

Deep Learning Frameworks: TensorFlow, Keras

Big Data & Scalability: PySpark, Airflow

Other Skills: Julia (built an SVM, a neural network, etc. from scratch), Unix shell (Bash), SQL, C, Jupyter Notebook, A/B testing, Python-driven web applications, time series analysis, NLTK and gensim (NLP)

Languages:

English (Full Professional Proficiency); Mandarin (Native)

Professional Experience

Wayfair Inc., Data Science Co-op, Boston, MA Aug. 2019-Present

  • Focused on developing an end-to-end Marketing Mix Modeling workflow to optimize advertising spend
  • Reimplemented the Adstock and Hill functions from Google’s research paper in Python, and built a custom MLE optimizer with an L2 regularizer to estimate their parameters
  • Designed a matrix-based algorithm that reduced model training time from 7,000 s to 100 s (a 98.6% reduction)
  • Established causality by splitting out confounders, and proposed a methodology to validate the model with A/B test results
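The Adstock and Hill transforms mentioned above can be illustrated with a short NumPy sketch. It assumes the common textbook forms: a geometric carryover with decay rate `alpha` over a window of `L` periods, and a Hill saturation curve with half-saturation point `K` and shape `S`. The function names are hypothetical, and this is neither the code used at Wayfair nor a reproduction of the MLE optimizer.

```python
import numpy as np

def geometric_adstock(x, alpha, L):
    """Carryover effect (assumed geometric form): weighted average of the
    last L periods of spend, with weight alpha**l at lag l, normalized to 1."""
    x = np.asarray(x, dtype=float)
    weights = alpha ** np.arange(L, dtype=float)   # w(l) = alpha^l, l = 0..L-1
    weights = weights / weights.sum()
    # Zero-pad the front so early periods only see the history that exists
    padded = np.concatenate([np.zeros(L - 1), x])
    out = np.empty_like(x)
    for t in range(len(x)):
        window = padded[t : t + L][::-1]           # x_t, x_{t-1}, ..., x_{t-L+1}
        out[t] = np.dot(weights, window)
    return out

def hill(x, K, S):
    """Diminishing returns: 1 / (1 + (x / K)^(-S)), equal to 0.5 at x = K.
    Written as x^S / (x^S + K^S), which is algebraically identical and
    avoids dividing by zero when x = 0 (assumes K > 0, S > 0)."""
    x = np.asarray(x, dtype=float)
    return x**S / (x**S + K**S)

# A single burst of spend decays over the next L - 1 periods
print(geometric_adstock([1, 0, 0, 0], alpha=0.5, L=3))
print(hill([0.0, 2.0, 10.0], K=2.0, S=3.0))
```

In a typical marketing mix model, each channel's spend is passed through the adstock transform and then the Hill curve before entering the regression, so `alpha`, `L`, `K` and `S` become parameters to estimate alongside the regression coefficients.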

Biogen Inc., Data Mining Intern (Demo), Cambridge, MA May 2019-Aug. 2019

  • Automated the data cleaning process by developing a custom Python module with NLTK, spaCy and RegEx
  • Tackled category selection bias with topic modeling: implemented LDA models with gensim and MALLET, and collaborated with Prof. Tracy Ke at Harvard University to extend her new SVD-based topic modeling method with an autoencoder
  • Reduced redundant work and supported data quality cross-checks by developing a custom text similarity search engine with NLP techniques (TF-IDF, Word2Vec, FastText, Google’s Universal Sentence Encoder, etc.)
  • Improved ease of use by wrapping the Python scripts in a web-based user interface built with Flask and Dash

Xiamen Yucheng Limited, Co-Founder, Xiamen, China Apr. 2016-June 2018

  • Co-founded a 12-person tech start-up dedicated to developing a Mandarin-to-Uighur translation app
  • Responsible for fundraising, and explored using Seq2Seq networks with an attention mechanism to improve translation quality

Relevant Projects

Toxic Comment Classification (Demo) Apr. 2019-May 2019

  • Built an RNN with LSTM cells for toxic comment classification, achieving 0.96 accuracy on the test set
  • Mitigated overfitting with an L2 regularizer and an early stopping callback
  • Reduced vanishing gradient issues and sped up training by adding batch normalization layers

One-Shot Learning & CV: Face Verification System (Demo) Dec. 2018-Jan. 2019

  • Built a Siamese network using Google’s FaceNet (NN4 architecture) as three side-by-side CNN branches trained with triplet loss
  • Implemented the network in Keras, trained it on Google Colab and evaluated it on the LFW face dataset, achieving 95.3% accuracy
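The triplet loss that trains the three branches can be sketched in NumPy as below. This assumes the embeddings have already been computed as arrays of shape (batch, embedding_dim) and uses an illustrative margin `alpha=0.2`; `triplet_loss` is a hypothetical helper, not the project's Keras implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Mean over the batch of max(||a - p||^2 - ||a - n||^2 + alpha, 0).

    anchor, positive, negative: arrays of shape (batch, embedding_dim),
    where positive shares the anchor's identity and negative does not.
    """
    # Squared Euclidean distance from each anchor to its matching face
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    # Squared Euclidean distance from each anchor to a different person's face
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    # Hinge: a triplet contributes only if the negative is not at least
    # alpha farther away than the positive
    losses = np.maximum(pos_dist - neg_dist + alpha, 0.0)
    return losses.mean()

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
print(triplet_loss(a, p, np.array([[1.0, 0.0]])))  # easy negative, zero loss
print(triplet_loss(a, p, np.array([[0.2, 0.0]])))  # hard negative, positive loss
```

Once trained this way, verification reduces to comparing the distance between two face embeddings against a threshold.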

Kaggle Competition: Predicting Housing Prices with High-Dimensional Features (Top 15%) Nov. 2018-Dec. 2018

  • Conducted detailed EDA and visualization (t-SNE), then modeled with PCR, Lasso regression and XGBoost, achieving an RMSE of 0.078

Data Acquisition & Deployment: Spotify and Billboard Text Data Analysis Oct. 2018-Nov. 2018

  • Built a web scraper in Python with Beautiful Soup to collect the data, and converted nested JSON data into pandas DataFrames
  • Used the Google Cloud SDK to deploy my data cleaning function and fuzzy matching algorithm on Google Cloud Platform

JDD-2017 Global Data Challenge – Transaction Risk Detection Oct. 2017-Dec. 2017

  • Performed feature engineering, and used SMOTE oversampling and undersampling to address class imbalance
  • Applied ensemble learning algorithms (AdaBoost, XGBoost and Random Forest), achieving a top-5% final ranking

Awards

  • National Academic Excellence Scholarship (top 1) in 2015 at the School of Management
  • Dean’s List for three consecutive years (2015-2017) at The Wang Yanan Institute for Studies in Economics
  • National 1st Prize in China National College Student "Innovation, Originality and Entrepreneurship" Challenge (2017)

Academic Dissertation

  • Research on Optimal Portfolio Efficiency Based on Improved Particle Swarm Optimization Algorithm
  • Forecasting Model of Bitcoin Market Based on Bayesian Structural Time Series
  • China’s saving rate: The impact of the age structure of the population

Contact:

E-mail: hangyu_liu@brown.edu

Phone Number: (+1) 401-601-1828

LinkedIn: www.linkedin.com/in/hliu5

Github: https://github.com/Cedric-Liu/Coding-Journal/tree/master/Model%20Building%20From%20Scratch

Project Writing Sample: https://xinyanhe1.wixsite.com/2040finalproject

Address:

Greater Boston Area