Xuntao Hu

About me

Machine Learning and Data Science enthusiast with fluent Python skills and 6 years of research experience as a PhD in Mathematics. Project experience in predictive modeling, Computer Vision, and NLP (with Scikit-Learn, TensorFlow, and OpenCV).

Skills

Data Science: Machine Learning, Data Mining, A/B Testing, Recommender Systems, Web Crawling.

Programming: Python (Proficient, 30K+ lines of code), SQL (Proficient); C++, Java, R (Intermediate).

Machine Learning Models: Linear / Logistic Regression, Random Forest, Boosting, GDA, SVM, K-means Clustering, Hierarchical Clustering, Gaussian Mixture Models.

Python Libraries: Scikit-Learn, TensorFlow, Keras, Pandas, NumPy, SciPy, Matplotlib.

Deep Learning: CNN, RNN, LSTM, GRU.

Languages: English (fluent), Mandarin Chinese (native), Cantonese (native).

Contact Information

Email: huxuntao AT gmail.com

LinkedIn: www.linkedin.com/in/xuntao-hu

GitHub: https://github.com/XT286/

Resume

Please find my resume in the upper right corner, or the upper left corner if you are on a phone.

Alternatively, you can use the link here.

Selected Projects

1. Prediction of NBA Rookies’ Performance

- Built Machine Learning models to predict NBA rookies’ draft positions and first-year performance based on their NCAA statistics. Attained an R-squared value of 0.927 on lottery rookies.

- Scraped and cleaned NCAA data from the past 20 years. Applied feature engineering and backward selection.

- Trained Multi-class Logistic Regression on rookie draft positions. Applied Linear Regression and Random Forest to predict the PER (Player Efficiency Rating) values of rookies within each class.

- Technologies: Python, Scikit-Learn, Statsmodels, Matplotlib, BeautifulSoup.

2. Dog-Cat Photo Classification (Kaggle Competition) https://github.com/XT286/DogCat

- Classified images of cats and dogs using a Computer Vision architecture. Ranked in the top 3% of all participants.

- Constructed a Convolutional Neural Network to recognize patterns in over 25,000 images of cats and dogs. Implemented transfer learning that combines the InceptionV3, Xception and ResNet50 models.

- Technologies: Python, TensorFlow, Keras, OpenCV, CNN, Computer Vision.

3. News Categorization (Kaggle Competition) http://github.com/XT286/News_Category_Kaggle

- Categorized news articles by constructing a Natural Language Processing architecture. Achieved 63% accuracy with limited data and hardware.

- Implemented Deep Neural Networks (CNN, Bidirectional GRU, and LSTM with Attention) to recognize content and extract information from the titles and text bodies of news articles.

- Technologies: Python, TensorFlow, Keras, Natural Language Processing, GRU, LSTM.

4. Prediction of Revenue Levels https://github.com/XT286/OnlineBehavior

- Quantified customers’ online behavior by applying Feature Engineering. Applied backward selection on features to reduce collinearity.

- Used Multi-class Logistic Regression and Random Forest to predict revenue levels.

- Technologies: Python, Pandas, Scikit-Learn, NumPy, Matplotlib.