Xuntao Hu
About me
Machine Learning and Data Science enthusiast with strong Python skills and 6 years of research experience as a PhD in Mathematics. Project experience in predictive modeling, Computer Vision, and NLP (with Scikit-Learn, TensorFlow, and OpenCV).
SKILLS:
Data Science: Machine Learning, Data Mining, A/B Testing, Recommender Systems, Web Crawling.
Programming: Python (Proficient, 30K+ lines of code), SQL (Proficient); C++, Java, R (Intermediate).
Machine Learning Models: Linear / Logistic Regression, Random Forest, Boosting, GDA, SVM, K-means Clustering, Hierarchical Clustering, Gaussian Mixture Models.
Python Libraries: Scikit-Learn, TensorFlow, Keras, Pandas, NumPy, SciPy, Matplotlib.
Deep Learning: CNN, RNN, LSTM, GRU.
Languages: English (fluent), Mandarin Chinese (native), Cantonese (native).
Contact Information
Email: huxuntao AT gmail.com
LinkedIn: www.linkedin.com/in/xuntao-hu
GitHub: https://github.com/XT286/
Resume
My resume is available from the upper right corner of the page, or the upper left corner if you are on a phone.
Alternatively, you can use the link here.
Selected Projects
1. Prediction of NBA Rookies’ Performance
- Built Machine Learning models to predict NBA rookies’ draft positions and first-year performances based on their NCAA statistics. Attained a 0.927 R-squared value on lottery rookies.
- Scraped and cleaned NCAA data from the past 20 years. Applied feature engineering and backward selection.
- Trained Multi-class Logistic Regression on rookie draft positions. Deployed Linear Regression and Random Forest to predict the PER values of rookies within each class.
- Technologies: Python, Scikit-Learn, Statsmodels, Matplotlib, BeautifulSoup.
2. Dog-Cat Photo Classification (Kaggle Competition) https://github.com/XT286/DogCat
- Classified images of cats and dogs using a Computer Vision architecture. Ranked in the top 3% of all participants.
- Constructed a Convolutional Neural Network to recognize patterns in over 25,000 images of cats and dogs. Implemented transfer learning that combines the InceptionV3, Xception and ResNet50 models.
- Technologies: Python, TensorFlow, Keras, OpenCV, CNN, Computer Vision.
3. News Categorization (Kaggle Competition) http://github.com/XT286/News_Category_Kaggle
- Categorized news articles by constructing a Natural Language Processing architecture. Achieved 63% accuracy with limited data and hardware.
- Implemented Deep Neural Networks (CNN, Bidirectional GRU, and LSTM with Attention) to recognize content and retrieve information from the titles and text bodies of news articles.
- Technologies: Python, TensorFlow, Keras, Natural Language Processing, GRU, LSTM.
4. Prediction of Revenue Levels https://github.com/XT286/OnlineBehavior
- Quantified customers’ online behavior by applying Feature Engineering. Applied backward selection on features to reduce collinearity.
- Used Multi-class Logistic Regression and Random Forest to predict revenue levels.
- Technologies: Python, Pandas, Scikit-Learn, NumPy, Matplotlib.