Michael Smith

A full stack developer, data scientist and problem solver. I'm currently looking for a role that will challenge my analytical skills and give me a chance to lead through data-driven decision making.

Current Projects

A New House

I've recently bought a house that was built in the early 1920s, and I'm learning all about the nuances of home ownership, especially for a century-old home. After completely redoing the wiring inside and out, with a dedicated 15-amp quad outlet for my home office setup, I now get to experience the joy of kitchen renovation as I hope to install a dishwasher in cabinets that look like they were last painted in the 70s.

Back to the 80s

My grandfather gifted me his 1985 Honda Goldwing LTD after learning that I had been riding for the last few years. There's a lot to do to get this bike roadworthy, but I'm up to the challenge, especially since I've already restored a 1985 Honda Rebel and a 2002 Yamaha Roadstar. This bike joins my collection alongside a 2006 Triumph Rocket III and a 1986 Honda Elite 150 Deluxe.

First Competition

I've had a Kaggle account for years, ever since I started my Master's at SMU, but I'm finally going to dip my toes into the pool and try my hand at a competition. My aim is to build up my analytics portfolio and demonstrate my skills. Thankfully, I have a couple of friends who can give me a few pointers.

Past Projects

CitiBike NYC Open Data Set Analysis

As a project for a course in my Data Science program at SMU, I, along with two other students, performed a cluster-based random forest classification regression analysis on CitiBike's NYC open data set. This data set consists of millions of rental records of CitiBike bicycles in Manhattan. We settled on the question, "Can we identify CitiBike subscribers based on rental usage?" Using spectral clustering, we identified key attributes that can be used to distinguish different types of customers, some of which were subscribers. We were ultimately able to identify subscribers to CitiBike's services with ~73% accuracy. While that is not stellar, we believe some of the discrepancy lay in false positives, which we identified as non-subscribers whose travel patterns matched those of subscribers. With that in mind, we developed a simple model that can be applied to real-world, real-time data to identify key marketing areas where promoting CitiBike's subscription service would have the most impact, based on rental locations with a high number of false-positive renters.
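As a rough illustration of the false-positive idea (not the project's actual code), the Python sketch below assumes each rental has already been scored by some classifier. The station names, the tuple layout, and every record are invented for the example. It counts, per start station, the riders who were predicted to be subscribers but are not.

```python
from collections import Counter

# Hypothetical scored rentals: (start_station, predicted_subscriber, actual_subscriber).
# All values are invented for illustration; they are not CitiBike data.
scored_rentals = [
    ("Station A", True, False),
    ("Station A", True, False),
    ("Station A", True, True),
    ("Station B", True, False),
    ("Station B", False, False),
    ("Station C", False, True),
]


def false_positives_by_station(rentals):
    """Count riders predicted as subscribers who are not subscribers, per station."""
    counts = Counter()
    for station, predicted, actual in rentals:
        if predicted and not actual:
            counts[station] += 1
    return counts


fp_counts = false_positives_by_station(scored_rentals)
# Stations with the most false positives come first.
ranked_stations = fp_counts.most_common()
print(ranked_stations)
```

Stations near the top of the ranking would be the first candidates for subscription marketing under this reading.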

Download (PDF)

Employee Attrition: What makes an employee leave?

In this paper, we present a model for employee attrition prediction and discuss the ethical impacts of using such a model within private and public sector organizations. As it is in Human Resource personnel’s best interest to improve retention, implementing statistical and machine learning techniques is the most viable means to attrition abatement. To this end, we examine Office of Personnel Management public sector, Bureau of Labor Statistics public sector, and IBM anonymized private sector employee separation data. Three classification models (Logistic Regression, Random Forest, and K Nearest Neighbor) are trained and tested on these data before we select our best-fit model for attrition prediction. Finally, we use metrics such as Gini and Permutation Importance to identify the most impactful variables in determining prediction outcome before presenting the ethical ramifications of using such outputs in HR planning.
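For readers unfamiliar with permutation importance, here is a minimal pure-Python sketch of the idea, not the paper's implementation. It shuffles one feature column at a time and measures how much accuracy drops. The records, the feature names, and the hand-written `predict` rule (a stand-in for a trained classifier) are all assumptions made up for this example.

```python
import random

# Invented toy records: (years_at_company, overtime_hours, left_company).
# These values are made up for illustration, not taken from the OPM, BLS, or IBM data.
records = [
    (1, 20, True), (2, 18, True), (1, 5, True), (3, 25, True),
    (8, 4, False), (10, 22, False), (7, 2, False), (12, 6, False),
]
FEATURES = ["years_at_company", "overtime_hours"]


def predict(row):
    """Hypothetical stand-in for a trained classifier: short tenure means 'leaves'."""
    years_at_company, _overtime_hours = row
    return years_at_company < 5


def accuracy(rows, labels):
    """Fraction of rows where the stand-in model matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(rows, labels, column, repeats=50, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original one.
        permuted = [
            row[:column] + (value,) + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(permuted, labels))
    return sum(drops) / repeats


rows = [record[:2] for record in records]
labels = [record[2] for record in records]
importances = {
    name: permutation_importance(rows, labels, index)
    for index, name in enumerate(FEATURES)
}
print(importances)
```

Because the stand-in rule ignores overtime hours, shuffling that column changes nothing and its importance is zero, while shuffling tenure hurts accuracy.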

Download (PDF)

Anonymization of Data

In the name of scientific progress, an amazing number of data sets have been released as a way to promote collaboration and development of the data science community. This paper critically reviews an array of anonymization theories and techniques and applies them to a fake data set, with the goal of helping others weigh the cost, benefit, and, most importantly, the ethics of anonymizing data. Along the way, we take an in-depth look at how easily someone armed with only a small amount of data can cause serious harm.
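One simple way to see that risk is to count how many records are the only ones with a given combination of seemingly harmless fields. The sketch below is an assumption-laden illustration rather than one of the paper's techniques: the field names and every record are invented.

```python
from collections import Counter

# Invented fake records with the names already removed.
# Field names and values are hypothetical.
fake_records = [
    {"zip": "76105", "birth_year": 1985, "gender": "F"},
    {"zip": "76105", "birth_year": 1985, "gender": "F"},
    {"zip": "76105", "birth_year": 1990, "gender": "M"},
    {"zip": "76112", "birth_year": 1985, "gender": "M"},
    {"zip": "76112", "birth_year": 1972, "gender": "F"},
]


def unique_record_count(records, fields):
    """Count records whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(record[f] for f in fields) for record in records)
    return sum(1 for count in combos.values() if count == 1)


# Knowing only the ZIP code singles out nobody in this toy set...
unique_on_zip = unique_record_count(fake_records, ["zip"])
# ...but adding birth year and gender singles out three of the five records.
unique_on_all = unique_record_count(fake_records, ["zip", "birth_year", "gender"])
print(unique_on_zip, unique_on_all)
```

Each record that is unique on those fields could be matched to a person by anyone who already knows those few facts about them.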

Download (PDF)

Skills

Coding Languages - Python, JavaScript, Java, C++, PHP, HTML, CSS, C#, R, SAS, XML/XSLT

Database - SQL, MongoDB, NoSQL

Server - IIS, Apache, Node.js, Grunt, Gulp, Rhino

Frameworks - jQuery, Ember, Backbone, Angular, CodeIgniter, CakePHP

Data & Visualization - scikit-learn, TensorFlow, R, SAS, Seaborn, Processing, Tableau, Excel

Education

Master of Business Administration
Texas Wesleyan University, 2021 (Anticipated)

Data Science, Master of Science
Southern Methodist University, 2017

English, Bachelor of Arts
Texas Wesleyan University, 2010

Employment

Web Developer, July 2012 - Present
Texas Wesleyan University, Fort Worth, TX

Integrated CRM backend with CMS front end through a REST API and custom content controls for content contributors.
Supported 200+ content contributors by developing modular and accessible content controls within the CMS
Captured “student stories” with start to finish tracking and targeted analytics
Reported and presented KPI metrics to executive level staff
Led interdepartmental projects such as upgrading our hosting solution, distributed analytics reporting, and student enrollment predictive analysis scoring

Web Specialist, March 2011 - June 2012
Texas Wesleyan University, Fort Worth, TX

Migrated 1,600+ page website from static HTML files to CMS
Managed and trained small team of contract employees during migration
Established website architecture based around an admissions-first focus

Technical Writer, August 2010 - December 2010
TekCenture, Inc., Irving, TX

Produced interactive, demonstrative videos for mobile devices
Scripted introductory marketing videos for mobile devices
Rapidly self-trained in new technology and distilled the information into digestible end-user instructions
