Automatic Machine Learning Pipeline
I created a Streamlit application that automates a machine learning pipeline for analyzing and modeling a dataset. Key features include:
Data ingestion (handles multiple file types)
Exploratory data analysis (covers missing values, feature and target distributions, correlation matrix, etc.)
Feature extraction and transformation (handles scaling and imputation, and can create PCA components)
Automatic detection of regression or classification problems
Training multiple models with hyperparameter tuning and cross-validation
Selecting and saving the best model, and generating predictions from a test set if one is provided
Generating an HTML report of the modeling process, along with the EDA output, that can be downloaded as a zip file
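As an illustration of the automatic regression/classification detection listed above, here is a minimal sketch of one common heuristic. It is not the app's actual logic: the function name `detect_task_type` and the `max_classes` threshold of 20 are assumptions made for this example. The idea is that non-numeric targets are treated as class labels, targets with fractional values as continuous, and integer-valued targets are classified by how many distinct values they take.

```python
from typing import Sequence


def detect_task_type(target: Sequence, max_classes: int = 20) -> str:
    """Guess whether a target column calls for classification or regression.

    Hypothetical helper: the name and the max_classes default are
    illustrative assumptions, not taken from the application itself.
    """
    # Drop missing entries: None, and NaN (the only value where v != v).
    values = [v for v in target if v is not None and v == v]
    if not values:
        raise ValueError("target column has no non-missing values")

    # Non-numeric labels (strings, booleans) can only be classes.
    if any(isinstance(v, (str, bool)) for v in values):
        return "classification"

    distinct = set(values)

    # Any value with a fractional part suggests a continuous target.
    if any(not float(v).is_integer() for v in distinct):
        return "regression"

    # Integer-valued targets: a small number of distinct values looks
    # like class labels, a large number looks like a count or quantity.
    return "classification" if len(distinct) <= max_classes else "regression"


if __name__ == "__main__":
    print(detect_task_type(["spam", "ham", "spam"]))
    print(detect_task_type([12.5, 7.25, 3.0]))
```

A threshold-based rule like this can misjudge edge cases (for example, an integer-coded rating with many levels), so a real app would likely let the user override the detected task type.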
Future work will include incorporating more model types, including neural networks (with automatic detection of the most effective model architecture), time series predictions, clustering algorithms, and ensemble models.