Logistic regression is the appropriate regression analysis when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is predictive. It models the log-odds of the positive outcome as a linear combination of the independent variables, and the logistic (sigmoid) function converts that value into a probability between 0 and 1. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables.
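As a minimal sketch, assuming scikit-learn is available and using synthetic data from `make_classification` in place of the accident records (which are not shown here), the following fits a logistic regression and checks that its predicted probability is the sigmoid of a linear combination of the features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data stands in for the (not shown) accident dataset.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

model = LogisticRegression()
model.fit(X, y)

# Logistic regression models the log-odds as a linear function of the features:
#   log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk
z = X @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))  # the sigmoid maps log-odds to a probability

# Column 1 of predict_proba is the probability of class 1.
p_sklearn = model.predict_proba(X)[:, 1]
print("manual sigmoid matches predict_proba:", np.allclose(p_manual, p_sklearn))
print("training accuracy:", model.score(X, y))
```

The manual computation is only there to make the log-odds relationship visible; in practice `predict_proba` is all that is needed.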
Support-vector machines (SVMs) are supervised learning models, with associated learning algorithms, that analyze data for classification and regression. Given a set of training examples, each labeled as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other by finding the hyperplane that separates the two categories with the widest possible margin. This makes the basic SVM a non-probabilistic binary linear classifier; kernel functions extend it to non-linear decision boundaries.
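A minimal sketch, assuming scikit-learn and synthetic data rather than the project's dataset, fits a linear-kernel `SVC`. The sign of `decision_function` (the score w·x + b relative to the separating hyperplane) determines the predicted class, and `SVC` returns no probabilities unless `probability=True` is set, which is consistent with the non-probabilistic description above.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data (a stand-in, not the accident data).
X, y = make_classification(n_samples=300, n_features=4, random_state=1)

# A linear-kernel SVM looks for the maximum-margin hyperplane w.x + b = 0.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

# decision_function returns the signed score w.x + b; a positive score means class 1.
scores = svm.decision_function(X[:5])
labels = svm.predict(X[:5])
for s, label in zip(scores, labels):
    print(f"score={s:+.2f} -> class {label}")

# Only the support vectors (points on or inside the margin) define the boundary.
print("support vectors per class:", svm.n_support_)
```

The regularization parameter `C` trades margin width against training errors; its value here is simply the library default, not a tuned choice.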
A decision tree makes a series of sequential decisions that branch toward a specific outcome. It starts with a single root node at the top; each internal node tests one feature and branches according to the result. A sample follows these branches until it reaches an end node (leaf), which assigns a class to that previously unseen sample.
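To illustrate the root-to-leaf structure, here is a small sketch assuming scikit-learn, using its bundled Iris dataset purely as a stand-in (it has three classes rather than two, but the tree logic is the same). `export_text` prints the learned splits, and `apply` reports which leaf a new sample lands in; the sample values are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# max_depth limits how many sequential decisions a sample passes through.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The learned rules: the root test at the top, leaves ("class: ...") at the ends.
print(export_text(tree, feature_names=list(iris.feature_names)))

# A hypothetical unseen sample is routed from the root to exactly one leaf,
# and that leaf's class becomes the prediction.
new_sample = [[5.0, 3.4, 1.5, 0.2]]
predicted = iris.target_names[tree.predict(new_sample)[0]]
print("predicted class:", predicted)
print("leaf index:", tree.apply(new_sample)[0])
```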
Random Forest is a robust machine learning algorithm that can be used for a variety of tasks, including regression and classification. It is an ensemble method: a random forest model is made up of many decision trees, called estimators, each trained on a bootstrap sample of the data with a random subset of features considered at each split, and each producing its own prediction. The model combines the estimators' predictions, by majority vote or by averaging, to produce a prediction that is typically more accurate and more stable than that of a single tree.
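A minimal sketch, assuming scikit-learn and synthetic data, shows the ensemble at work. `n_estimators` sets the number of trees, `max_features="sqrt"` limits the features tried at each split, and `estimators_` exposes the individual fitted trees. In scikit-learn's `RandomForestClassifier`, the forest probability is the average of the trees' class probabilities (soft voting), which the last two lines compare.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data (a stand-in for the accident features).
X, y = make_classification(n_samples=500, n_features=6, random_state=2)

# Each of the 100 trees is fit on a bootstrap sample of the rows, and only a
# random subset of features (sqrt of the total) is considered at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

# Every fitted tree makes its own prediction for the same sample.
x_new = X[:1]
per_tree = np.array([t.predict_proba(x_new)[0, 1] for t in forest.estimators_])

# The forest combines the trees by averaging their class-1 probabilities.
print("mean of tree probabilities:", per_tree.mean())
print("forest probability:        ", forest.predict_proba(x_new)[0, 1])
```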
So far, we have built models to predict accident severity and found that Random Forest performs best of the models tested. In the future, we want to develop an application that captures dynamic data and uses this model to predict severity, so that it can warn users where a car accident is likely to cause a long or a short delay.
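The text does not state which metric or validation scheme was used to rank the models. As one possible approach, assuming scikit-learn and using synthetic data rather than the project's dataset, the four classifiers above can be compared by 5-fold cross-validated F1 score. The scores it prints say nothing about the accident data, and the choice of F1 is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the accident data (hypothetical; not the project's dataset).
X, y = make_classification(n_samples=600, n_features=8, random_state=3)

models = {
    # Scaling helps the margin- and gradient-based models; trees do not need it.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Mean F1 over 5 stratified folds; each fold is scored on data the model did not see.
    results[name] = cross_val_score(model, X, y, cv=5, scoring="f1").mean()

# Rank the models from best to worst mean F1.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} F1 = {score:.3f}")
```

Which model ranks first depends on the data and the metric, so the same loop would need to be run on the prepared accident features to support the conclusion above.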