Every model and methodology demands a particular data format. Because the SVM is a supervised machine learning algorithm, its input data must be labeled. It also works best on numeric data, since it is a purely mathematical model. Categorical values can be encoded as numbers and passed to the model, but this is not advised here because the results may not be reliable.
Initial data before processing for SVM
Python is used for the SVM analysis. It is first used for data preparation, ensuring the dataset is appropriately formatted. Once the data is formatted, an SVM is trained to predict whether a flight delay is an extended delay or a short delay based on the given input data. The SVM algorithm aims to find a hyperplane that linearly separates the data into their respective classes. Even if the data is not linearly separable in the input space, the SVM uses the kernel trick to implicitly map the data into a higher-dimensional space where it may become linearly separable.
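To make the kernel trick concrete, the minimal sketch below uses a degree-2 polynomial kernel on two-dimensional points. This kernel is an illustrative choice and not necessarily the one used in this analysis, and the names `phi`, `dot`, and `poly_kernel` are hypothetical. It checks that the kernel value computed in the original space matches the dot product taken after an explicit mapping into three dimensions.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

# Two made-up points in the original 2-D input space.
x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = poly_kernel(x, z)    # same quantity, computed in the 2-D space

print(math.isclose(explicit, implicit))
```

The kernel gives the feature-space dot product without ever constructing the mapped points, which is what lets an SVM work in a higher-dimensional space at low cost.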
The dataset initially consisted of many columns, but the features considered for analysis are origin_temperature, destination_temperature, and total_weather_delay. total_weather_delay is a derived column that sums the origin_weather delay and the destination_weather delay. It is then converted into a label by a function that assigns one of two categories, Short Delay or Extended Delay, based on the hours of delay. Short delays are flight delays of less than 1 hour, and extended delays are flight delays of more than 1 hour.
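The labeling step described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the original function: it assumes delays are recorded in minutes, treats a delay of exactly one hour as extended (the text does not say how that boundary is handled), and uses hypothetical helper names and made-up rows.

```python
SHORT_DELAY = "Short Delay"
EXTENDED_DELAY = "Extended Delay"
THRESHOLD_MINUTES = 60  # assumption: delays are stored in minutes

def total_weather_delay(origin_delay, destination_delay):
    """Sum the origin and destination weather delays."""
    return origin_delay + destination_delay

def label_delay(total_delay_minutes):
    """Map a total delay to one of the two class labels.

    Assumption: a delay of exactly 60 minutes counts as extended.
    """
    if total_delay_minutes < THRESHOLD_MINUTES:
        return SHORT_DELAY
    return EXTENDED_DELAY

# Made-up rows: (origin weather delay, destination weather delay) in minutes.
rows = [(10, 20), (45, 30), (0, 0)]
labels = [label_delay(total_weather_delay(o, d)) for o, d in rows]

print(labels)
```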
The dataset is divided into two disjoint subsets: 70% for the training set and 30% for the testing set. The subsets must be disjoint because evaluating the model on data it was trained on makes it appear more accurate than it really is.
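A 70/30 disjoint split can be sketched in plain Python as shown below. The function name `train_test_split_rows`, the seed, and the stand-in rows are assumptions for illustration; the original analysis may have used a library routine instead.

```python
import random

def train_test_split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and cut it into disjoint train/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 dataset rows, identified by index.
rows = list(range(100))
train, test = train_test_split_rows(rows)

print(len(train), len(test))
print(set(train).isdisjoint(test))
```

Because each row lands in exactly one of the two lists, no test row can leak into training. In practice, scikit-learn's `train_test_split` with `test_size=0.3` performs the same kind of split.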
Final input data after processing for SVM
Sample train data after processing for SVM
Sample test data after processing for SVM
Once all these steps are verified, the data can be modeled using the SVM algorithm.