The PFBI Sample Data provided by Malaysia Airlines Berhad contains 273 columns of attributes and 6054 rows of data. More than 200 columns are completely blank (all 6054 values missing), and columns such as TOT_PAX_CT are missing hundreds of values.
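A minimal sketch of how blank and partially missing columns could be counted, using only Python's standard `csv` module and treating an empty cell as missing. Only TOT_PAX_CT is a real column name from the dataset; `FLT_NO`, `REMARKS` and the three inline rows are hypothetical stand-ins for the actual export.

```python
import csv
import io

def missing_counts(rows, columns):
    """Count empty cells per column (an empty or absent value is treated as missing)."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if (row.get(col) or "").strip() == "":
                counts[col] += 1
    return counts

# Hypothetical three-row extract; FLT_NO and REMARKS are illustrative column names.
sample = io.StringIO(
    "FLT_NO,TOT_PAX_CT,REMARKS\n"
    "MH1,180,\n"
    "MH2,,\n"
    "MH3,150,\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
counts = missing_counts(rows, reader.fieldnames)
n_rows = len(rows)

# A column is "blank" when every row is missing a value for it.
blank_columns = [col for col, n in counts.items() if n == n_rows]
print(counts)         # {'FLT_NO': 0, 'TOT_PAX_CT': 1, 'REMARKS': 3}
print(blank_columns)  # ['REMARKS']
```

On the full file, the same loop would report the 200+ fully blank columns and the partial gaps in TOT_PAX_CT.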
After looking through the dataset, we came up with two questions to guide our insight and analysis.
Identify the variables and their data types after selecting 17 attributes.
Analyze basic metrics such as the mean, maximum, minimum and standard deviation.
A correlation matrix heatmap shows the correlation between each pair of variables.
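The metric and correlation steps above can be sketched with Python's standard library. This is a minimal illustration, assuming missing cells are stored as `None`; the column names `booked_pax` and `delay_minutes` and their values are hypothetical, not taken from the PFBI data. The standard deviation is the sample version (divides by n - 1), and correlations use pairwise deletion, so each pair of columns only uses rows where both values are present.

```python
import math
import statistics

def summarize(values):
    """Mean, max, min and sample standard deviation, skipping missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "max": max(present),
        "min": min(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
    }

def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.mean(x for x, _ in pairs)
    my = statistics.mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)  # undefined (division by zero) for a constant column

def correlation_matrix(columns):
    """All pairwise Pearson correlations: the grid a heatmap would colour."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical columns; None marks a missing value.
booked = [100, 120, None, 140, 160]
delay = [10, 20, 25, 30, 40]

stats = summarize(booked)
matrix = correlation_matrix({"booked_pax": booked, "delay_minutes": delay})
print(stats)
print(matrix[("booked_pax", "delay_minutes")])
```

Feeding the resulting matrix into any plotting library's heatmap function would reproduce the kind of chart described above.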
Based on the dataset, a total of 3294 flights are on time and 2760 flights are delayed (6054 flights in all).
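The counts above give the overall delay rate, which is the baseline that the conditional probabilities in the insights below are compared against. The arithmetic, as a short sketch:

```python
ON_TIME = 3294  # on-time flights, as reported above
DELAYED = 2760  # delayed flights, as reported above

total = ON_TIME + DELAYED
delay_rate = DELAYED / total
on_time_rate = ON_TIME / total

print(total)                  # 6054
print(f"{delay_rate:.1%}")    # 45.6%
print(f"{on_time_rate:.1%}")  # 54.4%
```

So roughly 46 in every 100 flights in the sample are delayed; a group is "more likely to be delayed" when its own rate sits clearly above that figure.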
Here is a video of our data visualization dashboard for the PFBI Sample Data.
Insight & Analysis:
The probability of flight delay is higher when the destination is outside Southeast Asia (the rest of the world, such as Australia) than when it is within Southeast Asia (such as Singapore).
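A comparison like this one amounts to estimating a conditional probability, P(delayed | region), as the share of delayed flights within each group. A minimal sketch, assuming each flight has already been tagged with a region and a boolean delay flag; the `region` and `delayed` field names and the records are hypothetical, and how destinations map to regions in the real data is not specified here.

```python
from collections import defaultdict

def delay_rate_by(flights, key):
    """Estimate P(delayed | group) as delayed flights / all flights in each group."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for flight in flights:
        group = key(flight)
        totals[group] += 1
        delayed[group] += flight["delayed"]  # True adds 1, False adds 0
    return {group: delayed[group] / totals[group] for group in totals}

# Hypothetical records; field names and values are illustrative only.
flights = [
    {"region": "Southeast Asia", "delayed": False},
    {"region": "Southeast Asia", "delayed": True},
    {"region": "Southeast Asia", "delayed": False},
    {"region": "Southeast Asia", "delayed": False},
    {"region": "Rest of world", "delayed": True},
    {"region": "Rest of world", "delayed": True},
    {"region": "Rest of world", "delayed": False},
]
rates = delay_rate_by(flights, key=lambda f: f["region"])
print(rates)  # {'Southeast Asia': 0.25, 'Rest of world': 0.6666666666666666}
```

Passing a different `key` function reuses the same logic for any other grouping.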
Freighter services such as Kargo have a higher probability of delay when the flight duration is less than 500 minutes (about 8 hours), whereas passenger services have a higher probability of delay when the flight duration is more than 300 minutes (5 hours). Flights longer than 5 hours are usually international, so they may be delayed by air traffic control restrictions or by the late arrival of the aircraft from a previous flight.
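The finding above splits each service type at its own duration threshold (500 minutes for freighters, 300 for passenger flights) and compares delay rates on either side. A minimal sketch of that split, assuming hypothetical fields `service` (either `"freighter"` or `"passenger"`), `duration_min` and `delayed`; how the real data distinguishes Kargo from passenger flights is not stated here.

```python
from collections import defaultdict

# Duration thresholds (minutes) quoted in the analysis.
THRESHOLDS = {"freighter": 500, "passenger": 300}

def delay_rate_by_service_and_duration(flights):
    """P(delayed) per (service, duration bucket), splitting each service at its own threshold."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for f in flights:
        limit = THRESHOLDS[f["service"]]
        bucket = f"under {limit}" if f["duration_min"] < limit else f"{limit}+"
        group = (f["service"], bucket)
        totals[group] += 1
        delayed[group] += f["delayed"]
    return {g: delayed[g] / totals[g] for g in totals}

# Hypothetical flights; values are illustrative only.
flights = [
    {"service": "freighter", "duration_min": 240, "delayed": True},
    {"service": "freighter", "duration_min": 600, "delayed": False},
    {"service": "passenger", "duration_min": 90, "delayed": False},
    {"service": "passenger", "duration_min": 420, "delayed": True},
    {"service": "passenger", "duration_min": 480, "delayed": False},
]
rates = delay_rate_by_service_and_duration(flights)
for group, rate in rates.items():
    print(group, rate)
```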
If the number of passengers booked is greater than the number of passengers boarded (the load), the flight has a higher probability of delay. We assume this is because the flight waits for latecomers.
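This comparison can be sketched by splitting flights on whether bookings exceed boarded passengers and computing the delay rate of each side. The field names `booked`, `boarded` and `delayed` are hypothetical; which PFBI columns hold these counts is not confirmed here. Rows with a missing count are skipped, since passenger counts such as TOT_PAX_CT have gaps.

```python
def delay_rate(flights):
    """Share of delayed flights, or None for an empty group."""
    if not flights:
        return None
    return sum(f["delayed"] for f in flights) / len(flights)

def compare_booked_vs_boarded(flights):
    """Delay rate when booked > boarded versus when booked <= boarded."""
    complete = [f for f in flights if f["booked"] is not None and f["boarded"] is not None]
    overbooked = [f for f in complete if f["booked"] > f["boarded"]]
    others = [f for f in complete if f["booked"] <= f["boarded"]]
    return delay_rate(overbooked), delay_rate(others)

# Hypothetical flights; values are illustrative only.
flights = [
    {"booked": 180, "boarded": 172, "delayed": True},
    {"booked": 150, "boarded": 149, "delayed": True},
    {"booked": 160, "boarded": 155, "delayed": False},
    {"booked": 120, "boarded": 120, "delayed": False},
    {"booked": 140, "boarded": 140, "delayed": True},
    {"booked": 100, "boarded": 100, "delayed": False},
    {"booked": 130, "boarded": None, "delayed": True},  # missing count, skipped
]
gap_rate, no_gap_rate = compare_booked_vs_boarded(flights)
print(gap_rate, no_gap_rate)
```

A gap between the two rates suggests, but does not prove, the waiting-for-latecomers explanation; the data alone cannot show the cause.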
The number of business-class passengers may also be related to flight delay.