Data sourcing is the first step, in which data is collected from various sources according to the needs of the analysis.
STATIC DATA
Flight delay data spanning two years (2022 to 2023) was obtained from the Bureau of Transportation Statistics for in-depth analysis. The data was downloaded in CSV format, month by month, and subsequently compiled into a single dataset. This dataset was chosen because the central goal is to analyze and predict flight delays. It serves as the foundational information, giving the dates of flight delays and the reasons behind them. The choice is reinforced by the fact that it is United States government data, which provides a level of reliability.
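The compilation of the monthly downloads could be sketched as follows. This is a minimal illustration, not the code used in the project: the function name `compile_monthly_csvs`, the folder layout (one CSV per month in a single folder), and the added `source_file` column are all assumptions.

```python
import csv
from pathlib import Path


def compile_monthly_csvs(folder, pattern="*.csv"):
    """Read every monthly CSV in `folder` and return all rows as one list.

    Hypothetical helper: assumes each monthly download is a CSV file with a
    header row, and that all files sit in the same folder.
    """
    rows = []
    # Sorting by file name keeps the months in order when the names sort
    # chronologically (for example 2022_01.csv, 2022_02.csv, ...).
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Record which monthly file each row came from.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Keeping the source file name on each row makes it easy to check afterwards that every month from 2022 and 2023 is present in the compiled data.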
API
Weather data was retrieved from the Open-Meteo Weather API. The fetched data is the temperature in degrees Celsius, requested with input parameters such as latitude, longitude, and date. This API was used because the analysis needs weather information to identify patterns that may contribute to flight delays.
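A request of this kind might look like the sketch below. It assumes the Open-Meteo historical archive endpoint and the hourly `temperature_2m` variable, which is returned in Celsius by default. The report does not state which endpoint or variable was actually used, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the historical archive endpoint is the one queried.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_url(latitude, longitude, date):
    """Build the request URL for one location and one date (YYYY-MM-DD)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": date,
        "end_date": date,
        "hourly": "temperature_2m",  # air temperature, Celsius by default
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


def mean_temperature(payload):
    """Average the hourly temperatures in a decoded JSON response."""
    temps = [t for t in payload["hourly"]["temperature_2m"] if t is not None]
    return sum(temps) / len(temps) if temps else None


def fetch_mean_temperature(latitude, longitude, date):
    """Call the API and return the mean temperature for the given day."""
    url = build_weather_url(latitude, longitude, date)
    with urlopen(url, timeout=30) as response:
        return mean_temperature(json.load(response))
```

Separating URL construction and response parsing from the network call lets both be checked without contacting the API.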
WEB SCRAPING
Data was scraped from the Google Developers website. The data collected includes the latitude and longitude of US states. It was scraped because the site offered no option to download it, and no suitable dataset was found on the web that contained all the required information: state abbreviation, state name, latitude, and longitude. The scraping was therefore done in Python.
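The extraction step could be sketched with Python's standard `html.parser` module, as below. This is an illustration under assumptions, not the project's scraper: it assumes the page presents the states in an HTML table whose columns are, in order, abbreviation, latitude, longitude, and name. The class and function names are hypothetical, and downloading the page is left out.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_state_table(html):
    """Return one dict per state from the HTML of the page.

    Assumed column order: abbreviation, latitude, longitude, name.
    """
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    states = []
    for cells in parser.rows:
        if len(cells) != 4:
            continue
        abbreviation, latitude, longitude, name = cells
        try:
            states.append({
                "abbreviation": abbreviation,
                "name": name,
                "latitude": float(latitude),
                "longitude": float(longitude),
            })
        except ValueError:
            # Header row, or a row whose coordinates are not numeric.
            continue
    return states
```

The resulting list of dictionaries can be written to a CSV file and used to supply the latitude and longitude of each state.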