The Federal Deposit Insurance Corporation (FDIC) API is extensively used in this application. It provides detailed information about insured U.S. banks, including basic details, financials, and historical data. Here are the key endpoints and their usage:
getInstitutionsAll(): Fetches a DataFrame containing information about all FDIC-insured institutions. It includes data such as the bank's name, certificate number, and classification.
getLocation(): Retrieves location data for a specific bank using its certificate number. This includes information on all domestic branches of the bank.
API Calls for Historical and Financial Data: The script constructs URLs to make specific API calls to retrieve historical and financial information based on either the bank's RSSD ID or its certificate number. This data includes metrics like net income, total assets, and employment figures.
The application constructs API requests to pull financial data for specific banks over a range of dates, detailing key financial metrics. This information is crucial for analyzing the financial health and performance of the banks.
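As a minimal sketch of how such a request might be assembled: the base URL and query-parameter style below follow the public FDIC BankFind API, but the specific fields (`NETINC`, `ASSET`, `NUMEMP`) and the helper name are illustrative assumptions, not the application's exact code.

```python
from urllib.parse import urlencode

# Base endpoint of the public FDIC BankFind financials API.
BASE_URL = "https://banks.data.fdic.gov/api/financials"

def build_financials_url(cert, fields=("NETINC", "ASSET", "NUMEMP"),
                         start="2019-01-01", end="2023-12-31"):
    """Build a financials URL for one bank (by certificate number)
    over a date range. Field names here are illustrative."""
    params = {
        "filters": f"CERT:{cert} AND REPDTE:[{start} TO {end}]",
        "fields": ",".join(fields),
        "format": "json",
        "limit": 10000,
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = build_financials_url(3511)
# The URL would then be fetched with requests.get(url) and the JSON
# response parsed into a DataFrame (network call omitted here).
```

The same pattern applies when querying by RSSD ID instead of certificate number; only the filter key changes.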
The code also integrates data from the U.S. Small Business Administration (SBA), which offers insights into the lending activities of banks:
SBA Data (2019 and present): CSV files containing data on SBA loans approved in those years are downloaded and parsed. This data helps in understanding banks' involvement in lending to small businesses, including the volume and size of the loans they originate.
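A sketch of the kind of parsing involved, using an inline sample instead of a downloaded file; the column names (`BankName`, `GrossApproval`, `ApprovalDate`) are hypothetical stand-ins, as the real SBA files use their own headers.

```python
import io
import pandas as pd

# Inline stand-in for a downloaded SBA loan CSV; dollar amounts in the
# real files can carry thousands separators, handled via thousands=",".
sample_csv = io.StringIO(
    "BankName,GrossApproval,ApprovalDate\n"
    'First Example Bank,"1,250,000",2019-03-14\n'
    "Second Example Bank,500000,2019-07-02\n"
)

sba = pd.read_csv(sample_csv, thousands=",")

# Per-bank lending activity: number of loans and total approved volume.
loans_per_bank = sba.groupby("BankName")["GrossApproval"].agg(["count", "sum"])
```

Reading from the real download would use the file's URL in place of the buffer, typically with an explicit `encoding=` argument.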
Data preparation is a critical step in any data analysis or application development process, as it ensures the quality and usability of the data for insightful analysis and visualization. In this application, data preparation involves several techniques and methods tailored to the specifics of financial and banking data. Here's a more detailed look at the key steps:
The application begins by importing data from various sources, primarily the FDIC and the U.S. Small Business Administration (SBA). The data arrives either as CSV files or via API responses, and the raw records require refinement before they are usable:
Reading CSV Files: pandas loads CSV files directly from URLs, ensuring the most current data is accessed. Explicit settings such as encoding specifications guard against data-integrity issues during import.
API Data Fetching: Constructing and sending requests to specific API endpoints to retrieve banking data based on user input or predefined queries. The responses, often in JSON format, are parsed and converted into pandas DataFrames.
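The JSON-to-DataFrame step might look like the following. The payload shape shown, with each record wrapped under a "data" key, mirrors FDIC BankFind responses but is an assumption here; the field names are illustrative.

```python
import pandas as pd

# Hypothetical API payload shaped like an FDIC BankFind response,
# where each element wraps its record under a "data" key.
payload = {
    "data": [
        {"data": {"CERT": 3511, "NAME": "Example Bank", "ASSET": 2500000}},
        {"data": {"CERT": 14, "NAME": "Another Bank", "ASSET": 980000}},
    ]
}

# Unwrap each record, then build a DataFrame from the flat dicts.
records = [row["data"] for row in payload["data"]]
institutions = pd.DataFrame.from_records(records)
```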
Given the complexities of financial data, several cleaning steps are applied to ensure accuracy and consistency:
Handling Missing Values: In datasets like those from the SBA, missing values can significantly affect the results. These are filled with zeros or with an appropriate statistical measure (such as the median or mean, depending on the distribution) to maintain data integrity without skewing the analysis.
Data Type Conversion: Ensuring that all data columns are in the correct format is crucial for subsequent analysis. For instance, converting financial figures from strings to integers or floats and parsing dates into datetime objects to facilitate time-series analysis.
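Both cleaning steps above can be sketched in a few lines of pandas; the column names (`NETINC`, `REPDTE`) are illustrative, not necessarily those used in the application.

```python
import pandas as pd

# Hypothetical raw data: a numeric column stored as strings (with one
# missing and one unparseable value) and dates stored as strings.
raw = pd.DataFrame({
    "NETINC": ["1200", "850", None, "notanumber"],
    "REPDTE": ["2021-03-31", "2021-06-30", "2021-09-30", "2021-12-31"],
})

# Coerce strings to numbers; unparseable entries become NaN.
raw["NETINC"] = pd.to_numeric(raw["NETINC"], errors="coerce")

# Fill gaps with the median (the mean would also work for a
# roughly symmetric distribution).
raw["NETINC"] = raw["NETINC"].fillna(raw["NETINC"].median())

# Parse date strings into datetime objects for time-series analysis.
raw["REPDTE"] = pd.to_datetime(raw["REPDTE"])
```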
To tailor the analysis to specific user inputs or requirements, the application includes mechanisms to filter and select data based on various criteria:
Selecting Relevant Columns: To streamline the datasets and enhance performance, unnecessary columns are removed, focusing only on those that are pertinent to the analysis.
Applying Filters Based on User Input: The application dynamically filters data based on parameters such as bank certificate numbers, date ranges, or specific financial metrics. This ensures that the analysis is both relevant and manageable in scope.
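A sketch of both techniques, combining a boolean mask for user-supplied criteria with explicit column selection; the helper name, column names, and certificate values are hypothetical.

```python
import pandas as pd

# Hypothetical financials table covering two banks across two years.
financials = pd.DataFrame({
    "CERT": [3511, 3511, 14, 14],
    "REPDTE": pd.to_datetime(["2020-12-31", "2021-12-31",
                              "2020-12-31", "2021-12-31"]),
    "NETINC": [1200, 1350, 400, 380],
    "ASSET": [90000, 95000, 30000, 29000],
    "NOTES": ["a", "b", "c", "d"],  # an irrelevant column to drop
})

def filter_financials(df, cert, start, end, columns=("REPDTE", "NETINC")):
    """Keep one bank, one date range, and only the relevant columns."""
    mask = (
        (df["CERT"] == cert)
        & (df["REPDTE"] >= start)
        & (df["REPDTE"] <= end)
    )
    return df.loc[mask, list(columns)]

subset = filter_financials(financials, 3511, "2020-01-01", "2021-12-31")
```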
To ensure the application's robustness, error handling is integrated throughout the data preparation stages. This includes catching exceptions during data fetching and processing, providing error messages, and allowing for graceful recovery from common issues such as network errors or missing data.
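One common shape for this kind of error handling is a wrapper around the data-loading call that catches the expected failure modes and returns a safe fallback; the function names below are illustrative, not the application's own.

```python
import pandas as pd
from urllib.error import URLError

def fetch_with_fallback(loader, fallback=None):
    """Run a data-loading callable; on common failures, report the
    error and return a fallback instead of crashing."""
    try:
        return loader()
    except (URLError, OSError, ValueError, KeyError) as exc:
        # In the app this would surface a user-facing message rather
        # than printing to the console.
        print(f"Data fetch failed: {exc}")
        return fallback if fallback is not None else pd.DataFrame()

def broken_loader():
    # Simulate a network failure for demonstration.
    raise URLError("network unreachable")

result = fetch_with_fallback(broken_loader)
```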
For more insightful analysis, the data often needs to be aggregated or summarized:
Grouping Data: The application groups data by specific dimensions, such as fiscal years or financial metrics, to explore trends and patterns.
Computing Summary Statistics: Calculating sums, averages, medians, and other statistical measures gives a clearer view of the financial landscape, facilitating better decision-making.
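Both aggregation steps reduce to a single pandas groupby; the fiscal-year and loan-amount columns here are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical loan records spanning two fiscal years.
loans = pd.DataFrame({
    "FiscalYear": [2019, 2019, 2019, 2023, 2023],
    "GrossApproval": [100000, 250000, 50000, 400000, 600000],
})

# Group by fiscal year and compute several summary statistics at once.
summary = (
    loans.groupby("FiscalYear")["GrossApproval"]
         .agg(["count", "sum", "mean", "median"])
)
```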
Streamlit
Streamlit is an open-source app framework specifically designed for machine learning and data science teams. It allows developers to create beautiful, interactive web applications quickly and with minimal coding. By writing simple Python scripts, developers can use Streamlit to turn data scripts into shareable web apps. Streamlit's straightforward API supports rapid prototyping and enables data scientists to create and publish complex interactive dashboards with ease. This framework integrates seamlessly with major Python libraries such as Pandas, NumPy, and Plotly, making it a highly versatile tool for building applications that require real-time data updates, interactive plots, and user input to drive analyses.