This Python program analyzes teacher salaries across U.S. states using data from the Bureau of Labor Statistics. The dataset includes occupation titles, industries, and salary figures by state. The program filters the data to include only teaching-related jobs in the Educational Services sector and computes the average of their annual mean salaries (the A_MEAN column). This analysis gives a national baseline for teacher pay and provides supporting evidence for questions related to teacher compensation and cost of living.
The program uses the pandas library to handle and process tabular data. It is structured into five modular functions:
read_as_dataframe() loads the CSV dataset into a pandas DataFrame.
get_subset() filters the DataFrame for teaching jobs within the Educational Services industry.
compute() converts the salary column (A_MEAN) to numeric format and calculates the average.
output_stats() prints the relevant columns from the filtered dataset and displays the calculated mean salary.
main() manages the sequence of function calls and handles data flow through the program.
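The five functions above could be sketched roughly as follows. This is a minimal illustration, not the program's actual source: the function names and the OCC_TITLE, NAICS_TITLE, and A_MEAN columns come from the description, while the AREA_TITLE state column, the case-insensitive matching, the comma stripping, and the in-memory sample rows (made-up values, not BLS figures) are assumptions added so the example runs on its own.

```python
import io

import pandas as pd


def read_as_dataframe(source):
    """Load the CSV dataset (a file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def get_subset(df):
    """Keep rows with "Teacher" in OCC_TITLE and "Educational Services" in NAICS_TITLE."""
    is_teacher = df["OCC_TITLE"].str.contains("Teacher", case=False, na=False)
    in_education = df["NAICS_TITLE"].str.contains(
        "Educational Services", case=False, na=False
    )
    return df[is_teacher & in_education].copy()


def compute(subset):
    """Convert A_MEAN to numeric and return its mean."""
    # Assumption: salaries may contain thousands separators or non-numeric
    # placeholders; placeholders become NaN and are skipped by mean().
    cleaned = subset["A_MEAN"].astype(str).str.replace(",", "", regex=False)
    salaries = pd.to_numeric(cleaned, errors="coerce")
    return salaries.mean()


def output_stats(subset, avg_salary):
    """Print the relevant columns and the calculated mean salary."""
    # AREA_TITLE is an assumed name for the state column.
    print(subset[["AREA_TITLE", "OCC_TITLE", "A_MEAN"]].to_string(index=False))
    print(f"Average annual salary: ${avg_salary:,.2f}")


def main(source):
    """Run the steps in order and pass the data from one to the next."""
    df = read_as_dataframe(source)
    teacher_subset = get_subset(df)
    avg_salary = compute(teacher_subset)
    output_stats(teacher_subset, avg_salary)
    return avg_salary


# Made-up sample rows for illustration only; these are not real BLS values.
SAMPLE_CSV = """AREA_TITLE,NAICS_TITLE,OCC_TITLE,A_MEAN
Alpha,Educational Services,Elementary School Teachers,"60,000"
Beta,Educational Services,Secondary School Teachers,70000
Beta,Educational Services,Special Education Teachers,*
Alpha,Retail Trade,Cashiers,30000
"""

avg = main(io.StringIO(SAMPLE_CSV))
```

With the real dataset, main() would be called with the CSV file name instead of the in-memory sample. Coercing unparseable salary values to NaN means a row with a placeholder still appears in the printed list but does not affect the average.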
The data used in this program comes from publicly available state salary reports and cost of living indices. The program works with four key variables:
file_name: A string representing the name of the CSV file.
df: The original DataFrame created from the full dataset.
teacher_subset: A filtered DataFrame containing only rows with "Teacher" in OCC_TITLE and "Educational Services" in NAICS_TITLE.
avg_salary: A float representing the computed average of the A_MEAN column.
The program produces two outputs:
A filtered list of teaching occupations by state, showing each job title and its associated annual mean salary.
The average annual salary for teachers in the filtered subset, formatted with commas and two decimal places.
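The comma and two-decimal formatting can be produced with Python's format specification. A small sketch follows; format_salary is a hypothetical helper name, the dollar sign is an assumption, and the input value is an arbitrary example rather than a result from the dataset.

```python
def format_salary(value):
    """Format a number with thousands separators and two decimal places."""
    # "," inserts thousands separators; ".2f" fixes two digits after the point.
    return f"${value:,.2f}"


formatted = format_salary(65432.1)
print(formatted)  # $65,432.10
```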