A modular Python program that offers the user several options: displaying the results of a Google Form satisfaction survey sent to doctors, computing statistics on the questionnaire responses, and performing the complete data analysis of the health dataset. The user can choose from the following eight options:
Health dataset overview: Provides the user with a brief description of the physician and the health dataset, similar to the description on the main page of this website.
Data-Driven Questions and Predictions: Gives the user a brief description of the data-driven questions about the health dataset and the predicted answers to them. It is similar to the description on the front page of this website.
Basic statistics on smoking: Gives the user a basic understanding of the number of smokers, the smoking groups, and the death rate for each group, based on the calculations. It provides the user with another set of options to choose from, based on the program created in Assignment 7.
Simple visualization on smoking: Shows the user visualizations of the smoking data analysis. It provides the user with another set of options to choose from, based on the program created in Assignment 8.
Survey Analysis: If the user chooses this option, the program processes and cleans the dataset. After all errors have been removed, it provides the user with another set of 3 options to choose from, based on the program created in Assignment 9. If the user already chose option 6 or 7 earlier, the data does not need to be cleaned again, and the program goes directly to the set of 3 options.
Data Set Analysis: If the user chooses this option, the program processes and cleans the dataset. After all errors have been removed, it provides the user with another set of 10 options to choose from, based on the programs created in Assignments 11 and 12. If the user already chose option 5 or 7 earlier, the data does not need to be cleaned again, and the program goes directly to the set of 10 options.
Answers to Data-Driven Questions: If the user chooses this option, the program processes and cleans the dataset and then outputs the conclusions for the Data-Driven Questions and Predictions.
Quit: Allows the user to exit the program.
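The death rate mentioned above is the number of deaths in a group divided by the number of people in that group. A minimal sketch of that calculation in plain Python follows, assuming each record is a dictionary with hypothetical keys `smoking_status` and `died` (1 for a death, 0 otherwise); the actual program works on a pandas DataFrame, and its column names may differ. The sample records are made up purely for illustration.

```python
from collections import defaultdict


def death_rate_by_group(records, group_key="smoking_status", outcome_key="died"):
    """Return {group: deaths / people} for every smoking group.

    Hypothetical schema: each record is a dict, and outcome_key holds 1 for a
    death and 0 otherwise.
    """
    totals = defaultdict(int)  # people counted in each group
    deaths = defaultdict(int)  # deaths counted in each group
    for row in records:
        group = row[group_key]
        totals[group] += 1
        deaths[group] += int(row[outcome_key])
    return {group: deaths[group] / totals[group] for group in totals}


# Toy records for illustration only.
sample = [
    {"smoking_status": "smoker", "died": 1},
    {"smoking_status": "smoker", "died": 0},
    {"smoking_status": "non-smoker", "died": 0},
    {"smoking_status": "non-smoker", "died": 0},
    {"smoking_status": "non-smoker", "died": 1},
]
rates = death_rate_by_group(sample)
print(rates)  # smoker: 1 of 2 died, non-smoker: 1 of 3 died
```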
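Options 5, 6, and 7 share one cleaned copy of the data, so cleaning only has to happen the first time any of them is chosen. One way to get that behaviour is to cache the cleaned result, sketched below; `CleanDataCache`, `load`, and `clean` are hypothetical names, and the toy list stands in for the real DataFrame.

```python
class CleanDataCache:
    """Run the load-and-clean step once and return the same result afterwards."""

    def __init__(self, loader, cleaner):
        self._loader = loader    # hypothetical: reads the raw dataset
        self._cleaner = cleaner  # hypothetical: removes or fixes errors
        self._clean = None       # filled the first time an option needs clean data

    def get(self):
        # "is None" instead of a truthiness test: a DataFrame cannot be used as a bool.
        if self._clean is None:
            self._clean = self._cleaner(self._loader())
        return self._clean


# Toy loader and cleaner that record each time they run.
calls = []


def load():
    calls.append("load")
    return [3, None, 5]


def clean(rows):
    calls.append("clean")
    return [row for row in rows if row is not None]


cache = CleanDataCache(load, clean)
first = cache.get()   # e.g. option 5 chosen first: load and clean
second = cache.get()  # e.g. option 6 chosen later: reuse, no second cleaning
print(first, calls)
```

The check uses `is None` rather than truthiness because evaluating a pandas DataFrame as a boolean raises a `ValueError`.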
General solution: Define functions to open and read data from CSV files into a pandas DataFrame, clean that DataFrame, and get and validate user input. Define functions responsible for outputting information about the dataset, the data-driven questions and conclusions, running health statistics, and generating and outputting summary tables, pivot tables, and/or visualizations of the cleaned DataFrame based on user input. Use loops and conditionals to navigate the menus as needed.
Program description: A Python program that provides information about a health dataset and the related data-driven questions. The conclusive answers to the data-driven questions come from the data analysis conducted within the program, taking external findings into account. Users can browse menus to access information about the data and questions on the health hazards of smoking, or exit the program. The program then outputs the requested information: it runs or investigates Python code related to the data, questions, or solutions; generates and outputs summary tables, pivot tables, and/or data visualizations for the project; and cleans the DataFrame based on user input. The program defines and calls functions to get user input and check its validity, open and read data from the CSV file into a pandas DataFrame, and clean the data based on the values in the variables; all other functions are eventually called through the main function.
welcome_msg: This function outputs the name of the topic, the data-driven question, and a welcome message for the data analysis program.
get_choice: This function outputs the list of eight choices for the main menu and calls the check_choice function to obtain a valid input from the user. It returns the valid input back to the main function that called it.
check_choice: This function asks the user for their choice and then checks that the input is valid (i.e., that the user entered a whole number between the minimum and maximum choice options for the menu).
overview: This function outputs an overview of the topic and dataset.
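A sketch of how get_choice and check_choice might fit together, assuming the eight menu labels listed above. The extra `read` parameter is an assumption added so the functions can be tested with scripted input instead of the keyboard; the original functions may simply call `input()`.

```python
# Labels taken from the eight main-menu options described above.
MENU = [
    "Health dataset overview",
    "Data-Driven Questions and Predictions",
    "Basic statistics on smoking",
    "Simple visualization on smoking",
    "Survey Analysis",
    "Data Set Analysis",
    "Answers to Data-Driven Questions",
    "Quit",
]


def check_choice(min_choice, max_choice, read=input):
    """Ask until the user types a whole number from min_choice to max_choice."""
    while True:
        raw = read(f"Enter your choice ({min_choice}-{max_choice}): ")
        try:
            value = int(raw)
        except ValueError:
            value = None  # not a whole number, e.g. "abc" or "2.5"
        if value is not None and min_choice <= value <= max_choice:
            return value
        print(f"Invalid choice {raw!r}. Please enter a number from {min_choice} to {max_choice}.")


def get_choice(read=input):
    """Print the main menu and return a validated option number."""
    for number, label in enumerate(MENU, start=1):
        print(f"{number}. {label}")
    return check_choice(1, len(MENU), read)


# Simulated session: two invalid entries, then a valid one.
answers = iter(["9", "abc", "3"])
choice = get_choice(read=lambda prompt: next(answers))
print("Selected option:", choice)
```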
dq: This function outputs the data-driven questions and predictions of the topic.
basic_stats: This function provides the user with another set of five options, described under Basic statistics on smoking above.
simple_visualizations: This function provides the user with another set of four options, described under Simple visualization on smoking above.
survey_analysis: This function provides the user with another set of four options described in the Survey Analysis above.
data_analysis: This function provides the user with another set of ten options described in the Data Set Analysis above.
process_df: This function reads the dataset into a DataFrame and cleans it, outputting the errors found and how each one was resolved. It returns the cleaned DataFrame to the main function that called it.
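The following is a minimal sketch of a process_df-style cleaning step with pandas, not the project's actual code. It assumes hypothetical columns `age` (should be numeric) and `smoker` (should be yes/no) and checks three kinds of error: duplicate rows, missing or non-numeric ages, and invalid smoker answers. Each problem is reported before the affected rows are dropped, matching the description of outputting the errors and how they were resolved. The tiny in-memory CSV is invented for the example.

```python
import io

import pandas as pd


def process_df(source):
    """Read the dataset, report each kind of error found, and return the cleaned DataFrame.

    Hypothetical schema: 'age' must be numeric, 'smoker' must be 'yes' or 'no'.
    """
    df = pd.read_csv(source)

    # Normalise first so that "Yes" and " yes " count as the same answer.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")  # non-numbers become NaN
    df["smoker"] = df["smoker"].astype(str).str.strip().str.lower()

    problems = {
        "duplicate row(s)": df.duplicated(),
        "missing or non-numeric age value(s)": df["age"].isna(),
        "smoker value(s) other than yes/no": ~df["smoker"].isin(["yes", "no"]),
    }

    # Report each problem and combine the masks; filter only once at the end.
    bad = pd.Series(False, index=df.index)
    for description, mask in problems.items():
        if mask.any():
            print(f"Found {int(mask.sum())} {description}; the affected rows were removed.")
        bad |= mask
    return df[~bad].reset_index(drop=True)


raw_csv = io.StringIO(
    "age,smoker\n"
    "45,Yes\n"
    "45,Yes\n"
    "abc,no\n"
    "60, NO \n"
    "38,maybe\n"
)
cleaned = process_df(raw_csv)
print(cleaned)
```

Building all the masks first and filtering once at the end avoids assigning into a filtered slice of the DataFrame, a common source of pandas' SettingWithCopyWarning.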
conclusion: This function outputs the answers to the data-driven questions using data statistics, pivot tables and summary statistics, and visualizations as evidence.
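As an illustration of the kind of pivot table and summary statistics that conclusion could present as evidence, the sketch below computes death rates by smoking status and sex with `pandas.pivot_table` and `groupby`. The columns and the five rows are hypothetical toy data, not results from the health dataset.

```python
import pandas as pd

# Toy data for illustration only; the column names are hypothetical.
df = pd.DataFrame({
    "smoker": ["yes", "yes", "no", "no", "no"],
    "sex": ["F", "M", "F", "M", "M"],
    "died": [1, 0, 0, 0, 1],
})

# Share of deaths (mean of the 0/1 'died' column) for each smoker/sex cell;
# margins=True adds an "All" row and column with the overall rates.
table = pd.pivot_table(df, values="died", index="smoker", columns="sex",
                       aggfunc="mean", margins=True)
print(table)

# Summary statistics per smoking group as supporting evidence.
summary = df.groupby("smoker")["died"].agg(["count", "sum", "mean"])
print(summary)
```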
main: This function sets up the program and manages the calls to the functions defined for each option.
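The menu handling in main can be sketched as a loop that dispatches each choice through a dictionary of functions. The name `run_menu` and passing `get_choice` and the `actions` table in as parameters are assumptions made so the loop can be tested; the actual main may call get_choice() and the option functions directly, and would also call welcome_msg() before entering the loop.

```python
QUIT = 8  # "Quit" is the eighth main-menu option


def run_menu(get_choice, actions):
    """Keep showing the menu and dispatching choices until the user picks Quit.

    get_choice: callable returning a validated option number (1-8).
    actions: dict mapping option numbers 1-7 to zero-argument functions.
    """
    while True:
        choice = get_choice()
        if choice == QUIT:
            print("Thank you for using the program. Goodbye!")
            return
        actions[choice]()


# Scripted example: the user picks options 1 and 2, then quits.
log = []
actions = {n: (lambda n=n: log.append(n)) for n in range(1, QUIT)}
scripted = iter([1, 2, QUIT])
run_menu(lambda: next(scripted), actions)
print(log)
```

A dictionary keeps the mapping from option numbers to functions in one place, so adding or renumbering an option does not require editing a long if/elif chain.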
Expected Output: The program produces the outputs described above for each option. You can view the Final Project: Data Analysis program, whose output is shown in the following code; it shows that the program runs as expected and calculates the correct output described above.