To achieve our aim, we collected data from the California Department of Education (CDE). We considered schools from the following district types: Unified, Elementary, High, and Others. Altogether, the data cover 1,037 districts in California. Only public schools are included in our study.
The following data sets were used for the analysis:
California Department of Education data portal: https://www.cde.ca.gov/ds/sd/sd/
Enrollment data: https://www.cde.ca.gov/ds/sd/sd/filesenr.asp. “Enrollment by School” is a publicly available dataset maintained by the California Department of Education. It covers the years 2007 to 2018 and includes only primary enrollments; short-term enrollments are excluded. The dataset is structured, with more than 100,000 enrollment records and 24 attributes such as county, district, and gender. We used only the data from 2014-2018.
Dropouts by Race and Gender: https://www.cde.ca.gov/ds/sd/sd/filesdropouts.asp
Graduates by Race and Gender: https://www.cde.ca.gov/ds/sd/sd/filesgrads.asp
Graduates by Ethnicity and School: https://www.cde.ca.gov/ds/sd/sd/filesgrad.asp. This data collection consists of all students who graduated or dropped out during the year, regardless of when they started high school.
The data from the California Department of Education are raw and needed to be cleaned and rearranged before we could make sense of them. Many values were also stored as codes, so we cleaned the Excel files downloaded from the CDE website to make them easier to understand. We used only the data from 2014-2018.
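The cleaning described above was applied to the downloaded Excel files. Purely as an illustration, the same kind of step (keeping the 2014-2018 rows and replacing coded values with readable labels) could be sketched in Python. The column names (YEAR, CDS_CODE, GENDER, ETHNIC, COUNT), the code-to-label mappings, and the sample rows are all hypothetical and would need to be checked against the CDE file documentation.

```python
import csv
import io

# Hypothetical code-to-label mappings (illustrative subset only);
# the real codes must be taken from the CDE file documentation.
GENDER_LABELS = {"M": "Male", "F": "Female"}
ETHNIC_LABELS = {"5": "Hispanic or Latino", "7": "White"}


def clean_rows(rows, first_year=2014, last_year=2018):
    """Keep rows inside the study window and replace codes with labels."""
    cleaned = []
    for row in rows:
        year = int(row["YEAR"])
        if not first_year <= year <= last_year:
            continue  # outside 2014-2018, so it is dropped
        cleaned.append({
            "year": year,
            # Spreadsheets can strip leading zeros; restore the 14-digit code.
            "cds_code": row["CDS_CODE"].strip().zfill(14),
            "gender": GENDER_LABELS.get(row["GENDER"].strip(), "Unknown"),
            "ethnicity": ETHNIC_LABELS.get(row["ETHNIC"].strip(),
                                           "Other or not reported"),
            "count": int(row["COUNT"] or 0),
        })
    return cleaned


# Made-up sample standing in for one exported CSV file.
SAMPLE = """YEAR,CDS_CODE,GENDER,ETHNIC,COUNT
2013,1234567890123,F,5,12
2016,1234567890123,M,7,8
"""

rows = clean_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(rows)
```

In this sketch the 2013 row is discarded and the remaining row carries readable gender and ethnicity labels instead of codes.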
Our data were spread across multiple files. To connect them and build visualizations, we used Tableau’s join feature, which let us link the files on CDS code and county name.
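The join itself was done in Tableau. For readers who want to see the idea outside Tableau, a minimal sketch in Python is given here. It assumes an inner join on two hypothetical fields, cds_code and county, and uses made-up rows; it is not the workflow we actually ran.

```python
def join_on_cds(left_rows, right_rows, keys=("cds_code", "county")):
    """Inner join: keep left rows that have a match in right_rows on all keys."""
    # Index the right-hand rows by their key values for fast lookup.
    index = {tuple(r[k] for k in keys): r for r in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Merge the two records into one combined row.
            joined.append({**match, **row})
    return joined


# Made-up rows; field names and values are illustrative only.
enrollment = [
    {"cds_code": "01234567890123", "county": "Alameda", "enrollment": 400},
    {"cds_code": "09876543210987", "county": "Fresno", "enrollment": 250},
]
dropouts = [
    {"cds_code": "01234567890123", "county": "Alameda", "dropouts": 12},
]

joined = join_on_cds(enrollment, dropouts)
print(joined)
```

Only the school present in both inputs survives the join, which is why CDS codes must be formatted identically in every file before joining.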
For visualization we used both Excel and Tableau. We were not required to do much analytics, as our main focus was to understand school completion and dropout rates by gender and ethnicity. The graphs and visualizations created in Tableau were very useful for that purpose. We also used Microsoft Excel, because it was easier to use and its graphs were easier to understand.
Our audience is the households of California, so it was our duty to make the results as easy to understand as possible; this is why we chose bar graphs.
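The numbers behind such a bar graph are simple group totals. As a hedged sketch, assuming the dropout rate is defined as dropouts divided by enrollment (a definition chosen here for illustration) and using made-up rows with hypothetical field names, the per-group values could be computed in Python as follows.

```python
from collections import defaultdict


def dropout_rate_by_group(rows, group_key):
    """Return the dropout rate in percent for each value of group_key."""
    totals = defaultdict(lambda: [0, 0])  # group -> [dropouts, enrollment]
    for row in rows:
        group_totals = totals[row[group_key]]
        group_totals[0] += row["dropouts"]
        group_totals[1] += row["enrollment"]
    # Skip groups with zero enrollment to avoid division by zero.
    return {
        group: round(100 * dropped / enrolled, 2)
        for group, (dropped, enrolled) in totals.items()
        if enrolled > 0
    }


# Made-up rows; not taken from the CDE data.
sample = [
    {"gender": "Female", "dropouts": 5, "enrollment": 200},
    {"gender": "Female", "dropouts": 3, "enrollment": 120},
    {"gender": "Male", "dropouts": 10, "enrollment": 250},
]

rates = dropout_rate_by_group(sample, "gender")
print(rates)
```

Passing "ethnicity" instead of "gender" as the group key would give the values for an ethnicity bar graph, provided the rows carry that field.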