Our original datasets come from surveys conducted by Open Sourcing Mental Illness (OSMI) on attitudes toward mental health in the tech industry. We compiled 8 years of survey data, from 2014 to 2021, totaling roughly 3,500 rows and about 82 columns.
Because the original data included roughly 82 columns, the cleaning process focused on condensing them into the necessary predictors and correcting misspelled entries. Several survey questions were omitted from one year to the next, so we aligned each year's survey to fit into a single cohesive dataset.
We used Excel to combine and clean the data into a readable CSV. Using Excel's replace function, we converted responses into numerical values for the modeling process. Along with data modeling and visualization in Python, we used Tableau and ArcGIS to create interactive visualizations and spatial analysis. To make this data more digestible, we aimed to provide several visualizations and easy-to-read graphs.
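The combining and recoding steps described above were done in Excel. For readers who prefer a scripted version, the following is a minimal sketch of the same idea in Python using only the standard library. The function names, the yes/no code mapping, and the sample rows are all hypothetical and are not taken from the actual cleaning process.

```python
# Hypothetical codes; the text does not say which numbers replaced which answers.
YES_NO_CODES = {"yes": 1, "no": 0}


def combine_surveys(rows_by_year, keep_columns):
    """Stack yearly survey rows, keeping only the chosen columns plus the year."""
    combined = []
    for year, rows in sorted(rows_by_year.items()):
        for row in rows:
            record = {"year": year}
            for column in keep_columns:
                # Questions that were dropped in a given year are filled with None.
                record[column] = row.get(column)
            combined.append(record)
    return combined


def encode_yes_no(value):
    """Map a free-text yes/no answer to a number; anything else becomes None."""
    if value is None:
        return None
    return YES_NO_CODES.get(value.strip().lower())


# Made-up sample rows, shaped like the output of csv.DictReader.
surveys = {
    2014: [{"age": "30", "treatment": "Yes", "family_history": "No"}],
    2016: [{"age": "25", "treatment": "no ", "extra_question": "n/a"}],
}
rows = combine_surveys(surveys, ["age", "treatment", "family_history"])
for row in rows:
    row["treatment"] = encode_yes_no(row["treatment"])
```

Stripping whitespace and lowercasing before the lookup plays the role of the manual spelling fixes, and returning None for unrecognized answers keeps unexpected values visible instead of silently coding them.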
○ year: The year in which the survey was conducted
○ age: Participant's age
○ gender: Participant's gender
○ work_country, work_state, work_city: Work location
○ currently_diagnosed: Participant's current mental health diagnosis
○ family_history: Participant's family history with mental illness
○ treatment: Participant's treatment status
○ work_interference: Whether the participant's mental health interferes with their work
○ employeed_by_tech: Whether the participant is employed by a tech organization
○ mentalhealth_help_options: Whether the participant is aware of mental health options provided by their employer
○ coworkers, supervisors: Whether the participant is comfortable discussing mental health with coworkers or supervisors
○ mental_health_importance: Participant's opinion on the importance of mental health to employers
○ healthcare_benefits: Healthcare/mental healthcare benefits provided by employer
○ disorder_type: Type of disorder each participant has (2014-2018)
○ anxiety, mood, psychotic, eating, adhd, personality, ocd, ptsd, stressresp, dissociative, substance, addictive: Provides information on the disorders of each participant.
○ industry_improvements: Participant's detailed opinion on how the industry can improve in terms of mental health.
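Because the yearly surveys did not all ask the same questions, it can help to confirm that a combined CSV actually contains every column listed above. The following is a minimal sketch of such a check. The column names are copied from the list, including the spelling `employeed_by_tech`, while the helper name `missing_columns` is hypothetical.

```python
# Column names copied from the list above, in the same order.
EXPECTED_COLUMNS = [
    "year", "age", "gender",
    "work_country", "work_state", "work_city",
    "currently_diagnosed", "family_history", "treatment",
    "work_interference", "employeed_by_tech", "mentalhealth_help_options",
    "coworkers", "supervisors", "mental_health_importance",
    "healthcare_benefits", "disorder_type",
    "anxiety", "mood", "psychotic", "eating", "adhd", "personality",
    "ocd", "ptsd", "stressresp", "dissociative", "substance", "addictive",
    "industry_improvements",
]


def missing_columns(header):
    """Return the expected columns absent from a CSV header row, in list order."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

Passing the first row of the CSV (for example, the `fieldnames` of a `csv.DictReader`) returns an empty list when nothing is missing.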