The dataset used for this analysis was obtained from Kaggle using the Kaggle API. It contains various factors related to anxiety attacks, including:
Demographic details: Age, Gender, Occupation
Lifestyle factors: Sleep Hours, Caffeine Intake, Alcohol Consumption, Physical Activity
Health-related information: Heart Rate, Breathing Rate, Therapy Sessions
Anxiety-related severity levels
The Kaggle API provides programmatic access to datasets hosted on Kaggle, so the download can be scripted and repeated without manual steps.
Source: Kaggle Dataset: Anxiety Attack Factors, Symptoms, and Severity
API Used: Kaggle API
Website: https://www.kaggle.com
Core Endpoint: https://www.kaggle.com/api/v1/datasets/download/ashaychoudhary/
GET Request Example: https://www.kaggle.com/api/v1/datasets/download/ashaychoudhary/anxiety-attack-factors-symptoms-and-severity
LINK TO THE MAIN CODE: https://colab.research.google.com/drive/13B8Wh6TzyyKWOQT46pI46GfgQX0bI1qi#scrollTo=6ChrprXYZXCt
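The GET request above can be prepared with Python's standard library. This is a minimal sketch rather than the notebook's code: it assumes the API accepts HTTP Basic authentication built from the username and key of a Kaggle API token, the helper names build_download_url and build_download_request are hypothetical, and YOUR_USERNAME and YOUR_KEY are placeholders.

```python
import base64
import urllib.request

# Core endpoint quoted in the section above.
API_ROOT = "https://www.kaggle.com/api/v1/datasets/download"


def build_download_url(owner, dataset):
    """Join the core endpoint with the dataset owner and dataset slug."""
    return f"{API_ROOT}/{owner}/{dataset}"


def build_download_request(owner, dataset, username, key):
    """Prepare (but do not send) a GET request with a Basic auth header.

    Assumption: the username/key pair from a Kaggle API token is accepted
    as HTTP Basic credentials.
    """
    token = base64.b64encode(f"{username}:{key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        build_download_url(owner, dataset),
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


request = build_download_request(
    "ashaychoudhary",
    "anxiety-attack-factors-symptoms-and-severity",
    username="YOUR_USERNAME",  # placeholder
    key="YOUR_KEY",            # placeholder
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; the sketch stops before that step so it runs without credentials or network access.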
An initial inspection of the raw data revealed the following issues:
Missing Values (NaN) – Present in Physical Activity, Alcohol Consumption, Stress Level, Heart Rate, etc.
Inconsistent Categorical Values – Mixed capitalization in Gender, Occupation, Smoking, Family History.
Data Type Issues – ID should be an integer; some numeric columns contain the text "NAN" instead of a number.
Outliers – Extreme values in Caffeine Intake, Alcohol Consumption, Heart Rate.
Duplicate Entries – Potential duplicates need checking.
Extra Spaces – Leading/trailing spaces in categorical values.
Numerical Formatting – Inconsistent decimal representation.
Column Header Standardization – Spaces and inconsistent naming.
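Several of the issues listed above can be counted directly with pandas. The sketch below is illustrative only: the rows are made up to imitate the problems, the column names are assumed to match the labels used in this report, and the audit helper is hypothetical.

```python
import pandas as pd

# Made-up rows that imitate the issues listed above (not real dataset records).
raw = pd.DataFrame({
    "ID": ["1", "2", "2", "3"],
    "Gender": ["Male", " male", " male", "FEMALE "],
    "Heart Rate": ["72", "NAN", "NAN", "190"],
    "Sleep Hours": [7.5, 6.0, 6.0, None],
})


def audit(df, text_cols):
    """Count common data-quality problems in a raw table."""
    report = {
        "missing": df.isna().sum().to_dict(),          # true NaN cells per column
        "duplicate_rows": int(df.duplicated().sum()),  # exact repeats of an earlier row
        "nan_text": {},        # cells holding the literal text "NAN"
        "padded": {},          # cells with leading/trailing spaces
        "label_variants": {},  # (raw labels, labels after trim + lowercase)
    }
    for col in text_cols:
        values = df[col].astype(str)
        trimmed = values.str.strip()
        report["nan_text"][col] = int(trimmed.str.upper().eq("NAN").sum())
        report["padded"][col] = int(values.ne(trimmed).sum())
        report["label_variants"][col] = (
            values.nunique(),
            trimmed.str.lower().nunique(),
        )
    return report


report = audit(raw, text_cols=["Gender", "Heart Rate"])
for name, result in report.items():
    print(name, result)
```

A gap between the two numbers in label_variants signals that capitalization or padding is splitting one category into several.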
Raw Data Link - Kaggle Dataset: Anxiety Attack Factors, Symptoms, and Severity
The dataset was cleaned using the following steps:
Handled missing values – Filled missing numerical values (e.g., Sleep Hours, Caffeine Intake) with the median and missing categorical values (e.g., Therapy, Gender) with the mode.
Standardized text formatting – Converted Gender and Therapy values to a single lowercase form ("Male", "male", "MALE" → "male"; "yes", "YES" → "yes").
Removed duplicate records.
Fixed outliers – Applied the Z-score method to Heart Rate and Breathing Rate.
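The cleaning steps above can be sketched in pandas as follows. This is one plausible reading, not the notebook's code: the data are made up, the clean helper and its parameters are hypothetical, and two details are assumptions because the report does not state them: an outlier is a value with |z| > 3, and "fixing" it means dropping the row.

```python
import pandas as pd

# Made-up records for illustration; column names are assumed from the report.
raw = pd.DataFrame({
    "ID": range(1, 18),
    "Gender": ["Male", " male ", "MALE", "female", "Female"] * 3 + [None, "male"],
    "Therapy": ["yes", "YES", "no", "Yes ", "No"] * 3 + ["yes", None],
    "Sleep Hours": [6.0, 7.0, 8.0, 5.5, 6.5] * 3 + [None, 7.0],
    "Heart Rate": ["70", "72", "74", "76", "78"] * 3 + ["NAN", "250"],
})
# Append an exact copy of the first record to imitate a duplicate entry.
raw = pd.concat([raw, raw.iloc[[0]]], ignore_index=True)


def clean(df, numeric_cols, categorical_cols, outlier_cols, z_max=3.0):
    out = df.copy()
    for col in numeric_cols:
        # Text such as "NAN" becomes a real NaN, then NaN is filled with the median.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        # Trim and lowercase first so "Male", " male " and "MALE" count as one label.
        out[col] = out[col].str.strip().str.lower()
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    out = out.drop_duplicates()
    for col in outlier_cols:
        # Assumption: rows whose z-score exceeds z_max in absolute value are dropped.
        z = (out[col] - out[col].mean()) / out[col].std()
        out = out[z.abs() <= z_max]
    return out.reset_index(drop=True)


cleaned = clean(
    raw,
    numeric_cols=["Sleep Hours", "Heart Rate"],
    categorical_cols=["Gender", "Therapy"],
    outlier_cols=["Heart Rate"],
)
print(len(raw), "rows before,", len(cleaned), "rows after")
```

Text is normalized before the mode is computed; otherwise differently capitalized variants of the same label would split the vote. Capping outliers at a threshold instead of dropping them would be an equally valid reading of "fixed".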
Python_code: https://colab.research.google.com/drive/13B8Wh6TzyyKWOQT46pI46GfgQX0bI1qi#scrollTo=6ChrprXYZXCt
This dataset was chosen to analyze the factors contributing to anxiety attacks and explore correlations between lifestyle habits, demographic details, and anxiety severity. The primary objective is to identify patterns and key influences that might contribute to anxiety attacks, such as stress levels, caffeine intake, sleep quality, and physical activity levels. By gaining insights into these relationships, we aim to contribute to mental health research and awareness, helping individuals, researchers, and healthcare professionals understand how different variables influence anxiety severity.
To ensure data quality and uncover potential trends, an Exploratory Data Analysis (EDA) was conducted using various visualizations:
Distribution of Stress Levels (Histogram) – A histogram was created to observe the spread of stress levels among individuals. An overlaid kernel density estimate (KDE) line provided a smoothed view of the distribution, highlighting the most common stress levels.
Caffeine Intake vs. Anxiety Severity (Scatter Plot) – This visualization explored how caffeine consumption varied across different levels of anxiety severity, categorized by gender.
Correlation Heatmap – A heatmap was generated to examine the relationships between multiple variables, such as stress levels, heart rate, sleep duration, and anxiety severity. This helped identify strong positive or negative correlations between factors.
Anxiety Severity Levels (Pie Chart) – The distribution of anxiety severity levels was visualized in a pie chart, showing the percentage of individuals falling into different severity categories.
Stress Level vs. Anxiety Severity (Hexbin Plot) – A hexbin plot was used to illustrate the density of data points in the relationship between stress levels and anxiety severity, revealing areas where individuals cluster.
Family History of Anxiety (Bar Chart) – A bar chart was used to compare the number of individuals with and without a family history of anxiety.
Occupational Distribution (Horizontal Bar Chart) – This visualization helped identify the most common professions among individuals in the dataset.
Sleep Hours by Gender (Boxplot) – A boxplot analysis showed variations in sleep hours across genders, highlighting the median, quartiles, and potential outliers.
Physical Activity vs. Anxiety Severity (Bar Chart) – This visualization examined whether physical activity levels differed across anxiety severity levels.
Gender Distribution Across Anxiety Severity Levels (Stacked Bar Chart) – A stacked bar chart represented the distribution of male and female participants across different anxiety severity levels.
These visualizations provided key insights into demographic patterns, lifestyle behaviors, and their potential impact on anxiety severity, forming the foundation for further analysis and hypothesis testing.
Python_code: https://colab.research.google.com/drive/13B8Wh6TzyyKWOQT46pI46GfgQX0bI1qi#scrollTo=6ChrprXYZXCt
This visualization represents the distribution of stress levels on a scale from 1 to 10. The histogram bars show the frequency of each stress level, while the overlaid KDE (Kernel Density Estimate) line provides a smoothed representation of the distribution.
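To make the KDE line concrete, the sketch below computes a Gaussian kernel density estimate by hand with NumPy: each observation contributes one small bell curve, and the curves are averaged. The stress values are made up, the function name is hypothetical, and the bandwidth uses Scott's rule of thumb as an assumption; the plotting library used in the notebook may choose a different bandwidth.

```python
import numpy as np


def gaussian_kde_curve(samples, grid, bandwidth=None):
    """Evaluate a one-dimensional Gaussian KDE of `samples` at the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension (an assumption of this sketch).
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Scaled distance from every grid point to every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)


# Made-up stress levels on the 1-10 scale, for illustration only.
stress = np.array([2, 3, 5, 5, 6, 7, 7, 8, 8, 9])

# Histogram part: how often each level occurs.
levels, counts = np.unique(stress, return_counts=True)

# KDE part: a smooth curve whose total area is 1.
grid = np.linspace(-10, 20, 3001)
density = gaussian_kde_curve(stress, grid)
area = float(density.sum() * (grid[1] - grid[0]))
print(dict(zip(levels.tolist(), counts.tolist())), round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual bars more closely.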
This scatter plot visualizes the relationship between caffeine intake and anxiety severity, categorized by gender. Each dot represents an individual's caffeine consumption (mg/day) across different levels of anxiety severity (1-10), with colors distinguishing gender groups.
This heatmap illustrates the correlation between various anxiety-related factors, with color intensity indicating the strength and direction of relationships. Darker red signifies stronger positive correlations, while darker blue marks the low end of the scale, that is, weak or negative correlations between variables.
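The numbers behind such a heatmap form a correlation matrix. As a minimal sketch, assuming Pearson correlation (the report does not name the method) and using made-up values with assumed column names, pandas computes it in one call:

```python
import pandas as pd

# Made-up values: Heart Rate rises exactly with Stress Level and Sleep Hours
# falls exactly with it, so the extremes of the scale are easy to see.
sample = pd.DataFrame({
    "Stress Level": [2, 4, 5, 7, 9],
    "Heart Rate": [64, 70, 73, 79, 85],
    "Sleep Hours": [8.5, 7.5, 7.0, 6.0, 5.0],
    "Caffeine Intake": [100, 300, 150, 250, 200],
})

# Pairwise Pearson correlation; every entry lies between -1 and 1.
corr = sample.corr(method="pearson")
print(corr.round(2))
```

Each cell of the heatmap colors one entry of this matrix: values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.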
This pie chart illustrates the distribution of anxiety severity levels, with each slice representing the proportion of individuals at a given severity level (1-10). The color gradient distinguishes adjacent slices, and the similar slice sizes show how evenly anxiety severity is spread across the dataset.
This hexbin plot visualizes the relationship between stress levels and anxiety severity, where darker hexagons indicate higher data density. The marginal histograms provide additional insights into the distribution of both stress levels and anxiety severity across the dataset.
This bar chart represents the distribution of individuals with and without a family history of anxiety. The taller bar indicates a higher count of participants without a family history of anxiety compared to those with a family history.
This horizontal bar chart visualizes the distribution of occupations in the dataset, showing the frequency of individuals in different job categories. The highest count is for unemployed individuals, followed by engineers, students, and other professions.
This boxplot compares sleep hours across different genders, displaying the median, interquartile range, and potential outliers. The distribution appears similar across genders, with most individuals sleeping between 5 and 8 hours.
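The quantities a boxplot draws can be computed directly. The sketch below uses made-up sleep values and a hypothetical box_stats helper, and it assumes the common 1.5 × IQR rule for flagging outliers, which is also matplotlib's default whisker setting.

```python
import pandas as pd


def box_stats(values):
    """Return the quartiles, IQR and outlier count that a boxplot displays."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the box are drawn as potential outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < low) | (values > high)]
    return pd.Series({
        "q1": q1, "median": median, "q3": q3,
        "iqr": iqr, "n_outliers": len(outliers),
    })


# Made-up sleep hours, for illustration only.
sample = pd.DataFrame({
    "Gender": ["female"] * 5 + ["male"] * 5,
    "Sleep Hours": [5.0, 6.0, 7.0, 8.0, 9.0, 6.0, 6.5, 7.0, 7.5, 12.0],
})

# One row of statistics per gender.
summary = pd.DataFrame({
    gender: box_stats(hours)
    for gender, hours in sample.groupby("Gender")["Sleep Hours"]
}).T
print(summary)
```

Comparing the median and IQR rows across genders is the numeric counterpart of judging whether the boxes in the plot look alike.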
This bar chart represents the relationship between physical activity (hours per week) and anxiety severity levels (1-10). The relatively consistent values suggest that physical activity levels do not vary significantly across different anxiety severities.
This stacked bar chart illustrates the gender distribution across different anxiety severity levels (1-10). Each bar represents the total count for a specific anxiety severity, with different colors indicating the proportion of each gender.
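The table behind a stacked bar chart is a cross-tabulation of the two variables. A minimal sketch with made-up records and assumed column names:

```python
import pandas as pd

# Made-up records, for illustration only.
sample = pd.DataFrame({
    "Gender": ["female", "male", "female", "other", "male", "female", "male"],
    "Anxiety Severity": [3, 3, 7, 7, 7, 9, 9],
})

# Rows: severity levels; columns: genders; cells: number of individuals.
counts = pd.crosstab(sample["Anxiety Severity"], sample["Gender"])

# Share of each gender within a severity level (each row sums to 1).
shares = counts.div(counts.sum(axis=1), axis=0)
print(counts)
print(shares.round(2))
```

Each row of counts corresponds to one bar and each column to one colored segment; with matplotlib installed, counts.plot(kind="bar", stacked=True) draws the chart from this table.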
The exploratory data analysis (EDA) provided insights into anxiety-related factors through multiple visualizations. The stress level distribution revealed a concentration of individuals experiencing moderate to high stress. The scatter plot of caffeine intake vs. anxiety severity, categorized by gender, suggested no clear linear correlation but indicated variations in consumption patterns. The correlation heatmap highlighted notable relationships between factors such as stress, heart rate, and anxiety severity. The pie chart of anxiety severity levels showed a fairly even distribution, with somewhat more individuals at moderate to high severity. The hexbin plot of stress vs. anxiety severity showed strong clustering at higher stress and anxiety levels. The bar chart of family history indicated that more individuals had no family history of anxiety than had one. The horizontal bar chart of occupations showed that unemployed individuals formed the largest group, followed by engineers and students. The boxplot of sleep hours by gender showed similar sleep patterns across genders, with minor variations. The bar chart of physical activity vs. anxiety severity suggested no significant differences in physical activity levels across anxiety severities. Lastly, the stacked bar chart of gender vs. anxiety severity indicated a relatively balanced distribution across severity levels, with slight variations between genders. Together, these visualizations provided an overview of anxiety attack factors and their associations.