Evaluating Happiness
Supervised and Unsupervised Learning Techniques Exploring the Relationship Between Values and Wellbeing
Client: NA
Goal: Identify features that predict life satisfaction ratings
Impact: Establish an approach for examining the factors that influence satisfaction, which can later be extended to retention and turnover/churn/attrition problems
Turnaround: 3 weeks (May - June 2021)
Programming Language: Python
Analytical Methods: Descriptive analytics, logistic regression, tree-based models, UMAP dimensionality reduction
Tools: Jupyter notebook, pandas, numpy, scipy, matplotlib, seaborn, plotly, sklearn, umap, ipywidgets
My roles: Independent project - decided on topic; proposed data questions; retrieved and cleaned data; designed analytical approach and methodology; performed analyses; presented findings
Analyzed the relationship between happiness and values using data from Wave 6 (2010-2014) of the World Values Survey
Compared the utility of a logistic regression classifier and a tree-based ensemble learning algorithm (adaptive boosting) for accurately predicting happiness ratings based on values data
Used feature importance scores to see which survey items were most useful for predicting happiness
Employed UMAP dimensionality reduction to determine and visualize how survey respondents’ answers within and across countries compare
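The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (generated with `make_classification`, not the World Values Survey), and the class imbalance, the ROC AUC metric, the 5-fold split, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey items (assumption: NOT real WVS data).
# The class weights loosely mimic a sample where most respondents are "happy".
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=6,
    weights=[0.15, 0.85],
    random_state=0,
)

# Two candidate classifiers; the hyperparameters are illustrative only.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the happy/unhappy ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores[name].mean():.3f}")
```

With an imbalanced target like this one, plain accuracy can look good even for a model that always predicts "happy", which is why the sketch scores on ROC AUC instead.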
For the most part, people are pretty happy (85%) and stay happy.
Unsurprisingly, respondents who were in good health, who felt that they had more freedom of choice, who were satisfied with their household income, and who belonged to a higher income group were typically less unhappy.
Regardless of health or socioeconomic circumstances though, most respondents tended to rate themselves as happy more often than unhappy.
These features seemed to reveal more about the correlates of unhappiness than of happiness.
Interestingly, high proportions of respondents who were in poorer health, who perceived less freedom of choice, who were less satisfied with their household incomes, or who belonged to lower-income groups in their societies still rated themselves as happy more often than unhappy.
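A within-group comparison like the one described above can be computed with a row-normalized cross-tabulation. The sketch below uses randomly generated toy data; the column names (`health`, `happiness`), the category labels, and the probabilities are all hypothetical and are not figures from the survey.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000

# Toy data only (assumption): health categories and happiness probabilities
# are invented for illustration and do not come from the World Values Survey.
health = rng.choice(["poor", "fair", "good"], size=n, p=[0.2, 0.3, 0.5])
p_happy = pd.Series(health).map({"poor": 0.65, "fair": 0.80, "good": 0.92}).to_numpy()
happiness = np.where(rng.random(n) < p_happy, "happy", "unhappy")

df = pd.DataFrame({"health": health, "happiness": happiness})

# normalize="index" turns counts into shares within each health group,
# so each row sums to 1 and groups of different sizes are comparable.
shares = pd.crosstab(df["health"], df["happiness"], normalize="index")
print(shares.round(2))
```

Reading the table row by row shows both patterns at once: the "unhappy" share shrinks as health improves, while the "happy" share stays above one half in every group.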
Based on a combination of expressed values (what people find important and/or identify as ideal) and respondents’ recorded experiences (with poverty, safety, health, etc.), both models identified three types of values as most useful for predicting whether a respondent would be classified as unhappy or happy:
Financial and physical security values/conditions
Companionship values/conditions
Freedom of choice and opportunity values/conditions
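One way to read off which items each model relies on is to compare the absolute logistic regression coefficients (on standardized features) with AdaBoost's `feature_importances_`. The sketch below is an assumption-laden illustration: the item names are hypothetical, and the data is simulated so that three items carry signal and one is pure noise.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 800

# Hypothetical item names and simulated responses (NOT real survey data).
features = pd.DataFrame({
    "financial_satisfaction": rng.normal(size=n),
    "state_of_health": rng.normal(size=n),
    "freedom_of_choice": rng.normal(size=n),
    "noise_item": rng.normal(size=n),  # unrelated to the target by construction
})
signal = (
    features["financial_satisfaction"]
    + features["state_of_health"]
    + features["freedom_of_choice"]
)
happy = (signal + rng.normal(scale=1.0, size=n) > -1.0).astype(int)

# Standardize so coefficient magnitudes are comparable across items.
X = StandardScaler().fit_transform(features)

logit = LogisticRegression(max_iter=1000).fit(X, happy)
boost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, happy)

ranking = pd.DataFrame(
    {
        "logit_abs_coef": np.abs(logit.coef_[0]),
        "adaboost_importance": boost.feature_importances_,
    },
    index=features.columns,
).sort_values("adaboost_importance", ascending=False)
print(ranking)
```

The two columns are on different scales, so only the ordering within each column is comparable. Items that rank highly under both models are the ones worth interpreting.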
UMAP analyses allowed me to explore whether random samples of respondents from the same country were responding to survey items in the same way. In the 2D representation below, each point represents one respondent. Note that country labels were assigned after clusters were formed.
Tight clusters of points signify similarity in survey responses, personal experiences, and viewpoints. Points of the same color that are spread out represent variety in survey responses and thinking.
Respondents from Haiti tended to answer survey questions very similarly, and in ways that seem to be distinct from other respondents.
Their points (in orange) are pretty tightly clustered and away from other respondents.
Respondents from Nigeria show both some cohesion and some range in their within-country responses; their points (in yellow) are somewhat clustered together, but the cluster is much wider than the one for respondents from Haiti.
When comparing responses from Nigeria with those from Ghana (in dark blue), we see that their respondents’ points overlap; this suggests some level of alignment or similarity in Nigerians' and Ghanaians' experiences and viewpoints captured on the survey.
In contrast to the geographical neighbors discussed above, we see very little overlap between responses from survey participants from India and Pakistan.
When comparing responses from Indian respondents (in yellow) with those from Pakistani respondents (in dark blue), we see that their respondents’ points are pretty far apart; this suggests major differences in their experiences and viewpoints that influence how they respond to survey items.
For more details and to see my code, check out this project's repo