This project was carried out with Python, Visual Studio Code, PowerPoint, and GitHub, together with the NumPy, pandas, Matplotlib, Seaborn, and Plotly libraries. These were used for data mining and data wrangling in order to analyze a Kaggle dataset on alcohol consumption in adolescents, and so to raise and examine hypotheses that could shed light on that problem.
NOTE: When I interpret the graphs against the hypotheses, I will always talk about correlation and never about causation. In addition, the conclusions drawn for each hypothesis are only possible explanations or interpretations: absolute truth does not exist, and the results may be influenced by confounding or extraneous variables, unanticipated by the researcher, that modulate the relationship between the variables studied.
We can see that there are some outliers at 20 and 21 years of age.
The gender distribution is fairly balanced: 53.08% of the students are female and 46.92% are male.
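Percentages like these can be obtained directly from a categorical column. The following is only a minimal sketch on toy data: the column name `sex` and the codes `"F"`/`"M"` are assumptions, not taken from the project code, and the numbers are illustrative rather than the real figures.

```python
import pandas as pd

# Toy sample standing in for the real dataset.
# The column name "sex" and the codes "F"/"M" are assumptions.
students = pd.DataFrame({"sex": ["F", "M", "F", "F", "M", "F", "M", "M", "F", "F"]})

# normalize=True returns proportions instead of raw counts;
# multiplying by 100 turns them into percentages.
gender_share = students["sex"].value_counts(normalize=True).mul(100).round(2)

print(gender_share)
```

Reporting the share per category rather than the raw counts makes it easier to judge at a glance whether a variable is balanced.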
This is the age distribution after dropping the outliers. We can see that it is quite uniform.
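One simple way to drop those outliers is a boolean filter on age. This is a sketch under assumptions: the column name `age` and the cutoff below 20 are my own choices for illustration, based only on the outliers mentioned above, and the data is a toy sample.

```python
import pandas as pd

# Toy data standing in for the real dataset; the column name "age" is an assumption.
students = pd.DataFrame({"age": [15, 16, 16, 17, 17, 18, 18, 19, 20, 21]})

# Keep only students younger than 20, which removes the
# 20- and 21-year-old rows treated as outliers.
students_clean = students[students["age"] < 20].reset_index(drop=True)

# Count how many students remain at each age.
age_counts = students_clean["age"].value_counts().sort_index()
print(age_counts)
```

Filtering into a new DataFrame keeps the original data intact, so the distribution before and after can still be compared.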
Alcohol consumption decreases with age: The scatterplot shows that, in my sample, consumption does indeed decrease as age increases. This is especially noticeable on weekends. A possible explanation is that, with age, responsibilities grow, adolescents mature, and they no longer need as much social approval derived from continuous comparison with the peer group. Priorities change, and the reference and membership groups become smaller.
From what we hear in the media, we might initially think that women consume more alcohol than men, but the countplot shows that this is not the case. In both daily and weekend consumption, high consumption levels are more frequent among men than among women. This may be because the basal ganglia of men generate a greater amount of dopamine than those of women, leading to a greater pursuit of this pleasant effect.
The swarmplot above shows that both daily and weekend alcohol consumption are higher among students whose guardian is the father. A possible explanation is that fathers, traditionally and in general terms, have exercised more authoritarian parenting styles.
According to the catplot, alcohol consumption is higher among those who live in urban areas. Possible explanations are that urban areas have more places to buy alcohol and that, simply, a greater proportion of students live in these areas and can therefore meet more easily to drink. In addition, travel is easier thanks to public transport.
We could draw the aforementioned conclusions on the basis that students whose parents are separated are less closely supervised, or have gone through a difficult situation that can lead them to use maladaptive coping strategies. In reality, analyzing this graph showed me that the data is unbalanced. This problem will be analyzed in section 6 of this report.
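A quick way to spot this kind of imbalance is to look at the group sizes next to the group means. The sketch below uses toy data, and the column names `parent_status` and `weekend_alcohol` are hypothetical stand-ins, not the dataset's actual names.

```python
import pandas as pd

# Toy data; "parent_status" and "weekend_alcohol" are hypothetical column names.
students = pd.DataFrame({
    "parent_status": ["together"] * 8 + ["apart"] * 2,
    "weekend_alcohol": [1, 2, 2, 3, 1, 2, 4, 1, 5, 3],
})

# For each group, report how many students it contains and their mean consumption.
summary = students.groupby("parent_status")["weekend_alcohol"].agg(["count", "mean"])

# Share of the whole sample that falls in each group, as a percentage.
summary["share_pct"] = (summary["count"] / len(students) * 100).round(1)

print(summary)
```

When one group holds only a small share of the sample, its mean rests on very few observations, so a difference between groups should be read with caution.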
According to the sunburst chart, students who consume less alcohol want to pursue higher education. One possible explanation is that the lower the alcohol consumption, the better preserved are the cognitive functions of the dorsal prefrontal cortex involved in decision-making and future planning.
Romantic relationships are a protective factor associated with lower alcohol consumption: An argument in favor may be that romantic relationships provide emotional and/or instrumental support, preventing students from resorting to maladaptive coping strategies in the face of stressful life events.
Students who spend more time on the Internet consume more alcohol. However, before jumping to conclusions, we must remember that the data is unbalanced: most students have access to the Internet.
To explore more relationships between variables in a multivariate analysis, let's plot a heat map!
I can state that there are no strong relationships between the variables, because no correlation coefficient exceeds 0.5 or 0.7. However, it would be interesting to explore some of the relationships.
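The same check can be done numerically on the correlation matrix behind a heat map. This is a minimal sketch on toy data: all column names are hypothetical, Pearson correlation is an assumption about the method, and the 0.5 threshold is the lower of the two values mentioned above.

```python
import numpy as np
import pandas as pd

# Toy numeric data; all column names are hypothetical stand-ins.
students = pd.DataFrame({
    "age": [15, 16, 17, 18, 15, 16, 17, 18],
    "going_out": [2, 3, 4, 5, 1, 2, 3, 4],
    "weekend_alcohol": [1, 2, 3, 5, 1, 1, 2, 4],
    "absences": [0, 4, 2, 6, 1, 0, 3, 2],
})

# Pairwise Pearson correlation between all numeric columns.
corr = students.corr(method="pearson")

# Keep each pair once: mask everything except the upper triangle above the diagonal.
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# Pairs whose absolute correlation reaches the chosen threshold.
threshold = 0.5
strong = pairs[pairs.abs() >= threshold]

print(strong.sort_values(ascending=False))
```

Listing the pairs above a threshold complements the heat map: the plot gives the overall picture, while the list makes the cutoff explicit.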
Although I cannot assume that there is a relationship between going out with friends and weekend alcohol consumption, because the correlation is not strong enough, the graph does suggest a pattern: the more students go out, the more alcohol they consume.
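A pattern of this kind can be summarized by comparing mean weekend consumption at each level of going out, alongside the correlation coefficient. The sketch below uses toy data with hypothetical column names (`going_out`, `weekend_alcohol`), so its values do not reflect the real dataset.

```python
import pandas as pd

# Toy data; "going_out" and "weekend_alcohol" are hypothetical column names,
# both assumed to be ordinal scores from 1 (very low) to 5 (very high).
students = pd.DataFrame({
    "going_out": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "weekend_alcohol": [1, 1, 1, 2, 2, 3, 3, 3, 4, 5],
})

# Mean weekend consumption for each going-out level.
mean_by_level = students.groupby("going_out")["weekend_alcohol"].mean()

# Pearson correlation between the two variables (the pandas default method).
r = students["going_out"].corr(students["weekend_alcohol"])

print(mean_by_level)
print(f"Pearson r = {r:.2f}")
```

The group means show the direction of the pattern, while the coefficient indicates its strength. Neither says anything about causation.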
The hypotheses analysed shed light on the problem of alcohol consumption in students. However, these variables are very difficult to modify. A possible solution may be a psychoeducational, psychosocial intervention in the form of a training course for both school students and their families.