In this section, you can explore two projects that were completed as part of my Python learning journey.
The first project was completed as part of the Data Analyst with Python program on the DataCamp platform. It explores "The Nobel Prize" dataset, which covers the period from 1901 to 2016. You can find more details about this project at the following link: https://www.kaggle.com/code/brusnikina/eda-nobel-prizes-1901-2016
The second project was completed on Kaggle as a test assignment to enhance my understanding of the "pandas" library. You can also explore this project by following the link: https://www.kaggle.com/code/brusnikina/eda-with-pandas
The key stages of work for both projects are described below.
The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016. Let's load it in and take a look.
Looking at all winners in the dataset, from 1901 to 2016, we can see that men and the USA are the most commonly represented.
The USA first became the dominant winner of the Nobel Prize in the 1930s and has kept the leading position ever since.
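The decade-by-decade share of US-born winners can be computed roughly as follows. This is a minimal sketch on a toy table, not the project's actual code: the column names `year` and `birth_country` and the helper columns `usa_born_winner` and `decade` are assumptions.

```python
import pandas as pd

# Toy stand-in for the Nobel data; rows and column names are illustrative assumptions.
nobel = pd.DataFrame({
    "year": [1901, 1905, 1932, 1936, 1938, 2011, 2014, 2016],
    "birth_country": [
        "Germany", "France",
        "United States of America", "United States of America", "Italy",
        "United States of America", "Pakistan", "United States of America",
    ],
})

# Flag US-born winners and bucket each award year into its decade.
nobel["usa_born_winner"] = nobel["birth_country"] == "United States of America"
nobel["decade"] = (nobel["year"] // 10) * 10

# The mean of a boolean column is the proportion of True values.
prop_usa_winners = nobel.groupby("decade", as_index=False)["usa_born_winner"].mean()
print(prop_usa_winners)
```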
Overall, the gender imbalance is pretty large, with physics, economics, and chemistry having the largest imbalance. Medicine has a somewhat positive trend, and since the 1990s the literature prize has also become more balanced. The big outlier is the peace prize during the 2010s, but keep in mind that this covers only the years 2010 to 2016.
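A comparison like this can be built from the share of female winners per decade and category. The sketch below uses made-up rows and assumed column names (`year`, `category`, `sex`), so it only illustrates the grouping step.

```python
import pandas as pd

# Toy rows; values and column names are illustrative assumptions.
nobel = pd.DataFrame({
    "year": [1903, 1903, 1909, 1995, 1996, 1998, 2011, 2014, 2012],
    "category": ["Physics", "Physics", "Literature", "Physics", "Literature",
                 "Literature", "Peace", "Peace", "Peace"],
    "sex": ["Male", "Female", "Female", "Male", "Female",
            "Male", "Female", "Female", "Male"],
})

nobel["female_winner"] = nobel["sex"] == "Female"
nobel["decade"] = (nobel["year"] // 10) * 10

# Proportion of female winners for every (decade, category) pair.
prop_female_winners = (
    nobel.groupby(["decade", "category"], as_index=False)["female_winner"].mean()
)
print(prop_female_winners)
```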
We see that people used to be around 55 when they received the prize, but nowadays the average is closer to 65. There is a large spread in the laureates' ages, though, and while most are 50+, some are very young.
We also see that the density of points is much higher nowadays than in the early 1900s: many more of the prizes are now shared, so there are many more winners. There was also a disruption in awarded prizes around the Second World War (1939 - 1945).
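The age at which a laureate received the prize can be approximated as the award year minus the birth year. A minimal sketch, assuming columns named `year` and `birth_date` (the rows are illustrative only):

```python
import pandas as pd

# Illustrative rows; the column names are assumptions.
nobel = pd.DataFrame({
    "year": [1915, 1950, 2007, 2014],
    "birth_date": ["1890-03-31", "1872-05-18", "1917-08-21", "1997-07-12"],
})

# Parse the birth dates, then subtract the birth year from the award year.
nobel["birth_date"] = pd.to_datetime(nobel["birth_date"])
nobel["age"] = nobel["year"] - nobel["birth_date"].dt.year
print(nobel[["year", "age"]])
```

Plotting `age` against `year` then gives the scatter of points discussed above.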
Who are the oldest and youngest people ever to have won a Nobel Prize? The answer is below:
Oldest winner: Leonid Hurwicz
Youngest winner: Malala Yousafzai
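Once an age column exists, the two extremes can be looked up with `nlargest` and `nsmallest`. In this sketch the column names `full_name` and `age` are assumptions, and the "Laureate A/B" rows are placeholders rather than real data.

```python
import pandas as pd

# Two rows mirror the winners named above; the others are placeholders.
nobel = pd.DataFrame({
    "full_name": ["Laureate A", "Leonid Hurwicz", "Laureate B", "Malala Yousafzai"],
    "age": [55, 90, 64, 17],
})

oldest = nobel.nlargest(1, "age")      # row with the highest age
youngest = nobel.nsmallest(1, "age")   # row with the lowest age

print("Oldest winner:", oldest["full_name"].iloc[0])
print("Youngest winner:", youngest["full_name"].iloc[0])
```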
The task is to use Pandas to answer a few questions about the Adult dataset.
Let's take a look at our dataset.
Within the dataset, certain columns contained unwanted whitespace. To enable necessary data manipulations, we removed these spaces from the columns.
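One way to do this cleanup is to strip the values of every text column. The sketch assumes the whitespace sits in the cell values (as in the raw Adult file, where fields follow a comma and a space); the toy frame is illustrative.

```python
import pandas as pd

# Toy rows with leading spaces in the text columns.
adult = pd.DataFrame({
    "age": [39, 50],
    "sex": [" Male", " Female"],
    "salary": [" <=50K", " >50K"],
})

# Strip surrounding whitespace from every string-valued column;
# numeric columns such as "age" are left untouched.
for col in adult.columns:
    if pd.api.types.is_string_dtype(adult[col]):
        adult[col] = adult[col].str.strip()

print(adult)
```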
1. How many men and women (sex feature) are represented in this dataset?
Male 21790
Female 10771
Name: sex, dtype: int64
2. What is the average age (age feature) of women?
36.86
3. What is the percentage of German citizens (native-country feature)?
0.42%
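A percentage like this can be obtained as the mean of a boolean mask. A minimal sketch on toy data (the four rows are made up, so the printed value differs from the real result):

```python
import pandas as pd

# Toy rows; only the column name comes from the question above.
adult = pd.DataFrame({
    "native-country": ["United-States", "Germany", "United-States", "Japan"],
})

# The mean of a boolean mask is the fraction of matching rows.
german_share = (adult["native-country"] == "Germany").mean() * 100
print(f"{german_share:.2f}%")
```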
4-5. What are the mean and standard deviation of age for those who earn more than 50K per year (salary feature) and those who earn less than 50K per year?
salary        std       mean
<=50K   14.020088  36.783738
>50K    10.519028  44.249841
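A table of this shape can be produced with `groupby` and `agg`. The sketch below uses six made-up rows, so its numbers are not the ones from the real dataset.

```python
import pandas as pd

# Toy rows; values are illustrative.
adult = pd.DataFrame({
    "age": [25, 35, 45, 40, 50, 60],
    "salary": ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"],
})

# Standard deviation and mean of age within each salary group.
age_stats = adult.groupby("salary")["age"].agg(["std", "mean"])
print(age_stats)
```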
6. Is it true that people who earn more than 50K have at least high school education? (education – Bachelors, Prof-school, Assoc-acdm, Assoc-voc, Masters or Doctorate feature)
False
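The check can be expressed with `isin` and `all`: a single high earner outside the listed education levels is enough to make the answer False. A sketch on toy rows:

```python
import pandas as pd

# Education levels listed in the question.
higher_education = ["Bachelors", "Prof-school", "Assoc-acdm",
                    "Assoc-voc", "Masters", "Doctorate"]

# Toy rows; one high earner has only an "HS-grad" level.
adult = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
    "salary": [">50K", ">50K", ">50K", "<=50K"],
})

# True only if every high earner has one of the listed levels.
high_earners = adult[adult["salary"] == ">50K"]
all_highly_educated = bool(high_earners["education"].isin(higher_education).all())
print(all_highly_educated)
```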
7. Display age statistics for each race (race feature) and each gender (sex feature). Use groupby() and describe(). Find the maximum age of men of Amer-Indian-Eskimo race.
The maximum age of men of Amer-Indian-Eskimo race is 82.
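A sketch of the `groupby()` plus `describe()` approach named in the question, on toy rows (the ages are made up, with 82 included only to echo the result above):

```python
import pandas as pd

# Toy rows; values are illustrative.
adult = pd.DataFrame({
    "race": ["Amer-Indian-Eskimo", "Amer-Indian-Eskimo", "Amer-Indian-Eskimo",
             "White", "White"],
    "sex": ["Male", "Male", "Female", "Male", "Female"],
    "age": [30, 82, 45, 50, 28],
})

# One row of summary statistics per (race, sex) pair.
age_by_group = adult.groupby(["race", "sex"])["age"].describe()
print(age_by_group)

# Pick the maximum for one specific group from the MultiIndex.
max_age = age_by_group.loc[("Amer-Indian-Eskimo", "Male"), "max"]
print(max_age)
```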
8. Among whom is the proportion of those who earn a lot (>50K) greater: married or single men (marital-status feature)? Consider as married those who have a marital-status starting with Married (Married-civ-spouse, Married-spouse-absent or Married-AF-spouse), the rest are considered bachelors.
salary         <=50K      >50K
status_cat
Alone       0.935546  0.064454
Married     0.563080  0.436920
Based on the available data, married individuals are considerably more likely than their single counterparts to earn more than $50,000.
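Row-wise proportions like these can be computed with `pd.crosstab` and `normalize="index"`. In the sketch below the `status_cat` labels follow the table above, while the rows themselves and the filtering to men (taken from the question) are assumptions about how the table was built.

```python
import numpy as np
import pandas as pd

# Toy rows; values are illustrative.
adult = pd.DataFrame({
    "sex": ["Male", "Male", "Male", "Male", "Male", "Male", "Female"],
    "marital-status": ["Married-civ-spouse", "Married-civ-spouse", "Never-married",
                       "Divorced", "Married-AF-spouse", "Never-married",
                       "Married-civ-spouse"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"],
})

men = adult[adult["sex"] == "Male"].copy()

# Any status starting with "Married" counts as married; everything else as alone.
is_married = men["marital-status"].str.startswith("Married")
men["status_cat"] = np.where(is_married, "Married", "Alone")

# normalize="index" makes every row sum to 1.
shares = pd.crosstab(men["status_cat"], men["salary"], normalize="index")
print(shares)
```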
9. What is the maximum number of hours a person works per week (hours-per-week feature)? How many people work such a number of hours, and what is the percentage of those who earn a lot (>50K) among them?
The maximum recorded value is 99 hours per week. The majority of individuals, regardless of their salary level, work fewer hours than that. The proportion of individuals working 99 hours per week is relatively small, particularly among those earning higher salaries.
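The three parts of this question (the maximum, the head count, and the share of high earners) can be answered in a few lines. A sketch on toy rows, so the counts and percentages are not those of the real dataset:

```python
import pandas as pd

# Toy rows; values are illustrative.
adult = pd.DataFrame({
    "hours-per-week": [40, 99, 99, 60, 99, 99],
    "salary": ["<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"],
})

max_hours = adult["hours-per-week"].max()

# Everyone who works exactly the maximum number of hours.
hardest_workers = adult[adult["hours-per-week"] == max_hours]
n_people = len(hardest_workers)

# Share of high earners among them, as a percentage.
rich_share = (hardest_workers["salary"] == ">50K").mean() * 100

print(max_hours, n_people, f"{rich_share:.0f}%")
```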
10. Count the average time of work (hours-per-week) for those who earn a little and a lot (salary) for each country (native-country). What will these be for Japan?
On average, individuals earning higher salaries in Japan tend to work slightly longer hours per week compared to those earning lower salaries. However, it is important to note that these averages may not capture the full range of variation within each salary group and country.
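The per-country averages can be computed by grouping on both country and salary, then selecting one country. A minimal sketch with made-up rows:

```python
import pandas as pd

# Toy rows; values are illustrative.
adult = pd.DataFrame({
    "native-country": ["Japan", "Japan", "Japan", "Japan",
                       "United-States", "United-States"],
    "salary": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
    "hours-per-week": [40, 42, 48, 50, 38, 45],
})

# Mean weekly hours for every (country, salary) pair.
mean_hours = adult.groupby(["native-country", "salary"])["hours-per-week"].mean()

# Selecting the first index level leaves a Series indexed by salary.
japan_hours = mean_hours.loc["Japan"]
print(japan_hours)
```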