Data Analytics Projects

Geographical Position of Universities in Canada

2 weeks

This project examined the percentage of each Canadian province's population made up of university or college students, and mapped the locations of a number of universities across Canada. The aim was to see the geographic gaps between universities and identify probable locations for new ones.

Among the findings, the proportion of students varies only modestly across Canada. On the map, each university is represented by a point, and the size of each point corresponds to the number of students enrolled at that university. An inset provides a close-up view of Southern Ontario, where universities are densely concentrated.

Libraries used: geopandas, plotly, matplotlib, pandas, mpl_toolkits (inset_axes).
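The per-province figure is a ratio of students to total population. The following minimal pandas sketch shows that calculation, plus one way to scale map markers by enrollment. The province names and all numbers are hypothetical placeholders, not the project's data, and the helper names `student_share` and `marker_sizes` are illustrative only.

```python
import pandas as pd

# Hypothetical toy figures for illustration only; not the project's data.
provinces = pd.DataFrame({
    "province": ["Province A", "Province B", "Province C"],
    "population": [1_000_000, 500_000, 2_000_000],
    "students": [60_000, 27_500, 130_000],
})


def student_share(df: pd.DataFrame) -> pd.DataFrame:
    """Add students as a percentage of each province's population, highest first."""
    out = df.copy()
    out["student_pct"] = 100 * out["students"] / out["population"]
    return out.sort_values("student_pct", ascending=False)


def marker_sizes(enrollment: pd.Series, max_size: float = 300.0) -> pd.Series:
    """Scale enrollment linearly so the largest university gets max_size.

    matplotlib's scatter `s` is a marker area in points^2, so marker area
    (not radius) ends up proportional to enrollment.
    """
    return max_size * enrollment / enrollment.max()


print(student_share(provinces))
```

The sizes could then be passed as the `s` argument of matplotlib's `scatter`, or as `markersize` when plotting a GeoDataFrame of university points.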

Top Baby Names in the UK

1 week

This project was based on a data set containing the rank (popularity) and count of baby names in England and Wales from 1996 to 2020. It re-created the interactive experience of the “Top baby names” visualization published by the UK Office for National Statistics (ONS), which can be found here:

https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/2020
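The ONS tables already publish each name's rank. The ranking itself can be reproduced from counts, and the "pick a name, see its history" interaction reduces to a filter. This is a minimal pandas sketch that assumes a long table with hypothetical `year`, `name` and `count` columns and made-up values. Treating tied counts as sharing the best rank (`method="min"`) is an assumption, not a confirmed ONS rule.

```python
import pandas as pd

# Hypothetical names and counts for illustration; not ONS figures.
names = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "name": ["Name A", "Name B", "Name C", "Name A", "Name B", "Name C"],
    "count": [300, 450, 300, 280, 500, 320],
})


def add_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Rank names within each year by count (1 = most popular); ties share the best rank."""
    out = df.copy()
    out["rank"] = (
        out.groupby("year")["count"]
        .rank(method="min", ascending=False)
        .astype(int)
    )
    return out


def name_history(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Year-by-year rank and count for one name, as an interactive lookup might show."""
    rows = df.loc[df["name"] == name, ["year", "rank", "count"]]
    return rows.sort_values("year").reset_index(drop=True)


print(name_history(add_rank(names), "Name A"))
```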

Analyzing the Business Needs of a Landscaping Company

1 month

This project was for a landscaping company based in St. John’s, Newfoundland, Canada. The company offers a variety of landscaping services and accepts work from residential and commercial customers in the St. John’s metropolitan area. The company’s broad goal was to answer the question, “How can our company be improved?”

To address the company’s broad goal, I identified two sub-goals to explore in the data:

1) Is the company providing quality service across all the types of services it offers? How can service quality be improved?

2) Is the company’s work schedule well balanced across its employees over a whole month? How could the schedule be made more balanced?

Libraries used: matplotlib, pandas, ipywidgets, IPython, seaborn. 
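Both sub-goals come down to grouped summaries. This is a minimal pandas illustration that assumes a hypothetical job log with `date`, `employee`, `service` and a 1 to 5 customer `rating` column. The company's actual data fields are not described here, so every column name and value is a placeholder.

```python
import pandas as pd

# Hypothetical job log; column names, the 1-5 rating scale and all values are assumptions.
jobs = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-01", "2021-06-01", "2021-06-02",
                            "2021-06-02", "2021-06-03", "2021-06-03"]),
    "employee": ["E1", "E2", "E1", "E1", "E1", "E2"],
    "service": ["mowing", "hedging", "mowing", "planting", "hedging", "mowing"],
    "rating": [5, 3, 4, 2, 4, 5],
})


def quality_by_service(df: pd.DataFrame) -> pd.DataFrame:
    """Mean rating and job count per service type, weakest service first (sub-goal 1)."""
    return (
        df.groupby("service")["rating"]
        .agg(["mean", "count"])
        .sort_values("mean")
    )


def workload_balance(df: pd.DataFrame):
    """Jobs per employee per day, plus each employee's total and day-to-day spread (sub-goal 2).

    crosstab fills missing employee/date pairs with 0, but dates on which
    nobody worked at all do not appear as rows.
    """
    daily = pd.crosstab(df["date"], df["employee"])  # rows: dates, columns: employees
    summary = pd.DataFrame({
        "total_jobs": daily.sum(),
        "daily_std": daily.std(ddof=0),
    })
    return daily, summary


print(quality_by_service(jobs))
print(workload_balance(jobs)[1])
```

A low mean rating on a service with a reasonable job count flags it as worth investigating. A large gap in `total_jobs` between employees, or a high `daily_std`, suggests an uneven schedule.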

Athletes in the Olympic Games

2 weeks

This project was based on a data set containing information about all the athletes who participated in the Olympics up to and including the 2016 Games. The analysis had several goals, including:

1) Show the top 10 most decorated Olympians, that is, the athletes who have won the highest total number of medals. Each bar is color-coded by the number of gold, silver and bronze medals won by that athlete.

2) Display the number of athletes from a selection of countries who competed in the 2012 (dark blue) and 2016 (light blue) Olympics. Additionally, show the overlap: how many athletes competed in both Games and how many competed in only one of them.
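Both goals reduce to counting. This is a minimal sketch that assumes pandas, which other projects here list, and a hypothetical one-row-per-athlete-per-event table with columns `athlete_id`, `name`, `country`, `year` and `medal` holding made-up rows. The top-10 table is the kind of input a stacked bar chart needs. The 2012/2016 overlap is plain set arithmetic on athlete IDs.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions. medal is None when no medal was won.
events = pd.DataFrame({
    "athlete_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "name": ["A", "A", "A", "B", "B", "C", "C", "D"],
    "country": ["X", "X", "X", "Y", "Y", "X", "X", "Y"],
    "year": [2012, 2016, 2016, 2012, 2016, 2012, 2012, 2016],
    "medal": ["Gold", "Gold", "Silver", "Bronze", None, "Gold", "Gold", "Bronze"],
})


def top_medallists(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Gold/silver/bronze counts per athlete, sorted by total medals."""
    medals = df.dropna(subset=["medal"])  # keep only medal-winning rows
    table = pd.crosstab(medals["name"], medals["medal"])
    # Fix column order for a gold/silver/bronze stacked bar; add missing types as 0.
    table = table.reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
    table["Total"] = table.sum(axis=1)
    return table.sort_values("Total", ascending=False).head(n)


def games_overlap(df: pd.DataFrame, country: str,
                  year_a: int = 2012, year_b: int = 2016) -> dict:
    """Count a country's athletes who competed in both Games or in only one."""
    rows = df[df["country"] == country]
    a = set(rows.loc[rows["year"] == year_a, "athlete_id"])
    b = set(rows.loc[rows["year"] == year_b, "athlete_id"])
    return {
        "both": len(a & b),
        f"only_{year_a}": len(a - b),
        f"only_{year_b}": len(b - a),
    }


print(top_medallists(events))
print(games_overlap(events, "X"))
```

Grouping medals by `name` keeps the bar labels readable, but it assumes names are unique, so grouping by an ID column would be safer on real data. Counting rows also credits each team-event medal to every member of the team.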

Trends in World Population

1 week

This project was based on a population database containing population trends for all the countries in the world from 1950 to 2020. The analysis had several goals, including:

1) Show the yearly population change per continent over the past 60 years.

2) Show the population of the 10 most populous countries in 2020.

3) Display the median age and the percentage population growth from 2019 to 2020 for every country in the world, color-coded by continent.
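Each of the three goals corresponds to a small transformation. The growth figure in goal 3 is (P_2020 - P_2019) / P_2019 × 100 for each country. This is a minimal sketch that assumes pandas and a hypothetical long-format table with `country`, `continent`, `year` and `population` columns filled with invented values. Median age would come from its own column and is left out here.

```python
import pandas as pd

# Hypothetical long-format table: one row per country per year (invented values).
pop = pd.DataFrame({
    "country": ["P", "P", "Q", "Q", "R", "R"],
    "continent": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "year": [2019, 2020] * 3,
    "population": [1000, 1010, 500, 520, 800, 796],
})


def yearly_change_by_continent(df: pd.DataFrame) -> pd.DataFrame:
    """Total population per continent per year, differenced against the previous year."""
    totals = (
        df.groupby(["continent", "year"])["population"]
        .sum()
        .unstack("continent")  # rows: years, columns: continents
    )
    return totals.diff()  # first year is NaN: nothing earlier to compare against


def top_populous(df: pd.DataFrame, year: int = 2020, n: int = 10) -> pd.DataFrame:
    """The n most populous countries in the given year."""
    return df[df["year"] == year].nlargest(n, "population")[["country", "population"]]


def growth_pct(df: pd.DataFrame, start: int = 2019, end: int = 2020) -> pd.Series:
    """Percentage growth per country: (P_end - P_start) / P_start * 100."""
    wide = df.pivot(index="country", columns="year", values="population")
    return 100 * (wide[end] - wide[start]) / wide[start]


print(yearly_change_by_continent(pop))
print(top_populous(pop, n=2))
print(growth_pct(pop))
```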