Of all the internship projects I worked on with my group, this one had us spend roughly four days just cleaning the data.
It was then that I recalled what my Power BI facilitator told us: sometimes, a huge part of data visualization is data cleaning.
The data set contains six tables in CSV format totalling 28 million taxi trips in New York City from 2017 to 2020. It also contains pick-up and drop-off times and locations, distances, fares, passengers and more.
We worked with a 4-5-4 calendar table with fiscal year, quarter, month and week, and a table of 265 zone locations in New York with location ID, borough and service zone.
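For anyone unfamiliar with this kind of calendar, here is a minimal sketch. It assumes the "454 calendar" is the retail-style 4-5-4 fiscal calendar, where each quarter is split into months of 4, 5 and 4 weeks. The function name is hypothetical, and the sketch ignores the occasional 53-week year.

```python
from typing import Tuple

# Weeks per fiscal month in a 4-5-4 calendar: each quarter is 4 + 5 + 4 = 13 weeks.
WEEKS_PER_MONTH = [4, 5, 4] * 4  # 12 fiscal months, 52 weeks in total


def fiscal_month_and_quarter(fiscal_week: int) -> Tuple[int, int]:
    """Map a fiscal week (1-52) to its (fiscal_month, fiscal_quarter)."""
    if not 1 <= fiscal_week <= 52:
        raise ValueError("this sketch only covers 52-week fiscal years")
    weeks_seen = 0
    for month, weeks in enumerate(WEEKS_PER_MONTH, start=1):
        weeks_seen += weeks  # last fiscal week that belongs to this month
        if fiscal_week <= weeks_seen:
            return month, (month - 1) // 3 + 1
    raise AssertionError("unreachable for weeks 1-52")
```

With this layout, fiscal weeks 1 to 4 fall in month 1, weeks 5 to 9 in month 2, and weeks 10 to 13 in month 3, which closes the first quarter.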
Now after cleaning and prep I got:
Total trip count = 26 million (out of the initial 28 million);
Average trips per day = 17.94k;
Average Fare collected = $12.28;
Average Distance covered = 2.86 miles;
Average trips per week = 123.61k.
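The headline numbers above were built in Power BI, but the arithmetic behind them can be sketched in plain Python. This is only an illustration: the dictionary keys are hypothetical, weeks are ISO weeks rather than the fiscal weeks of the calendar table, and the averages divide by days and weeks that have at least one trip.

```python
from datetime import date
from statistics import mean


def summarize_trips(trips):
    """Compute headline metrics from a list of cleaned trips.

    Each trip is a dict with hypothetical keys: 'pickup_date' (datetime.date),
    'fare' (dollars) and 'distance' (miles).
    """
    total = len(trips)
    # Distinct calendar days that have at least one trip.
    days = {t["pickup_date"] for t in trips}
    # Distinct ISO (year, week) pairs; the original work used a fiscal calendar.
    weeks = {tuple(t["pickup_date"].isocalendar()[:2]) for t in trips}
    return {
        "total_trips": total,
        "avg_trips_per_day": total / len(days),
        "avg_trips_per_week": total / len(weeks),
        "avg_fare": mean(t["fare"] for t in trips),
        "avg_distance": mean(t["distance"] for t in trips),
    }


# Tiny made-up sample, only to show the shape of the input.
sample = [
    {"pickup_date": date(2020, 1, 6), "fare": 10.0, "distance": 2.0},
    {"pickup_date": date(2020, 1, 6), "fare": 12.0, "distance": 3.0},
    {"pickup_date": date(2020, 1, 7), "fare": 14.0, "distance": 4.0},
    {"pickup_date": date(2020, 1, 13), "fare": 12.0, "distance": 3.0},
]
summary = summarize_trips(sample)
```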
The taxi trip forecast in the visualization shows a decrease in taxi trips in New York. Looking at the continuous data from 2017, it is easy to see that during the Covid pandemic, New York City taxis recorded a huge decline in taxi fares.
So yes, our forecast is correct.
For 100 years, certain species of sharks have attacked human beings in various countries.
I was privileged to have access to a dataset covering the following:
a) location of attacks;
b) activity during attacks;
c) victim information (which contained name, gender and age); and
d) shark species.
Since the job of a data analyst is to make sense of raw data, we got to work as a team.
The first thing we realized was how messy the raw data was. It needed a lot of meticulous cleaning. For instance, how do you clean a text field that has several body parts lumped together when you're expected to report the exact parts attacked by the sharks?
Simply do a text analysis!
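One simple way to do that kind of text analysis is keyword matching. The sketch below is an assumption about how it could be done in Python, not the exact steps we took. The keyword map is hypothetical and would need to be built by reading through the real injury descriptions.

```python
import re
from collections import Counter

# Hypothetical keyword map; a real one would come from inspecting the raw text.
BODY_PARTS = {
    "leg": ["leg", "legs", "calf", "thigh", "knee"],
    "foot": ["foot", "feet", "ankle", "toe", "toes"],
    "arm": ["arm", "arms", "forearm", "elbow"],
    "hand": ["hand", "hands", "finger", "fingers", "wrist"],
    "torso": ["torso", "chest", "abdomen", "back"],
}

# One compiled pattern per body part, matching whole words only, any letter case.
PATTERNS = {
    part: re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)
    for part, words in BODY_PARTS.items()
}


def extract_body_parts(injury_text):
    """Return the sorted body-part labels mentioned in a free-text description."""
    return sorted(
        part for part, pattern in PATTERNS.items() if pattern.search(injury_text)
    )


def count_body_parts(descriptions):
    """Count how many descriptions mention each body part."""
    counts = Counter()
    for text in descriptions:
        counts.update(extract_body_parts(text))
    return counts
```

For example, `extract_body_parts("Lacerations to left leg & foot")` gives one row two labels, `foot` and `leg`, so each part can be counted separately on the dashboard.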
At the end of the cleaning, I got to work visualizing the cleaned data. I was able to arrive at the following:
a) the number of shark attacks per annum since 1900 (I used the 'density' mark to show this);
b) countries with the most shark attacks (I showed a map of all the countries of the world on my dashboard);
c) areas and locations of shark attacks (mostly shores);
d) the most common times of shark attacks across countries (morning and evening stood out);
e) the shark species responsible for the most attacks (the white shark) and of course the body parts attacked most often (obviously the legs. lol).
I really do appreciate the enormous work and data collation carried out by the World Health Organization.
Working as a data analytics intern gave me access to millions of data points from across the world, including:
Number of Total Deaths;
Number of People vaccinated;
Number of Handwashing facilities recorded;
Number of ICU patients in different continents, and so on.
The job of a data analyst is to make sense of data such as the WHO's Covid-19 data, and I hope I did just that. BTW, I created new measures on top of the raw data to get exact counts.
Part of my visualization looked into the key performance indicators for the countries of the world as recorded, while also comparing death rates per million with the number of elderly persons in each continent.
(Europe has the highest number of elderly persons; it is worth noting that it has also recorded the highest number of deaths from Covid-19.)
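To illustrate what a measure like deaths per million involves, here is a minimal Python sketch. The row keys are hypothetical, and this is not the actual measure from my dashboard, only the arithmetic behind that kind of figure.

```python
def per_million(count, population):
    """Scale a raw count to a rate per one million people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


def deaths_per_million_by_continent(rows):
    """Aggregate country rows into one deaths-per-million figure per continent.

    Each row is a dict with hypothetical keys:
    'continent', 'total_deaths' and 'population'.
    """
    deaths, people = {}, {}
    for row in rows:
        continent = row["continent"]
        deaths[continent] = deaths.get(continent, 0) + row["total_deaths"]
        people[continent] = people.get(continent, 0) + row["population"]
    # Sum first, then divide: averaging per-country rates would ignore country size.
    return {c: per_million(deaths[c], people[c]) for c in deaths}
```

Summing deaths and population before dividing matters, because a simple average of country rates would give a tiny country the same weight as a huge one.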
Africa will need to do a lot in terms of improving on the availability of handwashing facilities.
Together we can all fight this!
Mexico is the third largest country in Latin America after Brazil and Argentina. Mexico City is one of the most populous cities and metro areas in the world. With her economic booms and glooms, what if you are given data on:
a) Names of restaurants in Mexico;
b) Consumer preferences in those restaurants and their effects on ratings; and
c) Consumer demographics and their bias in the data sample
And then you're asked as a data analyst to:
1) Check for the demand and supply gaps that are worth exploiting in the market and
2) Check for certain characteristics worth looking out for, assuming investment is advised.
The video below shows the insights I was able to generate and then visualize using #Tableau.
You'd notice that one of the observations is that students in the city patronize restaurants more than any other group.
I also noticed that restaurants that permit smoking everywhere do not get a good number of customers.
Please watch the video and #share with me your feedback too. Comments and engagements are welcome.
P.S. From my analysis, single people patronize restaurants more than married people. Lol. You get it?
This video is one of my projects (Nigerian Covid-19 report/analysis); please listen to my observations.
So far the #NCDC data shows that Nigeria has:
confirmed cases of 214,622,
active cases of 4,192,
discharged cases of 207,450 and
recorded deaths of 2,980.
The really interesting thing about this visualization is that I managed to contain my displeasure at the level of reporting by the country when comparing the data I have with the country's total population.
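As a quick sanity check on the figures above, here is a small Python sketch. The helper names are hypothetical, and the population argument is left to the reader, since no population figure is quoted here.

```python
# Figures quoted above from the NCDC data.
confirmed, active, discharged, deaths = 214_622, 4_192, 207_450, 2_980

# Active, discharged and deceased cases should add up to the confirmed total.
totals_match = active + discharged + deaths == confirmed


def share(part, whole):
    """Return part as a percentage of whole."""
    return part * 100 / whole


death_share_pct = share(deaths, confirmed)           # deaths as % of confirmed cases
discharged_share_pct = share(discharged, confirmed)  # discharged as % of confirmed cases


def cases_per_100k(cases, population):
    """Confirmed cases per 100,000 people; population must come from another source."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population
```

The three categories do add up to the confirmed total, so the quoted numbers are at least internally consistent. Passing the confirmed cases and a population estimate to `cases_per_100k` gives the kind of comparison against total population mentioned above.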