The project is about cleaning four sets of CSV files containing COVID-19 data for Singapore, creating new columns that may be useful to the government, and building charts in Power BI for analysis.
This is the layout of my final project for Logic and Mathematics (LoMa) and Data Visualisation and Analytics (DaVa).
Done with the KNIME software.
We start by joining the four CSV sheets, which had different columns for the same year, e.g. 2020-A and 2020-B.
Then we take the joined 2020 and 2021 data and concatenate the rows. The result goes to a Column Filter node, which removes the columns that are missing data, and a Missing Value node, which removes the rows that have only partial data.
We change some of the numbers to strings so that they can be edited. Next, we remove some of the negative values in a column called Still Hospitalized.
We also rename all the columns so that the first letter is capitalized.
In this set of nodes we cleaned up the Phase column to make it more specific; for example, not all of Phase 3 was under heightened alert.
After that, we convert Still Hospitalized back into an integer. Then we use a Missing Value node to fill the gaps with zero.
From this point onwards, the workflow mostly creates new columns that may be useful for analysis.
In the fourth node in the picture, we make a column that tells us the transmission type of the virus for each day, e.g. Local, Imported, or both.
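The transmission type step can be sketched outside KNIME in a few lines of Python. This is only an illustration: the column names "Local Cases" and "Imported Cases", the sample numbers, and the "None" label for days with no cases at all are my assumptions, not values taken from the actual workflow.

```python
def transmission_type(local_cases: int, imported_cases: int) -> str:
    """Label a day by where its new cases came from."""
    if local_cases > 0 and imported_cases > 0:
        return "Both"
    if local_cases > 0:
        return "Local"
    if imported_cases > 0:
        return "Imported"
    # Assumed label for days with no new cases of either kind.
    return "None"


# Made-up sample rows, one per day.
rows = [
    {"Date": "day 1", "Local Cases": 12, "Imported Cases": 3},
    {"Date": "day 2", "Local Cases": 0, "Imported Cases": 5},
    {"Date": "day 3", "Local Cases": 7, "Imported Cases": 0},
]

for row in rows:
    row["Transmission Type"] = transmission_type(
        row["Local Cases"], row["Imported Cases"]
    )
    print(row["Date"], row["Transmission Type"])
```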
In the next node we made a new column for total deaths by adding daily deaths (due to COVID), deaths of people who tested positive but did not die from COVID, and unrelated deaths.
After that we decided to calculate the number of people who still had COVID.
The last row of nodes was just to make one column for daily admissions (people admitted to hospital because of COVID).
First, I added a duplicate of the Still Hospitalized column with all the values shifted down by one row.
Second, the Missing Value node fills in a 0 in the first row, which the shift leaves empty.
After that we use a Math Formula node to subtract both Daily Discharged and the lagged Still Hospitalized column from Still Hospitalized to get how many people were newly admitted to hospital.
This is followed by cleaning up the data by changing negative values to 0 (it is possible that a lot of people were discharged with no one admitted on a certain day, causing a negative output), and by removing the lag column, which has no other use.
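The lag, fill, and clamp idea behind this row of nodes can be sketched in Python. This is one reading of the step, not a copy of the Math Formula node: it assumes the balance "today's hospitalized = yesterday's hospitalized + admitted - discharged", so admitted = today - yesterday + discharged, and the exact expression used in the workflow may differ. The function name and the sample numbers are made up.

```python
def daily_admitted(still_hospitalized, daily_discharged):
    """Estimate new admissions per day from two daily series."""
    # Lag column: shift down by one row and fill the first slot with 0.
    lagged = [0] + still_hospitalized[:-1]

    admitted = []
    for today, yesterday, discharged in zip(
        still_hospitalized, lagged, daily_discharged
    ):
        estimate = today - yesterday + discharged
        # Clamp negative results to 0, as the clean-up step does.
        admitted.append(max(0, estimate))
    return admitted


still = [10, 12, 7, 7]       # made-up Still Hospitalized values
discharged = [0, 1, 3, 0]    # made-up Daily Discharged values
print(daily_admitted(still, discharged))  # [10, 3, 0, 0]
```

On the third day the raw estimate is 7 - 12 + 3 = -2, which the clamp turns into 0. The lagged list only exists inside the function, which mirrors dropping the lag column at the end.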
In the last set of nodes our team decided to run k-means clustering, normalizing and then denormalizing the data, and then to add a bin column for days with no local cases (0), local cases (1-15), increased local cases (16-30), and severe local cases (above 30). [Note: this was done before 3,000 cases a day became normal, so the bins may seem out of scale.]
Then the last node produces a CSV file as the output for the final data, and replaces the file if one with the same name already exists in the directory that was set.
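The bin column can be written as a small Python function to make the thresholds explicit. This is a sketch only: it assumes that a day with exactly 30 local cases falls in the 16-30 bin and that "severe" starts at 31, and the label strings are my own wording. It does not cover the k-means part.

```python
def local_case_bin(local_cases: int) -> str:
    """Map a day's local case count to one of four bins."""
    if local_cases <= 0:
        return "No local cases"
    if local_cases <= 15:
        return "Local cases"
    if local_cases <= 30:
        # Assumption: 30 itself belongs to this bin.
        return "Increased local cases"
    return "Severe local cases"


for count in [0, 1, 15, 16, 30, 31, 3000]:
    print(count, local_case_bin(count))
```

The last value shows why the scale can look off: a 3,000-case day lands in the same bin as a 31-case day.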
For the last part of the project we were supposed to come up with five insights from the data we were given, using either KNIME or Power BI. Our team opted for Power BI as they agreed that it was easier to represent the insights visually with it. One teammate made four of the insights and I made the final one. Each of us (five teammates in total) recorded a presentation of an insight and sent it to me so I could edit in the names and combine everything into a single video, which let each of us record in our own time.
Here is the link to the video we made.
I became the leader of the team based on what the team expressed. I guess the team had no idea how to start the assignment, so I kick-started our progress, which gradually led to me becoming the leader, as we needed one for submitting the work anyway.
I would say I did most of the work, as most of my groupmates came from O-Levels while I graduated from Higher Nitec, so I was more accustomed to college life, whereas it was their first year and semester. The other groupmate from ITE seemed to have trouble communicating with the team, as we had not agreed on which medium to communicate through. But in the last week we had mostly settled our issues with each other and quickly touched up the remainder of the project. (The hard part was mostly done by then.)
But in the end I decided to do most of the data handling and the creation of new columns, while I mostly asked my teammates to fill in the Word document as much as they could. The outcome was not exactly what I had expected of group work, as it seemed it might have been better to do it solo. Maybe the general IT curriculum in the first year makes it quite hard to catch up with the new pace of poly, or maybe people are not interested in certain topics and so put less effort into them. During that period we also had more than three major projects to submit, among other work. This also made me think about pacing myself better for poly, as I did not expect a huge workload to come up so suddenly. Overall I am satisfied with my performance in that project.