DSA2101

Essential Data Analytics Tools: Data Visualisation

AY21/22 Sem 1 Russell Saerang

Lecturer

Prof. Vik Gopal

Assessment

Module Difficulty

How difficult the module is really depends on how well you know R. It is very useful to know how to read R help pages, as well as the main stars of this module, dplyr and ggplot2.

For the first few weeks you get a refresher on the basic R commands and on how you should try to program functionally instead of imperatively, i.e. use the apply family of functions such as sapply and lapply instead of the well-known control flow constructs like for, while, or even repeat.
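To give a feel for the difference, here is a minimal sketch in base R that computes the same per-group means with a for loop and with the apply family. The `scores` list is made-up data for illustration only, not something from the module.

```r
# Hypothetical example data: three numeric vectors of different lengths.
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop style: pre-allocate a result vector and fill it in one element at a time.
means_loop <- numeric(length(scores))
names(means_loop) <- names(scores)
for (i in seq_along(scores)) {
  means_loop[i] <- mean(scores[[i]])
}

# Functional style: sapply applies mean() to each element and
# simplifies the result to a named numeric vector.
means_sapply <- sapply(scores, mean)

# lapply does the same work but always returns a list.
means_lapply <- lapply(scores, mean)
```

Both styles give the same numbers; the sapply version just hides the bookkeeping of indexing and pre-allocation.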

Next up is importing data, which can come from an HTML page, some API, Excel files, you name it. This opens our minds to the kinds of data storage that exist out in the wild. You will also touch on the concept of simple feature (sf) objects, which are useful for spatial data.

From the middle of the semester we were introduced to dplyr, which is very convenient for manipulating data. It feels like method chaining at its finest with the help of the pipe operator.
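As a rough illustration of that chaining feel, here is a small sketch using the built-in mtcars dataset. The dataset, the columns, and the `hp > 100` cutoff are my own picks for the example, not taken from the module's material.

```r
library(dplyr)

# mtcars ships with base R; the filter threshold is arbitrary and illustrative.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                    # keep the more powerful cars
  group_by(cyl) %>%                       # one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),                 # average fuel economy per group
    n_cars = n()                          # number of cars per group
  ) %>%
  arrange(desc(mean_mpg))                 # best fuel economy first
```

Each verb takes a data frame and returns a data frame, which is what lets the pipe read top to bottom like a recipe.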

With data importing and manipulation skills in hand, it's time to output and visualize the data with ggplot2, as the module title says. There are many ways of visualizing the same data, and we also went through the rationale behind visualizing different types of data, such as considering the data-ink ratio, how accurately the data is represented, and how approachable the visualization is to laypeople.
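Here is a minimal sketch of "many ways of visualizing the same data", again assuming the built-in mtcars dataset rather than anything used in the module. It builds two plots of fuel economy, and `theme_minimal()` is just one simple way to cut down on non-data ink.

```r
library(ggplot2)

# View 1: scatter plot of fuel economy against weight.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

# View 2: the same response variable, summarised as a boxplot per cylinder count.
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Number of cylinders", y = "Miles per gallon") +
  theme_minimal()

# Printing a plot object (e.g. print(p_scatter)) renders it.
```

Which view is better depends on the question being asked: the scatter plot shows a relationship between two continuous variables, while the boxplot compares distributions across groups.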

As supplementary, non-examinable material, we were also introduced to Tableau in the last week, along with how its basic tools work.

Module Workload

The workload below is measured weekly.

Prof decided to use Microsoft Teams as the main communication platform, which I had no complaints about. He will definitely provide the recordings, but his lectures are just so calming that I'd actually show up for them. He also touched on some extra information, such as statistical methods that might be useful for later modules. It's a good idea to touch on these so that we are at least aware they exist, e.g. how the loess method works, instead of just passing "loess" as a method argument (for example to ggplot2's geom_smooth).
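For the curious, here is a small sketch of what loess does compared with a plain linear model, using base R's stats package and the built-in mtcars dataset (my choice for illustration, not the lecture's example). The `span` and `degree` values shown are the function's defaults, written out explicitly.

```r
# Global fit: one straight line for the whole dataset.
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Local fit: loess fits a low-degree polynomial around each point,
# weighting nearby observations more heavily.
# span is the fraction of points used in each local fit.
fit_loess <- loess(mpg ~ wt, data = mtcars, span = 0.75, degree = 2)

# Compare predictions at a few weights inside the observed range.
new_wt <- data.frame(wt = c(2, 3, 4))
pred_lm <- predict(fit_lm, newdata = new_wt)
pred_loess <- predict(fit_loess, newdata = new_wt)
```

A smaller `span` makes the curve follow the data more closely, while a larger one smooths it out towards something closer to a single global fit.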

He also has a website that basically displays his code live, so we can comment on the code he's currently explaining or copy it for easy reference and learning. This makes it really easy to follow along.

Every week, a tutorial worksheet is released, to be completed within a week and submitted as an Rmd (R Markdown) file. As someone already familiar with GitHub Markdown files, I had no trouble picking up Rmd, but learning it is crucial since you always have to submit everything in this format.

The one-hour session is simply for discussing these problems, to help you solve them. It is also recorded.

Personally, the questions take a while to solve (at least 3 hours?) but I enjoy exploring the datasets provided. There are actually more insights to gain once you start exploring them.

DataCamp Assignments

Prof used DataCamp for non-tutorial submissions, which basically serve as an extra guide on how to work with everything taught in DSA2101. It is a comfortable tool and you don't have to worry about being wrong here because you are graded on completion only. Also very chiong-able, how about finishing all ten by the end of Week 0 :)

Exams and Personal Opinion

Hands down the mod I had the most fun with this semester. Had ups and downs during debugging, but isn't that the way of life?

Midterms was a one-day take-home thing, so you can ask for clarification within the 24 hours after the paper's release. Difficulty-wise it's medium, and the answers can definitely be found by searching R's help pages as well.

Finals was held F2F and was easier than I expected, thankfully. At one point I realized that being able to understand a help page in a short amount of time is a useful skill: there was one rarely used package that I only came across during the exam, yet I had to use it.

Overall, I learnt a lot from this module, including how to adapt to different kinds of datasets.

Expected grade : A

Final grade : Somewhere around expected

(Disclaimer: Prof/Tutor for the module may not be the same every semester.)