In this section, describe your data preparation and the data exploration techniques you performed. Support the description of the data with visuals and code. An example is provided below.
Example:
The Ministry of Health, Malaysia has released records of dengue cases in Malaysia in various formats. This study uses the records of dengue cases in each state and district locality from 2010 to 2015.
Figure 1: Example of the content in the records of dengue cases in each state and district locality from 2010 until 2015.
As shown in Figure 1, each record typically contains the number of cases in a locality and the duration of the outbreak.
A similar set of records is available as aggregated data, where the total number of cases per state is reported for each week (http://www.data.gov.my/data/ms_MY/dataset/jumlah-kes-keseluruhan-wabak-denggi-yang-telah-dilaporkan-di-malaysia/resource/bc0f93fa-f95b-4100-9d0f-a2d4c194b317). It is not chosen because the aim of the study is to focus on the relationship between the districts of the chosen state when modeling the number of dengue cases.
The temperature and rainfall data are obtained from http://sdwebx.worldbank.org/climateportal/index.cfm?page=country_historical_climate&ThisCCode=MYS
The dataset may be viewed using software such as Microsoft Excel or Tableau. An example of data exploration on the dataset using Tableau is shown below, where the dataset is opened and the Data Interpreter option is ticked. Once the Data Interpreter has finished, the attributes need to be renamed.
Figure 2: Example of data exploration using Tableau, where the average total collected cases and average duration are plotted by state and year.
The video shows how to explore the data as displayed in Figure 2.
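For readers who prefer code to Tableau, the same two steps (renaming the attributes, then averaging the total cases and outbreak duration by state and year) can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the raw column names, the `RENAME` mapping, and every value in `SAMPLE_CSV` are hypothetical placeholders, not the actual headers or figures in the Ministry of Health records.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: the column names and values are made up for
# illustration and are NOT taken from the Ministry of Health records.
SAMPLE_CSV = """Tahun,Negeri,Daerah,Jumlah Kes,Tempoh (Hari)
2010,Selangor,Petaling,10,30
2010,Selangor,Klang,20,40
2011,Selangor,Petaling,30,50
2010,Johor,Johor Bahru,8,20
"""

# Assumed mapping from raw headers to analysis-friendly attribute names.
RENAME = {
    "Tahun": "year",
    "Negeri": "state",
    "Daerah": "district",
    "Jumlah Kes": "total_cases",
    "Tempoh (Hari)": "duration_days",
}


def load_records(text):
    """Parse CSV text and rename each column according to RENAME."""
    reader = csv.DictReader(io.StringIO(text))
    return [{RENAME.get(k, k): v for k, v in row.items()} for row in reader]


def average_by_state_year(records):
    """Return {(state, year): (mean total cases, mean duration in days)}."""
    groups = defaultdict(list)
    for r in records:
        key = (r["state"], int(r["year"]))
        groups[key].append((float(r["total_cases"]), float(r["duration_days"])))
    return {
        key: (mean(c for c, _ in vals), mean(d for _, d in vals))
        for key, vals in groups.items()
    }


records = load_records(SAMPLE_CSV)
averages = average_by_state_year(records)
for (state, year), (avg_cases, avg_duration) in sorted(averages.items()):
    print(state, year, avg_cases, avg_duration)
```

To use this with a real file, replace `io.StringIO(text)` with an opened file handle and adjust `RENAME` to the headers actually present in the downloaded records.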
The advantage of using Tableau is that it allows the data to be manipulated so that trends can be explored easily. Based on this exploration, it can be seen that the average total collected cases and the average duration of outbreak are distributed unevenly across the years and states. This indicates that modeling the relationship between the weekly records of the districts of a state requires the data to be prepared separately for each state. Such a model would help stakeholders plan operational services better, such as healthcare beds and wards, predictions of patients' food and medication amounts, and health practitioners' work schedules.
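Preparing the data separately for each state could look like the following Python sketch, which splits the records by state and then arranges one state's records into a week-by-district table. The record layout `(state, district, week, cases)`, the sample values, and the choice to fill a district with 0 for a week in which it has no record are all assumptions made for illustration; the real records may need a different treatment of missing weeks.

```python
from collections import defaultdict

# Hypothetical records in the form (state, district, week, cases).
# The values are made up and are not real dengue counts.
RECORDS = [
    ("Selangor", "Petaling", 1, 12),
    ("Selangor", "Klang", 1, 7),
    ("Selangor", "Petaling", 2, 15),
    ("Johor", "Johor Bahru", 1, 4),
]


def split_by_state(records):
    """Group records into {state: [(district, week, cases), ...]}."""
    by_state = defaultdict(list)
    for state, district, week, cases in records:
        by_state[state].append((district, week, cases))
    return dict(by_state)


def weekly_district_table(state_records):
    """Return {week: {district: cases}} for a single state.

    Assumption: a district with no record in a given week is filled with 0.
    """
    districts = sorted({district for district, _, _ in state_records})
    table = defaultdict(lambda: dict.fromkeys(districts, 0))
    for district, week, cases in state_records:
        table[week][district] += cases
    return dict(table)


by_state = split_by_state(RECORDS)
selangor = weekly_district_table(by_state["Selangor"])
for week in sorted(selangor):
    print(week, selangor[week])
```

Each row of the resulting table holds the case counts of all districts of one state for one week, which is a convenient shape for modeling the relationship between districts.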