VAST-BNL-MC2

Entry Name: "BNL-BS-MC2"

VAST Challenge 2017

Mini-Challenge 2

Team Members:

Bo Sun, Rowan University, drbethsun@gmail.com PRIMARY

Rumeel Jessamy, Lincoln University, rumeel.jessamy@lincoln.edu

Sungsoo Ha, Stony Brook University, hasungsoo@gmail.com

Wei Xu, Brookhaven National Lab, xuw@bnl.gov

Student Team: No.

Tools Used:

We chose Tableau for visualization because of limited time. We used Excel and wrote Python scripts for the necessary data regularization and analysis.

Approximately how many hours were spent working on this submission in total?

We started this project in mid-June with only one student developer as the primary participant. The total working time was about 200 hours.

Video

https://www.youtube.com/watch?v=wN1ZMKZoVhk


Questions

MC2.1 Characterize the sensors' performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.

According to the sensor data, a total of 9 sensors took readings in the wildlife preserve in April, August and December of 2016. Each sensor read 4 different chemicals every hour, which should generate 216 readings (24 x 9) per day per chemical and 20,056 readings in total for each chemical. However, we found fewer records than expected, as shown in Figure 1. Using a Python program, we found that the readings at twelve or one a.m. are completely missing for all sensors on the following 5 days: Apr. 2nd, Apr. 6th, Aug. 4th, Aug. 7th, and Dec. 2nd. In addition, on Aug. 2nd and Dec. 7th, there are empty readings for most of the sensors, as Figure 1 shows. Table 1 provides a summary of these missing data for each sensor and chemical.
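The completeness check described above can be sketched roughly as follows. This is a minimal illustration, not the program we actually ran: the record layout `(timestamp, sensor, chemical, reading)` and the function names `expected_hours` and `missing_readings` are assumptions made for the example.

```python
from datetime import datetime, timedelta

def expected_hours(year, months):
    """Yield every hourly timestamp in the given months of one year."""
    for month in months:
        t = datetime(year, month, 1)
        while t.month == month:
            yield t
            t += timedelta(hours=1)

def missing_readings(records, sensors, chemicals, year=2016, months=(4, 8, 12)):
    """Return the (timestamp, sensor, chemical) combinations with no record.

    `records` is an iterable of (timestamp, sensor, chemical, reading)
    tuples; this layout is hypothetical, not the challenge file format.
    """
    seen = {(t, s, c) for t, s, c, _ in records}
    return [
        (t, s, c)
        for t in expected_hours(year, months)
        for s in sensors
        for c in chemicals
        if (t, s, c) not in seen
    ]
```

Grouping the returned tuples by sensor and chemical would give a summary in the spirit of Table 1.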

Fig. 1. Total Number of Records Per Chemical

In Figure 2, we plotted the count of hourly readings for each chemical at each monitor. It is clear that at certain times sensor 3 recorded double readings for one chemical (shown as a larger square in Figure 2) and no reading for another chemical at the exact same time (shown as a blank in Figure 2). We did not count these blanks as missing data in Table 1 because we think that one of the double readings belongs to the other chemical. We therefore removed all the repeated chemical readings and blank readings from sensor 3 in the following analysis.
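One way to find such pairs of double and blank readings programmatically is sketched below. The record layout and the function name `find_double_and_blank` are assumptions for illustration only; the idea is simply to count records per (timestamp, sensor, chemical) slot and, for every slot with more than one record, list the chemicals that have no record at the same timestamp and sensor.

```python
from collections import Counter

def find_double_and_blank(records, chemicals):
    """Pair each duplicated reading with the chemicals missing at that slot.

    `records` is an iterable of (timestamp, sensor, chemical, reading)
    tuples (a hypothetical layout). Returns a list of
    (timestamp, sensor, duplicated_chemical, missing_chemicals).
    """
    counts = Counter((t, s, c) for t, s, c, _ in records)
    findings = []
    for (t, s, c), n in sorted(counts.items()):
        if n > 1:
            # Chemicals with zero records at the same timestamp and sensor.
            missing = [m for m in chemicals if counts[(t, s, m)] == 0]
            findings.append((t, s, c, missing))
    return findings
```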

Fig. 2. Double Reading and Blank Reading

Additionally, we plotted a monthly heatmap representing the daily release amount and pattern for each chemical at each sensor. In Figure 4, the x-axis is arranged by month with a daily scale, while the y-axis lists each chemical observation for each sensor. We observed that sensor 3 consistently captures larger amounts of all chemicals than the other sensors in all three months, shown as larger squares in Figure 4 (as does sensor 4 in August and December). Even when no wind is blowing toward these sensors according to Figure 3 (where gray circles indicate wind direction on the y-axis against time in hours on the x-axis), there is still a consistent level of release. We conclude that these readings most likely come from chemicals in the environment.

Fig. 3. Sensor and Wind View in Aug. and Dec. for Sensor 3 and 4.

Fig. 4. Monthly Heatmap for each sensor per chemical

Based on the provided dataset, we conclude that none of the nine monitors is running at maximum efficiency. This can be due to a variety of factors such as poor-quality electronic components, instability over the temperature range, or inadequate engineering or manufacturing. The missing data could be due to the monitors simply not picking up readings from any chemical for that hour of the day, or to all five factories shutting down production at that time.


MC2.2 Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?

Limit your response to no more than 6 images and 500 words.

There are four chemicals detected by the sensor group: AGOC-3A, Appluimonia, Methylosmolene and Chlorodinine. To see the chemical release patterns, we generated the figures below showing the amount of chemical release per day and per hour, where the x-axis presents time in days (Figure 5) or hours (Figure 6) and the y-axis presents chemical readings. The releases of Appluimonia and Chlorodinine are consistent, with no dramatic increases (see Figure 5). The release of AGOC-3A has many peak readings that are almost 19 times its regular level, and they are detected both day and night across the 24-hour period (see Figure 6). The release of Methylosmolene also has many peak readings, approximately 10 times its regular level; these peaks are detected only during the night hours between 10pm and 5am (see Figure 7).

Fig. 5. Consistent chemical release for Appluimonia and Chlorodinine

We can also observe that AGOC-3A and Methylosmolene have larger releases than the other two chemicals. Figure 7 shows the release pattern per hour. To give a compact view, we summarize the pattern on a weekly basis. Between 6am and 9pm, the most captured chemical is AGOC-3A. In contrast, between 10pm and 5am, the most captured chemical is Methylosmolene.

Fig. 7. Chemical Release Pattern Per Hour: AGOC-3A leads chemical release between 6am and 9pm, while Methylosmolene was captured the most between 10pm and 5am
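The hourly comparison behind Figure 7 was done in Tableau, but the same question (which chemical dominates at each hour of the day) could be checked with a short script along these lines. The record layout, the function name `dominant_chemical_by_hour`, and the sample values are all made up for illustration; the mean reading per hour is one reasonable choice of summary, not necessarily the one used in the figure.

```python
from collections import defaultdict
from datetime import datetime

def dominant_chemical_by_hour(records):
    """Return {hour_of_day: chemical with the largest mean reading}.

    `records` is an iterable of (timestamp, chemical, reading) tuples,
    where `timestamp` is a datetime (a hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (hour, chemical) -> [sum, count]
    for t, chemical, reading in records:
        entry = totals[(t.hour, chemical)]
        entry[0] += reading
        entry[1] += 1
    best = {}
    for (hour, chemical), (total, count) in totals.items():
        mean = total / count
        if hour not in best or mean > best[hour][1]:
            best[hour] = (chemical, mean)
    return {hour: chemical for hour, (chemical, _) in best.items()}

# Made-up sample values, only to show the calling convention.
sample = [
    (datetime(2016, 4, 1, 8), "AGOC-3A", 4.0),
    (datetime(2016, 4, 1, 8), "Methylosmolene", 1.0),
    (datetime(2016, 4, 1, 23), "AGOC-3A", 0.5),
    (datetime(2016, 4, 1, 23), "Methylosmolene", 7.0),
]
by_hour = dominant_chemical_by_hour(sample)
```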

The most significant issue we saw was the number of repeated readings for chemical AGOC-3A. This can be seen in Figure 8, where the blocks are larger than normal. When we compared that specific monitor's time and date with the other chemicals, we found that the reading for Methylosmolene was missing at that exact same time and date, shown as blank space in Figure 8. This pattern of repeated AGOC-3A readings and missing Methylosmolene readings occurred frequently in all three months of the dataset. We think one of the repeated AGOC-3A readings actually belongs to Methylosmolene. As we could not tell which reading belonged to which chemical, we removed both readings from this analysis.

Fig. 8. Double reading for AGOC-3A and missing reading for Methylosmolene at exact same time

MC2.3 - Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Limit your response to no more than 8 images and 1000 words.

We define the possibility that a factory's chemical release contributed to a sensor's reading by relating it to the wind direction at the given time. Let w be the unit vector of the wind direction and v the unit vector from the factory to the sensor. The contribution is calculated as the dot product of the two vectors, amount = dot(w, v), which equals the cosine of the angle between them. An example can be seen in Figure 9.

Fig. 9. The amount of contribution of a factory to a sensor

If the two vectors are perfectly aligned, i.e. the sensor lies along the wind direction from the factory, the angle between them is zero and the amount is 1. As the angle deviates from zero, the amount decreases following the cosine curve. At 90 degrees the amount is zero; in other words, the factory has no effect on the sensor. A negative value is treated as zero effect.
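The model above can be written down in a few lines. The sketch below assumes that the wind is given as an (x, y) vector pointing in the direction the wind blows toward, and that factory and sensor positions are (x, y) map coordinates; converting a compass bearing from the weather data into such a vector is left out, and the function names are illustrative.

```python
import math

def unit(vx, vy):
    """Scale a 2-D vector to unit length."""
    norm = math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    return vx / norm, vy / norm

def contribution(wind, factory, sensor):
    """Cosine of the angle between the wind and the factory-to-sensor vector.

    `wind` is the (x, y) direction the wind blows toward (an assumption
    of this sketch). Negative cosines are clamped to 0, meaning no effect.
    """
    wx, wy = unit(*wind)
    vx, vy = unit(sensor[0] - factory[0], sensor[1] - factory[1])
    return max(0.0, wx * vx + wy * vy)
```

For example, with wind blowing along +x, a sensor directly east of the factory gets an amount of 1, a sensor directly north gets 0, and a sensor to the west (negative cosine) is also clamped to 0.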

Please note that we chose this simple model, which ignores wind speed and the real distance in miles, because we didn't know the map scale when we started the project. We noticed that the missing information was added some time in July, but we did not have time to incorporate it. In theory, if we know the real distance between a factory and a sensor, we can use the wind speed to compute a more accurate arrival time for the chemical spread. This correction would shift the time of observation but would not change the linkage between chemical releases and factories.
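We did not implement this correction, but under the simplest assumption (the chemical travels in a straight line at the wind speed) it amounts to shifting each observation back by distance divided by speed. The sketch below uses hypothetical units of miles and miles per hour and a hypothetical function name; the units in the actual data may differ.

```python
from datetime import datetime, timedelta

def estimated_release_time(observed_at, distance_miles, wind_speed_mph):
    """Shift an observation back by the travel time distance / speed.

    Assumes straight-line transport at constant wind speed; the units
    (miles, miles per hour) are assumptions of this sketch.
    """
    if wind_speed_mph <= 0:
        raise ValueError("wind speed must be positive to estimate travel time")
    return observed_at - timedelta(hours=distance_miles / wind_speed_mph)
```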


We then filter out small chemical readings (<5) so that only "peak" readings remain, and we filter out small possibilities (<49%). This helped us remove the environmental contribution mentioned in question 1 and focus on only the salient locations. We use two color schemes to visualize the result in Figures 10 and 11, which give an overview in date-time order. In the figures, the x-axis is the date and time in hours, and the y-axis is the monitor. We plotted both the chemical readings and the possibilities in the same plot by adding the possibilities of factories as a second axis. The bar charts show the chemical readings, colored in a blue series, while the circles show the possibilities, with color representing the factory and size representing the possibility value.
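The two thresholds can be expressed as a simple filter. We applied them in Tableau; the sketch below only illustrates the rule, and the row layout (dicts with `reading` and `possibility` keys) and the function name `salient` are assumptions.

```python
MIN_READING = 5.0       # readings below this are treated as background
MIN_POSSIBILITY = 0.49  # possibilities below this are dropped

def salient(rows, min_reading=MIN_READING, min_possibility=MIN_POSSIBILITY):
    """Keep rows whose reading and possibility both reach the thresholds.

    `rows` is an iterable of dicts with "reading" and "possibility" keys
    (a hypothetical layout); possibility is a fraction between 0 and 1.
    """
    return [
        r for r in rows
        if r["reading"] >= min_reading and r["possibility"] >= min_possibility
    ]
```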


Fig. 10. Possibility of a factory's chemical release most likely contributing to a sensor's reading

Fig. 11. Descending order of possibility for each monitor's captured data

By highlighting each chemical in turn, for example "Appluimonia" as in Figure 12 below, we quickly find that factory Indigo consistently has the highest possibility.

Fig. 12. Indigo consistently has the highest possibility of contributing to the Appluimonia release