Tracking Deliveries on Campus

The Aim of the Project

The final project for our Quantified Self class aimed to track something beyond just ourselves. After a few rounds of voting, we agreed to track different aspects of deliveries on campus. Over the course of two weeks, the entire class recorded every sighting of a delivery driver, noting details such as the company the driver worked for, which part of campus they were delivering to, what type of goods they were delivering, and so on.

This was an interesting experiment, since we were no longer recording just ourselves, but others. Whatever your approach to personal privacy, it is extremely important to be mindful of others, since you don't want to infringe on their privacy in any way. Since we were tracking delivery drivers, most of whom did not know about the study or that they were being tracked, it was imperative that we didn't record markers that could be traced back to them. Navigating data collection with this added complexity was a challenge, but it presented unique opportunities for learning and exploration.

We set up a survey on Survey123 with key components we wanted information on. 

Whenever you walk around campus, there's a high chance you'll come across a Talabat or Deliveroo driver trying to find their way around to deliver the goods they're carrying. Often, they ask for help, so we thought it would be interesting to see how frequently delivery drivers need help navigating the campus.

Secondly, we wanted to track the location of the drivers, to see which parts of campus people were ordering from the most. Since Survey123 records the location automatically, there was nothing to fill out here. However, you often see drivers from far away. To make sure the location data wasn't misleading, we added a few extra location fields. First, there is a check for whether you are close to the delivery person. If so, the location recorded by the survey is effectively the location of the delivery driver, which is what we want. If the delivery driver is far away but still within viewing range, we select the option saying that we are not next to the delivery person and enter the driver's location manually, so that the manually entered location is used instead.
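The location rule described above can be sketched in a few lines of Python. This is only an illustration of the logic: the names `near_driver`, `device_location`, and `manual_location` are hypothetical stand-ins, since the actual Survey123 field names are not shown here, and the coordinates are made up.

```python
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def driver_location(near_driver: bool,
                    device_location: Optional[Coord],
                    manual_location: Optional[Coord]) -> Optional[Coord]:
    """Pick the coordinate that best represents where the driver was.

    All three parameter names are hypothetical; they mirror the form logic
    described in the text, not the real survey schema.
    """
    if near_driver:
        # The observer is standing next to the driver, so the automatically
        # captured device location doubles as the driver's location.
        return device_location
    # The driver was seen from a distance: use the manually entered point.
    return manual_location


# Illustrative coordinates only, not real survey records.
print(driver_location(True, (24.5240, 54.4340), None))
print(driver_location(False, (24.5240, 54.4340), (24.5231, 54.4352)))
```

Resolving the location once, at the start of the analysis, means every later map or count can work from a single coordinate per sighting.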

On that note, the form requires us to enter where the delivery person was going. This information is obtained by being at the site of delivery, or by asking the person where they are delivering to. Personally, I mostly sighted people when they were making deliveries or had just finished making them, so this field was pretty easy for me to fill in.

There was a temporal element as well, which was recorded automatically at the time of collection.

Lastly, we wanted to see what kind of goods were being delivered. We decided on a few major categories: prepared foods, water, groceries, laundry, and "other" for companies like Amazon and Noon.

Collecting Data

To collect data, we opted for a centralized system, Survey123, a form-centric data collection platform that integrates with ArcGIS, a mapping and spatial analysis software. Survey123 is particularly useful for tracking location data, which I believe was key to our experiment.


The Survey Form

The Collected Data

Once we had enough data points, we downloaded the dataset that all of us had spent the last two weeks collecting. The data was stored in a CSV file, which looks like this after importing it into a pandas DataFrame in Python.

Of course, I removed sensitive information, like the usernames of the people who had collected the data. Interestingly, the raw dataset records who each collector is, along with their location history over the past few weeks, captured while they were tracking delivery drivers. In a funny way, this sousveillance experiment ends up being surveillance of ourselves, as we are providing information about ourselves, e.g. the locations we visit, what times we are out, and so on.
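As a rough sketch of that clean-up step, the snippet below drops identifying columns before any analysis. It uses the standard `csv` module rather than pandas to stay dependency-free. The column names (`Creator`, `Editor`, `company`, `delivery_type`) are assumptions for illustration, and the sample rows are made up rather than taken from our export.

```python
import csv
import io

# Hypothetical column names. ArcGIS-style exports often carry editor-tracking
# fields, but the real headers in the class export may differ.
SENSITIVE_COLUMNS = {"Creator", "Editor"}


def strip_sensitive(rows, sensitive=SENSITIVE_COLUMNS):
    """Return copies of the rows without the identifying columns."""
    return [{k: v for k, v in row.items() if k not in sensitive}
            for row in rows]


# Tiny made-up sample standing in for the downloaded CSV file.
sample_csv = io.StringIO(
    "Creator,company,delivery_type\n"
    "student_a,Deliveroo,Prepared food\n"
    "student_b,Talabat,Prepared food\n"
)
rows = list(csv.DictReader(sample_csv))
clean_rows = strip_sensitive(rows)
print(clean_rows[0])
```

Dropping the columns right after loading, rather than just hiding them in plots, keeps the collectors' identities out of every later step.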

Data Anomalies

There were some anomalies, particularly in the geolocated data. When I mapped the data, I realized that some data points appeared in the Gulf of Guinea. We discussed this in class: it happens when the app isn't able to obtain geolocation data. In that case, it defaults to the (0,0) coordinates, the location nicknamed Null Island. To continue with my analysis, I excluded these data points when looking at spatial data and focused on the data points on campus.

Fig 1: Null Island
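Excluding the Null Island records comes down to a simple coordinate check. The sketch below is one way to do it, assuming each sighting has been reduced to a (latitude, longitude) pair. The sample points are made up.

```python
def has_valid_location(lat: float, lon: float, tol: float = 1e-9) -> bool:
    """Return True unless the point sits at the (0, 0) fallback location."""
    return not (abs(lat) < tol and abs(lon) < tol)


# Illustrative points only: two plausible campus coordinates and one fallback.
points = [(24.5240, 54.4340), (0.0, 0.0), (24.5231, 54.4352)]
located = [p for p in points if has_valid_location(*p)]
print(located)
```

Both coordinates must be zero for a point to be dropped, so a legitimate point that merely lies on the equator or the prime meridian would be kept.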

Company Trends

The first thing I wanted to see was which delivery companies were spotted the most. This would give us an insight into what types of things are being ordered on a daily basis, which might tell us something about the purchasing preferences of people on campus.

Using the location data, along with the data about the delivery person's company, I was able to see which companies people are ordering from the most in different parts of campus, and what the overall breakdown of deliveries was.

Fig 2: Company Trends in the spatial dimension

As we can see in the figure above, Deliveroo and Talabat drivers are seen the most often. These companies deliver prepared food, so our data might lead us to believe that people are mostly ordering food. This is made clear by the pie chart below:

Fig 3: Share of Deliveries by Company

Consumer Trends

Deliveroo is the clear winner, with around 50% of the total delivery share on the NYUAD campus. Talabat is close behind, and the other companies share small slices. Since Talabat and Deliveroo are prepared food delivery services, we might be able to conclude that food is what people on campus order the most.
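The shares behind a pie chart like this are just normalized counts. A minimal sketch with `collections.Counter` follows. The `sightings` list is made up for illustration and does not reproduce the class dataset or its percentages.

```python
from collections import Counter


def shares(labels):
    """Map each label to its fraction of all sightings, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


# Made-up sightings for illustration; not the class dataset.
sightings = ["Deliveroo", "Talabat", "Deliveroo", "Noon", "Deliveroo", "Talabat"]
for company, share in shares(sightings).items():
    print(f"{company}: {share:.0%}")
```

The same function works for the delivery type field, which makes it easy to cross-check the company shares against the recorded types.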

Just to confirm this, I cross-checked with the other data that we collected. Conveniently, we had collected a field for the "delivery type". The count plot (Fig 4) shows that prepared foods really do make up the majority of the deliveries made on campus.

Fig 4: Share of Deliveries by Type

Spatial Trends

By plotting a heatmap of the deliveries using the geolocated data, we can identify several hot zones. Delivery people are sighted the most in the A5/A6 residential area, and are also heavily sighted in the A2 residential block. There are a few deliveries across the rest of the map, but the majority are concentrated in these three regions.

Fig 5: Overall Spatial Trend
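One simple way to find hot zones numerically, independent of whichever tool draws the heatmap, is to bin the coordinates into a grid and count sightings per cell. The sketch below assumes (latitude, longitude) pairs. The cell size of 0.0005 degrees (roughly 55 m north to south) is an arbitrary choice, and the points are made up.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.0005):
    """Count sightings per square grid cell of `cell_deg` degrees."""
    counts = Counter()
    for lat, lon in points:
        # Integer cell indices: every point in the same cell shares them.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Illustrative points only, not real survey records.
points = [(24.52412, 54.43411), (24.52418, 54.43423), (24.52331, 54.43562)]
for cell, n in grid_counts(points).most_common(3):
    print(cell, n)
```

The cells with the highest counts correspond to the brightest regions of a heatmap. A smaller `cell_deg` gives finer detail but noisier counts.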

Is the data misleading?

Having gone through the data and analyzed different aspects of it, one question I have is: does the data really tell me what it seems to be telling me? We came to a few conclusions during this experiment, such as that the majority of deliveries on campus are food deliveries, and that they are concentrated in the A2, A5, and A6 residential blocks.

Are all of these conclusions correct? Are any of these conclusions correct? What are we missing?

Assumptions

The main assumption we have made is that by tracking the delivery people we see across campus, we can somewhat accurately depict the overall picture of deliveries being made and received on campus. However, there are key issues with this assumption, given our methodology of recording a delivery upon sighting.

Firstly, we are not tracking all deliveries being made on campus. What we are tracking instead is the delivery people we come across in our daily routine. This is very different, and it may introduce biases into our data. Take Figure 6 for example. It shows that 75% of all delivery people were sighted on the highline. Maybe this means that 75% of all delivery people know where the highline is and go there right away. However, what makes more sense is that the people collecting the data, us students, were on the highline for a majority of the time, so we recorded more drivers there and overrepresented highline deliveries.

Fig 6: Highline vs Ground

Secondly, it seems unlikely that deliveries other than food make up such a small share of deliveries on campus. Of course, food deliveries are what we come across most often, but we have to contextualize the data that we are collecting. Food delivery drivers have to hand their goods to the customer directly. This means that they are in residential areas quite often. In contrast, delivery services like Amazon and Aramex rarely deliver directly to the person. Instead, they deliver to the mailroom, from where you can collect your packages. Our sightings therefore undercount these deliveries, and it would be incorrect to infer anything about the overall nature of deliveries on campus.

Fig 7: Mailroom

What can we infer instead?

Instead of attempting to make broad generalizations about all deliveries on campus, we can refine our study to offer insights into the spending habits of students, specifically with regard to delivered food. This targeted approach allows us to draw more meaningful and accurate conclusions within the scope of our data and methodological constraints.

Lastly, an interesting aspect to explore is the time of day when most deliveries are made. Plotting this data reveals that the majority of deliveries occur between 2 and 4 PM. However, there is also a spike in the morning at 8 AM, which is intriguing.
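A time-of-day profile like this can be computed by counting sightings per hour. The sketch below makes two assumptions: the timestamp format is hypothetical (the real export may use a different one), and the timestamps are treated as already being in local time. The sample values are made up.

```python
from collections import Counter
from datetime import datetime


def sightings_per_hour(timestamps, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Return a 24-element list: number of sightings in each hour of the day.

    `fmt` is an assumed timestamp format, not necessarily the one used in
    the class export.
    """
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


# Made-up timestamps for illustration; not real survey records.
sample = ["12/01/2023 2:15:00 PM", "12/01/2023 2:45:00 PM", "12/02/2023 8:05:00 AM"]
per_hour = sightings_per_hour(sample)
print(per_hour[8], per_hour[14])
```

If the export stores timestamps in UTC, they would need to be shifted to local time first, otherwise the peaks would land on the wrong hours.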

Since I don't usually wake up that early, I wasn't aware that people order food at this time, so this surprised me. It is possible that most of the students in our class are early risers, which would explain the morning spike as well as why there are not as many records of drivers late at night. Personally, I only order food at night when everything else is closed. I suspect other people order food at night for the same reason, and night is when I spot delivery drivers the most often. It would be interesting to explore how much bias the method of study introduced, since it depends heavily on the lifestyle of the classmates collecting the data.

The next question we might ask is why students are ordering online during hours when food is available across campus. On campus, we have mandatory meal plans, and food is available throughout the day at various locations. Exploring the reasons behind this would be interesting.

Key Takeaways and Data Empathy

In navigating the complexities of our delivery tracking project, the underlying theme of data empathy emerges, urging us to consider the human aspects, ethical considerations, and potential biases inherent in our data collection and analysis.

Respect for Privacy:

Practicing data empathy starts with a profound respect for privacy. As we shifted our focus from personal tracking to observing delivery drivers on campus, we recognized the importance of safeguarding the privacy of individuals unknowingly becoming part of our study. Acknowledging their lack of awareness required us to tread carefully and avoid tracking markers that could compromise their privacy.

Ethical Data Collection:

Choosing Survey123 as our data collection platform aligned with the principles of data empathy. The centralized system, integrated with ArcGIS, not only facilitated location tracking but also allowed us to implement safeguards. We consciously designed our survey to minimize the potential impact on delivery drivers' privacy, recognizing the ethical responsibility that comes with data collection.

Anomalies and Interpretation:

Data anomalies, such as geolocated points in unexpected locations, served as a reminder of the importance of empathy in interpretation. By excluding misleading data points and focusing on spatial trends within the campus, we demonstrated a commitment to presenting accurate and meaningful information, avoiding misrepresentation.

Understanding Human Behavior:

Moving beyond the initial conclusions, we delved into the assumptions guiding our study. Recognizing the limitations of tracking only the delivery drivers encountered during our daily routines, we embraced the concept of data empathy by acknowledging the potential biases introduced by our own behaviors and locations on campus.

Refining Insights:

The ultimate act of data empathy in our analysis is reflected in our decision to refine the study's focus. Instead of making broad generalizations about all deliveries on campus, we redirected our efforts to gain insights into the spending habits of students, particularly in the realm of delivered food. This targeted approach ensures more meaningful and respectful conclusions within the bounds of ethical data practices.
