Assignment 1

Exploring Social Interactions:

Over the past year or so, I've noticed that whenever I spot a familiar face out in the wild, time seems to stop and I am forced to make a decision: should I approach them, offer a casual wave, or simply walk away?

I've been thinking about this for a while, so I wanted to uncover the factors that go into the decisions I make. Does the time of the day affect whether or not I avoid someone? Maybe the day of the week matters. It goes without saying that my decisions really depend on the person that I see.

A major aspect of this self-reflection project has been meticulously recording these encounters. The result is a detailed log of my interactions, a snapshot of my social landscape over the course of several days. Now, the question arises: does this snapshot truly capture an accurate representation of my social behaviors?

Tracking Process

I decided to collect data extensively for a short period of time. For just over a week, I recorded every time I had to think about whether or not I wanted to interact with someone I recognized. With each recording, I made sure to note a variety of details. This included the date and day of the week of each interaction, as I thought it might be interesting to see if there was a temporal pattern in my decisions. I recorded the time to the nearest 5 minutes in order to strike a balance between precision and avoiding overly specific data.
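As a rough illustration of the rounding step, here is a minimal Python sketch. The function name and the example timestamp are made up for illustration and are not taken from the actual log; rounding exact halves upward is also an assumption.

```python
from datetime import datetime, timedelta

def round_to_five_minutes(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 5-minute mark (exact halves round up)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (ts - midnight).total_seconds()
    # 300 seconds = 5 minutes; adding 150 turns floor division into rounding.
    rounded = int((seconds + 150) // 300) * 300
    return midnight + timedelta(seconds=rounded)

# Hypothetical timestamp, used only to show the behavior.
example = round_to_five_minutes(datetime(2023, 9, 18, 14, 23, 40))
print(example.strftime("%H:%M"))  # 14:25
```

Storing only the rounded value keeps enough detail for an hour-by-hour analysis without logging exact moments.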


mk7641 Self Tracking
Figure:  My interaction data

Of course, I had to record the type of decision I made, which was either approaching a person, waving at them, or avoiding them. By 'avoiding,' I mean a scenario where I spotted someone but they didn't see me, and I then chose not to initiate an interaction. It was not as if they saw me as well and I decided not to interact. That would be mean.

I also wondered if my decisions were related to the space I was in when I saw someone. For example, I think I avoid people when I'm on the Highline, and at most, I offer a wave instead of stopping for a conversation. This is mostly because I'm usually on the Highline when I have to be somewhere (which itself is an interesting realization about myself), so I can't really stop and chat.

Figure:  Closeness Chart

Lastly, I decided to see whether or not I have a preference for specific people. Obviously, this has to be true, since I don't think anyone is completely impartial: we all have people that we are closer to and people that we aren't as close to.

So, to make things interesting, I decided to rank people on a scale of 1 through 5, with 5 being the friends I think are closest to me and 1 being the people who aren't really close or are just acquaintances.

I updated the closeness list as I added new people to my interaction dataset, so the people that I decided to record weren't decided before the experiment began.

On one particular day, when I met a bunch of friends one after the other, I decided to streamline the process of tracking my interactions by letting my friends in on the project. This way, I could still record my meetings even if I didn't have my phone on me.

My usual method of collecting data involved writing down the details of my interaction in a CSV file locally stored on my laptop.  I kept this data locally due to its sensitive nature, containing names of individuals I interacted with.

Figure: Reminder texts from friends as a method of double-checking data

When my tracking period was over, I assigned each person a unique numbered ID in my dataset. After replacing the names with these IDs, I uploaded the data to a Google Sheets document. Although I could have worked in Google Sheets from the start, which would have been more convenient, I realized that I did not want to store this data alongside the actual names. Google Sheets lets you view the edit history of a document, so the names would still have been recoverable from earlier versions. In that sense, I would not have been truly anonymizing the data. To ensure that the data was anonymous, I opted to follow these steps.
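The ID step could look something like the sketch below. It is only an illustration: the field names (`person`, `decision`), the placeholder names, and the choice to number people in order of first appearance are all assumptions, not a record of how the IDs were actually assigned.

```python
def anonymize(rows, name_key="person"):
    """Replace names with numeric IDs, numbered in order of first appearance.

    Returns the anonymized rows and the name-to-ID mapping. The mapping
    should stay on the local machine and never be uploaded.
    """
    ids = {}
    anonymized = []
    for row in rows:
        name = row[name_key]
        if name not in ids:
            ids[name] = len(ids) + 1
        # Copy every field except the name, then attach the numeric ID.
        clean = {k: v for k, v in row.items() if k != name_key}
        clean["person_id"] = ids[name]
        anonymized.append(clean)
    return anonymized, ids

# Placeholder rows, not real entries from the log.
rows = [
    {"person": "Friend A", "decision": "approach"},
    {"person": "Friend B", "decision": "wave"},
    {"person": "Friend A", "decision": "avoid"},
]
anonymized, mapping = anonymize(rows)
print(anonymized)
```

Only the anonymized rows would go to the shared sheet, so its edit history never contains a name.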

What did I learn?

The first things that I wanted to see were just some statistics about my interactions. I plotted a Word Cloud plot for the set of locations that were included in my data, with text size reflecting frequency of occurrence.

With this in mind, we can clearly see that D2 (West Dining Hall) and the Highline are the places where I have this interaction dilemma the most. It also means that the conclusions I draw for these locations will be more accurate, as I have a larger sample size to work with.

On the other hand, it seems as if I do not really have the interaction dilemma at the library, considering the size of the text is so small. This may partly be because I did not visit the library as frequently as I was on the Highline or at D2. However, I think that with a much larger dataset, the distribution of frequency of interaction across these locations will more or less be the same. For the library that makes sense, because I go to the library when I want to focus on academics, and I know that most people are there for that reason. Because of this, I do not really face the dilemma as I choose not to interact with anyone before I even go to the library.

Figure: Word cloud plot of places I have the interaction dilemma the most in. Larger text size indicates higher frequency
Figure: Ratios of each type of interaction

Next, I wanted to see how often I avoided people. Before I ran the numbers, I expected my avoidance rate to be closer to 50%, so it was interesting to see that I chose to interact with people around 80% of the time.

If I had conducted this study last semester, I bet that the numbers would have been very different, and closer to what I expected. Some factors that may have caused this change include a less stressful schedule and the fact that many of my friends are back from study away.

The element of time

I looked at two time-scales to see if there was a dependence on either one. Firstly, I wanted to see whether or not my decisions depended on the day of the week. To examine this relation, I plotted a probability bar graph that showcased the probability of each interaction type for every day.

It seems like I am more social on Thursdays, as I did not avoid anyone on that day in my collected data. The highest avoidance rate is on Wednesdays, which is interesting as Wednesday is the least packed day in my schedule. Overall, the ratios of each interaction type are very similar across all days.
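The per-day probabilities behind a chart like this can be computed by counting decisions within each group and dividing by the group total. The sketch below is a minimal version under assumed field names (`day`, `decision`); the records are invented for illustration and are not the real data.

```python
from collections import Counter, defaultdict

def interaction_probabilities(records, group_key):
    """For each group (e.g. a weekday), return the share of each decision type."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[group_key]][rec["decision"]] += 1
    probabilities = {}
    for group, decision_counts in counts.items():
        total = sum(decision_counts.values())
        probabilities[group] = {
            decision: n / total for decision, n in decision_counts.items()
        }
    return probabilities

# Invented records, only to show the shape of the input and output.
records = [
    {"day": "Wed", "decision": "avoid"},
    {"day": "Wed", "decision": "approach"},
    {"day": "Thu", "decision": "approach"},
    {"day": "Thu", "decision": "wave"},
]
for day, probs in interaction_probabilities(records, "day").items():
    print(day, probs)
```

Passing a different `group_key`, such as an hour or location field, would give the same breakdown for the other charts.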

Figure:  Probability of each interaction based on time of day

Figure:  Probability of each interaction based on day of the week 

I repeated the same process for the hours of the day. Here, we see something interesting. There are some hours, like around 11 a.m. and 5 p.m., where I seem to be avoiding everyone. On the other hand, I seem to be going up to everyone I see at 9 a.m. and 7 a.m. But are these predictions correct? And if not, what is the reason?

To check what the reason was, I decided to plot a heatmap of the number of interactions I had across the hours of the day as well as days of the week. This makes the picture clearer.

When we cross-check the time-probability chart against this heatmap and look at 11 a.m. and 5 p.m. again, we see that there is only one interaction datapoint for each hour. Therefore, there's not enough data to create an accurate picture of my decision behavior. So, it's not really the case that I have hours where I will only avoid or only interact with a person. Rather, to capture an accurate image of my interaction behavior, it would be better to keep recording this data for a longer period of time, so that each hour has a decent number of datapoints.

On the other hand, we see that the datapoints for each day are much more evenly spread. Each day has multiple datapoints, which makes the relationship between interaction type and days more reliable.
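The same cross-check can be done without a heatmap by counting datapoints per group and flagging the sparse ones. This is a small sketch with an assumed field name (`hour`) and invented records; the cutoff `min_count` is an arbitrary choice for illustration, not a statistical rule.

```python
from collections import Counter

def sparse_groups(records, group_key, min_count=5):
    """Return the groups (e.g. hours) that have fewer than min_count datapoints."""
    counts = Counter(rec[group_key] for rec in records)
    return {group: n for group, n in counts.items() if n < min_count}

# Invented records: 9 a.m. and noon have three entries, 11 a.m. and 5 p.m. one each.
records = [{"hour": h} for h in [9, 9, 9, 11, 17, 12, 12, 12]]
print(sparse_groups(records, "hour", min_count=2))  # {11: 1, 17: 1}
```

Any probability computed for a flagged hour rests on one or two encounters, so it should be read as an anecdote rather than a pattern.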

Figure:  Heatmap of frequency of interactions against  time of day and day of the week

Figure:  Heatmap of frequency of different interaction types against location

Spatial dependence

I assumed that there would be some spatial dependence on whether or not I choose to interact with someone. It makes sense because I think I can categorize the places that I go to based on their purpose. There are two choices: either I go there for something related to work, or I don't. I count any task with a deadline as work. With this in mind, going to D2 to eat is not work for me because I decide when I want to eat. Therefore, I'm much more open to going up to people and starting conversations there. This is reflected in my data as well, which I expected, since I mostly meet people there to share a meal and catch up with them.

On the Highline, the frequency of each type of interaction is evenly split. As expected, I did avoid people on the Highline. However, what I failed to consider were the times when I was on the Highline but not on my way to work. Thinking back, I do go up to meet people on the Highline if I'm going to D2, as that's one place where I sit and chat with people over food, so the more the merrier.

Closeness Factor

Finally, we can take a look at the bar graph to see what information we can gather. Firstly, it seems like person 1 is my favorite person as I always went up to them, whereas it seems as if I hate persons 10, 11, 12, and 14 as I seem to have avoided them every single time. Persons 2, 3, 4, and 7 seem to be the next closest as I rarely avoid them if at all, which may mean that I want to interact with them as much as possible.

For persons 9 and 13, I only waved at them, which might mean that they are acquaintances.

Figure:  Probabilities of each interaction type based on people

Figure:  Closeness rating for each person

To verify these findings, I matched them against my closeness ranking for each person. I ranked person 1 the highest, which matches the findings of the project. The data also reflects my ranking of persons 2, 3, and 4 as the next closest to me. However, it seems I ranked person 7 lower than the closeness level that these findings suggest.

Moreover, for persons 9 through 14, the findings of the data more or less match the closeness chart that I had made.
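One way to make this comparison explicit is to put each person's avoidance rate next to their closeness rating, along with how many encounters the rate is based on. The sketch below assumes field names (`person_id`, `decision`) and uses invented ratings and records, not the values from the actual charts.

```python
from collections import Counter

def avoidance_vs_closeness(records, closeness):
    """Return (person_id, closeness rating, avoidance rate, encounters) per person."""
    totals = Counter()
    avoided = Counter()
    for rec in records:
        pid = rec["person_id"]
        totals[pid] += 1
        if rec["decision"] == "avoid":
            avoided[pid] += 1
    return [
        (pid, closeness[pid], avoided[pid] / totals[pid], totals[pid])
        for pid in sorted(totals)
    ]

# Invented example: person 1 is rated 5, person 2 is rated 1.
records = [
    {"person_id": 1, "decision": "approach"},
    {"person_id": 1, "decision": "wave"},
    {"person_id": 2, "decision": "avoid"},
]
closeness = {1: 5, 2: 1}
for row in avoidance_vs_closeness(records, closeness):
    print(row)
```

Keeping the encounter count in each row makes it easy to see when a rate of 0% or 100% comes from a single meeting.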

Now, when looking at the heatmap as a final metric, as well as for verification, we see that the findings for person 1 have to be accurate, as I met them a whopping 15 times over the period of this experiment. The next most frequent are persons 2 through 7. However, for persons 8 through 14, there are only one or two datapoints. Does that mean that the assertions about my relationship with them based on this data are inaccurate?

I would argue that the fact that the total number of interactions decreases as closeness decreases is not a coincidence. Instead, it might be the case that it is easier for me to decide whether or not I want to interact with a person who is not very close to me than it is for a person who is relatively closer.

This is different from the previous case, in which not having enough datapoints for each hour meant I could not draw a concrete conclusion about how time is tied to my decision making. In this case, it makes perfect sense that I would have fewer interactions with people that I am less close to. Therefore, the data supports my initial ranking of each person based on their closeness to me.

Figure: Interaction frequency for each person based on location  





Looking back, this process of thinking about my interactions and looking at the data has taught me a lot about how I socialize. It's shown me that every time I decide whether to say hi or have a longer chat, there are many things influencing that choice.

On the other hand, this process of examining my own interactions and decisions brings to light the power of data. It's a bit like how big companies collect a lot of information about people. They can use this data to understand our behaviors and preferences. Just like I learned about myself from my own data, they can make predictions about what we might do in the future. This shows how important it is for us to be aware of the information we share, and how it can be used to understand us better. It serves as a reminder to be mindful of our privacy and how we interact with the digital world.
