9-11

Schwartz et al., 2015. Extracting Human Temporal Orientation from Facebook Language

Paper pdf (NAACL-2015: North American Chapter of the Association for Computational Linguistics).

Discussion Leader Slides (Vivek Kulkarni).

Questions from Group with Responses from Author:

1. What are the biases inherent in social media communications? Messages are written for friends/family and sometimes for the population at large. Is communication with friends more “honest”, or more reflective of mental states, than tweets sent out to the whole wide world? It seems remarkable that signals about psychological variables emerge despite the idiosyncrasies of social media.
There are generally two types of bias for such research. (1) Sample bias: people who use social media, and/or those who sign up for such studies, are not representative of the population. From experience, this seems to be much less of an issue than people might expect. Introverts are nearly as common as extraverts, males as females, etc. The one bias that holds strong is the skew toward younger people. We have a paper on the effect of this in assessing disease prevalence, but there is room for much more exploration of exactly how these biases affect results. In the case of this NAACL paper, we controlled for age and gender when running the correlations between each orientation and the other attributes, to be sure that demographics didn't explain the difference. Older individuals are more likely to talk about the future, and so are more conscientious individuals; thus, we wanted to be sure the connection between future orientation and conscientiousness wasn't just that older individuals were more likely to be high in both.
(2) Presentation of self: this is what I think you allude to most, and it is very tricky to account for. What is the "honest", true person? Are we ever not presenting ourselves? My colleague Lyle Ungar has a blog post on the subject. In the end, we find correlates in the direction they are theorized to occur, and we also find that the metric is predictive of useful outcomes (like depression, IQ, or number of friends).
Keep in mind, especially in this case, that we're not tracking people's direct statements about how future-oriented they are; we are quantifying their behavior on social media to get a somewhat more objective measure (in that they are not self-reporting what they think they are). To a psychologist, a behavior-based measure like this is at least less prone to "self-report biases" than questionnaires are: the tendency to prefer answering questions a certain way (e.g., preferring extremes or not), and to compare oneself to family and friends rather than to the whole world when doing so. Depressed people may try to sound less depressed online, but there is still a relative difference. As you note, "presentation" (whatever it is) is potentially ramped up on Twitter, where messages are public by default. Another issue is that people seem to post different types of content on Twitter: a lot more sharing of "information" like articles, and less discussion of oneself. A study is very much needed to quantify these differences; if anyone is interested, I have data.
2. What are the types of errors in temporal orientation detection? What are the causes? Errors of the model itself mostly seemed to reflect difficult cases where statuses mentioned several orientations (the first two examples below) or where the orientation wasn't explicitly mentioned but had to be inferred (the third example). From Table 1:
  1. I just watched Oprah and am posting what it was about.
  2. really wanted a snow day, but probably not going to get one tomorrow. now homework.
  3. Another day of great restraint.
3. We found that the model was more likely to make errors where the human annotators disagreed. A possible solution is to label orientation at the clause level (i.e., for every subject-verb-object unit) rather than at the tweet level (i.e., whichever orientation is most prevalent across the tweet's clauses). However, for the user-level application, it's not clear that this would make much of a difference.
4. Lack of a linear correlation doesn’t necessarily mean lack of correlation. Could temporal orientation have a non-linear relationship with the attributes? Visualizing the data could help. How about random forest (RF) feature importance? Sure. The Pearson correlation coefficient describes a linear relationship, but other relationships exist, and in fact the predictive model we used seemed to pick up on some of them in the linguistic features. When comparing to attributes, though (i.e., doing social science), unless one has a specific hypothesis, one typically sticks with tests of linear relationships. Testing for non-linear relationships increases the chance of finding something spurious (i.e., not truly significant), because you greatly increase the number of hypotheses you are testing, especially if you need to vary a parameter that adjusts the curvature or anything like that. If I test 100 independent curves at a significance threshold of .05, then about 5 of them are expected to come back at p < .05 simply by chance, even when no real relationship exists. Generally speaking, linear correlation is much more robust and interpretable. If one theorizes a non-linear relationship, one is often better off transforming the data and then testing (or discretizing the data into categories). That said, I think there is room to develop robust methods for finding non-linear relationships with human attributes; I just wouldn't do it in the same work that introduces a new problem.
5. How is standardization done in Figure 2? By z-scoring: (value - mean) / standard deviation. This puts everything on the same scale; otherwise, present orientation would dominate.
6. Is it worth splitting the orientation further? How about the immediate past versus reminiscing about events that happened decades ago? Is this level of tracking psychologically relevant? Yes, definitely. Psychologically, one would expect those who focus on the distant future to be better savers. It would also be great to take valence (positive/negative) into account: is the distant future viewed positively or negatively (negative means worry, positive means optimism, and both have been shown to relate to health)? Is one dwelling on the past or fondly reminiscing? Another area where there's room for a good paper!
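The answer to question 1 mentions controlling for age and gender when correlating each orientation with other attributes. One common way to do this is a partial correlation: regress both variables on the covariates and correlate the residuals. The sketch below does that in plain Python on synthetic data where age drives both future orientation and conscientiousness. The data, effect sizes, and helper names (`residualize`, `partial_correlation`) are invented for illustration, and this is not necessarily the exact procedure used in the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(A, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def residualize(y, covariates):
    """Residuals of y after an ordinary least-squares fit on the covariates plus an intercept."""
    X = [[1.0] + [cov[i] for cov in covariates] for i in range(len(y))]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]


def partial_correlation(x, y, covariates):
    """Correlation between x and y after removing the linear effect of the covariates."""
    return pearson(residualize(x, covariates), residualize(y, covariates))


# Synthetic data (invented): age drives both variables; gender has no effect.
rng = random.Random(0)
n = 2000
age = [rng.uniform(18, 65) for _ in range(n)]
gender = [float(rng.randint(0, 1)) for _ in range(n)]
future = [0.02 * a + rng.gauss(0, 0.3) for a in age]
conscientiousness = [0.03 * a + rng.gauss(0, 0.5) for a in age]

raw = pearson(future, conscientiousness)
partial = partial_correlation(future, conscientiousness, [age, gender])
print(f"raw r = {raw:.2f}, partial r (age, gender removed) = {partial:.2f}")
```

With this setup the raw correlation is clearly positive, while the partial correlation is close to zero, which is exactly the "older people are high in both" confound the answer describes.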
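The answer about model errors (item 3) suggests labeling orientation per clause rather than per message, and notes that it is unclear whether this matters at the user level. As a hedged illustration, the sketch below contrasts two ways of turning clause labels into a user-level score: (a) give each message its majority clause label and take the share of messages per orientation, or (b) take the share of clauses per orientation directly. The clause labels, the tie-breaking rule, and the proportion-based user score are all assumptions for illustration, not the paper's procedure.

```python
from collections import Counter

ORIENTATIONS = ("past", "present", "future")


def message_label(clause_labels):
    """Message-level label: the most frequent clause label.

    Ties go to the earliest orientation in ORIENTATIONS (an arbitrary, assumed rule).
    """
    counts = Counter(clause_labels)
    return max(ORIENTATIONS, key=lambda o: counts[o])


def orientation_proportions(labels):
    """Share of labels falling in each orientation."""
    counts = Counter(labels)
    return {o: counts[o] / len(labels) for o in ORIENTATIONS}


# One user's messages, each given as hypothetical clause-level labels.
messages = [
    ["past", "present"],
    ["past", "future", "future"],
    ["present"],
    ["future", "present", "present"],
]

by_message = orientation_proportions([message_label(m) for m in messages])
by_clause = orientation_proportions([c for m in messages for c in m])
print(by_message)
print(by_clause)
```

Here the message-level shares are 0.25 / 0.5 / 0.25 (past / present / future), while the clause-level shares are 2/9, 4/9, and 3/9. Whether such differences change user-level correlations in practice is the open question the answer raises.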
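To make the point in question 4 concrete, here is a toy example where y is fully determined by x, yet the Pearson correlation is exactly zero because the relationship is U-shaped. It also shows the two remedies the answer mentions: testing after a hypothesized transformation, and discretizing into categories. The data and the bin boundary are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: y is a deterministic, U-shaped function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r_linear = pearson(x, y)                        # linear test on the raw values
r_transformed = pearson([v * v for v in x], y)  # test after a hypothesized transform

# Discretizing instead: compare the mean of y for "extreme" vs "central" x.
extreme = [b for a, b in zip(x, y) if abs(a) >= 2]
central = [b for a, b in zip(x, y) if abs(a) < 2]
print(r_linear, r_transformed, sum(extreme) / len(extreme), sum(central) / len(central))
```

`r_linear` comes out as 0 and `r_transformed` as 1. The transform works here only because we guessed the right shape in advance, which is why the answer ties such tests to a prior hypothesis.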
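The "100 tests at .05" arithmetic in question 4 can be checked directly. If all 100 null hypotheses are true and the tests are independent, the number of p < .05 results is Binomial(100, 0.05), with mean 5, and the chance of at least one spurious hit is 1 - 0.95^100, about 0.994. The sketch below computes these values and confirms them by simulation. It also reports the standard Bonferroni-corrected threshold (alpha / m), a common correction that the answer itself does not mention. The function names are hypothetical.

```python
import random


def multiple_testing_summary(m, alpha):
    """Closed-form quantities for m independent tests, all with a true null hypothesis."""
    expected_false_positives = m * alpha   # mean of Binomial(m, alpha)
    p_at_least_one = 1 - (1 - alpha) ** m  # family-wise error rate
    bonferroni_threshold = alpha / m       # per-test threshold keeping the FWER <= alpha
    return expected_false_positives, p_at_least_one, bonferroni_threshold


def simulate_false_positives(m, alpha, trials, seed=1):
    """Average number of p < alpha results per batch of m null tests."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # For a continuous test statistic, p is uniform on [0, 1] under the null.
        total += sum(rng.random() < alpha for _ in range(m))
    return total / trials


expected, fwer, threshold = multiple_testing_summary(m=100, alpha=0.05)
simulated = simulate_false_positives(m=100, alpha=0.05, trials=10_000)
print(f"expected = {expected:.1f}, P(>=1) = {fwer:.3f}, "
      f"Bonferroni threshold = {threshold:.4f}, simulated mean = {simulated:.2f}")
```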
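For question 5, a minimal sketch of the z-scoring step follows. It assumes the inputs are per-user proportions of messages in each orientation, standardized separately per orientation. The numbers are invented, and the exact quantities behind Figure 2 are not described here. The sample standard deviation is also an assumption; using the population version would only rescale the results.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a sequence: (value - mean) / standard deviation."""
    mu, sd = mean(values), stdev(values)  # sample standard deviation (an assumption)
    return [(v - mu) / sd for v in values]


# Hypothetical per-user proportions of messages in each orientation (invented numbers).
proportions = {
    "past":    [0.10, 0.15, 0.12, 0.08],
    "present": [0.70, 0.60, 0.75, 0.72],
    "future":  [0.20, 0.25, 0.13, 0.20],
}

# Standardize each orientation separately so that all three share one scale.
standardized = {name: zscores(vals) for name, vals in proportions.items()}
for name, vals in standardized.items():
    print(name, [round(v, 2) for v in vals])
```

In the raw numbers, present (around 0.7) dwarfs past (around 0.1). After standardization, each orientation has mean 0 and standard deviation 1, so differences are read in standard-deviation units, and no single orientation dominates the plot.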