Preserving individual control over private information is one of the rising concerns in our digital society. Online social networks exist in application ecosystems that allow them to access data from other services, for example gathering contact lists through mobile phone applications. Such data access might allow social networking sites to create shadow profiles with information about non-users that has been inferred from information shared by the users of the social network. This possibility motivates the shadow profile hypothesis: the data shared by the users of an online service predicts personal information of non-users of the service. We test this hypothesis for the first time on Twitter, constructing a dataset of users that includes profile biographical text, location information, and bidirectional friendship links. We evaluate the predictability of the location of a user by using only information given by friends of the user that joined Twitter before the user did. This way, we audit the historical prediction power of Twitter data for users that had not joined Twitter yet. Our results indicate that information shared by users in Twitter can be predictive of the location of individuals outside Twitter. Furthermore, we observe that the quality of this prediction increases with the tendency of Twitter users to share their mobile phone contacts and is more accurate for individuals with more contacts inside Twitter. We further explore the predictability of biographical information of non-users, finding evidence in line with our results for locations. These findings illustrate that individuals are not in full control of their online privacy and that sharing personal data with a social networking site is a decision that is collectively mediated by the decisions of others.

Since Edward Snowden leaked details of the National Security Agency's global surveillance [1], privacy in online activity has been one of the rising concerns for Internet users [2]. While these concerns date back nearly two decades [3] and have not led to wide use of privacy-enhancing technologies [4], the topic of privacy rights in online activity is higher than ever on the political agenda and in media attention. Sharing private information can be motivated by services or information received in exchange, for example when sharing health information with a doctor. Nevertheless, this need not be the case in online social networks: a 2016 Pew Research Center survey [5] showed that more than 51% of respondents consider it unacceptable to share private information with an online social network that shows personalized advertisements, for fear of third parties accessing such private data.





While the above articles provide evidence supporting the shadow profile hypothesis, they suffer from certain limitations. First, the lack of precise data required the use of growth simulations or other heuristics [22]. Second, observational data of users did not contain any information about which users share their contact lists, requiring certain assumptions in the analysis process [23]. Third, previous evaluations are based either on small datasets or on data from social networks that have since shut down [10], leaving open the question of whether shadow profiles can be built in a current and large online social network. This article aims to overcome those three limitations in a study of personal information in the Twitter social network, using the high-quality data provided by the Twitter API. As a result, we test the shadow profile hypothesis in an active and large online social network, using the precise time sequence of Twitter users joining the network, and identifying which users share their contact lists as revealed in the metadata of their tweets.

In the following, we present a dataset that we produced to evaluate the shadow profile hypothesis when predicting user location. We use that data to evaluate the shadow profile hypothesis and analyze how the quality of location prediction depends on the tendency of users to disclose information and on the number of friends that a non-user has in Twitter. We continue by testing the shadow profile hypothesis for simplified features of user biographical texts. This analysis not only has the potential to robustly test the shadow profile hypothesis in a current social network, but also explores possible inequalities in the accuracy of shadow profiling and the collective aspects of privacy decisions in our current digital society.

We started our data collection by producing a set of ego users whose information constitutes the ground truth to evaluate predictions. To generate an initial unbiased random sample of users, we applied the Random Digit Search method [24, 25]: We generated random Twitter user ids in the range between 1 and 30 billion, looked them up through the Twitter REST API, and saved the basic user information of the valid sampled users. To avoid celebrities and spammers, we filtered out users with a ratio of followers to friends below 0.1 or above 10, as well as users with fewer than 50 friends or followers. To have a homogeneous sample for biographical data analysis, we included only users that have English as the language of the Twitter account. This process generated a set of 1,017 ego users, which are the starting point of a larger dataset including their social contacts and their activity in Twitter.
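The sampling and filtering steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the record keys `followers_count`, `friends_count`, and `lang` are assumed to mirror the fields of a user record returned by the API, and the lookup of each candidate id through the API is left out.

```python
import random

def sample_candidate_ids(n, max_id=30_000_000_000, seed=None):
    """Draw n random candidate user ids in the range [1, max_id]."""
    rng = random.Random(seed)
    return [rng.randint(1, max_id) for _ in range(n)]

def keep_as_ego(user, min_count=50, min_ratio=0.1, max_ratio=10.0, lang="en"):
    """Apply the filters from the text to one user record (a dict).

    The record keys are assumptions made for this sketch.
    """
    followers = user["followers_count"]
    friends = user["friends_count"]
    # Drop users with fewer than 50 friends or fewer than 50 followers.
    if followers < min_count or friends < min_count:
        return False
    # Drop likely celebrities and spammers via the followers/friends ratio.
    ratio = followers / friends
    if ratio < min_ratio or ratio > max_ratio:
        return False
    # Keep only accounts whose language is English.
    return user["lang"] == lang
```

Most random ids will not correspond to an existing account, so in practice many more candidates have to be drawn than the number of ego users that survive the filters.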

We collect the timeline of tweets of each ego user, up to 3,200 tweets. Based on those timelines, we identify alter users as the ones that have been mentioned at least four times by an ego user, thereby following a set of friendship links that capture communication and not just followership or retweeting [26]. We use these links as an approximation to the underlying social network between Twitter users that is revealed when users share their contact lists through mobile phone apps or through importing tools. This way we generate a set of 68,447 alter users, also collecting their timelines of tweets and biographical information. As a result, our dataset contains a total of 157,408,012 tweets from both ego and alter users.
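The alter selection rule can be illustrated with a short sketch. It assumes each tweet is a dict with a hypothetical key `mentioned_user_ids` listing the ids mentioned in that tweet, and it counts every mention occurrence; whether repeated mentions inside one tweet count separately is an assumption of this sketch, not something stated in the text.

```python
from collections import Counter

def find_alters(ego_id, timeline, min_mentions=4):
    """Return the ids mentioned at least min_mentions times in an ego timeline."""
    counts = Counter()
    for tweet in timeline:
        # 'mentioned_user_ids' is a hypothetical field name for this sketch.
        for uid in tweet.get("mentioned_user_ids", []):
            if uid != ego_id:  # ignore self-mentions
                counts[uid] += 1
    return {uid for uid, c in counts.items() if c >= min_mentions}
```

For example, a user mentioned in four different tweets of the ego timeline is kept as an alter, while a user mentioned three times is not.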

User locations in the Twitter social network. The top panel shows a map with the locations of the users in the dataset. The lower panel shows the ego network of locations including only preceding alters, with nodes colored according to the country as a way to illustrate locations. A clear country assortativity pattern can be observed (with a nominal assortativity coefficient for countries of 0.53), as most preceding alters of egos are in the same country.
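For readers unfamiliar with the nominal assortativity coefficient mentioned in the caption, the sketch below computes the standard version of it, r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i), from a list of edges labeled by country. It is an illustration under assumptions: the function name and input format are hypothetical, and the text does not state how the reported value of 0.53 was computed.

```python
from collections import Counter

def nominal_assortativity(edges):
    """Nominal assortativity for edges given as (source_label, target_label).

    For an undirected network, list every edge in both directions.
    """
    m = len(edges)
    source_counts = Counter(s for s, _ in edges)
    target_counts = Counter(t for _, t in edges)
    # Fraction of edges that connect two nodes with the same label.
    same = sum(1 for s, t in edges if s == t) / m
    # Fraction expected if edge ends were paired at random.
    expected = sum(source_counts[k] * target_counts[k] for k in source_counts) / (m * m)
    if expected == 1.0:
        return float("nan")  # undefined when all nodes share one label
    return (same - expected) / (1.0 - expected)
```

A value of 1 means every edge stays within a country, 0 matches random mixing, and negative values indicate that edges tend to cross countries.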

We analyze the dependence of the quality of shadow profiles for location as a function of the disclosure of contact lists, sampling alters as disclosing users for increasing values of the disclosure parameter. The left panel of Figure 4 shows the median prediction error averaged over 1000 user samples for each value of the disclosure parameter, with an inset showing the equivalent for the Null Model. It is clear that the same observation as above holds here: the error of the shadow profile prediction is much lower than the error of the Null Model, even for low values of the disclosure parameter. Median errors decrease monotonically with the disclosure parameter, which is confirmed by the Spearman correlation coefficient (\(\sigma=-0.06\), \(95\%\ \mbox{CI}=[-0.086, -0.037]\)). This supports the hypothesis that, as more users share information in the social network, the predictor accuracy increases.
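The resampling procedure behind this analysis can be sketched as follows. The sketch rests on several assumptions that are not taken from the text: each ego is a dict with a `location` as a (latitude, longitude) pair and a list `alter_locations` that already contains only preceding alters, the prediction error is measured as great-circle distance in kilometers, and the predictor is a simple stand-in (the coordinate-wise median of disclosed alter locations) rather than the predictor used in the study.

```python
import math
import random
from statistics import mean, median

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def predict_location(points):
    """Stand-in predictor (an assumption): coordinate-wise median."""
    return (median(p[0] for p in points), median(p[1] for p in points))

def mean_median_error(egos, disclosure, n_samples=1000, seed=0):
    """Median error over egos, averaged over random samples of disclosing alters."""
    rng = random.Random(seed)
    sample_medians = []
    for _ in range(n_samples):
        errors = []
        for ego in egos:
            # Each alter discloses independently with probability `disclosure`.
            disclosed = [loc for loc in ego["alter_locations"]
                         if rng.random() < disclosure]
            if disclosed:  # egos without disclosing alters get no prediction
                errors.append(haversine_km(predict_location(disclosed),
                                           ego["location"]))
        if errors:
            sample_medians.append(median(errors))
    return mean(sample_medians) if sample_medians else float("nan")
```

Calling `mean_median_error` over a grid of disclosure values yields a curve of the kind described above, which can then be compared against the same procedure run with a null predictor.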

Our work shows that the data shared by Twitter users is predictive of personal information of individuals that are not users. We produced a dataset of the ego network of more than 1000 users, retrieving their timelines and the timelines of their alters for a total of more than 150 million tweets. By detecting users that use a mobile phone app, we could identify which users share their contact lists, and thus we provide the first empirical test of the shadow profile hypothesis on a dataset of a current social network. We found that the data shared by those users is informative in the prediction of location and approximates the biographical text of individuals that had not joined Twitter. This served as a historical audit to evaluate the shadow profile hypothesis, as Twitter had enough data to infer personal attributes of people that did not have an account at that time. Studying various disclosure tendencies in random samples of users, we found that the quality of those inferences improves with the tendency of Twitter users to disclose information. Furthermore, we analyzed the heterogeneity in the quality of these inferences and found that users with more friends with a Twitter account are subject to more accurate shadow profiles.

Our work suffers from a series of limitations that need to be taken into account when generalizing. First, we performed a historical audit using future data as ground truth. While this can test the shadow profile hypothesis, we can only fully understand the risks it conveys when producing predictions about people that have never been users of an online service. This could be done by combining user contact information, which is often proprietary, with factual data from non-users, which needs to be voluntarily provided by non-users for research purposes. Second, we have relied on a heuristic to infer friendships based on the intensity of interaction in Twitter. While social network arguments support this assumption [26, 33], future research should aim at accessing friendship lists or name generators that do not depend on online interaction. Third, we used a model for user biographical texts that does not allow a straightforward interpretation of what biographical qualities are being predicted. While this allows us to address the shadow profile hypothesis, larger user samples can quantify individual demographic markers in user biographical texts [21]. Finally, our analysis is based on a sample of users that might not be demographically representative. This means that, while we cannot conclude that everyone can have a shadow profile, our results show that someone can have one. The evidence of this possibility is already a challenge to the current guarantees of the right to privacy, but generalizing these results to larger populations has the potential to reveal larger issues and risks for whole societies.
