Religion in Social Media Workshop Home
These datasets are to be used in the ICWSM 2015 workshop Religion on Social Media. The goal is that these datasets will help to seed a discussion and potentially serve as a source for the extended abstracts.
There are currently two datasets here, both collected from Twitter. Each is available as both .txt and .xlsx files. See below for details.
Users who self-reported their religion as Atheism, Buddhism, Christianity, Hinduism, Islam, or Judaism were collected by searching Twitter user bios via Followerwonk with a list of religion-specific keywords. The dataset also includes "Undeclared" users, who do not report any of the above religions or beliefs. It has been filtered to keep only users whose location string maps to one of the U.S. states, whose language is specified as "en", whose self-description bio is not empty, and whose tweet count is greater than 10.
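The four filtering criteria can be illustrated with a minimal sketch. This is not the code used to build the dataset: the function name `keep_user` and its parameters are hypothetical, and it assumes the location string has already been mapped to a U.S. state (or to `None` when no state matched).

```python
from typing import Optional


def keep_user(state: Optional[str], lang: str, bio: str, tweet_count: int) -> bool:
    """Return True if a user passes all four filters described above.

    All parameter names are hypothetical; `state` is assumed to be the
    result of an earlier location-to-state mapping step (None if unmapped).
    """
    has_state = state is not None          # location maps to a U.S. state
    is_english = lang == "en"              # language is specified as "en"
    has_bio = bio.strip() != ""            # self-description bio is not empty
    active_enough = tweet_count > 10       # strictly more than 10 tweets
    return has_state and is_english and has_bio and active_enough
```

Note that the tweet-count condition is strict, so a user with exactly 10 tweets is excluded.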
The dataset shared here contains the user IDs, the U.S. states that the users are mapped to, and the religions self-identified by the users. The table below shows the data statistics.
Download: religious-Twitter-user.txt (5.4M), religious-Twitter-users.xlsx (4.5M)
Using the Twitter REST API, the tweets created by the users described above were collected. A sample of 7K tweets posted in 2013 was selected for each religion, including Atheism and the Undeclared user group. For large religions (>= 7K users), 7K users were randomly sampled and one random tweet was selected for each user. For the smaller religions (< 7K users), "round-robin" sampling was used, which allows some users to appear multiple times in the sample of 7K tweets.
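The two sampling cases can be sketched as follows. This is only one plausible reading of the description, not the original sampling code: the function `sample_tweets`, its arguments, and the exact round-robin order are assumptions, and every user is assumed to have at least one 2013 tweet.

```python
import random
from typing import Dict, List, Tuple


def sample_tweets(
    tweets_by_user: Dict[str, List[str]], n: int, rng: random.Random
) -> List[Tuple[str, str]]:
    """Sample up to n (user, tweet) pairs for one religion group.

    Hypothetical helper: tweets_by_user maps a user ID to that user's
    2013 tweets; each list is assumed to be non-empty.
    """
    users = list(tweets_by_user)

    # Large group: pick n distinct users, then one random tweet per user.
    if len(users) >= n:
        chosen = rng.sample(users, n)
        return [(u, rng.choice(tweets_by_user[u])) for u in chosen]

    # Small group: cycle over the users in a fixed random order, taking one
    # unused tweet per visit, so some users appear more than once.
    remaining = {u: rng.sample(ts, len(ts)) for u, ts in tweets_by_user.items()}
    order = users[:]
    rng.shuffle(order)
    sample: List[Tuple[str, str]] = []
    while len(sample) < n and any(remaining.values()):
        for u in order:
            if len(sample) == n:
                break
            if remaining[u]:
                sample.append((u, remaining[u].pop()))
    return sample
```

With n = 7000, a group of fewer than 7K users goes through the round-robin branch; if the group has fewer than n tweets in total, this sketch simply returns all of them.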
The dataset shared here contains 49K tweets (7K for each of the seven groups) and their metadata.
Download: tweet-sample-2013.txt (8.3M), tweet-sample-2013.xlsx (6.4M)
If you use these datasets in your research, please cite the following paper:
Lu Chen, Ingmar Weber and Adam Okulicz-Kozaryn. U.S. Religious Landscape on Twitter. In Proceedings of the Sixth International Conference on Social Informatics (SocInfo'14), 2014, pages 544-560.