ALW 2019 - StackOverflow Dataset

The Stack Overflow Dataset

To acquire the dataset, complete the following steps:

Go to the StackOverflow Academic Partnership Programme page, read the information there, and open the application form.
To speed up the application, you can use our boilerplate text (below) to answer the question "What research is being proposed? What are the specific requirements of the project? What datasets are you interested in, if any?"

“We are requesting comment data from StackExchange that has been removed from the site as a result of moderation, without user information. The data will be used to research computational methods for detecting abusive or harmful content, with the aim of submitting our work to the 3rd Workshop on Abusive Language Online. To investigate automated methods for abuse detection, we require a dataset that contains the labels available in the StackOverflow flagging structure, though part of it may be unlabelled so that we can experiment with empirical evaluation of the trained methods.”

Download and sign the Non-Disclosure Agreement (NDA).
- In the NDA, fill in the date, your name, and the legal entity (Individual).
E-mail jsilge@stackoverflow.com and z.w.butt@sheffield.ac.uk. The e-mail must contain:
- A Dropbox account
- The signed NDA as a Word document (naming scheme: firstname_lastname_affiliation)
- The subject line "Data Access: StackOverflow NDA Submission"
