First Shared Task on Aggression Identification

[Licensed under Creative Commons Non-Commercial Share-Alike 4.0 licence CC-BY-NC-SA 4.0]

If you are using the dataset for your research, kindly cite it as follows:


author = {Ritesh Kumar and Aishwarya N. Reganti and Akshit Bhatia and Tushar Maheshwari},

title = "{Aggression-annotated Corpus of Hindi-English Code-mixed Data}",

booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},

year = {2018},

month = {May 7-12, 2018},

address = {Miyazaki, Japan},

editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},

publisher = {European Language Resources Association (ELRA)},

isbn = {979-10-95546-00-9},

language = {english}


If you are using the shared task report, kindly cite it as follows:


title = "Benchmarking Aggression Identification in Social Media",

author = "Kumar, Ritesh and

Ojha, Atul Kr. and

Malmasi, Shervin and

Zampieri, Marcos",

booktitle = "Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying ({TRAC}-2018)",

month = aug,

year = "2018",

address = "Santa Fe, New Mexico, USA",

publisher = "Association for Computational Linguistics",

url = "",

pages = "1--11"


The workshop includes a shared task on ‘Aggression Identification’. The task is to develop a classifier that makes a three-way classification of text data as ‘Overtly Aggressive’, ‘Covertly Aggressive’ or ‘Non-aggressive’.

We are making available a dataset of 15,000 aggression-annotated Facebook posts and comments each in Hindi (in both Roman and Devanagari script) and English for training and validation. We will release additional data for testing your system. Please register here to download the data and participate in the task.

General Instructions for Participants

  • Each team is allowed to submit up to three systems for evaluation.
  • The test data will be sent to participants on the 25th of April, 2018 (extended from the 21st of April, 2018), and they will have a window of 72 hours (i.e. until the 28th of April, 2018, extended from the 24th) to test their systems and send us the labels for the test instances. We will send participants further instructions on submitting systems and labels for the test data in due course.
  • We expect each team to submit a system description paper after the evaluation. The deadline, length of submission and other instructions for the system description papers will be the same as those for the workshop papers. All the system papers will be published in the proceedings, and the best systems will be given slots for demos and presentations at the workshop.
  • Participants can use additional data for training the system, provided that the dataset you use is either already publicly available or made available immediately after submission (and well before the submission of your system paper), and that you mention it in your submission. Use of non-public additional data for training will disqualify your system.

Evaluation Metric

Each submitted system will be evaluated on the basis of the weighted macro-averaged F-score. The F-score of each class is weighted by the proportion of that class in the test set, and the final F-score is the weighted average of these per-class F-scores.
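The metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the organisers' official scorer: it assumes the "average" is the support-weighted sum of per-class F1 scores (the usual weighted-average F1), and the function name `weighted_f1` and the short label codes in the usage note are hypothetical choices made for this example.

```python
from collections import Counter


def weighted_f1(gold, predicted):
    """Sum of per-class F1 scores, each weighted by the class's share of the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no instances to score")

    support = Counter(gold)  # number of test instances per class
    total = len(gold)
    score = 0.0
    for label, n_gold in support.items():
        # True positives: instances of this class that were predicted as this class.
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        n_pred = sum(1 for p in predicted if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight the class F1 by the proportion of the class in the test set.
        score += (n_gold / total) * f1
    return score
```

For instance, with the illustrative label codes `"OAG"`, `"CAG"` and `"NAG"` standing for the three classes, `weighted_f1(["OAG", "CAG", "NAG", "NAG"], ["OAG", "NAG", "NAG", "NAG"])` gives the overtly aggressive class a weight of 0.25, the covertly aggressive class 0.25 and the non-aggressive class 0.5. Under the same assumption, scikit-learn's `f1_score(gold, predicted, average="weighted")` should compute the same quantity.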


Training set release: March 13, 2018 [Extended Date]

Test set release: April 25, 2018 [Extended Date; originally April 21, 2018]

Submissions due: April 30, 2018 [Extended Date; originally April 24, 2018]

Results announcement: May 5, 2018 [Extended Date; originally April 28, 2018]

System papers deadline: May 28, 2018 [Extended Date; originally May 25, 2018]

Reviews for papers: June 25, 2018 [originally June 20, 2018]

Camera-ready versions: July 5, 2018 [originally June 30, 2018]

[Timezone: Anywhere on Earth (UTC-12), i.e. a deadline is met as long as it is still the stated date somewhere on Earth.]
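As a rough illustration of the timezone rule, the sketch below converts a deadline date into the corresponding UTC cut-off. It is only an example under the assumption that a deadline expires at the end of the stated day in UTC-12; the helper names `aoe_deadline_utc` and `is_on_time` are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

# "Anywhere on Earth" (AoE) corresponds to the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_utc(deadline: date) -> datetime:
    """Return the first UTC instant at which `deadline` has ended everywhere on Earth."""
    # Midnight at the start of the following day, expressed in AoE ...
    end_of_day_aoe = datetime.combine(deadline + timedelta(days=1), time.min, tzinfo=AOE)
    # ... converted to UTC.
    return end_of_day_aoe.astimezone(timezone.utc)


def is_on_time(submitted_at: datetime, deadline: date) -> bool:
    """True if the timezone-aware `submitted_at` falls before the AoE cut-off."""
    return submitted_at < aoe_deadline_utc(deadline)
```

Under this reading, a deadline of April 30, 2018 runs until 12:00 UTC on May 1, 2018.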

Data will be made publicly available after the end of the competition under the Creative Commons Non-Commercial Share-Alike 4.0 licence (CC-BY-NC-SA 4.0). Please click here to get the dataset used in the task.