SIMAH (SocIaL Media And Harassment)

Categorizing Different Types of Online Harassment Language on Social Media

SIMAH (Social Media And Harassment), the first competition on categorizing different types of online harassment language on social media, will be co-located with ECML PKDD 2019 in Würzburg, Germany.


Social media platforms serve society and its members in different ways. Among them, Twitter has been widely criticized for abuse and harassment directed at its users, especially women. Online harassment has become very common on Twitter, and experts and critics have voiced concern that Twitter has become a platform where many trolls, racists, misogynists and hate groups can express themselves openly. In addition, the platform is poorly built to handle the problem of online harassment; the CEO of Twitter has declared that "We suck at dealing with abuse and trolls on the platform and we have sucked at it for years".

Harassment has affected people across many times and places, from workplaces, schools, military installations and social gatherings to online social platforms. Online, it usually refers to a wide range of abusive behaviour, including flaming (such as name calling or insults), doxing (such as revealing a woman's personal information, for example her home address or phone number), impersonation, and public shaming aimed at destroying a person's reputation. When the targets are women, further types are added to these categories: tweets claiming that women cannot hold certain positions or jobs outside the home, or cannot be confident and determined; uninvited and unwelcome verbal or physical behaviour toward a person who has less power; and more offensive tweets that treat a woman as a sex object or comment on her appearance, her body, or even her sex life. Although these tweets and the related problems have existed for many years, a few of the victims are now gradually speaking out.

Online harassment usually takes a verbal or graphical form and is considered harassment because it is neither invited nor consented to by the recipient. When harassment happens on online platforms, it usually increases the severity and complexity of the experience for the victim and makes it harder for them to respond. Monitoring content that includes sexism and sexual harassment is easier in traditional media such as radio and television than on online social platforms such as Twitter. The main reason is the large amount of user-generated content on these platforms. Some institutions use web crawling to collect text data from specific platforms, after which human monitors go through all the collected text to identify hateful content. Previous studies have focused on collecting data about sexism and racism in very broad terms, or have proposed two categories of sexism, benevolent and hostile, which overlooks other types of online harassment. However, no prior study has focused specifically on the different types of online harassment using natural language processing techniques.

Automatically detecting content containing sexual harassment could be the basis for removing it or flagging it for human evaluation. Distinguishing between different types of harassment provides the means to control such a mechanism in a fine-grained way, and a viable tool for future research. This competition and workshop is proposed to develop models which apply machine learning and deep learning techniques to automatically classify tweets into different types of online harassment. As its basic goal, this automatic classification will significantly improve the process of detecting these types of speech on social media by reducing the time and effort required from human reviewers.
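To make the classification task concrete, the following is a minimal sketch of a multi-class tweet classifier. It is only an illustration under stated assumptions: it uses a multinomial naive Bayes baseline written in plain Python, which is not a method prescribed by the competition, and the label names (`stereotyping`, `doxing`, `none`) and the toy tweets are hypothetical placeholders rather than the competition's categories or data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)      # tweets per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(tweets, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; real training data would be labelled tweets.
train_tweets = [
    "women cannot do this job",
    "women should stay at home",
    "here is her home address and phone number",
    "posting her phone number for everyone",
    "lovely weather in the city today",
    "great match tonight with friends",
]
train_labels = [
    "stereotyping", "stereotyping",
    "doxing", "doxing",
    "none", "none",
]

clf = NaiveBayesTweetClassifier().fit(train_tweets, train_labels)
print(clf.predict("her phone number and address"))
```

A baseline like this mainly serves as a reference point: a fine-grained classifier would be evaluated per harassment type, so that a model which only separates harassment from non-harassment does not appear to perform well.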