The 5th International Workshop on
Natural Language Processing for Social Media
In conjunction with EACL 2017 @ April 3, 2017, Valencia, Spain.
In conjunction with IEEE BigData 2017 @ December 11, 2017, Boston, MA, USA.


Workshop Description

With the rising popularity of social media services, including general-purpose microblogs such as Facebook, Twitter, and Plurk; goal-oriented services such as LinkedIn (for professional networking), social bookmarking services, and Foursquare (a check-in service for mobile devices); and Web 2.0-based large-scale knowledge bases such as Wikipedia and common-sense corpora, researchers can now access heterogeneous information about a target person or object that includes not only text content but also metadata and even the social relationships among persons.

Furthermore, content on social media and Web 2.0 platforms differs from content elsewhere in style, tone, purpose, etc. For instance, posts on Twitter are limited in size and thus often contain jargon, emoticons, or abbreviations that do not follow formal grammar. Existing natural language processing techniques are not tailored to such content and often cannot be applied to it directly. For instance, standard summarization techniques might not be suitable for Plurk posts, which are relatively short and contain responses from multiple friends, and sentiment dictionaries learned from news corpora might not be suitable for sentiment detection on microblogs.

It is generally believed that social media has become one of the major means of communication and content production, and this trend is unlikely to fade. Being able to process content from social media platforms therefore brings considerable value to real-world applications. Furthermore, because of the change in content style and the availability of heterogeneous resources (e.g., social relationships among people), novel NLP techniques that are designed specifically for such platforms and can integrate or learn from different sources are in high demand. Below we highlight some important (non-exclusive) themes in this direction.

The motivation for organizing the SocialNLP workshop at EACL 2017 is four-fold. First, social media analytics is a research topic closely related to natural language processing. Given the challenges mentioned above, however, we turn to the machine learning (ML) community and attempt to identify the role of ML and NLP techniques in SocialNLP. In recent NLP-related conferences, judging from either the number of submissions or the number of participants, sentiment analysis and social media analytics are clearly two of the main research topics.

Second, we have a strong program committee (around 100 researchers) this year, 88% of whose members have served as reviewers for the ACL series of conferences, the top venues for NLP-related research, and they can be very helpful in promoting our workshop. Therefore, we believe that the SocialNLP workshop can draw much interest and a large audience from potential academic and industrial participants in NLP. We think such high visibility can bring more participants and submissions to EACL.

Third, social media data is generated and collected from online social services, which have accumulated a large amount of user-generated social data, i.e., big social data. Processing such big social data with linguistic knowledge and NLP techniques raises many important research problems. Through SocialNLP, cutting-edge technology will be introduced to ML researchers, who may find inspiration and useful information in it. Moreover, as SocialNLP aims to make data available to the research community and will provide a platform for researchers to share datasets, ML and NLP researchers can become familiar with each other's data and access it easily.

Fourth, user-generated content in social media is mainly in the form of text. Theories and techniques from artificial intelligence and natural language processing are needed for semantic understanding, accurate search, and efficient processing of social media content. From the application perspective, novel online applications involving social media analytics and sentiment analysis, such as emergency management, social recommendation, user behavior analysis, user social community analysis, and future prediction, are topics that NLP and ML researchers have paid attention to.

In short, hosting the SocialNLP workshop at EACL will provide mutually reinforcing benefits for researchers in ML, natural language processing, and social media analytics. We believe that collecting the thoughts and comments of these researchers will also bring up many great ideas and opportunities for future research collaborations.

Topics of Interest

Topics of interest for the workshop include, but are not limited to:

Content analysis on Social Media

  • Concept-level sentiment analysis
  • Summarization of posts/replies on social media
  • Named entity recognition on social media
  • Relationship extraction on social media
  • Entity resolution for social media
  • Search, Indexing, and Evaluation on Social Web
  • Improving Speech Recognition using Social Media Content
  • Multilingual and Language specific Information Retrieval on Social Web

Natural language processing on Web 2.0

  • Folksonomy and Social Tagging
  • Trend analysis on Wikipedia
  • Trustworthiness analysis on Wikipedia
  • Human computing for social-media corpus generation
  • Social structure and position analysis using Microblog content
  • Trust and Privacy analysis in social contexts
  • Community detection using blogs or Microblog content

Sentiment and Opinion Analysis on Social Media

  • Big social data analysis
  • Lexical semantic resources, corpora and annotations of social media for sentiment analysis
  • Opinion retrieval, extraction, classification, tracking and summarization
  • Domain-specific sentiment analysis and model adaptation
  • Emotion detection
  • Sentiment analysis for automatic public opinion poll and surveys of user satisfaction
  • Improvement of NLP tasks using subjectivity and/or sentiment analysis on social platform
  • Sentiment analysis and human computer interface on social platform
  • Real-world sentiment applications and systems on social platform

Disaster Management Using Social Media

  • Modeling global events or human activities based on social media texts
  • Identification and geo-location of social media content
  • Social-based web platform for disaster management
  • Disaster or disease prediction and forecasting
  • Resource allocation using social media
  • Monitoring emergency responses among social crowds
  • Analyzing the diffusion of emergent information
  • Exploiting social media for crisis response and search and rescue activities

Models and Tools Development for SocialNLP

  • Biologically-inspired opinion mining
  • Social-network motivated methods or tools for natural language processing
  • Advanced topic model for social media
  • Learning to rank for social media
  • Clustering and Classification tools for Social Media
  • Content-based and social-based Recommendation
  • Multi-lingual machine translation on Microblog

Paper Submission

SocialNLP @ EACL 2017

  • Page Limit: Regular Paper: 8 pages; DATA Paper: 5 pages (both allow 2 additional pages for references)
  • Paper Template: EACL-2017 LaTeX template

SocialNLP @ IEEE BigData 2017

  • Page Limit: Regular Paper: 8 pages; DATA Paper: 5 pages (both allow 2 additional pages for references)
  • Paper Template: IEEE Templates for Conference Proceedings

Submission Site

SocialNLP review is double-blind. Therefore, please anonymize your submission: do not put the author(s) names or affiliation(s) at the start of the paper, and do not include funding or other acknowledgments in papers submitted for review. In addition to regular papers, we call for DATA PAPERS this year. A data paper should include the details of the created dataset and an experiment illustrating how to use it. Authors should mark the submission as a data paper using the author field and submit at least partial data as accompanying materials. The created dataset should be freely available, either by download or through an application process. If the data paper is accepted, we will list the link for accessing the dataset on the SocialNLP website. Note that the review for data papers is also double-blind and it is the authors’ responsibility to avoid revealing their identities.

Papers submitted to this workshop must not have been accepted for publication elsewhere or be under review for another workshop, conference or journal. Papers should be written in English. Each submission will be evaluated by at least 3 program committee members. For SocialNLP@EACL-2017, the workshop proceedings will be published in the ACL Anthology. For SocialNLP@IEEE-BigData-2017, all accepted papers will be published in the Workshop Proceedings by the IEEE Computer Society Press.

To encourage high-quality submissions, we will present a SocialNLP 2017 best paper award at both venues. The selection will depend not only on the review comments and ratings but also on the quality of the paper as rated by paper authors. Selected, expanded versions of papers presented at the workshop will be published in two follow-on Special Issues of Springer Journal of Information Science and Engineering (JISE) and the International Journal of Computational Linguistics and Chinese Language Processing (IJCLCLP).


Please note that at least one registration per paper published is required. At the time of submission of the final camera-ready copy, authors will have to indicate the already registered person for that publication.

Program @ EACL

Date: April 3rd
Location: TBA

SocialNLP @ EACL 2017
09:30-09:40 Opening and Welcome
09:40-10:50 [Keynote Speech 1] NLP, the perfect social (media) science?
Dirk Hovy, Associate Professor, The University of Copenhagen
Language is the ultimate social medium: We don't just communicate to convey information, but also to entertain, to gossip, to console, and much more. Social media is one of the purest expressions of all of these aspects of language, and often includes additional information about the place, time, and author of a message. This combination has allowed NLP to work on real, situated, individual language, rather than on abstract general corpora, and led it into areas that were previously the sole domain of social sciences. These areas open up a wide range of exciting new applications, but also present a host of new challenges - technically, linguistically, and morally. In this talk, I will illustrate both opportunities and challenges with some of my ongoing research, and end with a number of open questions that I believe will guide NLP for the years to come.
Dirk Hovy is an associate professor in natural language processing at the University of Copenhagen. His research focuses on the interaction of statistical models, language, and demographic factors. He received his PhD in Computer Science from the University of Southern California, and holds an MA in sociolinguistics from the University of Marburg, Germany. Dirk has authored papers on a variety of NLP topics, including semantic and syntactic analysis, domain adaptation, and information extraction. All of these involved annotation at some point, and the associated problems have led to the development of MACE. Outside of research, Dirk enjoys cooking, tango, and leather-crafting, as well as picking up heavy things and putting them back down. You can find an updated biography and more at
11:00-11:30 Coffee Break
Morning Session
A Survey on Hate Speech Detection using Natural Language Processing
Anna Schmidt and Michael Wiegand
Facebook sentiment: Reactions and Emojis
Ye Tian, Thiago Galery, Giulio Dulcinati, Emilia Molimpakis and Chao Sun
Potential and Limitations of Cross-Domain Sentiment Classification
Jan Milan Deriu, Martin Weilenmann, Dirk Von Gruenigen and Mark Cieliebak
13:00-14:30 Lunch
14:30-15:40 [Keynote Speech 2] Mixed-language data in social media
Thamar Solorio, Associate Professor, University of Houston

The spontaneous nature of social media interactions gives users the flexibility to adopt an informal writing style. One of the most common consequences of this casual style is the relaxed grammar and frequent use of non-standard spellings that has been addressed to some extent in the development of specialized tools for social media data. But a big challenge that remains underexplored is the mix of languages in these channels. With few notable exceptions, most of the multilingual technology assumes a scenario where more than one language can be handled but with the restriction of having only one language per input text. As a consequence, multilingual speakers that often use more than one language at a time are very likely underrepresented by models consuming social media.

In this talk I will present the work my group has been doing in the context of processing mixed language data from social media. I will introduce the challenges we face when the informal style in social media data combines with the use of more than one language, and discuss our contributions to address them.

Thamar Solorio is Associate Professor in the Department of Computer Science at the University of Houston (UH). She is the founder and director of the Research in Text Understanding and Analysis of Language (RiTUAL) group at UH. Her main research interests are in stylistic modeling of text, syntactic analysis of mixed language data, and information extraction from health support groups. She has M.S. and PhD degrees in Computer Science from INAOE, Puebla, Mexico. The Department of Defense and the National Science Foundation currently fund her research program. She is the recipient of a CAREER award for her work in authorship analysis, and the 2014 Denice Denton Emerging Leaders ABIE Award.
15:50-16:30 Poster / Coffee Break
Afternoon Session
Aligning Entity Names with Online Aliases on Twitter
Nathanael Chambers, Kevin McKelvey, Peter Goutzounis and Stephen da Cruz
Character-based Neural Embeddings for Tweet Clustering
Svitlana Vakulenko, Lyndon Nixon and Mihai Lupu
A Twitter Corpus and Benchmark Resources for German Sentiment Analysis
Mark Cieliebak, Jan Milan Deriu, Dominic Egger and Fatih Uzdilli

BSMDMA-SocialNLP Workshop Program @ IEEE BigData

Date: December 11
Location: Opening, Keynote, Morning Session I and II: Helicon Room - 7th Floor
Location: Afternoon Session: St. George C - 3rd Floor

BSMDMA-SocialNLP Workshop @ IEEE BigData 2017
7F: Helicon Room
Opening and Welcome
7F: Helicon Room
[Keynote Speech] Computational Methods for Team Formation
Evimaria Terzi, Associate Professor, Boston University
The performance of a team depends not only on the abilities of its individual members, but also on how these members interact with each other. Inspired by this premise and motivated by a large number of applications in educational, industrial and management settings, team-formation problems aim to engineer teams that are effective and successful. In the talk, we will discuss computational approaches to team-formation problems and highlight the connection of these approaches to models of social theory that capture team dynamics.
Evimaria Terzi is an Associate Professor in the Computer Science Department at Boston University. Before joining BU in 2009, she was a research scientist at IBM Almaden Research Center. Evimaria received her Ph.D. from the University of Helsinki, Finland, and her MSc from Purdue University. Evimaria is a recipient of the Microsoft Faculty Fellowship (2010) and the NSF CAREER award (2012). Her research interests span a wide range of data-mining topics including algorithmic problems arising in online social networks, social media and recommender systems.
Morning Session I
7F: Helicon Room
Identifying emergency stages in Facebook posts of police departments with convolutional and recurrent neural networks and support vector machines
Nicolai Pogrebnyakov and Edgar Maldonado.
Characterization of daily tourism behaviors based on place sequence analysis from photo sharing websites
Thomas-Joseph Loiseau, Sonia Djebali, Thomas Raimbault, Bérengère Branchet and Gael Chareyron.
Ticket-Purchase behavior under the Effects of Marketing Campaigns on Facebook Fan Pages
Hsiao-Wei Hu, Ching-Han Cheng, Yun-Chu Chung and Chia-Yu Lee.
Outbound Behavior Analysis Through Social Network Data: a case study of Chinese people in Japan
Tianqi Xia, Xuan Song, Dou Huang, Satoshi Miyazawa, Zipei Fan, Renhe Jiang and Ryosuke Shibasaki.
PSEISMIC: A Personalized Self-Exciting Point Process Model for Predicting Tweet Popularity
Hsin-Yu Chen and Cheng-Te Li.
10:45-10:05 Coffee Break
Morning Session II
7F: Helicon Room
Detection of Profile Injection Attacks in Social Recommender Systems Using Outlier Analysis
Anahita Davoudi and Mainak Chatterjee.
Evaluating the Quality of Graph Embeddings via Topological Feature Reconstruction
Stephen Bonner, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Stephen McGough and Boguslaw Obara.
Topic Life Cycle Extraction from Big Twitter Data based on Community Detection in Bipartite Networks
Takako Hashimoto, Hiroshi Okamoto, Tetsuji Kuboyama and Kilho Shin.
Using Sentiment Analysis to Explore the Degree of Risk in Sharing Economy
Wei-Lun Chang.
Big Social Data Analytics for Public Health: Comparative Methods Study and Performance Indicators of Health Care Content on Facebook.
Nadiya Straton, Raghava Rao Mukkamala and Ravi Vatrapu.
A Big Social Media Data Study of the 2017 German Federal Election based on Social Set Analysis of Political Party Facebook Pages with SoSeVi.
Benjamin Flesch, Ravi Vatrapu and Raghava Rao Mukkamala.
Digital Content Recommendation System Using Implicit Feedback Data.
Saayan Mitra, Viswanathan Swaminathan, Ratnesh Kumar and Gang Wu.
12:30-14:00 Lunch Break
16:00-16:20 Coffee Break
Afternoon Session
3F: St. George C
Detecting Polarization in Ratings: An Automated Pipeline and a Preliminary Quantification on Several Benchmark Data Sets
Mahsa Badami, Olfa Nasraoui, Wenlong Sun and Patrick Shafto.
Language Identification in Multilingual, Short and Noisy Texts using Common N-Grams
Dijana Kosmajac and Vlado Keselj.
Characterizing Online Community Practices with Orthographic Variation
Ian Stewart, Stevie Chancellor, Munmun De Choudhury and Jacob Eisenstein.
Using an Asset Price Bubble Model in Tweet Analytics
K.M. George.
An Entity Disambiguation Method Based on LeaderRank
Bingjing Jia, Bin Wu, Jinna Lv, Pengpeng Zhou, Yao Bu and Ying Xing.
Improving Arabic Sentiment Analysis with Sentiment-Specific Embeddings
A. Aziz Altowayan and Ashraf Elnagar.
Topic Modelling enriched LSTM Models for the Detection of Novel and Emerging Named Entities from Social Media
Patrick Jansson and Shuhua Liu.
Differences in Emoji Sentiment Perception between Readers and Writers.
Jose Berengueres and Dani Castro.

Important Dates

SocialNLP @ EACL 2017

  • Submission Deadline: January 23, 2017; extended to February 2, 2017 (23:59 Hawaii Standard Time)
  • Author Notification: February 11, 2017; extended to February 16, 2017
  • Camera-ready Submission: February 21, 2017
  • Workshop Date: April 3, 2017

SocialNLP @ IEEE BigData 2017

  • Submission Deadline: October 10, 2017; extended to October 25, 2017 (23:59 USA Eastern Standard Time)
  • Author Notification: November 1, 2017; extended to November 13, 2017
  • Camera-ready Submission: November 19, 2017
  • Workshop Date: December 11, 2017

Program Committee

  • Tim Althoff, Stanford University
  • Sabine Bergler, Concordia University
  • Berlin Chen, National Taiwan Normal University
  • Hsin-Hsi Chen, National Taiwan University
  • Hai Leong Chieu, DSO National Laboratories
  • Monojit Choudhury, Microsoft Research
  • Freddy Chua, Singapore Management University
  • Nigel Collier, University of Cambridge
  • Danilo Croce, University of Roma, Tor Vergata
  • Lei Cui, Microsoft Research
  • Ronan Cummins, University of Cambridge
  • Pradipto Das, Rakuten USA
  • Min-Yuh Day, Tamkang University, Taiwan
  • Ann Devitt, Trinity College Dublin
  • Eduard Dragut, Temple University
  • Koji Eguchi, Kobe University
  • Michael Elhadad, Ben-Gurion University
  • Wei Gao, Qatar Computing Research Institute
  • Spandana Gella, University of Edinburgh
  • Marco Guerini, Fondazione Bruno Kessler
  • Weiwei Guo, LinkedIn
  • William Hamilton, Stanford University
  • Graeme Hirst, University of Toronto
  • Wen-Lian Hsu, Academia Sinica
  • Diana Inkpen, University of Ottawa
  • David Jurgens, Stanford University
  • Pallika Kanani, Oracle Labs
  • Soo-Min Kim, Amazon
  • Roman Klinger, University of Stuttgart
  • June-Jei Kuo, National Chung Hsing University
  • Tsung-Ting Kuo, University of California, San Diego
  • Cheng-Te Li, National Cheng Kung University
  • Chuan-Jie Lin, National Taiwan Ocean University
  • Shou-De Lin, National Taiwan University
  • Yiqun Liu, Tsinghua University
  • Zhiyuan Liu, Tsinghua University
  • Bin Lu, Google Inc.
  • Zhunchen Luo, China Defense Science and Technology Information Center
  • Bruno Martins, University of Lisbon
  • Yelena Mejova, Qatar Computing Research Institute
  • Rada Mihalcea, University of Michigan
  • Manuel Montes-y-Gómez, INAOE, Mexico
  • Dong Nguyen, University of Twente
  • Haris Papageorgiou, ATHENA Research and Innovation Center
  • Souneil Park, Telefonica Research
  • Michael Paul, University of Colorado Boulder
  • Georgios Petasis, NCSR "Demokritos"
  • Stephen Pulman, Oxford University
  • Sravana Reddy, Wellesley College
  • Paolo Rosso, Universitat Politècnica de València
  • Derek Ruths, McGill University
  • Saurav Sahay, Intel Labs
  • Hassan Saif, The Open University
  • Scott Nowson, Accenture Centre for Innovation, Dublin, Ireland
  • Yohei Seki, University of Tsukuba
  • Mário J. Silva, Universidade de Lisboa
  • Yanchuan Sim, Institute for Infocomm Research
  • Jan Snajder, University of Zagreb
  • Jannik Strötgen, Max Planck Institute for Informatics
  • Xavier Tannier, Université Paris-Sud, LIMSI, CNRS
  • Mike Thelwall, University of Wolverhampton
  • Ming-Feng Tsai, National Chengchi University
  • Paola Velardi, Università di Roma
  • Marc Verhagen, Brandeis University
  • Svitlana Volkova, PNNL
  • Xiaojun Wan, Peking University
  • Hsin-Min Wang, Academia Sinica
  • Jenq-Haur Wang, National Taipei University of Technology
  • William Yang Wang, UC Santa Barbara
  • Ingmar Weber, Qatar Computing Research Institute, HBKU
  • Robert West, École Polytechnique Fédérale de Lausanne
  • Kam-Fai Wong, The Chinese University of Hong Kong
  • Shih-Hung Wu, Chaoyang University of Technology
  • Ruifeng Xu, Harbin Institute of Technology
  • Yi Yang, Georgia Tech
  • Liang-Chih Yu, Yuan Ze University
  • Zhe Zhang, IBM Watson
  • Hua-Ping Zhang, Beijing Institute of Technology
  • Ming Zhou, Microsoft Research Asia
  • Deyu Zhou, Southeast University


Anti-Harassment Policy

The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of the ACL. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. For these reasons, ACL is dedicated to providing a harassment-free experience for all the members, as well as participants at our events and in our programs.

Harassment and hostile behavior are unwelcome at any ACL conference, associated event, or in ACL-affiliated on-line discussions. This includes: speech or behavior that intimidates, creates discomfort, or interferes with a person’s participation or opportunity for participation in a conference or an event. We aim for ACL-related activities to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, appearance, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. The policy is not intended to inhibit challenging scientific debate, but rather to promote it through ensuring that all are welcome to participate in shared spirit of scientific inquiry.

It is the responsibility of the community as a whole to promote an inclusive and positive environment for our scholarly activities. In addition, anyone who experiences harassment or hostile behavior may contact any current member of the ACL Executive Committee or contact Priscilla Rasmussen, who is usually available at the registration desk during ACL conferences. Members of the executive committee will be instructed to keep any such contact in strict confidence, and those who approach the committee will be consulted before any actions are taken.


This policy should be posted prominently on all ACL conference and workshop webpages, with a notice of a list of people who can be contacted by community members with concerns. In case of a formal complaint, the contacted ACL representative(s) will first speak to all parties involved to try to resolve the issue without presupposition of guilt.


If you are considering submitting to the workshop and have questions regarding the workshop scope or need further information, please do not hesitate to send e-mail to both lwku [AT] and chengte [AT] Thanks!