With the rising popularity of social media services, including general-purpose microblogging and social networking services such as Facebook, Plurk and Twitter, goal-oriented services such as LinkedIn (for professional networking), Del.icio.us (a social bookmarking service) and Foursquare (a check-in service for mobile devices), and large-scale Web 2.0 knowledge bases such as Wikipedia and common-sense corpora, researchers can now access heterogeneous information about a target person or object that includes not only text content but also metadata and even the social relationships among people.
Furthermore, content on social media and Web 2.0 platforms differs from content elsewhere in style, tone, and purpose. For instance, posts on Twitter are limited in length and therefore often contain jargon, emoticons, or abbreviations that do not follow formal grammar. Existing natural language processing techniques are often unsuitable for such content because they were not designed for it. For example, standard summarization techniques might not suit Plurk posts, which are relatively short and contain responses from multiple friends, and sentiment dictionaries learned from news corpora might not suit sentiment detection on microblogs.
Social media is generally believed to have become one of the major means of communication and content production, and this trend is unlikely to fade, so the ability to process content from social media platforms is of considerable value in real-world applications. Furthermore, given the change in the style of the content and the availability of heterogeneous resources (e.g., social relationships among people), there is high demand for novel NLP techniques that are designed specifically for such platforms and can integrate or learn from information from different sources. Below we highlight some important (non-exclusive) themes in this direction.