Until a decade ago, the fields of information extraction (IE) and knowledge acquisition (KA) were limited to identifying and extracting named entities, semantic and ontological relations, events, templates, and facts in relatively small text corpora using a small variety of external resources such as gazetteers, thesauri, and lexical hierarchies.
Today everything has changed. The size of corpora has grown dramatically: using Gigaword-scale data is common, and it is almost standard to use the Web, which contains quadrillions of words, or at least the Google Web 1T 5-grams. More importantly, new types of communication have emerged, such as chats, blogs and, in the last 2-3 years, Twitter. Their informal language poses many challenges to automatic IE and KA, yet they are becoming increasingly important, e.g., for learning customer opinions on various products and services. Social network analysis is another emerging topic, where data is naturally much more interconnected than in the rest of the Web.
These recent developments have not only posed new challenges but have also created a number of opportunities, opening new research directions and offering useful new resources. For example, the growth of Wikipedia has given rise to DBpedia and other collaboratively-created resources such as Freebase. Today, IE and KA researchers can even create annotations and resources on demand, at very low cost, using crowd-sourcing tools such as Amazon’s Mechanical Turk.
The workshop will provide a place for researchers to discuss all these exciting developments and their implications for the future of IE and KA.
Topics of Interest
The topics of interest include but are not limited to the following:
- Knowledge discovery and mining on the Web
- Social networks and folksonomy analysis
- Ontology population and induction
- Cross-lingual and multi-lingual approaches to IE and KA
- Using crowd-sourcing tools, such as Mechanical Turk, for IE and KA
- IE and KA and the Semantic Web
- Multi-lingual named entity recognition and disambiguation
- Event extraction
- Paraphrase extraction
- Temporal data mining
- Using resources such as Wikipedia, DBpedia, Freebase
- IE and KA from informal text: chats, instant messages, blogs, Twitter
Multiple submission policy: We welcome papers that are under review for other venues. However, in the event of multiple acceptances, authors are requested to notify us as soon as possible and to choose the meeting at which to present and publish the work; we cannot accept for publication or presentation work that will be (or has been) published elsewhere.
Reviewing: Reviewing will be blind. No information identifying the authors should be in the paper: this includes not only the authors' names and affiliations, but also self-references that reveal authors' identities; for example, "We have previously shown (Smith 1999)" should be changed to "Smith (1999) has previously shown".
Paper length and presentation: We invite long (8 pages) and short (4 pages) papers. Accepted short papers will be presented either as short oral presentations or as posters.
Submission format: Authors are strongly encouraged to use the LaTeX style files or MSWord equivalents below -- these formats will ease the transition to the proceedings version:
Notification: 25 July 2011
Camera-ready version due: 22 August 2011
Workshop: 15-16 September 2011