The 12th Workshop on Asian Language Resources (ALR12), held on December 12, 2016, at the Osaka International Convention Center and annexed to COLING 2016, focuses on language resources for the Asian region, which has more than 2,200 spoken languages. As the use of ICT spreads across the region, there are increasing efforts to build multilingual, multimodal language resources with varying levels of annotation, through manual, semi-automatic and automatic approaches. Correspondingly, the development of practical applications of these language resources has also been advancing rapidly. The ALR workshop series aims to forge better coordination and collaboration among researchers working on these languages, and in the NLP community in general, in order to develop common frameworks and processes for promoting these activities. This year's workshop collaborates with ISO/TC 37/SC 4, which develops international standards for "Language Resources Management", and with ELRA, which is campaigning for the LRE Map, in order to integrate efforts to develop an Asian language resource map.

To achieve these goals, the workshop calls for original and unpublished technical, strategy, policy and survey papers concerning, but not limited to, the following topics:
  • Text corpora, speech corpora, corpora in other modalities or media (such as video for sign languages or affective computing)
  • Lexicons, grammars, machine-readable dictionaries, domain-specific terminology
  • Ontologies, knowledge representation, semantic web technologies
  • Infrastructure for constructing and sharing language resources
  • Exchange and annotation schemata, exchange formats
  • Standards or specifications for language resources and content management
  • Language resources for basic NLP tasks (word segmentation, named entity recognition, syntactic analysis, semantic analysis, discourse analysis, speech recognition, speech synthesis, etc.)
  • Language resources for HLT applications (text generation, information retrieval, information extraction, question answering, machine translation, speech translation, reasoning, affective computing, etc.)
  • Strategies and priorities for cooperation and collaboration
  • Licensing and copyright issues


Monday, December 12, 2016

09:00–09:05 Opening
09:05–10:25 Oral Session 1: Annotation
  An extension of ISO-Space for annotating object direction
Daiki Gotou, Hitoshi Nishikawa and Takenobu Tokunaga
  Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts
Kimi Kaneko, Saku Sugawara, Koji Mineshima and Daisuke Bekki
  Developing Universal Dependencies for Mandarin Chinese
Herman Leung, Rafaël Poiret, Tak-sum Wong, Xinying Chen, Kim Gerdes and John Lee
  Developing Corpus of Lecture Utterances Aligned to Slide Components
Ryo Minamiguchi and Masatoshi Tsuchiya
10:25–10:35 Coffee Break
10:35–11:55 Oral Session 2: Data
  VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization
Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran and Minh-Le Nguyen
  BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’
Masayuki Asahara and Yuji Matsumoto
  SCTB: A Chinese Treebank in Scientific Domain
Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi
  Big Community Data before World Wide Web Era
Tomoya Iwakura, Tetsuro Takahashi, Akihiro Ohtani and Kunio Matsui
12:00–14:00 Lunch Break
14:00–14:30 Poster Session
  An Overview of BPPT’s Indonesian Language Resources
Gunarso Gunarso and Hammam Riza
  Creating Japanese Political Corpus from Local Assembly Minutes of 47 prefectures
Yasutomo Kimura, Keiichi Takamaru, Takuma Tanaka, Akio Kobayashi, Hiroki Sakaji, Yuzu Uchida, Hokuto Ototake and Shigeru Masuyama
  Selective Annotation of Sentence Parts: Identification of Relevant Sub-sentential Units
Ge Xu, Xiaoyan Yang and Chu-Ren Huang
14:35–15:55 Oral Session 3: Analysis
  The Kyutech corpus and topic segmentation using a combined method
Takashi Yamamura, Kazutaka Shimada and Shintaro Kawahara
  Automatic Evaluation of Commonsense Knowledge for Refining Japanese ConceptNet
Seiya Shudo, Rafal Rzepka and Kenji Araki
  SAMER: A Semi-Automatically Created Lexical Resource for Arabic Verbal Multiword Expressions Tokens Paradigm and their Morphosyntactic Features
Mohamed Al-Badrashiny, Abdelati Hawwari, Mahmoud Ghoneim and Mona Diab
  Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets
Tuan Anh Le, David Moeljadi, Yasuhide Miura and Tomoko Ohkuma
15:55–16:55 TC37 Session
  Introducing ISO/TC 37/SC 4 Language Resources Management Activities
Nicoletta Calzolari
  Towards Application of ISO-TimeML and ISOspace to Korean and other Asian Languages
Kiyong Lee
  Standardization of Numerical Expression Extraction and Representations in English and Other Languages
Haitao Wang
  Design of ISLRN for Asian Language Resources
Khalid Choukri
16:55–17:00 Closing


A PDF version of the proceedings is available: view the proceedings / download the proceedings.

Paper Submissions

Please follow the COLING 2016 Instructions for Authors, except that the relevant dates for this workshop are those listed below. Papers are uploaded through the START system at https://www.softconf.com/coling2016/ALR12/.

Important Dates

  • 2016-09-28: paper submission deadline (extended)
  • 2016-10-16: notification of acceptance or rejection
  • 2016-10-30: camera-ready copies due
  • 2016-12-12: ALR12 Workshop at COLING 2016

Past Workshops

Yusuke Matsubara, Nov 29, 2016, 6:41 PM