14th National Natural Language Processing Research Symposium

May 11-12, 2018, UC Theater

University of the Cordilleras, Baguio City

Pictures from the event: link

Call for Participation

Organized by the Computing Society of the Philippines – Special Interest Group on Natural Language Processing (CSP SIG-NLP), National University (NU), and University of the Cordilleras

The 14th National Natural Language Processing Research Symposium (14NNLPRS) will take place on May 11-12, 2018 at the University of the Cordilleras, Baguio City. NNLPRS is a regular gathering of researchers from different fields working on the analysis, processing, and generation of human languages. The event aims to provide a forum for sharing research and for networking. Past symposia have covered a wide range of topics in NLP and were graced by international invited speakers:

  • Prof. Robert Dale of Macquarie University, Australia in 2004;
  • Prof. Chu-Ren Huang from the Institute of Linguistics in Academia Sinica of Taiwan in 2007;
  • Mr. Adam Pease of Articulate Software USA, and Prof. Gerald Nelson of the Chinese University of Hong Kong both in 2009;
  • Prof. Dekai Wu from the Hong Kong University of Science and Technology in 2010;
  • Prof. Hwee Tou Ng from the National University of Singapore in 2011;
  • Prof. Tod Allman of the Graduate Institute of Applied Linguistics and Prof. Chiu-Yu Tseng from Academia Sinica of Taiwan in 2014;
  • Prof. Tod Allman of the Graduate Institute of Applied Linguistics and Prof. Enya Kong Tang from Linton University of Malaysia in 2015;
  • Prof. Tod Allman of the Graduate Institute of Applied Linguistics and Prof. Chu-Ren Huang from Hong Kong Polytechnic University in 2016; and
  • Prof. Tod Allman of the Graduate Institute of Applied Linguistics in 2017.

With the theme “Humanitarian Technology”, the 14NNLPRS will be a venue for discussing the various challenges and opportunities that we face in integrating human language technologies to analyze various types of data towards addressing societal problems.

Relevant topics include, but are not limited to, the following areas:

LANGUAGE

  • Corpus Building
  • Dictionary and Philippine Languages
  • Discourse Analysis
  • Phonology and Morphology
  • Language Resources and Evaluation
  • Language Clustering and Mapping
  • Language Learning
  • Lexicology
  • Multilingual Speech Corpora
  • Prosody
  • Sociolinguistics
  • Speech Databases
  • Standardization
  • Syntax and Grammar

COMPUTING

  • Automatic Speech Recognition
  • Culturomics
  • Information Retrieval
  • Machine Learning for Natural Language
  • Machine Translation
  • Named Entity Recognition
  • Natural Language Generation
  • Segmentation and Labeling
  • Sentiment Analysis and Opinion Mining
  • Sign Language Processing
  • Speech Synthesis
  • Text Summarization and Generation
  • Word Sense Disambiguation
  • WordNets and Ontologies

CHED Endorsement

14NNLPRS CHED Endorsement.pdf

Invitation Letter

Invitation Letter.pdf

Program

14NNLPRS Program v4.pdf

Speakers

Tod Allman

Tod Allman has been working in the field of Natural Language Generation for the past twenty years. He and his colleagues have designed and developed a linguistically based natural language generator called Linguist’s Assistant (LA). LA produces high-quality draft translations in a wide variety of languages, particularly minority and endangered languages. Linguists may use LA to document a language and, at the same time, produce initial draft translations of significant texts in that language. When experienced mother-tongue translators edit the translations produced by this system into publishable texts, their productivity is typically quadrupled without any loss of quality. LA incorporates extensive typological, semantic, syntactic, and discourse research into its semantic representational system and its transfer and synthesizing grammars. Tod has worked with linguists and mother-tongue speakers to develop computational lexicons and grammars for a variety of languages including Korean, Kewa (Papua New Guinea), Jula (Cote d’Ivoire), Angas (Nigeria), Chinantec (Mexico), and Nsenga (Zambia). He is now living in the Manila area and is presently building lexicons and grammars for five languages: Tagalog, Ayta Mag-Indi, and Botolan Sambal, which are spoken here in the Philippines; Ibwe, which is spoken in Malaysia; and Hlai, which is spoken in Taiwan. He hopes that the texts generated by LA will empower the speakers of these languages by enabling them to participate in the larger world, and by providing them with vital information that helps them live longer, healthier, and more productive lives.

Techniques for Accelerating the Development of Computational Lexicons and Grammars for the Languages of the Philippines

Linguist’s Assistant (LA) is a linguistically based natural language generator (NLG) designed and developed entirely from a linguist’s perspective. The system incorporates extensive typological, semantic, syntactic, and discourse research into its semantic representational system and its transfer and synthesizing grammars. It is presently being used to translate numerous texts into languages from several diverse language families, including three languages spoken here in the Philippines.

In order to produce translations of texts in a language, every NLG requires three components developed specifically for that language: 1) a lexicon, 2) a transfer grammar, and 3) a synthesizing grammar. The development of these three components requires considerable time and effort by a computational linguist. The author spent approximately one year developing the lexicon, transfer grammar, and synthesizing grammar for Tagalog. In order to reduce the time and effort required to build the lexicons, transfer grammars, and synthesizing grammars for other languages in the Philippines, new techniques have been developed to accelerate the process. This presentation will summarize these new techniques; a brief overview follows.

1) Lexicons: Although the languages of the Philippines are closely related to one another, their lexicons differ significantly. However, the Summer Institute of Linguistics (SIL), which has been doing linguistic research and translation work in the Philippines for more than 50 years, has developed an archive of lexicons for more than 80 of the languages spoken here. All of these lexicons are in a standardized format, and are freely available from SIL. So LA has been modified to import the lexical data from these files. LA is used to translate several types of texts into a language, and those texts always include numerous proper names. Since those names aren’t in the lexical files archived by SIL, an additional file that contains the Tagalog equivalents for all of the proper names was prepared in the same format as the SIL files. After importing the lexical file and the proper names file, all that remains to complete the lexicon for a particular language is to write the rules that produce the various forms of the verbs (e.g., Actor Focus Perfective, Object Focus Imperfective, etc.), and then link the source concepts to the equivalent target words. Writing the rules and linking the concepts to the target words takes just a fraction of the time that was required to develop the entire lexicon manually.

2) Transfer grammars: The purpose of the transfer grammar in LA is to restructure the semantic representations into a new deep structure representation that is appropriate for each target language. The semantic representations serve as the source documents used by LA during the translation process and are heavily influenced by English. However, since the languages of the Philippines are closely related to one another, it’s plausible that they all have the same deep structure representations. This hypothesis has been confirmed with Ayta Mag-Indi and Botolan Sambal, but remains to be confirmed for other languages. If this hypothesis proves true, the transfer grammar that was developed for Tagalog will work well for the other languages spoken here, thus eliminating the need to develop new transfer grammars for each language.

3) Synthesizing grammars: The synthesizing grammar in LA synthesizes the final surface forms of the translated texts. The data in the synthesizing grammars are drastically different for each language, but the majority of the data necessary in the synthesizing grammar can be imported from tables in Word documents. Mother-tongue speakers of a language are easily able to edit the Tagalog data in the grammar Word documents so that it accommodates their languages. Then that data can be imported from the Word documents into the synthesizing grammar. This drastically reduces the amount of work required to develop the synthesizing grammar for a particular language.
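The lexicon workflow described in 1) above can be illustrated with a minimal sketch. It is not LA's actual implementation. It assumes that the lexical files use backslash-coded fields (`\lx` for the lexeme, `\ps` for the part of speech, `\ge` for the English gloss) with blank lines between records; the sample entries, the `VERB_RULES` table, and all function names are hypothetical. The verb rules cover only the simplest Tagalog -um- and -in- patterns, and real verb classes need many more rules.

```python
import re

VOWELS = "aeiou"

def parse_records(text):
    """Parse blank-line-separated records of '\\marker value' lines into dicts."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            marker, _, value = line.strip().partition(" ")
            if marker.startswith("\\"):
                entry[marker[1:]] = value.strip()
        if "lx" in entry:
            entries.append(entry)
    return entries

def infix(root, affix):
    """Insert the affix before the first vowel (a prefix if vowel-initial)."""
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return root[:i] + affix + root[i:]
    return affix + root

def reduplicate(root):
    """Repeat the root's first (consonant +) vowel: kain -> kakain."""
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return root[:i + 1] + root
    return root

# Hypothetical rule table: each rule maps a verb root to one inflected form.
VERB_RULES = {
    "Actor Focus Perfective": lambda r: infix(r, "um"),
    "Actor Focus Imperfective": lambda r: infix(reduplicate(r), "um"),
    "Object Focus Perfective": lambda r: infix(r, "in"),
    "Object Focus Imperfective": lambda r: infix(reduplicate(r), "in"),
}

def build_lexicon(lexical_text, names_text):
    """Merge both files, inflect the verbs, and link each gloss to its entry."""
    lexicon = {}
    for entry in parse_records(lexical_text) + parse_records(names_text):
        if entry.get("ps") == "v":
            entry["forms"] = {n: rule(entry["lx"]) for n, rule in VERB_RULES.items()}
        # The English gloss stands in for the source concept here.
        lexicon[entry.get("ge", entry["lx"])] = entry
    return lexicon

# Made-up sample data in the assumed format.
LEXICAL_FILE = r"""
\lx kain
\ps v
\ge eat

\lx bahay
\ps n
\ge house
"""

NAMES_FILE = r"""
\lx Juan
\ps nprop
\ge John
"""

lexicon = build_lexicon(LEXICAL_FILE, NAMES_FILE)
for name, form in lexicon["eat"]["forms"].items():
    print(f"{name}: {form}")
```

In this sketch the manual work is limited to the rule table and the gloss-to-entry links, which mirrors the division of labor the abstract describes: the bulk of the lexicon is imported, and only the rules and links are written by hand.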

The conclusion of this research is that the time and effort required to develop a lexicon, transfer grammar, and synthesizing grammar for a language related to Tagalog have been significantly reduced compared to a year ago. The author’s hopes are high that this process may be repeated for many of the languages spoken in the Philippines.


Ramon Rodriguez

Ramon Rodriguez holds a Bachelor of Science in Computer Science and a master’s degree from Ateneo de Manila University. He is pursuing his PhD in Computer Science at De La Salle University. He was a short-term scholar at the University of California, Berkeley from October 2017 to January 2018. He has more than 15 years of teaching and administrative experience and is currently the Program Chair for Computer Science at National University. His research interests include Machine Learning, Signal Processing, and Software Engineering.

Convolutional Neural Network: An approach to classification

A convolutional neural network (CNN) is a category of neural network that is effective in machine vision for recognizing and classifying images. CNNs can also be used for text classification, signal processing, and NLP-related tasks. The talk focuses on understanding convolution and the core components of convolutional neural networks. Image and text processing using CNNs will also be tackled.
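As a rough illustration of the convolution operation at the core of a CNN, the sketch below slides a small kernel over a toy image, applies a ReLU, and max-pools the result. It is written in plain Python for readability and is not taken from the talk. The image, the kernel values, and the function names are all made up for the example; a real network learns its kernels from data.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the 'convolution' used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Zero out negative activations."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows and columns are dropped."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

# Toy 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
pooled = max_pool(relu(feature_map))
print(feature_map)
print(pooled)
```

The feature map is non-zero only in the column where the edge sits. For text, the same sliding-window idea is typically applied along a sentence, with each row holding one word's embedding vector instead of a row of pixels.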

Registration

Type              Undergraduate   Graduate        Regular
Onsite            PHP 800.00      PHP 1,800.00    PHP 4,000.00
Pre-Registration  https://goo.gl/forms/VQKGUAMg4qrLd0233

*Regular registration includes CSP membership.

Payment can be made through bank deposit.

    • Name: COMPUTING SOCIETY OF THE PHILIPPINES, INC.
    • Bank: BANCO DE ORO
    • Branch: Loyola Heights - Katipunan, Quezon City
    • Savings Account Number: 3570-0089-29
    • A copy of the deposit slip should be emailed to: naoco@national-u.edu.ph
    • The deposit slip should also be presented during onsite registration.

Scientific Review Committee

  • Katrina Joy Abriol-Santos, University of the Philippines Los Baños
  • Charibeth Cheng, De La Salle University
  • Maria Art Antonette Clariño, University of the Philippines Los Baños
  • Angelica Dela Cruz, National University
  • Matthew Phillip Go, Industry
  • Joohyuk Lim, Daehan College of Business and Technology
  • Erlyn Manguilimotan, Weathernews, Inc.
  • Dalos Miguel, Saint Louis University
  • Nathaniel Oco, National University
  • Ethel Ong, De La Salle University
  • Thelma Palaoag, University of the Cordilleras
  • Rodolfo Jr Raga, Jose Rizal University
  • Reginald Recario, University of the Philippines Los Baños
  • Ramon Rodriguez, National University
  • Julie Ann Salido, Aklan State University
  • Briane Paul Samson, De La Salle University
  • Leif Romeritch Syliongka, Industry
  • Aileen Joan Vicente, University of the Philippines Cebu
  • John Noel Victorino, Ateneo de Manila University

Local Information

Recommended Accommodation

  • Microtel by Wyndham
  • Holiday Park Hotel
  • Crown Legacy
  • Bloomfield Hotel
  • City Travel Hotel
  • Hotel Veniz
  • City Light Hotel
  • Hotel Enrico
  • YMCA Hotel
  • Hotel 45

Activities and Tourist Sites

  • BenCab Museum
  • Tam-Awan Village
  • Ifugao Woodcarvers' Village
  • Butterfly Garden
  • Camp John Hay
  • Asin Hot Springs
  • Horseback Riding
  • Strawberry Picking
  • Picnic at Camp John Hay
  • Sip Baguio Coffee

Contact Information

Nathaniel Oco

  • General Chair, 14NNLPRS
  • Research Fellow, National University (Philippines)
  • +632-712-1900 loc. 459
  • naoco [at] national-u [dot] edu [dot] ph

Thelma Palaoag

  • Co-chair, 14NNLPRS
  • Research Coordinator, College of Information Technology and Computer Science
  • University of the Cordilleras
  • +63-74-442-3316
  • tpalaoag [at] gmail [dot] com

Local Organizers

  • Jeffrey Ingosan
  • Thelma Palaoag
  • Josephine Dela Cruz
  • Melinda Beninsig
  • Jane Abenes