14NNLPRS

14th National Natural Language Processing Research Symposium

May 11-12, 2018, UC Theater

University of the Cordilleras, Baguio City

Pictures from the event: link

Click this link for 15NNLPRS

Related Event:
Oriental COCOSDA 2026 | O-COCOSDA 2026

Call for Participation

Organized by the Computing Society of the Philippines – Special Interest Group on Natural Language Processing (CSP SIG-NLP), National University (NU), and University of the Cordilleras (UC)

The 14th National Natural Language Processing Research Symposium (14NNLPRS) will take place on May 11-12, 2018 at the University of the Cordilleras, Baguio City. NNLPRS is a regular gathering of researchers from different fields working on the analysis, processing, and generation of human languages. The event provides a forum for presenting research and for networking among researchers. Past symposia have covered a wide range of topics in NLP and featured invited international speakers:

Prof. Robert Dale of Macquarie University, Australia in 2004;
Prof. Chu-Ren Huang from the Institute of Linguistics at Academia Sinica, Taiwan in 2007;
Mr. Adam Pease of Articulate Software, USA, and Prof. Gerald Nelson of the Chinese University of Hong Kong, both in 2009;
Prof. Dekai Wu from the Hong Kong University of Science and Technology in 2010;
Prof. Hwee Tou Ng from the National University of Singapore in 2011;
Prof. Tod Allman of the Graduate Institute of Applied Linguistics and Prof. Chiu-Yu Tseng from Academia Sinica of Taiwan in 2014;
Prof. Tod Allman of the Graduate Institute of Applied Linguistics and Prof. Enya Kong Tang from Linton University of Malaysia in 2015;
Prof. Tod Allman of the Graduate Institute of Applied Linguistics and Prof. Chu-Ren Huang from Hong Kong Polytechnic University in 2016; and
Prof. Tod Allman of the Graduate Institute of Applied Linguistics in 2017.

With the theme “Humanitarian Technology”, the 14NNLPRS will be a venue for discussing the various challenges and opportunities that we face in integrating human language technologies to analyze various types of data towards addressing societal problems.

Relevant topics include, but are not limited to, the following areas:

LANGUAGE

Corpus Building
Dictionary and Philippine Languages
Discourse Analysis
Phonology and Morphology
Language Resources and Evaluation
Language Clustering and Mapping
Language Learning
Lexicology
Multilingual Speech Corpora
Prosody
Sociolinguistics
Speech Databases
Standardization
Syntax and Grammar

COMPUTING

Automatic Speech Recognition
Culturomics
Information Retrieval
Machine Learning for Natural Language
Machine Translation
Named Entity Recognition
Natural Language Generation
Segmentation and Labeling
Sentiment Analysis and Opinion Mining
Sign Language Processing
Speech Synthesis
Text Summarization and Generation
Word Sense Disambiguation
WordNets and Ontologies

CHED Endorsement

14NNLPRS CHED Endorsement.pdf

Invitation Letter

Invitation Letter.pdf

Program

14NNLPRS Program v4.pdf

Speakers

Tod Allman

Tod Allman has been working in the field of Natural Language Generation for the past twenty years. He and his colleagues have designed and developed a linguistically based natural language generator called Linguist’s Assistant (LA). LA produces high-quality draft translations in a wide variety of languages, particularly minority and endangered languages. Linguists may use LA to document a language and, at the same time, produce initial draft translations of significant texts in that language. When experienced mother-tongue translators edit the translations produced by this system into publishable texts, their productivity is typically quadrupled without any loss of quality. LA incorporates extensive typological, semantic, syntactic, and discourse research into its semantic representational system and its transfer and synthesizing grammars. Tod has worked with linguists and mother-tongue speakers in order to develop computational lexicons and grammars for a variety of languages including Korean, Kewa (Papua New Guinea), Jula (Cote d’Ivoire), Angas (Nigeria), Chinantec (Mexico), and Nsenga (Zambia). He is now living in the Manila area, and is presently building lexicons and grammars for five languages: Tagalog, Ayta Mag-Indi, and Botolan Sambal, which are spoken here in the Philippines; Ibwe, which is spoken in Malaysia; and Hlai, which is spoken in Taiwan. He hopes that the texts generated by LA will empower the speakers of these languages by enabling them to participate in the larger world, and by providing them with vital information which helps them live longer, healthier, and more productive lives.

Techniques for Accelerating the Development of Computational Lexicons and Grammars for the Languages of the Philippines

Linguist’s Assistant (LA) is a linguistically based natural language generator (NLG) designed and developed entirely from a linguist’s perspective. The system incorporates extensive typological, semantic, syntactic, and discourse research into its semantic representational system and its transfer and synthesizing grammars. It is presently being used to translate numerous texts into languages from several diverse language families, including three languages spoken here in the Philippines.

In order to produce translations of texts in a language, every NLG requires three components developed specifically for that language: 1) a lexicon, 2) a transfer grammar, and 3) a synthesizing grammar. The development of these three components requires considerable time and effort by a computational linguist. The author spent approximately one year developing the lexicon, transfer grammar, and synthesizing grammar for Tagalog. In order to reduce the time and effort required to build the lexicons, transfer grammars, and synthesizing grammars for other languages in the Philippines, new techniques have been developed to accelerate the process. This presentation will summarize these new techniques; a brief overview follows.

1) Lexicons: Although the languages of the Philippines are closely related to one another, their lexicons differ significantly. However, the Summer Institute of Linguistics (SIL), which has been doing linguistic research and translation work in the Philippines for more than 50 years, has developed an archive of lexicons for more than 80 of the languages spoken here. All of these lexicons are in a standardized format, and are freely available from SIL. So LA has been modified to import the lexical data from these files. LA is used to translate several types of texts into a language, and those texts always include numerous proper names. Since those names aren’t in the lexical files archived by SIL, an additional file that contains the Tagalog equivalents for all of the proper names was prepared in the same format as the SIL files. After importing the lexical file and the proper names file, all that remains to complete the lexicon for a particular language is to write the rules that produce the various forms of the verbs (e.g., Actor Focus Perfective, Object Focus Imperfective, etc.), and then link the source concepts to the equivalent target words. Writing the rules and linking the concepts to the target words takes just a fraction of the time that was required to develop the entire lexicon manually.

2) Transfer grammars: The purpose of the transfer grammar in LA is to restructure the semantic representations into a new deep structure representation that is appropriate for each target language. The semantic representations serve as the source documents used by LA during the translation process and are heavily influenced by English. However, since the languages of the Philippines are closely related to one another, it’s plausible that they all have the same deep structure representations. This hypothesis has been confirmed with Ayta Mag-Indi and Botolan Sambal, but remains to be confirmed for other languages. If this hypothesis proves true, the transfer grammar that was developed for Tagalog will work well for the other languages spoken here, thus eliminating the need to develop new transfer grammars for each language.

3) Synthesizing grammars: The synthesizing grammar in LA synthesizes the final surface forms of the translated texts. The data in the synthesizing grammars are drastically different for each language, but the majority of the data necessary in the synthesizing grammar can be imported from tables in Word documents. Mother-tongue speakers of a language are easily able to edit the Tagalog data in the grammar Word documents so that it accommodates their languages. Then that data can be imported from the Word documents into the synthesizing grammar. This drastically reduces the amount of work required to develop the synthesizing grammar for a particular language.
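As a rough illustration of the lexicon step in item 1, the sketch below merges an imported lexical file with a proper-names file, applies one verb-form rule, and links source concepts to target words. Everything in it is an assumption made for illustration: the abstract does not describe the SIL file format or LA's rule syntax, so the tab-separated layout, the function names, the sample entries, and the simplified -um- rule for the Actor Focus Perfective are hypothetical stand-ins rather than LA's actual data or behavior.

```python
VOWELS = "aeiou"


def parse_entries(text):
    """Parse 'concept<TAB>target word' lines (a hypothetical stand-in format)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        concept, target = line.split("\t", 1)
        entries[concept.strip()] = target.strip()
    return entries


def build_lexicon(lexical_text, names_text):
    """Merge the imported lexical file with the separate proper-names file."""
    lexicon = parse_entries(lexical_text)
    lexicon.update(parse_entries(names_text))
    return lexicon


def actor_focus_perfective(root):
    """Simplified -um- rule: prefix before a vowel, infix after the first consonant.

    Covers only roots that take -um-; real verb-form rules are far richer.
    """
    if root[0] in VOWELS:
        return "um" + root
    return root[0] + "um" + root[1:]


def link_concepts(concepts, lexicon):
    """Link source concepts to target words and report the ones still unlinked."""
    linked = {c: lexicon[c] for c in concepts if c in lexicon}
    missing = [c for c in concepts if c not in lexicon]
    return linked, missing


# Illustrative sample data, not taken from any SIL archive.
LEXICAL_FILE = "house\tbahay\nwater\ttubig\neat\tkain\n"
NAMES_FILE = "John\tJuan\n"

lexicon = build_lexicon(LEXICAL_FILE, NAMES_FILE)
linked, missing = link_concepts(["John", "eat", "bread"], lexicon)
```

Here `missing` lists the concepts that still need a target word, which corresponds to the manual linking work that remains after the import.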

The conclusion of this research is that the time and effort required to develop a lexicon, transfer grammar, and synthesizing grammar for a language related to Tagalog have been significantly reduced compared to a year ago. The author’s hopes are high that this process may be repeated for many of the languages spoken in the Philippines.

Ramon Rodriguez

He holds a Bachelor of Science in Computer Science and earned his master’s degree from Ateneo De Manila University. He is pursuing his PhD in Computer Science at De La Salle University. He was a short-term scholar at the University of California, Berkeley from October 2017 to January 2018. He has more than 15 years of teaching and administrative experience and is currently the Program Chair for Computer Science at National University. His research interests include Machine Learning, Signal Processing, and Software Engineering.

Convolutional Neural Network: An approach to classification

A convolutional neural network (CNN) is a category of neural network that is effective in machine vision for recognizing and classifying images. CNNs can also be used for text classification, signal processing, and NLP-related tasks. The talk focuses on understanding convolution and the core components of convolutional neural networks. Image and text processing using CNNs will also be tackled.
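The convolution operation at the core of a CNN can be illustrated with a small, dependency-free sketch. It slides a kernel over a one-dimensional sequence, applies a ReLU, and max-pools the result, which is roughly the pattern used when CNNs are applied to text or signals. The function names, the sample signal, and the kernel are illustrative assumptions and are not taken from the talk; a real model would learn its kernels and use a deep learning library.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product (what CNN layers call convolution)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]


def max_pool(values):
    """Global max pooling: keep the strongest response in the feature map."""
    return max(values)


# A step signal and a hand-picked kernel that responds to rising edges.
signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]

feature_map = conv1d(signal, edge_kernel)  # [0, 1, 0, 0, -1, 0]
activated = relu(feature_map)              # only the rising edge survives
strongest = max_pool(activated)            # 1
```

The same kernel is reused at every position, which is why a CNN can detect a pattern wherever it occurs in an image, a signal, or a sequence of word vectors.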

Registration

Onsite registration fees:

- Undergraduate: PHP 800.00
- Graduate: PHP 1,800.00
- Regular*: PHP 4,000.00

Pre-Registration

https://goo.gl/forms/VQKGUAMg4qrLd0233

*Regular registration includes CSP membership.

Payment can be made through bank deposit.

- Name: COMPUTING SOCIETY OF THE PHILIPPINES, INC.
- Bank: BANCO DE ORO
- Branch: Loyola Heights - Katipunan, Quezon City
- Savings Account Number: 3570-0089-29
- A copy of the deposit slip should be emailed to: naoco@national-u.edu.ph
- The deposit slip should also be presented during onsite registration.