13th Study Session

Posted: 2019/08/17 0:35:04

Date and time: Thursday, August 22, 2019, 15:00-16:30

Venue: Meeting Room 1, Graduate School of Education, The University of Tokyo (2nd floor, Faculty of Education Building, Hongo Campus)

http://www.p.u-tokyo.ac.jp/cg

Speaker:

Burcu Can (Hacettepe University, Turkey)

Title:

Recent Trends in Natural Language Processing of Agglutinative Languages

Abstract:

Agglutinative languages build words from sequences of morphemes. This morphemic structure enables productive word formation, in which each newly generated word carries both syntactic and semantic information. The same productivity, however, makes the vocabulary sparse, which is one of the most serious problems in natural language processing. One way to mitigate this sparsity is morphological segmentation. One of the topics I will discuss in this talk is our recent work on unsupervised morphological segmentation using non-parametric Bayesian models.
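To make the idea of non-parametric Bayesian segmentation concrete, here is a minimal toy sketch. It is not the speaker's model, which the abstract does not describe. It assumes a simple unigram morpheme model with a Dirichlet process prior: a morpheme's predictive probability follows the Chinese restaurant process, (n_m + alpha * P0(m)) / (N + alpha), where the base measure P0 is assumed to be a geometric length distribution times uniform characters. Segmentations are resampled word by word with forward filtering and backward sampling, an approximate blocked Gibbs sampler. The class name `DPMorphSegmenter` and all parameter values are hypothetical.

```python
import random
from collections import Counter


class DPMorphSegmenter:
    """Hypothetical toy unigram morpheme model with a Dirichlet process prior."""

    def __init__(self, alpha=1.0, p_end=0.5, n_chars=30, seed=0):
        self.alpha = alpha        # DP concentration parameter (assumed value)
        self.p_end = p_end        # stop probability of the geometric length prior (assumed)
        self.n_chars = n_chars    # assumed alphabet size for the base measure
        self.counts = Counter()   # how often each morpheme is currently used
        self.total = 0            # total number of morpheme tokens
        self.rng = random.Random(seed)

    def base_prob(self, morph):
        # P0(m): geometric distribution over length times uniform characters
        k = len(morph)
        return (1 - self.p_end) ** (k - 1) * self.p_end * (1.0 / self.n_chars) ** k

    def prob(self, morph):
        # Chinese restaurant process predictive probability
        return (self.counts[morph] + self.alpha * self.base_prob(morph)) / (self.total + self.alpha)

    def _update(self, morphs, delta):
        # Add (delta=+1) or remove (delta=-1) a word's morphemes from the counts
        for m in morphs:
            self.counts[m] += delta
            self.total += delta
            if self.counts[m] == 0:
                del self.counts[m]

    def sample_segmentation(self, word):
        n = len(word)
        # Forward pass: fwd[i] sums the probability of all segmentations of word[:i]
        fwd = [0.0] * (n + 1)
        fwd[0] = 1.0
        for i in range(1, n + 1):
            fwd[i] = sum(fwd[j] * self.prob(word[j:i]) for j in range(i))
        # Backward pass: sample split points from right to left
        morphs, i = [], n
        while i > 0:
            weights = [fwd[j] * self.prob(word[j:i]) for j in range(i)]
            j = self.rng.choices(range(i), weights=weights)[0]
            morphs.append(word[j:i])
            i = j
        return morphs[::-1]

    def fit(self, words, iterations=20):
        segs = [[w] for w in words]  # start with every word unsegmented
        for s in segs:
            self._update(s, 1)
        for _ in range(iterations):
            for idx, w in enumerate(words):
                self._update(segs[idx], -1)  # remove the word's current analysis
                segs[idx] = self.sample_segmentation(w)
                self._update(segs[idx], 1)   # add the resampled analysis
        return segs
```

For example, `DPMorphSegmenter(seed=1).fit(["evlerde", "evler", "evde", "kitaplar", "kitapta"])` returns one sampled segmentation per Turkish word. The sampler is only approximate because the counts are not updated between morphemes of the same word. Reusing a frequent morpheme such as `ev` or `ler` is cheap under the CRP, while inventing a new string is paid for through the base measure P0. That trade-off is what lets shared morphemes emerge without supervision.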

The sparsity problem persists even with the rise of representation learning. We can represent each word in a low-dimensional space using its distributional features in a large corpus. However, if a word does not occur in the corpus, or does not occur often enough, how should we represent it in the same space? Most recent work addresses this by treating each word as a sequence of characters and deriving its representation from those characters. In this talk I will ask whether a word should be represented by its characters or by its morphemes: how should words be represented?
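The two options raised above can be contrasted in a small sketch, with the assumptions stated up front. A vector for a rare or unseen word is composed by averaging the vectors of its subword units. Those units are either character n-grams with boundary markers, in the spirit of fastText, or morphemes. The deterministic pseudo-random unit vectors stand in for a learned embedding table. The dimension, the n-gram range, the averaging rule, and the hand-written morpheme segmentations are all assumptions for illustration, not the speaker's method.

```python
import hashlib
import math
import random

DIM = 8  # assumed embedding size for this toy example


def unit_vector(unit):
    # Deterministic pseudo-random vector per subword unit,
    # standing in for a learned embedding table (assumption)
    seed = int(hashlib.md5(unit.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def char_ngrams(word, n_min=3, n_max=5):
    # Character n-grams of the word wrapped in boundary markers < and >
    w = "<" + word + ">"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]


def compose(units):
    # Word vector = element-wise average of its subword-unit vectors
    vecs = [unit_vector(u) for u in units]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    # Turkish: "evlerimizde" = ev+ler+imiz+de ("in our houses"),
    # "evlerde" = ev+ler+de ("in houses"); segmentations written by hand
    by_chars = cosine(compose(char_ngrams("evlerimizde")), compose(char_ngrams("evlerde")))
    by_morphs = cosine(compose(["ev", "ler", "imiz", "de"]), compose(["ev", "ler", "de"]))
    print(f"character n-grams: {by_chars:.3f}")
    print(f"morphemes:         {by_morphs:.3f}")
```

Either way, an unseen word still gets a vector as long as its units are known. The difference lies in which units are shared: character n-grams need no segmenter but mix meaningful and arbitrary substrings, while morphemes align with the units that carry syntax and semantics in an agglutinative language, at the cost of needing a segmentation first.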