Computer Tools

Abstract

A web-based database has been developed to provide psycholinguists with a large-scale phonological representation system for all Mandarin Chinese monosyllables. The system is built on the slot-based phonological pattern generator (PatPho) and takes into account the language-specific features of Chinese phonology. Users can retrieve the relevant phonological representations through an interactive query system on the web. The query results can be saved in a number of formats, such as Excel spreadsheets, for further analysis. This representation system can be used for a variety of purposes, in particular connectionist language modeling and, more generally, the study of Chinese phonology.
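As a rough illustration of what a slot-based representation looks like, the following Python sketch encodes a syllable by placing each segment into a fixed slot and concatenating one feature vector per slot. The slot layout (initial, medial, nucleus, coda), the three-value feature vectors, and every number in the table are placeholders assumed for this example; they are not the template or feature values used by PatPho or by this database, and tone is left out entirely.

```python
# Illustrative slot-based encoding of a Mandarin monosyllable.
# The slot layout and all feature values below are placeholders chosen for
# this sketch; they are NOT the values used by PatPho or by the database.

SLOTS = ["initial", "medial", "nucleus", "coda"]  # hypothetical template
FEATURES_PER_SLOT = 3
EMPTY = [0.0] * FEATURES_PER_SLOT  # an unfilled slot is encoded as zeros

# Hypothetical phoneme-to-feature table (placeholder numbers).
PHONEME_FEATURES = {
    "m": [0.75, 0.45, 0.64],
    "n": [0.75, 0.67, 0.64],
    "g": [1.00, 0.92, 0.73],
    "a": [0.10, 0.27, 0.90],
    "u": [0.10, 0.10, 0.20],
}


def encode_syllable(segments):
    """Concatenate one feature vector per slot; unfilled slots become zeros."""
    vector = []
    for slot in SLOTS:
        phoneme = segments.get(slot)
        vector.extend(PHONEME_FEATURES[phoneme] if phoneme else EMPTY)
    return vector


# "man": initial m, nucleus a, coda n (no medial glide).
man = encode_syllable({"initial": "m", "nucleus": "a", "coda": "n"})
# "gua": initial g, medial u, nucleus a (no coda).
gua = encode_syllable({"initial": "g", "medial": "u", "nucleus": "a"})

print(len(man), man)
print(len(gua), gua)
```

Because every syllable maps to a vector of the same length, with each position always describing the same slot, vectors of this kind can be fed directly to a connectionist model as input or target patterns.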

© Xiaowei Zhao & Ping Li, January, 2009





3. Contextual Self-Organizing Map Package

Abstract

This site provides a downloadable version of the Contextual Self-Organizing Map, a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm analyzes contextual information extracted from a text corpus, that is, word co-occurrences in a large-scale electronic database of text. Specifically, a target word is represented as the combination of two averages computed over all the text in the corpus: the average of all the words preceding the target and the average of all the words following it. This representation can be further processed by a self-organizing map (SOM; Kohonen, 2001), an unsupervised neural network model that provides efficient data extraction and representation. Because the SOM preserves topography, it projects the statistical structure of the contexts onto a 2-D space, such that words with similar meanings cluster together, forming groups that correspond to lexically meaningful categories. Such a representation system has applications in a variety of areas, including computational modeling of language acquisition and processing. The package includes specific examples in two languages (English and Chinese) that illustrate how the method is used to extract semantic representations of words.
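The two stages described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code of the package itself: the toy corpus is made up, each word is assumed to start from a fixed random vector, the context is assumed to be the single word immediately before and after each occurrence of the target, and the map size, learning rate, and neighborhood schedule are arbitrary choices.

```python
import math
import random

random.seed(0)

# Toy corpus, made up for this sketch.
corpus = ("the dog chased the cat . the cat chased the mouse . "
          "the dog ate food . the cat ate food .").split()

DIM = 8  # length of each word's random code (assumed)
vocab = sorted(set(corpus))
# Assumption: every word is first given a fixed random vector.
code = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in vocab}


def mean(vectors):
    """Element-wise average of equal-length vectors (zeros if there are none)."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def context_vector(target):
    """Join the average code of preceding words with that of following words."""
    before = [code[corpus[i - 1]]
              for i, w in enumerate(corpus) if w == target and i > 0]
    after = [code[corpus[i + 1]]
             for i, w in enumerate(corpus) if w == target and i + 1 < len(corpus)]
    return mean(before) + mean(after)  # list concatenation: length 2 * DIM


reps = {w: context_vector(w) for w in vocab}

# Minimal SOM on a small 2-D grid (sizes and schedules are arbitrary).
ROWS, COLS, EPOCHS = 4, 4, 200
weights = {(r, c): [random.uniform(-0.5, 0.5) for _ in range(2 * DIM)]
           for r in range(ROWS) for c in range(COLS)}


def best_unit(x):
    """Grid position of the unit whose weights are closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


for epoch in range(EPOCHS):
    frac = 1 - epoch / EPOCHS
    rate = 0.5 * frac + 0.01    # learning rate shrinks over time
    radius = 2.0 * frac + 0.5   # neighborhood shrinks over time
    for w in vocab:
        x = reps[w]
        br, bc = best_unit(x)
        for (r, c), wv in weights.items():
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
            for k in range(2 * DIM):
                wv[k] += rate * h * (x[k] - wv[k])

# Each word is placed at the grid position of its best-matching unit.
positions = {w: best_unit(reps[w]) for w in vocab}
for w in vocab:
    print(w, positions[w])
```

With a corpus this small the resulting layout says little, but the same steps applied to a large corpus are what allow words that occur in similar contexts to land near one another on the map.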

© Xiaowei Zhao & Ping Li, August, 2010


Emmanuel College