Corpus of Kansai Vernacular Japanese

Getting ready to interview a speaker

Introduction

The Corpus of Kansai Vernacular Japanese is a collection of about 200 (and growing!) sociolinguistic interviews, each conducted by a university student with a family member or close acquaintance in a casual setting. In general, the interviewer and interviewee began the interview in polite, standard Japanese but gradually relaxed and switched to the vernacular style they were accustomed to using with each other. The interviewees range in age from 15 to 83. The corpus has been parsed and tagged with part-of-speech information using the MeCab parser. Every line of data was checked by hand, and mistakes were corrected to the best of our ability. The corpus contains approximately two million words.
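
For readers who want to work with the tagged data programmatically, here is a minimal sketch of reading MeCab-style tagged lines in Python. It assumes MeCab's default output layout (a surface form, a tab, then comma-separated features, with `EOS` closing each sentence) and the IPA dictionary's field order, where the first feature is the part of speech and the seventh is the base form. The corpus files may be laid out differently, so treat this as an illustration only. The `Token` type, the `parse_mecab_lines` function, and the sample lines are hypothetical and hand-written; they are not taken from the corpus.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str  # the word as it appears in the transcript
    pos: str      # top-level part-of-speech tag
    base: str     # dictionary (base) form, or the surface form if absent


def parse_mecab_lines(text: str) -> List[Token]:
    """Parse MeCab-style lines ("surface<TAB>feature,feature,...")."""
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        pos = fields[0] if fields[0] else "UNKNOWN"
        # Assumed IPA-dictionary order: the 7th field is the base form.
        has_base = len(fields) > 6 and fields[6] != "*"
        base = fields[6] if has_base else surface
        tokens.append(Token(surface, pos, base))
    return tokens


# Hand-written illustrative lines, not real corpus data.
SAMPLE = (
    "ほんま\t名詞,一般,*,*,*,*,ほんま,ホンマ,ホンマ\n"
    "や\t助動詞,*,*,*,特殊・ヤ,基本形,や,ヤ,ヤ\n"
    "EOS\n"
)

if __name__ == "__main__":
    for token in parse_mecab_lines(SAMPLE):
        print(token.surface, token.pos, token.base)
```

Keeping each token as a small named tuple makes it easy to count tags or filter by part of speech with ordinary list operations.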

A very small sample (part-of-speech-tagged data)...

A very small sample (untagged data)...