Pontic project: Theoretical problems and modern technologies

Maxim Kisilier

Saint Petersburg University,

Institute for Linguistic Studies RAS

Pontic is one of the most widespread Modern Greek dialects. Besides Turkey, Greece and Australia, there is a major Pontic-speaking community in Russia. Determining its exact size requires a separate investigation, but in the South of Russia alone Pontic is spoken more or less fluently by more than 10,000 people. There are also several thousand speakers of Pontic in Siberia and more than 200 Pontic Greeks each in Moscow and Saint Petersburg.[1]

Although Pontic studies have a relatively long history, we still lack good dictionaries[2] and general grammatical descriptions of Pontic that include a comparative analysis of regional peculiarities and variation.

In 2019, I started a project aimed at creating a set of digital linguistic tools for Pontic studies, designed not only for linguists but also for general users interested in Pontic, so that they can be used for both scholarly and educational purposes. These tools are:

(1) a Pontic online dictionary,

(2) a Pontic language corpus and

(3) a morphological analyzer.

The first (demo) version of the online dictionary (with only 17 randomly chosen entries) was created in December 2019 (http://pontik.tw1.ru/en/, accessed 29.02.2020), and in January 2020 it was presented to the Russian Pontic community, which decided to support the project.

The Pontic online dictionary has a multilingual interface (Russian, Modern Greek, English and Turkish). It provides the user not only with translations from Pontic into one of the four interface languages (the complete version can translate into Pontic as well) and examples of usage, but also with local variants (with sound recordings from native speakers and full paradigms) and information on morphological derivation, etymology, synonyms and antonyms, lexical and semantic classes, etc. Importantly, the dictionary will result from collaboration between linguists and local Pontic communities (I already have assistance from Pontic speakers from South Russia and Trabzon, but I intend to include data from other regions too).
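The entry structure described above can be illustrated with a minimal sketch. It is not the data model of the actual dictionary: the class names, field names, language codes and the `reverse_index` helper are all assumptions made for illustration, and the sample values in any usage are placeholders rather than real Pontic data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed codes for the four interface languages named in the text:
# Russian, Modern Greek, English and Turkish.
INTERFACE_LANGUAGES = ("ru", "el", "en", "tr")


@dataclass
class LocalVariant:
    """A regional variant of a headword (hypothetical structure)."""
    region: str
    form: str
    audio_file: Optional[str] = None  # recording from a native speaker
    paradigm: Dict[str, str] = field(default_factory=dict)  # tag -> form


@dataclass
class Entry:
    """One dictionary entry (hypothetical structure)."""
    headword: str
    translations: Dict[str, List[str]] = field(default_factory=dict)
    examples: List[str] = field(default_factory=list)
    variants: List[LocalVariant] = field(default_factory=list)
    etymology: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    lexical_classes: List[str] = field(default_factory=list)

    def translate(self, lang: str) -> List[str]:
        """Return the glosses of this entry in one interface language."""
        if lang not in INTERFACE_LANGUAGES:
            raise ValueError(f"unsupported interface language: {lang}")
        return self.translations.get(lang, [])


def reverse_index(entries: List[Entry], lang: str) -> Dict[str, List[str]]:
    """Map each gloss in `lang` back to Pontic headwords.

    This is one possible way to support translation into Pontic.
    """
    index: Dict[str, List[str]] = {}
    for entry in entries:
        for gloss in entry.translate(lang):
            index.setdefault(gloss, []).append(entry.headword)
    return index
```

Keeping translations keyed by interface language means that the reverse (into Pontic) lookup can be derived from the same entries instead of being maintained separately.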

The Pontic language corpus will be based on the “Tsakorpus” platform, which was created by Timofey Arkhangelsky in 2018 specifically for the Tsakonian corpus (which was not started for economic reasons), the Corpus of Modern Greek (in progress; an older version of this corpus is still available online: http://web-corpora.net/GreekCorpus/search/?interface_language=en, accessed 29.02.2020) and the Albanian National Corpus (http://albanian.web-corpora.net/index.html, accessed 29.02.2020). The Pontic language corpus will include books published in Pontic in the USSR in 1930–1937, Pontic folklore, field research data, texts from online blogs, and modern Pontic songs and poetry.[3]

The morphological analyzer is based on corpus data rather than on existing linguistic descriptions, and its task is to recognize grammatical forms and compile paradigms (both processes should be performed automatically with the help of examples from the corpus). This tool may become an important aid in the interlinear morphemic glossing of Pontic texts.
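The corpus-driven idea can be sketched in a very simplified form. The sketch assumes that some corpus tokens are already annotated with a lemma and a grammatical tag (for example, from manually glossed examples). The function names, the token format and the tag labels are hypothetical and do not describe the analyzer actually being developed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed token format: (word form, lemma, grammatical tag).
Token = Tuple[str, str, str]
Paradigms = Dict[str, Dict[str, Set[str]]]


def compile_paradigms(tokens: Iterable[Token]) -> Paradigms:
    """Group attested forms by lemma and grammatical tag."""
    paradigms = defaultdict(lambda: defaultdict(set))
    for form, lemma, tag in tokens:
        paradigms[lemma][tag].add(form)
    # Convert to plain dicts so that lookups do not create empty cells.
    return {lemma: dict(cells) for lemma, cells in paradigms.items()}


def build_form_index(paradigms: Paradigms) -> Dict[str, List[Tuple[str, str]]]:
    """Invert the paradigms: each form maps to its possible analyses."""
    index = defaultdict(list)
    for lemma, cells in paradigms.items():
        for tag, forms in cells.items():
            for form in forms:
                index[form].append((lemma, tag))
    return {form: sorted(analyses) for form, analyses in index.items()}


def missing_cells(paradigm: Dict[str, Set[str]], expected_tags: List[str]) -> List[str]:
    """List the expected paradigm cells with no attested form yet."""
    return [tag for tag in expected_tags if tag not in paradigm]
```

In this sketch an ambiguous form simply receives several analyses, and `missing_cells` shows which parts of a paradigm the corpus does not yet attest. Which tags belong in `expected_tags` depends on which grammatical forms are treated as relevant and on how the declension and conjugation types are defined.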

In my talk I do not intend to describe the online dictionary, language corpus and morphological analyzer in detail. Instead, I would like to focus on the linguistic (theoretical) problems I have to deal with while working on the project. These include, for example:

1) Which varieties of Pontic exist now and what differences they show

2) Which grammatical forms are relevant for the morphological analyzer

3) How many declension/conjugation types there are and whether at least some of these types can be combined

4) What the language of Soviet Pontic literature was: a real dialect or some extension of Demotic Greek.

I think that the results of the Pontic project will be important not only for Pontic studies but for other Greek dialects too, and not only as a theoretical background but also as a ready-to-implement technology.

References

Dawkins, R. M. 1916. Modern Greek in Asia Minor. A study of the dialects of Sílli, Cappadocia and Phárasa with grammar, texts and glossary, with a chapter on the subject-matter of the folk-tales by William R. Halliday. Cambridge: Cambridge University Press.

Tursun, V. 2019. Romeika-Türkçe sözlük. Trabzon Rumcası. İstanbul: Heyamola Yayınları.

Зимов, Д. И. 2020. Русско-понтийский словарь обиходной лексики [Russian-Pontic dictionary of everyday vocabulary]. Пятигорск: Пятигорский государственный университет.

[1] It is noteworthy that most Pontic Greeks from the Crimea received Soviet citizenship only in the early 1970s.

[2] Even the recent dictionaries are either very unprofessional, with many structural and grammatical mistakes (Tursun 2019), or based on older dictionaries without taking modern data into account (Зимов 2020).

[3] I also hope that we will be allowed to use the collections of the Centre for Asia Minor Studies in Athens.