TexAFon

TexAFon is a set of Catalan/Spanish text processing tools for:

- Automatic normalization
- Phonetic transcription
- Syllabication
- Prosodic segmentation
- Stress prediction

TexAFon was conceived to perform all the linguistic processing usually applied to the input text in text-to-speech systems. Its output can currently be used to generate speech with the following synthesis engines:

- CereProc
- MBROLA

TexAFon can also be used for other purposes, such as:

- Automatic phonetic transcription of texts
- Automatic building of phonetic dictionaries
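As an illustration of the second use, the sketch below collects the distinct words of a text and pairs each with a transcription. It is not TexAFon code: `build_phonetic_dictionary` and the toy `transcribe` callable passed to it are hypothetical names chosen for this example, and a real transcriber would take the place of the lambda.

```python
import re


def build_phonetic_dictionary(text, transcribe):
    """Map every distinct word in `text` to the result of `transcribe(word)`.

    `transcribe` is any callable taking a word and returning a string;
    it stands in for a real phonetic transcriber (assumption).
    """
    # Lowercase the text and keep runs of letters only, including accented ones.
    words = sorted(set(re.findall(r"[^\W\d_]+", text.lower())))
    return {word: transcribe(word) for word in words}


# Toy transcriber used only for this example: rewrites "v" as "b".
entries = build_phonetic_dictionary(
    "Vino, vino y más vino.", lambda word: word.replace("v", "b")
)
for word, transcription in entries.items():
    print(f"{word}\t{transcription}")
```

Writing one tab-separated pair per line gives a plain text file that can be edited by hand afterwards.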

TexAFon was jointly developed by researchers from the Computational Linguistics Group (GLiCom) of Pompeu Fabra University and the Speech and Language Group at Barcelona Media.

TexAFon is written entirely in Python. Its linguistic knowledge is implemented in the form of:

- Python procedures, containing the linguistic rules;
- Python lists, containing non-editable information;
- External dictionaries, stored in text files (and therefore editable by external users).
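The three forms of knowledge listed above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the names `VOWELS`, `load_exceptions` and `transcribe`, the tab-separated file layout, and the two toy rules are not taken from TexAFon.

```python
import io

# Python list: fixed information that users are not expected to edit.
VOWELS = ["a", "e", "i", "o", "u"]


def load_exceptions(stream):
    """External dictionary: one 'word<TAB>transcription' pair per line.

    Blank lines and lines starting with '#' are skipped (assumed format).
    """
    entries = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, transcription = line.split("\t", 1)
        entries[word] = transcription
    return entries


def transcribe(word, exceptions):
    """Python procedure: dictionary lookup first, then toy rules."""
    if word in exceptions:
        return exceptions[word]
    # Toy rule 1: drop a word-initial "h" that is followed by a vowel.
    if len(word) > 1 and word[0] == "h" and word[1] in VOWELS:
        word = word[1:]
    # Toy rule 2: rewrite "v" as "b".
    return word.replace("v", "b")


# An in-memory stand-in for a user-editable text file.
exceptions = load_exceptions(io.StringIO("# loanwords\nwhisky\twiski\n"))
print(transcribe("whisky", exceptions))  # wiski (from the dictionary)
print(transcribe("hola", exceptions))    # ola (from the rules)
```

Keeping exceptions in a text file means a user can correct a single word without touching the Python rules.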

TexAFon has a modular architecture, which makes it easier to build new applications on top of it, to add new languages, and to connect it to external modules and applications. This architecture clearly distinguishes between:

- A general processing core, which includes the language-independent procedures.
- The language packages (two, for Spanish and Catalan), including modules and dictionaries specific to each language.
- The applications, which call the processing core depending on their needs.
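The separation between core, language packages and applications can be sketched as follows. This is only an illustration of the layering under assumed names (`LanguagePackage`, `register`, `process`, `transcribe_text`) and toy rules; it does not reproduce TexAFon's actual modules or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LanguagePackage:
    """Language-specific behaviour plugged into the core (hypothetical)."""
    code: str
    normalize: Callable[[str], str]
    transcribe: Callable[[str], str]


# Registry of the language packages known to the core.
_PACKAGES: Dict[str, LanguagePackage] = {}


def register(package: LanguagePackage) -> None:
    _PACKAGES[package.code] = package


def process(text: str, lang: str) -> List[str]:
    """Core: language-independent control flow over the tokens."""
    package = _PACKAGES[lang]
    return [package.transcribe(package.normalize(tok)) for tok in text.split()]


# Two toy language packages; the rules are placeholders, not real phonology.
register(LanguagePackage("es", str.lower, lambda w: w.replace("v", "b")))
register(LanguagePackage("ca", str.lower, lambda w: w.replace("ny", "ɲ")))


def transcribe_text(text: str, lang: str) -> str:
    """Application: calls the core and formats the result for its own needs."""
    return " ".join(process(text, lang))


print(transcribe_text("Vino Nuevo", "es"))  # bino nuebo
print(transcribe_text("Catalunya", "ca"))   # cataluɲa
```

In a layout like this, adding a language amounts to registering one more package, while the core and the applications stay unchanged.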

Here is an example of input and output of TexAFon for a short text in Spanish:

TexAFon 2.0 includes new features aimed at generating expressive synthetic speech:

- Normalization of non-standard text
- Automatic detection of emotional contents
- Automatic identification of speech acts

More details about the architecture and implementation of TexAFon can be found in Garrido et al. (2012).

Two public-domain tools have been developed and released using TexAFon as core code:

- TransText, for the phonetic transcription of texts in Catalan and Spanish.
- TransDic, for the development of phonetised dictionaries.
