Implementing

This section discusses the implementation of Kondrak's similarity function to compute phoneme similarity scores based on phonological criteria. We've used this function to create our phonological similarity matrices (with minor modifications that made the similarity scores work better for our audio alignment task, or that made the implementation more coherent with our alignment system).

Similarity Function

The similarity function was devised by Kondrak in order to perform cognate alignment.

The similarity function from Kondrak (2002) is shown below. We used the clauses for σ_sub and σ_skip, but not the clause written in grey (σ_exp). The latter evaluates two-to-one alignment operations, which are useful in cognate alignment, but we did not implement it in our audio alignment system.

We also modified the definition of V(p) in the formula (see "Other observations" below), getting slightly better results in audio alignment.

Depending on the experiment, we've worked with two different versions of σ_skip (both different from the original and from each other), as discussed below.

Kondrak, 2002, p. 54: Similarity function
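
For reference, the function can be written as follows. This is a transcription of the formula as it is commonly stated for ALINE, not a copy of the figure, so it should be checked against the cited page. Here salience(f) is the per-feature weight from Kondrak's formulation and C_exp is the constant for the unused σ_exp clause; neither is discussed further on this page.

```latex
\begin{aligned}
\sigma_{skip}(p) &= C_{skip} \\
\sigma_{sub}(p, q) &= C_{sub} - \delta(p, q) - V(p) - V(q) \\
\sigma_{exp}(p, q_1 q_2) &= C_{exp} - \delta(p, q_1) - \delta(p, q_2) - V(p) - \max\bigl(V(q_1), V(q_2)\bigr) \\[4pt]
V(p) &= \begin{cases} 0 & \text{if } p \text{ is a consonant} \\ C_{vwl} & \text{otherwise} \end{cases} \\
\delta(p, q) &= \sum_{f \in R} \mathrm{diff}(p, q, f) \times \mathrm{salience}(f) \\
R &= \begin{cases} R_c & \text{if } p \text{ or } q \text{ is a consonant} \\ R_v & \text{otherwise} \end{cases}
\end{aligned}
```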

The clauses and parameters of the function have the following meanings:

σ_sub(p, q) returns the similarity score for phonemes p, q
- C_sub is the maximum possible substitution cost. It is determined heuristically. For our matrices, we set it to 3500, which corresponds to Kondrak's value of 35: because we divide the output of σ_sub by 100 for implementation reasons, the similarity scores fall in the same range.
- C_vwl is a constant that reflects the relative weight of vowels and consonants in alignment. See "Other observations" below for details. We set C_vwl to 1000, which is consistent with Kondrak's original value.
diff(p, q, f) returns the difference between phonemes p and q for feature f.
- R is the feature set to use. It is configurable, as are its subsets R_c and R_v
  - R_c is the feature-subset to use when a consonant is involved in the similarity comparison
  - R_v is the feature-subset to use when only vowels are involved in the comparison
  - These links contain the feature sets we used for Spanish, English and Basque
- σ_skip is not used for computing the matrix. Rather, it is the gap (insertion or deletion) penalty that the aligner needs to apply in alignment. Depending on the experiment, we've used two different ways to calculate σ_skip.
  - 1) C_skip / 100. This was done for the same implementation reasons that led us to divide σ_sub by 100. When we used this definition, C_skip was set to 1000, yielding a gap penalty of 10 when aligning.
  - 2) ceiling( | C_sub / 400 | ). This definition was used in a set of experiments where alignment results with the phonological similarity matrices were compared to alignment results with the perceptual and decoder-based similarity matrices. The range for the last two types of matrices was 0-1000 (much wider than the range of approximately -50 to +35 in Kondrak's matrices), and for consistency we wanted the gap penalty to be in a similar proportion to the maximum substitution cost in all three matrices. The ratio between the gap penalty and the maximum substitution cost in our phonological similarity matrices was approximately 1/4 (i.e. 10/35). Accordingly, we defined σ_skip as 1/4 × (C_sub/100) when running the aligner to compare results across the three types of matrices.
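
The clauses described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not our actual implementation: the phoneme table, the feature values and the salience weights are hypothetical toy values chosen only for illustration (the real ones come from the feature specification tables), and V(p) follows Kondrak's original definition rather than our modified one. The constants C_SUB, C_VWL and C_SKIP use the values given in the text.

```python
import math

# Constants as given in the text (integer-scaled; scores are divided by 100).
C_SUB = 3500
C_VWL = 1000
C_SKIP = 1000

# ASSUMPTION: illustrative salience weights and feature subsets, not the project's tables.
SALIENCE = {"place": 40, "manner": 50, "voice": 10,
            "high": 5, "back": 5, "round": 5, "long": 1}
R_C = ("place", "manner", "voice")        # used when a consonant is involved
R_V = ("high", "back", "round", "long")   # used when both segments are vowels

# ASSUMPTION: hypothetical toy phonemes with integer feature values (0-100).
# Vowels carry both Place/Manner values and High/Back/Round/Long values.
PHONEMES = {
    "p": {"vowel": False, "place": 100, "manner": 100, "voice": 0},
    "b": {"vowel": False, "place": 100, "manner": 100, "voice": 100},
    "a": {"vowel": True, "place": 60, "manner": 0, "voice": 100,
          "high": 0, "back": 50, "round": 0, "long": 0},
    "i": {"vowel": True, "place": 70, "manner": 40, "voice": 100,
          "high": 100, "back": 0, "round": 0, "long": 0},
}


def diff(p, q, f):
    """Difference between phonemes p and q for feature f."""
    return abs(PHONEMES[p][f] - PHONEMES[q][f])


def v(p):
    """Kondrak's original V(p): 0 for consonants, C_VWL for vowels."""
    return C_VWL if PHONEMES[p]["vowel"] else 0


def delta(p, q):
    """Salience-weighted feature distance over R_V (vowel pairs) or R_C."""
    both_vowels = PHONEMES[p]["vowel"] and PHONEMES[q]["vowel"]
    features = R_V if both_vowels else R_C
    return sum(diff(p, q, f) * SALIENCE[f] for f in features)


def sigma_sub(p, q):
    """Similarity score for phonemes p and q, rescaled by 1/100."""
    return (C_SUB - delta(p, q) - v(p) - v(q)) / 100


def gap_penalty_fixed():
    """Definition 1: C_skip / 100."""
    return C_SKIP / 100


def gap_penalty_proportional():
    """Definition 2: ceiling(|C_sub / 400|)."""
    return math.ceil(abs(C_SUB / 400))


# The similarity matrix is sigma_sub evaluated over all phoneme pairs.
matrix = {(p, q): sigma_sub(p, q) for p in PHONEMES for q in PHONEMES}
```

With these toy values, sigma_sub("p", "b") is 25.0 and sigma_sub("p", "a") is -51.0, while the two gap-penalty definitions give 10 and 9 respectively.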

Other observations

We modified the definition of V(p) as depicted below.

The reason for this modification: when C_vwl = 0, vowels and consonants have the same weight, and the matrices have the same score throughout their diagonal, with vowel matches obtaining the same score as consonant matches. However, setting C_vwl to 0 is generally not desirable, since it increases similarity scores between consonants and vowels, somewhat blurring differences between very dissimilar segments. The modified definition of V(p) yields a diagonal with a single value even when C_vwl > 0.
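As a worked illustration of the uneven diagonal, assume Kondrak's original definition of V(p) (0 for consonants, C_vwl for vowels), together with C_sub = 3500, C_vwl = 1000 and the division by 100 described on this page. A phoneme compared with itself has no feature differences, so only the V terms remain:

```latex
\sigma_{sub}(p, p) = \frac{C_{sub} - 2\,V(p)}{100} =
\begin{cases}
3500 / 100 = 35 & \text{if } p \text{ is a consonant} \\
(3500 - 2 \cdot 1000) / 100 = 15 & \text{if } p \text{ is a vowel}
\end{cases}
```

With C_vwl = 0, both cases give 35.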
Given the way R and its subsets are defined, the characteristics of vowels are described with two sets of features. Vowels bear Place and Manner specifications that apply when they are compared with consonants (R_c comparison). They also bear High, Back, Round and Long features, used when vowels are compared with other vowels (R_v comparison). See our feature specification tables for examples of this double specification for vowels.
To reproduce our similarity scores with the function, taking our feature specifications as input, the output of σ_sub(p, q) needs to be divided by 100, as does C_skip. This way we can work with integer feature values instead of decimals, and obtain scores in the same range as when using Kondrak's feature values, which have two decimal places.
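
The equivalence between the two scales can be illustrated with a small sketch. It compares a single feature (Place) under decimal values with C_sub = 35 and under integer values with C_sub = 3500 followed by division by 100. The function names, the salience weight of 40 and the feature values 0.95/0.85 (95/85) are assumptions made for this example only.

```python
import math

# ASSUMPTION: illustrative salience weight for a single feature (Place).
SALIENCE_PLACE = 40


def score_decimal(place_p, place_q, c_sub=35.0):
    """Score with decimal feature values in [0, 1] and C_sub = 35."""
    return c_sub - abs(place_p - place_q) * SALIENCE_PLACE


def score_integer(place_p, place_q, c_sub=3500):
    """Score with integer feature values in [0, 100] and C_sub = 3500.

    All arithmetic stays in integers until the final division by 100.
    """
    raw = c_sub - abs(place_p - place_q) * SALIENCE_PLACE
    return raw / 100


decimal_score = score_decimal(0.95, 0.85)
integer_score = score_integer(95, 85)

# Both scales land in the same score range; the decimal version is only
# guaranteed to match up to floating-point rounding.
same_range = math.isclose(decimal_score, integer_score)
```

The integer version avoids accumulating floating-point rounding in the feature differences, which is one practical reason to prefer it.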

Implementing

Besides Kondrak's work, other projects that have applied the similarity metric include Comas (2012), in the field of spoken document retrieval, and Huff (2010), in computational applications for historical linguistics. Both contain useful information for implementation.

A useful tool to test how to implement Kondrak's similarity function is P. Huff's PyAline, a Python implementation of Kondrak's cognate alignment system (ALINE).
