This section describes our implementation of Kondrak's similarity function, which computes phoneme similarity scores from phonological criteria. We've used this function to create our phonological similarity matrices, with minor modifications that either made the similarity scores work better for our audio alignment task or made the implementation more consistent with our alignment system.
The similarity function was devised by Kondrak in order to perform cognate alignment.
The similarity function from Kondrak (2002) is shown below. We used the clauses for σsub and σskip, but not the clause written in grey (σexp). The latter evaluates two-to-one alignment operations, which are useful in cognate alignment but which we did not implement in our audio alignment system.
We also modified the definition of V(p) in the formula (see "Other observations" below), getting slightly better results in audio alignment.
Depending on the experiment, we've worked with two different versions of σskip. Both differ from the original and from each other, as discussed below.
Kondrak, 2002, p. 54: Similarity function
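As a rough illustration of the two clauses we used, here is a minimal Python sketch. It assumes the commonly cited form of the original clauses, σskip(p) = Cskip and σsub(p, q) = Csub - δ(p, q) - V(p) - V(q), where δ is a salience-weighted sum of feature differences and V(p) penalizes vowels. The constant values, the three-feature table and the salience weights are placeholders chosen for the example, and all function names are hypothetical. The sketch also uses a single feature set for every phoneme pair, and it does not reproduce our modified V(p) or either of our σskip variants.

```python
# Placeholder constants (assumptions for illustration only).
C_SKIP = -10  # score for skipping a phoneme
C_SUB = 35    # maximum score for a substitution
C_VWL = 10    # vowel penalty returned by V(p)

# Hypothetical toy feature table: feature values lie in [0, 1].
FEATURES = {
    "p": {"syllabic": 0.0, "voice": 0.0, "nasal": 0.0},
    "b": {"syllabic": 0.0, "voice": 1.0, "nasal": 0.0},
    "m": {"syllabic": 0.0, "voice": 1.0, "nasal": 1.0},
    "a": {"syllabic": 1.0, "voice": 1.0, "nasal": 0.0},
}

# Hypothetical salience weights: how much each feature matters.
SALIENCE = {"syllabic": 5, "voice": 10, "nasal": 10}


def V(p):
    """Return the vowel penalty: C_VWL for vowels, 0 for consonants."""
    return C_VWL if FEATURES[p]["syllabic"] == 1.0 else 0


def delta(p, q):
    """Salience-weighted sum of absolute feature differences."""
    return sum(
        abs(FEATURES[p][f] - FEATURES[q][f]) * weight
        for f, weight in SALIENCE.items()
    )


def sigma_skip(p):
    """Score for aligning phoneme p with nothing (constant in this sketch)."""
    return C_SKIP


def sigma_sub(p, q):
    """Score for aligning phoneme p with phoneme q."""
    return C_SUB - delta(p, q) - V(p) - V(q)


if __name__ == "__main__":
    for pair in [("p", "p"), ("p", "b"), ("p", "m"), ("a", "a")]:
        print(pair, sigma_sub(*pair))
```

With these placeholder values, identical consonants receive the maximum score, the score drops as the phonemes differ in more salient features, and identical vowels score lower than identical consonants because V(p) is subtracted for each vowel.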
The meaning of the function's clauses and parameters is as follows:
Besides Kondrak's own work, other projects that have applied the similarity metric include Comas (2012), in the field of spoken document retrieval, and Huff (2010), in computational applications for historical linguistics. Both contain useful information for implementation.
A useful tool to test how to implement Kondrak's similarity function is P. Huff's PyAline, a Python implementation of Kondrak's cognate alignment system (ALINE).