Average cosine similarity over 100 random draws for each singer
[Figure panels: PCA, t-SNE]
CONT-VC was the best-performing model on singer similarity for VocalSet, followed by BYOL.
[Figure axis labels (singers): female2, female3, female4, female7, female8, female9, male1, male3, male4, male6, male8, male9]
Most models can easily distinguish male from female voices. Within each group, however, the task is much harder.
Female4 has a lower-pitched voice than most other female singers in the dataset, and all of our trained models capture this.
In VocalSet, similarity is higher overall between female voices than between male voices; that is, the models find it harder to distinguish between female voices than between male voices.
This could be explained by greater diversity of voice types among the male singers.
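
The "average cosine similarity over 100 random draws" reported above could be computed roughly as in the sketch below. This is only an illustration under assumptions that the source does not confirm: each singer is assumed to have a list of clip-level embedding vectors, each draw is assumed to pick one random clip from each of the two singers (with replacement), and the names `cosine_similarity`, `average_similarity`, `similarity_matrix`, and the `embeddings` dictionary layout are all hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_similarity(embeddings, singer_a, singer_b, n_draws=100, seed=0):
    """Mean cosine similarity over n_draws random clip pairs.

    embeddings: dict mapping a singer label (e.g. "female4") to a list of
    embedding vectors. This layout is an assumption made for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    total = 0.0
    for _ in range(n_draws):
        # Assumed sampling scheme: one random clip per singer, with replacement.
        u = rng.choice(embeddings[singer_a])
        v = rng.choice(embeddings[singer_b])
        total += cosine_similarity(u, v)
    return total / n_draws


def similarity_matrix(embeddings, n_draws=100, seed=0):
    """Singer-by-singer matrix of average similarities (rows and columns sorted by label)."""
    singers = sorted(embeddings)
    matrix = [
        [average_similarity(embeddings, a, b, n_draws, seed) for b in singers]
        for a in singers
    ]
    return singers, matrix
```

With this layout, a higher value in the female-female block of the matrix than in the male-male block would correspond to the observation that female voices are harder to tell apart. Note that when both singers are the same, a draw can pick the same clip twice, which pushes the diagonal toward 1; whether the original evaluation excluded such pairs is not stated.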