Cross-Era Corpus Analysis

This is supplementary material for the paper "Unveiling High-level Discriminant Harmonic Features of Musical Style in the Tonal Interval Space", submitted to the International Conference on Music Perception and Cognition (ICMPC) 2021.

Style is one of the most prominent musical traits in distinguishing historical periods, composers, musicians, sonic texture, emotion, and genre. In recent years, the automatic recognition and synthesis of musical styles have been extensively pursued. Most existing work focuses on low-level perceptual characteristics that do not reflect the hierarchical nature of harmony, namely the temporal structure of harmonic progressions. In this project, we aim to unveil musicologically and perceptually inspired harmonic descriptors in the Tonal Interval Space that best discriminate musical eras within the Western classical tradition. We consider descriptors that capture the vertical and horizontal harmonic structure at multiple time scales, which in harmonic terms can be understood as harmonic quality and harmonic progressions (notably including voice leading). The selection of representative discriminant harmonic features is data-driven and uses the Cross-Era dataset, which includes 200 tracks per representative Western historical music period (Baroque, Classical, Romantic, and Modern), and two instrumentations (orchestra and piano).

Harmonic Descriptors

Next, we list the full set of harmonic descriptors adopted in the study and detail their mathematical definition and musical interpretation in the Tonal Interval Space. Each musical track is represented by a collection of harmonic descriptors, each summarized by descriptive statistics of its time-varying values. We focus on robust statistics: the median (med) as a measure of central tendency and the interquartile range (IQR) as a measure of the variability of the descriptor's distribution. The audio frames from which we compute Tonal Interval Vectors (TIVs) are derived from NNLS chromagrams with a window size of 8192 samples and a hop size of 4410 samples, assuming a sample rate of 44.1 kHz (i.e., one frame every 0.1 s).
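The frame-level pipeline above can be sketched in Python as follows. This is a minimal illustration, not the study's code: it assumes the usual TIV definition (weighted DFT coefficients k = 1..6 of a 12-bin chroma vector normalized by its sum) and assumes the weight values in `WEIGHTS`, which should be checked against the paper. The helper names `tiv` and `summarize` and the zero-vector handling of silent frames are hypothetical choices; the median and IQR come from NumPy percentiles.

```python
import numpy as np

# Assumed TIV weights for coefficients k = 1..6 (audio chroma); verify
# against the values used in the study before reuse.
WEIGHTS = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])

SAMPLE_RATE = 44100  # Hz
WINDOW = 8192        # samples per NNLS chroma frame (~0.186 s)
HOP = 4410           # samples between frame starts (0.1 s)


def tiv(chroma):
    """Complex TIV coefficients T(1)..T(6) of one 12-bin chroma frame."""
    c = np.asarray(chroma, dtype=float)
    total = c.sum()
    if total == 0:
        # Hypothetical choice: a silent frame maps to the zero TIV.
        return np.zeros(6, dtype=complex)
    c = c / total                              # normalize by DFT coefficient 0
    n = np.arange(12)                          # pitch-class index
    k = np.arange(1, 7)[:, None]               # coefficients k = 1..6
    dft = (c * np.exp(-2j * np.pi * k * n / 12)).sum(axis=1)
    return WEIGHTS * dft


def summarize(values):
    """Robust summary of a time-varying descriptor: (median, IQR)."""
    q25, q50, q75 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return q50, q75 - q25
```

Applied frame by frame, `tiv` yields the TIV sequence from which the descriptors are computed, and `summarize` reduces a descriptor's frame-wise values to the (med, IQR) pair that represents a track.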

Dissonance (diss): provides a perceptually inspired indicator of dissonance, computed as one minus the normalized (weighted) TIV magnitude. The descriptor output lies in the [0,1] range, where 1 corresponds to highly dissonant audio (a 12-pitch-class cluster) and 0 to very consonant audio.
Chromaticity (chromatic): indicates the level of chromatic quality of a given audio frame as the magnitude of the TIV in the chromatic pitch circle. It is computed as the magnitude of T(1) normalized to unity (range [0,1]).
Dyadicity (dyad): indicates the level to which a given audio frame embeds the tritone quality. High dyadicity values indicate sonorities consisting of stacked perfect and augmented fourths. It is computed as the magnitude of T(2) normalized to unity (range [0,1]).
Triadicity (triad): indicates the level to which a given audio frame is composed of major and minor triads. It is computed as the magnitude of T(3) normalized to unity (range [0,1]).
Diminished quality (dim): indicates the level to which a given audio frame complies with a diminished seventh chord. It is computed as the magnitude of T(4) normalized to unity (range [0,1]).
Diatonicity (diatonic): indicates the level of diatonicity of a given audio frame. It is computed as the magnitude of T(5) normalized to unity (range [0,1]).
Whole-toneness (wholeTone): indicates the level to which a given audio frame complies with a whole-tone set. It is computed as the magnitude of T(6) normalized to unity (range [0,1]).
HCDF peaks (hcdfPeaks): the magnitude of the peaks of the Harmonic Change Detection Function (HCDF) computed in the Tonal Interval Space. Larger values indicate harmonic changes spanning a greater perceptual distance between two consecutive chords.
Harmonic Rhythm (hRhythm): the interval (in frames) between consecutive HCDF peaks, thus providing a coarse indicator of harmonic rhythm.
Euclidean/Cosine Tonal Dispersion (eucTDispersion and cosTDispersion): the Euclidean or cosine distance of each audio frame's TIV from the tonal center. The tonal-center TIV is computed from the mean of each pitch-class (pc) bin over a longer musical excerpt (or the whole track).
Euclidean distance (eucTIV): the Euclidean distance between the TIVs of consecutive audio frames. It provides a measure of perceptual relatedness that reflects the relation between the interval content of the TIVs.
Cosine distance (cosTIV): the cosine distance between the TIVs of consecutive audio frames. It provides a measure of the common tones shared across TIVs, roughly related to voice leading.
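As a sketch of the frame-level descriptors listed above (diss and the six T(k) qualities), the code below divides each |T(k)| by its weight, so a frame holding a single pitch class scores 1 on every quality, and computes dissonance as one minus the TIV norm divided by the norm of the weights. The weight values, the helper names, and the binary `pcs_to_chroma` input are assumptions for illustration; real frames would be NNLS chroma vectors.

```python
import numpy as np

# Assumed TIV weights for k = 1..6 (verify against the paper).
WEIGHTS = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])
NAMES = ["chromatic", "dyad", "triad", "dim", "diatonic", "wholeTone"]


def tiv(chroma):
    """Complex TIV coefficients T(1)..T(6); assumes a non-silent frame."""
    c = np.asarray(chroma, dtype=float)
    c = c / c.sum()
    n = np.arange(12)
    k = np.arange(1, 7)[:, None]
    return WEIGHTS * (c * np.exp(-2j * np.pi * k * n / 12)).sum(axis=1)


def frame_descriptors(chroma):
    t = tiv(chroma)
    # |T(k)| / w(k) lies in [0, 1] because the normalized chroma sums to 1.
    desc = dict(zip(NAMES, np.abs(t) / WEIGHTS))
    # Dissonance: 0 for a single pitch class (maximal TIV norm) and 1 for the
    # 12-pitch-class cluster, whose TIV is the zero vector.
    desc["diss"] = 1.0 - np.linalg.norm(t) / np.linalg.norm(WEIGHTS)
    return desc


def pcs_to_chroma(pcs):
    """Hypothetical helper: binary chroma vector for pitch classes (0 = C)."""
    c = np.zeros(12)
    c[list(pcs)] = 1.0
    return c
```

For instance, a diminished seventh chord, `frame_descriptors(pcs_to_chroma([0, 3, 6, 9]))`, yields dim = 1 and chromatic = 0, while `frame_descriptors(np.ones(12))` yields diss = 1 (up to rounding).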
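The sequence-level descriptors in the list above can be sketched in the same way. This version rests on simplifying assumptions that may differ from the study: the HCDF is taken as the Euclidean distance between the TIVs of frames n-1 and n+1 without the Gaussian smoothing of the original HCDF; a peak is a value above its predecessor and not below its successor; complex TIVs are stacked into 12-dimensional real vectors [Re | Im] for the cosine distance; and the tonal center is the TIV of the mean chroma vector. The weights and function names are hypothetical.

```python
import numpy as np

# Assumed TIV weights for k = 1..6 (verify against the paper).
WEIGHTS = np.array([3.0, 8.0, 11.5, 15.0, 14.5, 7.5])


def tivs(chroma):
    """TIVs of a (frames x 12) chromagram as real 12-D vectors [Re | Im]."""
    c = np.asarray(chroma, dtype=float)
    c = c / c.sum(axis=1, keepdims=True)                # assumes no silent frames
    n, k = np.arange(12), np.arange(1, 7)
    basis = np.exp(-2j * np.pi * np.outer(n, k) / 12)   # (12 pcs, 6 coefficients)
    t = (c @ basis) * WEIGHTS
    return np.hstack([t.real, t.imag])


def cosine_distance(a, b):
    """Row-wise cosine distance; b may be a single vector (broadcast)."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - num / den


def temporal_descriptors(chroma):
    chroma = np.asarray(chroma, dtype=float)
    t = tivs(chroma)
    # Simplified HCDF: distance between the TIVs of frames n-1 and n+1.
    hcdf = np.linalg.norm(t[2:] - t[:-2], axis=1)
    mid = hcdf[1:-1]
    peaks = np.flatnonzero((mid > hcdf[:-2]) & (mid >= hcdf[2:])) + 1
    # Tonal center: TIV of the mean chroma vector over the whole excerpt.
    centre = tivs(chroma.mean(axis=0, keepdims=True))[0]
    return {
        "hcdfPeaks": hcdf[peaks],
        "hRhythm": np.diff(peaks),                      # inter-peak interval in frames
        "eucTDispersion": np.linalg.norm(t - centre, axis=1),
        "cosTDispersion": cosine_distance(t, centre),
        "eucTIV": np.linalg.norm(t[1:] - t[:-1], axis=1),
        "cosTIV": cosine_distance(t[1:], t[:-1]),
    }
```

With a chromagram holding five frames of C major, five of G major, and five of C major again, this sketch detects two HCDF peaks and returns hRhythm = [5], matching the chord duration in frames.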

Harmonic Descriptor Analysis

The interactive box plot below presents the harmonic descriptors' statistics per era and instrumentation. To isolate a particular descriptor, double-click its entry in the plot legend; a single click adds or removes a descriptor from the plot.

To assess the intercorrelations among the harmonic audio descriptors and the number of groups of independent descriptors, we modeled the between-descriptor distances with two distance models: hierarchical clustering and metric multidimensional scaling (MDS). Both models are computed from a square matrix of absolute correlation distances between all pairs of harmonic audio descriptors.
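A dependency-light sketch of this analysis follows. It assumes the absolute correlation distance is 1 - |r| (Pearson r between descriptor columns), so strongly positively or negatively correlated descriptors end up close together. It swaps in classical (Torgerson) MDS as a closed-form metric MDS, which may differ from the MDS variant used in the study, plus a small average-linkage agglomerative clustering; the linkage method and the `threshold` cut are hypothetical parameters. SciPy's `scipy.cluster.hierarchy.linkage` provides the same average linkage for real use.

```python
import numpy as np


def abs_correlation_distance(X):
    """Square matrix of 1 - |Pearson r| between the columns (descriptors) of X."""
    return 1.0 - np.abs(np.corrcoef(X, rowvar=False))


def classical_mds(D, dims=2):
    """Classical (Torgerson) metric MDS embedding of a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]        # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


def average_linkage_clusters(D, threshold):
    """Agglomerative clustering (average linkage), merging while distance < threshold."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        if best >= threshold:
            break
        a, b = pair                               # a < b, so popping b keeps index a valid
        merged = clusters.pop(b)
        clusters[a].extend(merged)
    return clusters
```

Here `X` would be a tracks × descriptors matrix of med/IQR values; with `threshold=0.5`, descriptor groups are merged while their average |r| exceeds 0.5.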

Finally, we performed a cross-validated logistic regression to rank the importance of each descriptor in discriminating the four Western historical music periods (Baroque, Classical, Romantic, and Modern).
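The ranking step can be sketched as below. The classifier settings are not specified here, so this is a plain NumPy stand-in under stated assumptions: a multinomial (softmax) logistic regression trained by full-batch gradient descent with a small L2 penalty, k-fold cross-validation with standardization fitted on the training folds only, and feature importance taken as the mean absolute coefficient across classes and folds. The learning rate, epochs, penalty, and function names are hypothetical; scikit-learn's `LogisticRegression` with `cross_validate` is the usual off-the-shelf alternative.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_logreg(X, y, n_classes, lr=0.5, epochs=500, l2=1e-3):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                    # one-hot targets
    for _ in range(epochs):
        grad = (softmax(X @ W + b) - Y) / n     # d(mean cross-entropy)/d(logits)
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def cv_feature_ranking(X, y, names, k=5, seed=0):
    """Mean k-fold accuracy and features ranked by mean |coefficient|."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    n_classes = int(y.max()) + 1
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs, importance = [], np.zeros(X.shape[1])
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd  # scale on train folds only
        W, b = fit_logreg(Xtr, y[train], n_classes)
        accs.append(np.mean(np.argmax(Xte @ W + b, axis=1) == y[test]))
        importance += np.abs(W).mean(axis=1) / k
    order = np.argsort(importance)[::-1]
    return float(np.mean(accs)), [(names[j], float(importance[j])) for j in order]
```

Here `X` would hold one row per track and one column per descriptor statistic, `y` the era labels encoded as 0 to 3, and `names` hypothetical column labels such as "diss_med" or "diss_IQR".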