Discussion Ideas

For guidelines on how to edit the pages, add new ideas for discussion, or comment on existing ones, please visit our FAQ page. For a better idea of the purpose of this wiki, please visit our About page. Participants are encouraged to create and organize pages on this wiki to accommodate new topics for discussion and to better document what is going on during the workshop.

Below is a preliminary list of ideas for discussion. We will combine the proposed topics as best we can to create a draft agenda for the break-out sessions during the workshop. We plan to discuss this agenda at the end of day 1.

Evaluation of computational approaches for rhythm analysis:

Most algorithmic evaluations in MIR rhythm research, e.g. for (down)beat tracking, meter estimation or rhythm similarity, assume the existence of a "ground-truth" analysis of the music in a given signal, against which all algorithms can be benchmarked for robustness. This is, of course, a problematic assumption: research in music psychology and theory, as well as the experience of the MIR community, clearly shows that there is no single analysis that all listeners would subscribe to, and that the variability of those analyses depends, to an unknown extent, on a range of both endogenous and exogenous factors. So, how can we characterize the range of possible analyses during the process of annotation? How can we modify existing evaluation methods to embrace that variability? In fact, since it is impossible for any given method to recreate all possible analyses, what should automatic approaches aim for? What can we learn from other fields of music research and practice to improve our current standards? --- proposed by Juan P. Bello

  • As an additional thought (and example comment), the problem of evaluation is even worse for tasks where "ground-truth" data (however flawed) is very hard to collect, so researchers turn to proxy analyses or labels. One example is rhythm similarity, where it is difficult to collect, at scale, information about how each recording relates to all, or a subset of, the others in a music collection. As a result, researchers use genre labels as a proxy for rhythm similarity, and focus on styles (e.g. Latin American, African, Turkish, Ballroom, Electronica) thought to be mostly defined by rhythm. (Juan P. Bello)

Evaluation of automatic downbeat detection systems

We (Matthew Davies, Sebastian Böck, Andre Holzapfel, and Florian Krebs) started a new MIREX task this year where participants were asked to detect downbeats from audio files of different genres. In order to improve the outcome of this task for future years, we would like to discuss the following topics:

    • Downbeat tracking evaluation metrics

    • As there is no common way to evaluate downbeat tracking, we decided to use the simplest metric for this year's MIREX: a downbeat F-measure. We are aware that other metrics (such as the continuity-based metrics used for beat tracking) could also be useful for a better understanding of the circumstances under which, and the reasons why, systems fail.

      • I'm interested in this. We could take into account that:

      • - We may not need the same temporal precision as for other tasks such as onset detection or beat tracking. I could show some audio examples of that.

      • - Downbeat positions are often subjective and not innate. Most people can feel beats or onsets but have trouble estimating downbeats. Besides, there are often octave differences between annotators.

      • - It is often related to the genre. Beat tracking tends to be easier than downbeat tracking on Electro, Rap and Dance music, while we observe the opposite on Classical music, for example.

      • - Downbeat tracking currently relies almost entirely on a prior segmentation, which can heavily degrade results because of that segmentation rather than the downbeat tracking itself. For example, on the RWC Classical music dataset, estimating downbeat positions from ground-truth beat positions boosts results from 45% to 85% compared to the case where we don't have access to this information.

      • From that, we could adapt the evaluation measures, or find other evaluation procedures based on the answers of multiple experts/listeners to embrace the variability of genre/precision/expressiveness/octave estimations (this part may be connected to the topic "Evaluation of computational approaches for rhythm analysis"). In the latter case, this raises the questions of how to collect these annotations (this part may be connected to the topic "Data acquisition from listeners/experts") and how to analyze them. (Simon Durand)
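For concreteness, the F-measure mentioned above can be sketched in a few lines. This is only a minimal illustration, not the MIREX reference implementation; it assumes a fixed tolerance window (here ±70 ms, a common choice for beat-level tasks) and greedy one-to-one matching of estimates to annotations:

```python
def downbeat_f_measure(estimated, reference, tolerance=0.07):
    """F-measure for downbeat detection: an estimate counts as a hit
    if it falls within +/- tolerance (seconds) of a reference downbeat
    that has not already been matched."""
    matched = set()
    hits = 0
    for est in estimated:
        for i, ref in enumerate(reference):
            if i not in matched and abs(est - ref) <= tolerance:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The variability discussed above (precision requirements, octave/metrical-level ambiguity) could then be explored simply by widening the tolerance window or by scoring against several plausible annotations per piece.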

    • How to deal with the problem of overfitting

    • More and more systems rely on training data to set their parameters. If the test data is partly used as training data, the reported scores can be overly optimistic because of overfitting. How can we make sure that we get a fair comparison of systems?

    • (Florian Krebs)
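As one illustration of the point above, a standard safeguard is k-fold cross-validation, where each recording is tested exactly once by a model that never saw it during training. A minimal sketch (the dataset items and fold count are hypothetical placeholders, not from any MIREX setup):

```python
import random

def k_fold_splits(items, k=5, seed=42):
    """Split a dataset into k disjoint train/test folds so that every
    item is tested exactly once and never appears in its own training set."""
    items = list(items)
    random.Random(seed).shuffle(items)          # fixed seed: reproducible folds
    folds = [items[i::k] for i in range(k)]     # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

In practice, the harder question for a shared task is enforcing that submitted systems were not tuned on the evaluation data at all, which no split function can guarantee.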

Collection of audio material:

Related to the topic above is the problem of identifying, assembling, and preparing digital archives of recorded performances. Several large analog archives of music from the world's cultures are extant, as are some digitized libraries. What are these resources, how are they organized, how can they be accessed, what metadata exists for them, and what computational analyses might they support? What additional resources might be needed to add to the available digital collections of such material, and how should those efforts be organized in such a way as to make the newly available material maximally interoperable with what already exists? (Robert Rowe)

    • I second this topic and would add that I am also interested in discussing collections that include not only audio but multimodal information such as video and motion capture. I would like to know whether such collections are currently being developed (in fact I am aware of some) and how they deal with the questions raised above. (Martín Rocamora)

    • Also related to this topic, I think it would be interesting to discuss experiences with motion capture (mocap) systems, including low-cost ones such as Kinect, Leap Motion (leapmotion.com) and DUO (duo3d.com). Motion data can be very valuable for studying performance and improvisation. (Martín Rocamora)

Data acquisition from listeners/experts:

Time-aligned listener annotations on music audio pieces are useful in many tasks. An important type of data that can be collected is tapping data, where users “tap” in real time to a piece of music to indicate their perception of various rhythmic events, such as beats, downbeats, or position in a metrical cycle. The tapping data can be obtained from music experts, in which case we can use it as ground truth. It can also be obtained from listeners and non-experts, in which case we can use it in many other types of comparative analyses. There are two potential topics of discussion in such a scenario:

  1. What is the methodology to design and build tools that can collect tapping data?

  2. How do we specify the tasks and design tapping experiments to achieve reliable and representative tapping data?

For example, consider a case where we wish to collect listeners’ tapping responses that track the progression through metrical cycles. We then want to use this as a benchmark for “human performance” that an algorithm can be compared against. In such a case, how would we define an experiment (choose subjects, music pieces, interfaces) so that the tapping responses we obtain can be used as a benchmark? How do we design interfaces for such a task?
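One simple way to summarize such a benchmark is the distribution of tap-to-event offsets. The sketch below assumes taps and reference events are given as sorted lists of times in seconds; the function and variable names are illustrative, not taken from any existing annotation tool:

```python
from bisect import bisect

def tap_deviations(taps, reference):
    """For each tap, return the signed offset (in seconds) to the
    nearest reference event; the spread of these offsets is one crude
    summary of tapping precision. `reference` must be sorted, non-empty."""
    offsets = []
    for t in taps:
        i = bisect(reference, t)                       # insertion point of t
        candidates = reference[max(0, i - 1):i + 1]    # neighbors on each side
        nearest = min(candidates, key=lambda r: abs(t - r))
        offsets.append(t - nearest)
    return offsets
```

Of course, for tracking full metrical cycles one would also need to know *which* cycle position each tap was meant for, which is exactly the kind of interface design question raised above.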

Some other pertinent questions here are: What would be the scope of such annotation tools in MIR, musicology, and music cognition research? What factors do we need to consider in designing a tapping experiment (both from a listener's and from a researcher's point of view)? How do we handle multicultural aspects of rhythm while building such tools, so that they can be used in a wide variety of tasks across many music cultures?

A particular example of a tool I wish to mention here is the Beat Station (paper, software) developed by Marius Miron et al., a real-time rhythm annotation tool for recording tapping responses. It was used at ISMIR 2013 to collect tapping responses to Turkish music pieces. Beat Station was adapted into a new version for use with Carnatic music, recording tapping responses that track the events in the taala (the metrical cycles of Carnatic music). Both variants of the tool have been in use for quite some time now, yet there are many gaps to be addressed in the design of interfaces and experiments. The motivation for this discussion is also to improve such tools in the future. (Ajay Srinivasamurthy)

    • @Ajay: It may also be valuable to discuss the importance of genre diversity in the music pieces used when defining the experiment. Familiarity with a certain genre of music can help a listener tap more accurately than someone unfamiliar with the genre. What if the tapping data includes music across genres? Would that be an effective way to eliminate the bias from familiarity, or would it simply complicate the ground-truth collection process? (Akshay Anantapadmanabhan)

    • Another example is the work of Mark Levy at ISMIR 2011 (paper) on tempo estimation, which was very interesting (4000 tracks annotated by 2000 people, resulting in 18000 annotations) but has several limitations (limited temporal precision due to the use of the space bar for annotation; only the title is provided to locate the 30-second audio excerpts used), making the results hard to interpret when annotators disagree (is it because of subjective perception or other factors?).

    • (Simon Durand)

Rhythm universals - Universal rhythm...software?:

Within the music information retrieval (MIR) community, an increasing awareness of the diversity present in the musics of the world can be observed. While the first years of MIR concentrated on Eurogenetic classical and popular music repertoire, over the last decade approaches have been presented that can cope with the analysis of certain aspects of "other" musics. We are now at a stage where we can train models to track, for instance, the progression of meter in arbitrary recordings of (metered) music with quite high accuracy. However, these models still need a certain amount of manually labeled training data in order to learn the rhythmic characteristics of a given style.

After many years of lamentation, the international recording industry has started to grow again, driven mainly by the vast expansion of digital markets. As predicted over recent years, subscription and legal download models attract more and more listeners and increase the revenue of the industry. With the expansion of broadband wireless internet and smartphones, especially in developing countries, the recording industry sees vast potential in entering the digital music markets of many countries that have so far contributed little to its revenues. The target: localized distribution and good music recommendation for listeners. This "good" music recommendation will have to be achieved, at least to some extent, by automatic software tools that analyze music for properties such as "rhythmic similarity", tempo, meter, etc. At last year's ISMIR, a Google representative announced the aim of reaching these goals by implementing universal analysis software.

Given this background, I propose to discuss: What are the actual universals in musical rhythm? Is universal analysis software a realistic target given our knowledge of musical universals? What can MIR do to approach these problems? And finally, are there ethical issues involved in developing music analysis software that will be applied in musical contexts not foreseen by the developing engineer? (Andre Holzapfel)

    • I am also very interested in the topic of universals in musical rhythm and the discussion about what these actually are. We probably first need to discuss the definition of a universal in musical rhythm itself. Should it be something definable across all cultures, or just a majority of cultures? (Akshay Anantapadmanabhan)

    • Musical rhythm can often be estimated using attributes related to other musical dimensions (such as timbre, harmony, melody, dynamics). Rhythm patterns tend to be closely tied to a given musical style, which makes it hard to find universal rhythm patterns, whereas these other attributes may be more robust across different music styles. (This part may be linked to the topic "Rhythm in the context of other musical parameters".) (Simon Durand)

Rhythm and percussion syllables and their utility in MIR:

Many music traditions around the world have developed particular systems of syllabic mnemonics for transmission of the repertoire and the technique. Syllabic systems to define and describe rhythm and percussion are widespread across many cultures: Samul nori (Korea), Shoga (Noh theatre of Japan), Carnatic and Hindustani music of India, Turkish makam music and Javanese music, to name a few.

Most often the syllables are onomatopoeic and relate to the percussion instruments used in the music culture. However, in some cases they are used to define metrical structures, e.g. the metrical grid of an usul in Turkish music is defined using vocal syllables, and Hindustani music uses Tabla bols to indirectly specify the basic structure of a taal (via the thekas).

These syllables are musically well grounded and used extensively in music training. The benefits of using oral syllabic systems from an MIR perspective are both the cultural specificity of the approach and the accuracy of the representation of timbre, articulation and dynamics. One potential topic of discussion is to address these aspects and discuss possible uses of syllables in MIR tasks such as percussion pattern transcription and classification, meter inference and rhythm similarity. (Ajay Srinivasamurthy)

Definition of common elements for experimentation

There are different metrics used for assessing rhythmic similarity between patterns that have been found positively related to human ratings. These metrics have been evaluated experimentally with different methodologies and with different materials.

Establishing certain standards could help analyze and thoroughly compare these metrics. Given that the universe of rhythmic probes is wide but finite, a meaningful selection of materials could be made.

This raises questions such as: What would be a "meaningful" selection? How do we define the space so as to select rhythms that are evenly spread throughout it? Or how do we define the space to study similarity between very close rhythms?

(Daniel Gómez)

Interfaces for rhythm creation and transformation

The general practice of rhythm manipulation with computers, whether in live or compositional scenarios, has been standardized as the atomic control of onset positions, durations and dynamics. In "real-time" scenarios, rhythm control is mostly done by playing an instrument and recording audio (or MIDI), playing a MIDI keyboard (or pad), or tapping the mouse. "Off-line" approaches, on the other hand, are mostly symbolic, based on writing scores or editing time grids in which onsets, durations and dynamics are written.

Given the current knowledge of rhythm cognition and perception, how can new methodologies for creating and transforming rhythm emerge? I am specifically interested in strategies that differ from note-by-note playing or writing, using novel tangible interfaces that go beyond screens.

(Daniel Gómez)

Dance - tracking human movement?

In addition to visual/video motion capture systems, are there wearable devices being used to measure biological motion? Perhaps wearable sensors such as accelerometers and gyroscopes? How can the data be time-locked to musical stimuli?

What methods have been used to study interacting performers simultaneously?

(Megha Makam)

Learning and improvisation in children

If music training improves cognitive skills in general, how does this effect carry into other domains, like creativity and improvisation?

(Megha Makam)

Rhythm in the context of other musical parameters

While it is useful in many ways to study rhythm in isolation, in musical contexts one only encounters rhythm as one aspect among many others (such as timbre, meter, tempo, dynamics, melody, harmony and form, not to mention cultural/social/associative aspects). It seems that rhythm, like all other musical parameters, cannot be considered an independent musical variable, at least not when trying to understand how a piece of music is experienced. Yet, on the other hand, for research purposes reductionism is often inevitable. This leads me to the following questions: What role does rhythm play among other musical parameters? How are they interrelated in different musical styles? How can we study rhythm in the context of other musical variables? And what kind of statements can we--or can't we--make about rhythm as an isolated phenomenon in the first place? (And a bonus question that might arise: what do we mean by rhythm?) These topics could be interesting to discuss through concrete examples.

(Olaf Post)