Comparative Evaluation in the Wild: Systems for the Expressive Rendering of Music


Bibliographic details
Published in: IEEE Transactions on Artificial Intelligence, 2024-10, Vol. 5 (10), p. 5290-5303
Main authors: Worrall, Kyle; Yin, Zongyu; Collins, Tom
Format: Article
Language: English
Description
Summary: There have been many attempts to model the ability of human musicians to take a score and perform or render it expressively, by adding tempo, timing, loudness, and articulation changes to nonexpressive music data. While expressive rendering models exist in academic research, most of these are not open source or accessible, meaning they are difficult to evaluate empirically and have not been widely adopted in professional music software. Systematic comparative evaluation of such algorithms stopped after the last performance rendering contest (RENCON) in 2013, making it difficult to compare newer models to existing work in a fair and valid way. In this article, we introduce the first transformer-based model for expressive rendering, cue-free express + pedal (CFE + P), which predicts expressive attributes such as notewise dynamics and micro-timing adjustments, and beatwise tempo and sustain pedal use, based only on the start and end times and pitches of notes (e.g., inexpressive musical instrument digital interface (MIDI) input). We perform two comparative evaluations of our model against a non-machine-learning baseline taken from professional music software and two open-source algorithms: a feedforward neural network (FFNN) and a hierarchical recurrent neural network (HRNN). The results of two listening studies indicate that our model renders passages that outperform what can be done in professional music software such as Logic Pro and Ableton Live. All data and preexisting hypotheses can be accessed via the Open Science Foundation: https://osf.io/6uwjk/.
ISSN:2691-4581
DOI:10.1109/TAI.2024.3408717