An End-to-End Machine Learning System for Harmonic Analysis of Music

Bibliographic details
Published in:	IEEE transactions on audio, speech, and language processing, 2012-08, Vol.20 (6), p.1771-1783
Main authors:	Ni, Y., McVicar, M., Santos-Rodriguez, R., De Bie, T.
Format:	Article
Language:	English
Subjects:	Applied sciences; Audio chord estimation; Bass; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Fourier analysis; Ground truth; Harmonic analysis; harmony progression analyzer (HPA); Hidden Markov models; Humans; Information, signal and communications theory; Keys; Loudness; loudness-based chromagram; Machine learning; Maximum likelihood estimation; meta-song evaluation; Perception; Signal and communications theory; Signal, noise; Telecommunications and information theory; Topology; Vectors

Description
Abstract:	We present a new system for the harmonic analysis of popular musical audio. It is focused on chord estimation, although the proposed system additionally estimates the key sequence and bass notes. It is distinct from competing approaches in two main ways. First, it makes use of a new improved chromagram representation of audio that takes the human perception of loudness into account. Furthermore, it is the first system for joint estimation of chords, keys, and bass notes that is fully based on machine learning, requiring no expert knowledge to tune the parameters. This means that it will benefit from future increases in available annotated audio files, broadening its applicability to a wider range of genres. In all three evaluation scenarios, including a new one that allows evaluation on audio for which no complete ground truth annotation is available, the proposed system is shown to be faster, more memory efficient, and more accurate than the state of the art.
ISSN:	1558-7916 2329-9290 1558-7924 2329-9304
DOI:	10.1109/TASL.2012.2188516