A detector for page-level handwritten music object recognition based on deep learning

Bibliographic Details
Published in: Neural Computing & Applications, 2023-05, Vol. 35 (13), p. 9773-9787
Main authors: Zhang, Yusen; Huang, Zhiqing; Zhang, Yanxin; Ren, Keyan
Format: Article
Language: English
Online access: Full text
Description
Abstract: Handwritten music recognition (HMR) is the technology of transcribing the content of images of music scores. The accurate detection of music objects at the page level is one of the main challenges of HMR. Thus far, existing methods suffer from the tiny and dense nature of handwritten music notation and achieve acceptable detection accuracy only on snippets. In this paper, we propose a detector for page-level handwritten music object recognition that consists of a staff line removal model and a handwritten music object detection model. First, an end-to-end staff line removal model, R_Staff_Net, based on residual learning reduces the complexity of page-level detection. Second, we develop an improved YOLO-V4 model for handwritten music object detection. The improvements mainly concern the adoption of a decoupled detection head and a visual attention module in YOLO-V4; in addition, an adaptive multi-scale feature fusion module (AMFFM) enhances the textures and features of tiny music symbols in the deep convolution layers, and a gradient harmonized mechanism is utilized to address the inherent imbalance between music objects. We verify R_Staff_Net and the improved YOLO-V4 model on the ICDAR/GREC staff line removal dataset and the MUSCIMA++ dataset, respectively. The experiments show that R_Staff_Net achieves outstanding performance with an F-measure of 98.64%, and that our improved YOLO-V4 model is superior to other handwritten music symbol detection methods with a mean average precision (mAP) of 91.8% on page-level input. Although the experimental results of the full detector show that R_Staff_Net contributes little to the overall mAP, the network is beneficial for symbols that are similar to, or heavily overlap, staff lines.
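
To make the "decoupled detection head" mentioned in the abstract more concrete, the following is a minimal PyTorch sketch of such a head, assuming the common YOLOX-style design that the term usually denotes: classification and box regression are computed by separate convolutional branches on top of a shared stem, rather than by a single coupled 1x1 convolution. Module names, channel widths, anchor counts, and the class count are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a decoupled detection head (not the paper's code).
import torch
import torch.nn as nn


class DecoupledHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        # Shared stem projecting the incoming feature map to a common width.
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1, bias=False),
            nn.BatchNorm2d(256),
            nn.SiLU(inplace=True),
        )
        # Classification branch: per-anchor class scores.
        self.cls_branch = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.SiLU(inplace=True),
            nn.Conv2d(256, num_anchors * num_classes, kernel_size=1),
        )
        # Regression branch: per-anchor box offsets plus objectness.
        self.reg_branch = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.SiLU(inplace=True),
            nn.Conv2d(256, num_anchors * 5, kernel_size=1),  # 4 box coords + 1 objectness
        )

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        return self.cls_branch(x), self.reg_branch(x)


if __name__ == "__main__":
    # One feature-pyramid level, e.g. a 1/8-resolution map with 512 channels;
    # the class count here is only a placeholder for a music-symbol vocabulary.
    head = DecoupledHead(in_channels=512, num_classes=110)
    feat = torch.randn(1, 512, 64, 64)
    cls_out, reg_out = head(feat)
    print(cls_out.shape, reg_out.shape)  # (1, 330, 64, 64) and (1, 15, 64, 64)
```

Separating the two branches lets the classification path and the localization path learn their own feature transforms, which is the usual motivation for decoupling a YOLO-style head; how the outputs are decoded and matched to anchors follows whatever assignment scheme the detector uses.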
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-023-08216-6