ASELMAR: Active and semi-supervised learning-based framework to reduce multi-labeling efforts for activity recognition
Published in: | Computer Vision and Image Understanding, 2025-02, Vol. 251, p. 104269, Article 104269 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | English |
Abstract: | Manual annotation of unlabeled data for model training is expensive and time-consuming, especially for visual datasets whose multi-labeling requires domain-specific expertise, such as video records generated in hospital settings. Frameworks are needed that reduce human labeling effort while improving training performance. Semi-supervised learning is widely used to generate predictions for unlabeled samples in partially labeled datasets. Active learning can be combined with semi-supervised learning to annotate unlabeled samples, reducing the sampling bias introduced by predicted labels. We developed ASELMAR, a framework based on active and semi-supervised learning techniques, to reduce the time and effort needed to multi-label unlabeled samples for activity recognition. ASELMAR (i) categorizes the predictions for unlabeled data by their confidence level, using fixed and adaptive threshold settings, (ii) applies a label verification procedure to samples with ambiguous predictions, and (iii) retrains the model iteratively using samples with their high-confidence predictions or manual annotations. We also designed a software tool that guides domain experts in verifying ambiguous predictions. We applied ASELMAR to recognize eight selected activities in our trauma resuscitation video dataset and evaluated its performance based on label verification time and the mean average precision (mean AP) score. The label verification required by ASELMAR took 12.1% of the manual annotation effort for the unlabeled video records. Compared to the baseline model, the fixed threshold-based method improved the mean AP score by 5.7% in the first iteration and by 8.3% in the second iteration. The p-values were below 0.05 for the target activities. With the adaptive-threshold method, ASELMAR reduced the deviation of AP scores, indicating improved model robustness.
In a speech recognition case study, the word error rate decreased by 6.2% and the average transcription factor increased 2.6-fold, supporting the broad applicability of ASELMAR for reducing the labeling effort required from domain experts.
•ASELMAR, a deep learning framework, reduces labeling time for activity recognition.
•Label verification time was 12.1% of the manual annotation effort by domain experts.
•The performance improvement was 8.3% after training the model for a few iterations.
•ASELMAR's performance in a speech recognition study shows its broad applicability.
•The dete |
---|---|
ISSN: | 1077-3142 |
DOI: | 10.1016/j.cviu.2024.104269 |
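The abstract describes ASELMAR's labeling loop only at a high level: split predictions by confidence, send ambiguous samples to experts, retrain. The following is a minimal sketch of one such labeling round, assuming the model outputs an independent probability per activity (multi-label, e.g. sigmoid outputs). All function names (`split_by_confidence`, `adaptive_thresholds`, `label_round`), the fixed thresholds 0.9/0.1, and the per-class quantile rule used for "adaptive" thresholds are hypothetical illustrations, not the paper's actual method; the `verify` callback merely stands in for the expert verification tool.

```python
import numpy as np
from typing import Callable, Sequence


def split_by_confidence(probs, high, low):
    """Separate confident from ambiguous multi-label predictions.

    probs: (n_samples, n_classes) per-activity probabilities.
    high, low: scalars (fixed thresholds) or (n_classes,) arrays (per-class).
    A label is confident if p >= high (positive) or p <= low (negative);
    a sample is confident only if all of its labels are confident.
    """
    probs = np.asarray(probs, dtype=float)
    confident_label = (probs >= high) | (probs <= low)
    confident = confident_label.all(axis=1)
    pseudo = (probs >= high).astype(int)  # pseudo-labels from the positive threshold
    return confident, pseudo


def adaptive_thresholds(probs, q_high=0.9, q_low=0.1, floor_high=0.6, ceil_low=0.4):
    """HYPOTHETICAL adaptive rule (not from the paper): per-class quantiles of
    the current predictions, clipped so the positive threshold never falls
    below floor_high and the negative threshold never rises above ceil_low."""
    probs = np.asarray(probs, dtype=float)
    high = np.maximum(np.quantile(probs, q_high, axis=0), floor_high)
    low = np.minimum(np.quantile(probs, q_low, axis=0), ceil_low)
    return high, low


def label_round(probs, high, low, verify: Callable[[int], Sequence[int]]):
    """One labeling round: keep pseudo-labels for confident samples and ask a
    human verifier (hypothetical callback) to label the ambiguous ones."""
    confident, labels = split_by_confidence(probs, high, low)
    queue = np.flatnonzero(~confident)  # indices that need expert verification
    for i in queue:
        labels[i] = verify(int(i))
    return labels, queue


# Usage: three samples, two activities, fixed thresholds.
probs = np.array([[0.95, 0.02], [0.50, 0.97], [0.05, 0.03]])
labels, queue = label_round(probs, 0.9, 0.1, verify=lambda i: [1, 1])
print(queue.tolist())   # [1]
print(labels.tolist())  # [[1, 0], [1, 1], [0, 0]]
```

Only the second sample is routed to the verifier, because its first activity (0.50) lies between the two thresholds. The retraining step (iii) is model-specific and omitted here; in a full loop, the returned labels would be added to the training set, the model retrained, and fresh probabilities computed for the next round, optionally with thresholds from `adaptive_thresholds` instead of fixed values.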