An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation
Saved in:
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Image segmentation is a crucial task in computer vision, with wide-ranging
applications in industry. The Segment Anything Model (SAM) has recently
attracted considerable attention; however, its application in industrial
inspection, particularly for segmenting commercial anti-counterfeit codes,
remains challenging. Unlike open-source datasets, industrial settings often
face issues such as small sample sizes and complex textures. Additionally,
computational cost is a key concern due to the varying number of trainable
parameters. To address these challenges, we propose an Augmentation-based Model
Re-adaptation Framework (AMRF). This framework leverages data augmentation
techniques during training to enhance the generalisation of segmentation
models, allowing them to adapt to newly released datasets with temporal
disparity. By observing segmentation masks from conventional models (FCN and
U-Net) and a pre-trained SAM model, we determine a minimal augmentation set
that optimally balances training efficiency and model performance. Our results
demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02%
in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two
temporally continuous datasets. Similarly, the fine-tuned U-Net improves upon
its baseline by 7.34% and 4.94% in cropping, and 8.02% and 5.52% in
classification. Both models outperform the top-performing SAM models (ViT-Large
and ViT-Base) by an average of 11.75% and 9.01% in cropping accuracy, and 2.93%
and 4.83% in classification accuracy, respectively.
DOI: 10.48550/arxiv.2409.09530
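
The abstract describes the framework only at a high level: fine-tuning segmentation models on a minimal augmentation set applied consistently to image and mask pairs. The sketch below illustrates that general idea under stated assumptions; the specific augmentations (horizontal flip, 90-degree rotation, brightness jitter) and the function name `minimal_augmentations` are hypothetical stand-ins, not the set determined in the paper.

```python
import numpy as np


def minimal_augmentations(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Yield (image, mask) pairs from a small, fixed augmentation set.

    Geometric transforms are applied identically to image and mask so the
    segmentation labels stay aligned; the photometric transform touches only
    the image. The chosen set is illustrative, not the paper's.
    """
    # Original sample, unmodified.
    yield image, mask

    # Horizontal flip, applied to image and mask together.
    yield np.flip(image, axis=1).copy(), np.flip(mask, axis=1).copy()

    # Random multiple-of-90-degree rotation, applied to image and mask together.
    k = int(rng.integers(1, 4))
    yield np.rot90(image, k=k, axes=(0, 1)).copy(), np.rot90(mask, k=k, axes=(0, 1)).copy()

    # Brightness jitter, applied to the image only; labels are unchanged.
    factor = rng.uniform(0.8, 1.2)
    jittered = np.clip(image.astype(np.float32) * factor, 0, 255).astype(image.dtype)
    yield jittered, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
    msk = rng.integers(0, 2, size=(128, 128), dtype=np.uint8)
    for aug_img, aug_msk in minimal_augmentations(img, msk, rng):
        print(aug_img.shape, aug_msk.shape)
```

The design point the abstract implies is that geometric transforms must be shared between image and mask to keep labels aligned, while photometric transforms should leave the mask untouched; any concrete minimal set would be chosen by inspecting the resulting segmentation masks, as the authors do with FCN, U-Net, and SAM.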