Pavement crack detection from CCD images with a locally enhanced transformer network
Published in: International Journal of Applied Earth Observation and Geoinformation, 2022-06, Vol. 110, p. 102825, Article 102825
Format: Article
Language: English
Online access: Full text
Abstract:
• A novel Transformer-based deep neural network was proposed for pavement crack detection.
• Transformer modules extracted global contextual information for long-range dependency modeling.
• A local enhancement module was designed to compensate for fine-grained local features.
• A manually annotated pavement crack dataset was built for high-resolution CCD image-based pavement crack detection.
Precisely identifying pavement cracks from high-resolution images captured by charge-coupled devices (CCDs) faces many challenges. Even though convolutional neural networks (CNNs) have achieved impressive performance in this task, the stacked convolutional layers fail to extract long-range contextual features and impose high computational costs. Therefore, we propose a locally enhanced Transformer network (LETNet) to completely and efficiently detect pavement cracks. In the LETNet, a Transformer is employed to model long-range dependencies. By designing a convolution stem and a local enhancement module, both low-level and high-level local features can be compensated. To take advantage of these rich features, a skip connection strategy and an efficient upsampling module are built to restore detailed information. In addition, a defect rectification module is further developed to reinforce the network for hard sample recognition. The quantitative comparison demonstrates that the proposed LETNet outperforms four advanced deep learning-based models with respect to both efficiency and effectiveness. Specifically, the average precision, recall, ODS, IoU, and frames per second (FPS) of the LETNet on three testing datasets are approximately 93.04%, 92.85%, 92.94%, 94.07%, and 30.80 FPS, respectively. We also built a comprehensive pavement crack dataset containing 156 high-resolution manually annotated CCD images and made it publicly available on Zenodo.
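To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of how a locally enhanced Transformer segmentation network of this kind could be composed: a convolution stem for low-level local features, a Transformer encoder for long-range context, a depth-wise convolution acting as the local enhancement, a skip connection, and an upsampling head producing a crack mask. All module names (ConvStem, LocalEnhancement, LETNetSketch), layer sizes, and the choice of depth-wise convolution are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a locally enhanced Transformer crack-segmentation
# network; module names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class ConvStem(nn.Module):
    """Convolution stem: extracts low-level local features and downsamples 4x."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.stem(x)  # (B, dim, H/4, W/4)


class LocalEnhancement(nn.Module):
    """Depth-wise convolution branch that re-injects fine-grained local
    detail alongside the global self-attention features."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, tokens, h, w):
        b, n, c = tokens.shape
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        feat = self.dwconv(feat)
        return tokens + feat.flatten(2).transpose(1, 2)


class LETNetSketch(nn.Module):
    """Toy encoder-decoder: conv stem -> Transformer encoder with local
    enhancement -> skip connection -> upsampling head for a crack mask."""
    def __init__(self, dim=64, depth=4, heads=4):
        super().__init__()
        self.stem = ConvStem(3, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.local = LocalEnhancement(dim)
        self.head = nn.Sequential(
            nn.Conv2d(dim * 2, dim, kernel_size=3, padding=1),  # fuse skip
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(dim, 1, kernel_size=1),  # crack / background logits
        )

    def forward(self, x):
        low = self.stem(x)                       # local low-level features
        b, c, h, w = low.shape
        tokens = low.flatten(2).transpose(1, 2)  # (B, HW, C) token sequence
        tokens = self.encoder(tokens)            # global long-range context
        tokens = self.local(tokens, h, w)        # compensate local detail
        glob = tokens.transpose(1, 2).reshape(b, c, h, w)
        fused = torch.cat([glob, low], dim=1)    # skip connection
        return self.head(fused)                  # (B, 1, H, W) crack map


if __name__ == "__main__":
    net = LETNetSketch()
    mask = net(torch.randn(1, 3, 256, 256))
    print(mask.shape)  # torch.Size([1, 1, 256, 256])
```

In this sketch the skip connection simply concatenates the stem features with the attention output before upsampling; the paper's actual skip, upsampling, and defect rectification modules may differ in design.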
ISSN: 1569-8432, 1872-826X
DOI: 10.1016/j.jag.2022.102825