DDD: Discriminative Difficulty Distance for plant disease diagnosis
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Recent studies on plant disease diagnosis using machine learning (ML) have
highlighted concerns about the overestimated diagnostic performance due to
inappropriate data partitioning, where training and test datasets are derived
from the same source (domain). Plant disease diagnosis presents a challenging
classification task, characterized by its fine-grained nature, vague symptoms,
and the extensive variability of image features within each domain. In this
study, we propose the concept of Discriminative Difficulty Distance (DDD), a
novel metric designed to quantify the domain gap between training and test
datasets while assessing the classification difficulty of test data. DDD
provides a valuable tool for identifying insufficient diversity in training
data, thus supporting the development of more diverse and robust datasets. We
investigated multiple image encoders trained on different datasets and examined
whether the distances between datasets, measured using low-dimensional
representations generated by the encoders, are suitable as a DDD metric. The
study utilized 244,063 plant disease images spanning four crops and 34 disease
classes collected from 27 domains. As a result, we demonstrated that even if
the test images are from different crops or diseases than those used to train
the encoder, incorporating them allows the construction of a distance measure
for a dataset that strongly correlates with the difficulty of diagnosis
indicated by the disease classifier developed independently. Compared to the
base encoder, pre-trained only on ImageNet21K, the correlation was higher by 0.106
to 0.485, reaching a maximum of 0.909. |
---|---|
DOI: | 10.48550/arxiv.2501.00734 |
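
The summary describes measuring distances between datasets in an encoder's low-dimensional representation space and checking how well those distances correlate with diagnostic difficulty. The sketch below illustrates that idea only in outline. It is not the authors' implementation: the summary does not name the distance function, so Euclidean distance between embedding centroids is used as a stand-in, and the helper names (`centroid_distance`, `pearson`), the toy embeddings, and the error rates are all hypothetical.

```python
import math


def centroid(embeddings):
    """Mean vector of a list of equal-length embedding vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / n for i in range(dim)]


def centroid_distance(train_emb, test_emb):
    """Euclidean distance between the centroids of two embedding sets.

    Assumption: a stand-in for the dataset distance, which the summary
    does not specify.
    """
    return math.dist(centroid(train_emb), centroid(test_emb))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical 2-D embeddings of the training data and three test domains.
train = [[0.0, 0.0], [2.0, 0.0]]
test_domains = {
    "near": [[1.0, 1.0], [1.0, 3.0]],
    "mid": [[4.0, 4.0], [4.0, 4.0]],
    "far": [[7.0, 8.0], [7.0, 8.0]],
}
# Hypothetical classifier error rate per test domain (higher = harder).
error_rates = {"near": 0.1, "mid": 0.3, "far": 0.5}

names = sorted(test_domains)
distances = [centroid_distance(train, test_domains[name]) for name in names]
difficulty = [error_rates[name] for name in names]

# Correlation between the domain distance and the diagnostic difficulty.
r = pearson(distances, difficulty)
print(f"correlation between distance and difficulty: {r:.3f}")
```

In practice the embeddings would come from an image encoder and the difficulty from an independently trained disease classifier, as the summary states. The centroid distance here is only one of many possible choices.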