Massive spectral data analysis for plant breeding using parSketch-PLSDA method: Discrimination of sunflower genotypes

Bibliographic details
Published in: Biosystems engineering 2021-10, Vol.210, p.69-77
Main authors: Ryckewaert, Maxime; Metz, Maxime; Héran, Daphné; George, Pierre; Grèzes-Besset, Bruno; Akbarinia, Reza; Roger, Jean-Michel; Bendoula, Ryad
Format: Article
Language: English
Description
Summary: In precision agriculture and plant breeding, the amount of data keeps increasing. These massive data sets are becoming more and more complex, which makes them difficult to manage and analyse. Optical instruments such as NIR spectroscopy or hyperspectral imaging are gradually being deployed directly in the field, increasing the size of spectral databases. These tools give access to rapid, non-destructive measurements for classifying new varieties according to breeding objectives. Processing this massive amount of spectral data is challenging. In the context of genotype discrimination, we propose to apply a method called parSketch-PLSDA to analyse such data. ParSketch-PLSDA combines an indexing strategy (parSketch) with the reference method (PLSDA) for predicting classes from multivariate data. For this purpose, a spectral database was formed by collecting 1,300,000 spectra generated from hyperspectral images of leaves of four different sunflower genotypes. ParSketch-PLSDA is compared to PLSDA alone. Both methods use the same calibration and test sets. The prediction model obtained by PLSDA has a classification error close to 23% on average across all genotypes. The parSketch-PLSDA method outperforms PLSDA, improving prediction quality by 10%. Indeed, the model built with parSketch-PLSDA is able to take into account non-linearities in the data. These results are encouraging and allow us to anticipate the future bottleneck related to the generation of large amounts of data from phenotyping.
• parSketch-PLSDA was compared to PLSDA to discriminate sunflower genotypes.
• A massive spectral database of 1,300,000 spectra was formed.
• The parSketch-PLSDA method outperforms PLSDA, improving prediction quality by 10%.
• parSketch-PLSDA is a method that could potentially handle large amounts of data.
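The summary reports a classification error "on average across all genotypes" but this record does not say how that average is computed. As a minimal sketch, assuming it is the unweighted mean of the per-genotype error rates, the metric could be calculated as below. The function names and the example labels are hypothetical illustrations, not code or data from the article.

```python
from collections import defaultdict


def per_class_error(y_true, y_pred):
    """Return {class label: fraction of that class's samples misclassified}."""
    total = defaultdict(int)
    wrong = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth != pred:
            wrong[truth] += 1
    return {label: wrong[label] / total[label] for label in total}


def mean_class_error(y_true, y_pred):
    """Unweighted mean of the per-class error rates (assumed definition)."""
    errors = per_class_error(y_true, y_pred)
    return sum(errors.values()) / len(errors)


# Made-up labels for four genotypes, for illustration only.
y_true = ["G1", "G1", "G2", "G2", "G3", "G3", "G4", "G4"]
y_pred = ["G1", "G2", "G2", "G2", "G3", "G1", "G4", "G4"]

print(per_class_error(y_true, y_pred))
print(mean_class_error(y_true, y_pred))
```

Averaging per class rather than over all spectra keeps a genotype with many spectra from dominating the figure. Whether the authors weighted the average this way is not stated here.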
ISSN: 1537-5110, 1537-5129
DOI: 10.1016/j.biosystemseng.2021.08.005