An investigation of the impact of imbalance on the analysis of the US crop variety evaluation program data

Bibliographic details
Published in:	Crop science 2024-07, Vol.64 (4), p.2183-2194
Main authors:	Fang, Zhou, Deng, Dewayne D., Jenkins, Johnie N., Zhou, Qian M.
Format:	Article
Language:	eng
Subjects:	cotton cultivars prediction regression analysis variance
Online access:	Full text

Description
Abstract:	Multi‐environment trial data from many crop variety evaluation programs are imbalanced because only a subset of varieties is selected for the following year, which leads to missing variety‐by‐year combinations. Inspired by the US National Cotton Variety Test trial, we conducted new simulation studies to investigate selection processes that differ from those in the existing literature. Our four main contributions are as follows. First, we adopted a framework that uses a logistic regression to generate imbalanced data that are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Second, our selection process can depend on multiple traits, whereas all existing studies used only a single trait for selection. Third, besides variance components (VCs), long‐term trends that reflect genetic and non‐genetic development are of interest since the simulated data span over 30 years. Last, we evaluated the prediction accuracy for a variety's overall and location‐specific performance. The results show that estimation of the VCs and long‐term trends is worst under MNAR when a single trait is used for selection. Compared to the VCs, estimation of the long‐term trends is more influenced by the missing mechanism and the missing rate. However, the prediction accuracy for a variety's performance is mainly driven by the missing rate and is less sensitive to the selection process. If the genetic and non‐genetic long‐term trends are ignored, both estimation and prediction deteriorate. More testing years improve estimation and prediction, despite a higher missing rate.
Core Ideas:
Large biases for variance components (VCs) and long‐term trends occur under missing not at random when the target trait (TT) is used for selection.
When multiple traits are used for selection, the bias depends on their correlation with the TT.
Estimation of the long‐term trends is more substantially affected by the selection process and the missing rate than estimation of the VCs.
The prediction accuracy for a variety's performance is mainly driven by the missing rate.
Ignoring the long‐term trends worsens estimation and prediction; more testing years improve them.
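
The abstract describes generating imbalanced data with a logistic regression under three missingness mechanisms. The sketch below is only an illustration of that general idea, not the authors' simulation design: the function names, the intercept and slope values, and the two-season trait model are all assumptions made for this example. A variety's probability of being retained for the next season is the logistic function of a linear predictor that uses no data (MCAR), an already observed trait value (MAR), or the next-season value that would itself go missing (MNAR).

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def retention_prob(mechanism, intercept, slope, observed_trait, unobserved_trait):
    """Probability that a variety is kept for the following season.

    Hypothetical helper: the mechanism decides which value (if any)
    enters the logistic linear predictor.
    """
    if mechanism == "MCAR":
        linear = intercept                             # depends on no data
    elif mechanism == "MAR":
        linear = intercept + slope * observed_trait    # depends on observed data
    elif mechanism == "MNAR":
        linear = intercept + slope * unobserved_trait  # depends on the value at risk
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return sigmoid(linear)


def simulate_selection(n_varieties, mechanism, intercept=0.0, slope=2.0, seed=0):
    """Return (season1, season2_or_None) pairs for each simulated variety.

    The trait model (standard normal first season, correlated second
    season) is an assumption chosen only to keep the example small.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_varieties):
        season1 = rng.gauss(0.0, 1.0)                  # always observed
        season2 = 0.5 * season1 + rng.gauss(0.0, 1.0)  # observed only if retained
        p_keep = retention_prob(mechanism, intercept, slope, season1, season2)
        kept = rng.random() < p_keep
        records.append((season1, season2 if kept else None))
    return records


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        data = simulate_selection(1000, mech)
        retained = [s2 for _, s2 in data if s2 is not None]
        missing_rate = 1.0 - len(retained) / len(data)
        mean_retained = sum(retained) / len(retained)
        print(f"{mech}: missing rate {missing_rate:.2f}, "
              f"mean of retained season-2 values {mean_retained:.2f}")
```

Running the script prints the missing rate and the mean of the retained second-season values for each mechanism, which gives a quick feel for how selection on observed or unobserved values shifts the data that remain. In this sketch the intercept controls the overall missing rate and the slope controls how strongly selection depends on the trait.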
ISSN:	0011-183X 1435-0653
DOI:	10.1002/csc2.21262