An investigation of the impact of imbalance on the analysis of the US crop variety evaluation program data
Published in: | Crop science 2024-07, Vol.64 (4), p.2183-2194 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
Abstract: | Multi‐environment trial data from many crop variety evaluation programs are imbalanced because only a subset of varieties is selected for the following year, which leaves variety‐by‐year combinations missing. Inspired by the US National Cotton Variety Test trial, we conducted new simulation studies to investigate selection processes that differ from those in the existing literature. Our four main contributions are as follows. First, we adopted a framework that uses logistic regression to generate imbalanced data that follow a missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) mechanism. Second, our selection process can depend on multiple traits, whereas all existing studies used only a single trait for selection. Third, besides variance components (VCs), long‐term trends that reflect genetic and non‐genetic development are of interest since the simulated data span over 30 years. Last, we evaluated the prediction accuracy for a variety's overall and location‐specific performance. The results show that estimation of the VCs and long‐term trends is worst under MNAR when a single trait is used for selection. Compared with the VCs, estimation of the long‐term trends is more strongly influenced by the missing mechanism and the missing rate. However, the prediction accuracy for a variety's performance is mainly driven by the missing rate and is less sensitive to the selection process. If the genetic and non‐genetic long‐term trends are ignored, both estimation and prediction deteriorate. More testing years improve estimation and prediction, despite a higher missing rate.
Core Ideas
Large biases for variance components (VCs) and long‐term trend occur under missing not at random using the target trait (TT) for selection.
When using multiple traits for selection, the bias depends on their correlation with TT.
Estimation of the long‐term trends is more substantially affected by the selection process and the missing rate compared to VC.
The prediction accuracy for variety's performance is mainly driven by the missing rate.
Ignoring the long‐term trends worsens estimation and prediction; more testing years improve them. |
---|---|
ISSN: | 0011-183X 1435-0653 |
DOI: | 10.1002/csc2.21262 |
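
The abstract describes using logistic regression to decide which varieties are retained for the following year, so that the resulting gaps follow an MCAR, MAR, or MNAR mechanism. The sketch below is only an illustration of that general idea, not the authors' simulation design. The function names, the single target trait, the two-year layout, the variance values, and the intercept and slope are all assumptions made for the example. Here MCAR uses a constant retention probability, MAR lets retention depend on the observed year-1 value, and MNAR lets it depend on the year-2 value that goes unobserved when a variety is dropped.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_retention(n_varieties=2000, mechanism="MAR",
                       intercept=0.0, slope=2.0, seed=1):
    """Simulate two years of one trait and drop year-2 records by selection.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    records = []
    for variety in range(n_varieties):
        genetic = rng.gauss(0.0, 1.0)        # true variety effect
        y1 = genetic + rng.gauss(0.0, 1.0)   # year-1 value, always observed
        y2 = genetic + rng.gauss(0.0, 1.0)   # year-2 value, seen only if retained
        if mechanism == "MCAR":
            eta = intercept                  # retention ignores all trait values
        elif mechanism == "MAR":
            eta = intercept + slope * y1     # retention depends on observed data
        elif mechanism == "MNAR":
            eta = intercept + slope * y2     # retention depends on the unseen value
        else:
            raise ValueError("mechanism must be 'MCAR', 'MAR', or 'MNAR'")
        retained = rng.random() < sigmoid(eta)
        records.append({"variety": variety, "y1": y1,
                        "y2": y2 if retained else None})
    return records


def missing_rate(records):
    """Fraction of varieties whose year-2 record is missing."""
    return sum(r["y2"] is None for r in records) / len(records)


def mean_observed_y2(records):
    """Mean of the year-2 values that survived selection."""
    observed = [r["y2"] for r in records if r["y2"] is not None]
    return sum(observed) / len(observed)
```

Calling `simulate_retention(mechanism="MNAR")` and comparing `mean_observed_y2` with the MCAR case shows how selection on the trait shifts the observed mean upward. A selection process based on multiple traits could be sketched by adding further terms to the linear predictor `eta`, and the missing rate can be tuned through `intercept`.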