Predicting the number of oocytes retrieved from controlled ovarian hyperstimulation with machine learning
Published in: | Human reproduction (Oxford) 2023-10, Vol.38 (10), p.1918-1926 |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Abstract
STUDY QUESTION
Can machine learning predict the number of oocytes retrieved from controlled ovarian hyperstimulation (COH)?
SUMMARY ANSWER
Three machine-learning models were successfully trained to predict the number of oocytes retrieved from COH.
WHAT IS KNOWN ALREADY
Previous studies have identified factors that influence the number of oocytes retrieved during COH and built predictive models on them. Many of these studies are limited, however, in that they consider only a small number of variables in isolation.
STUDY DESIGN, SIZE, DURATION
This study was a retrospective analysis of a dataset of 11,286 cycles performed at a single centre in France between 2009 and 2020 with the aim of building a predictive model for the number of oocytes retrieved from ovarian stimulation. The analysis was carried out by a data analysis team external to the centre using the Substra framework. The Substra framework enabled the data analysis team to send computer code to run securely on the centre’s on-premises server. In this way, a high level of data security was achieved as the data analysis team did not have direct access to the data, nor did the data leave the centre at any point during the study.
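The "code travels to the data" arrangement described above can be illustrated with a minimal plain-Python sketch. It does not use Substra's actual API: `run_on_premises`, `mean_oocytes`, and the three toy records are hypothetical names and made-up values, chosen only to show that row-level data stays on the server while scalar summaries are returned.

```python
from statistics import mean
from typing import Callable, Dict, List

Cycle = Dict[str, float]

# Made-up stand-in for records that stay on the centre's on-premises server.
_ON_PREMISES_CYCLES: List[Cycle] = [
    {"age": 31.0, "oocytes": 12.0},
    {"age": 38.0, "oocytes": 5.0},
    {"age": 35.0, "oocytes": 9.0},
]


def run_on_premises(analysis: Callable[[List[Cycle]], Dict[str, float]]) -> Dict[str, float]:
    """Run analyst-supplied code next to the data and return only its summary."""
    result = analysis(_ON_PREMISES_CYCLES)
    # Only scalar summaries may leave the server, never row-level records.
    if not all(isinstance(value, (int, float)) for value in result.values()):
        raise ValueError("analysis must return scalar summaries only")
    return dict(result)


def mean_oocytes(cycles: List[Cycle]) -> Dict[str, float]:
    """Example analysis written by the external team (hypothetical)."""
    return {
        "n_cycles": len(cycles),
        "mean_oocytes": mean(cycle["oocytes"] for cycle in cycles),
    }


summary = run_on_premises(mean_oocytes)
```

In this sketch the external team sees only `summary`. A real deployment would also need authentication, sandboxing, and audit logging, none of which is shown here.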
PARTICIPANTS/MATERIALS, SETTING, METHODS
The Light Gradient Boosting Machine algorithm was used to produce three predictive models: one that directly predicted the number of oocytes retrieved, and two that predicted which bin the number of oocytes retrieved fell into, with each of two clinicians providing one set of bins. The resulting models were evaluated on a held-out test set and compared to linear and logistic regression baselines. In addition, the models themselves were analysed to identify the parameters that had the biggest impact on their predictions.
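To make the bin-prediction targets concrete, the sketch below shows one way an oocyte count could be mapped to a bin index before a classifier is trained. The abstract does not give the clinicians' actual bins, so the edge values, the names `CLINICIAN_A_EDGES` and `CLINICIAN_B_EDGES`, and the helper functions are all assumptions for illustration. Model training itself is not shown.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical lower bin edges; the real clinician-provided bins are not stated.
CLINICIAN_A_EDGES = [1, 5, 10, 16]  # bins: 0 | 1-4 | 5-9 | 10-15 | 16+
CLINICIAN_B_EDGES = [4, 10, 20]     # bins: 0-3 | 4-9 | 10-19 | 20+


def to_bin(oocytes: int, edges: Sequence[int]) -> int:
    """Return the index of the bin containing `oocytes` (lower edges inclusive)."""
    return bisect_right(edges, oocytes)


def make_targets(counts: Sequence[int], edges: Sequence[int]) -> List[int]:
    """Convert raw oocyte counts into ordinal bin labels for a classifier."""
    return [to_bin(count, edges) for count in counts]


counts = [0, 3, 7, 12, 25]  # made-up example counts
targets_a = make_targets(counts, CLINICIAN_A_EDGES)
targets_b = make_targets(counts, CLINICIAN_B_EDGES)
```

Because the bins are ordered, the resulting labels are ordinal, which is what makes a deviation measured in "bins" meaningful.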
MAIN RESULTS AND THE ROLE OF CHANCE
On average, the model that directly predicted the number of oocytes retrieved deviated from the ground truth by 4.21 oocytes. The model that predicted the first clinician’s bins deviated by 0.73 bins whereas the model for the second clinician deviated by 0.62 bins. For all models, performance was best within the first and third quartiles of the target variable, with the model underpredicting extreme values of the target variable (no oocytes and large numbers of oocytes retrieved). Nevertheless, the erroneous predictions made for these extreme cases were still within the vicinity of the true value. Overall, all three models agreed on the importance of each feature which was es |
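The average deviations reported above can be read as a mean absolute error, although the abstract does not name the metric, so that reading is an assumption. The sketch below computes it once for raw oocyte counts and once for bin indices. All values are made up for illustration and are not results from the study.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up oocyte counts: the error is expressed in oocytes.
true_counts = [0, 6, 10, 30]
pred_counts = [3, 7, 9, 22]
mae_oocytes = mean_absolute_error(true_counts, pred_counts)

# Made-up ordinal bin indices: the error is expressed in bins.
true_bins = [0, 1, 2, 4]
pred_bins = [1, 1, 2, 3]
mae_bins = mean_absolute_error(true_bins, pred_bins)
```

Under this reading, an error of 0.73 bins would mean that predictions land, on average, less than one bin away from the true bin.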
---|---|
ISSN: | 0268-1161 1460-2350 |
DOI: | 10.1093/humrep/dead163 |