Machine Learning Reveals Patient Phenotypes and Stratifies Outcomes in Chronic Graft-Versus-Host Disease

Bibliographic details
Published in: Blood 2021-11, Vol. 138 (Supplement 1), p. 2951
Main authors: Wu, Ashley Y; Barone, Sierra; Gandelman, Jocelyn; Jagasia, Madan; Irish, Jonathan Michael
Format: Article
Language: English
Online access: Full text
Description
Summary: Background: Chronic graft-versus-host disease (cGVHD) contributes to significant morbidity and mortality after hematopoietic stem cell transplant. Establishing a reliable classification system for this biologically and clinically heterogeneous disease remains challenging. Current scoring systems (e.g., NIH consensus criteria) calculate a score of mild, moderate, or severe disease from multiple organ domains. However, important information about the biology of disease and subtypes of patients may be lost when using the aggregate NIH overall severity classification that combines multiple dimensions of organ data. Machine learning may thus reveal subgroups in multi-dimensional data. Aim: We previously utilized a machine learning workflow on a training dataset to cluster cases of cGVHD into seven distinct phenotypes and designed a user-facing prognostic tool organized as a decision tree (Gandelman et al., Haematologica 2018). This decision tree identified clusters of patients with different survival trends that were not explained or stratified by NIH Severity alone. We sought to validate and expand this workflow on an independent cohort of patients from BMT CTN Study #0801. Methods: With permission from CIBMTR, clinical data were obtained from patients enrolled in BMT CTN #0801, a Phase II/III, prospective, multi-center comparative study. The initial cohort comprised 151 patients; 19 patients were excluded because of missing organ scores, leaving 132 patients in the final analysis. In the training dataset, clusters were found to be stable with N=130 patients in the analysis. Therefore, the sample size of the BMT CTN data set was adequate for the validation. At enrollment, NIH 2005 consensus criteria scores were recorded for eye, liver, joint, mouth, gastrointestinal tract, and lung. The percentage of the body surface area with erythema (% erythema) was measured. Skin sclerosis and fascia were assessed using Hopkins scores.
Eight organ domains (NIH Scores 0-3: eye, liver, joint, mouth, gastrointestinal tract; Hopkins Scores: Sclerosis 0-4, Fascia 0-3; % erythema) were analyzed via a machine learning workflow consisting of t-distributed stochastic neighbor embedding (t-SNE), self-organizing maps (FlowSOM), and marker enrichment modeling (MEM). These steps allowed for dimensionality reduction, patient clustering, and organ enrichment scoring. Lung scores were not included as they were not found to contribute significantly to patient clustering in the train
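
The organ enrichment scoring step named in the Methods can be illustrated with a minimal sketch. This is not the authors' MEM implementation. It assumes a simplified score loosely modeled on the idea of marker enrichment: the absolute difference in medians between a cluster and all other patients, plus the ratio of their interquartile ranges minus one, signed by the direction of the median difference. The function names, the IQR floor of 0.5, the cluster labels, and the patient scores are all hypothetical and made up for illustration.

```python
from statistics import median, quantiles

# Synthetic, made-up patients for illustration only (not study data).
# "cluster" stands in for a label produced by an upstream clustering step.
PATIENTS = [
    {"cluster": "A", "eye": 3, "mouth": 0},
    {"cluster": "A", "eye": 3, "mouth": 1},
    {"cluster": "A", "eye": 2, "mouth": 0},
    {"cluster": "A", "eye": 3, "mouth": 0},
    {"cluster": "B", "eye": 0, "mouth": 2},
    {"cluster": "B", "eye": 1, "mouth": 3},
    {"cluster": "B", "eye": 0, "mouth": 3},
    {"cluster": "B", "eye": 0, "mouth": 2},
]


def iqr(values, floor=0.5):
    """Interquartile range, floored so low-spread ordinal scores
    cannot cause division by zero. The floor is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(q3 - q1, floor)


def enrichment(cluster_values, reference_values):
    """Signed score: median difference plus an IQR-ratio term.
    Positive means the organ score is higher in the cluster."""
    diff = median(cluster_values) - median(reference_values)
    score = abs(diff) + iqr(reference_values) / iqr(cluster_values) - 1
    return score if diff >= 0 else -score


def cluster_profile(patients, cluster, organs):
    """Score each organ for one cluster against all other patients."""
    inside = [p for p in patients if p["cluster"] == cluster]
    outside = [p for p in patients if p["cluster"] != cluster]
    return {
        organ: enrichment([p[organ] for p in inside],
                          [p[organ] for p in outside])
        for organ in organs
    }


if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, cluster_profile(PATIENTS, name, ["eye", "mouth"]))
```

In this toy data, cluster A comes out enriched for eye involvement and depleted for mouth involvement, which is the kind of per-cluster organ label such a score is meant to produce. Published MEM scores are additionally rescaled to a fixed range, a step this sketch omits.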
ISSN: 0006-4971, 1528-0020
DOI: 10.1182/blood-2021-147472