Predicting the diagnosis of HIV and sexually transmitted infections among men who have sex with men using machine learning approaches

Bibliographic details
Published in: The Journal of infection 2021-01, Vol. 82 (1), p. 48-59
Main authors: Bao, Yining, Medland, Nicholas A., Fairley, Christopher K., Wu, Jinrong, Shang, Xianwen, Chow, Eric P.F., Xu, Xianglong, Ge, Zongyuan, Zhuang, Xun, Zhang, Lei
Format: Article
Language: English
Description
Summary:
• Machine learning outperforms regression models in predicting HIV/STI diagnosis.
• The top 10 predictors collectively explained 62.7–73.6% of variations in prediction.
• STI symptoms, past infection history and risk behaviours are top predictors.
• The tool has important public health implications for HIV/STI surveillance.

We aimed to develop machine learning models and evaluate their performance in predicting the diagnosis of HIV and sexually transmitted infections (STIs) in a cohort of Australian men who have sex with men (MSM). We collected clinical records of 21,273 Australian MSM during 2011–2017. We compared the accuracy of four machine learning approaches against a multivariable logistic regression (MLR) model in predicting the diagnosis of HIV and STIs (syphilis, gonorrhoea, chlamydia). Machine learning approaches consistently outperformed MLR. Gradient boosting machine (GBM) achieved the highest area under the receiver operating characteristic curve for HIV (76.3%) and STIs (syphilis, 85.8%; gonorrhoea, 75.5%; chlamydia, 68.0%), followed by extreme gradient boosting (71.1%, 82.2%, 70.3%, 66.4%), random forest (72.0%, 81.9%, 67.2%, 64.3%), deep learning (75.8%, 81.0%, 67.5%, 65.4%) and MLR (69.8%, 80.1%, 67.2%, 63.2%). GBM models showed that the ten strongest predictors collectively explained 62.7–73.6% of variations in predicting HIV/STIs. The most commonly identified predictors were STI symptoms, past syphilis infection, age, time living in Australia, frequency of condom use with casual male sexual partners during receptive anal sex, and the number of casual male sexual partners in the past 12 months. Machine learning approaches are advantageous over multivariable logistic regression models in predicting HIV/STI diagnosis.
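
The summary ranks models by area under the receiver operating characteristic curve (AUC). As a minimal sketch of how that metric can be computed from predicted risk scores, the Python below uses the rank-based (Mann-Whitney) formulation. It is not the authors' code: the function name roc_auc and the toy labels and scores are illustrative assumptions and do not come from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_block = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_block
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical data: 1 = diagnosed, 0 = not diagnosed.
    y_true = [0, 0, 1, 0, 1, 1, 0, 1]
    # Hypothetical predicted risks from two models (illustration only).
    model_a = [0.10, 0.30, 0.35, 0.20, 0.80, 0.70, 0.40, 0.60]
    model_b = [0.20, 0.50, 0.30, 0.10, 0.60, 0.40, 0.70, 0.55]
    for name, scores in (("model_a", model_a), ("model_b", model_b)):
        print(f"{name}: AUC = {roc_auc(y_true, scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the percentages reported above (for example 76.3% for GBM on HIV) are read.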
ISSN: 0163-4453, 1532-2742
DOI: 10.1016/j.jinf.2020.11.007