Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study

Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast canc...

Bibliographic details
Published in: International journal of cancer 2023-09, Vol.153 (5), p.932-941
Main authors: Syleouni, Maria-Eleni, Karavasiloglou, Nena, Manduchi, Laura, Wanner, Miriam, Korol, Dimitri, Ortelli, Laura, Bordoni, Andrea, Rohrmann, Sabine
Format: Article
Language: eng
Online access: Full text
container_end_page 941
container_issue 5
container_start_page 932
container_title International journal of cancer
container_volume 153
creator Syleouni, Maria-Eleni
Karavasiloglou, Nena
Manduchi, Laura
Wanner, Miriam
Korol, Dimitri
Ortelli, Laura
Bordoni, Andrea
Rohrmann, Sabine
description Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.
doi_str_mv 10.1002/ijc.34568
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2820024825</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2820024825</sourcerecordid><originalsourceid>FETCH-LOGICAL-c348t-5d1eb6fdd5104856ebfd94a26c37c0f61128a9af14da6e5e976e93f4a0ea55b13</originalsourceid><addsrcrecordid>eNpdkU1LxDAQhoMouq4e_AMS8KJg16RJ-nEU8QsW9KDnMk2m2qVtatK6-C_8yaa6evAyAzPPvAzvS8gRZwvOWHxRr_RCSJVkW2TGWZ5GLOZqm8zCjkUpF8ke2fd-xRjnisldsifSWIpQZuTz0aGp9VB3L9Sjtp2hpUPwA9XQaXQUWhtWa9tiR9f18Ep7V7fgPv5ho58UWtCvdYe0QXDdNIDmxbpw1fpzCrS3_djAUNsuKsGjobb06N6_J9BQP4zm44DsVNB4PNz0OXm-uX66uouWD7f3V5fLSAuZDZEyHMukMkZxJjOVYFmZXEKcaJFqViWcxxnkUHFpIEGFeZpgLioJDEGpkos5Of3R7Z19G9EPRVt7jU0DHdrRF3EWB_tkFquAnvxDV3Z04eOJCi4KxlUaqLMfSjvrvcOq2DhVcFZMMRUhpuI7psAebxTHskXzR_7mIr4Api2Qcg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2837230157</pqid></control><display><type>article</type><title>Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Wiley Online Library All Journals</source><creator>Syleouni, Maria-Eleni ; Karavasiloglou, Nena ; Manduchi, Laura ; Wanner, Miriam ; Korol, Dimitri ; Ortelli, Laura ; Bordoni, Andrea ; Rohrmann, Sabine</creator><creatorcontrib>Syleouni, Maria-Eleni ; Karavasiloglou, Nena ; Manduchi, Laura ; Wanner, Miriam ; Korol, Dimitri ; Ortelli, Laura ; Bordoni, Andrea ; Rohrmann, Sabine</creatorcontrib><description>Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. 
Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.</description><identifier>ISSN: 0020-7136</identifier><identifier>EISSN: 1097-0215</identifier><identifier>DOI: 10.1002/ijc.34568</identifier><identifier>PMID: 37243372</identifier><language>eng</language><publisher>United States: Wiley Subscription Services, Inc</publisher><subject>Algorithms ; Automation ; Breast cancer ; Cancer ; Learning algorithms ; Machine learning ; Medical research ; Neural networks ; Observational studies ; Population studies ; Population-based studies</subject><ispartof>International journal of cancer, 2023-09, Vol.153 (5), p.932-941</ispartof><rights>2023 The Authors. 
International Journal of Cancer published by John Wiley &amp; Sons Ltd on behalf of UICC.</rights><rights>2023. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c348t-5d1eb6fdd5104856ebfd94a26c37c0f61128a9af14da6e5e976e93f4a0ea55b13</citedby><cites>FETCH-LOGICAL-c348t-5d1eb6fdd5104856ebfd94a26c37c0f61128a9af14da6e5e976e93f4a0ea55b13</cites><orcidid>0000-0001-5284-6612 ; 0000-0002-2215-1200</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37243372$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Syleouni, Maria-Eleni</creatorcontrib><creatorcontrib>Karavasiloglou, Nena</creatorcontrib><creatorcontrib>Manduchi, Laura</creatorcontrib><creatorcontrib>Wanner, Miriam</creatorcontrib><creatorcontrib>Korol, Dimitri</creatorcontrib><creatorcontrib>Ortelli, Laura</creatorcontrib><creatorcontrib>Bordoni, Andrea</creatorcontrib><creatorcontrib>Rohrmann, Sabine</creatorcontrib><title>Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study</title><title>International journal of cancer</title><addtitle>Int J Cancer</addtitle><description>Breast cancer survivors often experience recurrence or a second primary cancer. 
We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. 
This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.</description><subject>Algorithms</subject><subject>Automation</subject><subject>Breast cancer</subject><subject>Cancer</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>Medical research</subject><subject>Neural networks</subject><subject>Observational studies</subject><subject>Population studies</subject><subject>Population-based studies</subject><issn>0020-7136</issn><issn>1097-0215</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNpdkU1LxDAQhoMouq4e_AMS8KJg16RJ-nEU8QsW9KDnMk2m2qVtatK6-C_8yaa6evAyAzPPvAzvS8gRZwvOWHxRr_RCSJVkW2TGWZ5GLOZqm8zCjkUpF8ke2fd-xRjnisldsifSWIpQZuTz0aGp9VB3L9Sjtp2hpUPwA9XQaXQUWhtWa9tiR9f18Ep7V7fgPv5ho58UWtCvdYe0QXDdNIDmxbpw1fpzCrS3_djAUNsuKsGjobb06N6_J9BQP4zm44DsVNB4PNz0OXm-uX66uouWD7f3V5fLSAuZDZEyHMukMkZxJjOVYFmZXEKcaJFqViWcxxnkUHFpIEGFeZpgLioJDEGpkos5Of3R7Z19G9EPRVt7jU0DHdrRF3EWB_tkFquAnvxDV3Z04eOJCi4KxlUaqLMfSjvrvcOq2DhVcFZMMRUhpuI7psAebxTHskXzR_7mIr4Api2Qcg</recordid><startdate>20230901</startdate><enddate>20230901</enddate><creator>Syleouni, Maria-Eleni</creator><creator>Karavasiloglou, Nena</creator><creator>Manduchi, Laura</creator><creator>Wanner, Miriam</creator><creator>Korol, Dimitri</creator><creator>Ortelli, Laura</creator><creator>Bordoni, Andrea</creator><creator>Rohrmann, Sabine</creator><general>Wiley Subscription Services, Inc</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7T5</scope><scope>7TO</scope><scope>7U9</scope><scope>H94</scope><scope>K9.</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-5284-6612</orcidid><orcidid>https://orcid.org/0000-0002-2215-1200</orcidid></search><sort><creationdate>20230901</creationdate><title>Predicting second breast cancer 
among women with primary breast cancer using machine learning algorithms, a population-based observational study</title><author>Syleouni, Maria-Eleni ; Karavasiloglou, Nena ; Manduchi, Laura ; Wanner, Miriam ; Korol, Dimitri ; Ortelli, Laura ; Bordoni, Andrea ; Rohrmann, Sabine</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c348t-5d1eb6fdd5104856ebfd94a26c37c0f61128a9af14da6e5e976e93f4a0ea55b13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Automation</topic><topic>Breast cancer</topic><topic>Cancer</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>Medical research</topic><topic>Neural networks</topic><topic>Observational studies</topic><topic>Population studies</topic><topic>Population-based studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Syleouni, Maria-Eleni</creatorcontrib><creatorcontrib>Karavasiloglou, Nena</creatorcontrib><creatorcontrib>Manduchi, Laura</creatorcontrib><creatorcontrib>Wanner, Miriam</creatorcontrib><creatorcontrib>Korol, Dimitri</creatorcontrib><creatorcontrib>Ortelli, Laura</creatorcontrib><creatorcontrib>Bordoni, Andrea</creatorcontrib><creatorcontrib>Rohrmann, Sabine</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Immunology Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><jtitle>International journal of cancer</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Syleouni, Maria-Eleni</au><au>Karavasiloglou, Nena</au><au>Manduchi, 
Laura</au><au>Wanner, Miriam</au><au>Korol, Dimitri</au><au>Ortelli, Laura</au><au>Bordoni, Andrea</au><au>Rohrmann, Sabine</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study</atitle><jtitle>International journal of cancer</jtitle><addtitle>Int J Cancer</addtitle><date>2023-09-01</date><risdate>2023</risdate><volume>153</volume><issue>5</issue><spage>932</spage><epage>941</epage><pages>932-941</pages><issn>0020-7136</issn><eissn>1097-0215</eissn><abstract>Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. 
Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.</abstract><cop>United States</cop><pub>Wiley Subscription Services, Inc</pub><pmid>37243372</pmid><doi>10.1002/ijc.34568</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0001-5284-6612</orcidid><orcidid>https://orcid.org/0000-0002-2215-1200</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0020-7136
ispartof International journal of cancer, 2023-09, Vol.153 (5), p.932-941
issn 0020-7136
1097-0215
language eng
recordid cdi_proquest_miscellaneous_2820024825
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Wiley Online Library All Journals
subjects Algorithms
Automation
Breast cancer
Cancer
Learning algorithms
Machine learning
Medical research
Neural networks
Observational studies
Population studies
Population-based studies
title Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T14%3A03%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20second%20breast%20cancer%20among%20women%20with%20primary%20breast%20cancer%20using%20machine%20learning%20algorithms,%20a%20population-based%20observational%20study&rft.jtitle=International%20journal%20of%20cancer&rft.au=Syleouni,%20Maria-Eleni&rft.date=2023-09-01&rft.volume=153&rft.issue=5&rft.spage=932&rft.epage=941&rft.pages=932-941&rft.issn=0020-7136&rft.eissn=1097-0215&rft_id=info:doi/10.1002/ijc.34568&rft_dat=%3Cproquest_cross%3E2820024825%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2837230157&rft_id=info:pmid/37243372&rfr_iscdi=true