Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study
Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast canc...
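Published in: | International journal of cancer 2023-09, Vol.153 (5), p.932-941 |
---|---|
Main authors: | Syleouni, Maria-Eleni; Karavasiloglou, Nena; Manduchi, Laura; Wanner, Miriam; Korol, Dimitri; Ortelli, Laura; Bordoni, Andrea; Rohrmann, Sabine |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Automation; Breast cancer; Cancer; Learning algorithms; Machine learning; Medical research; Neural networks; Observational studies; Population studies; Population-based studies |
Online access: | Full text |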
container_end_page | 941 |
---|---|
container_issue | 5 |
container_start_page | 932 |
container_title | International journal of cancer |
container_volume | 153 |
creator | Syleouni, Maria-Eleni; Karavasiloglou, Nena; Manduchi, Laura; Wanner, Miriam; Korol, Dimitri; Ortelli, Laura; Bordoni, Andrea; Rohrmann, Sabine |
description | Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB), were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it. |
doi_str_mv | 10.1002/ijc.34568 |
format | Article |
pmid | 37243372 |
publisher | United States: Wiley Subscription Services, Inc |
date | 2023-09-01 |
tpages | 10 |
rights | 2023 The Authors. International Journal of Cancer published by John Wiley & Sons Ltd on behalf of UICC. 2023. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
toplevel | peer_reviewed; online_resources |
orcid | https://orcid.org/0000-0001-5284-6612; https://orcid.org/0000-0002-2215-1200 |
collection | PubMed; CrossRef; Immunology Abstracts; Oncogenes and Growth Factors Abstracts; Virology and AIDS Abstracts; AIDS and Cancer Research Abstracts; ProQuest Health & Medical Complete (Alumni); MEDLINE - Academic |
fulltext | fulltext |
identifier | ISSN: 0020-7136 |
ispartof | International journal of cancer, 2023-09, Vol.153 (5), p.932-941 |
issn | 0020-7136; 1097-0215 |
language | eng |
recordid | cdi_proquest_miscellaneous_2820024825 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Wiley Online Library All Journals |
subjects | Algorithms; Automation; Breast cancer; Cancer; Learning algorithms; Machine learning; Medical research; Neural networks; Observational studies; Population studies; Population-based studies |
title | Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T14%3A03%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20second%20breast%20cancer%20among%20women%20with%20primary%20breast%20cancer%20using%20machine%20learning%20algorithms,%20a%20population-based%20observational%20study&rft.jtitle=International%20journal%20of%20cancer&rft.au=Syleouni,%20Maria-Eleni&rft.date=2023-09-01&rft.volume=153&rft.issue=5&rft.spage=932&rft.epage=941&rft.pages=932-941&rft.issn=0020-7136&rft.eissn=1097-0215&rft_id=info:doi/10.1002/ijc.34568&rft_dat=%3Cproquest_cross%3E2820024825%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2837230157&rft_id=info:pmid/37243372&rfr_iscdi=true |
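
The description above reports model performance as an area under the ROC curve (AUC) with a 95% confidence interval, but the record does not say how those intervals were computed. As a rough illustration only, the sketch below computes an AUC from labels and predicted scores and attaches a percentile-bootstrap interval. The bootstrap, the function names `roc_auc` and `bootstrap_auc_ci`, and the synthetic toy data are assumptions made for this example and are not taken from the article.

```python
import random


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC (an assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap an AUC")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so its AUC is undefined
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic toy data with roughly 6% positives; NOT the registry data.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.06 else 0 for _ in range(300)]
    if len(set(labels)) < 2:
        labels[0], labels[1] = 0, 1  # guarantee both classes in the toy sample
    scores = [rng.random() + 0.3 * y for y in labels]
    auc = roc_auc(labels, scores)
    lower, upper = bootstrap_auc_ci(labels, scores, n_boot=200)
    print(f"AUC {auc:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With an outcome as rare as the 6% reported here, few positive cases enter each resample, so the interval tends to be wide. That is in line with the wider interval the description gives for the smaller external validation set.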