SUITOR: Selecting the number of mutational signatures through cross-validation

For de novo mutational signature analysis, the critical first step is to decide how many signatures should be expected in a cancer genomics study. An incorrect number could mislead downstream analyses. Here we present SUITOR (Selecting the nUmber of mutatIonal signaTures thrOugh cRoss-validation), a...

Bibliographic details
Published in: PLoS computational biology 2022-04, Vol.18 (4), p.e1009309
Main authors: Lee, Donghyuk; Wang, Difei; Yang, Xiaohong R; Shi, Jianxin; Landi, Maria Teresa; Zhu, Bin
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page
container_issue 4
container_start_page e1009309
container_title PLoS computational biology
container_volume 18
creator Lee, Donghyuk
Wang, Difei
Yang, Xiaohong R
Shi, Jianxin
Landi, Maria Teresa
Zhu, Bin
description For de novo mutational signature analysis, the critical first step is to decide how many signatures should be expected in a cancer genomics study. An incorrect number could mislead downstream analyses. Here we present SUITOR (Selecting the nUmber of mutatIonal signaTures thrOugh cRoss-validation), an unsupervised cross-validation method that requires few assumptions and no numerical approximations to select the optimal number of signatures without overfitting the data. In vitro studies and in silico simulations demonstrated that SUITOR can correctly identify signatures, some of which were missed by other widely used methods. Applied to 2,540 whole-genome sequenced tumors across 22 cancer types, SUITOR selected signatures with the smallest prediction errors, and almost all signatures of breast cancer selected by SUITOR were validated in an independent breast cancer study. SUITOR is a powerful tool to select the optimal number of mutational signatures, facilitating downstream analyses with etiological or therapeutic importance.
doi_str_mv 10.1371/journal.pcbi.1009309
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2665140075</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A702007773</galeid><doaj_id>oai_doaj_org_article_41e4e173ac624064b1f6a149222d6ed7</doaj_id><sourcerecordid>A702007773</sourcerecordid><originalsourceid>FETCH-LOGICAL-c661t-da03c3ecbbd9095b5f54e52906c0eaeba310ef9b409581be618d451b6f272ce93</originalsourceid><addsrcrecordid>eNqVkk1r3DAQhk1paT7af1BaQy_pwVvJ-lr3UAihHwshgWxyFpI89mqxra0kh_bfV7vrhGzJpeggMXrmHc3ozbJ3GM0wEfjz2o1-UN1sY7SdYYQqgqoX2TFmjBSCsPnLJ-ej7CSENULpWPHX2RFhRIg5F8fZ1fJucXt98yVfQgcm2qHN4wryYew1-Nw1eT9GFa1LlfJg20HF0UNIjHdju8qNdyEU96qz9Y56k71qVBfg7bSfZnffv91e_Cwur38sLs4vC8M5jkWtEDEEjNZ1hSqmWcMosLJC3CBQoBXBCJpK03Q5xxo4nteUYc2bUpQGKnKafdjrbjoX5DSKIEvOGaYICZaIxZ6onVrLjbe98n-kU1buAs63UvloTQeSYqCABVGGlxRxqnHDFaZVWZY1h1okra9TtVH3UBsYolfdgejhzWBXsnX3skrfwgVNAmeTgHe_RghR9jYY6Do1gBu376aixCVCPKEf_0Gf726iWpUasEPjUl2zFZXnAiUhIQRJ1OwZKq0aemvcAI1N8YOETwcJiYnwO7ZqDEEuljf_wV4dsnTP7vzioXmcHUZy6-aHJuXWzXJyc0p7_3Tuj0kP9iV_AVJ372o</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2665140075</pqid></control><display><type>article</type><title>SUITOR: Selecting the number of mutational signatures through cross-validation</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS) Journals Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Lee, Donghyuk ; Wang, Difei ; Yang, Xiaohong R ; Shi, Jianxin ; Landi, Maria Teresa ; Zhu, Bin</creator><contributor>Panchenko, Anna R</contributor><creatorcontrib>Lee, Donghyuk ; Wang, Difei ; Yang, Xiaohong R ; Shi, Jianxin ; Landi, Maria Teresa ; Zhu, Bin ; Panchenko, Anna R</creatorcontrib><description>For de novo mutational signature analysis, the critical first step is to decide how many signatures should be expected 
in a cancer genomics study. An incorrect number could mislead downstream analyses. Here we present SUITOR (Selecting the nUmber of mutatIonal signaTures thrOugh cRoss-validation), an unsupervised cross-validation method that requires little assumptions and no numerical approximations to select the optimal number of signatures without overfitting the data. In vitro studies and in silico simulations demonstrated that SUITOR can correctly identify signatures, some of which were missed by other widely used methods. Applied to 2,540 whole-genome sequenced tumors across 22 cancer types, SUITOR selected signatures with the smallest prediction errors and almost all signatures of breast cancer selected by SUITOR were validated in an independent breast cancer study. SUITOR is a powerful tool to select the optimal number of mutational signatures, facilitating downstream analyses with etiological or therapeutic importance.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1009309</identifier><identifier>PMID: 35377867</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Approximation ; Biology and Life Sciences ; Breast cancer ; Breast Neoplasms - genetics ; Cancer ; Computer Simulation ; Etiology ; Female ; Genetic aspects ; Genomes ; Genomics ; Humans ; Kidney cancer ; Liver cancer ; Medicine and Health Sciences ; Mutation ; Mutation (Biology) ; Mutation - genetics ; Neoplasms ; Physical Sciences ; Physiological aspects ; Prostate cancer ; Signature analysis ; Signatures ; Simulation ; Sparsity ; Tumors</subject><ispartof>PLoS computational biology, 2022-04, Vol.18 (4), p.e1009309</ispartof><rights>COPYRIGHT 2022 Public Library of Science</rights><rights>This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise 
used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication: https://creativecommons.org/publicdomain/zero/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c661t-da03c3ecbbd9095b5f54e52906c0eaeba310ef9b409581be618d451b6f272ce93</citedby><cites>FETCH-LOGICAL-c661t-da03c3ecbbd9095b5f54e52906c0eaeba310ef9b409581be618d451b6f272ce93</cites><orcidid>0000-0001-8606-4707 ; 0000-0003-0172-5516 ; 0000-0003-4451-8664 ; 0000-0003-4088-3859</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009674/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009674/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35377867$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Panchenko, Anna R</contributor><creatorcontrib>Lee, Donghyuk</creatorcontrib><creatorcontrib>Wang, Difei</creatorcontrib><creatorcontrib>Yang, Xiaohong R</creatorcontrib><creatorcontrib>Shi, Jianxin</creatorcontrib><creatorcontrib>Landi, Maria Teresa</creatorcontrib><creatorcontrib>Zhu, Bin</creatorcontrib><title>SUITOR: Selecting the number of mutational signatures through cross-validation</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>For de novo mutational signature analysis, the critical first step is to decide how many 
signatures should be expected in a cancer genomics study. An incorrect number could mislead downstream analyses. Here we present SUITOR (Selecting the nUmber of mutatIonal signaTures thrOugh cRoss-validation), an unsupervised cross-validation method that requires little assumptions and no numerical approximations to select the optimal number of signatures without overfitting the data. In vitro studies and in silico simulations demonstrated that SUITOR can correctly identify signatures, some of which were missed by other widely used methods. Applied to 2,540 whole-genome sequenced tumors across 22 cancer types, SUITOR selected signatures with the smallest prediction errors and almost all signatures of breast cancer selected by SUITOR were validated in an independent breast cancer study. SUITOR is a powerful tool to select the optimal number of mutational signatures, facilitating downstream analyses with etiological or therapeutic importance.</description><subject>Approximation</subject><subject>Biology and Life Sciences</subject><subject>Breast cancer</subject><subject>Breast Neoplasms - genetics</subject><subject>Cancer</subject><subject>Computer Simulation</subject><subject>Etiology</subject><subject>Female</subject><subject>Genetic aspects</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Humans</subject><subject>Kidney cancer</subject><subject>Liver cancer</subject><subject>Medicine and Health Sciences</subject><subject>Mutation</subject><subject>Mutation (Biology)</subject><subject>Mutation - genetics</subject><subject>Neoplasms</subject><subject>Physical Sciences</subject><subject>Physiological aspects</subject><subject>Prostate cancer</subject><subject>Signature 
analysis</subject><subject>Signatures</subject><subject>Simulation</subject><subject>Sparsity</subject><subject>Tumors</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVkk1r3DAQhk1paT7af1BaQy_pwVvJ-lr3UAihHwshgWxyFpI89mqxra0kh_bfV7vrhGzJpeggMXrmHc3ozbJ3GM0wEfjz2o1-UN1sY7SdYYQqgqoX2TFmjBSCsPnLJ-ej7CSENULpWPHX2RFhRIg5F8fZ1fJucXt98yVfQgcm2qHN4wryYew1-Nw1eT9GFa1LlfJg20HF0UNIjHdju8qNdyEU96qz9Y56k71qVBfg7bSfZnffv91e_Cwur38sLs4vC8M5jkWtEDEEjNZ1hSqmWcMosLJC3CBQoBXBCJpK03Q5xxo4nteUYc2bUpQGKnKafdjrbjoX5DSKIEvOGaYICZaIxZ6onVrLjbe98n-kU1buAs63UvloTQeSYqCABVGGlxRxqnHDFaZVWZY1h1okra9TtVH3UBsYolfdgejhzWBXsnX3skrfwgVNAmeTgHe_RghR9jYY6Do1gBu376aixCVCPKEf_0Gf726iWpUasEPjUl2zFZXnAiUhIQRJ1OwZKq0aemvcAI1N8YOETwcJiYnwO7ZqDEEuljf_wV4dsnTP7vzioXmcHUZy6-aHJuXWzXJyc0p7_3Tuj0kP9iV_AVJ372o</recordid><startdate>20220401</startdate><enddate>20220401</enddate><creator>Lee, Donghyuk</creator><creator>Wang, Difei</creator><creator>Yang, Xiaohong R</creator><creator>Shi, Jianxin</creator><creator>Landi, Maria Teresa</creator><creator>Zhu, Bin</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-8606-4707</orcidid><orcidid>https://orcid.org/0000-0003-0172-5516</orcidid><orcidid>https://orcid.org/0000-0003-4451-8664</orcidid><orcidid>https://orcid.org/0000-0003-4088-3859</orcidid></search><sort><creationdate>20220401</creationdate><title>SUITOR: Selecting the number of mutational signatures through cross-validation</title><author>Lee, Donghyuk ; Wang, Difei ; Yang, Xiaohong R ; Shi, Jianxin ; Landi, Maria Teresa ; Zhu, Bin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c661t-da03c3ecbbd9095b5f54e52906c0eaeba310ef9b409581be618d451b6f272ce93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Approximation</topic><topic>Biology and Life 
Sciences</topic><topic>Breast cancer</topic><topic>Breast Neoplasms - genetics</topic><topic>Cancer</topic><topic>Computer Simulation</topic><topic>Etiology</topic><topic>Female</topic><topic>Genetic aspects</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Humans</topic><topic>Kidney cancer</topic><topic>Liver cancer</topic><topic>Medicine and Health Sciences</topic><topic>Mutation</topic><topic>Mutation (Biology)</topic><topic>Mutation - genetics</topic><topic>Neoplasms</topic><topic>Physical Sciences</topic><topic>Physiological aspects</topic><topic>Prostate cancer</topic><topic>Signature analysis</topic><topic>Signatures</topic><topic>Simulation</topic><topic>Sparsity</topic><topic>Tumors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Donghyuk</creatorcontrib><creatorcontrib>Wang, Difei</creatorcontrib><creatorcontrib>Yang, Xiaohong R</creatorcontrib><creatorcontrib>Shi, Jianxin</creatorcontrib><creatorcontrib>Landi, Maria Teresa</creatorcontrib><creatorcontrib>Zhu, Bin</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech 
Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One 
Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lee, Donghyuk</au><au>Wang, Difei</au><au>Yang, Xiaohong R</au><au>Shi, Jianxin</au><au>Landi, Maria Teresa</au><au>Zhu, Bin</au><au>Panchenko, Anna R</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SUITOR: Selecting the number of mutational signatures through cross-validation</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2022-04-01</date><risdate>2022</risdate><volume>18</volume><issue>4</issue><spage>e1009309</spage><pages>e1009309-</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>For de novo mutational signature analysis, the critical first step is to decide how many signatures should be expected in a cancer genomics study. An incorrect number could mislead downstream analyses. Here we present SUITOR (Selecting the nUmber of mutatIonal signaTures thrOugh cRoss-validation), an unsupervised cross-validation method that requires little assumptions and no numerical approximations to select the optimal number of signatures without overfitting the data. In vitro studies and in silico simulations demonstrated that SUITOR can correctly identify signatures, some of which were missed by other widely used methods. 
Applied to 2,540 whole-genome sequenced tumors across 22 cancer types, SUITOR selected signatures with the smallest prediction errors and almost all signatures of breast cancer selected by SUITOR were validated in an independent breast cancer study. SUITOR is a powerful tool to select the optimal number of mutational signatures, facilitating downstream analyses with etiological or therapeutic importance.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>35377867</pmid><doi>10.1371/journal.pcbi.1009309</doi><orcidid>https://orcid.org/0000-0001-8606-4707</orcidid><orcidid>https://orcid.org/0000-0003-0172-5516</orcidid><orcidid>https://orcid.org/0000-0003-4451-8664</orcidid><orcidid>https://orcid.org/0000-0003-4088-3859</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2022-04, Vol.18 (4), p.e1009309
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2665140075
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Approximation
Biology and Life Sciences
Breast cancer
Breast Neoplasms - genetics
Cancer
Computer Simulation
Etiology
Female
Genetic aspects
Genomes
Genomics
Humans
Kidney cancer
Liver cancer
Medicine and Health Sciences
Mutation
Mutation (Biology)
Mutation - genetics
Neoplasms
Physical Sciences
Physiological aspects
Prostate cancer
Signature analysis
Signatures
Simulation
Sparsity
Tumors
title SUITOR: Selecting the number of mutational signatures through cross-validation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T07%3A54%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SUITOR:%20Selecting%20the%20number%20of%20mutational%20signatures%20through%20cross-validation&rft.jtitle=PLoS%20computational%20biology&rft.au=Lee,%20Donghyuk&rft.date=2022-04-01&rft.volume=18&rft.issue=4&rft.spage=e1009309&rft.pages=e1009309-&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1009309&rft_dat=%3Cgale_plos_%3EA702007773%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665140075&rft_id=info:pmid/35377867&rft_galeid=A702007773&rft_doaj_id=oai_doaj_org_article_41e4e173ac624064b1f6a149222d6ed7&rfr_iscdi=true