Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA

The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confoun...

Bibliographic details
Published in: PLoS computational biology 2019-08, Vol.15 (8), p.e1007332-e1007332
Main authors: Parida, Laxmi, Haferlach, Claudia, Rhrissorrakrai, Kahn, Utro, Filippo, Levovitz, Chaya, Kern, Wolfgang, Nadarajah, Niroshan, Twardziok, Sven, Hutter, Stephan, Meggendorfer, Manja, Walter, Wencke, Baer, Constance, Haferlach, Torsten
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e1007332
container_issue 8
container_start_page e1007332
container_title PLoS computational biology
container_volume 15
creator Parida, Laxmi
Haferlach, Claudia
Rhrissorrakrai, Kahn
Utro, Filippo
Levovitz, Chaya
Kern, Wolfgang
Nadarajah, Niroshan
Twardziok, Sven
Hutter, Stephan
Meggendorfer, Manja
Walter, Wencke
Baer, Constance
Haferlach, Torsten
description The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best machine learning (ML) algorithms. Here we set out to determine whether the dark matter of the genome encompasses signals that can distinguish fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, which is implicitly defined on the whole genome, we use predictability (F1 score), which can be defined on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictability over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding, and dark-matter regions had the highest F1 scores, with dark matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserves much more attention.
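The description uses the F1 score as its measure of predictability for a genomic region. As a minimal sketch of how such a score can be computed for a multi-class subtype classifier, the Python below derives per-class F1 from true and predicted labels and averages it across classes (macro averaging). The macro averaging, the function names `f1_per_class` and `macro_f1`, and the labels `subtype_A` and `subtype_B` are illustrative assumptions; the record does not state how the authors aggregated F1, and this is not an implementation of ReVeaL.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, with F1 = 2*TP / (2*TP + FP + FN) per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct call for this label
        else:
            fp[guess] += 1   # predicted label was wrong
            fn[truth] += 1   # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed aggregation, see lead-in)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels for four samples scored on one genomic region.
y_true = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]
y_pred = ["subtype_A", "subtype_B", "subtype_B", "subtype_B"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Running the same function on predictions made from different portions of the genome (for example genic versus dark regions) would give one score per region, which is the kind of region-by-region comparison the description reports.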
doi_str_mv 10.1371/journal.pcbi.1007332
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2291472438</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A600426383</galeid><doaj_id>oai_doaj_org_article_31da23faa53b4e68b4123d50e3164b54</doaj_id><sourcerecordid>A600426383</sourcerecordid><originalsourceid>FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</originalsourceid><addsrcrecordid>eNqVkk2P0zAQhiMEYj_gHyCIxGU5tNge20n2gFRt-ai0KhIfZ8txJl2XNC62g-Df45Lsaou4IB_G8jzzeubVZNkzSuYUCvp66wbf626-N7WdU0IKAPYgO6VCwKwAUT68dz_JzkLYEpKulXycnQDlsiqBnGbrpfbfZjsdI_p8DOEyX9pgvN3ZXkfbb_Iw1LHDvO6ca3Kje5OgfAiHVLzBvEkSGGK-XC-eZI9a3QV8OsXz7Ou7t1-uPsyuP75fXS2uZ0YCxFlhWNlwXRQoTMt4iVIXHAhlhnLd8KrllamhEKJiHKiQTQqITJQgZYko4Dx7MeruOxfUZEVQjFWUFwkuE7EaicbprdqnabT_pZy26s-D8xulfbSmQwW00QxarQXUHGVZc8qgEQSBSl4LnrTeTL8N9Q4bg330ujsSPc709kZt3A8lC844p0ngYhLw7vuQvFK75DB2ne7RDYe-S6CUyPIw2cu_0H9PNx-pjU4D2L516V-TToM7a1yPrU3vC0kIZzLxqeDVUUFiIv6MGz2EoFafP_0Huz5m-cga70Lw2N65Qok67Olt--qwp2ra01T2_L6jd0W3iwm_AY-w4js</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2291472438</pqid></control><display><type>article</type><title>Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Parida, Laxmi ; Haferlach, Claudia ; Rhrissorrakrai, Kahn ; Utro, Filippo ; Levovitz, Chaya ; Kern, Wolfgang ; Nadarajah, Niroshan ; Twardziok, Sven ; Hutter, Stephan ; Meggendorfer, Manja ; Walter, Wencke ; Baer, Constance ; Haferlach, Torsten</creator><contributor>Kann, Maricel G</contributor><creatorcontrib>Parida, Laxmi ; Haferlach, Claudia ; Rhrissorrakrai, Kahn ; Utro, Filippo ; Levovitz, Chaya ; Kern, Wolfgang ; Nadarajah, Niroshan ; Twardziok, Sven 
; Hutter, Stephan ; Meggendorfer, Manja ; Walter, Wencke ; Baer, Constance ; Haferlach, Torsten ; Kann, Maricel G</creatorcontrib><description>The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. 
Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1007332</identifier><identifier>PMID: 31469830</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Analysis ; Annotations ; Artificial intelligence ; Biology and Life Sciences ; Blood cancer ; Cancer ; Cancer genetics ; Computational Biology ; Dark matter ; Databases, Nucleic Acid ; Deoxyribonucleic acid ; Disease ; DNA ; DNA sequencing ; DNA, Neoplasm - genetics ; Etiology ; Etiology (Medicine) ; Gene Frequency ; Gene sequencing ; Genetic aspects ; Genome, Human ; Genomes ; Genomics ; Hematologic Neoplasms - classification ; Hematologic Neoplasms - genetics ; Hematology ; Heritability ; Heterogeneity ; High-Throughput Nucleotide Sequencing ; Humans ; Laboratories ; Learning algorithms ; Leukemia ; Machine Learning ; Medicine and Health Sciences ; Metabolism ; Models, Genetic ; Mutation ; Novels ; Polymorphism, Single Nucleotide ; Regularization ; Regulation ; RNA, Untranslated - genetics ; Stochastic Processes ; Stochasticity ; Supervision ; Tumorigenesis ; Tumors ; Whole Genome Sequencing</subject><ispartof>PLoS computational biology, 2019-08, Vol.15 (8), p.e1007332-e1007332</ispartof><rights>COPYRIGHT 2019 Public Library of Science</rights><rights>2019 Parida et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2019 Parida et al 2019 Parida et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</citedby><cites>FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</cites><orcidid>0000-0002-6486-9675 ; 0000-0003-3226-7642 ; 0000-0002-1567-9090 ; 0000-0002-7872-5074</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742441/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742441/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2101,2927,23865,27923,27924,53790,53792,79471,79472</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31469830$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Kann, Maricel G</contributor><creatorcontrib>Parida, Laxmi</creatorcontrib><creatorcontrib>Haferlach, Claudia</creatorcontrib><creatorcontrib>Rhrissorrakrai, Kahn</creatorcontrib><creatorcontrib>Utro, Filippo</creatorcontrib><creatorcontrib>Levovitz, Chaya</creatorcontrib><creatorcontrib>Kern, Wolfgang</creatorcontrib><creatorcontrib>Nadarajah, Niroshan</creatorcontrib><creatorcontrib>Twardziok, Sven</creatorcontrib><creatorcontrib>Hutter, Stephan</creatorcontrib><creatorcontrib>Meggendorfer, Manja</creatorcontrib><creatorcontrib>Walter, Wencke</creatorcontrib><creatorcontrib>Baer, Constance</creatorcontrib><creatorcontrib>Haferlach, Torsten</creatorcontrib><title>Dark-matter matters: Discriminating subtle blood 
cancers using the darkest DNA</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. 
Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Annotations</subject><subject>Artificial intelligence</subject><subject>Biology and Life Sciences</subject><subject>Blood cancer</subject><subject>Cancer</subject><subject>Cancer genetics</subject><subject>Computational Biology</subject><subject>Dark matter</subject><subject>Databases, Nucleic Acid</subject><subject>Deoxyribonucleic acid</subject><subject>Disease</subject><subject>DNA</subject><subject>DNA sequencing</subject><subject>DNA, Neoplasm - genetics</subject><subject>Etiology</subject><subject>Etiology (Medicine)</subject><subject>Gene Frequency</subject><subject>Gene sequencing</subject><subject>Genetic aspects</subject><subject>Genome, Human</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Hematologic Neoplasms - classification</subject><subject>Hematologic Neoplasms - genetics</subject><subject>Hematology</subject><subject>Heritability</subject><subject>Heterogeneity</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>Laboratories</subject><subject>Learning algorithms</subject><subject>Leukemia</subject><subject>Machine Learning</subject><subject>Medicine and Health Sciences</subject><subject>Metabolism</subject><subject>Models, Genetic</subject><subject>Mutation</subject><subject>Novels</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Regularization</subject><subject>Regulation</subject><subject>RNA, Untranslated - genetics</subject><subject>Stochastic Processes</subject><subject>Stochasticity</subject><subject>Supervision</subject><subject>Tumorigenesis</subject><subject>Tumors</subject><subject>Whole Genome 
Sequencing</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVkk2P0zAQhiMEYj_gHyCIxGU5tNge20n2gFRt-ai0KhIfZ8txJl2XNC62g-Df45Lsaou4IB_G8jzzeubVZNkzSuYUCvp66wbf626-N7WdU0IKAPYgO6VCwKwAUT68dz_JzkLYEpKulXycnQDlsiqBnGbrpfbfZjsdI_p8DOEyX9pgvN3ZXkfbb_Iw1LHDvO6ca3Kje5OgfAiHVLzBvEkSGGK-XC-eZI9a3QV8OsXz7Ou7t1-uPsyuP75fXS2uZ0YCxFlhWNlwXRQoTMt4iVIXHAhlhnLd8KrllamhEKJiHKiQTQqITJQgZYko4Dx7MeruOxfUZEVQjFWUFwkuE7EaicbprdqnabT_pZy26s-D8xulfbSmQwW00QxarQXUHGVZc8qgEQSBSl4LnrTeTL8N9Q4bg330ujsSPc709kZt3A8lC844p0ngYhLw7vuQvFK75DB2ne7RDYe-S6CUyPIw2cu_0H9PNx-pjU4D2L516V-TToM7a1yPrU3vC0kIZzLxqeDVUUFiIv6MGz2EoFafP_0Huz5m-cga70Lw2N65Qok67Olt--qwp2ra01T2_L6jd0W3iwm_AY-w4js</recordid><startdate>20190801</startdate><enddate>20190801</enddate><creator>Parida, Laxmi</creator><creator>Haferlach, Claudia</creator><creator>Rhrissorrakrai, Kahn</creator><creator>Utro, Filippo</creator><creator>Levovitz, Chaya</creator><creator>Kern, Wolfgang</creator><creator>Nadarajah, Niroshan</creator><creator>Twardziok, Sven</creator><creator>Hutter, Stephan</creator><creator>Meggendorfer, Manja</creator><creator>Walter, Wencke</creator><creator>Baer, Constance</creator><creator>Haferlach, Torsten</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6486-9675</orcidid><orcidid>https://orcid.org/0000-0003-3226-7642</orcidid><orcidid>https://orcid.org/0000-0002-1567-9090</orcidid><orcidid>https://orcid.org/0000-0002-7872-5074</orcidid></search><sort><creationdate>20190801</creationdate><title>Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA</title><author>Parida, Laxmi ; Haferlach, Claudia ; Rhrissorrakrai, Kahn ; Utro, Filippo ; Levovitz, Chaya ; Kern, Wolfgang ; Nadarajah, Niroshan ; Twardziok, Sven ; Hutter, Stephan ; Meggendorfer, Manja ; Walter, Wencke ; Baer, Constance ; Haferlach, 
Torsten</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Annotations</topic><topic>Artificial intelligence</topic><topic>Biology and Life Sciences</topic><topic>Blood cancer</topic><topic>Cancer</topic><topic>Cancer genetics</topic><topic>Computational Biology</topic><topic>Dark matter</topic><topic>Databases, Nucleic Acid</topic><topic>Deoxyribonucleic acid</topic><topic>Disease</topic><topic>DNA</topic><topic>DNA sequencing</topic><topic>DNA, Neoplasm - genetics</topic><topic>Etiology</topic><topic>Etiology (Medicine)</topic><topic>Gene Frequency</topic><topic>Gene sequencing</topic><topic>Genetic aspects</topic><topic>Genome, Human</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Hematologic Neoplasms - classification</topic><topic>Hematologic Neoplasms - genetics</topic><topic>Hematology</topic><topic>Heritability</topic><topic>Heterogeneity</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>Laboratories</topic><topic>Learning algorithms</topic><topic>Leukemia</topic><topic>Machine Learning</topic><topic>Medicine and Health Sciences</topic><topic>Metabolism</topic><topic>Models, Genetic</topic><topic>Mutation</topic><topic>Novels</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Regularization</topic><topic>Regulation</topic><topic>RNA, Untranslated - genetics</topic><topic>Stochastic Processes</topic><topic>Stochasticity</topic><topic>Supervision</topic><topic>Tumorigenesis</topic><topic>Tumors</topic><topic>Whole Genome Sequencing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Parida, Laxmi</creatorcontrib><creatorcontrib>Haferlach, 
Claudia</creatorcontrib><creatorcontrib>Rhrissorrakrai, Kahn</creatorcontrib><creatorcontrib>Utro, Filippo</creatorcontrib><creatorcontrib>Levovitz, Chaya</creatorcontrib><creatorcontrib>Kern, Wolfgang</creatorcontrib><creatorcontrib>Nadarajah, Niroshan</creatorcontrib><creatorcontrib>Twardziok, Sven</creatorcontrib><creatorcontrib>Hutter, Stephan</creatorcontrib><creatorcontrib>Meggendorfer, Manja</creatorcontrib><creatorcontrib>Walter, Wencke</creatorcontrib><creatorcontrib>Baer, Constance</creatorcontrib><creatorcontrib>Haferlach, Torsten</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central 
UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Parida, Laxmi</au><au>Haferlach, Claudia</au><au>Rhrissorrakrai, Kahn</au><au>Utro, Filippo</au><au>Levovitz, Chaya</au><au>Kern, Wolfgang</au><au>Nadarajah, Niroshan</au><au>Twardziok, Sven</au><au>Hutter, Stephan</au><au>Meggendorfer, Manja</au><au>Walter, Wencke</au><au>Baer, Constance</au><au>Haferlach, Torsten</au><au>Kann, Maricel G</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2019-08-01</date><risdate>2019</risdate><volume>15</volume><issue>8</issue><spage>e1007332</spage><epage>e1007332</epage><pages>e1007332-e1007332</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. 
ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>31469830</pmid><doi>10.1371/journal.pcbi.1007332</doi><orcidid>https://orcid.org/0000-0002-6486-9675</orcidid><orcidid>https://orcid.org/0000-0003-3226-7642</orcidid><orcidid>https://orcid.org/0000-0002-1567-9090</orcidid><orcidid>https://orcid.org/0000-0002-7872-5074</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2019-08, Vol.15 (8), p.e1007332-e1007332
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2291472438
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Algorithms
Analysis
Annotations
Artificial intelligence
Biology and Life Sciences
Blood cancer
Cancer
Cancer genetics
Computational Biology
Dark matter
Databases, Nucleic Acid
Deoxyribonucleic acid
Disease
DNA
DNA sequencing
DNA, Neoplasm - genetics
Etiology
Etiology (Medicine)
Gene Frequency
Gene sequencing
Genetic aspects
Genome, Human
Genomes
Genomics
Hematologic Neoplasms - classification
Hematologic Neoplasms - genetics
Hematology
Heritability
Heterogeneity
High-Throughput Nucleotide Sequencing
Humans
Laboratories
Learning algorithms
Leukemia
Machine Learning
Medicine and Health Sciences
Metabolism
Models, Genetic
Mutation
Novels
Polymorphism, Single Nucleotide
Regularization
Regulation
RNA, Untranslated - genetics
Stochastic Processes
Stochasticity
Supervision
Tumorigenesis
Tumors
Whole Genome Sequencing
title Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T13%3A10%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dark-matter%20matters:%20Discriminating%20subtle%20blood%20cancers%20using%20the%20darkest%20DNA&rft.jtitle=PLoS%20computational%20biology&rft.au=Parida,%20Laxmi&rft.date=2019-08-01&rft.volume=15&rft.issue=8&rft.spage=e1007332&rft.epage=e1007332&rft.pages=e1007332-e1007332&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1007332&rft_dat=%3Cgale_plos_%3EA600426383%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2291472438&rft_id=info:pmid/31469830&rft_galeid=A600426383&rft_doaj_id=oai_doaj_org_article_31da23faa53b4e68b4123d50e3164b54&rfr_iscdi=true