Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA
The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confoun...
Published in: | PLoS computational biology 2019-08, Vol.15 (8), p.e1007332-e1007332 |
---|---|
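Main authors: | Parida, Laxmi; Haferlach, Claudia; Rhrissorrakrai, Kahn; Utro, Filippo; Levovitz, Chaya; Kern, Wolfgang; Nadarajah, Niroshan; Twardziok, Sven; Hutter, Stephan; Meggendorfer, Manja; Walter, Wencke; Baer, Constance; Haferlach, Torsten |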
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e1007332 |
---|---|
container_issue | 8 |
container_start_page | e1007332 |
container_title | PLoS computational biology |
container_volume | 15 |
creator | Parida, Laxmi; Haferlach, Claudia; Rhrissorrakrai, Kahn; Utro, Filippo; Levovitz, Chaya; Kern, Wolfgang; Nadarajah, Niroshan; Twardziok, Sven; Hutter, Stephan; Meggendorfer, Manja; Walter, Wencke; Baer, Constance; Haferlach, Torsten |
description | The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention. |
doi_str_mv | 10.1371/journal.pcbi.1007332 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2291472438</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A600426383</galeid><doaj_id>oai_doaj_org_article_31da23faa53b4e68b4123d50e3164b54</doaj_id><sourcerecordid>A600426383</sourcerecordid><originalsourceid>FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</originalsourceid><addsrcrecordid>eNqVkk2P0zAQhiMEYj_gHyCIxGU5tNge20n2gFRt-ai0KhIfZ8txJl2XNC62g-Df45Lsaou4IB_G8jzzeubVZNkzSuYUCvp66wbf626-N7WdU0IKAPYgO6VCwKwAUT68dz_JzkLYEpKulXycnQDlsiqBnGbrpfbfZjsdI_p8DOEyX9pgvN3ZXkfbb_Iw1LHDvO6ca3Kje5OgfAiHVLzBvEkSGGK-XC-eZI9a3QV8OsXz7Ou7t1-uPsyuP75fXS2uZ0YCxFlhWNlwXRQoTMt4iVIXHAhlhnLd8KrllamhEKJiHKiQTQqITJQgZYko4Dx7MeruOxfUZEVQjFWUFwkuE7EaicbprdqnabT_pZy26s-D8xulfbSmQwW00QxarQXUHGVZc8qgEQSBSl4LnrTeTL8N9Q4bg330ujsSPc709kZt3A8lC844p0ngYhLw7vuQvFK75DB2ne7RDYe-S6CUyPIw2cu_0H9PNx-pjU4D2L516V-TToM7a1yPrU3vC0kIZzLxqeDVUUFiIv6MGz2EoFafP_0Huz5m-cga70Lw2N65Qok67Olt--qwp2ra01T2_L6jd0W3iwm_AY-w4js</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2291472438</pqid></control><display><type>article</type><title>Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Parida, Laxmi ; Haferlach, Claudia ; Rhrissorrakrai, Kahn ; Utro, Filippo ; Levovitz, Chaya ; Kern, Wolfgang ; Nadarajah, Niroshan ; Twardziok, Sven ; Hutter, Stephan ; Meggendorfer, Manja ; Walter, Wencke ; Baer, Constance ; Haferlach, Torsten</creator><contributor>Kann, Maricel G</contributor><creatorcontrib>Parida, Laxmi ; Haferlach, Claudia ; Rhrissorrakrai, Kahn ; Utro, Filippo ; Levovitz, Chaya ; Kern, Wolfgang ; Nadarajah, Niroshan ; Twardziok, 
Sven ; Hutter, Stephan ; Meggendorfer, Manja ; Walter, Wencke ; Baer, Constance ; Haferlach, Torsten ; Kann, Maricel G</creatorcontrib><description>The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. 
Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1007332</identifier><identifier>PMID: 31469830</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Analysis ; Annotations ; Artificial intelligence ; Biology and Life Sciences ; Blood cancer ; Cancer ; Cancer genetics ; Computational Biology ; Dark matter ; Databases, Nucleic Acid ; Deoxyribonucleic acid ; Disease ; DNA ; DNA sequencing ; DNA, Neoplasm - genetics ; Etiology ; Etiology (Medicine) ; Gene Frequency ; Gene sequencing ; Genetic aspects ; Genome, Human ; Genomes ; Genomics ; Hematologic Neoplasms - classification ; Hematologic Neoplasms - genetics ; Hematology ; Heritability ; Heterogeneity ; High-Throughput Nucleotide Sequencing ; Humans ; Laboratories ; Learning algorithms ; Leukemia ; Machine Learning ; Medicine and Health Sciences ; Metabolism ; Models, Genetic ; Mutation ; Novels ; Polymorphism, Single Nucleotide ; Regularization ; Regulation ; RNA, Untranslated - genetics ; Stochastic Processes ; Stochasticity ; Supervision ; Tumorigenesis ; Tumors ; Whole Genome Sequencing</subject><ispartof>PLoS computational biology, 2019-08, Vol.15 (8), p.e1007332-e1007332</ispartof><rights>COPYRIGHT 2019 Public Library of Science</rights><rights>2019 Parida et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2019 Parida et al 2019 Parida et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</citedby><cites>FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</cites><orcidid>0000-0002-6486-9675 ; 0000-0003-3226-7642 ; 0000-0002-1567-9090 ; 0000-0002-7872-5074</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742441/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742441/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2101,2927,23865,27923,27924,53790,53792,79471,79472</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31469830$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Kann, Maricel G</contributor><creatorcontrib>Parida, Laxmi</creatorcontrib><creatorcontrib>Haferlach, Claudia</creatorcontrib><creatorcontrib>Rhrissorrakrai, Kahn</creatorcontrib><creatorcontrib>Utro, Filippo</creatorcontrib><creatorcontrib>Levovitz, Chaya</creatorcontrib><creatorcontrib>Kern, Wolfgang</creatorcontrib><creatorcontrib>Nadarajah, Niroshan</creatorcontrib><creatorcontrib>Twardziok, Sven</creatorcontrib><creatorcontrib>Hutter, Stephan</creatorcontrib><creatorcontrib>Meggendorfer, Manja</creatorcontrib><creatorcontrib>Walter, Wencke</creatorcontrib><creatorcontrib>Baer, Constance</creatorcontrib><creatorcontrib>Haferlach, Torsten</creatorcontrib><title>Dark-matter matters: Discriminating subtle blood 
cancers using the darkest DNA</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. 
Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Annotations</subject><subject>Artificial intelligence</subject><subject>Biology and Life Sciences</subject><subject>Blood cancer</subject><subject>Cancer</subject><subject>Cancer genetics</subject><subject>Computational Biology</subject><subject>Dark matter</subject><subject>Databases, Nucleic Acid</subject><subject>Deoxyribonucleic acid</subject><subject>Disease</subject><subject>DNA</subject><subject>DNA sequencing</subject><subject>DNA, Neoplasm - genetics</subject><subject>Etiology</subject><subject>Etiology (Medicine)</subject><subject>Gene Frequency</subject><subject>Gene sequencing</subject><subject>Genetic aspects</subject><subject>Genome, Human</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Hematologic Neoplasms - classification</subject><subject>Hematologic Neoplasms - genetics</subject><subject>Hematology</subject><subject>Heritability</subject><subject>Heterogeneity</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>Laboratories</subject><subject>Learning algorithms</subject><subject>Leukemia</subject><subject>Machine Learning</subject><subject>Medicine and Health Sciences</subject><subject>Metabolism</subject><subject>Models, Genetic</subject><subject>Mutation</subject><subject>Novels</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Regularization</subject><subject>Regulation</subject><subject>RNA, Untranslated - genetics</subject><subject>Stochastic Processes</subject><subject>Stochasticity</subject><subject>Supervision</subject><subject>Tumorigenesis</subject><subject>Tumors</subject><subject>Whole Genome 
Sequencing</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVkk2P0zAQhiMEYj_gHyCIxGU5tNge20n2gFRt-ai0KhIfZ8txJl2XNC62g-Df45Lsaou4IB_G8jzzeubVZNkzSuYUCvp66wbf626-N7WdU0IKAPYgO6VCwKwAUT68dz_JzkLYEpKulXycnQDlsiqBnGbrpfbfZjsdI_p8DOEyX9pgvN3ZXkfbb_Iw1LHDvO6ca3Kje5OgfAiHVLzBvEkSGGK-XC-eZI9a3QV8OsXz7Ou7t1-uPsyuP75fXS2uZ0YCxFlhWNlwXRQoTMt4iVIXHAhlhnLd8KrllamhEKJiHKiQTQqITJQgZYko4Dx7MeruOxfUZEVQjFWUFwkuE7EaicbprdqnabT_pZy26s-D8xulfbSmQwW00QxarQXUHGVZc8qgEQSBSl4LnrTeTL8N9Q4bg330ujsSPc709kZt3A8lC844p0ngYhLw7vuQvFK75DB2ne7RDYe-S6CUyPIw2cu_0H9PNx-pjU4D2L516V-TToM7a1yPrU3vC0kIZzLxqeDVUUFiIv6MGz2EoFafP_0Huz5m-cga70Lw2N65Qok67Olt--qwp2ra01T2_L6jd0W3iwm_AY-w4js</recordid><startdate>20190801</startdate><enddate>20190801</enddate><creator>Parida, Laxmi</creator><creator>Haferlach, Claudia</creator><creator>Rhrissorrakrai, Kahn</creator><creator>Utro, Filippo</creator><creator>Levovitz, Chaya</creator><creator>Kern, Wolfgang</creator><creator>Nadarajah, Niroshan</creator><creator>Twardziok, Sven</creator><creator>Hutter, Stephan</creator><creator>Meggendorfer, Manja</creator><creator>Walter, Wencke</creator><creator>Baer, Constance</creator><creator>Haferlach, Torsten</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6486-9675</orcidid><orcidid>https://orcid.org/0000-0003-3226-7642</orcidid><orcidid>https://orcid.org/0000-0002-1567-9090</orcidid><orcidid>https://orcid.org/0000-0002-7872-5074</orcidid></search><sort><creationdate>20190801</creationdate><title>Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA</title><author>Parida, Laxmi ; Haferlach, Claudia ; Rhrissorrakrai, Kahn ; Utro, Filippo ; Levovitz, Chaya ; Kern, Wolfgang ; Nadarajah, Niroshan ; Twardziok, Sven ; Hutter, Stephan ; Meggendorfer, Manja ; Walter, Wencke ; Baer, Constance ; Haferlach, 
Torsten</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c633t-7c28d4a77e5cf248e6a743012c14ad49f49cb37559243156d243ee2583668ee53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Annotations</topic><topic>Artificial intelligence</topic><topic>Biology and Life Sciences</topic><topic>Blood cancer</topic><topic>Cancer</topic><topic>Cancer genetics</topic><topic>Computational Biology</topic><topic>Dark matter</topic><topic>Databases, Nucleic Acid</topic><topic>Deoxyribonucleic acid</topic><topic>Disease</topic><topic>DNA</topic><topic>DNA sequencing</topic><topic>DNA, Neoplasm - genetics</topic><topic>Etiology</topic><topic>Etiology (Medicine)</topic><topic>Gene Frequency</topic><topic>Gene sequencing</topic><topic>Genetic aspects</topic><topic>Genome, Human</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Hematologic Neoplasms - classification</topic><topic>Hematologic Neoplasms - genetics</topic><topic>Hematology</topic><topic>Heritability</topic><topic>Heterogeneity</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>Laboratories</topic><topic>Learning algorithms</topic><topic>Leukemia</topic><topic>Machine Learning</topic><topic>Medicine and Health Sciences</topic><topic>Metabolism</topic><topic>Models, Genetic</topic><topic>Mutation</topic><topic>Novels</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Regularization</topic><topic>Regulation</topic><topic>RNA, Untranslated - genetics</topic><topic>Stochastic Processes</topic><topic>Stochasticity</topic><topic>Supervision</topic><topic>Tumorigenesis</topic><topic>Tumors</topic><topic>Whole Genome Sequencing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Parida, Laxmi</creatorcontrib><creatorcontrib>Haferlach, 
Claudia</creatorcontrib><creatorcontrib>Rhrissorrakrai, Kahn</creatorcontrib><creatorcontrib>Utro, Filippo</creatorcontrib><creatorcontrib>Levovitz, Chaya</creatorcontrib><creatorcontrib>Kern, Wolfgang</creatorcontrib><creatorcontrib>Nadarajah, Niroshan</creatorcontrib><creatorcontrib>Twardziok, Sven</creatorcontrib><creatorcontrib>Hutter, Stephan</creatorcontrib><creatorcontrib>Meggendorfer, Manja</creatorcontrib><creatorcontrib>Walter, Wencke</creatorcontrib><creatorcontrib>Baer, Constance</creatorcontrib><creatorcontrib>Haferlach, Torsten</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced 
Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Parida, Laxmi</au><au>Haferlach, Claudia</au><au>Rhrissorrakrai, Kahn</au><au>Utro, Filippo</au><au>Levovitz, Chaya</au><au>Kern, Wolfgang</au><au>Nadarajah, Niroshan</au><au>Twardziok, Sven</au><au>Hutter, Stephan</au><au>Meggendorfer, Manja</au><au>Walter, Wencke</au><au>Baer, Constance</au><au>Haferlach, Torsten</au><au>Kann, Maricel G</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2019-08-01</date><risdate>2019</risdate><volume>15</volume><issue>8</issue><spage>e1007332</spage><epage>e1007332</epage><pages>e1007332-e1007332</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer if the dark-matter of the genome encompass signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. 
ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserve much more attention.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>31469830</pmid><doi>10.1371/journal.pcbi.1007332</doi><orcidid>https://orcid.org/0000-0002-6486-9675</orcidid><orcidid>https://orcid.org/0000-0003-3226-7642</orcidid><orcidid>https://orcid.org/0000-0002-1567-9090</orcidid><orcidid>https://orcid.org/0000-0002-7872-5074</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1553-7358 |
ispartof | PLoS computational biology, 2019-08, Vol.15 (8), p.e1007332-e1007332 |
issn | 1553-7358 1553-734X 1553-7358 |
language | eng |
recordid | cdi_plos_journals_2291472438 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Algorithms; Analysis; Annotations; Artificial intelligence; Biology and Life Sciences; Blood cancer; Cancer; Cancer genetics; Computational Biology; Dark matter; Databases, Nucleic Acid; Deoxyribonucleic acid; Disease; DNA; DNA sequencing; DNA, Neoplasm - genetics; Etiology; Etiology (Medicine); Gene Frequency; Gene sequencing; Genetic aspects; Genome, Human; Genomes; Genomics; Hematologic Neoplasms - classification; Hematologic Neoplasms - genetics; Hematology; Heritability; Heterogeneity; High-Throughput Nucleotide Sequencing; Humans; Laboratories; Learning algorithms; Leukemia; Machine Learning; Medicine and Health Sciences; Metabolism; Models, Genetic; Mutation; Novels; Polymorphism, Single Nucleotide; Regularization; Regulation; RNA, Untranslated - genetics; Stochastic Processes; Stochasticity; Supervision; Tumorigenesis; Tumors; Whole Genome Sequencing |
title | Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T13%3A10%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dark-matter%20matters:%20Discriminating%20subtle%20blood%20cancers%20using%20the%20darkest%20DNA&rft.jtitle=PLoS%20computational%20biology&rft.au=Parida,%20Laxmi&rft.date=2019-08-01&rft.volume=15&rft.issue=8&rft.spage=e1007332&rft.epage=e1007332&rft.pages=e1007332-e1007332&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1007332&rft_dat=%3Cgale_plos_%3EA600426383%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2291472438&rft_id=info:pmid/31469830&rft_galeid=A600426383&rft_doaj_id=oai_doaj_org_article_31da23faa53b4e68b4123d50e3164b54&rfr_iscdi=true |