Shall genomic correlation structure be considered in copy number variants detection?
Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-cod...
Published in: | Briefings in bioinformatics 2021-11, Vol.22 (6) |
---|---|
Main authors: | Qin, Fei ; Luo, Xizhi ; Cai, Guoshuai ; Xiao, Feifei |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 6 |
container_start_page | |
container_title | Briefings in bioinformatics |
container_volume | 22 |
creator | Qin, Fei ; Luo, Xizhi ; Cai, Guoshuai ; Xiao, Feifei |
description | Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether the genomic correlation exists in WES data and how such correlation structure integration can improve the CNV detection accuracy. In this study, we first explored the correlation structure of the WES data using the 1000 Genomes Project data. Both real raw read depth and median-normalized data showed strong evidence of the correlation structure. Motivated by this fact, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm in profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and real data analysis from the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure in detecting relatively long CNVs. This study provides great insights for methodology development of CNV detection with NGS data. |
doi_str_mv | 10.1093/bib/bbab215 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8768456</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2540521446</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-dd4eba7d3ea082807be8b504431be334fcf9fda2a99fbe0cfd7fd2229f84cd103</originalsourceid><addsrcrecordid>eNpVUU1Lw0AQXUSxtXryLjkKErtfaZKLIsUvKHiwnpf9mG1Xkk3dTQr99ya0ip5mhvfmzfAeQpcE3xJcsqlyaqqUVJRkR2hMeJ6nHGf8eOhneZrxGRuhsxg_MaY4L8gpGjFOCMc4G6Pl-1pWVbIC39ROJ7oJASrZusYnsQ2dbrsAiYIe8NEZCGAS5_tps0t8VysIyVYGJ30bEwMt6GHz_hydWFlFuDjUCfp4elzOX9LF2_Pr_GGRalaQNjWGg5K5YSBxQQucKyhUhjlnRAFj3GpbWiOpLEurAGtrcmsopaUtuDYEswm62-tuOlWD0eDbICuxCa6WYSca6cR_xLu1WDVbUeSzgmezXuD6IBCarw5iK2oXNVSV9NB0UdCsd5ISzgfqzZ6qQxNjAPt7hmAx5CD6HMQhh5599fezX-6P8ewbDfWIMg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2540521446</pqid></control><display><type>article</type><title>Shall genomic correlation structure be considered in copy number variants detection?</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>EBSCOhost Business Source Complete</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><creator>Qin, Fei ; Luo, Xizhi ; Cai, Guoshuai ; Xiao, Feifei</creator><creatorcontrib>Qin, Fei ; Luo, Xizhi ; Cai, Guoshuai ; Xiao, Feifei</creatorcontrib><description>Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. 
We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether the genomic correlation exists in WES data and how such correlation structure integration can improve the CNV detection accuracy. In this study, we first explored the correlation structure of the WES data using the 1000 Genomes Project data. Both real raw read depth and median-normalized data showed strong evidence of the correlation structure. Motivated by this fact, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm in profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and real data analysis from the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure in detecting relatively long CNVs. This study provides great insights for methodology development of CNV detection with NGS data.</description><identifier>ISSN: 1467-5463</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbab215</identifier><identifier>PMID: 34114005</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Computational Biology - methods ; DNA Copy Number Variations ; Genetic Association Studies - methods ; Genetic Predisposition to Disease ; Genetic Testing - methods ; Genomics - methods ; Humans ; Problem Solving Protocol ; Software ; Whole Exome Sequencing ; Workflow</subject><ispartof>Briefings in bioinformatics, 2021-11, Vol.22 (6)</ispartof><rights>The Author(s) 2021. Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oup.com.</rights><rights>The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-dd4eba7d3ea082807be8b504431be334fcf9fda2a99fbe0cfd7fd2229f84cd103</citedby><cites>FETCH-LOGICAL-c381t-dd4eba7d3ea082807be8b504431be334fcf9fda2a99fbe0cfd7fd2229f84cd103</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8768456/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8768456/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34114005$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Qin, Fei</creatorcontrib><creatorcontrib>Luo, Xizhi</creatorcontrib><creatorcontrib>Cai, Guoshuai</creatorcontrib><creatorcontrib>Xiao, Feifei</creatorcontrib><title>Shall genomic correlation structure be considered in copy number variants detection?</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. 
We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether the genomic correlation exists in WES data and how such correlation structure integration can improve the CNV detection accuracy. In this study, we first explored the correlation structure of the WES data using the 1000 Genomes Project data. Both real raw read depth and median-normalized data showed strong evidence of the correlation structure. Motivated by this fact, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm in profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and real data analysis from the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure in detecting relatively long CNVs. 
This study provides great insights for methodology development of CNV detection with NGS data.</description><subject>Algorithms</subject><subject>Computational Biology - methods</subject><subject>DNA Copy Number Variations</subject><subject>Genetic Association Studies - methods</subject><subject>Genetic Predisposition to Disease</subject><subject>Genetic Testing - methods</subject><subject>Genomics - methods</subject><subject>Humans</subject><subject>Problem Solving Protocol</subject><subject>Software</subject><subject>Whole Exome Sequencing</subject><subject>Workflow</subject><issn>1467-5463</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVUU1Lw0AQXUSxtXryLjkKErtfaZKLIsUvKHiwnpf9mG1Xkk3dTQr99ya0ip5mhvfmzfAeQpcE3xJcsqlyaqqUVJRkR2hMeJ6nHGf8eOhneZrxGRuhsxg_MaY4L8gpGjFOCMc4G6Pl-1pWVbIC39ROJ7oJASrZusYnsQ2dbrsAiYIe8NEZCGAS5_tps0t8VysIyVYGJ30bEwMt6GHz_hydWFlFuDjUCfp4elzOX9LF2_Pr_GGRalaQNjWGg5K5YSBxQQucKyhUhjlnRAFj3GpbWiOpLEurAGtrcmsopaUtuDYEswm62-tuOlWD0eDbICuxCa6WYSca6cR_xLu1WDVbUeSzgmezXuD6IBCarw5iK2oXNVSV9NB0UdCsd5ISzgfqzZ6qQxNjAPt7hmAx5CD6HMQhh5599fezX-6P8ewbDfWIMg</recordid><startdate>20211105</startdate><enddate>20211105</enddate><creator>Qin, Fei</creator><creator>Luo, Xizhi</creator><creator>Cai, Guoshuai</creator><creator>Xiao, Feifei</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20211105</creationdate><title>Shall genomic correlation structure be considered in copy number variants detection?</title><author>Qin, Fei ; Luo, Xizhi ; Cai, Guoshuai ; Xiao, 
Feifei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-dd4eba7d3ea082807be8b504431be334fcf9fda2a99fbe0cfd7fd2229f84cd103</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Computational Biology - methods</topic><topic>DNA Copy Number Variations</topic><topic>Genetic Association Studies - methods</topic><topic>Genetic Predisposition to Disease</topic><topic>Genetic Testing - methods</topic><topic>Genomics - methods</topic><topic>Humans</topic><topic>Problem Solving Protocol</topic><topic>Software</topic><topic>Whole Exome Sequencing</topic><topic>Workflow</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Qin, Fei</creatorcontrib><creatorcontrib>Luo, Xizhi</creatorcontrib><creatorcontrib>Cai, Guoshuai</creatorcontrib><creatorcontrib>Xiao, Feifei</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Qin, Fei</au><au>Luo, Xizhi</au><au>Cai, Guoshuai</au><au>Xiao, Feifei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Shall genomic correlation structure be considered in copy number variants detection?</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2021-11-05</date><risdate>2021</risdate><volume>22</volume><issue>6</issue><issn>1467-5463</issn><eissn>1477-4054</eissn><abstract>Copy number variation has been identified as a major source of genomic 
variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether the genomic correlation exists in WES data and how such correlation structure integration can improve the CNV detection accuracy. In this study, we first explored the correlation structure of the WES data using the 1000 Genomes Project data. Both real raw read depth and median-normalized data showed strong evidence of the correlation structure. Motivated by this fact, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm in profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and real data analysis from the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure in detecting relatively long CNVs. This study provides great insights for methodology development of CNV detection with NGS data.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>34114005</pmid><doi>10.1093/bib/bbab215</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1467-5463 |
ispartof | Briefings in bioinformatics, 2021-11, Vol.22 (6) |
issn | 1467-5463 ; 1477-4054 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8768456 |
source | MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; EBSCOhost Business Source Complete; Oxford Journals Open Access Collection; PubMed Central |
subjects | Algorithms ; Computational Biology - methods ; DNA Copy Number Variations ; Genetic Association Studies - methods ; Genetic Predisposition to Disease ; Genetic Testing - methods ; Genomics - methods ; Humans ; Problem Solving Protocol ; Software ; Whole Exome Sequencing ; Workflow |
title | Shall genomic correlation structure be considered in copy number variants detection? |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T09%3A43%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Shall%20genomic%20correlation%20structure%20be%20considered%20in%20copy%20number%20variants%20detection?&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Qin,%20Fei&rft.date=2021-11-05&rft.volume=22&rft.issue=6&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbab215&rft_dat=%3Cproquest_pubme%3E2540521446%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2540521446&rft_id=info:pmid/34114005&rfr_iscdi=true |