Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

Bibliographic details
Published in: International journal of molecular sciences 2017-02, Vol.18 (2), p.420-420
Main authors: Wu, Chengchao, Yao, Shixin, Li, Xinghao, Chen, Chujia, Hu, Xuehai
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 420
container_issue 2
container_start_page 420
container_title International journal of molecular sciences
container_volume 18
creator Wu, Chengchao
Yao, Shixin
Li, Xinghao
Chen, Chujia
Hu, Xuehai
description DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
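The description counts 65 DNA composition features: the 64 possible trinucleotide frequencies plus one GC-content value. The sketch below shows one way to compute them. It is a minimal illustration and not the authors' implementation. It makes these assumptions:

- The window is centered on the CpG site.
- Trinucleotide counts are normalized by the number of valid (A/C/G/T-only) trinucleotides.
- GC content is computed over A/C/G/T bases only.
- The helper names `composition_features`, `centered_window` and `shannon_entropy` are hypothetical.

This record does not define the seven sequence complexity features. Shannon entropy appears here only as a generic stand-in for a complexity measure.

```python
import math
from collections import Counter
from itertools import product

# All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT) so that
# every feature vector has the same layout.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def composition_features(seq: str) -> list[float]:
    """Return 64 trinucleotide frequencies followed by GC content (65 values).

    Assumption: frequencies are normalized by the number of trinucleotides
    that contain only A/C/G/T; those containing N etc. are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]
    # GC content over unambiguous bases only (assumption).
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
    return freqs + [gc]


def centered_window(chrom_seq: str, cpg_pos: int, width: int = 600) -> str:
    """Slice a window of `width` bp around a CpG at 0-based `cpg_pos`.

    Hypothetical helper: 600 bp is the best-performing window size reported
    in the description; windows near chromosome ends are simply truncated.
    """
    half = width // 2
    start = max(0, cpg_pos - half)
    return chrom_seq[start:start + width]


def shannon_entropy(seq: str) -> float:
    """Mononucleotide Shannon entropy in bits: a generic complexity measure,
    NOT one of the paper's seven (unspecified here) complexity features."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    window = centered_window("ACGT" * 500, cpg_pos=1000)
    vec = composition_features(window) + [shannon_entropy(window)]
    print(len(vec))  # 65 composition values + 1 illustrative complexity value
```

A pipeline like the one described would pass one such vector per CpG site to a support vector machine classifier. This record does not state the kernel, the parameters or the exact complexity features, so they are left out here.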
doi_str_mv 10.3390/ijms18020420
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5343954</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1891861563</sourcerecordid><originalsourceid>FETCH-LOGICAL-c445t-2e29f2cf8fb355d5b3c3be494d1352c9b817d0d99e376cdd48864c75c382dc7a3</originalsourceid><addsrcrecordid>eNqNkc1P3DAQxS1UBHThxhlF4tJDA_5M7AsS2rZQiS8JUI-uY0_Aq8Re4qTq_vcNu4AWTj159Oan55l5CO0TfMSYwsd-1iYiMcWc4g20QzilOcZF-Wmt3kafU5phTBkVagttU0kJZYTuoN9nEGIL-S_vILvpwHnb-xiyWGffrk6zS-gfF41ZSvfJh4elOo3tPCa_VE1w2S08DRAsLBsN_PX9IvMhOx9aE3bRZm2aBHsv7wTd__h-Nz3PL67Pfk5PL3LLuehzClTV1NayrpgQTlTMsgq44o4wQa2qJCkddkoBKwvrHJey4LYUlknqbGnYBJ2sfOdD1YKzEPrONHre-dZ0Cx2N1-87wT_qh_hHC8aZEnw0-PJi0MVxndTr1icLTWMCxCFpIhWRBREF-w-0UKoQ4_lH9PADOotDF8ZLjFQpx8yK8vnvryvKdjGlDuq3uQnWzynr9ZRH_GB91zf4NVb2D_DyorQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1878420674</pqid></control><display><type>article</type><title>Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human</title><source>MDPI - Multidisciplinary Digital Publishing Institute</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Wu, Chengchao ; Yao, Shixin ; Li, Xinghao ; Chen, Chujia ; Hu, Xuehai</creator><creatorcontrib>Wu, Chengchao ; Yao, Shixin ; Li, Xinghao ; Chen, Chujia ; Hu, Xuehai</creatorcontrib><description>DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. 
Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.</description><identifier>ISSN: 1422-0067</identifier><identifier>ISSN: 1661-6596</identifier><identifier>EISSN: 1422-0067</identifier><identifier>DOI: 10.3390/ijms18020420</identifier><identifier>PMID: 28212312</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Animal models ; Animals ; Assaying ; Base Composition ; Chromosomes ; Computational Biology - methods ; Computer applications ; CpG Islands ; Datasets as Topic ; Deoxyribonucleic acid ; DNA ; DNA Methylation ; Embryo cells ; Epigenomics - methods ; Gene expression ; Gene Expression Profiling ; Gene regulation ; Genes ; Genome, Human ; Genome-Wide Association Study ; Genomes ; Humans ; Mathematical models ; Models, Genetic ; Nucleotide sequence ; Prediction models ; Reproducibility of Results ; ROC Curve ; Species Specificity ; Stem cell transplantation ; Stem cells ; Transcription</subject><ispartof>International journal of molecular sciences, 
2017-02, Vol.18 (2), p.420-420</ispartof><rights>Copyright MDPI AG 2017</rights><rights>2017 by the authors. 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c445t-2e29f2cf8fb355d5b3c3be494d1352c9b817d0d99e376cdd48864c75c382dc7a3</citedby><cites>FETCH-LOGICAL-c445t-2e29f2cf8fb355d5b3c3be494d1352c9b817d0d99e376cdd48864c75c382dc7a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5343954/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5343954/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28212312$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wu, Chengchao</creatorcontrib><creatorcontrib>Yao, Shixin</creatorcontrib><creatorcontrib>Li, Xinghao</creatorcontrib><creatorcontrib>Chen, Chujia</creatorcontrib><creatorcontrib>Hu, Xuehai</creatorcontrib><title>Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human</title><title>International journal of molecular sciences</title><addtitle>Int J Mol Sci</addtitle><description>DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. 
Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.</description><subject>Animal models</subject><subject>Animals</subject><subject>Assaying</subject><subject>Base Composition</subject><subject>Chromosomes</subject><subject>Computational Biology - methods</subject><subject>Computer applications</subject><subject>CpG Islands</subject><subject>Datasets as Topic</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA Methylation</subject><subject>Embryo cells</subject><subject>Epigenomics - methods</subject><subject>Gene expression</subject><subject>Gene Expression Profiling</subject><subject>Gene regulation</subject><subject>Genes</subject><subject>Genome, Human</subject><subject>Genome-Wide Association Study</subject><subject>Genomes</subject><subject>Humans</subject><subject>Mathematical models</subject><subject>Models, Genetic</subject><subject>Nucleotide sequence</subject><subject>Prediction models</subject><subject>Reproducibility of Results</subject><subject>ROC 
Curve</subject><subject>Species Specificity</subject><subject>Stem cell transplantation</subject><subject>Stem cells</subject><subject>Transcription</subject><issn>1422-0067</issn><issn>1661-6596</issn><issn>1422-0067</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqNkc1P3DAQxS1UBHThxhlF4tJDA_5M7AsS2rZQiS8JUI-uY0_Aq8Re4qTq_vcNu4AWTj159Oan55l5CO0TfMSYwsd-1iYiMcWc4g20QzilOcZF-Wmt3kafU5phTBkVagttU0kJZYTuoN9nEGIL-S_vILvpwHnb-xiyWGffrk6zS-gfF41ZSvfJh4elOo3tPCa_VE1w2S08DRAsLBsN_PX9IvMhOx9aE3bRZm2aBHsv7wTd__h-Nz3PL67Pfk5PL3LLuehzClTV1NayrpgQTlTMsgq44o4wQa2qJCkddkoBKwvrHJey4LYUlknqbGnYBJ2sfOdD1YKzEPrONHre-dZ0Cx2N1-87wT_qh_hHC8aZEnw0-PJi0MVxndTr1icLTWMCxCFpIhWRBREF-w-0UKoQ4_lH9PADOotDF8ZLjFQpx8yK8vnvryvKdjGlDuq3uQnWzynr9ZRH_GB91zf4NVb2D_DyorQ</recordid><startdate>20170216</startdate><enddate>20170216</enddate><creator>Wu, Chengchao</creator><creator>Yao, Shixin</creator><creator>Li, Xinghao</creator><creator>Chen, Chujia</creator><creator>Hu, Xuehai</creator><general>MDPI 
AG</general><general>MDPI</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>MBDVC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>7TK</scope><scope>5PM</scope></search><sort><creationdate>20170216</creationdate><title>Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human</title><author>Wu, Chengchao ; Yao, Shixin ; Li, Xinghao ; Chen, Chujia ; Hu, Xuehai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c445t-2e29f2cf8fb355d5b3c3be494d1352c9b817d0d99e376cdd48864c75c382dc7a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Animal models</topic><topic>Animals</topic><topic>Assaying</topic><topic>Base Composition</topic><topic>Chromosomes</topic><topic>Computational Biology - methods</topic><topic>Computer applications</topic><topic>CpG Islands</topic><topic>Datasets as Topic</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA Methylation</topic><topic>Embryo cells</topic><topic>Epigenomics - methods</topic><topic>Gene expression</topic><topic>Gene Expression Profiling</topic><topic>Gene regulation</topic><topic>Genes</topic><topic>Genome, Human</topic><topic>Genome-Wide Association Study</topic><topic>Genomes</topic><topic>Humans</topic><topic>Mathematical 
models</topic><topic>Models, Genetic</topic><topic>Nucleotide sequence</topic><topic>Prediction models</topic><topic>Reproducibility of Results</topic><topic>ROC Curve</topic><topic>Species Specificity</topic><topic>Stem cell transplantation</topic><topic>Stem cells</topic><topic>Transcription</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Chengchao</creatorcontrib><creatorcontrib>Yao, Shixin</creatorcontrib><creatorcontrib>Li, Xinghao</creatorcontrib><creatorcontrib>Chen, Chujia</creatorcontrib><creatorcontrib>Hu, Xuehai</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Health &amp; Medical Collection (Alumni 
Edition)</collection><collection>Medical Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Neurosciences Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>International journal of molecular sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Chengchao</au><au>Yao, Shixin</au><au>Li, Xinghao</au><au>Chen, Chujia</au><au>Hu, Xuehai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human</atitle><jtitle>International journal of molecular sciences</jtitle><addtitle>Int J Mol Sci</addtitle><date>2017-02-16</date><risdate>2017</risdate><volume>18</volume><issue>2</issue><spage>420</spage><epage>420</epage><pages>420-420</pages><issn>1422-0067</issn><issn>1661-6596</issn><eissn>1422-0067</eissn><abstract>DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. 
Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>28212312</pmid><doi>10.3390/ijms18020420</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1422-0067
ispartof International journal of molecular sciences, 2017-02, Vol.18 (2), p.420-420
issn 1422-0067
1661-6596
1422-0067
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5343954
source MDPI - Multidisciplinary Digital Publishing Institute; MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects Animal models
Animals
Assaying
Base Composition
Chromosomes
Computational Biology - methods
Computer applications
CpG Islands
Datasets as Topic
Deoxyribonucleic acid
DNA
DNA Methylation
Embryo cells
Epigenomics - methods
Gene expression
Gene Expression Profiling
Gene regulation
Genes
Genome, Human
Genome-Wide Association Study
Genomes
Humans
Mathematical models
Models, Genetic
Nucleotide sequence
Prediction models
Reproducibility of Results
ROC Curve
Species Specificity
Stem cell transplantation
Stem cells
Transcription
title Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T11%3A25%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Genome-Wide%20Prediction%20of%20DNA%20Methylation%20Using%20DNA%20Composition%20and%20Sequence%20Complexity%20in%20Human&rft.jtitle=International%20journal%20of%20molecular%20sciences&rft.au=Wu,%20Chengchao&rft.date=2017-02-16&rft.volume=18&rft.issue=2&rft.spage=420&rft.epage=420&rft.pages=420-420&rft.issn=1422-0067&rft.eissn=1422-0067&rft_id=info:doi/10.3390/ijms18020420&rft_dat=%3Cproquest_pubme%3E1891861563%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1878420674&rft_id=info:pmid/28212312&rfr_iscdi=true