Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species

Abstract. Motivation: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction–modification systems. To better understand its functional mechanisms, it is fundamentally important to identify 4mC modifications. Machine lear...

Detailed description

Bibliographic details
Published in:	Bioinformatics 2019-04, Vol.35 (8), p.1326-1333
Main authors:	Wei, Leyi ; Luan, Shasha ; Nagai, Luis Augusto Eijy ; Su, Ran ; Zou, Quan
Format:	Article
Language:	eng
Subjects:	Algorithms ; DNA - genetics ; Genome ; Machine Learning ; Support Vector Machine

container_end_page	1333
container_issue	8
container_start_page	1326
container_title	Bioinformatics
container_volume	35
creator	Wei, Leyi ; Luan, Shasha ; Nagai, Luis Augusto Eijy ; Su, Ran ; Zou, Quan
description	Abstract. Motivation: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction–modification systems. To better understand its functional mechanisms, it is fundamentally important to identify 4mC modifications. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates remain a challenge for existing methods. It is therefore highly desirable to develop a computational method that identifies 4mC sites more accurately. Results: In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and a Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor achieves generally better performance in predicting 4mC sites than the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. Availability and implementation: The user-friendly webserver that implements the proposed 4mcPred-SVM has been established and is freely accessible at http://server.malab.cn/4mcPred-SVM. Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv	10.1093/bioinformatics/bty824
format	Article
fullrecord	<record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_2111151456</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/bty824</oup_id><sourcerecordid>2111151456</sourcerecordid><originalsourceid>FETCH-LOGICAL-c416t-c219f73af628351b0f8bf24b9f7c3cf73637f7ecdbb735341cc179149bca6fa53</originalsourceid><addsrcrecordid>eNqNkE1PwzAMhiMEYmPwE0A5cilLmvTrOI3xIU3jAucqyRwW1DYlSRH79wR1IHHDl1j2-9rOg9AlJTeUVGwujTWdtq4VwSg_l2FfpvwITSnPSZKSrDqOOcuLhJeETdCZ92-EZJRzfoomjKSsytNiivrVZ99YZ7pX7OF9gE5BIoWHLdYgwuDA47gEhx1g0_bOfsRO72BrVDC2w1bj280Cb3jSQtjtG7UP1psOsDchWk2H26EJpm9ipQdlwJ-jEy0aDxeHd4Ze7lbPy4dk_XT_uFysE8VpHhKV0koXTOg8LVlGJdGl1CmXsaiYip2cFboAtZWyYBnjVClaVJRXUolci4zN0PU4Nx4d_-VD3RqvoGlEB3bwdUpjRB5ZHqXZKFXOeu9A170zrXD7mpL6G3b9F3Y9wo6-q8OKQbaw_XX90I0CMgrs0P9z5hcbNpRJ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2111151456</pqid></control><display><type>article</type><title>Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species</title><source>Oxford Journals Open Access Collection</source><creator>Wei, Leyi ; Luan, Shasha ; Nagai, Luis Augusto Eijy ; Su, Ran ; Zou, Quan</creator><contributor>Hancock, John</contributor><creatorcontrib>Wei, Leyi ; Luan, Shasha ; Nagai, Luis Augusto Eijy ; Su, Ran ; Zou, Quan ; Hancock, John</creatorcontrib><description>Abstract Motivation As one of important epigenetic modifications, DNA N4-methylcytosine (4mC) is recently shown to play crucial roles in restriction–modification systems. For better understanding of their functional mechanisms, it is fundamentally important to identify 4mC modification. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates are still challenging for existing methods. 
Therefore, it is highly desirable to develop a computational method to more accurately identify m4C sites. Results In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor is able to achieve generally better performance in predicting 4mC sites as compared to the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robust predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. Availability and implementation The user-friendly webserver that implements the proposed 4mcPred-SVM is well established, and is freely accessible at http://server.malab.cn/4mcPred-SVM. Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/bty824</identifier><identifier>PMID: 30239627</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; DNA - genetics ; Genome ; Machine Learning ; Support Vector Machine</subject><ispartof>Bioinformatics, 2019-04, Vol.35 (8), p.1326-1333</ispartof><rights>The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2018</rights><rights>The Author(s) 2018. 
Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c416t-c219f73af628351b0f8bf24b9f7c3cf73637f7ecdbb735341cc179149bca6fa53</citedby><cites>FETCH-LOGICAL-c416t-c219f73af628351b0f8bf24b9f7c3cf73637f7ecdbb735341cc179149bca6fa53</cites><orcidid>0000-0003-1444-190X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1598,27903,27904</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/bty824$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30239627$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Hancock, John</contributor><creatorcontrib>Wei, Leyi</creatorcontrib><creatorcontrib>Luan, Shasha</creatorcontrib><creatorcontrib>Nagai, Luis Augusto Eijy</creatorcontrib><creatorcontrib>Su, Ran</creatorcontrib><creatorcontrib>Zou, Quan</creatorcontrib><title>Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation As one of important epigenetic modifications, DNA N4-methylcytosine (4mC) is recently shown to play crucial roles in restriction–modification systems. For better understanding of their functional mechanisms, it is fundamentally important to identify 4mC modification. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates are still challenging for existing methods. 
Therefore, it is highly desirable to develop a computational method to more accurately identify m4C sites. Results In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor is able to achieve generally better performance in predicting 4mC sites as compared to the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robust predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. Availability and implementation The user-friendly webserver that implements the proposed 4mcPred-SVM is well established, and is freely accessible at http://server.malab.cn/4mcPred-SVM. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>DNA - genetics</subject><subject>Genome</subject><subject>Machine Learning</subject><subject>Support Vector Machine</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkE1PwzAMhiMEYmPwE0A5cilLmvTrOI3xIU3jAucqyRwW1DYlSRH79wR1IHHDl1j2-9rOg9AlJTeUVGwujTWdtq4VwSg_l2FfpvwITSnPSZKSrDqOOcuLhJeETdCZ92-EZJRzfoomjKSsytNiivrVZ99YZ7pX7OF9gE5BIoWHLdYgwuDA47gEhx1g0_bOfsRO72BrVDC2w1bj280Cb3jSQtjtG7UP1psOsDchWk2H26EJpm9ipQdlwJ-jEy0aDxeHd4Ze7lbPy4dk_XT_uFysE8VpHhKV0koXTOg8LVlGJdGl1CmXsaiYip2cFboAtZWyYBnjVClaVJRXUolci4zN0PU4Nx4d_-VD3RqvoGlEB3bwdUpjRB5ZHqXZKFXOeu9A170zrXD7mpL6G3b9F3Y9wo6-q8OKQbaw_XX90I0CMgrs0P9z5hcbNpRJ</recordid><startdate>20190415</startdate><enddate>20190415</enddate><creator>Wei, Leyi</creator><creator>Luan, Shasha</creator><creator>Nagai, Luis Augusto Eijy</creator><creator>Su, Ran</creator><creator>Zou, Quan</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-1444-190X</orcidid></search><sort><creationdate>20190415</creationdate><title>Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species</title><author>Wei, Leyi ; Luan, Shasha ; Nagai, Luis Augusto Eijy ; Su, Ran ; Zou, Quan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c416t-c219f73af628351b0f8bf24b9f7c3cf73637f7ecdbb735341cc179149bca6fa53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>DNA - 
genetics</topic><topic>Genome</topic><topic>Machine Learning</topic><topic>Support Vector Machine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wei, Leyi</creatorcontrib><creatorcontrib>Luan, Shasha</creatorcontrib><creatorcontrib>Nagai, Luis Augusto Eijy</creatorcontrib><creatorcontrib>Su, Ran</creatorcontrib><creatorcontrib>Zou, Quan</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wei, Leyi</au><au>Luan, Shasha</au><au>Nagai, Luis Augusto Eijy</au><au>Su, Ran</au><au>Zou, Quan</au><au>Hancock, John</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2019-04-15</date><risdate>2019</risdate><volume>35</volume><issue>8</issue><spage>1326</spage><epage>1333</epage><pages>1326-1333</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation As one of important epigenetic modifications, DNA N4-methylcytosine (4mC) is recently shown to play crucial roles in restriction–modification systems. For better understanding of their functional mechanisms, it is fundamentally important to identify 4mC modification. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates are still challenging for existing methods. 
Therefore, it is highly desirable to develop a computational method to more accurately identify m4C sites. Results In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor is able to achieve generally better performance in predicting 4mC sites as compared to the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robust predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. Availability and implementation The user-friendly webserver that implements the proposed 4mcPred-SVM is well established, and is freely accessible at http://server.malab.cn/4mcPred-SVM. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>30239627</pmid><doi>10.1093/bioinformatics/bty824</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0003-1444-190X</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1367-4803
ispartof	Bioinformatics, 2019-04, Vol.35 (8), p.1326-1333
issn	1367-4803 1460-2059 1367-4811
language	eng
recordid	cdi_proquest_miscellaneous_2111151456
source	Oxford Journals Open Access Collection
subjects	Algorithms ; DNA - genetics ; Genome ; Machine Learning ; Support Vector Machine
title	Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T17%3A05%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20sequence-based%20features%20for%20the%20improved%20prediction%20of%20DNA%20N4-methylcytosine%20sites%20in%20multiple%20species&rft.jtitle=Bioinformatics&rft.au=Wei,%20Leyi&rft.date=2019-04-15&rft.volume=35&rft.issue=8&rft.spage=1326&rft.epage=1333&rft.pages=1326-1333&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/bty824&rft_dat=%3Cproquest_TOX%3E2111151456%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2111151456&rft_id=info:pmid/30239627&rft_oup_id=10.1093/bioinformatics/bty824&rfr_iscdi=true