Could n-gram analysis contribute to genomic island determination?

Bibliographic details
Published in: Journal of biomedical informatics 2008-12, Vol.41 (6), p.936-943
Main authors: Mitić, Nenad S., Pavlović-Lažetić, Gordana M., Beljanski, Miloš V.
Format: Article
Language: English
Subjects: Bacteria; Enterobacteriaceae; Escherichia coli; Escherichia coli O157 - genetics; Genome, Bacterial; Genomic islands; n-Grams
Online access: Full text
container_end_page 943
container_issue 6
container_start_page 936
container_title Journal of biomedical informatics
container_volume 41
creator Mitić, Nenad S.
Pavlović-Lažetić, Gordana M.
Beljanski, Miloš V.
description There are two approaches to identifying genomic and pathogenesis islands (GI/PAIs) in bacterial genomes: the compositional and the functional, based on DNA- or protein-level composition and gene function, respectively. We applied n-gram analysis in addition to other compositional features, combined them by union and intersection and defined two measures for evaluating the results: recall and precision. Using the best criteria (by training on the Escherichia coli O157:H7 EDL933 genome), we predicted GIs for 14 Enterobacteriaceae family members and for 21 randomly selected bacterial genomes. These predictions were compared with results obtained from HGT DB (based on the compositional approach) and PAI DB (based on the combined approach). The results obtained show that intersecting n-grams with other compositional features improves relative precision by up to 10% in the case of HGT DB and up to 60% in the case of PAI DB. In addition, it was demonstrated that the union of all compositional features results in maximum recall (up to 37%). Thus, the application of n-gram analysis alongside existing or newly developed methods may improve the prediction of GI/PAIs.
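The union/intersection combination and the recall/precision evaluation described above can be illustrated with a minimal Python sketch. It makes several assumptions:

* Each compositional feature (n-grams, GC content, codon usage and so on) yields a set of gene identifiers flagged as island candidates.
* The sketch uses the standard definitions recall = |P ∩ R| / |R| and precision = |P ∩ R| / |P|, where P is the predicted set and R is a reference set (for example, genes annotated in HGT DB or PAI DB).

The exact definitions used in the paper are not given in this record. The feature names, gene IDs and function names below are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-feature predictions: feature name -> gene IDs flagged as island candidates.
# Feature names and gene IDs are illustrative only, not taken from the paper.
Predictions = Dict[str, Set[str]]


def combine_union(preds: Predictions) -> Set[str]:
    """Genes flagged by at least one feature (tends to raise recall)."""
    result: Set[str] = set()
    for genes in preds.values():
        result |= genes
    return result


def combine_intersection(preds: Predictions) -> Set[str]:
    """Genes flagged by every feature (tends to raise precision)."""
    sets = list(preds.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def recall_precision(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Standard recall and precision of a predicted gene set against a reference gene set."""
    hits = len(predicted & reference)
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical feature outputs and reference annotation.
    preds = {
        "ngram": {"g1", "g2", "g3", "g7"},
        "gc_content": {"g2", "g3", "g4"},
        "codon_usage": {"g2", "g3", "g5", "g7"},
    }
    reference = {"g2", "g3", "g4", "g6"}
    for name, combined in [("union", combine_union(preds)),
                           ("intersection", combine_intersection(preds))]:
        r, p = recall_precision(combined, reference)
        print(f"{name}: recall={r:.2f} precision={p:.2f}")
```

Running the script prints `union: recall=0.75 precision=0.50` and `intersection: recall=0.50 precision=1.00`. This mirrors the trade-off summarized in the abstract: the union of features raises recall, while intersecting features raises precision. Scoring at the gene level is a simplifying choice for the sketch. An island predictor could equally measure overlap in nucleotides or genomic regions.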
doi_str_mv 10.1016/j.jbi.2008.03.007
format Article
fullrecord PMID: 18448392
Publisher: United States: Elsevier Inc
Rights: 2008 Elsevier Inc.
Status: peer reviewed; free to read
Full text (ScienceDirect): https://www.sciencedirect.com/science/article/pii/S1532046408000403
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18448392
fulltext fulltext
identifier ISSN: 1532-0464
ispartof Journal of biomedical informatics, 2008-12, Vol.41 (6), p.936-943
issn 1532-0464
1532-0480
language eng
recordid cdi_proquest_miscellaneous_69795819
source MEDLINE; Elsevier ScienceDirect Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Bacteria
Enterobacteriaceae
Escherichia coli
Escherichia coli O157 - genetics
Genome, Bacterial
Genomic islands
n-Grams
title Could n-gram analysis contribute to genomic island determination?
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-18T22%3A07%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Could%20n-gram%20analysis%20contribute%20to%20genomic%20island%20determination?&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Miti%C4%87,%20Nenad%20S.&rft.date=2008-12-01&rft.volume=41&rft.issue=6&rft.spage=936&rft.epage=943&rft.pages=936-943&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2008.03.007&rft_dat=%3Cproquest_cross%3E19809951%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=19809951&rft_id=info:pmid/18448392&rft_els_id=S1532046408000403&rfr_iscdi=true