Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds

We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 10...
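
The full abstract in this record names the evaluation measures used: recall, specificity, precision, F-measure, Matthews Correlation Coefficient and enrichment factor. As a rough illustration of how such measures follow from a per-class confusion matrix, here is a minimal Python sketch. The function name `classification_measures` and the example counts are hypothetical and not taken from the paper. The enrichment factor is computed as precision divided by the overall fraction of actives, which is one common convention; the record does not state the exact definition the authors used.

```python
import math


def classification_measures(tp, fp, tn, fn):
    """Compute per-class measures from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # fraction of actives found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # fraction of inactives rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0       # fraction of predictions that are active
    pr_sum = precision + recall
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews Correlation Coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Enrichment factor (assumed definition): precision relative to base rate of actives
    prevalence = (tp + fn) / total if total else 0.0
    enrichment = precision / prevalence if prevalence else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
        "enrichment_factor": enrichment,
    }


# Made-up counts for a single activity class, for illustration only.
example = classification_measures(tp=80, fp=20, tn=880, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

A classifier with high recall but many false positives keeps a high value for `recall` while `precision`, `specificity` and `f_measure` drop, which is the pattern the abstract reports for the Bayes Classifier.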

Bibliographic details
Published in: Journal of computer-aided molecular design 2007-05, Vol.21 (5), p.269-280
Main authors: Cannon, Edward O; Amini, Ata; Bender, Andreas; Sternberg, Michael J E; Muggleton, Stephen H; Glen, Robert C; Mitchell, John B O
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 280
container_issue 5
container_start_page 269
container_title Journal of computer-aided molecular design
container_volume 21
creator Cannon, Edward O
Amini, Ata
Bender, Andreas
Sternberg, Michael J E
Muggleton, Stephen H
Glen, Robert C
Mitchell, John B O
description We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
doi_str_mv 10.1007/s10822-007-9113-3
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_70408006</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>19654180</sourcerecordid><originalsourceid>FETCH-LOGICAL-c357t-b09b4436a45b355f230aa71b4f329c43c8e35269ce90a1e0672bca2fdb1687213</originalsourceid><addsrcrecordid>eNqFkc1q3DAURkVJ6UwmeYBuisiiO7dXP7asZRKatDDQRRLITsiyPKPBthzJDswb5TErZ6YUsmhWuqBzPnT1IfSZwDcCIL5HAiWlWRozSQjL2Ae0JLlgGZc5OUFLkBSyIuePC3Qa4w4SKAv4hBZEsFJwJpbo5W4aBh9G_GzN6AN2fT2Z0T1b3PqNM3gIfhN017l-g_00DjY0PnQRj1uLez1zV3pvIzatjtE1zgas-_q_MSngVf-rGD0632Pf4Mp5fbDM1nbposXGd4Of-jqeoY-NbqM9P54r9HDz4_76Z7b-ffvr-nKdGZaLMatAVpyzQvO8YnneUAZaC1LxhlFpODOlZTktpLESNLFQCFoZTZu6IkUpKGEr9PWQm578NNk4qs5FY9tW99ZPUQngUAIU74JEpp8nJSTw4g2481Po0xJKMEE5LeUMkQNkgo8x2EYNwXU67BUBNZetDmWreZzLViw5X47BU9XZ-p9xbJf9AXVhqPw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>737242890</pqid></control><display><type>article</type><title>Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds</title><source>MEDLINE</source><source>SpringerLink (Online service)</source><creator>Cannon, Edward O ; Amini, Ata ; Bender, Andreas ; Sternberg, Michael J E ; Muggleton, Stephen H ; Glen, Robert C ; Mitchell, John B O</creator><creatorcontrib>Cannon, Edward O ; Amini, Ata ; Bender, Andreas ; Sternberg, Michael J E ; Muggleton, Stephen H ; Glen, Robert C ; Mitchell, John B O</creatorcontrib><description>We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. 
The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p &lt; 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. 
While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.</description><identifier>ISSN: 0920-654X</identifier><identifier>EISSN: 1573-4951</identifier><identifier>DOI: 10.1007/s10822-007-9113-3</identifier><identifier>PMID: 17387437</identifier><language>eng</language><publisher>Netherlands: Springer Nature B.V</publisher><subject>Bayes Theorem ; Chemical compounds ; Classification ; Classifiers ; Computational Biology ; Confidence Intervals ; Correlation coefficient ; Correlation coefficients ; Drug Design ; Logic programming ; Pharmaceutical Preparations - chemical synthesis ; Pharmaceutical Preparations - classification ; Pharmaceutical Preparations - metabolism ; Recall ; Software ; Support vector machines</subject><ispartof>Journal of computer-aided molecular design, 2007-05, Vol.21 (5), p.269-280</ispartof><rights>Springer Science+Business Media, LLC 2007.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c357t-b09b4436a45b355f230aa71b4f329c43c8e35269ce90a1e0672bca2fdb1687213</citedby><cites>FETCH-LOGICAL-c357t-b09b4436a45b355f230aa71b4f329c43c8e35269ce90a1e0672bca2fdb1687213</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/17387437$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Cannon, Edward O</creatorcontrib><creatorcontrib>Amini, Ata</creatorcontrib><creatorcontrib>Bender, Andreas</creatorcontrib><creatorcontrib>Sternberg, Michael J E</creatorcontrib><creatorcontrib>Muggleton, Stephen H</creatorcontrib><creatorcontrib>Glen, Robert 
C</creatorcontrib><creatorcontrib>Mitchell, John B O</creatorcontrib><title>Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds</title><title>Journal of computer-aided molecular design</title><addtitle>J Comput Aided Mol Des</addtitle><description>We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p &lt; 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. 
While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.</description><subject>Bayes Theorem</subject><subject>Chemical compounds</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Computational Biology</subject><subject>Confidence Intervals</subject><subject>Correlation coefficient</subject><subject>Correlation coefficients</subject><subject>Drug Design</subject><subject>Logic programming</subject><subject>Pharmaceutical Preparations - chemical synthesis</subject><subject>Pharmaceutical Preparations - classification</subject><subject>Pharmaceutical Preparations - metabolism</subject><subject>Recall</subject><subject>Software</subject><subject>Support vector machines</subject><issn>0920-654X</issn><issn>1573-4951</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><recordid>eNqFkc1q3DAURkVJ6UwmeYBuisiiO7dXP7asZRKatDDQRRLITsiyPKPBthzJDswb5TErZ6YUsmhWuqBzPnT1IfSZwDcCIL5HAiWlWRozSQjL2Ae0JLlgGZc5OUFLkBSyIuePC3Qa4w4SKAv4hBZEsFJwJpbo5W4aBh9G_GzN6AN2fT2Z0T1b3PqNM3gIfhN017l-g_00DjY0PnQRj1uLez1zV3pvIzatjtE1zgas-_q_MSngVf-rGD0632Pf4Mp5fbDM1nbposXGd4Of-jqeoY-NbqM9P54r9HDz4_76Z7b-ffvr-nKdGZaLMatAVpyzQvO8YnneUAZaC1LxhlFpODOlZTktpLESNLFQCFoZTZu6IkUpKGEr9PWQm578NNk4qs5FY9tW99ZPUQngUAIU74JEpp8nJSTw4g2481Po0xJKMEE5LeUMkQNkgo8x2EYNwXU67BUBNZetDmWreZzLViw5X47BU9XZ-p9xbJf9AXVhqPw</recordid><startdate>20070501</startdate><enddate>20070501</enddate><creator>Cannon, Edward O</creator><creator>Amini, Ata</creator><creator>Bender, Andreas</creator><creator>Sternberg, Michael J E</creator><creator>Muggleton, Stephen H</creator><creator>Glen, Robert C</creator><creator>Mitchell, John B O</creator><general>Springer Nature 
B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>BKSAR</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>KB.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>P5Z</scope><scope>P62</scope><scope>PCBAR</scope><scope>PDBOC</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7QO</scope><scope>FR3</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>20070501</creationdate><title>Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds</title><author>Cannon, Edward O ; Amini, Ata ; Bender, Andreas ; Sternberg, Michael J E ; Muggleton, Stephen H ; Glen, Robert C ; Mitchell, John B O</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c357t-b09b4436a45b355f230aa71b4f329c43c8e35269ce90a1e0672bca2fdb1687213</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Bayes Theorem</topic><topic>Chemical compounds</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Computational 
Biology</topic><topic>Confidence Intervals</topic><topic>Correlation coefficient</topic><topic>Correlation coefficients</topic><topic>Drug Design</topic><topic>Logic programming</topic><topic>Pharmaceutical Preparations - chemical synthesis</topic><topic>Pharmaceutical Preparations - classification</topic><topic>Pharmaceutical Preparations - metabolism</topic><topic>Recall</topic><topic>Software</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cannon, Edward O</creatorcontrib><creatorcontrib>Amini, Ata</creatorcontrib><creatorcontrib>Bender, Andreas</creatorcontrib><creatorcontrib>Sternberg, Michael J E</creatorcontrib><creatorcontrib>Muggleton, Stephen H</creatorcontrib><creatorcontrib>Glen, Robert C</creatorcontrib><creatorcontrib>Mitchell, John B O</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health &amp; Medical Collection (Proquest)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering 
Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Earth, Atmospheric &amp; Aquatic Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>https://resources.nclive.org/materials</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Science Database (ProQuest)</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Earth, Atmospheric &amp; Aquatic Science Database</collection><collection>Materials Science Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One 
Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Biotechnology Research Abstracts</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of computer-aided molecular design</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cannon, Edward O</au><au>Amini, Ata</au><au>Bender, Andreas</au><au>Sternberg, Michael J E</au><au>Muggleton, Stephen H</au><au>Glen, Robert C</au><au>Mitchell, John B O</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds</atitle><jtitle>Journal of computer-aided molecular design</jtitle><addtitle>J Comput Aided Mol Des</addtitle><date>2007-05-01</date><risdate>2007</risdate><volume>21</volume><issue>5</issue><spage>269</spage><epage>280</epage><pages>269-280</pages><issn>0920-654X</issn><eissn>1573-4951</eissn><abstract>We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. 
The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p &lt; 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.</abstract><cop>Netherlands</cop><pub>Springer Nature B.V</pub><pmid>17387437</pmid><doi>10.1007/s10822-007-9113-3</doi><tpages>12</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0920-654X
ispartof Journal of computer-aided molecular design, 2007-05, Vol.21 (5), p.269-280
issn 0920-654X
1573-4951
language eng
recordid cdi_proquest_miscellaneous_70408006
source MEDLINE; SpringerLink (Online service)
subjects Bayes Theorem
Chemical compounds
Classification
Classifiers
Computational Biology
Confidence Intervals
Correlation coefficient
Correlation coefficients
Drug Design
Logic programming
Pharmaceutical Preparations - chemical synthesis
Pharmaceutical Preparations - classification
Pharmaceutical Preparations - metabolism
Recall
Software
Support vector machines
title Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T04%3A45%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Support%20vector%20inductive%20logic%20programming%20outperforms%20the%20naive%20Bayes%20classifier%20and%20inductive%20logic%20programming%20for%20the%20classification%20of%20bioactive%20chemical%20compounds&rft.jtitle=Journal%20of%20computer-aided%20molecular%20design&rft.au=Cannon,%20Edward%20O&rft.date=2007-05-01&rft.volume=21&rft.issue=5&rft.spage=269&rft.epage=280&rft.pages=269-280&rft.issn=0920-654X&rft.eissn=1573-4951&rft_id=info:doi/10.1007/s10822-007-9113-3&rft_dat=%3Cproquest_cross%3E19654180%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=737242890&rft_id=info:pmid/17387437&rfr_iscdi=true
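
The description field above states that McNemar's test was used to judge whether SVILP is significantly better (p < 5%) than the other two methods. The sketch below shows one way to run such a paired comparison in Python, using the exact two-sided binomial form of the test. This is an assumption: the record does not say which variant of McNemar's test the authors applied, and the function name `mcnemar_exact` and the counts are hypothetical.

```python
import math


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    b: compounds classified correctly by method A but wrongly by method B
    c: compounds classified wrongly by method A but correctly by method B
    Compounds on which both methods agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    k = min(b, c)
    # Probability of k or fewer "successes" in n fair coin flips
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up discordant counts for one activity class, for illustration only.
p_value = mcnemar_exact(b=15, c=5)
print(f"p = {p_value:.4f}, significant at 5%: {p_value < 0.05}")
```

Because the test looks only at compounds on which the two classifiers disagree, it suits the paired setting described in the abstract, where every method is evaluated on the same structures.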