Automating literature screening and curation with applications to computational neuroscience

Abstract. Objective: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For...

Bibliographic details
Published in:	Journal of the American Medical Informatics Association : JAMIA 2024-06, Vol.31 (7), p.1463-1470
Main authors:	Ji, Ziqing; Guo, Siyan; Qiao, Yujie; McDougal, Robert A
Format:	Article
Language:	eng
Subjects:	Computational Biology - methods; Data Curation - methods; Data Mining - methods; Databases, Factual; Humans; Metadata; Models, Neurological; Neurosciences
Online access:	Full text

container_end_page	1470
container_issue	7
container_start_page	1463
container_title	Journal of the American Medical Informatics Association : JAMIA
container_volume	31
creator	Ji, Ziqing; Guo, Siyan; Qiao, Yujie; McDougal, Robert A
description	Abstract. Objective: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics). Materials and Methods: Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5, and GPT-4 were used to identify likely computational neuroscience work and relevant metadata. Results: SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations. Discussion: Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc. Conclusion: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.
doi_str_mv	10.1093/jamia/ocae097
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3053980018</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/jamia/ocae097</oup_id><sourcerecordid>3053980018</sourcerecordid><originalsourceid>FETCH-LOGICAL-c212t-c3f3d47847bd7945ff4d6b2eb017bebe016835a491e68161ffa5efe7d17f83223</originalsourceid><addsrcrecordid>eNqFUE1LxDAUDKK4unr0Kjl6qeajbdrjsvgFghcFD0JJ0xfN0jY1H4j_3uyHevT03huGeTOD0Bkll5TU_GolByOvrJJAarGHjmjBRFaL_GU_7aQUWUGYmKFj71eE0JLx4hDNeCUYY5wfoddFDHaQwYxvuDcBnAzRAfbKAYxrUI4dVjHBxo7404R3LKepN2oDeBwsVnaYYtjcsscjRGe9MjAqOEEHWvYeTndzjp5vrp-Wd9nD4-39cvGQKUZZyBTXvMtFlYu2E3VeaJ13ZcugJVS00EKyXfFC5jWFsqIl1VoWoEF0VOiKpxxzdLHVnZz9iOBDMxivoO_lCDb6hpOC11WKXyVqtqWq5NI70M3kzCDdV0NJsy602RTa7ApN_POddGwH6H7ZPw3-_bZx-kfrG_pSg2o</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3053980018</pqid></control><display><type>article</type><title>Automating literature screening and curation with applications to computational neuroscience</title><source>MEDLINE</source><source>Oxford University Press Journals All Titles (1996-Current)</source><creator>Ji, Ziqing ; Guo, Siyan ; Qiao, Yujie ; McDougal, Robert A</creator><creatorcontrib>Ji, Ziqing ; Guo, Siyan ; Qiao, Yujie ; McDougal, Robert A</creatorcontrib><description>Abstract Objective ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. 
To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics). Materials and Methods Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5, and GPT-4 were used to identify likely computational neuroscience work and relevant metadata. Results SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations. Discussion Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc. 
Conclusion Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.</description><identifier>ISSN: 1067-5027</identifier><identifier>ISSN: 1527-974X</identifier><identifier>EISSN: 1527-974X</identifier><identifier>DOI: 10.1093/jamia/ocae097</identifier><identifier>PMID: 38722233</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Computational Biology - methods ; Data Curation - methods ; Data Mining - methods ; Databases, Factual ; Humans ; Metadata ; Models, Neurological ; Neurosciences</subject><ispartof>Journal of the American Medical Informatics Association : JAMIA, 2024-06, Vol.31 (7), p.1463-1470</ispartof><rights>The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com 2024</rights><rights>The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 
For permissions, please email: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c212t-c3f3d47847bd7945ff4d6b2eb017bebe016835a491e68161ffa5efe7d17f83223</cites><orcidid>0000-0001-6394-3127</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1584,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38722233$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ji, Ziqing</creatorcontrib><creatorcontrib>Guo, Siyan</creatorcontrib><creatorcontrib>Qiao, Yujie</creatorcontrib><creatorcontrib>McDougal, Robert A</creatorcontrib><title>Automating literature screening and curation with applications to computational neuroscience</title><title>Journal of the American Medical Informatics Association : JAMIA</title><addtitle>J Am Med Inform Assoc</addtitle><description>Abstract Objective ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics). Materials and Methods Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. 
After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5, and GPT-4 were used to identify likely computational neuroscience work and relevant metadata. Results SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations. Discussion Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc. Conclusion Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.</description><subject>Computational Biology - methods</subject><subject>Data Curation - methods</subject><subject>Data Mining - methods</subject><subject>Databases, Factual</subject><subject>Humans</subject><subject>Metadata</subject><subject>Models, Neurological</subject><subject>Neurosciences</subject><issn>1067-5027</issn><issn>1527-974X</issn><issn>1527-974X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFUE1LxDAUDKK4unr0Kjl6qeajbdrjsvgFghcFD0JJ0xfN0jY1H4j_3uyHevT03huGeTOD0Bkll5TU_GolByOvrJJAarGHjmjBRFaL_GU_7aQUWUGYmKFj71eE0JLx4hDNeCUYY5wfoddFDHaQwYxvuDcBnAzRAfbKAYxrUI4dVjHBxo7404R3LKepN2oDeBwsVnaYYtjcsscjRGe9MjAqOEEHWvYeTndzjp5vrp-Wd9nD4-39cvGQKUZZyBTXvMtFlYu2E3VeaJ13ZcugJVS00EKyXfFC5jWFsqIl1VoWoEF0VOiKpxxzdLHVnZz9iOBDMxivoO_lCDb6hpOC11WKXyVqtqWq5NI70M3kzCDdV0NJsy602RTa7ApN_POddGwH6H7ZPw3-_bZx-kfrG_pSg2o</recordid><startdate>20240620</startdate><enddate>20240620</enddate><creator>Ji, 
Ziqing</creator><creator>Guo, Siyan</creator><creator>Qiao, Yujie</creator><creator>McDougal, Robert A</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6394-3127</orcidid></search><sort><creationdate>20240620</creationdate><title>Automating literature screening and curation with applications to computational neuroscience</title><author>Ji, Ziqing ; Guo, Siyan ; Qiao, Yujie ; McDougal, Robert A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c212t-c3f3d47847bd7945ff4d6b2eb017bebe016835a491e68161ffa5efe7d17f83223</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computational Biology - methods</topic><topic>Data Curation - methods</topic><topic>Data Mining - methods</topic><topic>Databases, Factual</topic><topic>Humans</topic><topic>Metadata</topic><topic>Models, Neurological</topic><topic>Neurosciences</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ji, Ziqing</creatorcontrib><creatorcontrib>Guo, Siyan</creatorcontrib><creatorcontrib>Qiao, Yujie</creatorcontrib><creatorcontrib>McDougal, Robert A</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ji, Ziqing</au><au>Guo, Siyan</au><au>Qiao, Yujie</au><au>McDougal, Robert 
A</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automating literature screening and curation with applications to computational neuroscience</atitle><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle><addtitle>J Am Med Inform Assoc</addtitle><date>2024-06-20</date><risdate>2024</risdate><volume>31</volume><issue>7</issue><spage>1463</spage><epage>1470</epage><pages>1463-1470</pages><issn>1067-5027</issn><issn>1527-974X</issn><eissn>1527-974X</eissn><abstract>Abstract Objective ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics). Materials and Methods Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5, and GPT-4 were used to identify likely computational neuroscience work and relevant metadata. Results SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations. 
Discussion Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc. Conclusion Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>38722233</pmid><doi>10.1093/jamia/ocae097</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0001-6394-3127</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1067-5027
ispartof	Journal of the American Medical Informatics Association : JAMIA, 2024-06, Vol.31 (7), p.1463-1470
issn	1067-5027 1527-974X 1527-974X
language	eng
recordid	cdi_proquest_miscellaneous_3053980018
source	MEDLINE; Oxford University Press Journals All Titles (1996-Current)
subjects	Computational Biology - methods; Data Curation - methods; Data Mining - methods; Databases, Factual; Humans; Metadata; Models, Neurological; Neurosciences
title	Automating literature screening and curation with applications to computational neuroscience
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T16%3A40%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automating%20literature%20screening%20and%20curation%20with%20applications%20to%20computational%20neuroscience&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA&rft.au=Ji,%20Ziqing&rft.date=2024-06-20&rft.volume=31&rft.issue=7&rft.spage=1463&rft.epage=1470&rft.pages=1463-1470&rft.issn=1067-5027&rft.eissn=1527-974X&rft_id=info:doi/10.1093/jamia/ocae097&rft_dat=%3Cproquest_cross%3E3053980018%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3053980018&rft_id=info:pmid/38722233&rft_oup_id=10.1093/jamia/ocae097&rfr_iscdi=true