On expert curation and scalability: UniProtKB/Swiss-Prot as a case study

Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scienti...

Bibliographic details
Published in: Bioinformatics (Oxford, England), 2017-11, Vol.33 (21), p.3454-3460
Main authors: Poux, Sylvain, Arighi, Cecilia N, Magrane, Michele, Bateman, Alex, Wei, Chih-Hsuan, Lu, Zhiyong, Boutet, Emmanuel, Bye-A-Jee, Hema, Famiglietti, Maria Livia, Roechert, Bernd, UniProt Consortium, The
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 3460
container_issue 21
container_start_page 3454
container_title Bioinformatics (Oxford, England)
container_volume 33
creator Poux, Sylvain
Arighi, Cecilia N
Magrane, Michele
Bateman, Alex
Wei, Chih-Hsuan
Lu, Zhiyong
Boutet, Emmanuel
Bye-A-Jee, Hema
Famiglietti, Maria Livia
Roechert, Bernd
UniProt Consortium, The
description Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. UniProt is freely available at http://www.uniprot.org/. Contact: sylvain.poux@sib.swiss. Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/btx439
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5860168</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2253260914</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-826aaccfe8024feb2905ee371e8671472b305bc543200b247b464725e9d776423</originalsourceid><addsrcrecordid>eNqFUclKBDEUDKK4f4KSo5d2Xtbu9iDo4IaCgnoOSSatkZ7OmKTV-XtbRgc9eXpbVVGPQmiPwCGBmo2MD75rQpzq7G0amfzBWb2CNgmTZcErQlaXPbANtJXSCwAIEHIdbdAamKQlbKLL2w67j5mLGds-Dlqhw7qb4GR1q41vfZ4f4cfO38WQr09H9-8-peJrwDphja1ODqfcT-Y7aK3RbXK733UbPZ6fPYwvi5vbi6vxyU1hOee5qKjU2trGVUB548zgRDjHSuIqWRJeUsNAGCs4owCG8tJwOWyFqydlKTll2-h4oTvrzdRNrOty1K2aRT_Vca6C9urvpfPP6im8KVFJILIaBA6-BWJ47V3KauqTdW2rOxf6pCgVjEqoCf8XSmpBCZCKkwEqFlAbQ0rRNUtHBNRXYOpvYGoR2MDb__3OkvWTEPsEy-GW8g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1952101841</pqid></control><display><type>article</type><title>On expert curation and scalability: UniProtKB/Swiss-Prot as a case study</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Poux, Sylvain ; Arighi, Cecilia N ; Magrane, Michele ; Bateman, Alex ; Wei, Chih-Hsuan ; Lu, Zhiyong ; Boutet, Emmanuel ; Bye-A-Jee, Hema ; Famiglietti, Maria Livia ; Roechert, Bernd ; UniProt Consortium, The</creator><contributor>Kelso, Janet</contributor><creatorcontrib>Poux, Sylvain ; Arighi, Cecilia N ; Magrane, Michele ; Bateman, Alex ; Wei, Chih-Hsuan ; Lu, Zhiyong ; Boutet, Emmanuel ; Bye-A-Jee, Hema ; Famiglietti, Maria Livia ; Roechert, Bernd ; UniProt Consortium, The ; Kelso, Janet</creatorcontrib><description>Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized 
and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. UniProt is freely available at http://www.uniprot.org/. sylvain.poux@sib.swiss. Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>ISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btx439</identifier><identifier>PMID: 29036270</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>amino acid sequences ; bioinformatics ; case studies ; computer software ; Data Curation - statistics &amp; numerical data ; Data Mining ; Databases, Protein - statistics &amp; numerical data ; Humans ; Knowledge Bases ; Original Papers ; proteins ; PubMed - statistics &amp; numerical data ; Review Literature as Topic ; Statistics as Topic</subject><ispartof>Bioinformatics (Oxford, England), 2017-11, Vol.33 (21), p.3454-3460</ispartof><rights>The Author 2017. 
Published by Oxford University Press.</rights><rights>The Author 2017. Published by Oxford University Press. 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-826aaccfe8024feb2905ee371e8671472b305bc543200b247b464725e9d776423</citedby><cites>FETCH-LOGICAL-c444t-826aaccfe8024feb2905ee371e8671472b305bc543200b247b464725e9d776423</cites><orcidid>0000-0001-7299-6685</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860168/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860168/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29036270$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Kelso, Janet</contributor><creatorcontrib>Poux, Sylvain</creatorcontrib><creatorcontrib>Arighi, Cecilia N</creatorcontrib><creatorcontrib>Magrane, Michele</creatorcontrib><creatorcontrib>Bateman, Alex</creatorcontrib><creatorcontrib>Wei, Chih-Hsuan</creatorcontrib><creatorcontrib>Lu, Zhiyong</creatorcontrib><creatorcontrib>Boutet, Emmanuel</creatorcontrib><creatorcontrib>Bye-A-Jee, Hema</creatorcontrib><creatorcontrib>Famiglietti, Maria Livia</creatorcontrib><creatorcontrib>Roechert, Bernd</creatorcontrib><creatorcontrib>UniProt Consortium, The</creatorcontrib><title>On expert curation and scalability: UniProtKB/Swiss-Prot as a case study</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering 
distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. UniProt is freely available at http://www.uniprot.org/. sylvain.poux@sib.swiss. 
Supplementary data are available at Bioinformatics online.</description><subject>amino acid sequences</subject><subject>bioinformatics</subject><subject>case studies</subject><subject>computer software</subject><subject>Data Curation - statistics &amp; numerical data</subject><subject>Data Mining</subject><subject>Databases, Protein - statistics &amp; numerical data</subject><subject>Humans</subject><subject>Knowledge Bases</subject><subject>Original Papers</subject><subject>proteins</subject><subject>PubMed - statistics &amp; numerical data</subject><subject>Review Literature as Topic</subject><subject>Statistics as Topic</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFUclKBDEUDKK4f4KSo5d2Xtbu9iDo4IaCgnoOSSatkZ7OmKTV-XtbRgc9eXpbVVGPQmiPwCGBmo2MD75rQpzq7G0amfzBWb2CNgmTZcErQlaXPbANtJXSCwAIEHIdbdAamKQlbKLL2w67j5mLGds-Dlqhw7qb4GR1q41vfZ4f4cfO38WQr09H9-8-peJrwDphja1ODqfcT-Y7aK3RbXK733UbPZ6fPYwvi5vbi6vxyU1hOee5qKjU2trGVUB548zgRDjHSuIqWRJeUsNAGCs4owCG8tJwOWyFqydlKTll2-h4oTvrzdRNrOty1K2aRT_Vca6C9urvpfPP6im8KVFJILIaBA6-BWJ47V3KauqTdW2rOxf6pCgVjEqoCf8XSmpBCZCKkwEqFlAbQ0rRNUtHBNRXYOpvYGoR2MDb__3OkvWTEPsEy-GW8g</recordid><startdate>20171101</startdate><enddate>20171101</enddate><creator>Poux, Sylvain</creator><creator>Arighi, Cecilia N</creator><creator>Magrane, Michele</creator><creator>Bateman, Alex</creator><creator>Wei, Chih-Hsuan</creator><creator>Lu, Zhiyong</creator><creator>Boutet, Emmanuel</creator><creator>Bye-A-Jee, Hema</creator><creator>Famiglietti, Maria Livia</creator><creator>Roechert, Bernd</creator><creator>UniProt Consortium, The</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7S9</scope><scope>L.6</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-7299-6685</orcidid></search><sort><creationdate>20171101</creationdate><title>On expert curation and scalability: UniProtKB/Swiss-Prot as a case study</title><author>Poux, Sylvain ; Arighi, Cecilia N ; Magrane, Michele ; Bateman, Alex ; Wei, Chih-Hsuan ; Lu, Zhiyong ; Boutet, Emmanuel ; Bye-A-Jee, Hema ; Famiglietti, Maria Livia ; Roechert, Bernd ; UniProt Consortium, The</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-826aaccfe8024feb2905ee371e8671472b305bc543200b247b464725e9d776423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>amino acid sequences</topic><topic>bioinformatics</topic><topic>case studies</topic><topic>computer software</topic><topic>Data Curation - statistics &amp; numerical data</topic><topic>Data Mining</topic><topic>Databases, Protein - statistics &amp; numerical data</topic><topic>Humans</topic><topic>Knowledge Bases</topic><topic>Original Papers</topic><topic>proteins</topic><topic>PubMed - statistics &amp; numerical data</topic><topic>Review Literature as Topic</topic><topic>Statistics as Topic</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Poux, Sylvain</creatorcontrib><creatorcontrib>Arighi, Cecilia N</creatorcontrib><creatorcontrib>Magrane, Michele</creatorcontrib><creatorcontrib>Bateman, Alex</creatorcontrib><creatorcontrib>Wei, Chih-Hsuan</creatorcontrib><creatorcontrib>Lu, Zhiyong</creatorcontrib><creatorcontrib>Boutet, Emmanuel</creatorcontrib><creatorcontrib>Bye-A-Jee, Hema</creatorcontrib><creatorcontrib>Famiglietti, Maria Livia</creatorcontrib><creatorcontrib>Roechert, 
Bernd</creatorcontrib><creatorcontrib>UniProt Consortium, The</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>AGRICOLA</collection><collection>AGRICOLA - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Poux, Sylvain</au><au>Arighi, Cecilia N</au><au>Magrane, Michele</au><au>Bateman, Alex</au><au>Wei, Chih-Hsuan</au><au>Lu, Zhiyong</au><au>Boutet, Emmanuel</au><au>Bye-A-Jee, Hema</au><au>Famiglietti, Maria Livia</au><au>Roechert, Bernd</au><au>UniProt Consortium, The</au><au>Kelso, Janet</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On expert curation and scalability: UniProtKB/Swiss-Prot as a case study</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2017-11-01</date><risdate>2017</risdate><volume>33</volume><issue>21</issue><spage>3454</spage><epage>3460</epage><pages>3454-3460</pages><issn>1367-4803</issn><issn>1460-2059</issn><eissn>1367-4811</eissn><abstract>Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. 
With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. UniProt is freely available at http://www.uniprot.org/. sylvain.poux@sib.swiss. Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>29036270</pmid><doi>10.1093/bioinformatics/btx439</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0001-7299-6685</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics (Oxford, England), 2017-11, Vol.33 (21), p.3454-3460
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5860168
source MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects amino acid sequences
bioinformatics
case studies
computer software
Data Curation - statistics & numerical data
Data Mining
Databases, Protein - statistics & numerical data
Humans
Knowledge Bases
Original Papers
proteins
PubMed - statistics & numerical data
Review Literature as Topic
Statistics as Topic
title On expert curation and scalability: UniProtKB/Swiss-Prot as a case study
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T06%3A21%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20expert%20curation%20and%20scalability:%20UniProtKB/Swiss-Prot%20as%20a%20case%20study&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Poux,%20Sylvain&rft.date=2017-11-01&rft.volume=33&rft.issue=21&rft.spage=3454&rft.epage=3460&rft.pages=3454-3460&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btx439&rft_dat=%3Cproquest_pubme%3E2253260914%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1952101841&rft_id=info:pmid/29036270&rfr_iscdi=true
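The url field above is an OpenURL query (its ctx_ver parameter names Z39.88-2004), so the citation data it carries can be recovered by decoding the query string. The following is a minimal sketch using only Python's standard library; the helper name openurl_fields and the shortened example URL are illustrative assumptions, not part of the record or of any resolver's API.

```python
from urllib.parse import urlsplit, parse_qs


def openurl_fields(url: str) -> dict[str, list[str]]:
    """Return the decoded query parameters of an OpenURL as a dict of lists."""
    query = urlsplit(url).query
    # parse_qs percent-decodes each value and groups repeated keys
    # (rft_id appears more than once in this record) into one list.
    return parse_qs(query)


# Shortened copy of the record's url field: only a few of the original
# parameters are kept so that the example stays readable.
example = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.atitle=On%20expert%20curation%20and%20scalability:"
    "%20UniProtKB/Swiss-Prot%20as%20a%20case%20study"
    "&rft.volume=33&rft.issue=21&rft.spage=3454&rft.epage=3460"
    "&rft_id=info:doi/10.1093/bioinformatics/btx439"
    "&rft_id=info:pmid/29036270"
)

fields = openurl_fields(example)
print(fields["rft.atitle"][0])  # article title, percent-decoded
print(fields["rft_id"])         # both identifiers, in query order
```

Every value comes back as a list because OpenURL allows a key to repeat; here the DOI and the PMID both arrive under rft_id, so code that wants a single identifier has to pick one by its info: prefix.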