Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm

Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias o...

Detailed description

Bibliographic details
Published in: Proteome science 2016-12, Vol.14 (1), p.19-19, Article 19
Main authors: Stanford, Tyman E; Bagley, Christopher J; Solomon, Patty J
Format: Article
Language: eng
Online access: Full text
container_end_page 19
container_issue 1
container_start_page 19
container_title Proteome science
container_volume 14
creator Stanford, Tyman E
Bagley, Christopher J
Solomon, Patty J
description Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale. The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship, resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets. The advantages of the proposed pipeline include informed and data-specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.
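The abstract's contrast between naive sliding window algorithms and a faster alternative can be illustrated with a small sketch. The code below is not the authors' 'continuous' line segment algorithm, which this record does not describe in detail. It swaps in a generic, well-known technique (a monotonic deque) for computing a sliding-window minimum, which is one simple way to estimate a baseline. The function names, the `half_width` parameter, and the toy `signal` are all hypothetical and chosen only for illustration.

```python
from collections import deque


def naive_window_min(y, half_width):
    """Minimum of y over [i - half_width, i + half_width] for each i.

    Re-scans the whole window at every position: O(n * w) work.
    """
    n = len(y)
    return [min(y[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def deque_window_min(y, half_width):
    """Same result as naive_window_min, in O(n) via a monotonic deque.

    Illustrative stand-in only; NOT the paper's line segment algorithm.
    """
    n = len(y)
    out = [0.0] * n
    candidates = deque()  # indices whose values increase from front to back
    right = 0             # next index not yet pushed into the window
    for i in range(n):
        # Extend the window's right edge up to i + half_width (clipped at n - 1).
        while right < n and right <= i + half_width:
            # Any older index with a value >= the new one can never be the minimum again.
            while candidates and y[candidates[-1]] >= y[right]:
                candidates.pop()
            candidates.append(right)
            right += 1
        # Drop indices that have fallen off the left edge of the window.
        while candidates[0] < i - half_width:
            candidates.popleft()
        out[i] = y[candidates[0]]
    return out


# Toy intensities (hypothetical, not real MALDI TOF-MS data).
signal = [5.0, 4.0, 9.0, 12.0, 6.0, 3.0, 3.5, 8.0, 2.0, 2.5]
baseline = deque_window_min(signal, half_width=2)
subtracted = [s - b for s, b in zip(signal, baseline)]
```

Both functions return the same baseline estimate. The deque version touches each index a constant number of times, which is the kind of saving that matters when the window is wide relative to the spectrum. A fixed `half_width` in index units is used here for simplicity; the abstract instead describes transforming the m/z-axis so that a single window width suits peaks across the whole mass range.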
doi_str_mv 10.1186/s12953-016-0107-8
format Article
publisher England: BioMed Central Ltd
date 2016-12-07
pmid 27980460
rights COPYRIGHT 2016 BioMed Central Ltd.; Copyright BioMed Central 2016; The Author(s) 2016
toplevel peer_reviewed; online_resources
oa free_for_read
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/pdf/
fulltext fulltext
identifier ISSN: 1477-5956
ispartof Proteome science, 2016-12, Vol.14 (1), p.19-19, Article 19
issn 1477-5956
1477-5956
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5142289
source DOAJ Directory of Open Access Journals; PubMed Central Open Access; Springer Nature OA Free Journals; Springer Nature - Complete Springer Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Algorithms
Analysis
Mass spectrometry
Methodology
Proteomics
title Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T06%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Informed%20baseline%20subtraction%20of%20proteomic%20mass%20spectrometry%20data%20aided%20by%20a%20novel%20sliding%20window%20algorithm&rft.jtitle=Proteome%20science&rft.au=Stanford,%20Tyman%20E&rft.date=2016-12-07&rft.volume=14&rft.issue=1&rft.spage=19&rft.epage=19&rft.pages=19-19&rft.artnum=19&rft.issn=1477-5956&rft.eissn=1477-5956&rft_id=info:doi/10.1186/s12953-016-0107-8&rft_dat=%3Cgale_pubme%3EA472998398%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1856136968&rft_id=info:pmid/27980460&rft_galeid=A472998398&rfr_iscdi=true