Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm

Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias o...

Bibliographic details
Published in:	Proteome science 2016-12, Vol.14 (1), p.19-19, Article 19
Main authors:	Stanford, Tyman E; Bagley, Christopher J; Solomon, Patty J
Format:	Article
Language:	eng
Subjects:	Algorithms; Analysis; Mass spectrometry; Methodology; Proteomics
Online access:	Full text

container_end_page	19
container_issue	1
container_start_page	19
container_title	Proteome science
container_volume	14
creator	Stanford, Tyman E; Bagley, Christopher J; Solomon, Patty J
description	Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale. The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship, resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets. The advantages of the proposed pipeline include informed and data-specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.
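The abstract contrasts naive sliding window algorithms with a faster alternative. The sketch below is not the authors' 'continuous' line segment algorithm, which this record does not describe in detail. It is a generic illustration, under stated assumptions, of why a naive sliding-window minimum is slow and how the same result can be obtained in linear time with a monotonic queue. The function names, the half-width `k` (in index units, as one might use after an m/z-axis transformation has made peak widths roughly constant) and the toy signal are all hypothetical.

```python
from collections import deque


def sliding_min_naive(y, k):
    """Minimum of y over the window [i - k, i + k], clipped at the ends.

    Recomputes every window from scratch, so the cost is O(n * k).
    """
    n = len(y)
    return [min(y[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]


def sliding_min_deque(y, k):
    """Same output as sliding_min_naive, computed in O(n) with a monotonic deque."""
    n = len(y)
    out = []
    window = deque()  # indices into y; their values increase from left to right
    right = 0         # next index that has not yet entered the window
    for i in range(n):
        # Let indices up to i + k enter, discarding older entries that can
        # never be the minimum again because a newer value is not larger.
        while right < n and right <= i + k:
            while window and y[window[-1]] >= y[right]:
                window.pop()
            window.append(right)
            right += 1
        # Drop indices that have fallen out of the left edge of the window.
        while window[0] < i - k:
            window.popleft()
        out.append(y[window[0]])
    return out


# Toy intensities (hypothetical, not from any of the six datasets).
signal = [5.0, 6.0, 9.0, 6.5, 5.5, 5.0, 8.0, 12.0, 7.0, 4.5]
baseline = sliding_min_deque(signal, k=2)
corrected = [s - b for s, b in zip(signal, baseline)]
```

A plain windowed minimum is only one of several possible baseline estimators. It is used here because it keeps the comparison between the quadratic-style and linear-time variants easy to verify.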
doi_str_mv	10.1186/s12953-016-0107-8
format	Article
pmid	27980460
publisher	England: BioMed Central Ltd
publication_date	2016-12-07
eissn	1477-5956
rights	COPYRIGHT 2016 BioMed Central Ltd.; Copyright BioMed Central 2016; The Author(s) 2016
peer_reviewed	true
open_access	free_for_read
linktohtml	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/
linktopdf	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/pdf/
backlink	https://www.ncbi.nlm.nih.gov/pubmed/27980460
fulltext	fulltext
identifier	ISSN: 1477-5956
ispartof	Proteome science, 2016-12, Vol.14 (1), p.19-19, Article 19
issn	1477-5956 1477-5956
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5142289
source	DOAJ Directory of Open Access Journals; PubMed Central Open Access; Springer Nature OA Free Journals; Springer Nature - Complete Springer Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects	Algorithms; Analysis; Mass spectrometry; Methodology; Proteomics
title	Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T06%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Informed%20baseline%20subtraction%20of%20proteomic%20mass%20spectrometry%20data%20aided%20by%20a%20novel%20sliding%20window%20algorithm&rft.jtitle=Proteome%20science&rft.au=Stanford,%20Tyman%20E&rft.date=2016-12-07&rft.volume=14&rft.issue=1&rft.spage=19&rft.epage=19&rft.pages=19-19&rft.artnum=19&rft.issn=1477-5956&rft.eissn=1477-5956&rft_id=info:doi/10.1186/s12953-016-0107-8&rft_dat=%3Cgale_pubme%3EA472998398%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1856136968&rft_id=info:pmid/27980460&rft_galeid=A472998398&rfr_iscdi=true