Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm
Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias o...
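Published in: | Proteome science 2016-12, Vol.14 (1), p.19-19, Article 19 |
---|---|
Main authors: | Stanford, Tyman E; Bagley, Christopher J; Solomon, Patty J |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Analysis; Mass spectrometry; Methodology; Proteomics |
Online access: | Full text |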
container_end_page | 19 |
---|---|
container_issue | 1 |
container_start_page | 19 |
container_title | Proteome science |
container_volume | 14 |
creator | Stanford, Tyman E; Bagley, Christopher J; Solomon, Patty J |
description | Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.
The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.
The advantages of the proposed pipeline include informed and data specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines. |
doi_str_mv | 10.1186/s12953-016-0107-8 |
format | Article |
fullrecord | <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5142289</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A472998398</galeid><sourcerecordid>A472998398</sourcerecordid><originalsourceid>FETCH-LOGICAL-c528t-f5fe4b3c27e0424e067471bcd800df046ea29658e69ba0e028c43597b8c231b43</originalsourceid><addsrcrecordid>eNptkt1r1TAYxosobk7_AG8k4M286EzSNB83whh-HBgIflyHNH3bZbbJMUk3z39vyplzR6SEhuT3POR9eKrqJcFnhEj-NhGq2qbGhJeFRS0fVceECVG3quWPH-yPqmcpXWNMqaL8aXVEhZKYcXxc_dj4IcQZetSZBJPzgNLS5WhsdsGjMKBtDBnC7CyaTUoobcHmGGbIcYd6kw0yrl_lO2SQDzcwoTS53vkR3Trfh1tkpjFEl6_m59WTwUwJXtz9T6rvH95_u_hUX37-uLk4v6xtS2Wuh3YA1jWWCsCMMsBcMEE620uM-6E8GwxVvJXAVWcwYCota1olOmlpQzrWnFTv9r7bpSuTWfBlnklvo5tN3OlgnD688e5Kj-FGt4RRKlUxOL0ziOHnAinr2SUL02Q8hCVpIlvKpVSCFPT1P-h1WKIv460UJw1XXP6lRjOBdiXyNeHVVJ8zQZWSjVqps_9Q5euhxB88DK6cHwjeHAgKk-FXHs2Skt58_XLIkj1rY0gpwnCfB8F6bZPet0mXNum1TXrVvHoY5L3iT32a33aYxRA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1856136968</pqid></control><display><type>article</type><title>Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm</title><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central Open Access</source><source>Springer Nature OA Free Journals</source><source>Springer Nature - Complete Springer Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Stanford, Tyman E ; Bagley, Christopher J ; Solomon, Patty J</creator><creatorcontrib>Stanford, Tyman E ; Bagley, Christopher J ; Solomon, Patty J</creatorcontrib><description>Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the 
aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.
The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.
The advantages of the proposed pipeline include informed and data specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.</description><identifier>ISSN: 1477-5956</identifier><identifier>EISSN: 1477-5956</identifier><identifier>DOI: 10.1186/s12953-016-0107-8</identifier><identifier>PMID: 27980460</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Algorithms ; Analysis ; Mass spectrometry ; Methodology ; Proteomics</subject><ispartof>Proteome science, 2016-12, Vol.14 (1), p.19-19, Article 19</ispartof><rights>COPYRIGHT 2016 BioMed Central Ltd.</rights><rights>Copyright BioMed Central 2016</rights><rights>The Author(s) 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c528t-f5fe4b3c27e0424e067471bcd800df046ea29658e69ba0e028c43597b8c231b43</citedby><cites>FETCH-LOGICAL-c528t-f5fe4b3c27e0424e067471bcd800df046ea29658e69ba0e028c43597b8c231b43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,861,882,27905,27906,53772,53774</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27980460$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Stanford, Tyman E</creatorcontrib><creatorcontrib>Bagley, Christopher J</creatorcontrib><creatorcontrib>Solomon, Patty 
J</creatorcontrib><title>Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm</title><title>Proteome science</title><addtitle>Proteome Sci</addtitle><description>Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.
The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.
The advantages of the proposed pipeline include informed and data specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Mass spectrometry</subject><subject>Methodology</subject><subject>Proteomics</subject><issn>1477-5956</issn><issn>1477-5956</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNptkt1r1TAYxosobk7_AG8k4M286EzSNB83whh-HBgIflyHNH3bZbbJMUk3z39vyplzR6SEhuT3POR9eKrqJcFnhEj-NhGq2qbGhJeFRS0fVceECVG3quWPH-yPqmcpXWNMqaL8aXVEhZKYcXxc_dj4IcQZetSZBJPzgNLS5WhsdsGjMKBtDBnC7CyaTUoobcHmGGbIcYd6kw0yrl_lO2SQDzcwoTS53vkR3Trfh1tkpjFEl6_m59WTwUwJXtz9T6rvH95_u_hUX37-uLk4v6xtS2Wuh3YA1jWWCsCMMsBcMEE620uM-6E8GwxVvJXAVWcwYCota1olOmlpQzrWnFTv9r7bpSuTWfBlnklvo5tN3OlgnD688e5Kj-FGt4RRKlUxOL0ziOHnAinr2SUL02Q8hCVpIlvKpVSCFPT1P-h1WKIv460UJw1XXP6lRjOBdiXyNeHVVJ8zQZWSjVqps_9Q5euhxB88DK6cHwjeHAgKk-FXHs2Skt58_XLIkj1rY0gpwnCfB8F6bZPet0mXNum1TXrVvHoY5L3iT32a33aYxRA</recordid><startdate>20161207</startdate><enddate>20161207</enddate><creator>Stanford, Tyman E</creator><creator>Bagley, Christopher J</creator><creator>Solomon, Patty J</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20161207</creationdate><title>Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm</title><author>Stanford, Tyman E ; Bagley, Christopher J ; Solomon, Patty J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c528t-f5fe4b3c27e0424e067471bcd800df046ea29658e69ba0e028c43597b8c231b43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Mass spectrometry</topic><topic>Methodology</topic><topic>Proteomics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Stanford, Tyman E</creatorcontrib><creatorcontrib>Bagley, Christopher J</creatorcontrib><creatorcontrib>Solomon, Patty J</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Nucleic Acids Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural 
Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Proteome science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Stanford, Tyman E</au><au>Bagley, Christopher J</au><au>Solomon, Patty J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm</atitle><jtitle>Proteome science</jtitle><addtitle>Proteome 
Sci</addtitle><date>2016-12-07</date><risdate>2016</risdate><volume>14</volume><issue>1</issue><spage>19</spage><epage>19</epage><pages>19-19</pages><artnum>19</artnum><issn>1477-5956</issn><eissn>1477-5956</eissn><abstract>Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.
The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.
The advantages of the proposed pipeline include informed and data specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>27980460</pmid><doi>10.1186/s12953-016-0107-8</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1477-5956 |
ispartof | Proteome science, 2016-12, Vol.14 (1), p.19-19, Article 19 |
issn | 1477-5956 1477-5956 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5142289 |
source | DOAJ Directory of Open Access Journals; PubMed Central Open Access; Springer Nature OA Free Journals; Springer Nature - Complete Springer Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Analysis; Mass spectrometry; Methodology; Proteomics |
title | Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T06%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Informed%20baseline%20subtraction%20of%20proteomic%20mass%20spectrometry%20data%20aided%20by%20a%20novel%20sliding%20window%20algorithm&rft.jtitle=Proteome%20science&rft.au=Stanford,%20Tyman%20E&rft.date=2016-12-07&rft.volume=14&rft.issue=1&rft.spage=19&rft.epage=19&rft.pages=19-19&rft.artnum=19&rft.issn=1477-5956&rft.eissn=1477-5956&rft_id=info:doi/10.1186/s12953-016-0107-8&rft_dat=%3Cgale_pubme%3EA472998398%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1856136968&rft_id=info:pmid/27980460&rft_galeid=A472998398&rfr_iscdi=true |
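
The abstract above contrasts naive sliding window algorithms with a faster alternative. As a rough illustration of why the naive approach is slow, the sketch below compares a naive centred sliding-window minimum with a standard monotonic-deque version that runs in linear time. This is not the authors' 'continuous' line segment algorithm, which the record does not describe in detail; the function names, the choice of a window minimum as a crude baseline estimate, and the `half_width` parameter are all assumptions made for illustration.

```python
from collections import deque


def sliding_min_naive(values, half_width):
    """Centred window minimum, recomputed from scratch: O(n * window)."""
    n = len(values)
    return [
        min(values[max(0, i - half_width): min(n, i + half_width + 1)])
        for i in range(n)
    ]


def sliding_min_deque(values, half_width):
    """Same result in O(n) using a monotonic deque of indices."""
    n = len(values)
    out = []
    window = deque()  # indices whose values increase from front to back
    right = 0         # next index not yet pushed into the window
    for i in range(n):
        hi = min(n - 1, i + half_width)
        # Extend the window to its right edge, dropping dominated entries.
        while right <= hi:
            while window and values[window[-1]] >= values[right]:
                window.pop()
            window.append(right)
            right += 1
        # Discard indices that fell off the left edge of the window.
        while window[0] < i - half_width:
            window.popleft()
        out.append(values[window[0]])
    return out


def subtract_baseline(values, half_width):
    """Crude baseline subtraction (assumption): signal minus window minimum."""
    baseline = sliding_min_deque(values, half_width)
    return [v - b for v, b in zip(values, baseline)]


if __name__ == "__main__":
    intensities = [5, 3, 4, 1, 6, 7, 2]  # made-up toy spectrum
    print(sliding_min_deque(intensities, 1))
    print(subtract_baseline(intensities, 1))
```

Both functions return the same window minima; only the amount of work per point differs, which is the kind of gap the abstract reports when comparing against naive sliding windows.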