Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure
Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the...
Published in: | Molecular & cellular proteomics 2022-09, Vol.21 (9), p.100269-100269, Article 100269 |
---|---|
Main authors: | Dressler, Franz F.; Brägelmann, Johannes; Reischl, Markus; Perner, Sven |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 100269 |
---|---|
container_issue | 9 |
container_start_page | 100269 |
container_title | Molecular & cellular proteomics |
container_volume | 21 |
creator | Dressler, Franz F.; Brägelmann, Johannes; Reischl, Markus; Perner, Sven |
description | Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatical analyses. Based on the relationship of technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: Proteins are ranked based on their expression level–corrected variance and the mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. Five publicly available spike-in and biologically variant datasets were reliably and quantitatively accurately normalized by Normics with improved performance compared to standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets Normics correctly determined proteins as being DE that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. 
The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
• Normics is a tool for the normalization of proteomic data based on existing algorithms.
• Specifically addresses data with high shares of differential expression.
• Combines variance and data-inherent correlation structure.
• Provides a ranking of differential expression likelihood.
• Enables normalization based on the most stable proteins.
Normalization of proteomic data is necessary for quantitative comparison and to improve statistical power. Share, extent, and direction of differential e |
doi_str_mv | 10.1016/j.mcpro.2022.100269 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9450154</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1535947622000779</els_id><sourcerecordid>2692070854</sourcerecordid><originalsourceid>FETCH-LOGICAL-c389t-e719d768aa1aed2c743a508ae2f4fd5f34448902e8bdb35e9c5d626c17b4c9b73</originalsourceid><addsrcrecordid>eNp9kU1v1DAQhi0EoqXlFyChHLlkazt2nCCBhJaPVqoKUgtXy7En1KskXsZOpfLr623Kil56mtHMMx96X0LeMLpilNUnm9VotxhWnHKeK5TX7TNyyGQly1Y04vk-V_UBeRXjJiOUKfmSHFSyyS0lD8nVRcDR2_i--IEhQch5sSuZwf81yYep6G6LXwa9mSwUZnLFZ5NMeTZdA8KUinVAhGEhLxPONs0Ix-RFb4YIrx_iEfn59cvV-rQ8__7tbP3pvLRV06YSFGudqhtjmAHHrRKVkbQxwHvRO9lXQoimpRyaznWVhNZKV_PaMtUJ23aqOiIfl73buRvB2fwQmkFv0Y8Gb3UwXj_uTP5a_w43uhWSMinygncPCzD8mSEmPfpoYRjMBGGOOkvKqaLNPVotqMUQI0K_P8Oo3vmhN_reD73zQy9-5Km3_3-4n_lnQAY-LABknW48oI7WQ9baeQSbtAv-yQN3VDmfSQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2692070854</pqid></control><display><type>article</type><title>Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure</title><source>MEDLINE</source><source>Directory of Open Journals (DOAJ)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Dressler, Franz F. ; Brägelmann, Johannes ; Reischl, Markus ; Perner, Sven</creator><creatorcontrib>Dressler, Franz F. ; Brägelmann, Johannes ; Reischl, Markus ; Perner, Sven</creatorcontrib><description>Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. 
Simultaneously, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatical analyses. Based on the relationship of technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: Proteins are ranked based on their expression level–corrected variance and the mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. Five publicly available spike-in and biologically variant datasets were reliably and quantitively accurately normalized by Normics with improved performance compared to standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets Normics correctly determined proteins as being DE that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
[Display omitted]
•Normics is a tool for the normalization of proteomic data based on existing algorithms.•Specifically addresses data with high shares of differential expression.•Combines variance and data-inherent correlation structure.•Provides a ranking of differential expression likelihood.•Enables normalization based on the most stable proteins.
Normalization of proteomic data is necessary for quantitative comparison and to improve statistical power. Share, extent, and direction of differential expression are usually unknown. Normalizing with unbalanced or high shares of differential expression can distort the data. Normics computes a ranking list for the selection of a likely invariant protein subset for normalization. It increases sensitivity, specificity, and quantitative accuracy compared to standard normalization alone. Its reversed ranking list provides a filter for highly variant proteins for downstream bioinformatic analyses.</description><identifier>ISSN: 1535-9476</identifier><identifier>EISSN: 1535-9484</identifier><identifier>DOI: 10.1016/j.mcpro.2022.100269</identifier><identifier>PMID: 35853575</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Algorithms ; Analysis of Variance ; Computer Simulation ; Data normalization ; Differential expression analysis ; Gene Expression Profiling - methods ; Omics data ; Protein quantitation ; Proteins ; Proteomics ; Proteomics - methods</subject><ispartof>Molecular & cellular proteomics, 2022-09, Vol.21 (9), p.100269-100269, Article 100269</ispartof><rights>2022 The Authors</rights><rights>Copyright © 2022 The Authors. Published by Elsevier Inc. 
All rights reserved.</rights><rights>2022 The Authors 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c389t-e719d768aa1aed2c743a508ae2f4fd5f34448902e8bdb35e9c5d626c17b4c9b73</citedby><cites>FETCH-LOGICAL-c389t-e719d768aa1aed2c743a508ae2f4fd5f34448902e8bdb35e9c5d626c17b4c9b73</cites><orcidid>0000-0002-7780-6374 ; 0000-0003-0873-5805 ; 0000-0002-1306-2169</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9450154/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9450154/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,725,778,782,862,883,27907,27908,53774,53776</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35853575$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Dressler, Franz F.</creatorcontrib><creatorcontrib>Brägelmann, Johannes</creatorcontrib><creatorcontrib>Reischl, Markus</creatorcontrib><creatorcontrib>Perner, Sven</creatorcontrib><title>Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure</title><title>Molecular & cellular proteomics</title><addtitle>Mol Cell Proteomics</addtitle><description>Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatical analyses. 
Based on the relationship of technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: Proteins are ranked based on their expression level–corrected variance and the mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. Five publicly available spike-in and biologically variant datasets were reliably and quantitively accurately normalized by Normics with improved performance compared to standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets Normics correctly determined proteins as being DE that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
[Display omitted]
•Normics is a tool for the normalization of proteomic data based on existing algorithms.•Specifically addresses data with high shares of differential expression.•Combines variance and data-inherent correlation structure.•Provides a ranking of differential expression likelihood.•Enables normalization based on the most stable proteins.
Normalization of proteomic data is necessary for quantitative comparison and to improve statistical power. Share, extent, and direction of differential expression are usually unknown. Normalizing with unbalanced or high shares of differential expression can distort the data. Normics computes a ranking list for the selection of a likely invariant protein subset for normalization. It increases sensitivity, specificity, and quantitative accuracy compared to standard normalization alone. Its reversed ranking list provides a filter for highly variant proteins for downstream bioinformatic analyses.</description><subject>Algorithms</subject><subject>Analysis of Variance</subject><subject>Computer Simulation</subject><subject>Data normalization</subject><subject>Differential expression analysis</subject><subject>Gene Expression Profiling - methods</subject><subject>Omics data</subject><subject>Protein quantitation</subject><subject>Proteins</subject><subject>Proteomics</subject><subject>Proteomics - methods</subject><issn>1535-9476</issn><issn>1535-9484</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kU1v1DAQhi0EoqXlFyChHLlkazt2nCCBhJaPVqoKUgtXy7En1KskXsZOpfLr623Kil56mtHMMx96X0LeMLpilNUnm9VotxhWnHKeK5TX7TNyyGQly1Y04vk-V_UBeRXjJiOUKfmSHFSyyS0lD8nVRcDR2_i--IEhQch5sSuZwf81yYep6G6LXwa9mSwUZnLFZ5NMeTZdA8KUinVAhGEhLxPONs0Ix-RFb4YIrx_iEfn59cvV-rQ8__7tbP3pvLRV06YSFGudqhtjmAHHrRKVkbQxwHvRO9lXQoimpRyaznWVhNZKV_PaMtUJ23aqOiIfl73buRvB2fwQmkFv0Y8Gb3UwXj_uTP5a_w43uhWSMinygncPCzD8mSEmPfpoYRjMBGGOOkvKqaLNPVotqMUQI0K_P8Oo3vmhN_reD73zQy9-5Km3_3-4n_lnQAY-LABknW48oI7WQ9baeQSbtAv-yQN3VDmfSQ</recordid><startdate>20220901</startdate><enddate>20220901</enddate><creator>Dressler, Franz F.</creator><creator>Brägelmann, Johannes</creator><creator>Reischl, Markus</creator><creator>Perner, Sven</creator><general>Elsevier Inc</general><general>American Society for Biochemistry and Molecular 
Biology</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-7780-6374</orcidid><orcidid>https://orcid.org/0000-0003-0873-5805</orcidid><orcidid>https://orcid.org/0000-0002-1306-2169</orcidid></search><sort><creationdate>20220901</creationdate><title>Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure</title><author>Dressler, Franz F. ; Brägelmann, Johannes ; Reischl, Markus ; Perner, Sven</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c389t-e719d768aa1aed2c743a508ae2f4fd5f34448902e8bdb35e9c5d626c17b4c9b73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Analysis of Variance</topic><topic>Computer Simulation</topic><topic>Data normalization</topic><topic>Differential expression analysis</topic><topic>Gene Expression Profiling - methods</topic><topic>Omics data</topic><topic>Protein quantitation</topic><topic>Proteins</topic><topic>Proteomics</topic><topic>Proteomics - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dressler, Franz F.</creatorcontrib><creatorcontrib>Brägelmann, Johannes</creatorcontrib><creatorcontrib>Reischl, Markus</creatorcontrib><creatorcontrib>Perner, Sven</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant 
titles)</collection><jtitle>Molecular & cellular proteomics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dressler, Franz F.</au><au>Brägelmann, Johannes</au><au>Reischl, Markus</au><au>Perner, Sven</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure</atitle><jtitle>Molecular & cellular proteomics</jtitle><addtitle>Mol Cell Proteomics</addtitle><date>2022-09-01</date><risdate>2022</risdate><volume>21</volume><issue>9</issue><spage>100269</spage><epage>100269</epage><pages>100269-100269</pages><artnum>100269</artnum><issn>1535-9476</issn><eissn>1535-9484</eissn><abstract>Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatical analyses. Based on the relationship of technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: Proteins are ranked based on their expression level–corrected variance and the mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. 
Five publicly available spike-in and biologically variant datasets were reliably and quantitively accurately normalized by Normics with improved performance compared to standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets Normics correctly determined proteins as being DE that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
[Display omitted]
•Normics is a tool for the normalization of proteomic data based on existing algorithms.•Specifically addresses data with high shares of differential expression.•Combines variance and data-inherent correlation structure.•Provides a ranking of differential expression likelihood.•Enables normalization based on the most stable proteins.
Normalization of proteomic data is necessary for quantitative comparison and to improve statistical power. Share, extent, and direction of differential expression are usually unknown. Normalizing with unbalanced or high shares of differential expression can distort the data. Normics computes a ranking list for the selection of a likely invariant protein subset for normalization. It increases sensitivity, specificity, and quantitative accuracy compared to standard normalization alone. Its reversed ranking list provides a filter for highly variant proteins for downstream bioinformatic analyses.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>35853575</pmid><doi>10.1016/j.mcpro.2022.100269</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-7780-6374</orcidid><orcidid>https://orcid.org/0000-0003-0873-5805</orcidid><orcidid>https://orcid.org/0000-0002-1306-2169</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1535-9476 |
ispartof | Molecular & cellular proteomics, 2022-09, Vol.21 (9), p.100269-100269, Article 100269 |
issn | 1535-9476; 1535-9484 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9450154 |
source | MEDLINE; Directory of Open Journals (DOAJ); EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Analysis of Variance; Computer Simulation; Data normalization; Differential expression analysis; Gene Expression Profiling - methods; Omics data; Protein quantitation; Proteins; Proteomics; Proteomics - methods |
title | Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T05%3A09%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Normics:%20Proteomic%20Normalization%20by%20Variance%20and%20Data-Inherent%20Correlation%20Structure&rft.jtitle=Molecular%20&%20cellular%20proteomics&rft.au=Dressler,%20Franz%20F.&rft.date=2022-09-01&rft.volume=21&rft.issue=9&rft.spage=100269&rft.epage=100269&rft.pages=100269-100269&rft.artnum=100269&rft.issn=1535-9476&rft.eissn=1535-9484&rft_id=info:doi/10.1016/j.mcpro.2022.100269&rft_dat=%3Cproquest_pubme%3E2692070854%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2692070854&rft_id=info:pmid/35853575&rft_els_id=S1535947622000779&rfr_iscdi=true |
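
The abstract in this record describes the ranking idea only in words: proteins are ordered by an expression level-corrected variance and by their mean correlation with all other proteins, and normalization then uses only the proteins ranked as likely non-DE. The following is a minimal Python sketch of that idea, not the authors' implementation. Several details are assumptions made for illustration and are not taken from the record: the level correction (variance divided by the mean), the use of Pearson correlation, the rank-sum combination of the two criteria, the per-sample median shift as the normalization step, and all function names and toy values.

```python
from math import sqrt
from statistics import mean, median, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_non_de(data):
    """Order protein names from most to least likely non-DE.

    data maps protein name -> list of log-intensities (one per sample).
    Needs at least two proteins.
    """
    names = list(data)
    # Assumption: variance divided by the mean stands in for the
    # "expression level-corrected variance" named in the abstract.
    corrected_var = {}
    for p in names:
        m = abs(mean(data[p]))
        corrected_var[p] = pvariance(data[p]) / m if m else pvariance(data[p])
    # Mean correlation of each protein with all other proteins.
    mean_corr = {
        p: mean(pearson(data[p], data[q]) for q in names if q != p)
        for p in names
    }
    by_var = sorted(names, key=lambda p: corrected_var[p])   # low variance first
    by_corr = sorted(names, key=lambda p: -mean_corr[p])     # high correlation first
    # Assumption: the two criteria are combined as a simple rank sum.
    score = {p: by_var.index(p) + by_corr.index(p) for p in names}
    return sorted(names, key=lambda p: score[p])


def normalize(data, reference):
    """Subtract, per sample, the median of the reference proteins (assumed step)."""
    n_samples = len(next(iter(data.values())))
    shift = [median(data[p][j] for p in reference) for j in range(n_samples)]
    return {p: [v - s for v, s in zip(vals, shift)] for p, vals in data.items()}


# Toy data: A, B and C only follow a shared per-sample loading offset;
# D additionally carries a +6 shift in the last three samples.
toy = {
    "A": [20.0, 21.0, 19.0, 20.5, 22.0, 19.5],
    "B": [22.1, 22.9, 21.0, 22.5, 24.1, 21.4],
    "C": [18.0, 19.1, 16.9, 18.6, 19.9, 17.5],
    "D": [20.0, 21.0, 19.0, 26.5, 28.0, 25.5],
}
ranking = rank_non_de(toy)
normalized = normalize(toy, reference=ranking[:3])
print(ranking)
print(normalized["D"])
```

In this toy case the protein with the group effect ends up last in the ranking, and normalizing on the three top-ranked proteins removes the shared loading offset while leaving its shift intact. Reading the ranking in reverse gives the kind of "high DE likelihood" filter the abstract mentions for downstream analyses.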