Wrangling phosphoproteomic data to elucidate cancer signaling pathways

The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging...

Bibliographic details
Published in: PloS one 2013-01, Vol.8 (1), p.e52884-e52884
Main authors: Grimes, Mark L; Lee, Wan-Jui; van der Maaten, Laurens; Shannon, Paul
Format: Article
Language: eng
Online access: Full text
container_end_page e52884
container_issue 1
container_start_page e52884
container_title PloS one
container_volume 8
creator Grimes, Mark L
Lee, Wan-Jui
van der Maaten, Laurens
Shannon, Paul
description The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.
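The dissimilarity and spanning-tree steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the paper reports using R, whereas this is plain Python, and the t-SNE embedding step is left out entirely. Several details are assumptions made only for illustration: missing values are written as None, each pair of proteins is compared on the samples they share, the Euclidean distance is scaled by the fraction of shared samples, undefined pairs receive the largest observed dissimilarity, and the two measures are combined by averaging after rescaling to [0, 1]. All function names and the toy data are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation, or None when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return None if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def shared(x, y):
    """Keep only the samples observed (not None) in both proteins."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]

def spearman_dissimilarity(x, y, min_shared=3):
    """1 - Spearman rho on shared samples (assumed form); None if undefined."""
    xs, ys = shared(x, y)
    if len(xs) < min_shared:
        return None
    rho = pearson(ranks(xs), ranks(ys))
    return None if rho is None else 1.0 - rho

def euclidean_dissimilarity(x, y):
    """Euclidean distance on shared samples, scaled up for missing ones."""
    xs, ys = shared(x, y)
    if not xs:
        return None
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) * len(x) / len(xs))

def combined_matrix(rows):
    """Average of the two dissimilarities after rescaling each to [0, 1]."""
    n = len(rows)
    def matrix(fn):
        m = [[0.0 if i == j else fn(rows[i], rows[j]) for j in range(n)]
             for i in range(n)]
        top = max((v for r in m for v in r if v is not None), default=0.0) or 1.0
        # Assumption: undefined pairs get the largest observed dissimilarity.
        return [[(top if v is None else v) / top for v in r] for r in m]
    s, e = matrix(spearman_dissimilarity), matrix(euclidean_dissimilarity)
    return [[(s[i][j] + e[i][j]) / 2 for j in range(n)] for i in range(n)]

def minimum_spanning_tree(d):
    """Prim's algorithm on a dense dissimilarity matrix; returns (i, j, weight)."""
    in_tree, edges = {0}, []
    while len(in_tree) < len(d):
        w, i, j = min((d[i][j], i, j) for i in in_tree
                      for j in range(len(d)) if j not in in_tree)
        edges.append((i, j, w))
        in_tree.add(j)
    return edges

# Toy data: four hypothetical proteins measured in five samples, with gaps.
rows = [
    [1.0, 2.0, 3.0, None, 5.0],
    [1.1, 2.1, None, 4.2, 5.1],
    [5.0, 4.0, 3.0, 2.0, None],
    [4.9, None, 3.1, 2.1, 1.0],
]
edges = minimum_spanning_tree(combined_matrix(rows))
for i, j, w in edges:
    print(f"protein {i} - protein {j}: {w:.3f}")
```

In this toy example the two short edges of the tree join the proteins with similar profiles, and the single long edge bridges the two groups; cutting long edges of a spanning tree is one common way to read off clusters, though the record does not say which cutting rule the authors applied.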
doi_str_mv 10.1371/journal.pone.0052884
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1290099170</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A478439716</galeid><doaj_id>oai_doaj_org_article_f3fb134332e44aa491b6b5466e81f768</doaj_id><sourcerecordid>A478439716</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</originalsourceid><addsrcrecordid>eNqNkl2L1DAUhoso7rr6D0QLgujFjEmTJs2NsCyuDiws-HkZTjOnnSydZkxSdf-96U53mcpeSOlX-rzv6Tl5s-w5JUvKJH135QbfQ7fcuR6XhJRFVfEH2TFVrFiIgrCHB89H2ZMQrhLEKiEeZ0cFY4QopY6z8x8e-razfZvvNi6kc-ddRLe1Jl9DhDy6HLvB2PSCuYHeoM-DbVPlGw3EzW-4Dk-zRw10AZ9N95Ps2_mHr2efFheXH1dnpxcLI1QRFw2KQgiGBEuuCGcoS9KImmFdKlJXsqAASMx6vFRNqUoBCIiojFR1IRt2kr3c--46F_Q0gqBpocZ-qCSJWO2JtYMrvfN2C_5aO7D6ZsH5VoOP1nSoG9bUlHHGCuQcgCtai7rkQmBFGymq5PV-qjbUW1wb7KOHbmY6_9LbjW7dL81KJmTFksGbycC7nwOGqLc2GOw66NEN439LxrhUdKz16h_0_u4mqoXUgO0bl-qa0VSfcllxpiQViVreQ6VjjWlfU14am9ZngrczQWIi_oktDCHo1ZfP_89efp-zrw_YDUIXN8F1Q7SuD3OQ70HjXQgem7shU6LHuN9OQ49x11Pck-zF4QbdiW7zzf4CiGT6sA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1290099170</pqid></control><display><type>article</type><title>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Grimes, Mark L ; Lee, Wan-Jui ; van der Maaten, Laurens ; Shannon, Paul</creator><contributor>Burns, Jorge Sans</contributor><creatorcontrib>Grimes, Mark L ; Lee, Wan-Jui ; van der Maaten, Laurens ; Shannon, Paul ; Burns, Jorge Sans</creatorcontrib><description>The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global 
analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. 
Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0052884</identifier><identifier>PMID: 23300999</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Analysis ; Bioinformatics ; Biology ; Cancer ; Care and treatment ; Cellular signal transduction ; Cluster Analysis ; Clustering ; Computational Biology - methods ; Correlation ; Data analysis ; Data Interpretation, Statistical ; Datasets ; Embedding ; Gene Expression Profiling ; Genetic aspects ; Health aspects ; Humans ; Kinases ; Lung cancer ; Lung diseases ; Mass Spectrometry ; Mass spectroscopy ; Medical research ; Medicine ; Missing data ; Multidimensional methods ; Multidimensional scaling ; Neoplasms - metabolism ; Pattern recognition ; Peptides ; Phosphorylation ; Phosphotransferases ; Programming languages ; Protein Interaction Maps ; Protein-tyrosine kinase ; Protein-Tyrosine Kinases - metabolism ; Proteins ; Proteomics - methods ; Scaling ; Scientific imaging ; Signal transduction ; Signal Transduction - physiology ; Signaling ; Software ; Software packages ; Stochastic Processes ; Stochasticity ; Tumors ; Tyrosine</subject><ispartof>PloS one, 2013-01, Vol.8 (1), p.e52884-e52884</ispartof><rights>COPYRIGHT 2013 Public Library of Science</rights><rights>2013 Grimes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2013 Grimes et al 2013 Grimes et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</citedby><cites>FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536783/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536783/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23300999$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Burns, Jorge Sans</contributor><creatorcontrib>Grimes, Mark L</creatorcontrib><creatorcontrib>Lee, Wan-Jui</creatorcontrib><creatorcontrib>van der Maaten, Laurens</creatorcontrib><creatorcontrib>Shannon, Paul</creatorcontrib><title>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. 
Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. 
Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Bioinformatics</subject><subject>Biology</subject><subject>Cancer</subject><subject>Care and treatment</subject><subject>Cellular signal transduction</subject><subject>Cluster Analysis</subject><subject>Clustering</subject><subject>Computational Biology - methods</subject><subject>Correlation</subject><subject>Data analysis</subject><subject>Data Interpretation, Statistical</subject><subject>Datasets</subject><subject>Embedding</subject><subject>Gene Expression Profiling</subject><subject>Genetic aspects</subject><subject>Health aspects</subject><subject>Humans</subject><subject>Kinases</subject><subject>Lung cancer</subject><subject>Lung diseases</subject><subject>Mass Spectrometry</subject><subject>Mass spectroscopy</subject><subject>Medical research</subject><subject>Medicine</subject><subject>Missing data</subject><subject>Multidimensional methods</subject><subject>Multidimensional scaling</subject><subject>Neoplasms - metabolism</subject><subject>Pattern recognition</subject><subject>Peptides</subject><subject>Phosphorylation</subject><subject>Phosphotransferases</subject><subject>Programming languages</subject><subject>Protein Interaction Maps</subject><subject>Protein-tyrosine kinase</subject><subject>Protein-Tyrosine Kinases - metabolism</subject><subject>Proteins</subject><subject>Proteomics - methods</subject><subject>Scaling</subject><subject>Scientific imaging</subject><subject>Signal transduction</subject><subject>Signal Transduction - physiology</subject><subject>Signaling</subject><subject>Software</subject><subject>Software packages</subject><subject>Stochastic 
Processes</subject><subject>Stochasticity</subject><subject>Tumors</subject><subject>Tyrosine</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNkl2L1DAUhoso7rr6D0QLgujFjEmTJs2NsCyuDiws-HkZTjOnnSydZkxSdf-96U53mcpeSOlX-rzv6Tl5s-w5JUvKJH135QbfQ7fcuR6XhJRFVfEH2TFVrFiIgrCHB89H2ZMQrhLEKiEeZ0cFY4QopY6z8x8e-razfZvvNi6kc-ddRLe1Jl9DhDy6HLvB2PSCuYHeoM-DbVPlGw3EzW-4Dk-zRw10AZ9N95Ps2_mHr2efFheXH1dnpxcLI1QRFw2KQgiGBEuuCGcoS9KImmFdKlJXsqAASMx6vFRNqUoBCIiojFR1IRt2kr3c--46F_Q0gqBpocZ-qCSJWO2JtYMrvfN2C_5aO7D6ZsH5VoOP1nSoG9bUlHHGCuQcgCtai7rkQmBFGymq5PV-qjbUW1wb7KOHbmY6_9LbjW7dL81KJmTFksGbycC7nwOGqLc2GOw66NEN439LxrhUdKz16h_0_u4mqoXUgO0bl-qa0VSfcllxpiQViVreQ6VjjWlfU14am9ZngrczQWIi_oktDCHo1ZfP_89efp-zrw_YDUIXN8F1Q7SuD3OQ70HjXQgem7shU6LHuN9OQ49x11Pck-zF4QbdiW7zzf4CiGT6sA</recordid><startdate>20130103</startdate><enddate>20130103</enddate><creator>Grimes, Mark L</creator><creator>Lee, Wan-Jui</creator><creator>van der Maaten, Laurens</creator><creator>Shannon, Paul</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20130103</creationdate><title>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</title><author>Grimes, Mark L ; Lee, Wan-Jui ; van der Maaten, Laurens ; Shannon, 
Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Bioinformatics</topic><topic>Biology</topic><topic>Cancer</topic><topic>Care and treatment</topic><topic>Cellular signal transduction</topic><topic>Cluster Analysis</topic><topic>Clustering</topic><topic>Computational Biology - methods</topic><topic>Correlation</topic><topic>Data analysis</topic><topic>Data Interpretation, Statistical</topic><topic>Datasets</topic><topic>Embedding</topic><topic>Gene Expression Profiling</topic><topic>Genetic aspects</topic><topic>Health aspects</topic><topic>Humans</topic><topic>Kinases</topic><topic>Lung cancer</topic><topic>Lung diseases</topic><topic>Mass Spectrometry</topic><topic>Mass spectroscopy</topic><topic>Medical research</topic><topic>Medicine</topic><topic>Missing data</topic><topic>Multidimensional methods</topic><topic>Multidimensional scaling</topic><topic>Neoplasms - metabolism</topic><topic>Pattern recognition</topic><topic>Peptides</topic><topic>Phosphorylation</topic><topic>Phosphotransferases</topic><topic>Programming languages</topic><topic>Protein Interaction Maps</topic><topic>Protein-tyrosine kinase</topic><topic>Protein-Tyrosine Kinases - metabolism</topic><topic>Proteins</topic><topic>Proteomics - methods</topic><topic>Scaling</topic><topic>Scientific imaging</topic><topic>Signal transduction</topic><topic>Signal Transduction - physiology</topic><topic>Signaling</topic><topic>Software</topic><topic>Software packages</topic><topic>Stochastic Processes</topic><topic>Stochasticity</topic><topic>Tumors</topic><topic>Tyrosine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Grimes, Mark L</creatorcontrib><creatorcontrib>Lee, 
Wan-Jui</creatorcontrib><creatorcontrib>van der Maaten, Laurens</creatorcontrib><creatorcontrib>Shannon, Paul</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing &amp; Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central 
UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Agricultural Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and 
BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Grimes, Mark L</au><au>Lee, Wan-Jui</au><au>van der Maaten, Laurens</au><au>Shannon, Paul</au><au>Burns, Jorge Sans</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2013-01-03</date><risdate>2013</risdate><volume>8</volume><issue>1</issue><spage>e52884</spage><epage>e52884</epage><pages>e52884-e52884</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. 
Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>23300999</pmid><doi>10.1371/journal.pone.0052884</doi><tpages>e52884</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2013-01, Vol.8 (1), p.e52884-e52884
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_1290099170
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Algorithms
Analysis
Bioinformatics
Biology
Cancer
Care and treatment
Cellular signal transduction
Cluster Analysis
Clustering
Computational Biology - methods
Correlation
Data analysis
Data Interpretation, Statistical
Datasets
Embedding
Gene Expression Profiling
Genetic aspects
Health aspects
Humans
Kinases
Lung cancer
Lung diseases
Mass Spectrometry
Mass spectroscopy
Medical research
Medicine
Missing data
Multidimensional methods
Multidimensional scaling
Neoplasms - metabolism
Pattern recognition
Peptides
Phosphorylation
Phosphotransferases
Programming languages
Protein Interaction Maps
Protein-tyrosine kinase
Protein-Tyrosine Kinases - metabolism
Proteins
Proteomics - methods
Scaling
Scientific imaging
Signal transduction
Signal Transduction - physiology
Signaling
Software
Software packages
Stochastic Processes
Stochasticity
Tumors
Tyrosine
title Wrangling phosphoproteomic data to elucidate cancer signaling pathways
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T03%3A13%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Wrangling%20phosphoproteomic%20data%20to%20elucidate%20cancer%20signaling%20pathways&rft.jtitle=PloS%20one&rft.au=Grimes,%20Mark%20L&rft.date=2013-01-03&rft.volume=8&rft.issue=1&rft.spage=e52884&rft.epage=e52884&rft.pages=e52884-e52884&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0052884&rft_dat=%3Cgale_plos_%3EA478439716%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1290099170&rft_id=info:pmid/23300999&rft_galeid=A478439716&rft_doaj_id=oai_doaj_org_article_f3fb134332e44aa491b6b5466e81f768&rfr_iscdi=true