Wrangling phosphoproteomic data to elucidate cancer signaling pathways

The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging...

Detailed description

Bibliographic details
Published in:	PloS one 2013-01, Vol.8 (1), p.e52884-e52884
Main authors:	Grimes, Mark L; Lee, Wan-Jui; van der Maaten, Laurens; Shannon, Paul
Format:	Article
Language:	eng
Subjects:	Algorithms; Analysis; Bioinformatics; Biology; Cancer; Care and treatment; Cellular signal transduction; Cluster Analysis; Clustering; Computational Biology - methods; Correlation; Data analysis; Data Interpretation, Statistical; Datasets; Embedding; Gene Expression Profiling; Genetic aspects; Health aspects; Humans; Kinases; Lung cancer; Lung diseases; Mass Spectrometry; Mass spectroscopy; Medical research; Medicine; Missing data; Multidimensional methods; Multidimensional scaling; Neoplasms - metabolism; Pattern recognition; Peptides; Phosphorylation; Phosphotransferases; Programming languages; Protein Interaction Maps; Protein-tyrosine kinase; Protein-Tyrosine Kinases - metabolism; Proteins; Proteomics - methods; Scaling; Scientific imaging; Signal transduction; Signal Transduction - physiology; Signaling; Software; Software packages; Stochastic Processes; Stochasticity; Tumors; Tyrosine
Online access:	Full text

container_end_page	e52884
container_issue	1
container_start_page	e52884
container_title	PloS one
container_volume	8
creator	Grimes, Mark L; Lee, Wan-Jui; van der Maaten, Laurens; Shannon, Paul
description	The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.
doi_str_mv	10.1371/journal.pone.0052884
format	Article
fullrecord	<record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1290099170</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A478439716</galeid><doaj_id>oai_doaj_org_article_f3fb134332e44aa491b6b5466e81f768</doaj_id><sourcerecordid>A478439716</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</originalsourceid><addsrcrecordid>eNqNkl2L1DAUhoso7rr6D0QLgujFjEmTJs2NsCyuDiws-HkZTjOnnSydZkxSdf-96U53mcpeSOlX-rzv6Tl5s-w5JUvKJH135QbfQ7fcuR6XhJRFVfEH2TFVrFiIgrCHB89H2ZMQrhLEKiEeZ0cFY4QopY6z8x8e-razfZvvNi6kc-ddRLe1Jl9DhDy6HLvB2PSCuYHeoM-DbVPlGw3EzW-4Dk-zRw10AZ9N95Ps2_mHr2efFheXH1dnpxcLI1QRFw2KQgiGBEuuCGcoS9KImmFdKlJXsqAASMx6vFRNqUoBCIiojFR1IRt2kr3c--46F_Q0gqBpocZ-qCSJWO2JtYMrvfN2C_5aO7D6ZsH5VoOP1nSoG9bUlHHGCuQcgCtai7rkQmBFGymq5PV-qjbUW1wb7KOHbmY6_9LbjW7dL81KJmTFksGbycC7nwOGqLc2GOw66NEN439LxrhUdKz16h_0_u4mqoXUgO0bl-qa0VSfcllxpiQViVreQ6VjjWlfU14am9ZngrczQWIi_oktDCHo1ZfP_89efp-zrw_YDUIXN8F1Q7SuD3OQ70HjXQgem7shU6LHuN9OQ49x11Pck-zF4QbdiW7zzf4CiGT6sA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1290099170</pqid></control><display><type>article</type><title>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Grimes, Mark L ; Lee, Wan-Jui ; van der Maaten, Laurens ; Shannon, Paul</creator><contributor>Burns, Jorge Sans</contributor><creatorcontrib>Grimes, Mark L ; Lee, Wan-Jui ; van der Maaten, Laurens ; Shannon, Paul ; Burns, Jorge Sans</creatorcontrib><description>The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global 
analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. 
Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0052884</identifier><identifier>PMID: 23300999</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Analysis ; Bioinformatics ; Biology ; Cancer ; Care and treatment ; Cellular signal transduction ; Cluster Analysis ; Clustering ; Computational Biology - methods ; Correlation ; Data analysis ; Data Interpretation, Statistical ; Datasets ; Embedding ; Gene Expression Profiling ; Genetic aspects ; Health aspects ; Humans ; Kinases ; Lung cancer ; Lung diseases ; Mass Spectrometry ; Mass spectroscopy ; Medical research ; Medicine ; Missing data ; Multidimensional methods ; Multidimensional scaling ; Neoplasms - metabolism ; Pattern recognition ; Peptides ; Phosphorylation ; Phosphotransferases ; Programming languages ; Protein Interaction Maps ; Protein-tyrosine kinase ; Protein-Tyrosine Kinases - metabolism ; Proteins ; Proteomics - methods ; Scaling ; Scientific imaging ; Signal transduction ; Signal Transduction - physiology ; Signaling ; Software ; Software packages ; Stochastic Processes ; Stochasticity ; Tumors ; Tyrosine</subject><ispartof>PloS one, 2013-01, Vol.8 (1), p.e52884-e52884</ispartof><rights>COPYRIGHT 2013 Public Library of Science</rights><rights>2013 Grimes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2013 Grimes et al 2013 Grimes et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</citedby><cites>FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536783/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536783/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23300999$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Burns, Jorge Sans</contributor><creatorcontrib>Grimes, Mark L</creatorcontrib><creatorcontrib>Lee, Wan-Jui</creatorcontrib><creatorcontrib>van der Maaten, Laurens</creatorcontrib><creatorcontrib>Shannon, Paul</creatorcontrib><title>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. 
Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. 
Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.</description><subject>Algorithms</subject><subject>Analysis</subject><subject>Bioinformatics</subject><subject>Biology</subject><subject>Cancer</subject><subject>Care and treatment</subject><subject>Cellular signal transduction</subject><subject>Cluster Analysis</subject><subject>Clustering</subject><subject>Computational Biology - methods</subject><subject>Correlation</subject><subject>Data analysis</subject><subject>Data Interpretation, Statistical</subject><subject>Datasets</subject><subject>Embedding</subject><subject>Gene Expression Profiling</subject><subject>Genetic aspects</subject><subject>Health aspects</subject><subject>Humans</subject><subject>Kinases</subject><subject>Lung cancer</subject><subject>Lung diseases</subject><subject>Mass Spectrometry</subject><subject>Mass spectroscopy</subject><subject>Medical research</subject><subject>Medicine</subject><subject>Missing data</subject><subject>Multidimensional methods</subject><subject>Multidimensional scaling</subject><subject>Neoplasms - metabolism</subject><subject>Pattern recognition</subject><subject>Peptides</subject><subject>Phosphorylation</subject><subject>Phosphotransferases</subject><subject>Programming languages</subject><subject>Protein Interaction Maps</subject><subject>Protein-tyrosine kinase</subject><subject>Protein-Tyrosine Kinases - metabolism</subject><subject>Proteins</subject><subject>Proteomics - methods</subject><subject>Scaling</subject><subject>Scientific imaging</subject><subject>Signal transduction</subject><subject>Signal Transduction - physiology</subject><subject>Signaling</subject><subject>Software</subject><subject>Software packages</subject><subject>Stochastic 
Processes</subject><subject>Stochasticity</subject><subject>Tumors</subject><subject>Tyrosine</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNkl2L1DAUhoso7rr6D0QLgujFjEmTJs2NsCyuDiws-HkZTjOnnSydZkxSdf-96U53mcpeSOlX-rzv6Tl5s-w5JUvKJH135QbfQ7fcuR6XhJRFVfEH2TFVrFiIgrCHB89H2ZMQrhLEKiEeZ0cFY4QopY6z8x8e-razfZvvNi6kc-ddRLe1Jl9DhDy6HLvB2PSCuYHeoM-DbVPlGw3EzW-4Dk-zRw10AZ9N95Ps2_mHr2efFheXH1dnpxcLI1QRFw2KQgiGBEuuCGcoS9KImmFdKlJXsqAASMx6vFRNqUoBCIiojFR1IRt2kr3c--46F_Q0gqBpocZ-qCSJWO2JtYMrvfN2C_5aO7D6ZsH5VoOP1nSoG9bUlHHGCuQcgCtai7rkQmBFGymq5PV-qjbUW1wb7KOHbmY6_9LbjW7dL81KJmTFksGbycC7nwOGqLc2GOw66NEN439LxrhUdKz16h_0_u4mqoXUgO0bl-qa0VSfcllxpiQViVreQ6VjjWlfU14am9ZngrczQWIi_oktDCHo1ZfP_89efp-zrw_YDUIXN8F1Q7SuD3OQ70HjXQgem7shU6LHuN9OQ49x11Pck-zF4QbdiW7zzf4CiGT6sA</recordid><startdate>20130103</startdate><enddate>20130103</enddate><creator>Grimes, Mark L</creator><creator>Lee, Wan-Jui</creator><creator>van der Maaten, Laurens</creator><creator>Shannon, Paul</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20130103</creationdate><title>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</title><author>Grimes, Mark L ; Lee, Wan-Jui ; van der Maaten, Laurens ; Shannon, 
Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-fe62663e0e549043e750f6b3eb590b8721aae0cdae0c8f5956aeaeee9c79b27f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Analysis</topic><topic>Bioinformatics</topic><topic>Biology</topic><topic>Cancer</topic><topic>Care and treatment</topic><topic>Cellular signal transduction</topic><topic>Cluster Analysis</topic><topic>Clustering</topic><topic>Computational Biology - methods</topic><topic>Correlation</topic><topic>Data analysis</topic><topic>Data Interpretation, Statistical</topic><topic>Datasets</topic><topic>Embedding</topic><topic>Gene Expression Profiling</topic><topic>Genetic aspects</topic><topic>Health aspects</topic><topic>Humans</topic><topic>Kinases</topic><topic>Lung cancer</topic><topic>Lung diseases</topic><topic>Mass Spectrometry</topic><topic>Mass spectroscopy</topic><topic>Medical research</topic><topic>Medicine</topic><topic>Missing data</topic><topic>Multidimensional methods</topic><topic>Multidimensional scaling</topic><topic>Neoplasms - metabolism</topic><topic>Pattern recognition</topic><topic>Peptides</topic><topic>Phosphorylation</topic><topic>Phosphotransferases</topic><topic>Programming languages</topic><topic>Protein Interaction Maps</topic><topic>Protein-tyrosine kinase</topic><topic>Protein-Tyrosine Kinases - metabolism</topic><topic>Proteins</topic><topic>Proteomics - methods</topic><topic>Scaling</topic><topic>Scientific imaging</topic><topic>Signal transduction</topic><topic>Signal Transduction - physiology</topic><topic>Signaling</topic><topic>Software</topic><topic>Software packages</topic><topic>Stochastic Processes</topic><topic>Stochasticity</topic><topic>Tumors</topic><topic>Tyrosine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Grimes, Mark L</creatorcontrib><creatorcontrib>Lee, 
Wan-Jui</creatorcontrib><creatorcontrib>van der Maaten, Laurens</creatorcontrib><creatorcontrib>Shannon, Paul</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing & Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central 
UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing & Allied Health Database (Alumni Edition)</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Agricultural Science Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing & Allied Health Premium</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering 
Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Grimes, Mark L</au><au>Lee, Wan-Jui</au><au>van der Maaten, Laurens</au><au>Shannon, Paul</au><au>Burns, Jorge Sans</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Wrangling phosphoproteomic data to elucidate cancer signaling pathways</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2013-01-03</date><risdate>2013</risdate><volume>8</volume><issue>1</issue><spage>e52884</spage><epage>e52884</epage><pages>e52884-e52884</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. 
Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>23300999</pmid><doi>10.1371/journal.pone.0052884</doi><tpages>e52884</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1932-6203
ispartof	PloS one, 2013-01, Vol.8 (1), p.e52884-e52884
issn	1932-6203 1932-6203
language	eng
recordid	cdi_plos_journals_1290099170
source	MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects	Algorithms; Analysis; Bioinformatics; Biology; Cancer; Care and treatment; Cellular signal transduction; Cluster Analysis; Clustering; Computational Biology - methods; Correlation; Data analysis; Data Interpretation, Statistical; Datasets; Embedding; Gene Expression Profiling; Genetic aspects; Health aspects; Humans; Kinases; Lung cancer; Lung diseases; Mass Spectrometry; Mass spectroscopy; Medical research; Medicine; Missing data; Multidimensional methods; Multidimensional scaling; Neoplasms - metabolism; Pattern recognition; Peptides; Phosphorylation; Phosphotransferases; Programming languages; Protein Interaction Maps; Protein-tyrosine kinase; Protein-Tyrosine Kinases - metabolism; Proteins; Proteomics - methods; Scaling; Scientific imaging; Signal transduction; Signal Transduction - physiology; Signaling; Software; Software packages; Stochastic Processes; Stochasticity; Tumors; Tyrosine
title	Wrangling phosphoproteomic data to elucidate cancer signaling pathways
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T03%3A13%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Wrangling%20phosphoproteomic%20data%20to%20elucidate%20cancer%20signaling%20pathways&rft.jtitle=PloS%20one&rft.au=Grimes,%20Mark%20L&rft.date=2013-01-03&rft.volume=8&rft.issue=1&rft.spage=e52884&rft.epage=e52884&rft.pages=e52884-e52884&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0052884&rft_dat=%3Cgale_plos_%3EA478439716%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1290099170&rft_id=info:pmid/23300999&rft_galeid=A478439716&rft_doaj_id=oai_doaj_org_article_f3fb134332e44aa491b6b5466e81f768&rfr_iscdi=true
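
Illustrative note on the description field: the abstract mentions dissimilarities between proteins based on Spearman correlation and Euclidean distance in the presence of missing data. The sketch below shows one way such pairwise dissimilarities could be computed. It is not the authors' R code. It is a minimal Python illustration that assumes missing measurements are stored as None, that the correlation-based dissimilarity is 1 - rho, and that the Euclidean sum of squares is scaled up in proportion to the number of skipped samples (similar in spirit to how R's dist() handles missing values). All function names, the min_shared threshold, and the sample values are hypothetical.

```python
import math

def shared_values(x, y):
    """Keep only positions where both profiles have a measurement (not None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation, or None when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def spearman_dissimilarity(x, y, min_shared=3):
    """1 - Spearman rho over shared samples (assumed convention).

    Returns None when fewer than min_shared samples are shared,
    or when the correlation is undefined.
    """
    xs, ys = shared_values(x, y)
    if len(xs) < min_shared:
        return None
    rho = pearson(average_ranks(xs), average_ranks(ys))
    return None if rho is None else 1.0 - rho

def euclidean_dissimilarity(x, y):
    """Euclidean distance over shared samples, scaled up for skipped samples."""
    xs, ys = shared_values(x, y)
    if not xs:
        return None
    squared = sum((a - b) ** 2 for a, b in zip(xs, ys))
    return math.sqrt(squared * len(x) / len(xs))

# Hypothetical abundance profiles across five samples; None marks a missed peptide.
protein_a = [1.0, 2.0, None, 4.0, 5.0]
protein_b = [2.0, 4.0, 6.0, None, 10.0]

print(spearman_dissimilarity(protein_a, protein_b))
print(euclidean_dissimilarity(protein_a, protein_b))
```

Returning None for pairs with too few shared samples keeps unsupported comparisons out of any later embedding or clustering step instead of silently treating them as zero distance. How the two measures are combined into one representation is left open here, since the record does not specify it.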