Wrangling phosphoproteomic data to elucidate cancer signaling pathways
Published in: | PloS one 2013-01, Vol.8 (1), p.e52884-e52884 |
---|---|
Main authors: | Grimes, Mark L; Lee, Wan-Jui; van der Maaten, Laurens; Shannon, Paul |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e52884 |
---|---|
container_issue | 1 |
container_start_page | e52884 |
container_title | PloS one |
container_volume | 8 |
creator | Grimes, Mark L; Lee, Wan-Jui; van der Maaten, Laurens; Shannon, Paul |
description | The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations was evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed. |
doi_str_mv | 10.1371/journal.pone.0052884 |
format | Article |
contributor | Burns, Jorge Sans |
publisher | Public Library of Science, United States |
date | 2013-01-03 |
pmid | 23300999 |
peer_review | peer reviewed |
rights | COPYRIGHT 2013 Public Library of Science. 2013 Grimes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
fulltext_links | PubMed Central HTML: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536783/ ; PubMed Central PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536783/pdf/ ; PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/23300999 |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2013-01, Vol.8 (1), p.e52884-e52884 |
issn | 1932-6203 (ISSN); 1932-6203 (EISSN) |
language | eng |
recordid | cdi_plos_journals_1290099170 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Analysis; Bioinformatics; Biology; Cancer; Care and treatment; Cellular signal transduction; Cluster Analysis; Clustering; Computational Biology - methods; Correlation; Data analysis; Data Interpretation, Statistical; Datasets; Embedding; Gene Expression Profiling; Genetic aspects; Health aspects; Humans; Kinases; Lung cancer; Lung diseases; Mass Spectrometry; Mass spectroscopy; Medical research; Medicine; Missing data; Multidimensional methods; Multidimensional scaling; Neoplasms - metabolism; Pattern recognition; Peptides; Phosphorylation; Phosphotransferases; Programming languages; Protein Interaction Maps; Protein-tyrosine kinase; Protein-Tyrosine Kinases - metabolism; Proteins; Proteomics - methods; Scaling; Scientific imaging; Signal transduction; Signal Transduction - physiology; Signaling; Software; Software packages; Stochastic Processes; Stochasticity; Tumors; Tyrosine |
title | Wrangling phosphoproteomic data to elucidate cancer signaling pathways |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T03%3A13%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Wrangling%20phosphoproteomic%20data%20to%20elucidate%20cancer%20signaling%20pathways&rft.jtitle=PloS%20one&rft.au=Grimes,%20Mark%20L&rft.date=2013-01-03&rft.volume=8&rft.issue=1&rft.spage=e52884&rft.epage=e52884&rft.pages=e52884-e52884&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0052884&rft_dat=%3Cgale_plos_%3EA478439716%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1290099170&rft_id=info:pmid/23300999&rft_galeid=A478439716&rft_doaj_id=oai_doaj_org_article_f3fb134332e44aa491b6b5466e81f768&rfr_iscdi=true |
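The description field above outlines three steps: dissimilarities between proteins computed despite missing values, a combination of Spearman correlation and Euclidean distance, and minimum-spanning-tree clustering. The Python sketch below illustrates those ideas under stated assumptions and is not the authors' R code. The pairwise-complete handling of missing values, the per-sample averaging inside the Euclidean distance, the rescale-and-average rule for combining the two measures, and the `min_shared` threshold are all hypothetical choices. The sketch also skips t-SNE: the described pipeline built the tree after a t-SNE embedding, whereas here the tree is built directly on the combined dissimilarity matrix, and clusters are formed by cutting the k-1 longest tree edges.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One protein's intensities across samples; None marks a peptide the detector missed.
Row = Sequence[Optional[float]]


def _ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def pairwise_dissimilarity(a: Row, b: Row, min_shared: int = 3) -> Optional[Tuple[float, float]]:
    """(1 - Spearman rho, RMS Euclidean distance) over samples observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(shared) < min_shared:
        return None  # hypothetical threshold: too little overlap to compare
    xs, ys = [x for x, _ in shared], [y for _, y in shared]
    spearman_d = 1.0 - _pearson(_ranks(xs), _ranks(ys))  # Spearman = Pearson on ranks
    # assumption: average over shared samples so pairs with different overlap are comparable
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))
    return spearman_d, euclid


def combined_matrix(rows: List[Row], min_shared: int = 3) -> List[List[float]]:
    """Rescale both dissimilarities to [0, 1] and average them (hypothetical combination rule)."""
    n = len(rows)
    sp = [[0.0] * n for _ in range(n)]
    eu = [[0.0] * n for _ in range(n)]
    no_overlap = []
    for i in range(n):
        for j in range(i + 1, n):
            d = pairwise_dissimilarity(rows[i], rows[j], min_shared)
            if d is None:
                no_overlap.append((i, j))
            else:
                sp[i][j] = sp[j][i] = d[0]
                eu[i][j] = eu[j][i] = d[1]
    max_sp = max(max(r) for r in sp) or 1.0
    max_eu = max(max(r) for r in eu) or 1.0
    out = [[(sp[i][j] / max_sp + eu[i][j] / max_eu) / 2 for j in range(n)] for i in range(n)]
    for i, j in no_overlap:
        out[i][j] = out[j][i] = 1.0  # assumption: no shared samples means maximal dissimilarity
    return out


def mst_clusters(dist: List[List[float]], k: int) -> List[int]:
    """Prim's MST, then drop the k-1 longest edges; the remaining components are the clusters."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of proteins")
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    root = list(range(n))  # union-find over the n-k shortest tree edges

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, a, b in sorted(edges)[: n - k]:
        root[find(a)] = find(b)
    ids: dict = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

For example, given a hypothetical list of six sparse protein profiles where three rise across samples and three fall, `mst_clusters(combined_matrix(rows), k=2)` separates the two groups. Cutting the longest tree edges is equivalent to single-linkage clustering, which can chain through noisy intermediate points; the described pipeline built its trees on a t-SNE embedding rather than on raw dissimilarities, which this sketch does not reproduce.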