ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets

Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-leve...

Bibliographic details
Main authors:	Rydbeck, Halfdan; Sandve, Geir Kjetil F; Ferkingstad, Egil; Simovski, Boris; Rye, Morten Beck; Hovig, Johannes Eivind
Format:	Article
Language:	eng
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Rydbeck, Halfdan; Sandve, Geir Kjetil F; Ferkingstad, Egil; Simovski, Boris; Rye, Morten Beck; Hovig, Johannes Eivind
description	Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.
format	Article
fullrecord	<record><control><sourceid>cristin_3HK</sourceid><recordid>TN_cdi_cristin_nora_11250_2373562</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>11250_2373562</sourcerecordid><originalsourceid>FETCH-cristin_nora_11250_23735623</originalsourceid><addsrcrecordid>eNqNi0EKwkAMAPfiQdQ_xAcUbEsVvBaLD-hVSthNS3C7C0mK-nt78AGeBoaZrXu0cdFe0D-v0BHaIgT0tlUY5wSYAijPHFHYPjAT6loojFnAr6eRcJogjzBRyjMVLw4EAQ1ByXTvNiNGpcOPO3fsbn17L7ywGqchZcGhLKvmNFT1pW7OVf1P8wUGyzwA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets</title><source>NORA - Norwegian Open Research Archives</source><creator>Rydbeck, Halfdan ; Sandve, Geir Kjetil F ; Ferkingstad, Egil ; Simovski, Boris ; Rye, Morten Beck ; Hovig, Johannes Eivind</creator><creatorcontrib>Rydbeck, Halfdan ; Sandve, Geir Kjetil F ; Ferkingstad, Egil ; Simovski, Boris ; Rye, Morten Beck ; Hovig, Johannes Eivind</creatorcontrib><description>Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.</description><language>eng</language><publisher>Public Library of Science</publisher><creationdate>2015</creationdate><rights>info:eu-repo/semantics/openAccess</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,780,885,26567</link.rule.ids><linktorsrc>$$Uhttp://hdl.handle.net/11250/2373562$$EView_record_in_NORA$$FView_record_in_$$GNORA$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Rydbeck, Halfdan</creatorcontrib><creatorcontrib>Sandve, Geir Kjetil F</creatorcontrib><creatorcontrib>Ferkingstad, Egil</creatorcontrib><creatorcontrib>Simovski, Boris</creatorcontrib><creatorcontrib>Rye, Morten Beck</creatorcontrib><creatorcontrib>Hovig, Johannes Eivind</creatorcontrib><title>ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets</title><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>3HK</sourceid><recordid>eNqNi0EKwkAMAPfiQdQ_xAcUbEsVvBaLD-hVSthNS3C7C0mK-nt78AGeBoaZrXu0cdFe0D-v0BHaIgT0tlUY5wSYAijPHFHYPjAT6loojFnAr6eRcJogjzBRyjMVLw4EAQ1ByXTvNiNGpcOPO3fsbn17L7ywGqchZcGhLKvmNFT1pW7OVf1P8wUGyzwA</recordid><startdate>2015</startdate><enddate>2015</enddate><creator>Rydbeck, Halfdan</creator><creator>Sandve, Geir Kjetil F</creator><creator>Ferkingstad, Egil</creator><creator>Simovski, Boris</creator><creator>Rye, Morten Beck</creator><creator>Hovig, Johannes Eivind</creator><general>Public Library of Science</general><scope>3HK</scope></search><sort><creationdate>2015</creationdate><title>ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets</title><author>Rydbeck, Halfdan ; Sandve, Geir Kjetil F ; Ferkingstad, Egil ; Simovski, Boris ; Rye, Morten Beck ; Hovig, Johannes Eivind</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-cristin_nora_11250_23735623</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Rydbeck, Halfdan</creatorcontrib><creatorcontrib>Sandve, Geir Kjetil F</creatorcontrib><creatorcontrib>Ferkingstad, Egil</creatorcontrib><creatorcontrib>Simovski, Boris</creatorcontrib><creatorcontrib>Rye, Morten Beck</creatorcontrib><creatorcontrib>Hovig, Johannes Eivind</creatorcontrib><collection>NORA - Norwegian Open Research Archives</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Rydbeck, Halfdan</au><au>Sandve, Geir Kjetil F</au><au>Ferkingstad, Egil</au><au>Simovski, Boris</au><au>Rye, Morten Beck</au><au>Hovig, Johannes Eivind</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets</atitle><date>2015</date><risdate>2015</risdate><pub>Public Library of Science</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng
recordid	cdi_cristin_nora_11250_2373562
source	NORA - Norwegian Open Research Archives
title	ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T07%3A20%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-cristin_3HK&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ClusTrack:%20Feature%20extraction%20and%20similarity%20measures%20for%20clustering%20of%20genome-wide%20data%20sets&rft.au=Rydbeck,%20Halfdan&rft.date=2015&rft_id=info:doi/&rft_dat=%3Ccristin_3HK%3E11250_2373562%3C/cristin_3HK%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true