Clustering of Longitudinal Trajectories Using Correlation-Based Distances

The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition...

Bibliographic Details
Published in:	SN computer science 2021-11, Vol.2 (6), p.432, Article 432
Main authors:	Pinto da Costa, Joaquim F.; Ferreira, Fábio; Mascarello, Martina; Gaio, Rita
Format:	Article
Language:	eng
Subjects:	Algorithms; Cluster analysis; Clustering; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Correlation analysis; Data Structures and Information Theory; Datasets; Decomposition; Euclidean geometry; Information Systems and Communication Service; Longitudinal studies; Methodology; Original Research; Pattern Recognition and Graphics; Proust, Marcel (1871-1922); Software Engineering/Programming and Operating Systems; Stocks; Time series; Vision
Online access:	Full text

container_end_page
container_issue	6
container_start_page	432
container_title	SN computer science
container_volume	2
creator	Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita
description	The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model, lcmm (Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.
doi_str_mv	10.1007/s42979-021-00822-2
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2932345015</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2932345015</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</originalsourceid><addsrcrecordid>eNp9kD1PwzAQhi0EElXpH2CKxGw4X-zEHiF8VarEUiQ2y3WcKlWIiy8Z-PekLRIb093wPu-dHsauBdwKgPKOJJrScEDBATQixzM2w6IQXBsoz487cmPUxyVbEO0AABVIWagZW1bdSENIbb_NYpOtYr9th7Fue9dl6-R2wQ8xtYGydzpEqphS6NzQxp4_OAp19tjS4Hof6IpdNK6jsPidc7Z-flpXr3z19rKs7lfcYy6RCy9VaQoMRsuNrlGiLJtSFxKE9o03tcq9ghwDNn6DJkgUTteN0NLVoIp8zm5OtfsUv8ZAg93FMU3vkkWTTycUCDWl8JTyKRKl0Nh9aj9d-rYC7EGaPUmzkzR7lGZxgvITRPuDj5D-qv-hfgDgum5N</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932345015</pqid></control><display><type>article</type><title>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</title><source>SpringerLink Journals - AutoHoldings</source><source>ProQuest Central</source><creator>Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita</creator><creatorcontrib>Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita</creatorcontrib><description>The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. 
The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. 
In addition, a comparison with a parametric model, lcmm (Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.</description><identifier>ISSN: 2662-995X</identifier><identifier>EISSN: 2661-8907</identifier><identifier>DOI: 10.1007/s42979-021-00822-2</identifier><language>eng</language><publisher>Singapore: Springer Singapore</publisher><subject>Algorithms ; Cluster analysis ; Clustering ; Computer Imaging ; Computer Science ; Computer Systems Organization and Communication Networks ; Correlation analysis ; Data Structures and Information Theory ; Datasets ; Decomposition ; Euclidean geometry ; Information Systems and Communication Service ; Longitudinal studies ; Methodology ; Original Research ; Pattern Recognition and Graphics ; Proust, Marcel (1871-1922) ; Software Engineering/Programming and Operating Systems ; Stocks ; Time series ; Vision</subject><ispartof>SN computer science, 2021-11, Vol.2 (6), p.432, Article 432</ispartof><rights>The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021</rights><rights>The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 
2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</citedby><cites>FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</cites><orcidid>0000-0002-3991-2715</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s42979-021-00822-2$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2932345015?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,21367,27901,27902,33721,41464,42533,43781,51294</link.rule.ids></links><search><creatorcontrib>Pinto da Costa, Joaquim F.</creatorcontrib><creatorcontrib>Ferreira, Fábio</creatorcontrib><creatorcontrib>Mascarello, Martina</creatorcontrib><creatorcontrib>Gaio, Rita</creatorcontrib><title>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</title><title>SN computer science</title><addtitle>SN COMPUT. SCI</addtitle><description>The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. 
An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. 
In addition, a comparison with a parametric model, lcmm (Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.</description><subject>Algorithms</subject><subject>Cluster analysis</subject><subject>Clustering</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Computer Systems Organization and Communication Networks</subject><subject>Correlation analysis</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Decomposition</subject><subject>Euclidean geometry</subject><subject>Information Systems and Communication Service</subject><subject>Longitudinal studies</subject><subject>Methodology</subject><subject>Original Research</subject><subject>Pattern Recognition and Graphics</subject><subject>Proust, Marcel (1871-1922)</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Stocks</subject><subject>Time series</subject><subject>Vision</subject><issn>2662-995X</issn><issn>2661-8907</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kD1PwzAQhi0EElXpH2CKxGw4X-zEHiF8VarEUiQ2y3WcKlWIiy8Z-PekLRIb093wPu-dHsauBdwKgPKOJJrScEDBATQixzM2w6IQXBsoz487cmPUxyVbEO0AABVIWagZW1bdSENIbb_NYpOtYr9th7Fue9dl6-R2wQ8xtYGydzpEqphS6NzQxp4_OAp19tjS4Hof6IpdNK6jsPidc7Z-flpXr3z19rKs7lfcYy6RCy9VaQoMRsuNrlGiLJtSFxKE9o03tcq9ghwDNn6DJkgUTteN0NLVoIp8zm5OtfsUv8ZAg93FMU3vkkWTTycUCDWl8JTyKRKl0Nh9aj9d-rYC7EGaPUmzkzR7lGZxgvITRPuDj5D-qv-hfgDgum5N</recordid><startdate>20211101</startdate><enddate>20211101</enddate><creator>Pinto da Costa, Joaquim F.</creator><creator>Ferreira, Fábio</creator><creator>Mascarello, Martina</creator><creator>Gaio, Rita</creator><general>Springer Singapore</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0002-3991-2715</orcidid></search><sort><creationdate>20211101</creationdate><title>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</title><author>Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Cluster analysis</topic><topic>Clustering</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Computer Systems Organization and Communication Networks</topic><topic>Correlation analysis</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Decomposition</topic><topic>Euclidean geometry</topic><topic>Information Systems and Communication Service</topic><topic>Longitudinal studies</topic><topic>Methodology</topic><topic>Original Research</topic><topic>Pattern Recognition and Graphics</topic><topic>Proust, Marcel (1871-1922)</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Stocks</topic><topic>Time series</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pinto da Costa, Joaquim F.</creatorcontrib><creatorcontrib>Ferreira, Fábio</creatorcontrib><creatorcontrib>Mascarello, 
Martina</creatorcontrib><creatorcontrib>Gaio, Rita</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>SN computer science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pinto da Costa, Joaquim F.</au><au>Ferreira, Fábio</au><au>Mascarello, Martina</au><au>Gaio, Rita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</atitle><jtitle>SN computer science</jtitle><stitle>SN COMPUT. 
SCI</stitle><date>2021-11-01</date><risdate>2021</risdate><volume>2</volume><issue>6</issue><spage>432</spage><pages>432-</pages><artnum>432</artnum><issn>2662-995X</issn><eissn>2661-8907</eissn><abstract>The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. 
The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model, lcmm (Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.</abstract><cop>Singapore</cop><pub>Springer Singapore</pub><doi>10.1007/s42979-021-00822-2</doi><orcidid>https://orcid.org/0000-0002-3991-2715</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 2662-995X
ispartof	SN computer science, 2021-11, Vol.2 (6), p.432, Article 432
issn	2662-995X 2661-8907
language	eng
recordid	cdi_proquest_journals_2932345015
source	SpringerLink Journals - AutoHoldings; ProQuest Central
subjects	Algorithms; Cluster analysis; Clustering; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Correlation analysis; Data Structures and Information Theory; Datasets; Decomposition; Euclidean geometry; Information Systems and Communication Service; Longitudinal studies; Methodology; Original Research; Pattern Recognition and Graphics; Proust, Marcel (1871-1922); Software Engineering/Programming and Operating Systems; Stocks; Time series; Vision
title	Clustering of Longitudinal Trajectories Using Correlation-Based Distances
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T14%3A34%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20of%20Longitudinal%20Trajectories%20Using%20Correlation-Based%20Distances&rft.jtitle=SN%20computer%20science&rft.au=Pinto%20da%20Costa,%20Joaquim%20F.&rft.date=2021-11-01&rft.volume=2&rft.issue=6&rft.spage=432&rft.pages=432-&rft.artnum=432&rft.issn=2662-995X&rft.eissn=2661-8907&rft_id=info:doi/10.1007/s42979-021-00822-2&rft_dat=%3Cproquest_cross%3E2932345015%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2932345015&rft_id=info:pmid/&rfr_iscdi=true
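
Illustrative note on the description field: the abstract states that clustering with the Mahalanobis-type distance coincides with running K-Means under the Euclidean distance on certain transformed trajectories. The record does not say which transformation the authors use, so the sketch below only shows the standard identity that could underlie such a result: if R = L Lᵀ (Cholesky factor), then the distance (x - y)ᵀ R⁻¹ (x - y) equals the squared Euclidean distance between L⁻¹x and L⁻¹y. The AR(1)-style correlation matrix, the value rho = 0.6, and all function names are assumptions made for illustration and are not taken from the paper or from the kml package.

```python
import numpy as np


def ar1_correlation(n_times: int, rho: float) -> np.ndarray:
    """Hypothetical working correlation matrix with entries rho**|i - j|."""
    idx = np.arange(n_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def mahalanobis_type(x: np.ndarray, y: np.ndarray, corr: np.ndarray) -> float:
    """sqrt((x - y)^T corr^{-1} (x - y)), computed without forming the inverse."""
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.solve(corr, diff)))


def whiten(trajectories: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Map each row x to L^{-1} x, where corr = L L^T is the Cholesky factorization."""
    chol = np.linalg.cholesky(corr)  # lower-triangular L
    return np.linalg.solve(chol, trajectories.T).T


rng = np.random.default_rng(0)
corr = ar1_correlation(n_times=5, rho=0.6)   # assumed structure and parameter
X = rng.normal(size=(4, 5))                  # 4 toy trajectories, 5 time points
Z = whiten(X, corr)                          # transformed trajectories

d_mahal = mahalanobis_type(X[0], X[1], corr)
d_eucl = float(np.linalg.norm(Z[0] - Z[1]))
print(d_mahal, d_eucl)  # the two distances agree up to floating-point error
```

Under these assumptions, the rows of Z could be handed to any ordinary Euclidean K-Means routine, which is the kind of reuse of existing software the abstract describes for kml and kml3d.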