Clustering of Longitudinal Trajectories Using Correlation-Based Distances

The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition...

Bibliographic details
Published in: SN computer science 2021-11, Vol.2 (6), p.432, Article 432
Main authors: Pinto da Costa, Joaquim F., Ferreira, Fábio, Mascarello, Martina, Gaio, Rita
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 6
container_start_page 432
container_title SN computer science
container_volume 2
creator Pinto da Costa, Joaquim F.
Ferreira, Fábio
Mascarello, Martina
Gaio, Rita
description The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, it is often of interest to cluster the longitudinal trajectories and to define a much smaller number of mean trajectories. Several methods have been developed to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using this correlation matrix is considered, and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a very useful result is introduced which allows us to use the well-known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, thus avoiding the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in circumstances where it is the relative behavior of the trajectories that matters, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm to the original data and to the profiled data. The new methodology produces, in general, better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model, lcmm (Proust-Lima et al in J Stat Softw 78(2), 2017), is also presented.
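The equivalence mentioned in the description (a Mahalanobis-type distance agreeing with the Euclidean distance on transformed trajectories) can be illustrated with a standard whitening identity. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the AR(1) correlation structure, the value of `rho`, the use of a Cholesky factor as the transformation, and all function names are hypothetical choices made for illustration, since the record does not state the exact matrices or transformation used in the article.

```python
import numpy as np


def ar1_correlation(n_times: int, rho: float) -> np.ndarray:
    """Assumed AR(1) correlation matrix: entry (i, j) is rho ** |i - j|."""
    idx = np.arange(n_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def whiten(trajectories: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Map each row x to L^{-1} x, where corr = L L^T (Cholesky factor)."""
    chol = np.linalg.cholesky(corr)
    # Solve L z = x for all trajectories at once (trajectories are rows).
    return np.linalg.solve(chol, trajectories.T).T


def mahalanobis_sq(x: np.ndarray, y: np.ndarray, corr: np.ndarray) -> float:
    """Squared Mahalanobis-type distance (x - y)^T corr^{-1} (x - y)."""
    diff = x - y
    return float(diff @ np.linalg.solve(corr, diff))


rng = np.random.default_rng(0)
corr = ar1_correlation(5, 0.6)       # hypothetical rho; in practice estimated
x, y = rng.normal(size=(2, 5))       # two toy trajectories with 5 time points
z = whiten(np.vstack([x, y]), corr)  # transformed trajectories

d_mahal = mahalanobis_sq(x, y, corr)
d_eucl = float(np.sum((z[0] - z[1]) ** 2))
```

Because corr = L Lᵀ implies (x − y)ᵀ corr⁻¹ (x − y) = ‖L⁻¹(x − y)‖², the two squared distances agree up to rounding. Under these assumptions, any Euclidean K-Means routine run on the whitened rows would use the same distances as a Mahalanobis-type K-Means on the raw rows.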
doi_str_mv 10.1007/s42979-021-00822-2
format Article
publisher Singapore: Springer Singapore
rights The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021
orcidid https://orcid.org/0000-0002-3991-2715
fulltext fulltext
identifier ISSN: 2662-995X
ispartof SN computer science, 2021-11, Vol.2 (6), p.432, Article 432
issn 2662-995X
2661-8907
language eng
recordid cdi_proquest_journals_2932345015
source SpringerLink Journals - AutoHoldings; ProQuest Central
subjects Algorithms
Cluster analysis
Clustering
Computer Imaging
Computer Science
Computer Systems Organization and Communication Networks
Correlation analysis
Data Structures and Information Theory
Datasets
Decomposition
Euclidean geometry
Information Systems and Communication Service
Longitudinal studies
Methodology
Original Research
Pattern Recognition and Graphics
Software Engineering/Programming and Operating Systems
Stocks
Time series
Vision
title Clustering of Longitudinal Trajectories Using Correlation-Based Distances
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T14%3A34%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20of%20Longitudinal%20Trajectories%20Using%20Correlation-Based%20Distances&rft.jtitle=SN%20computer%20science&rft.au=Pinto%20da%20Costa,%20Joaquim%20F.&rft.date=2021-11-01&rft.volume=2&rft.issue=6&rft.spage=432&rft.pages=432-&rft.artnum=432&rft.issn=2662-995X&rft.eissn=2661-8907&rft_id=info:doi/10.1007/s42979-021-00822-2&rft_dat=%3Cproquest_cross%3E2932345015%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2932345015&rft_id=info:pmid/&rfr_iscdi=true