Clustering of Longitudinal Trajectories Using Correlation-Based Distances
The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition...
Published in: | SN computer science 2021-11, Vol.2 (6), p.432, Article 432 |
---|---|
Main authors: | Pinto da Costa, Joaquim F.; Ferreira, Fábio; Mascarello, Martina; Gaio, Rita |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 6 |
container_start_page | 432 |
container_title | SN computer science |
container_volume | 2 |
creator | Pinto da Costa, Joaquim F.; Ferreira, Fábio; Mascarello, Martina; Gaio, Rita |
description | The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been developed to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a very useful result is introduced that allows us to use the well-known kml or kml3d (Genolini et al., J Stat Softw 65(4):1–34, 2015) algorithms, thus avoiding the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matters, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model, lcmm (Proust-Lima et al., J Stat Softw 78(2), 2017), is also presented. |
doi_str_mv | 10.1007/s42979-021-00822-2 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2932345015</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2932345015</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</originalsourceid><addsrcrecordid>eNp9kD1PwzAQhi0EElXpH2CKxGw4X-zEHiF8VarEUiQ2y3WcKlWIiy8Z-PekLRIb093wPu-dHsauBdwKgPKOJJrScEDBATQixzM2w6IQXBsoz487cmPUxyVbEO0AABVIWagZW1bdSENIbb_NYpOtYr9th7Fue9dl6-R2wQ8xtYGydzpEqphS6NzQxp4_OAp19tjS4Hof6IpdNK6jsPidc7Z-flpXr3z19rKs7lfcYy6RCy9VaQoMRsuNrlGiLJtSFxKE9o03tcq9ghwDNn6DJkgUTteN0NLVoIp8zm5OtfsUv8ZAg93FMU3vkkWTTycUCDWl8JTyKRKl0Nh9aj9d-rYC7EGaPUmzkzR7lGZxgvITRPuDj5D-qv-hfgDgum5N</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932345015</pqid></control><display><type>article</type><title>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</title><source>SpringerLink Journals - AutoHoldings</source><source>ProQuest Central</source><creator>Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita</creator><creatorcontrib>Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita</creatorcontrib><description>The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. 
The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model,
lcmm
(Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.</description><identifier>ISSN: 2662-995X</identifier><identifier>EISSN: 2661-8907</identifier><identifier>DOI: 10.1007/s42979-021-00822-2</identifier><language>eng</language><publisher>Singapore: Springer Singapore</publisher><subject>Algorithms ; Cluster analysis ; Clustering ; Computer Imaging ; Computer Science ; Computer Systems Organization and Communication Networks ; Correlation analysis ; Data Structures and Information Theory ; Datasets ; Decomposition ; Euclidean geometry ; Information Systems and Communication Service ; Longitudinal studies ; Methodology ; Original Research ; Pattern Recognition and Graphics ; Proust, Marcel (1871-1922) ; Software Engineering/Programming and Operating Systems ; Stocks ; Time series ; Vision</subject><ispartof>SN computer science, 2021-11, Vol.2 (6), p.432, Article 432</ispartof><rights>The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021</rights><rights>The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</citedby><cites>FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</cites><orcidid>0000-0002-3991-2715</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s42979-021-00822-2$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2932345015?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,21367,27901,27902,33721,41464,42533,43781,51294</link.rule.ids></links><search><creatorcontrib>Pinto da Costa, Joaquim 
F.</creatorcontrib><creatorcontrib>Ferreira, Fábio</creatorcontrib><creatorcontrib>Mascarello, Martina</creatorcontrib><creatorcontrib>Gaio, Rita</creatorcontrib><title>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</title><title>SN computer science</title><addtitle>SN COMPUT. SCI</addtitle><description>The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. 
The methodology is tested on simulated data with different time behaviors and also on real data. The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model,
lcmm
(Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.</description><subject>Algorithms</subject><subject>Cluster analysis</subject><subject>Clustering</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Computer Systems Organization and Communication Networks</subject><subject>Correlation analysis</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Decomposition</subject><subject>Euclidean geometry</subject><subject>Information Systems and Communication Service</subject><subject>Longitudinal studies</subject><subject>Methodology</subject><subject>Original Research</subject><subject>Pattern Recognition and Graphics</subject><subject>Proust, Marcel (1871-1922)</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Stocks</subject><subject>Time series</subject><subject>Vision</subject><issn>2662-995X</issn><issn>2661-8907</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kD1PwzAQhi0EElXpH2CKxGw4X-zEHiF8VarEUiQ2y3WcKlWIiy8Z-PekLRIb093wPu-dHsauBdwKgPKOJJrScEDBATQixzM2w6IQXBsoz487cmPUxyVbEO0AABVIWagZW1bdSENIbb_NYpOtYr9th7Fue9dl6-R2wQ8xtYGydzpEqphS6NzQxp4_OAp19tjS4Hof6IpdNK6jsPidc7Z-flpXr3z19rKs7lfcYy6RCy9VaQoMRsuNrlGiLJtSFxKE9o03tcq9ghwDNn6DJkgUTteN0NLVoIp8zm5OtfsUv8ZAg93FMU3vkkWTTycUCDWl8JTyKRKl0Nh9aj9d-rYC7EGaPUmzkzR7lGZxgvITRPuDj5D-qv-hfgDgum5N</recordid><startdate>20211101</startdate><enddate>20211101</enddate><creator>Pinto da Costa, Joaquim F.</creator><creator>Ferreira, Fábio</creator><creator>Mascarello, Martina</creator><creator>Gaio, Rita</creator><general>Springer Singapore</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0002-3991-2715</orcidid></search><sort><creationdate>20211101</creationdate><title>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</title><author>Pinto da Costa, Joaquim F. ; Ferreira, Fábio ; Mascarello, Martina ; Gaio, Rita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2342-1c457962e984b8d24247f7864018cfc9d53c5032e2fcb29e421a8df184ad0563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Cluster analysis</topic><topic>Clustering</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Computer Systems Organization and Communication Networks</topic><topic>Correlation analysis</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Decomposition</topic><topic>Euclidean geometry</topic><topic>Information Systems and Communication Service</topic><topic>Longitudinal studies</topic><topic>Methodology</topic><topic>Original Research</topic><topic>Pattern Recognition and Graphics</topic><topic>Proust, Marcel (1871-1922)</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Stocks</topic><topic>Time series</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pinto da Costa, Joaquim F.</creatorcontrib><creatorcontrib>Ferreira, Fábio</creatorcontrib><creatorcontrib>Mascarello, 
Martina</creatorcontrib><creatorcontrib>Gaio, Rita</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>SN computer science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pinto da Costa, Joaquim F.</au><au>Ferreira, Fábio</au><au>Mascarello, Martina</au><au>Gaio, Rita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering of Longitudinal Trajectories Using Correlation-Based Distances</atitle><jtitle>SN computer science</jtitle><stitle>SN COMPUT. 
SCI</stitle><date>2021-11-01</date><risdate>2021</risdate><volume>2</volume><issue>6</issue><spage>432</spage><pages>432-</pages><artnum>432</artnum><issn>2662-995X</issn><eissn>2661-8907</eissn><abstract>The defining feature of a longitudinal data set is that individuals are measured repeatedly through time, giving rise to (a vector of) observations that tend to be intercorrelated. In longitudinal studies with a large number of subjects, clustering of the longitudinal trajectories and the definition of a much smaller number of mean trajectories is often of interest. Several methods have been built up to extend cluster analysis to longitudinal data. Firstly, we introduce a novel non-parametric methodology for clustering longitudinal data. The correlations between the observations from individual trajectories are taken into account by pre-defined correlation matrices with parameters that are estimated from the data. An original Mahalanobis-type distance using the above correlation matrix is considered and then a longitudinal K-Means algorithm is applied. Regarding the computation of the clustering, a much useful result is introduced which allows us to use the well known kml or kml3d (Genolini et al J Stat Softw 65(4):1–34, 2015) algorithm, avoiding thus the need for a new computer program. In fact, we show that our method with the new Mahalanobis-type distance coincides with the application of the longitudinal K-Means algorithm (kml), using the Euclidean distance, to certain transformed trajectories. This property simplifies the process for general users. Secondly, in some circumstances where it is the relative behavior of the trajectories that matter, rather than their absolute values, we propose the use of profiles before entering the algorithm. The methodology is tested on simulated data with different time behaviors and also on real data. 
The results are compared with those obtained from the direct application of the K-Means algorithm on the original data and on the profiled data. The new methodology produces in general better results than those obtained from the straightforward application of the longitudinal K-Means algorithm (kml) to the raw data. In addition, a comparison with a parametric model,
lcmm
(Proust-Lima et al in J Stat Softw 78(2), 2017), will also be presented.</abstract><cop>Singapore</cop><pub>Springer Singapore</pub><doi>10.1007/s42979-021-00822-2</doi><orcidid>https://orcid.org/0000-0002-3991-2715</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2662-995X |
ispartof | SN computer science, 2021-11, Vol.2 (6), p.432, Article 432 |
issn | 2662-995X; 2661-8907 |
language | eng |
recordid | cdi_proquest_journals_2932345015 |
source | SpringerLink Journals - AutoHoldings; ProQuest Central |
subjects | Algorithms; Cluster analysis; Clustering; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Correlation analysis; Data Structures and Information Theory; Datasets; Decomposition; Euclidean geometry; Information Systems and Communication Service; Longitudinal studies; Methodology; Original Research; Pattern Recognition and Graphics; Proust, Marcel (1871-1922); Software Engineering/Programming and Operating Systems; Stocks; Time series; Vision |
title | Clustering of Longitudinal Trajectories Using Correlation-Based Distances |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T14%3A34%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20of%20Longitudinal%20Trajectories%20Using%20Correlation-Based%20Distances&rft.jtitle=SN%20computer%20science&rft.au=Pinto%20da%20Costa,%20Joaquim%20F.&rft.date=2021-11-01&rft.volume=2&rft.issue=6&rft.spage=432&rft.pages=432-&rft.artnum=432&rft.issn=2662-995X&rft.eissn=2661-8907&rft_id=info:doi/10.1007/s42979-021-00822-2&rft_dat=%3Cproquest_cross%3E2932345015%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2932345015&rft_id=info:pmid/&rfr_iscdi=true |
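Note on the description field: the abstract states that clustering with a Mahalanobis-type distance coincides with Euclidean K-Means applied to transformed trajectories. The record does not give the transformation or the correlation structures the authors use, so the following is only a minimal sketch of the general linear-algebra identity behind such a claim, under stated assumptions: the correlation matrix is taken to be AR(1) with a hypothetical parameter `rho`, and the transformation is taken to be multiplication by the inverse Cholesky factor. All function names are illustrative and do not come from the kml package or the paper.

```python
import math

def ar1_correlation(n_times, rho):
    """Assumed AR(1) correlation matrix: R[i][j] = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]

def cholesky(R):
    """Lower-triangular L with R = L L^T, for symmetric positive-definite R."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z

def backward_solve(L, z):
    """Solve L^T w = z, where L is lower-triangular."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * w[k] for k in range(i + 1, n))
        w[i] = (z[i] - s) / L[i][i]
    return w

def mahalanobis_sq(x, y, R):
    """Squared Mahalanobis-type distance (x - y)^T R^{-1} (x - y)."""
    L = cholesky(R)
    d = [a - b for a, b in zip(x, y)]
    w = backward_solve(L, forward_solve(L, d))  # w = R^{-1} d
    return sum(di * wi for di, wi in zip(d, w))

def transform(x, L):
    """Transformed trajectory z = L^{-1} x (assumed form of the transformation)."""
    return forward_solve(L, x)

def euclidean_sq(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Two made-up trajectories observed at four time points.
R = ar1_correlation(4, 0.6)
L = cholesky(R)
x = [1.0, 2.0, 2.5, 4.0]
y = [0.5, 1.0, 3.0, 3.5]

d_mahal = mahalanobis_sq(x, y, R)
d_eucl = euclidean_sq(transform(x, L), transform(y, L))
print(d_mahal, d_eucl)  # the two values agree up to rounding error
```

Because the two distances agree for every pair of trajectories, any Euclidean K-Means implementation run on the transformed trajectories would produce the same partition as K-Means run with the Mahalanobis-type distance on the originals, which is presumably why an existing tool such as kml can be reused without modification.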