Clusterwise analysis for multiblock component methods

Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the informatio...

Bibliographic details
Published in: Advances in data analysis and classification 2018-06, Vol.12 (2), p.285-313
Main authors: Bougeard, Stéphanie, Abdi, Hervé, Saporta, Gilbert, Niang, Ndèye
Format: Article
Language: English
Online access: Full text
container_end_page 313
container_issue 2
container_start_page 285
container_title Advances in data analysis and classification
container_volume 12
creator Bougeard, Stéphanie
Abdi, Hervé
Saporta, Gilbert
Niang, Ndèye
description Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In the following, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables that is organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression in order to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that have their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates the interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
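The clusterwise idea mentioned in the description can be illustrated with a minimal sketch. This is not the article's multiblock PLS or multiblock redundancy analysis: ordinary least squares is swapped in as the per-cluster model, and the function name `clusterwise_regression`, its parameters, and the batch reassignment scheme are assumptions made for illustration only. The sketch alternates between fitting one regression per cluster and moving each observation to the cluster whose model predicts it best, while recording the within-cluster squared-error criterion at each pass.

```python
import numpy as np


def clusterwise_regression(X, y, n_clusters=2, n_iter=50, seed=0):
    """Illustrative clusterwise OLS (hypothetical helper, not the article's method).

    X: (n, p) predictors, y: (n,) response.
    Returns labels, per-cluster coefficients (intercept first), and the
    criterion value recorded after each reassignment pass.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])        # prepend an intercept column
    labels = rng.integers(n_clusters, size=n)    # random initial partition
    coefs = np.zeros((n_clusters, Xb.shape[1]))
    history = []

    for _ in range(n_iter):
        # Step 1: refit a least-squares model inside every non-empty cluster.
        for k in range(n_clusters):
            mask = labels == k
            if mask.any():
                coefs[k], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)

        # Step 2: squared residual of every observation under every model.
        sq_res = (y[:, None] - Xb @ coefs.T) ** 2    # shape (n, n_clusters)

        # Step 3: move each observation to its best-fitting cluster.
        new_labels = sq_res.argmin(axis=1)
        history.append(sq_res[np.arange(n), new_labels].sum())

        if np.array_equal(new_labels, labels):       # partition is stable
            break
        labels = new_labels

    return labels, coefs, history
```

Because each refit cannot increase the squared error of its own cluster and each reassignment cannot increase an observation's squared residual, the recorded criterion is non-increasing from one pass to the next (up to rounding). The result depends on the random initial partition, so several seeds would normally be tried.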
doi_str_mv 10.1007/s11634-017-0296-8
format Article
publisher Berlin/Heidelberg: Springer Berlin Heidelberg
eissn 1862-5355
rights Springer-Verlag GmbH Germany 2017
Advances in Data Analysis and Classification is a copyright of Springer, (2017). All Rights Reserved.
Distributed under a Creative Commons Attribution 4.0 International License
toplevel peer_reviewed
online_resources
oa free_for_read
tpages 29
orcid https://orcid.org/0000-0002-3406-5887
https://orcid.org/0009-0002-4729-5355
https://orcid.org/0000-0002-6109-9935
linktopdf https://link.springer.com/content/pdf/10.1007/s11634-017-0296-8
linktohtml https://link.springer.com/10.1007/s11634-017-0296-8
backlink https://cnam.hal.science/hal-02470765 (View record in HAL)
fulltext fulltext
identifier ISSN: 1862-5347
ispartof Advances in data analysis and classification, 2018-06, Vol.12 (2), p.285-313
issn 1862-5347
1862-5355
language eng
recordid cdi_hal_primary_oai_HAL_hal_02470765v1
source SpringerLink Journals - AutoHoldings
subjects Algorithms
Chemistry and Earth Sciences
Clustering
Computer Science
Computer simulation
Data Mining and Knowledge Discovery
Dependent variables
Economics
Finance
Health Sciences
Humanities
Identification methods
Insurance
Law
Management
Mathematical models
Mathematics and Statistics
Medicine
Methodology
Physics
Redundancy
Regression analysis
Regression coefficients
Regular Article
Statistical Theory and Methods
Statistics
Statistics for Business
Statistics for Engineering
Statistics for Life Sciences
Statistics for Social Sciences
title Clusterwise analysis for multiblock component methods
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T13%3A11%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clusterwise%20analysis%20for%20multiblock%20component%20methods&rft.jtitle=Advances%20in%20data%20analysis%20and%20classification&rft.au=Bougeard,%20St%C3%A9phanie&rft.date=2018-06-01&rft.volume=12&rft.issue=2&rft.spage=285&rft.epage=313&rft.pages=285-313&rft.issn=1862-5347&rft.eissn=1862-5355&rft_id=info:doi/10.1007/s11634-017-0296-8&rft_dat=%3Cproquest_hal_p%3E2071144911%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2071144911&rft_id=info:pmid/&rfr_iscdi=true