Skeleton estimation of directed acyclic graphs using partial least squares from correlated data


Bibliographic details
Published in: Pattern Recognition, 2023-07, Vol. 139, p. 109460, Article 109460
Main authors: Wang, Xiaokang; Lu, Shan; Zhou, Rui; Wang, Huiwen
Format: Article
Language: English
Description
Summary:
• We propose a two-stage approach for directed acyclic graph (DAG) skeleton estimation with highly correlated variables.
• The neighborhood selection stage relies on a sparse adaptive partial least squares (PLS) regression combined with a novel cluster-weighted adaptive penalty on the PLS weight vectors.
• The proposed algorithm is most competitive on the dense hub network structure with multiple clusters.

Directed acyclic graphs (DAGs) are directed graphical models that are well known for discovering causal relationships between variables in a high-dimensional setting. When the DAG is not identifiable due to the lack of interventional data, its skeleton, which is formed by removing the directions of the edges in the DAG, can still be estimated from observational data. In real data analyses, variables are often highly correlated due to some form of clustered sampling, and ignoring this correlation will inflate the standard errors of the parameter estimates in the regression-based DAG structure learning framework. In this work, we propose a two-stage DAG skeleton estimation approach for highly correlated data. In the first stage, we propose a novel neighborhood selection method based on sparse partial least squares (PLS) regression, in which a cluster-weighted adaptive penalty is imposed on the PLS weight vectors to exploit the local information. In the second stage, the DAG skeleton is estimated by evaluating a set of conditional independence hypotheses. Simulation studies are presented to demonstrate the effectiveness of the proposed method. The algorithm is also tested on publicly available datasets, and we show that it obtains higher sensitivity with comparable false discovery rates for high-dimensional data under different network structures.
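As a small illustration of the skeleton notion used in the summary, the following sketch drops edge directions from a DAG given as (parent, child) pairs. The function name `skeleton` and the toy graphs are hypothetical and are not taken from the article; this shows only the definition of a skeleton, not the authors' estimation method.

```python
def skeleton(directed_edges):
    """Return the undirected skeleton of a DAG.

    `directed_edges` is an iterable of (parent, child) pairs.
    Each edge is stored as a frozenset, so its direction is discarded.
    """
    return {frozenset(edge) for edge in directed_edges}


# Two hypothetical DAGs over the same variables (illustrative only):
# a chain X -> Z -> Y and a collider X -> Z <- Y.
chain = [("X", "Z"), ("Z", "Y")]
collider = [("X", "Z"), ("Y", "Z")]

# The edge directions differ, but the skeletons coincide.
same_skeleton = skeleton(chain) == skeleton(collider)
```

Here `same_skeleton` evaluates to `True`: both graphs share the undirected edges X–Z and Z–Y, which is why the skeleton is a natural estimation target when edge directions cannot be determined.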
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2023.109460