Incorporating Arbitrary Matrix Group Equivariance into KANs
Format: Article
Language: English
Abstract: Kolmogorov-Arnold Networks (KANs) have seen great success in scientific
domains thanks to spline activation functions, becoming an alternative to
Multi-Layer Perceptrons (MLPs). However, spline functions may not respect the
symmetry of a given task, which is crucial prior knowledge in machine learning.
Previously, equivariant networks have embedded symmetry into their architectures,
achieving better performance in specific applications. Among these, Equivariant
Multi-Layer Perceptrons (EMLP) introduce arbitrary matrix group equivariance
into MLPs, providing a general framework for constructing equivariant networks
layer by layer. In this paper, we propose Equivariant Kolmogorov-Arnold
Networks (EKAN), a method for incorporating matrix group equivariance into
KANs, aiming to broaden their applicability to more fields. First, we construct
gated spline basis functions, which form the EKAN layer together with
equivariant linear weights. We then define a lift layer to align the input
space of EKAN with the feature space of the dataset, thereby building the
entire EKAN architecture. Compared with baseline models, EKAN achieves higher
accuracy with smaller datasets or fewer parameters on symmetry-related tasks,
such as particle scattering and the three-body problem, often reducing test MSE
by several orders of magnitude. Even in non-symbolic formula scenarios, such as
top quark tagging with three jet constituents, EKAN achieves results comparable
to EMLP using only $26\%$ of the parameters, while KANs do not outperform
MLPs as expected.
DOI: 10.48550/arxiv.2410.00435
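The abstract references EMLP's general framework of equivariant linear weights: a weight matrix $W$ between layers is equivariant when $\rho_{\text{out}}(g)\,W = W\,\rho_{\text{in}}(g)$ for every group element $g$. The numpy sketch below is an illustration of that standard null-space construction under stated assumptions, not code from the paper; the toy group, layer sizes, and helper name are hypothetical choices.

```python
import numpy as np

# Hedged sketch of EMLP-style equivariant linear weights: W is equivariant
# iff rho_out(g) @ W == W @ rho_in(g) for each group generator g. Using the
# column-major identity vec(A X B) = (B^T kron A) vec(X), the constraint
# becomes a linear system whose null space spans all equivariant W.

def equivariant_weight_basis(reps_in, reps_out, tol=1e-8):
    """Basis of weights W solving rho_out(g) W = W rho_in(g) for all g."""
    d_in, d_out = reps_in[0].shape[0], reps_out[0].shape[0]
    rows = []
    for r_in, r_out in zip(reps_in, reps_out):
        # vec(rho_out W) - vec(W rho_in) = (I kron rho_out - rho_in^T kron I) vec(W)
        rows.append(np.kron(np.eye(d_in), r_out) - np.kron(r_in.T, np.eye(d_out)))
    constraints = np.vstack(rows)
    _, s, vt = np.linalg.svd(constraints)
    rank = int(np.sum(s > tol))
    # The remaining right-singular vectors span the null space; each is a vec(W).
    return [v.reshape(d_out, d_in, order="F") for v in vt[rank:]]

# Toy group: 90-degree planar rotation acting on both input and output (d = 2).
g = np.array([[0.0, -1.0],
              [1.0,  0.0]])
basis = equivariant_weight_basis([g], [g])  # commutant of g: span{I, g}

# Any linear combination of the basis commutes with the group action.
W = sum(np.random.randn() * B for B in basis)
x = np.random.randn(2)
print(np.allclose(W @ (g @ x), g @ (W @ x)))  # True: W is equivariant
```

EMLP applies this construction layer by layer for the generators of an arbitrary matrix group; according to the abstract, an EKAN layer pairs such equivariant weights with gated spline basis functions (defined in the paper), and a lift layer maps the dataset's feature space into EKAN's input space.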