A practical utility-based but objective approach to model selection for regression in scientific applications

Bibliographic details
Published in: The Artificial intelligence review 2023-11, Vol. 56 (Suppl 2), pp. 2825-2859
Main authors: Murari, Andrea, Rossi, Riccardo, Spolladore, Luca, Lungaroni, Michele, Gaudio, Pasquale, Gelfusa, Michela
Format: Article
Language: English
Description
Abstract: In many fields of science, various types of models are available to describe phenomena, observations and the results of experiments. In recent decades, given the enormous advances in information-gathering technologies, machine learning techniques have also been systematically deployed to extract models from the large databases now available. However, regardless of their origins, no universal criterion has so far been found to select the most appropriate model given the data. A unique solution is probably a chimera, particularly in applications involving complex systems. Consequently, this work advocates a utility-based approach. The proposed solutions, however, are not purely subjective: they are all based on “objective” criteria, rooted in the properties of the data, to preserve generality and to allow comparative assessments of the results. Several methods have been developed and tested to improve the discrimination capability of basic Bayesian and information-theoretic criteria, with particular attention to the BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion) indicators. The proposed advances address both the quality of the fits and the evaluation of model complexity. The competitive advantages of the individual alternatives, for both cross-sectional data and time series, are clearly identified, together with their most appropriate fields of application. The proposed improvements to the criteria allow the right models to be selected more reliably and more efficiently in terms of data requirements, and they can be adjusted to very different circumstances and applications. Particular attention has been paid to ensuring that the developed versions of the indicators are easy to implement in practice, in both confirmatory and exploratory settings. Extensive numerical tests have been performed to support the conceptual and theoretical considerations.
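The abstract takes the basic AIC and BIC indicators as its starting point. Purely as a reference point, the following minimal sketch computes the textbook versions of both criteria for least-squares regression, assuming i.i.d. Gaussian residuals, and uses them to choose a polynomial degree. It does not implement the improved criteria proposed in the paper; the function names `gaussian_ic` and `select_polynomial_degree`, and the synthetic quadratic data in the usage block, are hypothetical illustrations.

```python
import numpy as np


def gaussian_ic(rss: float, n: int, k: int) -> tuple[float, float]:
    """Return (AIC, BIC) for a least-squares fit with k coefficients.

    Assumes i.i.d. Gaussian residuals. Additive constants shared by all
    models fitted to the same data are dropped, so only differences
    between models are meaningful.
    """
    if rss <= 0:
        raise ValueError("RSS must be positive to take its logarithm")
    fit_term = n * np.log(rss / n)   # -2 log-likelihood, up to a constant
    aic = fit_term + 2 * k           # AIC: fixed penalty of 2 per parameter
    bic = fit_term + k * np.log(n)   # BIC: penalty grows with sample size n
    return float(aic), float(bic)


def select_polynomial_degree(x, y, max_degree=5, criterion="bic"):
    """Fit polynomials of degree 0..max_degree; return the degree with the lowest score."""
    n = len(y)
    scores = {}
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coeffs, x)
        rss = float(np.sum(residuals ** 2))
        # d + 1 fitted coefficients; the noise variance would add one more
        # parameter to every model, which shifts all scores equally.
        aic, bic = gaussian_ic(rss, n, d + 1)
        scores[d] = bic if criterion == "bic" else aic
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical synthetic data: a quadratic trend plus Gaussian noise.
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 50)
    y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)
    best, scores = select_polynomial_degree(x, y)
    print("selected degree:", best)
```

Because the BIC penalty per parameter is ln(n), it exceeds the AIC penalty of 2 once n is 8 or more, which is why BIC tends to favour more parsimonious models on larger datasets.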
ISSN: 0269-2821, 1573-7462
DOI: 10.1007/s10462-023-10591-4