A practical utility-based but objective approach to model selection for regression in scientific applications
In many fields of science, various types of models are available to describe phenomena, observations and the results of experiments. In the last decades, given the enormous advances of information gathering technologies, also machine learning techniques have been systematically deployed to extract m...
Gespeichert in:
Veröffentlicht in: | The Artificial intelligence review 2023-11, Vol.56 (Suppl 2), p.2825-2859 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In many fields of science, various types of models are available to describe phenomena, observations and the results of experiments. In the last decades, given the enormous advances of information gathering technologies, also machine learning techniques have been systematically deployed to extract models from the large available databases. However, regardless of their origins, no universal criterion has been found so far to select the most appropriate model given the data. A unique solution is probably a chimera, particularly in applications involving complex systems. Consequently, in this work a utility-based approach is advocated. However, the solutions proposed are not purely subjective but all based on “objective” criteria, rooted in the properties of the data, to preserve generality and to allow comparative assessments of the results. Several methods have been developed and tested, to improve the discrimination capability of basic Bayesian and information theoretic criteria, with particular attention to the BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion) indicators. Both the quality of the fits and the evaluation of model complexity are aspects addressed by the advances proposed. The competitive advantages of the individual alternatives, for both cross sectional data and time series, are clearly identified, together with their most appropriate fields of application. The proposed improvements of the criteria allow selecting the right models more reliably, more efficiently in terms of data requirements and can be adjusted to very different circumstances and applications. Particular attention has been paid to ensure that the developed versions of the indicators are easy to implement in practice, in both confirmatory and exploratory settings. Extensive numerical tests have been performed to support the conceptual and theoretical considerations. |
---|---|
ISSN: | 0269-2821 1573-7462 |
DOI: | 10.1007/s10462-023-10591-4 |