Cost-sensitive selection of variables by ensemble of model sequences

Bibliographic details
Published in:	Knowledge and Information Systems, 2021-05, Vol. 63 (5), pp. 1069-1092
Authors:	Yan, Donghui; Qin, Zhiwei; Gu, Songxiang; Xu, Haiping; Shao, Ming
Format:	Article
Language:	English
Subjects:	Accuracy; Budgets; Computer Science; Data collection; Data Mining and Knowledge Discovery; Database Management; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Model accuracy; Performance measurement; Regular Paper; Schedules; Solution space

Description
Abstract:	Many applications require the collection of data on different variables or measurements over many system performance metrics. We refer to these broadly as measures or variables. Data collection along each measure often incurs a cost, so it is desirable to account for the cost of measures in modeling. This is a fairly new class of problems in the area of cost-sensitive learning. A few attempts have been made to incorporate costs when combining and selecting measures. However, existing studies either do not strictly enforce a budget constraint or are not the ‘most’ cost-effective. Focusing on classification problems, we propose a computationally efficient approach that could find a near-optimal model under a given budget by exploring the most ‘promising’ part of the solution space. Instead of outputting a single model, we produce a model schedule: a list of models sorted by model cost and expected predictive accuracy. The schedule can be used to choose the model with the best predictive accuracy under a given budget, or to trade off between budget and predictive accuracy. Experiments on some benchmark datasets show that our approach compares favorably with competing methods.
ISSN:	0219-1377; 0219-3116
DOI:	10.1007/s10115-021-01551-x
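The abstract describes the output of the method as a model schedule: models sorted by cost and expected predictive accuracy, from which one picks the most accurate model that fits a budget. The record does not say how the authors search the solution space to build that schedule, so the sketch below does not attempt it. It only illustrates, under stated assumptions, how an already-computed list of candidate models might be reduced to a cost/accuracy trade-off list and then queried under a budget. The names `ScheduledModel`, `prune_schedule` and `best_under_budget`, the assumption that a model's cost is the total cost of its measures, and all numbers in the usage example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScheduledModel:
    """One entry of a hypothetical model schedule (illustrative, not the paper's API)."""
    features: tuple   # measures (variables) the model uses
    cost: float       # assumed: total cost of collecting those measures
    accuracy: float   # expected predictive accuracy, e.g. estimated by cross-validation


def prune_schedule(models: List[ScheduledModel]) -> List[ScheduledModel]:
    """Sort by cost and keep only models more accurate than every cheaper model kept so far."""
    schedule: List[ScheduledModel] = []
    best_so_far = float("-inf")
    # On equal cost, visit the more accurate model first so the weaker one is dropped.
    for m in sorted(models, key=lambda m: (m.cost, -m.accuracy)):
        if m.accuracy > best_so_far:
            schedule.append(m)
            best_so_far = m.accuracy
    return schedule


def best_under_budget(schedule: List[ScheduledModel], budget: float) -> Optional[ScheduledModel]:
    """Return the most accurate model whose cost does not exceed the budget, or None."""
    affordable = [m for m in schedule if m.cost <= budget]
    return max(affordable, key=lambda m: m.accuracy, default=None)


# Made-up candidates for illustration only; no values come from the paper.
candidates = [
    ScheduledModel(("x1",), 1.0, 0.70),
    ScheduledModel(("x2",), 2.0, 0.65),  # dominated: costs more than ("x1",) yet is less accurate
    ScheduledModel(("x1", "x2"), 3.0, 0.78),
    ScheduledModel(("x1", "x3"), 4.5, 0.83),
]

schedule = prune_schedule(candidates)
choice = best_under_budget(schedule, budget=4.0)
print(choice.features, choice.cost, choice.accuracy)  # ('x1', 'x2') 3.0 0.78
```

Because the pruned list is sorted by cost and strictly increasing in accuracy, reading it from left to right shows directly what each extra unit of budget buys, which is the budget-versus-accuracy trade-off the abstract mentions. `best_under_budget` uses `max` rather than simply taking the last affordable entry so that it also behaves sensibly on an unpruned list.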