Transaction aggregation as a strategy for credit card fraud detection

Bibliographic details
Published in:	Data mining and knowledge discovery 2009-02, Vol.18 (1), p.30-55
Main authors:	Whitrow, C., Hand, D. J., Juszczak, P., Weston, D., Adams, N. M.
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Behavior; Chemistry and Earth Sciences; Computer Science; Credit card fraud; Data Mining and Knowledge Discovery; Fraud prevention; Information Storage and Retrieval; Neural networks; Online gambling; Physics; Statistics for Engineering
Online access:	Full text

Description
Abstract:	The problem of preprocessing transaction data for supervised fraud classification is considered. It is impractical to present an entire series of transactions to a fraud detection system, partly because of the very high dimensionality of such data but also because of the heterogeneity of the transactions. Hence, a framework for transaction aggregation is considered and its effectiveness is evaluated against transaction-level detection, using a variety of classification methods and a realistic cost-based performance measure. These methods are applied in two case studies using real data. Transaction aggregation is found to be advantageous in many but not all circumstances. Also, the length of the aggregation period has a large impact upon performance. Aggregation seems particularly effective when a random forest is used for classification. Moreover, random forests were found to perform better than other classification methods, including SVMs, logistic regression and KNN. Aggregation also has the advantage of not requiring precisely labeled data and may be more robust to the effects of population drift.
ISSN:	1384-5810 1573-756X
DOI:	10.1007/s10618-008-0116-z
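
Illustration:	The abstract describes aggregating an account's transactions over a fixed period before classification, and notes that the length of that period matters. The record does not say which summary features the authors compute, so the following is only a minimal sketch under assumptions: the function name `aggregate`, the `(account, day, amount)` tuple layout, the sample data, and the three summaries (count, total, maximum amount) are all hypothetical and are not taken from the article.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate(transactions, as_of, window_days):
    """Summarise each account's transactions in the window_days ending at as_of.

    Hypothetical helper: the features below are illustrative assumptions,
    not the feature set used in the article.
    """
    start = as_of - timedelta(days=window_days)
    totals = defaultdict(lambda: {"count": 0, "total": 0.0, "max": 0.0})
    for account, day, amount in transactions:
        # Keep only transactions inside the half-open window (start, as_of].
        if start < day <= as_of:
            rec = totals[account]
            rec["count"] += 1
            rec["total"] += amount
            rec["max"] = max(rec["max"], amount)
    return dict(totals)


# Made-up sample data, for illustration only.
transactions = [
    ("A", date(2009, 1, 1), 20.0),
    ("A", date(2009, 1, 3), 35.5),
    ("A", date(2009, 1, 9), 400.0),
    ("B", date(2009, 1, 2), 12.0),
]

# Same data, two aggregation periods: each yields one fixed-length record per account.
features_3d = aggregate(transactions, as_of=date(2009, 1, 3), window_days=3)
features_7d = aggregate(transactions, as_of=date(2009, 1, 9), window_days=7)
```

Each account is reduced to one fixed-length record regardless of how many transactions it has, which is the dimensionality point the abstract raises. Changing `window_days` changes which transactions contribute, so the resulting features (and any classifier trained on them) depend on the chosen period.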