An investigation on the factors affecting machine learning classifications in $\gamma$-ray astronomy

Bibliographic details
Published in:	arXiv.org 2020-01
Main authors:	Luo, Shengda; Leung, Alex P; Hui, C Y; Li, K L
Format:	Article
Language:	eng
Keywords:	Active galactic nuclei; Astronomy; Classification; Classifiers; Confidence; Gamma rays; Machine learning; Millisecond pulsars; Statistical methods; Test sets; Training; Variation
Online access:	Full text

Description
Summary:	We have investigated a number of factors that can significantly affect the performance of machine learning techniques in classifying $\gamma$-ray sources detected by the Fermi Large Area Telescope (LAT). We show that a framework of automatic feature selection can construct a simple model with a small set of features that performs better than previous results. Second, because the training/test sets of certain classes of $\gamma$-ray sources are small, nested re-sampling and cross-validation are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and pulsars (PSRs) in the Fermi LAT eight-year point source catalog (4FGL) with the unidentified sources in the previous 3$^{\rm rd}$ Fermi LAT Source Catalog (3FGL). Using this cross-matched set, we show that some features used for building classification models from the identified sources can suffer from covariate shift, which can result from various observational effects. This can hamper the actual performance when such a model is applied to classify unidentified sources. Using our framework, both the AGN/PSR and the young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated to incorporate the new features and the enlarged training samples in the 4FGL catalog. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores $>98\%$ from the unidentified sources in the 4FGL catalog, which can provide inputs for a multi-wavelength identification campaign.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2001.04081
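
The summary suggests nested re-sampling and cross-validation for quantifying how much a quoted accuracy fluctuates when classes are small. The following is a minimal sketch of that idea, not the authors' implementation. It assumes synthetic two-feature toy data with hypothetical labels 0 and 1 (standing in for two source classes) and a simple threshold classifier as a stand-in for a real model. All function names are illustrative. The inner loop selects a feature using training rows only, and the outer loop scores the selected model on held-out rows, so the spread of the outer scores gives a rough estimate of the statistical fluctuation.

```python
import random
import statistics

def make_folds(rows, y, k, rng):
    """Stratified split: deal each class's shuffled rows into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def fit_stump(X, y, rows, feature):
    """Threshold classifier: midpoint between the two class means."""
    m0 = statistics.mean(X[i][feature] for i in rows if y[i] == 0)
    m1 = statistics.mean(X[i][feature] for i in rows if y[i] == 1)
    return (m0 + m1) / 2, (1 if m1 > m0 else -1)

def accuracy(X, y, rows, feature, stump):
    """Fraction of rows whose predicted label matches the true label."""
    thr, sign = stump
    hits = sum(int(sign * (X[i][feature] - thr) > 0) == y[i] for i in rows)
    return hits / len(rows)

def select_feature(X, y, rows, n_features, k, rng):
    """Inner loop: pick the feature with the best mean CV accuracy."""
    folds = make_folds(rows, y, k, rng)
    best_feature, best_score = 0, -1.0
    for f in range(n_features):
        scores = []
        for held in folds:
            train = [i for fold in folds if fold is not held for i in fold]
            scores.append(accuracy(X, y, held, f, fit_stump(X, y, train, f)))
        score = statistics.mean(scores)
        if score > best_score:
            best_feature, best_score = f, score
    return best_feature

def nested_cv(X, y, n_features, outer_k=5, inner_k=4, repeats=10, seed=0):
    """Outer loop: score on rows never seen by the feature selection."""
    rng = random.Random(seed)
    all_rows = list(range(len(y)))
    outer_scores = []
    for _ in range(repeats):  # repeated re-sampling of the outer split
        folds = make_folds(all_rows, y, outer_k, rng)
        for held in folds:
            train = [i for fold in folds if fold is not held for i in fold]
            f = select_feature(X, y, train, n_features, inner_k, rng)
            stump = fit_stump(X, y, train, f)
            outer_scores.append(accuracy(X, y, held, f, stump))
    return statistics.mean(outer_scores), statistics.stdev(outer_scores)

def make_toy_data(n_per_class=60, seed=1):
    """Synthetic data (assumption): feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y

X, y = make_toy_data()
mean_acc, spread = nested_cv(X, y, n_features=2)
print(f"accuracy = {mean_acc:.3f} +/- {spread:.3f}")
```

Keeping the feature selection inside the outer training folds is the point of the nesting: if the selection saw the held-out rows, the reported accuracy would tend to be optimistic.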
