An investigation on the factors affecting machine learning classifications in \(\gamma\)-ray astronomy

We have investigated a number of factors that can significantly affect the performance of machine learning classification of \(\gamma\)-ray sources detected by the Fermi Large Area Telescope (LAT). We show that a framework of automatic feature selection can construct a simple model w...


Bibliographic details
Published in: arXiv.org, 2020-01
Main authors: Luo, Shengda; Leung, Alex P; Hui, C Y; Li, K L
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Luo, Shengda
Leung, Alex P
Hui, C Y
Li, K L
description We have investigated a number of factors that can significantly affect the performance of machine learning classification of \(\gamma\)-ray sources detected by the Fermi Large Area Telescope (LAT). We show that a framework of automatic feature selection can construct a simple model with a small set of features that yields better performance than previous results. Second, because of the small sample size of the training/test sets of certain classes of \(\gamma\)-ray sources, nested re-sampling and cross-validation are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and pulsars (PSRs) in the Fermi LAT eight-year point source catalog (4FGL) with the unidentified sources in the previous 3\(^{\rm rd}\) Fermi LAT Source Catalog (3FGL). Using this cross-matched set, we show that some features used for building a classification model with the identified sources can suffer from covariate shift, which can result from various observational effects. This may hamper the actual performance when one applies such a model to classify unidentified sources. Using our framework, both the AGN/PSR and the young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated with the new features and the enlarged training samples in the 4FGL catalog. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores \(>98\%\) from the unidentified sources in the 4FGL catalog, which can provide inputs for a multi-wavelength identification campaign.
doi_str_mv 10.48550/arxiv.2001.04081
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2337685677</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2337685677</sourcerecordid><originalsourceid>FETCH-proquest_journals_23376856773</originalsourceid><addsrcrecordid>eNqNTsGKwjAUDIKgrH7A3gJe3EO7r2nT9irish-wx4I8QlIjbaJ5UfTvjeIHLAwMzMx7M4x9FpBXrZTwjeFmr7kAKHKooC0mbC7KssjaSogZWxIdAUDUjZCynDOzcdy6q6Zoe4zWO54QD5obVNEH4miMVtG6no-oDtZpPmgM7imoAYmssep1SOkP79Zdj-OI3VcW8M6RYvDOj_cFmxocSC_f_MFWP7u_7W92Cv58Se37o78El6x9GtvUraybpvxf6gGz903M</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2337685677</pqid></control><display><type>article</type><title>An investigation on the factors affecting machine learning classifications in \(\gamma\)-ray astronomy</title><source>Freely Accessible Journals</source><creator>Luo, Shengda ; Leung, Alex P ; Hui, C Y ; Li, K L</creator><creatorcontrib>Luo, Shengda ; Leung, Alex P ; Hui, C Y ; Li, K L</creatorcontrib><description>We have investigated a number of factors that can have significant impacts on the classification performance of \(\gamma\)-ray sources detected by Fermi Large Area Telescope (LAT) with machine learning techniques. We show that a framework of automatic feature selection can construct a simple model with a small set of features which yields better performance over previous results. Secondly, because of the small sample size of the training/test sets of certain classes in \(\gamma\)-ray, nested re-sampling and cross-validations are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and the pulsars (PSRs) in the Fermi LAT eight-year point source catalog (4FGL) with those unidentified sources in the previous 3\(^{\rm rd}\) Fermi LAT Source Catalog (3FGL). 
Using this cross-matched set, we show that some features used for building classification model with the identified source can suffer from the problem of covariate shift, which can be a result of various observational effects. This can possibly hamper the actual performance when one applies such model in classifying unidentified sources. Using our framework, both AGN/PSR and young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated with the new features and the enlarged training samples in 4FGL catalog incorporated. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores \(&gt;98\%\) from the unidentified sources in 4FGL catalog which can provide inputs for a multi-wavelength identification campaign.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2001.04081</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Active galactic nuclei ; Astronomy ; Classification ; Classifiers ; Confidence ; Gamma rays ; Machine learning ; Millisecond pulsars ; Statistical methods ; Test sets ; Training ; Variation</subject><ispartof>arXiv.org, 2020-01</ispartof><rights>2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784,27923</link.rule.ids></links><search><creatorcontrib>Luo, Shengda</creatorcontrib><creatorcontrib>Leung, Alex P</creatorcontrib><creatorcontrib>Hui, C Y</creatorcontrib><creatorcontrib>Li, K L</creatorcontrib><title>An investigation on the factors affecting machine learning classifications in \(\gamma\)-ray astronomy</title><title>arXiv.org</title><description>We have investigated a number of factors that can have significant impacts on the classification performance of \(\gamma\)-ray sources detected by Fermi Large Area Telescope (LAT) with machine learning techniques. We show that a framework of automatic feature selection can construct a simple model with a small set of features which yields better performance over previous results. Secondly, because of the small sample size of the training/test sets of certain classes in \(\gamma\)-ray, nested re-sampling and cross-validations are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and the pulsars (PSRs) in the Fermi LAT eight-year point source catalog (4FGL) with those unidentified sources in the previous 3\(^{\rm rd}\) Fermi LAT Source Catalog (3FGL). Using this cross-matched set, we show that some features used for building classification model with the identified source can suffer from the problem of covariate shift, which can be a result of various observational effects. This can possibly hamper the actual performance when one applies such model in classifying unidentified sources. 
Using our framework, both AGN/PSR and young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated with the new features and the enlarged training samples in 4FGL catalog incorporated. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores \(&gt;98\%\) from the unidentified sources in 4FGL catalog which can provide inputs for a multi-wavelength identification campaign.</description><subject>Active galactic nuclei</subject><subject>Astronomy</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Confidence</subject><subject>Gamma rays</subject><subject>Machine learning</subject><subject>Millisecond pulsars</subject><subject>Statistical methods</subject><subject>Test sets</subject><subject>Training</subject><subject>Variation</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNTsGKwjAUDIKgrH7A3gJe3EO7r2nT9irish-wx4I8QlIjbaJ5UfTvjeIHLAwMzMx7M4x9FpBXrZTwjeFmr7kAKHKooC0mbC7KssjaSogZWxIdAUDUjZCynDOzcdy6q6Zoe4zWO54QD5obVNEH4miMVtG6no-oDtZpPmgM7imoAYmssep1SOkP79Zdj-OI3VcW8M6RYvDOj_cFmxocSC_f_MFWP7u_7W92Cv58Se37o78El6x9GtvUraybpvxf6gGz903M</recordid><startdate>20200115</startdate><enddate>20200115</enddate><creator>Luo, Shengda</creator><creator>Leung, Alex P</creator><creator>Hui, C Y</creator><creator>Li, K L</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20200115</creationdate><title>An investigation on the factors affecting machine learning classifications in \(\gamma\)-ray astronomy</title><author>Luo, Shengda ; Leung, Alex P ; Hui, C Y ; Li, K L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_23376856773</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Active galactic nuclei</topic><topic>Astronomy</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Confidence</topic><topic>Gamma rays</topic><topic>Machine learning</topic><topic>Millisecond pulsars</topic><topic>Statistical methods</topic><topic>Test sets</topic><topic>Training</topic><topic>Variation</topic><toplevel>online_resources</toplevel><creatorcontrib>Luo, Shengda</creatorcontrib><creatorcontrib>Leung, Alex P</creatorcontrib><creatorcontrib>Hui, C Y</creatorcontrib><creatorcontrib>Li, K L</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest 
Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Luo, Shengda</au><au>Leung, Alex P</au><au>Hui, C Y</au><au>Li, K L</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>An investigation on the factors affecting machine learning classifications in \(\gamma\)-ray astronomy</atitle><jtitle>arXiv.org</jtitle><date>2020-01-15</date><risdate>2020</risdate><eissn>2331-8422</eissn><abstract>We have investigated a number of factors that can have significant impacts on the classification performance of \(\gamma\)-ray sources detected by Fermi Large Area Telescope (LAT) with machine learning techniques. We show that a framework of automatic feature selection can construct a simple model with a small set of features which yields better performance over previous results. Secondly, because of the small sample size of the training/test sets of certain classes in \(\gamma\)-ray, nested re-sampling and cross-validations are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and the pulsars (PSRs) in the Fermi LAT eight-year point source catalog (4FGL) with those unidentified sources in the previous 3\(^{\rm rd}\) Fermi LAT Source Catalog (3FGL). 
Using this cross-matched set, we show that some features used for building classification model with the identified source can suffer from the problem of covariate shift, which can be a result of various observational effects. This can possibly hamper the actual performance when one applies such model in classifying unidentified sources. Using our framework, both AGN/PSR and young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated with the new features and the enlarged training samples in 4FGL catalog incorporated. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores \(&gt;98\%\) from the unidentified sources in 4FGL catalog which can provide inputs for a multi-wavelength identification campaign.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2001.04081</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2020-01
issn 2331-8422
language eng
recordid cdi_proquest_journals_2337685677
source Freely Accessible Journals
subjects Active galactic nuclei
Astronomy
Classification
Classifiers
Confidence
Gamma rays
Machine learning
Millisecond pulsars
Statistical methods
Test sets
Training
Variation
title An investigation on the factors affecting machine learning classifications in \(\gamma\)-ray astronomy
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A13%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=An%20investigation%20on%20the%20factors%20affecting%20machine%20learning%20classifications%20in%20%5C(%5Cgamma%5C)-ray%20astronomy&rft.jtitle=arXiv.org&rft.au=Luo,%20Shengda&rft.date=2020-01-15&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2001.04081&rft_dat=%3Cproquest%3E2337685677%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2337685677&rft_id=info:pmid/&rfr_iscdi=true