Accurately Classifying Out-Of-Distribution Data in Facial Recognition


Bibliographic Details
Published in: arXiv.org, 2024-10
Main authors: Barone, Gianluca; Cunchala, Aashrit; Nunez, Rudy
Format: Article
Language: English
Online access: Full text
Description: Standard classification theory assumes that the distribution of images in the test and training sets is identical. Unfortunately, real-life scenarios typically feature unseen data ("out-of-distribution" data) that differs from the data in the training distribution ("in-distribution" data). This issue is most prevalent in social justice problems, where data from under-represented groups may appear in the test data without making up an equal proportion of the training data. This can result in a model returning confidently wrong decisions and predictions. We are interested in the following question: can the performance of a neural network on facial images of out-of-distribution data improve when it is trained simultaneously on multiple datasets of in-distribution data? We approach this problem by incorporating the Outlier Exposure model and investigating how the model's performance changes when other datasets of facial images are incorporated. We observe that the accuracy and other metrics of the model can be increased by applying Outlier Exposure, by incorporating a trainable weight parameter that increases the model's emphasis on outlier images, and by re-weighting the importance of different class labels. We also experimented with whether sorting the images and determining outliers via image features has a greater effect on the metrics than sorting by average pixel value, and found no conclusive results. Our goal was to make models not only more accurate but also fairer by scanning a wider range of images. Using Python and the PyTorch package, we found that models employing Outlier Exposure can produce fairer classification.
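The abstract describes three levers: the Outlier Exposure auxiliary loss, a trainable weight that emphasizes outliers, and per-class re-weighting. The sketch below is a minimal PyTorch illustration of how they can compose into one objective, assuming the standard Outlier Exposure formulation (cross-entropy against the uniform distribution on outlier batches); every name, the toy model, and the weighting scheme are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch, not the authors' code: class-weighted cross-entropy on
# in-distribution faces plus an Outlier Exposure term (Hendrycks et al.,
# 2019) that pushes outlier predictions toward the uniform distribution.
import torch
import torch.nn.functional as F

def oe_objective(logits_in, labels_in, logits_out, class_weights, s):
    # Per-class weights let under-represented classes count more in the
    # loss: the "re-weighting" of class labels the abstract mentions.
    loss_in = F.cross_entropy(logits_in, labels_in, weight=class_weights)

    # Cross-entropy between predictions on outliers and the uniform
    # distribution reduces to the mean negative log-softmax.
    loss_out = -F.log_softmax(logits_out, dim=1).mean()

    # A raw trainable weight would be driven to zero by the optimizer, so
    # one plausible scheme (borrowed from Kendall et al.'s 2018 uncertainty
    # weighting, an assumption here) is exp(-s) * loss + s.
    return loss_in + torch.exp(-s) * loss_out + s

# Toy wiring: `s` is optimized jointly with the network's parameters.
model = torch.nn.Linear(128, 10)             # stand-in for a face classifier
s = torch.nn.Parameter(torch.zeros(()))      # trainable outlier emphasis
opt = torch.optim.SGD(list(model.parameters()) + [s], lr=1e-3)

x_in, y_in = torch.randn(32, 128), torch.randint(0, 10, (32,))
x_out = torch.randn(32, 128)                 # batch of outlier images
loss = oe_objective(model(x_in), y_in, model(x_out), torch.ones(10), s)
loss.backward()
opt.step()
```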
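The abstract also compares determining outliers from extracted image features against simply sorting by average pixel value. Below is a hypothetical sketch of the pixel-value variant; the two-tailed split and the outlier fraction are assumptions made purely for illustration.

```python
# Hypothetical split, assuming "average pixel value" means per-image mean
# intensity: rank the images and treat both extremes as outliers.
import torch

def split_by_mean_pixel(images: torch.Tensor, outlier_frac: float = 0.1):
    """images: (N, C, H, W); returns (in_distribution, outliers)."""
    means = images.flatten(1).mean(dim=1)    # per-image average pixel value
    order = torch.argsort(means)             # ascending by brightness
    n_out = int(outlier_frac * len(images))
    lo, hi = n_out // 2, n_out - n_out // 2  # darkest / brightest tails
    out_idx = torch.cat([order[:lo], order[len(images) - hi:]])
    keep = torch.ones(len(images), dtype=torch.bool)
    keep[out_idx] = False
    return images[keep], images[out_idx]
```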
DOI: 10.48550/arxiv.2404.03876
EISSN: 2331-8422
Source: arXiv.org; Free E-Journals
Subjects: Classification; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Computers and Society; Computer Science - Learning; Datasets; Face recognition; Model accuracy; Neural networks; Outliers (statistics)
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T01%3A24%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accurately%20Classifying%20Out-Of-Distribution%20Data%20in%20Facial%20Recognition&rft.jtitle=arXiv.org&rft.au=Barone,%20Gianluca&rft.date=2024-10-11&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2404.03876&rft_dat=%3Cproquest_arxiv%3E3034544804%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3034544804&rft_id=info:pmid/&rfr_iscdi=true