A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science
Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an acces...
Published in: | arXiv.org 2024-10 |
---|---|
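Main authors: | Bischoff, Sebastian ; Darcher, Alana ; Deistler, Michael ; Gao, Richard ; Gerken, Franziska ; Gloeckler, Manuel ; Haxel, Lisa ; Kapoor, Jaivardhan ; Lappalainen, Janne K ; Macke, Jakob H ; Moss, Guy ; Pals, Matthijs ; Pei, Felix ; Rapp, Rachel ; A Erdem Sağtekin ; Schröder, Cornelius ; Schulz, Auguste ; Stefanidi, Zinovia ; Toyota, Shoji ; Ulmer, Linda ; Vetter, Julius |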
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Bischoff, Sebastian ; Darcher, Alana ; Deistler, Michael ; Gao, Richard ; Gerken, Franziska ; Gloeckler, Manuel ; Haxel, Lisa ; Kapoor, Jaivardhan ; Lappalainen, Janne K ; Macke, Jakob H ; Moss, Guy ; Pals, Matthijs ; Pei, Felix ; Rapp, Rachel ; A Erdem Sağtekin ; Schröder, Cornelius ; Schulz, Auguste ; Stefanidi, Zinovia ; Toyota, Shoji ; Ulmer, Linda ; Vetter, Julius |
description | Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular sample-based statistical distances, requiring only foundational knowledge in mathematics and statistics. We focus on four commonly used notions of statistical distances representing different methodologies: using low-dimensional projections (Sliced-Wasserstein; SW), obtaining a distance from classifiers (Classifier Two-Sample Tests; C2ST), and using embeddings through kernels (Maximum Mean Discrepancy; MMD) or neural networks (Fréchet Inception Distance; FID). We highlight the intuition behind each distance and explain its merits, scalability, complexity, and pitfalls. To demonstrate how these distances are used in practice, we evaluate generative models from different scientific domains, namely a model of decision-making and a model generating medical images. We show that distinct distances can give different results on similar data. Through this guide, we aim to help researchers use, interpret, and evaluate statistical distances for generative models in science. |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
date | 2024-10-10 |
rights | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2969147419 |
source | Free E- Journals |
subjects | Classifiers ; Medical imaging ; Neural networks |
title | A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T23%3A24%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20Practical%20Guide%20to%20Sample-based%20Statistical%20Distances%20for%20Evaluating%20Generative%20Models%20in%20Science&rft.jtitle=arXiv.org&rft.au=Bischoff,%20Sebastian&rft.date=2024-10-10&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2969147419%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2969147419&rft_id=info:pmid/&rfr_iscdi=true |