A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science

Bibliographic details
Published in: arXiv.org, 2024-10
Main authors: Bischoff, Sebastian; Darcher, Alana; Deistler, Michael; Gao, Richard; Gerken, Franziska; Gloeckler, Manuel; Haxel, Lisa; Kapoor, Jaivardhan; Lappalainen, Janne K; Macke, Jakob H; Moss, Guy; Pals, Matthijs; Pei, Felix; Rapp, Rachel; A Erdem Sağtekin; Schröder, Cornelius; Schulz, Auguste; Stefanidi, Zinovia; Toyota, Shoji; Ulmer, Linda; Vetter, Julius
Format: Article
Language: English
Subjects: Classifiers, Medical imaging, Neural networks
Online access: Full text
container_title arXiv.org
creator Bischoff, Sebastian
Darcher, Alana
Deistler, Michael
Gao, Richard
Gerken, Franziska
Gloeckler, Manuel
Haxel, Lisa
Kapoor, Jaivardhan
Lappalainen, Janne K
Macke, Jakob H
Moss, Guy
Pals, Matthijs
Pei, Felix
Rapp, Rachel
A Erdem Sağtekin
Schröder, Cornelius
Schulz, Auguste
Stefanidi, Zinovia
Toyota, Shoji
Ulmer, Linda
Vetter, Julius
description Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular sample-based statistical distances, requiring only foundational knowledge in mathematics and statistics. We focus on four commonly used notions of statistical distances representing different methodologies: using low-dimensional projections (Sliced-Wasserstein; SW), obtaining a distance using classifiers (Classifier Two-Sample Tests; C2ST), using embeddings through kernels (Maximum Mean Discrepancy; MMD), or using embeddings through neural networks (Fréchet Inception Distance; FID). We highlight the intuition behind each distance and explain their merits, scalability, complexity, and pitfalls. To demonstrate how these distances are used in practice, we evaluate generative models from different scientific domains, namely a model of decision-making and a model generating medical images. We show that distinct distances can give different results on similar data. Through this guide, we aim to help researchers use, interpret, and evaluate statistical distances for generative models in science.
format Article
publisher Cornell University Library, arXiv.org (Ithaca)
date 2024-10-10
rights 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-10
issn 2331-8422
language eng
recordid cdi_proquest_journals_2969147419
source Free E- Journals
subjects Classifiers
Medical imaging
Neural networks
title A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T23%3A24%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20Practical%20Guide%20to%20Sample-based%20Statistical%20Distances%20for%20Evaluating%20Generative%20Models%20in%20Science&rft.jtitle=arXiv.org&rft.au=Bischoff,%20Sebastian&rft.date=2024-10-10&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2969147419%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2969147419&rft_id=info:pmid/&rfr_iscdi=true
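As an illustration only, the following is a minimal pure-Python sketch of two of the four distances named in the description: the Sliced-Wasserstein distance (averaging one-dimensional Wasserstein distances over random projections) and an unbiased estimate of the squared Maximum Mean Discrepancy with a Gaussian kernel. It is not the authors' implementation. The function names, the default number of projections, the fixed kernel bandwidth, and the assumption that both sample sets have equal size (for SW) are hypothetical choices made for this sketch. C2ST and FID are omitted because they need a trained classifier or a pretrained embedding network.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def _random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    # A normalised Gaussian vector is uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def sliced_wasserstein(x: List[Vector], y: List[Vector], n_projections: int = 100,
                       p: int = 2, seed: int = 0) -> float:
    """Monte Carlo estimate of the sliced p-Wasserstein distance (hypothetical helper).

    Assumption: x and y have the same number of samples, so the optimal 1D
    transport plan simply pairs the sorted projections.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_projections):
        theta = _random_unit_vector(dim, rng)
        px = sorted(_dot(theta, xi) for xi in x)
        py = sorted(_dot(theta, yi) for yi in y)
        # p-th power of the 1D Wasserstein-p distance between equal-weight empirical measures.
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)


def _gaussian_kernel(u: Vector, v: Vector, bandwidth: float) -> float:
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd_squared(x: List[Vector], y: List[Vector], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of squared MMD with a Gaussian kernel (hypothetical helper).

    The bandwidth is a fixed assumption here; in practice it is often tuned.
    """
    m, n = len(x), len(y)
    kxx = sum(_gaussian_kernel(x[i], x[j], bandwidth)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(_gaussian_kernel(y[i], y[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(_gaussian_kernel(xi, yj, bandwidth) for xi in x for yj in y) / (m * n)
    return kxx + kyy - 2.0 * kxy


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    shifted = [[rng.gauss(3, 1), rng.gauss(0, 1)] for _ in range(200)]
    print("SW  same/shifted:", sliced_wasserstein(ref, same), sliced_wasserstein(ref, shifted))
    print("MMD same/shifted:", mmd_squared(ref, same), mmd_squared(ref, shifted))
```

For two sample sets drawn from the same distribution, both quantities should be close to zero (the unbiased MMD estimate can even be slightly negative); shifting one set increases both. The MMD value also depends on the chosen bandwidth, so the numbers from the two distances are not directly comparable.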