Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients
Discovering causal effects is at the core of scientific investigation but remains challenging when only observational data is available. In practice, causal networks are difficult to learn and interpret, and limited to relatively small datasets. We report a more reliable and scalable causal discover...
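Main authors: | Ribeiro-Dantas, Marcel da Câmara; Li, Honghao; Cabeli, Vincent; Dupuis, Louise; Simon, Franck; Hettal, Liza; Hamy, Anne-Sophie; Isambert, Hervé |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |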
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Ribeiro-Dantas, Marcel da Câmara ; Li, Honghao ; Cabeli, Vincent ; Dupuis, Louise ; Simon, Franck ; Hettal, Liza ; Hamy, Anne-Sophie ; Isambert, Hervé |
description | Discovering causal effects is at the core of scientific investigation but
remains challenging when only observational data is available. In practice,
causal networks are difficult to learn and interpret, and limited to relatively
small datasets. We report a more reliable and scalable causal discovery method
(iMIIC), based on a general mutual information supremum principle, which
greatly improves the precision of inferred causal relations while
distinguishing genuine causes from putative and latent causal effects. We
showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast
cancer patients from the US Surveillance, Epidemiology, and End Results
program. More than 90% of predicted causal effects appear correct, while the
remaining unexpected direct and indirect causal effects can be interpreted in
terms of diagnostic procedures, therapeutic timing, patient preference or
socio-economic disparity. iMIIC's unique capabilities open up new avenues to
discover reliable and interpretable causal networks across a range of research
fields. |
doi_str_mv | 10.48550/arxiv.2303.06423 |
format | Article |
fullrecord | <record><control><sourceid>hal_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2303_06423</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>oai_HAL_hal_04047794v1</sourcerecordid><originalsourceid>FETCH-LOGICAL-a1013-484876cd043a98aec9e71f9d288690fd8aa9d32865ee29153408ed61139a7b573</originalsourceid><addsrcrecordid>eNo9kLFOwzAURbMwoMIHMPFWpLY8x05ij1UFFCkSC8zRS_xSItwksk2hI39O2iKmK12de4eTJDcCl0pnGd6T_-72y1SiXGKuUnmZ_JRMvu_6LXR9ZD96jlQ7hoY-AznoOX4N_iNA64cd7NkfwJHfMliKFDiGOdA4uq6h2A09xAEU4hwRYcd2ah14bgZvAwwt1J4pxOm6b9jDOE24j-EquWjJBb7-y1ny9vjwut4sypen5_WqXJBAIRdKK13kjUUlyWjixnAhWmNTrXODrdVExspU5xlzakQmFWq2uRDSUFFnhZwld-ffd3LV6Lsd-UM1UFdtVmV17FChKgqj9mJib8_sSdg_fRRXncTJX7MJaHk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients</title><source>arXiv.org</source><creator>Ribeiro-Dantas, Marcel da Câmara ; Li, Honghao ; Cabeli, Vincent ; Dupuis, Louise ; Simon, Franck ; Hettal, Liza ; Hamy, Anne-Sophie ; Isambert, Hervé</creator><creatorcontrib>Ribeiro-Dantas, Marcel da Câmara ; Li, Honghao ; Cabeli, Vincent ; Dupuis, Louise ; Simon, Franck ; Hettal, Liza ; Hamy, Anne-Sophie ; Isambert, Hervé</creatorcontrib><description>Discovering causal effects is at the core of scientific investigation but
remains challenging when only observational data is available. In practice,
causal networks are difficult to learn and interpret, and limited to relatively
small datasets. We report a more reliable and scalable causal discovery method
(iMIIC), based on a general mutual information supremum principle, which
greatly improves the precision of inferred causal relations while
distinguishing genuine causes from putative and latent causal effects. We
showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast
cancer patients from the US Surveillance, Epidemiology, and End Results
program. More than 90% of predicted causal effects appear correct, while the
remaining unexpected direct and indirect causal effects can be interpreted in
terms of diagnostic procedures, therapeutic timing, patient preference or
socio-economic disparity. iMIIC's unique capabilities open up new avenues to
discover reliable and interpretable causal networks across a range of research
fields.</description><identifier>DOI: 10.48550/arxiv.2303.06423</identifier><language>eng</language><subject>Computer Science - Learning ; Life Sciences ; Physics - Data Analysis, Statistics and Probability ; Quantitative Biology - Molecular Networks ; Quantitative Biology - Quantitative Methods ; Statistics - Methodology</subject><creationdate>2023-03</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-9638-8545</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2303.06423$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2303.06423$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://hal.science/hal-04047794$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Ribeiro-Dantas, Marcel da Câmara</creatorcontrib><creatorcontrib>Li, Honghao</creatorcontrib><creatorcontrib>Cabeli, Vincent</creatorcontrib><creatorcontrib>Dupuis, Louise</creatorcontrib><creatorcontrib>Simon, Franck</creatorcontrib><creatorcontrib>Hettal, Liza</creatorcontrib><creatorcontrib>Hamy, Anne-Sophie</creatorcontrib><creatorcontrib>Isambert, Hervé</creatorcontrib><title>Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients</title><description>Discovering causal effects is at the core of scientific investigation but
remains challenging when only observational data is available. In practice,
causal networks are difficult to learn and interpret, and limited to relatively
small datasets. We report a more reliable and scalable causal discovery method
(iMIIC), based on a general mutual information supremum principle, which
greatly improves the precision of inferred causal relations while
distinguishing genuine causes from putative and latent causal effects. We
showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast
cancer patients from the US Surveillance, Epidemiology, and End Results
program. More than 90% of predicted causal effects appear correct, while the
remaining unexpected direct and indirect causal effects can be interpreted in
terms of diagnostic procedures, therapeutic timing, patient preference or
socio-economic disparity. iMIIC's unique capabilities open up new avenues to
discover reliable and interpretable causal networks across a range of research
fields.</description><subject>Computer Science - Learning</subject><subject>Life Sciences</subject><subject>Physics - Data Analysis, Statistics and Probability</subject><subject>Quantitative Biology - Molecular Networks</subject><subject>Quantitative Biology - Quantitative Methods</subject><subject>Statistics - Methodology</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNo9kLFOwzAURbMwoMIHMPFWpLY8x05ij1UFFCkSC8zRS_xSItwksk2hI39O2iKmK12de4eTJDcCl0pnGd6T_-72y1SiXGKuUnmZ_JRMvu_6LXR9ZD96jlQ7hoY-AznoOX4N_iNA64cd7NkfwJHfMliKFDiGOdA4uq6h2A09xAEU4hwRYcd2ah14bgZvAwwt1J4pxOm6b9jDOE24j-EquWjJBb7-y1ny9vjwut4sypen5_WqXJBAIRdKK13kjUUlyWjixnAhWmNTrXODrdVExspU5xlzakQmFWq2uRDSUFFnhZwld-ffd3LV6Lsd-UM1UFdtVmV17FChKgqj9mJib8_sSdg_fRRXncTJX7MJaHk</recordid><startdate>20230311</startdate><enddate>20230311</enddate><creator>Ribeiro-Dantas, Marcel da Câmara</creator><creator>Li, Honghao</creator><creator>Cabeli, Vincent</creator><creator>Dupuis, Louise</creator><creator>Simon, Franck</creator><creator>Hettal, Liza</creator><creator>Hamy, Anne-Sophie</creator><creator>Isambert, Hervé</creator><scope>AKY</scope><scope>ALC</scope><scope>EPD</scope><scope>GOX</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0001-9638-8545</orcidid></search><sort><creationdate>20230311</creationdate><title>Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients</title><author>Ribeiro-Dantas, Marcel da Câmara ; Li, Honghao ; Cabeli, Vincent ; Dupuis, Louise ; Simon, Franck ; Hettal, Liza ; Hamy, Anne-Sophie ; Isambert, Hervé</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a1013-484876cd043a98aec9e71f9d288690fd8aa9d32865ee29153408ed61139a7b573</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer 
Science - Learning</topic><topic>Life Sciences</topic><topic>Physics - Data Analysis, Statistics and Probability</topic><topic>Quantitative Biology - Molecular Networks</topic><topic>Quantitative Biology - Quantitative Methods</topic><topic>Statistics - Methodology</topic><toplevel>online_resources</toplevel><creatorcontrib>Ribeiro-Dantas, Marcel da Câmara</creatorcontrib><creatorcontrib>Li, Honghao</creatorcontrib><creatorcontrib>Cabeli, Vincent</creatorcontrib><creatorcontrib>Dupuis, Louise</creatorcontrib><creatorcontrib>Simon, Franck</creatorcontrib><creatorcontrib>Hettal, Liza</creatorcontrib><creatorcontrib>Hamy, Anne-Sophie</creatorcontrib><creatorcontrib>Isambert, Hervé</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Quantitative Biology</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection><collection>Hyper Article en Ligne (HAL)</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ribeiro-Dantas, Marcel da Câmara</au><au>Li, Honghao</au><au>Cabeli, Vincent</au><au>Dupuis, Louise</au><au>Simon, Franck</au><au>Hettal, Liza</au><au>Hamy, Anne-Sophie</au><au>Isambert, Hervé</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients</atitle><date>2023-03-11</date><risdate>2023</risdate><abstract>Discovering causal effects is at the core of scientific investigation but
remains challenging when only observational data is available. In practice,
causal networks are difficult to learn and interpret, and limited to relatively
small datasets. We report a more reliable and scalable causal discovery method
(iMIIC), based on a general mutual information supremum principle, which
greatly improves the precision of inferred causal relations while
distinguishing genuine causes from putative and latent causal effects. We
showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast
cancer patients from the US Surveillance, Epidemiology, and End Results
program. More than 90% of predicted causal effects appear correct, while the
remaining unexpected direct and indirect causal effects can be interpreted in
terms of diagnostic procedures, therapeutic timing, patient preference or
socio-economic disparity. iMIIC's unique capabilities open up new avenues to
discover reliable and interpretable causal networks across a range of research
fields.</abstract><doi>10.48550/arxiv.2303.06423</doi><orcidid>https://orcid.org/0000-0001-9638-8545</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2303.06423 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2303_06423 |
source | arXiv.org |
subjects | Computer Science - Learning ; Life Sciences ; Physics - Data Analysis, Statistics and Probability ; Quantitative Biology - Molecular Networks ; Quantitative Biology - Quantitative Methods ; Statistics - Methodology |
title | Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T13%3A58%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20interpretable%20causal%20networks%20from%20very%20large%20datasets,%20application%20to%20400,000%20medical%20records%20of%20breast%20cancer%20patients&rft.au=Ribeiro-Dantas,%20Marcel%20da%20C%C3%A2mara&rft.date=2023-03-11&rft_id=info:doi/10.48550/arxiv.2303.06423&rft_dat=%3Chal_GOX%3Eoai_HAL_hal_04047794v1%3C/hal_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |