DATA GAP MITIGATION

Disclosed embodiments provide techniques for estimating imputation algorithm performance. Multiple imputer algorithms are selected, and an evaluation of how well each of the imputer algorithms can estimate the missing data is performed. Disclosed embodiments obtain an imputer candidate dataset (ICD)...

Detailed description

Bibliographic details
Main authors: Yashchin, Emmanuel, Iyengar, Arun Kwangil, Patel, Dhavalkumar C, Bhamidipaty, Anuradha, Zhou, Nianjun, Shrivastava, Shrey
Format: Patent
Language: eng
creator Yashchin, Emmanuel
Iyengar, Arun Kwangil
Patel, Dhavalkumar C
Bhamidipaty, Anuradha
Zhou, Nianjun
Shrivastava, Shrey
description Disclosed embodiments provide techniques for estimating imputation algorithm performance. Multiple imputer algorithms are selected, and an evaluation of how well each of the imputer algorithms can estimate the missing data is performed. Disclosed embodiments obtain an imputer candidate dataset (ICD). The ICD is compared to the incomplete data range of a measurement dataset, and a similarity metric is determined between the data range and the ICD. When the similarity metric exceeds a predetermined threshold, an imputer evaluation dataset (IED) is created from the ICD by removing one or more data points from the ICD. Each imputer algorithm is evaluated by applying it to the IED and computing an imputer evaluation metric based on its performance. The imputer algorithms are then ranked by this metric, and the best-ranked imputer algorithm can be selected for use on the incomplete data range within the measurement dataset.
format Patent
language eng
recordid cdi_epo_espacenet_US2024152492A1
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title DATA GAP MITIGATION
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T02%3A01%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Yashchin,%20Emmanuel&rft.date=2024-05-09&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2024152492A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
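
The procedure summarized in the description field can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not say which similarity metric, removal strategy, evaluation metric, or imputers the embodiments use. The mean/spread similarity score, the random removal of interior points, the RMSE evaluation metric, the 0.5 threshold, the three toy imputers, and every function name below are assumptions made for illustration only.

```python
import math
import random
from statistics import mean, pstdev


def similarity(observed, icd):
    """Assumed metric in (0, 1]; equals 1 when mean and spread match exactly."""
    gap = abs(mean(observed) - mean(icd)) + abs(pstdev(observed) - pstdev(icd))
    return 1.0 / (1.0 + gap)


def make_ied(icd, n_remove, seed=0):
    """Build the IED: copy the ICD and blank out interior points (None = missing)."""
    rng = random.Random(seed)
    removed = sorted(rng.sample(range(1, len(icd) - 1), n_remove))
    ied = list(icd)
    for i in removed:
        ied[i] = None
    return ied, removed


def mean_imputer(series):
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]


def forward_fill_imputer(series):
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out


def linear_imputer(series):
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            out[i] = series[lo] + frac * (series[hi] - series[lo])
    return out


def rank_imputers(icd, imputers, n_remove=3, seed=0):
    """Score each imputer by RMSE on the removed points; lowest error first."""
    ied, removed = make_ied(icd, n_remove, seed)
    scores = {}
    for name, imputer in imputers.items():
        filled = imputer(ied)
        scores[name] = math.sqrt(mean((filled[i] - icd[i]) ** 2 for i in removed))
    return sorted(scores.items(), key=lambda kv: kv[1])


def select_imputer(observed, icd, imputers, threshold=0.5):
    """Return the best-ranked imputer name, or None if the ICD is too dissimilar."""
    if similarity(observed, icd) <= threshold:
        return None
    return rank_imputers(icd, imputers)[0][0]


IMPUTERS = {
    "mean": mean_imputer,
    "forward_fill": forward_fill_imputer,
    "linear": linear_imputer,
}

if __name__ == "__main__":
    icd = [float(i) for i in range(10)]               # complete candidate data
    observed = [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 8.0, 9.0]  # known values of the gappy range
    print(select_imputer(observed, icd, IMPUTERS))
```

Because the points removed from the ICD have known true values, each imputer's error can be measured directly, which is what makes the ranking possible before any imputer is applied to the real gap.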