System and Process for Record Duplication Analysis

A system and process for record duplication analysis that relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: DECARO MICHAEL, NAEYMI-RAD FRANK, CHARLOT REGIS, HAINES DAVID, CARDWELL MATTHEW C
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator DECARO MICHAEL
NAEYMI-RAD FRANK
CHARLOT REGIS
HAINES DAVID
CARDWELL MATTHEW C
description A system and process for record duplication analysis that relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data set. In addition, a system and process for record duplication analysis may rely on the predetermination of probabilistic patterns, where the system only searches for patterns exceeding a chosen threshold. Work flow may include selecting which fields within each record should be analyzed, normalizing the values within those fields and removing default data, calculating possible patterns and their match probabilities, analyzing record pairs to determine which have patterns exceeding a chosen threshold to determine the presence of duplicates, and merging duplicates, closing transactions reflecting non-duplicates, identifying records having insufficient data to determine the existence or lack of a match, and/or rolling back accidental merges.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2011004626A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2011004626A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2011004626A13</originalsourceid><addsrcrecordid>eNrjZDAKriwuSc1VSMxLUQgoyk9OLS5WSMsvUghKTc4vSlFwKS3IyUxOLMnMz1NwzEvMqSzOLOZhYE1LzClO5YXS3AzKbq4hzh66qQX58anFBYnJqXmpJfGhwUYGhoYGBiZmRmaOhsbEqQIAWvkstg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>System and Process for Record Duplication Analysis</title><source>esp@cenet</source><creator>DECARO MICHAEL ; NAEYMI-RAD FRANK ; CHARLOT REGIS ; HAINES DAVID ; CARDWELL MATTHEW C</creator><creatorcontrib>DECARO MICHAEL ; NAEYMI-RAD FRANK ; CHARLOT REGIS ; HAINES DAVID ; CARDWELL MATTHEW C</creatorcontrib><description>A system and process for record duplication analysis that relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data set. In addition, a system and process for record duplication analysis may rely on the predetermination of probabilistic patterns, where the system only searches for patterns exceeding a chosen threshold. Work flow may include selecting which fields within each record should be analyzed, normalizing the values within those fields and removing default data, calculating possible patterns and their match probabilities, analyzing record pairs to determine which have patterns exceeding a chosen threshold to determine the presence of duplicates, and merging duplicates, closing transactions reflecting non-duplicates, identifying records having insufficient data to determine the existence or lack of a match, and/or rolling back accidental merges.</description><language>eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS</subject><creationdate>2011</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20110106&amp;DB=EPODOC&amp;CC=US&amp;NR=2011004626A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76290</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20110106&amp;DB=EPODOC&amp;CC=US&amp;NR=2011004626A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>DECARO MICHAEL</creatorcontrib><creatorcontrib>NAEYMI-RAD FRANK</creatorcontrib><creatorcontrib>CHARLOT REGIS</creatorcontrib><creatorcontrib>HAINES DAVID</creatorcontrib><creatorcontrib>CARDWELL MATTHEW C</creatorcontrib><title>System and Process for Record Duplication Analysis</title><description>A system and process for record duplication analysis that relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data set. In addition, a system and process for record duplication analysis may rely on the predetermination of probabilistic patterns, where the system only searches for patterns exceeding a chosen threshold. Work flow may include selecting which fields within each record should be analyzed, normalizing the values within those fields and removing default data, calculating possible patterns and their match probabilities, analyzing record pairs to determine which have patterns exceeding a chosen threshold to determine the presence of duplicates, and merging duplicates, closing transactions reflecting non-duplicates, identifying records having insufficient data to determine the existence or lack of a match, and/or rolling back accidental merges.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2011</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZDAKriwuSc1VSMxLUQgoyk9OLS5WSMsvUghKTc4vSlFwKS3IyUxOLMnMz1NwzEvMqSzOLOZhYE1LzClO5YXS3AzKbq4hzh66qQX58anFBYnJqXmpJfGhwUYGhoYGBiZmRmaOhsbEqQIAWvkstg</recordid><startdate>20110106</startdate><enddate>20110106</enddate><creator>DECARO MICHAEL</creator><creator>NAEYMI-RAD FRANK</creator><creator>CHARLOT REGIS</creator><creator>HAINES DAVID</creator><creator>CARDWELL MATTHEW C</creator><scope>EVB</scope></search><sort><creationdate>20110106</creationdate><title>System and Process for Record Duplication Analysis</title><author>DECARO MICHAEL ; NAEYMI-RAD FRANK ; CHARLOT REGIS ; HAINES DAVID ; CARDWELL MATTHEW C</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2011004626A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2011</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>DECARO MICHAEL</creatorcontrib><creatorcontrib>NAEYMI-RAD FRANK</creatorcontrib><creatorcontrib>CHARLOT REGIS</creatorcontrib><creatorcontrib>HAINES DAVID</creatorcontrib><creatorcontrib>CARDWELL MATTHEW C</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>DECARO MICHAEL</au><au>NAEYMI-RAD FRANK</au><au>CHARLOT REGIS</au><au>HAINES DAVID</au><au>CARDWELL MATTHEW C</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>System and Process for Record Duplication Analysis</title><date>2011-01-06</date><risdate>2011</risdate><abstract>A system and process for record duplication analysis that relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data set. In addition, a system and process for record duplication analysis may rely on the predetermination of probabilistic patterns, where the system only searches for patterns exceeding a chosen threshold. Work flow may include selecting which fields within each record should be analyzed, normalizing the values within those fields and removing default data, calculating possible patterns and their match probabilities, analyzing record pairs to determine which have patterns exceeding a chosen threshold to determine the presence of duplicates, and merging duplicates, closing transactions reflecting non-duplicates, identifying records having insufficient data to determine the existence or lack of a match, and/or rolling back accidental merges.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US2011004626A1
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title System and Process for Record Duplication Analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T14%3A49%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=DECARO%20MICHAEL&rft.date=2011-01-06&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2011004626A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true