ONLINE SAMPLING ANALYSIS

Methods, systems and computer program products for generating a diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transacti...

Detailed description

Bibliographic details
Main authors: RANJAN, Rakesh Kumar, SAXENA, Siddhartha, PATIL, Deepak Chandrakant, DESHMUKH, Om Dadaji, DAS, Shibsankar
Format: Patent
Language: eng ; fre ; ger
Subjects:
Online access: Order full text
creator RANJAN, Rakesh Kumar
SAXENA, Siddhartha
PATIL, Deepak Chandrakant
DESHMUKH, Om Dadaji
DAS, Shibsankar
description Methods, systems and computer program products for generating a diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain the least frequently appearing text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. The system then provides the filtered subset as a representative set of samples to a sample consumer.
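The pipeline in the description can be illustrated with a minimal Python sketch. It is not the patented implementation. Every function name and parameter here (`select_rare`, `vectorize`, `kmeans`, `find_anchors`, `sample`, `margin`, `radius`) is hypothetical. The sketch assumes whitespace-delimited text segments, bag-of-words count vectors, Euclidean distance, and a plain k-means seeded with the first `k` vectors. It omits the "designated dimensions of importance", and it assumes anchors themselves are kept while other records near an anchor are dropped; the abstract does not settle any of these details.

```python
import math
from collections import Counter


def segments(record):
    """Split a record into lowercase, whitespace-delimited text segments (assumed format)."""
    return record.lower().split()


def select_rare(records, keep):
    """Keep the `keep` records whose rarest segment occurs in the fewest records."""
    freq = Counter(seg for rec in records for seg in set(segments(rec)))

    def rarity(rec):
        return min((freq[s] for s in segments(rec)), default=math.inf)

    return sorted(records, key=rarity)[:keep]  # stable sort keeps input order on ties


def vectorize(records):
    """Bag-of-words count vectors over the shared, sorted vocabulary."""
    vocab = sorted({s for rec in records for s in segments(rec)})
    index = {s: i for i, s in enumerate(vocab)}
    vectors = []
    for rec in records:
        vec = [0.0] * len(vocab)
        for s in segments(rec):
            vec[index[s]] += 1.0
        vectors.append(vec)
    return vectors


def distance(a, b):
    """Euclidean distance, used here as the (dis)similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=10):
    """Plain k-means, seeded deterministically with the first k vectors (k >= 1)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda c: distance(v, centroids[c]))
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def find_anchors(vectors, centroids, margin=0.1):
    """Indices of vectors whose two nearest centroids are almost equidistant."""
    anchors = []
    for i, v in enumerate(vectors):
        d = sorted(distance(v, c) for c in centroids)
        if len(d) > 1 and d[1] - d[0] <= margin:
            anchors.append(i)
    return anchors


def sample(records, keep, k, margin=0.1, radius=1.0):
    """Rare-segment selection, clustering, anchor detection, then filtering."""
    subset = select_rare(records, keep)
    vectors = vectorize(subset)
    anchors = find_anchors(vectors, kmeans(vectors, k), margin)
    result = []
    for i, rec in enumerate(subset):
        near_anchor = any(
            i != a and distance(vectors[i], vectors[a]) <= radius for a in anchors
        )
        if i in anchors or not near_anchor:
            result.append(rec)
    return result
```

A call such as `sample(records, keep=3, k=2)` returns at most three of the input strings. Raising `radius` removes more records around each boundary anchor, and raising `margin` treats more records as anchors.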
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_EP3762884A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EP3762884A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_EP3762884A13</originalsourceid><addsrcrecordid>eNrjZJDw9_Px9HNVCHb0DQAy3BUc_Rx9IoM9g3kYWNMSc4pTeaE0N4OCm2uIs4duakF-fGpxQWJyal5qSbxrgLG5mZGFhYmjoTERSgDAIB93</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>ONLINE SAMPLING ANALYSIS</title><source>esp@cenet</source><creator>RANJAN, Rakesh Kumar ; SAXENA, Siddhartha ; PATIL, Deepak Chandrakant ; DESHMUKH, Om Dadaji ; DAS, Shibsankar</creator><creatorcontrib>RANJAN, Rakesh Kumar ; SAXENA, Siddhartha ; PATIL, Deepak Chandrakant ; DESHMUKH, Om Dadaji ; DAS, Shibsankar</creatorcontrib><description>Methods, systems and computer program products generating diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain least frequently appeared text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. 
The system then provides the filtered subset as a representative set of samples to a sample consumer.</description><language>eng ; fre ; ger</language><subject>CALCULATING ; COMPUTING ; COUNTING ; DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FORADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORYOR FORECASTING PURPOSES ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS ; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE,COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTINGPURPOSES, NOT OTHERWISE PROVIDED FOR</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20210113&amp;DB=EPODOC&amp;CC=EP&amp;NR=3762884A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25564,76547</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20210113&amp;DB=EPODOC&amp;CC=EP&amp;NR=3762884A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>RANJAN, Rakesh Kumar</creatorcontrib><creatorcontrib>SAXENA, Siddhartha</creatorcontrib><creatorcontrib>PATIL, Deepak Chandrakant</creatorcontrib><creatorcontrib>DESHMUKH, Om Dadaji</creatorcontrib><creatorcontrib>DAS, Shibsankar</creatorcontrib><title>ONLINE SAMPLING ANALYSIS</title><description>Methods, systems and computer program products generating diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. 
The system selects a subset of transaction records that contain least frequently appeared text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. The system then provides the filtered subset as a representative set of samples to a sample consumer.</description><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FORADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORYOR FORECASTING PURPOSES</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>PHYSICS</subject><subject>SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE,COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTINGPURPOSES, NOT OTHERWISE PROVIDED FOR</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZJDw9_Px9HNVCHb0DQAy3BUc_Rx9IoM9g3kYWNMSc4pTeaE0N4OCm2uIs4duakF-fGpxQWJyal5qSbxrgLG5mZGFhYmjoTERSgDAIB93</recordid><startdate>20210113</startdate><enddate>20210113</enddate><creator>RANJAN, Rakesh Kumar</creator><creator>SAXENA, Siddhartha</creator><creator>PATIL, Deepak Chandrakant</creator><creator>DESHMUKH, Om Dadaji</creator><creator>DAS, Shibsankar</creator><scope>EVB</scope></search><sort><creationdate>20210113</creationdate><title>ONLINE SAMPLING ANALYSIS</title><author>RANJAN, Rakesh Kumar ; SAXENA, Siddhartha ; PATIL, Deepak Chandrakant ; DESHMUKH, 
Om Dadaji ; DAS, Shibsankar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_EP3762884A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng ; fre ; ger</language><creationdate>2021</creationdate><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FORADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORYOR FORECASTING PURPOSES</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>PHYSICS</topic><topic>SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE,COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTINGPURPOSES, NOT OTHERWISE PROVIDED FOR</topic><toplevel>online_resources</toplevel><creatorcontrib>RANJAN, Rakesh Kumar</creatorcontrib><creatorcontrib>SAXENA, Siddhartha</creatorcontrib><creatorcontrib>PATIL, Deepak Chandrakant</creatorcontrib><creatorcontrib>DESHMUKH, Om Dadaji</creatorcontrib><creatorcontrib>DAS, Shibsankar</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>RANJAN, Rakesh Kumar</au><au>SAXENA, Siddhartha</au><au>PATIL, Deepak Chandrakant</au><au>DESHMUKH, Om Dadaji</au><au>DAS, Shibsankar</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>ONLINE SAMPLING ANALYSIS</title><date>2021-01-13</date><risdate>2021</risdate><abstract>Methods, systems and computer program products generating diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain least frequently appeared text segments. The system determines a respective vector representation for each selected transaction record. 
The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. The system then provides the filtered subset as a representative set of samples to a sample consumer.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
language eng ; fre ; ger
recordid cdi_epo_espacenet_EP3762884A1
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
title ONLINE SAMPLING ANALYSIS
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T18%3A03%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=RANJAN,%20Rakesh%20Kumar&rft.date=2021-01-13&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP3762884A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true