ONLINE SAMPLING ANALYSIS
Methods, systems, and computer program products for generating a diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transacti...
Main authors: | Patil, Deepak Chandrakant; Das, Shibsankar; Deshmukh, Om Dadaji; Ranjan, Rakesh Kumar; Saxena, Siddhartha |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Patil, Deepak Chandrakant; Das, Shibsankar; Deshmukh, Om Dadaji; Ranjan, Rakesh Kumar; Saxena, Siddhartha |
description | Methods, systems, and computer program products for generating a diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain the least frequently appearing text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. The system then provides the filtered subset as a representative set of samples to a sample consumer. |
format | Patent |
fullrecord | <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2019272482A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2019272482A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2019272482A13</originalsourceid><addsrcrecordid>eNrjZJDw9_Px9HNVCHb0DQAy3BUc_Rx9IoM9g3kYWNMSc4pTeaE0N4Oym2uIs4duakF-fGpxQWJyal5qSXxosJGBoaWRuZGJhZGjoTFxqgB7FSDB</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>ONLINE SAMPLING ANALYSIS</title><source>esp@cenet</source><creator>Patil, Deepak Chandrakant ; Das, Shibsankar ; Deshmukh, Om Dadaji ; Ranjan, Rakesh Kumar ; Saxena, Siddhartha</creator><creatorcontrib>Patil, Deepak Chandrakant ; Das, Shibsankar ; Deshmukh, Om Dadaji ; Ranjan, Rakesh Kumar ; Saxena, Siddhartha</creatorcontrib><description>Methods, systems and computer program products generating diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain least frequently appeared text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. 
The system then provides the filtered subset as a representative set of samples to a sample consumer.</description><language>eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; PHYSICS</subject><creationdate>2019</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20190905&DB=EPODOC&CC=US&NR=2019272482A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25564,76547</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20190905&DB=EPODOC&CC=US&NR=2019272482A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Patil, Deepak Chandrakant</creatorcontrib><creatorcontrib>Das, Shibsankar</creatorcontrib><creatorcontrib>Deshmukh, Om Dadaji</creatorcontrib><creatorcontrib>Ranjan, Rakesh Kumar</creatorcontrib><creatorcontrib>Saxena, Siddhartha</creatorcontrib><title>ONLINE SAMPLING ANALYSIS</title><description>Methods, systems and computer program products generating diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain least frequently appeared text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. 
The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. The system then provides the filtered subset as a representative set of samples to a sample consumer.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2019</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZJDw9_Px9HNVCHb0DQAy3BUc_Rx9IoM9g3kYWNMSc4pTeaE0N4Oym2uIs4duakF-fGpxQWJyal5qSXxosJGBoaWRuZGJhZGjoTFxqgB7FSDB</recordid><startdate>20190905</startdate><enddate>20190905</enddate><creator>Patil, Deepak Chandrakant</creator><creator>Das, Shibsankar</creator><creator>Deshmukh, Om Dadaji</creator><creator>Ranjan, Rakesh Kumar</creator><creator>Saxena, Siddhartha</creator><scope>EVB</scope></search><sort><creationdate>20190905</creationdate><title>ONLINE SAMPLING ANALYSIS</title><author>Patil, Deepak Chandrakant ; Das, Shibsankar ; Deshmukh, Om Dadaji ; Ranjan, Rakesh Kumar ; Saxena, Siddhartha</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2019272482A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2019</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>Patil, Deepak Chandrakant</creatorcontrib><creatorcontrib>Das, Shibsankar</creatorcontrib><creatorcontrib>Deshmukh, Om 
Dadaji</creatorcontrib><creatorcontrib>Ranjan, Rakesh Kumar</creatorcontrib><creatorcontrib>Saxena, Siddhartha</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Patil, Deepak Chandrakant</au><au>Das, Shibsankar</au><au>Deshmukh, Om Dadaji</au><au>Ranjan, Rakesh Kumar</au><au>Saxena, Siddhartha</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>ONLINE SAMPLING ANALYSIS</title><date>2019-09-05</date><risdate>2019</risdate><abstract>Methods, systems and computer program products generating diverse and representative set of samples from a large amount of transaction data are disclosed. A data sampling system receives transaction records. Each transaction record has multiple text segments. The system selects a subset of transaction records that contain least frequently appeared text segments. The system determines a respective vector representation for each selected transaction record. The system can measure similarity between transaction records based on the vector representations. The system assigns the selected transaction records to multiple clusters based on the vector representations and designated dimensions of importance. The system identifies one or more anchors that include transaction records on boundaries between clusters. The system filters the subset of transaction records by removing transaction records that are close to the anchors. The system then provides the filtered subset as a representative set of samples to a sample consumer.</abstract><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | |
ispartof | |
issn | |
language | eng |
recordid | cdi_epo_espacenet_US2019272482A1 |
source | esp@cenet |
subjects | CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS |
title | ONLINE SAMPLING ANALYSIS |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T18%3A17%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Patil,%20Deepak%20Chandrakant&rft.date=2019-09-05&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2019272482A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
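
The description field above outlines a pipeline: select records containing rare text segments, vectorize them, cluster them, find anchors on cluster boundaries, and drop records close to those anchors. The following is a minimal Python sketch of those steps, not the patented method. The bag-of-segments vectors, Euclidean distance as the similarity measure, plain k-means with a deterministic start, the margin test for boundary anchors, the radius filter, the choice to keep the anchors themselves, and every function and parameter name are assumptions made for illustration. Each record is assumed to be a non-empty list of text segments.

```python
from collections import Counter
from math import sqrt


def select_rare(records, keep):
    """Keep the `keep` records whose rarest text segment is least frequent."""
    # Document frequency: in how many records does each segment appear?
    freq = Counter(seg for rec in records for seg in set(rec))
    # Lower score means the record contains a rarer segment (stable sort).
    ranked = sorted(records, key=lambda rec: min(freq[s] for s in rec))
    return ranked[:keep]


def vectorize(records, weights=None):
    """Bag-of-segments counts; `weights` scales designated dimensions (assumed)."""
    vocab = sorted({seg for rec in records for seg in rec})
    weights = weights or {}
    return [[rec.count(seg) * weights.get(seg, 1.0) for seg in vocab]
            for rec in records]


def distance(a, b):
    """Euclidean distance, used here as an (inverse) similarity measure."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: distance(v, centroids[c]))
            groups[nearest].append(v)
        for c, group in enumerate(groups):
            if group:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def find_anchors(vectors, centroids, margin):
    """Indices of records nearly equidistant from their two nearest centroids."""
    anchors = []
    for i, v in enumerate(vectors):
        d = sorted(distance(v, c) for c in centroids)
        if len(d) > 1 and d[1] - d[0] <= margin:
            anchors.append(i)
    return anchors


def sample(records, keep, k, margin, radius, weights=None):
    """Run the whole sketch and return the filtered subset of records."""
    subset = select_rare(records, keep)
    vectors = vectorize(subset, weights)
    centroids = kmeans(vectors, k)
    anchors = find_anchors(vectors, centroids, margin)
    anchor_set = set(anchors)
    result = []
    for i, rec in enumerate(subset):
        # Assumption: anchors are kept; non-anchors within `radius` are dropped.
        if i in anchor_set or all(
            distance(vectors[i], vectors[a]) > radius for a in anchors
        ):
            result.append(rec)
    return result


if __name__ == "__main__":
    demo = [
        ["coffee", "shop", "nyc"],
        ["coffee", "shop", "sf"],
        ["gas", "station", "tx"],
        ["gas", "station", "ca"],
        ["coffee", "station", "kiosk"],
    ]
    for record in sample(demo, keep=4, k=2, margin=0.5, radius=1.0):
        print(record)
```

The `margin` and `radius` thresholds are hypothetical tuning knobs: a larger `margin` marks more records as boundary anchors, and a larger `radius` removes more of their neighbors from the final sample.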