Normalizing electronic communications using feature sets

Electronic communications can be normalized using feature sets. For example, an electronic representation of a noncanonical communication can be received, and multiple candidate canonical versions of the noncanonical communication can be determined. A first feature set representative of the noncanon...

Bibliographic details
Main authors: JIN NING, COX JAMES ALLEN
Format: Patent
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator JIN NING
COX JAMES ALLEN
description Electronic communications can be normalized using feature sets. For example, an electronic representation of a noncanonical communication can be received, and multiple candidate canonical versions of the noncanonical communication can be determined. A first feature set representative of the noncanonical communication can be determined by splitting the noncanonical communication into at least one n-gram and at least one k-skip-n-gram. Multiple comparison feature sets can be determined by splitting multiple terms in training data into respective comparison feature sets. Multiple Jaccard index values can be determined using the first feature set and the multiple comparison feature sets. A subset of the multiple terms in the training data in which an associated Jaccard index value exceeds a threshold can be selected. The subset of the multiple terms can be included in the multiple candidate canonical versions. A normalized version of the noncanonical communication can be selected from the multiple candidate canonical versions.
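
The candidate-selection steps in the description can be sketched in Python. This is a minimal illustration, not the patented implementation. It assumes character-level features, n = 2, and k = 1, where a k-skip-n-gram takes every (k+1)-th character. The function names (`feature_set`, `jaccard`, `candidate_terms`), the sample terms, and the threshold of 0.2 are hypothetical. The abstract does not say how the final normalized version is picked from the candidates, so the sketch stops at the candidate list.

```python
def feature_set(term, n=2, k=1):
    """Split a term into character n-grams plus k-skip-n-grams.

    Assumption: a k-skip-n-gram here takes n characters that are each
    separated by exactly k skipped characters.
    """
    # Contiguous n-grams, e.g. "tmrw" -> {"tm", "mr", "rw"} for n=2.
    ngrams = {term[i:i + n] for i in range(len(term) - n + 1)}
    # Skip-grams: stride of k+1 over a window wide enough to hold n characters.
    step = k + 1
    span = (n - 1) * step + 1
    skipgrams = {term[i:i + span:step] for i in range(len(term) - span + 1)}
    return ngrams | skipgrams


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_terms(noncanonical, training_terms, threshold=0.2, n=2, k=1):
    """Return training terms whose Jaccard index with the input exceeds the threshold."""
    query = feature_set(noncanonical, n, k)
    scores = {t: jaccard(query, feature_set(t, n, k)) for t in training_terms}
    # Keep only terms strictly above the threshold, best match first.
    kept = [(t, s) for t, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical training data; "tmrw" stands in for a noncanonical communication.
print(candidate_terms("tmrw", ["tomorrow", "today", "tonight"]))
```

In this sketch the skip-grams of "tomorrow" ("tm", "mr", "rw") coincide with the contiguous bigrams of "tmrw", which is why the abbreviation shares features with its canonical form even though it has no bigram in common with it.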
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US9280747B1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US9280747B1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US9280747B13</originalsourceid><addsrcrecordid>eNrjZLDwyy_KTczJrMrMS1dIzUlNLinKz8tMVkjOz80tBTISSzLz84oVSotB8mmpiSWlRakKxaklxTwMrGmJOcWpvFCam0HBzTXE2UM3tSA_PrW4IDE5NS-1JD402NLIwsDcxNzJ0JgIJQAgZS8F</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Normalizing electronic communications using feature sets</title><source>esp@cenet</source><creator>JIN NING ; COX JAMES ALLEN</creator><creatorcontrib>JIN NING ; COX JAMES ALLEN</creatorcontrib><description>Electronic communications can be normalized using feature sets. For example, an electronic representation of a noncanonical communication can be received, and multiple candidate canonical versions of the noncanonical communication can be determined. A first feature set representative of the noncanonical communication can be determined by splitting the noncanonical communication into at least one n-gram and at least one k-skip-n-gram. Multiple comparison feature sets can be determined by splitting multiple terms in training data into respective comparison feature sets. Multiple Jaccard index values can be determined using the first feature set and the multiple comparison feature sets. A subset of the multiple terms in the training data in which an associated Jaccard index value exceeds a threshold can be selected. The subset of the multiple terms can be included in the multiple candidate canonical versions. 
A normalized version of the noncanonical communication can be selected from the multiple candidate canonical versions.</description><language>eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC COMMUNICATION TECHNIQUE ; ELECTRIC DIGITAL DATA PROCESSING ; ELECTRICITY ; PHYSICS ; TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION</subject><creationdate>2016</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20160308&amp;DB=EPODOC&amp;CC=US&amp;NR=9280747B1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25562,76317</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20160308&amp;DB=EPODOC&amp;CC=US&amp;NR=9280747B1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>JIN NING</creatorcontrib><creatorcontrib>COX JAMES ALLEN</creatorcontrib><title>Normalizing electronic communications using feature sets</title><description>Electronic communications can be normalized using feature sets. For example, an electronic representation of a noncanonical communication can be received, and multiple candidate canonical versions of the noncanonical communication can be determined. A first feature set representative of the noncanonical communication can be determined by splitting the noncanonical communication into at least one n-gram and at least one k-skip-n-gram. Multiple comparison feature sets can be determined by splitting multiple terms in training data into respective comparison feature sets. 
Multiple Jaccard index values can be determined using the first feature set and the multiple comparison feature sets. A subset of the multiple terms in the training data in which an associated Jaccard index value exceeds a threshold can be selected. The subset of the multiple terms can be included in the multiple candidate canonical versions. A normalized version of the noncanonical communication can be selected from the multiple candidate canonical versions.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC COMMUNICATION TECHNIQUE</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>ELECTRICITY</subject><subject>PHYSICS</subject><subject>TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2016</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZLDwyy_KTczJrMrMS1dIzUlNLinKz8tMVkjOz80tBTISSzLz84oVSotB8mmpiSWlRakKxaklxTwMrGmJOcWpvFCam0HBzTXE2UM3tSA_PrW4IDE5NS-1JD402NLIwsDcxNzJ0JgIJQAgZS8F</recordid><startdate>20160308</startdate><enddate>20160308</enddate><creator>JIN NING</creator><creator>COX JAMES ALLEN</creator><scope>EVB</scope></search><sort><creationdate>20160308</creationdate><title>Normalizing electronic communications using feature sets</title><author>JIN NING ; COX JAMES ALLEN</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US9280747B13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2016</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC COMMUNICATION TECHNIQUE</topic><topic>ELECTRIC DIGITAL DATA 
PROCESSING</topic><topic>ELECTRICITY</topic><topic>PHYSICS</topic><topic>TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION</topic><toplevel>online_resources</toplevel><creatorcontrib>JIN NING</creatorcontrib><creatorcontrib>COX JAMES ALLEN</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>JIN NING</au><au>COX JAMES ALLEN</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Normalizing electronic communications using feature sets</title><date>2016-03-08</date><risdate>2016</risdate><abstract>Electronic communications can be normalized using feature sets. For example, an electronic representation of a noncanonical communication can be received, and multiple candidate canonical versions of the noncanonical communication can be determined. A first feature set representative of the noncanonical communication can be determined by splitting the noncanonical communication into at least one n-gram and at least one k-skip-n-gram. Multiple comparison feature sets can be determined by splitting multiple terms in training data into respective comparison feature sets. Multiple Jaccard index values can be determined using the first feature set and the multiple comparison feature sets. A subset of the multiple terms in the training data in which an associated Jaccard index value exceeds a threshold can be selected. The subset of the multiple terms can be included in the multiple candidate canonical versions. A normalized version of the noncanonical communication can be selected from the multiple candidate canonical versions.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US9280747B1
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC COMMUNICATION TECHNIQUE
ELECTRIC DIGITAL DATA PROCESSING
ELECTRICITY
PHYSICS
TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
title Normalizing electronic communications using feature sets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T15%3A28%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=JIN%20NING&rft.date=2016-03-08&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS9280747B1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true