Dynamic extraction of contextually-coherent text blocks

Bibliographic details
Main authors: Izhaki-Allerhand, Liron; Mizrachi, Ran; Asi, Abedelkader; Ronen, Royi; Jassin, Ohad
Format: Patent
Language: eng
creator Izhaki-Allerhand, Liron
Mizrachi, Ran
Asi, Abedelkader
Ronen, Royi
Jassin, Ohad
description Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.
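The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the keyword-count scorer `confidence_scores` and its `CONTEXT_KEYWORDS` table stand in for whatever context classifier is actually used, the entropy is taken to be Shannon entropy over the normalized confidence scores, and the group with the lowest entropy is treated as the representative block. The abstract does not state any of these specifics.

```python
import math

# Hypothetical keyword lists standing in for a real context classifier.
CONTEXT_KEYWORDS = {
    "sports": {"match", "goal", "team"},
    "finance": {"stock", "market", "bank"},
}


def confidence_scores(tokens):
    """Return a normalized score per context for a group of content tokens."""
    words = [w.strip(".,").lower() for t in tokens for w in t.split()]
    # Add-one smoothing so every context keeps a non-zero score (assumption).
    raw = {
        ctx: 1 + sum(w in kws for w in words)
        for ctx, kws in CONTEXT_KEYWORDS.items()
    }
    total = sum(raw.values())
    return {ctx: v / total for ctx, v in raw.items()}


def entropy(scores):
    """Shannon entropy, in bits, of a score distribution."""
    return -sum(p * math.log2(p) for p in scores.values() if p > 0)


def select_block(tokens, start, max_size):
    """Score groups tokens[start:start+n] for n = 1..max_size independently.

    Returns (entropy, group, context) for the lowest-entropy group, which is
    assumed here to be the most contextually coherent one.
    """
    best = None
    for n in range(1, min(max_size, len(tokens) - start) + 1):
        group = tokens[start:start + n]
        scores = confidence_scores(group)
        h = entropy(scores)
        if best is None or h < best[0]:
            best = (h, group, max(scores, key=scores.get))
    return best


# Content tokens are sentences here; paragraphs would work the same way.
sentences = [
    "The team scored a goal.",
    "The match ended late.",
    "The stock market fell.",
]
h, block, context = select_block(sentences, start=0, max_size=3)
print(context, len(block), round(h, 3))
```

In this toy input the two-sentence group wins: adding the second sentence strengthens the dominant context, while adding the third mixes in a competing context and raises the entropy. Moving `start` forward past the selected block and repeating the call would tag the rest of the document in the same way.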
format Patent
date 2021-06-08
oa free_for_read
linktohtml https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210608&DB=EPODOC&CC=US&NR=11031003B2
language eng
recordid cdi_epo_espacenet_US11031003B2
source esp@cenet
subjects ACOUSTICS
CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title Dynamic extraction of contextually-coherent text blocks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T20%3A10%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Izhaki-Allerhand,%20Liron&rft.date=2021-06-08&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS11031003B2%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true