Dynamic extraction of contextually-coherent text blocks

Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic docu...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Izhaki-Allerhand, Liron, Mizrachi, Ran, Asi, Abedelkader, Ronen, Royi, Jassin, Ohad
Format:	Patent
Sprache:	eng
Schlagworte:	ACOUSTICS CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Izhaki-Allerhand, Liron Mizrachi, Ran Asi, Abedelkader Ronen, Royi Jassin, Ohad
description	Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US11031003B2</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US11031003B2</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US11031003B23</originalsourceid><addsrcrecordid>eNrjZDB3qcxLzM1MVkitKClKTC7JzM9TyE9TSM7PKwGKlCbm5FTqJudnpBal5pUogIQUknLyk7OLeRhY0xJzilN5oTQ3g6Kba4izh25qQX58anFBYnJqXmpJfGiwoaGBsaGBgbGTkTExagAdmC6T</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Dynamic extraction of contextually-coherent text blocks</title><source>esp@cenet</source><creator>Izhaki-Allerhand, Liron ; Mizrachi, Ran ; Asi, Abedelkader ; Ronen, Royi ; Jassin, Ohad</creator><creatorcontrib>Izhaki-Allerhand, Liron ; Mizrachi, Ran ; Asi, Abedelkader ; Ronen, Royi ; Jassin, Ohad</creatorcontrib><description>Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210608&DB=EPODOC&CC=US&NR=11031003B2$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76290</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210608&DB=EPODOC&CC=US&NR=11031003B2$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Izhaki-Allerhand, Liron</creatorcontrib><creatorcontrib>Mizrachi, Ran</creatorcontrib><creatorcontrib>Asi, Abedelkader</creatorcontrib><creatorcontrib>Ronen, Royi</creatorcontrib><creatorcontrib>Jassin, Ohad</creatorcontrib><title>Dynamic extraction of contextually-coherent text blocks</title><description>Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZDB3qcxLzM1MVkitKClKTC7JzM9TyE9TSM7PKwGKlCbm5FTqJudnpBal5pUogIQUknLyk7OLeRhY0xJzilN5oTQ3g6Kba4izh25qQX58anFBYnJqXmpJfGiwoaGBsaGBgbGTkTExagAdmC6T</recordid><startdate>20210608</startdate><enddate>20210608</enddate><creator>Izhaki-Allerhand, Liron</creator><creator>Mizrachi, Ran</creator><creator>Asi, Abedelkader</creator><creator>Ronen, Royi</creator><creator>Jassin, Ohad</creator><scope>EVB</scope></search><sort><creationdate>20210608</creationdate><title>Dynamic extraction of contextually-coherent text blocks</title><author>Izhaki-Allerhand, Liron ; Mizrachi, Ran ; Asi, Abedelkader ; Ronen, Royi ; Jassin, Ohad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US11031003B23</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2021</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Izhaki-Allerhand, Liron</creatorcontrib><creatorcontrib>Mizrachi, Ran</creatorcontrib><creatorcontrib>Asi, Abedelkader</creatorcontrib><creatorcontrib>Ronen, Royi</creatorcontrib><creatorcontrib>Jassin, Ohad</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Izhaki-Allerhand, Liron</au><au>Mizrachi, Ran</au><au>Asi, Abedelkader</au><au>Ronen, Royi</au><au>Jassin, Ohad</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Dynamic extraction of contextually-coherent text blocks</title><date>2021-06-08</date><risdate>2021</risdate><abstract>Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng
recordid	cdi_epo_espacenet_US11031003B2
source	esp@cenet
subjects	ACOUSTICS CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
title	Dynamic extraction of contextually-coherent text blocks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T20%3A10%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Izhaki-Allerhand,%20Liron&rft.date=2021-06-08&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS11031003B2%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true