TEXT-IMAGE-LAYOUT TRANSFORMER (TILT)

Systems and methods are disclosed for generating a Natural Language Processing (NLP) model through iterative training. A method involves processing a plurality of real-world documents, each containing text data, layout data, and image data, using at least one hardware processor. An initial predictio...

Detailed description

Bibliographic details
Main authors:	DWOJAK, Tomasz, BORCHMANN, Lukasz Konrad, PALKA, Gabriela Klaudia, PIETRUSZKA, Michal Waldemar, JURKIEWICZ, Dawid Andrzej
Format:	Patent
Language:	eng ; fre ; ger
Subjects:	CALCULATING ; COMPUTING ; COUNTING ; PHYSICS
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	DWOJAK, Tomasz BORCHMANN, Lukasz Konrad PALKA, Gabriela Klaudia PIETRUSZKA, Michal Waldemar JURKIEWICZ, Dawid Andrzej
description	Systems and methods are disclosed for generating a Natural Language Processing (NLP) model through iterative training. A method involves processing a plurality of real-world documents, each containing text data, layout data, and image data, using at least one hardware processor. An initial prediction for data points within the documents is generated using a neural network. The initial prediction is then validated by comparing extracted values with the information present in the documents and correcting any discrepancies. The quality of the NLP model is evaluated based on the validated predictions, and upon satisfying a quality constraint, the NLP model is configured to process new documents to extract data points without further validation. This method streamlines the extraction of information from diverse document formats, enhancing the efficiency and accuracy of data retrieval in automated systems.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_EP4295266A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EP4295266A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_EP4295266A13</originalsourceid><addsrcrecordid>eNrjZFAJcY0I0fX0dXR31fVxjPQPDVEICXL0C3bzD_J1DVLQCPH0CdHkYWBNS8wpTuWF0twMCm6uIc4euqkF-fGpxQWJyal5qSXxrgEmRpamRmZmjobGRCgBAEWMIrg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>TEXT-IMAGE-LAYOUT TRANSFORMER (TILT)</title><source>esp@cenet</source><creator>DWOJAK, Tomasz ; BORCHMANN, Lukasz Konrad ; PALKA, Gabriela Klaudia ; PIETRUSZKA, Michal Waldemar ; JURKIEWICZ, Dawid Andrzej</creator><creatorcontrib>DWOJAK, Tomasz ; BORCHMANN, Lukasz Konrad ; PALKA, Gabriela Klaudia ; PIETRUSZKA, Michal Waldemar ; JURKIEWICZ, Dawid Andrzej</creatorcontrib><description>Systems and methods are disclosed for generating a Natural Language Processing (NLP) model through iterative training. A method involves processing a plurality of real-world documents, each containing text data, layout data, and image data, using at least one hardware processor. An initial prediction for data points within the documents is generated using a neural network. The initial prediction is then validated by comparing extracted values with the information present in the documents and correcting any discrepancies. The quality of the NLP model is evaluated based on the validated predictions, and upon satisfying a quality constraint, the NLP model is configured to process new documents to extract data points without further validation. 
This method streamlines the extraction of information from diverse document formats, enhancing the efficiency and accuracy of data retrieval in automated systems.</description><language>eng ; fre ; ger</language><subject>CALCULATING ; COMPUTING ; COUNTING ; PHYSICS</subject><creationdate>2023</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20231227&DB=EPODOC&CC=EP&NR=4295266A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76516</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20231227&DB=EPODOC&CC=EP&NR=4295266A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>DWOJAK, Tomasz</creatorcontrib><creatorcontrib>BORCHMANN, Lukasz Konrad</creatorcontrib><creatorcontrib>PALKA, Gabriela Klaudia</creatorcontrib><creatorcontrib>PIETRUSZKA, Michal Waldemar</creatorcontrib><creatorcontrib>JURKIEWICZ, Dawid Andrzej</creatorcontrib><title>TEXT-IMAGE-LAYOUT TRANSFORMER (TILT)</title><description>Systems and methods are disclosed for generating a Natural Language Processing (NLP) model through iterative training. A method involves processing a plurality of real-world documents, each containing text data, layout data, and image data, using at least one hardware processor. An initial prediction for data points within the documents is generated using a neural network. The initial prediction is then validated by comparing extracted values with the information present in the documents and correcting any discrepancies. 
The quality of the NLP model is evaluated based on the validated predictions, and upon satisfying a quality constraint, the NLP model is configured to process new documents to extract data points without further validation. This method streamlines the extraction of information from diverse document formats, enhancing the efficiency and accuracy of data retrieval in automated systems.</description><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2023</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZFAJcY0I0fX0dXR31fVxjPQPDVEICXL0C3bzD_J1DVLQCPH0CdHkYWBNS8wpTuWF0twMCm6uIc4euqkF-fGpxQWJyal5qSXxrgEmRpamRmZmjobGRCgBAEWMIrg</recordid><startdate>20231227</startdate><enddate>20231227</enddate><creator>DWOJAK, Tomasz</creator><creator>BORCHMANN, Lukasz Konrad</creator><creator>PALKA, Gabriela Klaudia</creator><creator>PIETRUSZKA, Michal Waldemar</creator><creator>JURKIEWICZ, Dawid Andrzej</creator><scope>EVB</scope></search><sort><creationdate>20231227</creationdate><title>TEXT-IMAGE-LAYOUT TRANSFORMER (TILT)</title><author>DWOJAK, Tomasz ; BORCHMANN, Lukasz Konrad ; PALKA, Gabriela Klaudia ; PIETRUSZKA, Michal Waldemar ; JURKIEWICZ, Dawid Andrzej</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_EP4295266A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng ; fre ; ger</language><creationdate>2023</creationdate><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>DWOJAK, Tomasz</creatorcontrib><creatorcontrib>BORCHMANN, Lukasz Konrad</creatorcontrib><creatorcontrib>PALKA, Gabriela Klaudia</creatorcontrib><creatorcontrib>PIETRUSZKA, Michal Waldemar</creatorcontrib><creatorcontrib>JURKIEWICZ, Dawid 
Andrzej</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>DWOJAK, Tomasz</au><au>BORCHMANN, Lukasz Konrad</au><au>PALKA, Gabriela Klaudia</au><au>PIETRUSZKA, Michal Waldemar</au><au>JURKIEWICZ, Dawid Andrzej</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>TEXT-IMAGE-LAYOUT TRANSFORMER (TILT)</title><date>2023-12-27</date><risdate>2023</risdate><abstract>Systems and methods are disclosed for generating a Natural Language Processing (NLP) model through iterative training. A method involves processing a plurality of real-world documents, each containing text data, layout data, and image data, using at least one hardware processor. An initial prediction for data points within the documents is generated using a neural network. The initial prediction is then validated by comparing extracted values with the information present in the documents and correcting any discrepancies. The quality of the NLP model is evaluated based on the validated predictions, and upon satisfying a quality constraint, the NLP model is configured to process new documents to extract data points without further validation. This method streamlines the extraction of information from diverse document formats, enhancing the efficiency and accuracy of data retrieval in automated systems.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng ; fre ; ger
recordid	cdi_epo_espacenet_EP4295266A1
source	esp@cenet
subjects	CALCULATING COMPUTING COUNTING PHYSICS
title	TEXT-IMAGE-LAYOUT TRANSFORMER (TILT)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T16%3A21%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=DWOJAK,%20Tomasz&rft.date=2023-12-27&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP4295266A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
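
The description field above outlines an iterative workflow: predict data points, validate them against the documents, evaluate quality, and stop validating once a quality constraint is met. The following is a minimal sketch of that loop, not the patented implementation. Every name in it (ToyExtractor, validate, run_rounds, the 0.9 threshold, the sample documents) is a hypothetical placeholder, and the "model" simply memorises corrected values instead of training a neural network.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Fields = Dict[str, str]  # data point name -> extracted value


@dataclass
class ToyExtractor:
    """Hypothetical stand-in for the neural network; it only memorises corrections."""
    memory: Dict[str, Fields] = field(default_factory=dict)

    def predict(self, doc_id: str) -> Fields:
        # Return whatever was learned for this document (empty before training).
        return dict(self.memory.get(doc_id, {}))

    def train(self, doc_id: str, validated: Fields) -> None:
        # "Training" here is just storing the validated values (an assumption).
        self.memory[doc_id] = dict(validated)


def validate(prediction: Fields, truth: Fields) -> Fields:
    """Compare extracted values with the document content and correct discrepancies.

    In this toy the reviewer always ends up with the true values.
    """
    return {key: truth[key] for key in truth}


def run_rounds(model: ToyExtractor, documents: Dict[str, Fields],
               threshold: float = 0.9, max_rounds: int = 5
               ) -> Tuple[Optional[int], float]:
    """Repeat predict/validate/train until the quality constraint is satisfied."""
    quality = 0.0
    for round_no in range(1, max_rounds + 1):
        correct = total = 0
        for doc_id, truth in documents.items():
            predicted = model.predict(doc_id)
            validated = validate(predicted, truth)
            # Count predictions that needed no correction.
            correct += sum(predicted.get(k) == v for k, v in validated.items())
            total += len(validated)
            model.train(doc_id, validated)
        quality = correct / total if total else 0.0
        if quality >= threshold:
            # Constraint met: new documents could now skip validation.
            return round_no, quality
    return None, quality


# Hypothetical sample documents with their true data points.
DOCS = {
    "invoice-1": {"total": "120.00", "date": "2023-12-27"},
    "invoice-2": {"total": "75.50", "date": "2023-11-02"},
}

if __name__ == "__main__":
    print(run_rounds(ToyExtractor(), DOCS))
```

In this sketch the quality measure is the share of predicted values that needed no correction; the record does not say which metric or threshold the patent uses, so both are assumptions.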