Language Model Integration for the Recognition of Handwritten Medieval Documents

Building recognition systems for historical documents is a difficult task, especially when it comes to medieval scripts. The difficulty stems mainly from the poor quality and the small quantity of the available data. In this paper we apply an HMM-based recognition system to medieval manuscripts...

Full description

Bibliographic details
Main authors:	Wuthrich, M., Liwicki, M., Fischer, A., Indermuhle, E., Bunke, H., Viehhauser, G., Stolz, M.
Format:	Conference proceeding
Language:	eng
Subjects:	Computer science ; Data processing ; Digital images ; Handwriting recognition ; Hidden Markov models ; Historical Documents ; HMM ; Language Model ; Mathematics ; Natural languages ; Overfitting ; Software libraries ; Text analysis ; Writing

container_end_page	215
container_issue
container_start_page	211
container_title
container_volume
creator	Wuthrich, M. ; Liwicki, M. ; Fischer, A. ; Indermuhle, E. ; Bunke, H. ; Viehhauser, G. ; Stolz, M.
description	Building recognition systems for historical documents is a difficult task, especially when it comes to medieval scripts. The difficulty stems mainly from the poor quality and the small quantity of the available data. In this paper we apply an HMM-based recognition system to medieval manuscripts from the 13th century written in Middle High German. The recognition system, which was originally developed for modern scripts, has been adapted to medieval scripts. Besides the data processing, one of the major challenges is to create a suitable language model. Because of the lack of appropriate independent text corpora for medieval languages, the language model has to be built on the basis of only a rather small number of manuscripts. Due to the small size of the corpus, optimizing the language model parameters can quickly lead to overfitting. In this paper we describe a strategy to integrate all available information into the language model and to optimize the language model parameters without suffering from this problem.
doi_str_mv	10.1109/ICDAR.2009.17
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5277727</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5277727</ieee_id><sourcerecordid>5277727</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-dea316258dafe25096c5a3c892b698cc87fa5c4b04dfbdc3f8828ff0e181fb8f3</originalsourceid><addsrcrecordid>eNotzMtOAjEUgOF6SwRk6cpNX2Dw9DZtlwQvkEA0hD3ptKdjDXTMTNH49ibq6k--xU_ILYMZY2DvV4uH-XbGAeyM6TMytdqArq0Smit-TkZcaFtxJuGCjJnkUkoFAJdkxBSHSolaXJPxMLwDMGttPSKva5fbk2uRbrqAB7rKBdveldRlGrueljekW_Rdm9OvdZEuXQ5ffSoFM91gSPjpDvSh86cj5jLckKvoDgNO_zshu6fH3WJZrV-eV4v5ukoWShXQCVZzZYKLyBXY2isnvLG8qa3x3ujolJcNyBCb4EU0hpsYAZlhsTFRTMjd3zYh4v6jT0fXf-8V11pzLX4AG1tTaw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Language Model Integration for the Recognition of Handwritten Medieval Documents</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Wuthrich, M. ; Liwicki, M. ; Fischer, A. ; Indermuhle, E. ; Bunke, H. ; Viehhauser, G. ; Stolz, M.</creator><creatorcontrib>Wuthrich, M. ; Liwicki, M. ; Fischer, A. ; Indermuhle, E. ; Bunke, H. ; Viehhauser, G. ; Stolz, M.</creatorcontrib><description>Building recognition systems for historical documents is a difficult task. Especially, when it comes to medieval scripts. The complexity is mainly affected by the poor quality and the small quantity of the data available. In this paper we apply an HMM based recognition system to medieval manuscripts from the 13th century written in Middle High German. The recognition system, which was originally developed for modern scripts, has been adapted to medieval scripts. Beside the data processing, one of the major challenges is to create a suitable language model. 
Because of the lack of appropriate independent text corpora for medieval languages, the language model has to be created on the base of a rather small number of manuscripts only. Due to the small size of the corpus, optimizing the language model parameters can quickly lead to the problem of overfitting. In this paper we describe a strategy to integrate all available information into the language model and to optimize the language model parameters without suffering from this problem.</description><identifier>ISSN: 1520-5363</identifier><identifier>ISBN: 1424445000</identifier><identifier>ISBN: 9781424445004</identifier><identifier>EISSN: 2379-2140</identifier><identifier>EISBN: 9780769537252</identifier><identifier>EISBN: 0769537251</identifier><identifier>DOI: 10.1109/ICDAR.2009.17</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computer science ; Data processing ; Digital images ; Handwriting recognition ; Hidden Markov models ; Historical Documents ; HMM ; Language Model ; Mathematics ; Natural languages ; Overfitting ; Software libraries ; Text analysis ; Writing</subject><ispartof>2009 10th International Conference on Document Analysis and Recognition, 2009, p.211-215</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5277727$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5277727$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Wuthrich, M.</creatorcontrib><creatorcontrib>Liwicki, M.</creatorcontrib><creatorcontrib>Fischer, A.</creatorcontrib><creatorcontrib>Indermuhle, E.</creatorcontrib><creatorcontrib>Bunke, H.</creatorcontrib><creatorcontrib>Viehhauser, 
G.</creatorcontrib><creatorcontrib>Stolz, M.</creatorcontrib><title>Language Model Integration for the Recognition of Handwritten Medieval Documents</title><title>2009 10th International Conference on Document Analysis and Recognition</title><addtitle>ICDAR</addtitle><description>Building recognition systems for historical documents is a difficult task. Especially, when it comes to medieval scripts. The complexity is mainly affected by the poor quality and the small quantity of the data available. In this paper we apply an HMM based recognition system to medieval manuscripts from the 13th century written in Middle High German. The recognition system, which was originally developed for modern scripts, has been adapted to medieval scripts. Beside the data processing, one of the major challenges is to create a suitable language model. Because of the lack of appropriate independent text corpora for medieval languages, the language model has to be created on the base of a rather small number of manuscripts only. Due to the small size of the corpus, optimizing the language model parameters can quickly lead to the problem of overfitting. 
In this paper we describe a strategy to integrate all available information into the language model and to optimize the language model parameters without suffering from this problem.</description><subject>Computer science</subject><subject>Data processing</subject><subject>Digital images</subject><subject>Handwriting recognition</subject><subject>Hidden Markov models</subject><subject>Historical Documents</subject><subject>HMM</subject><subject>Language Model</subject><subject>Mathematics</subject><subject>Natural languages</subject><subject>Overfitting</subject><subject>Software libraries</subject><subject>Text analysis</subject><subject>Writing</subject><issn>1520-5363</issn><issn>2379-2140</issn><isbn>1424445000</isbn><isbn>9781424445004</isbn><isbn>9780769537252</isbn><isbn>0769537251</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotzMtOAjEUgOF6SwRk6cpNX2Dw9DZtlwQvkEA0hD3ptKdjDXTMTNH49ibq6k--xU_ILYMZY2DvV4uH-XbGAeyM6TMytdqArq0Smit-TkZcaFtxJuGCjJnkUkoFAJdkxBSHSolaXJPxMLwDMGttPSKva5fbk2uRbrqAB7rKBdveldRlGrueljekW_Rdm9OvdZEuXQ5ffSoFM91gSPjpDvSh86cj5jLckKvoDgNO_zshu6fH3WJZrV-eV4v5ukoWShXQCVZzZYKLyBXY2isnvLG8qa3x3ujolJcNyBCb4EU0hpsYAZlhsTFRTMjd3zYh4v6jT0fXf-8V11pzLX4AG1tTaw</recordid><startdate>200907</startdate><enddate>200907</enddate><creator>Wuthrich, M.</creator><creator>Liwicki, M.</creator><creator>Fischer, A.</creator><creator>Indermuhle, E.</creator><creator>Bunke, H.</creator><creator>Viehhauser, G.</creator><creator>Stolz, M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200907</creationdate><title>Language Model Integration for the Recognition of Handwritten Medieval Documents</title><author>Wuthrich, M. ; Liwicki, M. ; Fischer, A. ; Indermuhle, E. ; Bunke, H. ; Viehhauser, G. 
; Stolz, M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-dea316258dafe25096c5a3c892b698cc87fa5c4b04dfbdc3f8828ff0e181fb8f3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Computer science</topic><topic>Data processing</topic><topic>Digital images</topic><topic>Handwriting recognition</topic><topic>Hidden Markov models</topic><topic>Historical Documents</topic><topic>HMM</topic><topic>Language Model</topic><topic>Mathematics</topic><topic>Natural languages</topic><topic>Overfitting</topic><topic>Software libraries</topic><topic>Text analysis</topic><topic>Writing</topic><toplevel>online_resources</toplevel><creatorcontrib>Wuthrich, M.</creatorcontrib><creatorcontrib>Liwicki, M.</creatorcontrib><creatorcontrib>Fischer, A.</creatorcontrib><creatorcontrib>Indermuhle, E.</creatorcontrib><creatorcontrib>Bunke, H.</creatorcontrib><creatorcontrib>Viehhauser, G.</creatorcontrib><creatorcontrib>Stolz, M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wuthrich, M.</au><au>Liwicki, M.</au><au>Fischer, A.</au><au>Indermuhle, E.</au><au>Bunke, H.</au><au>Viehhauser, G.</au><au>Stolz, M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Language Model Integration for the Recognition of Handwritten Medieval Documents</atitle><btitle>2009 10th International Conference on Document Analysis and 
Recognition</btitle><stitle>ICDAR</stitle><date>2009-07</date><risdate>2009</risdate><spage>211</spage><epage>215</epage><pages>211-215</pages><issn>1520-5363</issn><eissn>2379-2140</eissn><isbn>1424445000</isbn><isbn>9781424445004</isbn><eisbn>9780769537252</eisbn><eisbn>0769537251</eisbn><abstract>Building recognition systems for historical documents is a difficult task. Especially, when it comes to medieval scripts. The complexity is mainly affected by the poor quality and the small quantity of the data available. In this paper we apply an HMM based recognition system to medieval manuscripts from the 13th century written in Middle High German. The recognition system, which was originally developed for modern scripts, has been adapted to medieval scripts. Beside the data processing, one of the major challenges is to create a suitable language model. Because of the lack of appropriate independent text corpora for medieval languages, the language model has to be created on the base of a rather small number of manuscripts only. Due to the small size of the corpus, optimizing the language model parameters can quickly lead to the problem of overfitting. In this paper we describe a strategy to integrate all available information into the language model and to optimize the language model parameters without suffering from this problem.</abstract><pub>IEEE</pub><doi>10.1109/ICDAR.2009.17</doi><tpages>5</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-5363
ispartof	2009 10th International Conference on Document Analysis and Recognition, 2009, p.211-215
issn	1520-5363 2379-2140
language	eng
recordid	cdi_ieee_primary_5277727
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Computer science ; Data processing ; Digital images ; Handwriting recognition ; Hidden Markov models ; Historical Documents ; HMM ; Language Model ; Mathematics ; Natural languages ; Overfitting ; Software libraries ; Text analysis ; Writing
title	Language Model Integration for the Recognition of Handwritten Medieval Documents
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T08%3A43%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Language%20Model%20Integration%20for%20the%20Recognition%20of%20Handwritten%20Medieval%20Documents&rft.btitle=2009%2010th%20International%20Conference%20on%20Document%20Analysis%20and%20Recognition&rft.au=Wuthrich,%20M.&rft.date=2009-07&rft.spage=211&rft.epage=215&rft.pages=211-215&rft.issn=1520-5363&rft.eissn=2379-2140&rft.isbn=1424445000&rft.isbn_list=9781424445004&rft_id=info:doi/10.1109/ICDAR.2009.17&rft_dat=%3Cieee_6IE%3E5277727%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769537252&rft.eisbn_list=0769537251&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5277727&rfr_iscdi=true