Text Reuse Detection in Handwritten Documents

Plagiarism detection in academic assignments is becoming increasingly relevant. The rapidly growing popularity of online education and the active expansion of online educational platforms for secondary and high school education create demand for an automatic reuse detection system for handw...

Detailed description

Bibliographic details
Published in: Doklady. Mathematics 2023-12, Vol.108 (Suppl 2), p.S424-S433
Main authors: Grabovoy, A. V., Kaprielova, M. S., Kildyakov, A. S., Potyashin, I. O., Seyil, T. B., Finogeev, E. L., Chekhovich, Yu. V.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page S433
container_issue Suppl 2
container_start_page S424
container_title Doklady. Mathematics
container_volume 108
creator Grabovoy, A. V.
Kaprielova, M. S.
Kildyakov, A. S.
Potyashin, I. O.
Seyil, T. B.
Finogeev, E. L.
Chekhovich, Yu. V.
description Plagiarism detection in academic assignments is becoming increasingly relevant. The rapidly growing popularity of online education and the active expansion of online educational platforms for secondary and high school education create demand for an automatic reuse detection system for handwritten assignments. Existing approaches to this problem cannot search large collections for potential sources of reuse, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach for detecting text reuse in handwritten documents. Each document is a picture, and the search is performed on a large collection of potential sources. The proposed method consists of three stages: handwritten text recognition, candidate search, and precise source retrieval. We present experimental results on the quality and latency of our system. Recall reaches 83.3% for higher-quality pictures and 77.4% for lower-quality pictures. The average search time is 3.2 s per document on a CPU. The results show that the system is scalable and can be used in production, where fast reuse detection is needed for hundreds of thousands of academic assignments against a large collection of potential reuse sources. All experiments were conducted on the public HWR200 dataset.
doi_str_mv 10.1134/S106456242370120X
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2985941247</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2985941247</sourcerecordid><originalsourceid>FETCH-LOGICAL-c268t-8f41e20387a1d3400b256b895fbe33a197acd55aa00895c43e9468e6435a1593</originalsourceid><addsrcrecordid>eNp1kE9Lw0AQxRdRsFY_gLeA5-jM_svmKK22QkHQHryFbTKRFLupuxvUb--WCB7E0wzzfu8NPMYuEa4Rhbx5RtBSaS65KAA5vByxCSqBuRGaH6c9yflBP2VnIWwBpOIAE5av6TNmTzQEyuYUqY5d77LOZUvrmg_fxUgum_f1sCMXwzk7ae1boIufOWXr-7v1bJmvHhcPs9tVXnNtYm5aicRBmMJiIyTAhiu9MaVqNySExbKwdaOUtQDpWEtBpdSGtBTKoirFlF2NsXvfvw8UYrXtB-_Sx4qXRpUSuSwShSNV-z4ET221993O-q8KoTqUUv0pJXn46AmJda_kf5P_N30DNrJhDg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2985941247</pqid></control><display><type>article</type><title>Text Reuse Detection in Handwritten Documents</title><source>SpringerNature Journals</source><creator>Grabovoy, A. V. ; Kaprielova, M. S. ; Kildyakov, A. S. ; Potyashin, I. O. ; Seyil, T. B. ; Finogeev, E. L. ; Chekhovich, Yu. V.</creator><creatorcontrib>Grabovoy, A. V. ; Kaprielova, M. S. ; Kildyakov, A. S. ; Potyashin, I. O. ; Seyil, T. B. ; Finogeev, E. L. ; Chekhovich, Yu. V.</creatorcontrib><description>Plagiarism detection in scholar assignments becomes more and more relevant nowadays. Rapidly growing popularity of online education, active expansion of online educational platforms for secondary and high school education create demand for development of an automatic reuse detection system for handwritten assignments. The existing approaches to this problem are not usable for searching for potential sources of reuse on large collections, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach that allows detecting text reuse in handwritten documents. 
Each document is a picture and the search is performed on a large collection of potential sources. The proposed method consists of three stages: handwritten text recognition, candidate search and precise source retrieval. We represent experimental results for the quality and latency estimation of our system. The recall reaches 83.3% in case of better quality pictures and 77.4% in case of pictures of lower quality. The average search time is 3.2 s per document on CPU. The results show that the created system is scalable and can be used in production, where fast reuse detection for hundreds of thousands of scholar assignments on large collection of potential reuse sources is needed. All the experiments were held on HWR200 public dataset.</description><identifier>ISSN: 1064-5624</identifier><identifier>EISSN: 1531-8362</identifier><identifier>DOI: 10.1134/S106456242370120X</identifier><language>eng</language><publisher>Moscow: Pleiades Publishing</publisher><subject>Documents ; Education ; Handwriting recognition ; Mathematics ; Mathematics and Statistics ; Pictures ; Searching</subject><ispartof>Doklady. Mathematics, 2023-12, Vol.108 (Suppl 2), p.S424-S433</ispartof><rights>Pleiades Publishing, Ltd. 2023. ISSN 1064-5624, Doklady Mathematics, 2023, Vol. 108, Suppl. 2, pp. S424–S433. © Pleiades Publishing, Ltd., 2023. ISSN 1064-5624, Doklady Mathematics, 2023. © Pleiades Publishing, Ltd., 2023. Russian Text © The Author(s), 2023, published in Doklady Rossiiskoi Akademii Nauk. Matematika, Informatika, Protsessy Upravleniya, 2023, Vol. 514, No. 2, pp. 
297–307.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c268t-8f41e20387a1d3400b256b895fbe33a197acd55aa00895c43e9468e6435a1593</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1134/S106456242370120X$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1134/S106456242370120X$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Grabovoy, A. V.</creatorcontrib><creatorcontrib>Kaprielova, M. S.</creatorcontrib><creatorcontrib>Kildyakov, A. S.</creatorcontrib><creatorcontrib>Potyashin, I. O.</creatorcontrib><creatorcontrib>Seyil, T. B.</creatorcontrib><creatorcontrib>Finogeev, E. L.</creatorcontrib><creatorcontrib>Chekhovich, Yu. V.</creatorcontrib><title>Text Reuse Detection in Handwritten Documents</title><title>Doklady. Mathematics</title><addtitle>Dokl. Math</addtitle><description>Plagiarism detection in scholar assignments becomes more and more relevant nowadays. Rapidly growing popularity of online education, active expansion of online educational platforms for secondary and high school education create demand for development of an automatic reuse detection system for handwritten assignments. The existing approaches to this problem are not usable for searching for potential sources of reuse on large collections, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach that allows detecting text reuse in handwritten documents. Each document is a picture and the search is performed on a large collection of potential sources. 
The proposed method consists of three stages: handwritten text recognition, candidate search and precise source retrieval. We represent experimental results for the quality and latency estimation of our system. The recall reaches 83.3% in case of better quality pictures and 77.4% in case of pictures of lower quality. The average search time is 3.2 s per document on CPU. The results show that the created system is scalable and can be used in production, where fast reuse detection for hundreds of thousands of scholar assignments on large collection of potential reuse sources is needed. All the experiments were held on HWR200 public dataset.</description><subject>Documents</subject><subject>Education</subject><subject>Handwriting recognition</subject><subject>Mathematics</subject><subject>Mathematics and Statistics</subject><subject>Pictures</subject><subject>Searching</subject><issn>1064-5624</issn><issn>1531-8362</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp1kE9Lw0AQxRdRsFY_gLeA5-jM_svmKK22QkHQHryFbTKRFLupuxvUb--WCB7E0wzzfu8NPMYuEa4Rhbx5RtBSaS65KAA5vByxCSqBuRGaH6c9yflBP2VnIWwBpOIAE5av6TNmTzQEyuYUqY5d77LOZUvrmg_fxUgum_f1sCMXwzk7ae1boIufOWXr-7v1bJmvHhcPs9tVXnNtYm5aicRBmMJiIyTAhiu9MaVqNySExbKwdaOUtQDpWEtBpdSGtBTKoirFlF2NsXvfvw8UYrXtB-_Sx4qXRpUSuSwShSNV-z4ET221993O-q8KoTqUUv0pJXn46AmJda_kf5P_N30DNrJhDg</recordid><startdate>20231201</startdate><enddate>20231201</enddate><creator>Grabovoy, A. V.</creator><creator>Kaprielova, M. S.</creator><creator>Kildyakov, A. S.</creator><creator>Potyashin, I. O.</creator><creator>Seyil, T. B.</creator><creator>Finogeev, E. L.</creator><creator>Chekhovich, Yu. V.</creator><general>Pleiades Publishing</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20231201</creationdate><title>Text Reuse Detection in Handwritten Documents</title><author>Grabovoy, A. V. ; Kaprielova, M. S. 
; Kildyakov, A. S. ; Potyashin, I. O. ; Seyil, T. B. ; Finogeev, E. L. ; Chekhovich, Yu. V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c268t-8f41e20387a1d3400b256b895fbe33a197acd55aa00895c43e9468e6435a1593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Documents</topic><topic>Education</topic><topic>Handwriting recognition</topic><topic>Mathematics</topic><topic>Mathematics and Statistics</topic><topic>Pictures</topic><topic>Searching</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Grabovoy, A. V.</creatorcontrib><creatorcontrib>Kaprielova, M. S.</creatorcontrib><creatorcontrib>Kildyakov, A. S.</creatorcontrib><creatorcontrib>Potyashin, I. O.</creatorcontrib><creatorcontrib>Seyil, T. B.</creatorcontrib><creatorcontrib>Finogeev, E. L.</creatorcontrib><creatorcontrib>Chekhovich, Yu. V.</creatorcontrib><collection>CrossRef</collection><jtitle>Doklady. Mathematics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Grabovoy, A. V.</au><au>Kaprielova, M. S.</au><au>Kildyakov, A. S.</au><au>Potyashin, I. O.</au><au>Seyil, T. B.</au><au>Finogeev, E. L.</au><au>Chekhovich, Yu. V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Text Reuse Detection in Handwritten Documents</atitle><jtitle>Doklady. Mathematics</jtitle><stitle>Dokl. Math</stitle><date>2023-12-01</date><risdate>2023</risdate><volume>108</volume><issue>Suppl 2</issue><spage>S424</spage><epage>S433</epage><pages>S424-S433</pages><issn>1064-5624</issn><eissn>1531-8362</eissn><abstract>Plagiarism detection in scholar assignments becomes more and more relevant nowadays. 
Rapidly growing popularity of online education, active expansion of online educational platforms for secondary and high school education create demand for development of an automatic reuse detection system for handwritten assignments. The existing approaches to this problem are not usable for searching for potential sources of reuse on large collections, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach that allows detecting text reuse in handwritten documents. Each document is a picture and the search is performed on a large collection of potential sources. The proposed method consists of three stages: handwritten text recognition, candidate search and precise source retrieval. We represent experimental results for the quality and latency estimation of our system. The recall reaches 83.3% in case of better quality pictures and 77.4% in case of pictures of lower quality. The average search time is 3.2 s per document on CPU. The results show that the created system is scalable and can be used in production, where fast reuse detection for hundreds of thousands of scholar assignments on large collection of potential reuse sources is needed. All the experiments were held on HWR200 public dataset.</abstract><cop>Moscow</cop><pub>Pleiades Publishing</pub><doi>10.1134/S106456242370120X</doi></addata></record>
fulltext fulltext
identifier ISSN: 1064-5624
ispartof Doklady. Mathematics, 2023-12, Vol.108 (Suppl 2), p.S424-S433
issn 1064-5624
1531-8362
language eng
recordid cdi_proquest_journals_2985941247
source SpringerNature Journals
subjects Documents
Education
Handwriting recognition
Mathematics
Mathematics and Statistics
Pictures
Searching
title Text Reuse Detection in Handwritten Documents
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T20%3A51%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text%20Reuse%20Detection%20in%20Handwritten%20Documents&rft.jtitle=Doklady.%20Mathematics&rft.au=Grabovoy,%20A.%20V.&rft.date=2023-12-01&rft.volume=108&rft.issue=Suppl%202&rft.spage=S424&rft.epage=S433&rft.pages=S424-S433&rft.issn=1064-5624&rft.eissn=1531-8362&rft_id=info:doi/10.1134/S106456242370120X&rft_dat=%3Cproquest_cross%3E2985941247%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2985941247&rft_id=info:pmid/&rfr_iscdi=true