Sentence alignment in DPC: maximizing precision, minimizing human effort

A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus where all sentences are aligned at very high precision with a minimal human effort involved. The experiments on a combina...

Full description

Saved in:
Bibliographic details
Main authors: Trushkina, Julia; Macken, Lieve; Paulussen, Hans
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Trushkina, Julia
Macken, Lieve
Paulussen, Hans
description A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus where all sentences are aligned at very high precision with a minimal human effort involved. The experiments on a combination of sentence aligners with different underlying algorithms described in this paper showed that by verifying only those links which were not recognized by at least two aligners, an error rate can be reduced by 93.76% as compared to the performance of the best aligner. Such manual involvement concerned only a small portion of all data (6%). This significantly reduces a load of manual work necessary to achieve nearly 100% accuracy of alignment.
format Conference Proceeding
fullrecord <record><control><sourceid>ghent_ADGLB</sourceid><recordid>TN_cdi_ghent_librecat_oai_archive_ugent_be_437761</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>oai_archive_ugent_be_437761</sourcerecordid><originalsourceid>FETCH-ghent_librecat_oai_archive_ugent_be_4377613</originalsourceid><addsrcrecordid>eNrjZPAITs0rSc1LTlVIzMlMz8sF8hQy8xRcApytFHITKzJzM6sy89IVCopSkzOLM_PzdBRyM_NgohmluYl5CqlpaflFJTwMrGmJOcWpvFCam8HQzTXE2UM3PQNoZnxOZhLQiMSS-PzEzPjEouSMzLLU-NJ0kFRSaryJsbm5maExOXoAwDFDMA</addsrcrecordid><sourcetype>Institutional Repository</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Sentence alignment in DPC: maximizing precision, minimizing human effort</title><source>Ghent University Academic Bibliography</source><creator>Trushkina, Julia ; Macken, Lieve ; Paulussen, Hans</creator><creatorcontrib>Trushkina, Julia ; Macken, Lieve ; Paulussen, Hans</creatorcontrib><description>A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus where all sentences are aligned at very high precision with a minimal human effort involved. The experiments on a combination of sentence aligners with different underlying algorithms described in this paper showed that by verifying only those links which were not recognized by at least two aligners, an error rate can be reduced by 93.76% as compared to the performance of the best aligner. Such manual involvement concerned only a small portion of all data (6%). 
This significantly reduces a load of manual work necessary to achieve nearly 100% accuracy of alignment.</description><language>eng</language><publisher>European Language Resources Association (ELRA)</publisher><subject>Languages and Literatures ; Sentence alignment</subject><creationdate>2008</creationdate><rights>No license (in copyright) info:eu-repo/semantics/openAccess</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,315,780,4050,27860</link.rule.ids><linktorsrc>$$Uhttp://hdl.handle.net/1854/LU-437761$$EView_record_in_Ghent_University$$FView_record_in_$$GGhent_University$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Trushkina, Julia</creatorcontrib><creatorcontrib>Macken, Lieve</creatorcontrib><creatorcontrib>Paulussen, Hans</creatorcontrib><title>Sentence alignment in DPC: maximizing precision, minimizing human effort</title><description>A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus where all sentences are aligned at very high precision with a minimal human effort involved. The experiments on a combination of sentence aligners with different underlying algorithms described in this paper showed that by verifying only those links which were not recognized by at least two aligners, an error rate can be reduced by 93.76% as compared to the performance of the best aligner. Such manual involvement concerned only a small portion of all data (6%). 
This significantly reduces a load of manual work necessary to achieve nearly 100% accuracy of alignment.</description><subject>Languages and Literatures</subject><subject>Sentence alignment</subject><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2008</creationdate><recordtype>conference_proceeding</recordtype><sourceid>ADGLB</sourceid><recordid>eNrjZPAITs0rSc1LTlVIzMlMz8sF8hQy8xRcApytFHITKzJzM6sy89IVCopSkzOLM_PzdBRyM_NgohmluYl5CqlpaflFJTwMrGmJOcWpvFCam8HQzTXE2UM3PQNoZnxOZhLQiMSS-PzEzPjEouSMzLLU-NJ0kFRSaryJsbm5maExOXoAwDFDMA</recordid><startdate>2008</startdate><enddate>2008</enddate><creator>Trushkina, Julia</creator><creator>Macken, Lieve</creator><creator>Paulussen, Hans</creator><general>European Language Resources Association (ELRA)</general><scope>ADGLB</scope></search><sort><creationdate>2008</creationdate><title>Sentence alignment in DPC: maximizing precision, minimizing human effort</title><author>Trushkina, Julia ; Macken, Lieve ; Paulussen, Hans</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-ghent_librecat_oai_archive_ugent_be_4377613</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Languages and Literatures</topic><topic>Sentence alignment</topic><toplevel>online_resources</toplevel><creatorcontrib>Trushkina, Julia</creatorcontrib><creatorcontrib>Macken, Lieve</creatorcontrib><creatorcontrib>Paulussen, Hans</creatorcontrib><collection>Ghent University Academic Bibliography</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Trushkina, Julia</au><au>Macken, Lieve</au><au>Paulussen, Hans</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Sentence alignment in DPC: maximizing precision, minimizing human effort</atitle><date>2008</date><risdate>2008</risdate><abstract>A 
wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus where all sentences are aligned at very high precision with a minimal human effort involved. The experiments on a combination of sentence aligners with different underlying algorithms described in this paper showed that by verifying only those links which were not recognized by at least two aligners, an error rate can be reduced by 93.76% as compared to the performance of the best aligner. Such manual involvement concerned only a small portion of all data (6%). This significantly reduces a load of manual work necessary to achieve nearly 100% accuracy of alignment.</abstract><pub>European Language Resources Association (ELRA)</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_ghent_librecat_oai_archive_ugent_be_437761
source Ghent University Academic Bibliography
subjects Languages and Literatures
Sentence alignment
title Sentence alignment in DPC: maximizing precision, minimizing human effort
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T13%3A23%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ghent_ADGLB&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Sentence%20alignment%20in%20DPC:%20maximizing%20precision,%20minimizing%20human%20effort&rft.au=Trushkina,%20Julia&rft.date=2008&rft_id=info:doi/&rft_dat=%3Cghent_ADGLB%3Eoai_archive_ugent_be_437761%3C/ghent_ADGLB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
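
The selection strategy summarized in the description field (manually verify only those links that were not recognized by at least two aligners) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not say how links are represented or how many aligners were combined, so the link encoding, the function name `split_by_agreement`, the `min_votes` parameter, and the three toy aligner outputs are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed encoding: a link pairs a tuple of source sentence indices
# with a tuple of target sentence indices, e.g. ((1,), (1, 2)) for a 1:2 link.
Link = Tuple[Tuple[int, ...], Tuple[int, ...]]


def split_by_agreement(
    aligner_outputs: Iterable[Iterable[Link]], min_votes: int = 2
) -> Tuple[Set[Link], Set[Link]]:
    """Split proposed links into auto-accepted and to-be-verified sets.

    A link is accepted when at least `min_votes` aligners proposed it;
    every other proposed link is queued for manual verification.
    """
    votes: Counter = Counter()
    for links in aligner_outputs:
        # Deduplicate per aligner so each aligner contributes one vote per link.
        votes.update(set(links))
    accepted = {link for link, n in votes.items() if n >= min_votes}
    to_verify = {link for link, n in votes.items() if n < min_votes}
    return accepted, to_verify


# Hypothetical outputs of three aligners on the same short document pair.
aligner_a = [((0,), (0,)), ((1,), (1, 2)), ((2,), (3,))]
aligner_b = [((0,), (0,)), ((1,), (1,)), ((2,), (3,))]
aligner_c = [((0,), (0,)), ((1,), (1, 2)), ((2,), (2,))]

accepted, to_verify = split_by_agreement([aligner_a, aligner_b, aligner_c])
print("accepted automatically:", sorted(accepted))
print("needs manual check:", sorted(to_verify))
```

In this toy input, links proposed by two or three aligners are accepted and the remaining ones are queued for a human. The figures quoted in the abstract (6% of the data checked manually, error rate reduced by 93.76%) come from the paper's own experiments and are not reproduced by this sketch.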