A Generative Model for Punctuation in Dependency Trees
Treebanks traditionally treat punctuation marks as ordinary words, but linguists have suggested that a tree's "true" punctuation marks are not observed (Nunberg, 1990). These latent "underlying" marks serve to delimit or separate constituents in the syntax tree. When the tre...
Saved in:
Main Authors: | Li, Xiang Lisa; Wang, Dingquan; Eisner, Jason |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language; Computer Science - Learning |
Online Access: | Request full text |
creator | Li, Xiang Lisa; Wang, Dingquan; Eisner, Jason |
description | Treebanks traditionally treat punctuation marks as ordinary words, but linguists have suggested that a tree's "true" punctuation marks are not observed (Nunberg, 1990). These latent "underlying" marks serve to delimit or separate constituents in the syntax tree. When the tree's yield is rendered as a written sentence, a string rewriting mechanism transduces the underlying marks into "surface" marks, which are part of the observed (surface) string but should not be regarded as part of the tree. We formalize this idea in a generative model of punctuation that admits efficient dynamic programming. We train it without observing the underlying marks, by locally maximizing the incomplete data likelihood (similarly to EM). When we use the trained model to reconstruct the tree's underlying punctuation, the results appear plausible across 5 languages, and in particular, are consistent with Nunberg's analysis of English. We show that our generative model can be used to beat baselines on punctuation restoration. Also, our reconstruction of a sentence's underlying punctuation lets us appropriately render the surface punctuation (via our trained underlying-to-surface mechanism) when we syntactically transform the sentence. |
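The underlying-to-surface transduction described in the abstract can be pictured with a small sketch. This is not the paper's model or code; it is a toy illustration, under the assumption of a simple "absorption" rule in the spirit of Nunberg (1990), where latent marks from different constituents compete for the same string position and the stronger mark surfaces. The `STRENGTH` ranking and both function names are hypothetical.

```python
# Toy sketch (assumption: NOT the paper's actual model) of the
# underlying-to-surface idea: constituents attach latent marks at their
# edges, and when several marks compete for one string slot, the
# "stronger" mark absorbs the rest.

STRENGTH = {",": 1, ";": 2, ".": 3}  # hypothetical absorption ranking

def collapse(slot_marks):
    """Collapse all underlying marks competing at one slot into the
    single observed surface mark: the strongest absorbs the others."""
    if not slot_marks:
        return ""
    return max(slot_marks, key=lambda m: STRENGTH[m])

def render(words, slots):
    """words: n tokens; slots[i]: underlying marks falling right after
    words[i]. Returns the observed surface string."""
    return " ".join(w + collapse(s) for w, s in zip(words, slots))

# Sentence-final appositive: the appositive's closing comma and the
# sentence's period both land after "doctor"; only "." surfaces.
words = ["I", "met", "my", "friend", "a", "doctor"]
slots = [[], [], [], [","], [], [",", "."]]
print(render(words, slots))  # → I met my friend, a doctor.
```

The paper instead learns such rewrite behavior from data, without ever observing the underlying marks; the sketch only shows why the surface string can contain fewer marks than the tree posits.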
doi_str_mv | 10.48550/arxiv.1906.11298 |
format | Article |
creationdate | 2019-06-26 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1906.11298 |
language | eng |
recordid | cdi_arxiv_primary_1906_11298 |
source | arXiv.org |
subjects | Computer Science - Computation and Language; Computer Science - Learning |
title | A Generative Model for Punctuation in Dependency Trees |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T23%3A42%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Generative%20Model%20for%20Punctuation%20in%20Dependency%20Trees&rft.au=Li,%20Xiang%20Lisa&rft.date=2019-06-26&rft_id=info:doi/10.48550/arxiv.1906.11298&rft_dat=%3Carxiv_GOX%3E1906_11298%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |