Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition

Sentence embedding is an important research topic in natural language processing. It is essential to generate a good embedding vector that fully reflects the semantic meaning of a sentence in order to achieve an enhanced performance for various natural language processing tasks, such as machine tran...

Detailed description

Bibliographic details
Published in:	arXiv.org 2018-09
Main authors:	Jang, Myeongjun; Kang, Pilsung
Format:	Article
Language:	English
Subjects:	Classification; Coherence; Data mining; Embedding; Machine translation; Natural language processing; Performance enhancement; Recognition; Semantics; Sentences; Sentiment analysis
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Jang, Myeongjun ; Kang, Pilsung
description	Sentence embedding is an important research topic in natural language processing. It is essential to generate a good embedding vector that fully reflects the semantic meaning of a sentence in order to achieve an enhanced performance for various natural language processing tasks, such as machine translation and document classification. Thus far, various sentence embedding models have been proposed, and their feasibility has been demonstrated through good performances on tasks following embedding, such as sentiment analysis and sentence classification. However, because the performances of sentence classification and sentiment analysis can be enhanced by using a simple sentence representation method, it is not sufficient to claim that these models fully reflect the meanings of sentences based on good performances for such tasks. In this paper, inspired by human language recognition, we propose the following concept of semantic coherence, which should be satisfied for a good sentence embedding method: similar sentences should be located close to each other in the embedding space. Then, we propose the Paraphrase-Thought (P-thought) model to pursue semantic coherence as much as possible. Experimental results on two paraphrase identification datasets (MS COCO and STS benchmark) show that the P-thought models outperform the benchmarked sentence embedding methods.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2092775684</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2092775684</sourcerecordid><sourcetype>Aggregation Database</sourcetype><recordtype>article</recordtype><pqid>2092775684</pqid></control><display><type>article</type><title>Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition</title><source>Free E- Journals</source><creator>Jang, Myeongjun ; Kang, Pilsung</creator><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><ispartof>arXiv.org, 2018-09</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa></display><addata><au>Jang, Myeongjun</au><au>Kang, Pilsung</au><atitle>Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition</atitle><jtitle>arXiv.org</jtitle><date>2018-09-12</date><eissn>2331-8422</eissn><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2018-09
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2092775684
source	Free E- Journals
subjects	Classification ; Coherence ; Data mining ; Embedding ; Machine translation ; Natural language processing ; Performance enhancement ; Recognition ; Semantics ; Sentences ; Sentiment analysis
title	Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T01%3A42%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Paraphrase%20Thought:%20Sentence%20Embedding%20Module%20Imitating%20Human%20Language%20Recognition&rft.jtitle=arXiv.org&rft.au=Jang,%20Myeongjun&rft.date=2018-09-12&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2092775684%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2092775684&rft_id=info:pmid/&rfr_iscdi=true