Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual similarity
Published in: Information Processing & Management, 2019-11, Vol. 56 (6), p. 102090, Article 102090
Authors: Tien, Nguyen Huy; Le, Nguyen Minh; Tomohiro, Yamasaki; Tatsuya, Izuha
Format: Article
Language: English
Online access: Full text
Highlights:
• Encoding sentences via multiple pre-trained word embeddings.
• Evaluating sentence pairs via multi-level comparison.
• The approach achieves strong performance on semantic textual similarity tasks.
• The approach does not rely on linguistic resources.
Abstract:
Recently, using pre-trained word embeddings to represent words has achieved success in many natural language processing tasks. Depending on their objective functions, different word embedding models capture different aspects of linguistic properties. The Semantic Textual Similarity task, which evaluates the similarity or relation between two sentences, requires taking these linguistic aspects into account. This research therefore aims to encode the characteristics of multiple sets of word embeddings into one embedding, and then to learn the similarity/relation between sentences via this novel embedding. Representing each word by multiple word embeddings, the proposed MaxLSTM-CNN encoder generates a novel sentence embedding. We then learn the similarity/relation between our sentence embeddings via multi-level comparison. Our method, M-MaxLSTM-CNN, consistently shows strong performance on several tasks (i.e., measuring textual similarity, identifying paraphrases, and recognizing textual entailment). Our model uses no hand-crafted features (e.g., alignment features, n-gram overlaps, dependency features) and does not require the pre-trained word embeddings to have the same dimension.
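To make the architecture the abstract describes concrete, below is a minimal PyTorch sketch of the general idea: each sentence is looked up in several pre-trained embedding sets (which may have different dimensions), the views are fused, a max-pooled BiLSTM plus a convolution produce one sentence embedding, and two sentence embeddings are then compared element-wise at several levels. All class and function names, layer sizes, and fusion details here are illustrative assumptions, not the authors' published implementation.

```python
# A minimal sketch of the multi-embedding encoder idea from the abstract.
# Names, dimensions, and pooling choices are assumptions; the authors'
# MaxLSTM-CNN may differ in its exact scheme.
import torch
import torch.nn as nn


class MultiViewSentenceEncoder(nn.Module):
    """Fuse several pre-trained embedding views of a sentence into one vector."""

    def __init__(self, embed_dims, hidden=128, conv_channels=64):
        super().__init__()
        # One projection per embedding set, so the sets need not share a
        # dimension (a property the abstract explicitly claims).
        self.projections = nn.ModuleList(
            [nn.Linear(d, hidden) for d in embed_dims]
        )
        self.lstm = nn.LSTM(
            input_size=hidden * len(embed_dims),
            hidden_size=hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.conv = nn.Conv1d(2 * hidden, conv_channels, kernel_size=3, padding=1)

    def forward(self, views):
        # views: one (batch, seq_len, dim_i) tensor per pre-trained
        # embedding set, already looked up for the same token sequence.
        fused = torch.cat(
            [proj(v) for proj, v in zip(self.projections, views)], dim=-1
        )
        states, _ = self.lstm(fused)               # (batch, seq, 2 * hidden)
        local = self.conv(states.transpose(1, 2))  # (batch, channels, seq)
        return local.max(dim=2).values             # max-pool over time


def multilevel_comparison(u, v):
    # One plausible reading of "multi-level comparison": stack several
    # element-wise views of a sentence-embedding pair for a downstream
    # similarity regressor/classifier.
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)


# Example: two hypothetical embedding sets with different dimensions.
encoder = MultiViewSentenceEncoder(embed_dims=[300, 100])
sent_a = [torch.randn(2, 7, 300), torch.randn(2, 7, 100)]  # batch=2, len=7
sent_b = [torch.randn(2, 9, 300), torch.randn(2, 9, 100)]  # batch=2, len=9
features = multilevel_comparison(encoder(sent_a), encoder(sent_b))
print(features.shape)  # torch.Size([2, 256])
```

The per-set projections are the design point that matches the abstract's claim: because each embedding set gets its own linear map into a shared space, GloVe-sized and word2vec-sized vectors (for example) can be combined without requiring equal dimensions.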
DOI: 10.1016/j.ipm.2019.102090
ISSN: 0306-4573
EISSN: 1873-5371
Publisher: Elsevier Ltd, Oxford
Source: Elsevier ScienceDirect Journals Complete
Subjects: Coders; Dependence; Embedding; Multi-level comparison; Multiple word embeddings; Natural language processing; Semantic; Semantics; Sentence embedding; Sentences; Similarity; Similarity measures; Words (language)