Supervised Paragraph Vector: Distributed Representations of Words, Documents and Class Labels

Bibliographic Details
Published in: IEEE Access, 2019, Vol. 7, pp. 29051-29064
Main authors: Park, Eunjeong L., Cho, Sungzoon, Kang, Pilsung
Format: Article
Language: English
Online Access: Full text
container_end_page 29064
container_issue
container_start_page 29051
container_title IEEE access
container_volume 7
creator Park, Eunjeong L.
Cho, Sungzoon
Kang, Pilsung
description While bag-of-words was the traditional method of deriving representations for documents, it suffered from high dimensionality and sparsity. Recently, many methods to obtain lower-dimensional, densely distributed representations have been proposed. Paragraph Vector is one such algorithm; it extends the word2vec algorithm by considering the paragraph as an additional word. However, it generates a single representation for all tasks, while different tasks may require different representations. In this paper, we propose Supervised Paragraph Vector, a task-specific variant of Paragraph Vector for situations where class labels exist. Essentially, Supervised Paragraph Vector uses class labels along with words and documents and obtains corresponding representations with respect to the particular classification task. To demonstrate the benefits of the proposed algorithm, three performance criteria are used: interpretability, discriminative power, and computational efficiency. To test interpretability, we find words that are close to and far from class vectors and demonstrate that such words are closely related to the corresponding class. We also use principal component analysis to visualize all words, documents, and class labels at the same time and show that our method effectively displays the related words and documents for each class label. To evaluate discriminative power and computational efficiency, we perform document classification on four commonly used datasets with various classifiers and achieve classification accuracies comparable to those of bag-of-words and Paragraph Vector.
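The description above explains the core idea: class labels are embedded jointly with words and documents, so label vectors can be inspected and visualized alongside them. As a rough illustration of that idea (not the authors' implementation of Supervised Paragraph Vector), the sketch below uses gensim's Doc2Vec and simply attaches each document's class label as an additional tag; the toy corpus, the "pos"/"neg" labels, and all hyperparameters are invented for demonstration.

# Hypothetical sketch: approximate the "class label as an extra token" idea with
# gensim's Doc2Vec by tagging each training document with both a document ID and
# its class label. Invented toy data; NOT the paper's actual algorithm.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    (["the", "plot", "was", "gripping", "and", "deeply", "moving"], "doc0", "pos"),
    (["a", "dull", "film", "with", "wooden", "acting"], "doc1", "neg"),
    (["great", "performances", "and", "a", "moving", "story"], "doc2", "pos"),
    (["boring", "dull", "and", "far", "too", "long"], "doc3", "neg"),
]

# Each document carries two tags: its own ID and its class label, so the label
# receives a trained vector in the same space as document and word vectors.
tagged = [TaggedDocument(words=w, tags=[doc_id, label]) for w, doc_id, label in corpus]

model = Doc2Vec(tagged, vector_size=50, window=3, min_count=1, epochs=100, dm=1)

# Interpretability check in the spirit of the abstract: which words lie closest
# to a class vector?
print(model.wv.similar_by_vector(model.dv["pos"], topn=5))

# Joint 2-D view of documents and class labels via PCA (visualization sketch).
from sklearn.decomposition import PCA
tags = ["doc0", "doc1", "doc2", "doc3", "pos", "neg"]
coords = PCA(n_components=2).fit_transform([model.dv[t] for t in tags])
for tag, (x, y) in zip(tags, coords):
    print(f"{tag}: ({x:.2f}, {y:.2f})")

Because the label shares one embedding space with documents and words, nearest-word queries against a label vector and a joint PCA projection become straightforward, which mirrors the interpretability checks described in the abstract.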
doi_str_mv 10.1109/ACCESS.2019.2901933
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2019, Vol.7, p.29051-29064
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2455608457
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Cats
Class label
Classification
Computational efficiency
Computational modeling
Computer architecture
Computing time
distributed representations
document embedding
Labels
Neural networks
Prediction algorithms
Principal components analysis
representation learning
Representations
Task analysis
Training
word embedding
title Supervised Paragraph Vector: Distributed Representations of Words, Documents and Class Labels
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-05T03%3A04%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Supervised%20Paragraph%20Vector:%20Distributed%20Representations%20of%20Words,%20Documents%20and%20Class%20Labels&rft.jtitle=IEEE%20access&rft.au=Park,%20Eunjeong%20L.&rft.date=2019&rft.volume=7&rft.spage=29051&rft.epage=29064&rft.pages=29051-29064&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2901933&rft_dat=%3Cproquest_cross%3E2455608457%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455608457&rft_id=info:pmid/&rft_ieee_id=8653834&rft_doaj_id=oai_doaj_org_article_9c771438c0fa4df1b506e18cbea44d4b&rfr_iscdi=true