Mirroring Vector Space Embedding for New Words

Most embedding models used in natural language processing require retraining of the entire model to obtain the embedding of a new word. Because the amount of training data grows with every retraining cycle, retraining the entire model whenever new words emerge is very inefficient. Moreover, since a language contains a huge number of words and its characteristics change continuously over time, it is not feasible to embed every word in advance. To solve both problems, we propose a new embedding model, the Mirroring Vector Space (MVS), which obtains the embedding of a new word from a previously built word embedding model without retraining it. The MVS model uses a convolutional neural network (CNN) structure and presents a novel strategy for obtaining word embeddings: it predicts a word's embedding by learning the vector space of an existing embedding model from explanations of the word. The model also provides flexibility toward external resources, reusability of training effort, and portability, in that it can be combined with any embedding model. We verify these three attributes, and the novelty of the approach, in our experiments.
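The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of the stated idea: a small CNN reads the frozen, pretrained embeddings of the words in a new word's explanation (e.g., a dictionary definition) and regresses a vector in the existing embedding space, so the base model is never retrained. Every concrete choice below (layer sizes, kernel widths, max-pooling, the cosine loss) is an assumption for illustration, not the paper's actual MVS architecture.

import torch
import torch.nn as nn

class MVSSketch(nn.Module):
    """Hypothetical stand-in for MVS: maps the pretrained embeddings of a
    definition's words to one vector in the same embedding space."""

    def __init__(self, emb_dim=300, n_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        # One 1-D convolution per kernel width over the definition sequence.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes
        )
        # Project the pooled features back into the base embedding space.
        self.proj = nn.Linear(n_filters * len(kernel_sizes), emb_dim)

    def forward(self, definition_vecs):
        # definition_vecs: (batch, def_len, emb_dim), looked up in the
        # frozen base model for each word of the explanation.
        x = definition_vecs.transpose(1, 2)         # (batch, emb_dim, def_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.proj(torch.cat(pooled, dim=1))  # (batch, emb_dim)

# Train on known words: input = vectors of the words in the definition,
# target = the word's existing embedding in the base model.
model = MVSSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
defs = torch.randn(32, 12, 300)   # placeholder for real definition lookups
targets = torch.randn(32, 300)    # placeholder for base-model embeddings
loss = 1 - nn.functional.cosine_similarity(model(defs), targets).mean()
loss.backward()
optimizer.step()

At inference time, the same forward pass places a genuinely new word into the frozen vector space, which is what allows the approach to skip full retraining.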

Bibliographic Details

Published in: IEEE Access, 2021, Vol. 9, pp. 99954-99967
Authors: Kim, Jihye; Jeong, Ok-Ran
Format: Article
Language: English
DOI: 10.1109/ACCESS.2021.3096238
ISSN/EISSN: 2169-3536
Publisher: IEEE, Piscataway
Subjects: Artificial intelligence; Artificial neural networks; Bit error rate; Data models; Deep learning; Dictionaries; Embedding; Hidden Markov models; Learning; Natural language processing; Neural networks; New words; Semantics; Task analysis; Training; Vector space; Word embedding
Online access: Full text