An LP-based hyperparameter optimization model for language modeling
To find hyperparameters for a machine learning model, algorithms such as grid search or random search are run over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is...
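The preview cuts off here; the full description further down explains that perplexity is the cost function being minimized. As a minimal, self-contained sketch (not taken from the paper), perplexity is the exponential of the average negative log-likelihood a model assigns to the test tokens:

```python
# Minimal sketch (not from the paper): perplexity is the exponential of the
# average negative log-likelihood the model assigns to each test token.
import math

def perplexity(token_log_probs):
    """token_log_probs: natural-log probabilities the model assigned to
    each observed test token; lower perplexity means a better model."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 0.25 to each of four tokens has perplexity 4.
print(perplexity([math.log(0.25)] * 4))  # -> 4.0
```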
Saved in:
Published in: | arXiv.org 2018-03 |
---|---|
Main authors: | Amir Hossein Akhavan Rahnama, Mehdi Toloo, Nezer Jacob Zaidenberg |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Cost function; Linear programming; Machine learning; Modelling; Nonlinear programming; Optimization; Search algorithms |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Amir Hossein Akhavan Rahnama; Toloo, Mehdi; Nezer Jacob Zaidenberg |
description | To find hyperparameters for a machine learning model, algorithms such as grid search or random search are run over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach. |
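The record does not reproduce the paper's actual formulation, so the following is only a generic sketch of the kind of reduction the description mentions: the Charnes-Cooper transformation, which rewrites a linear-fractional program as an ordinary LP that a simplex-based solver can handle. All variable names, matrices, and toy numbers below are illustrative assumptions, and `scipy.optimize.linprog` merely stands in for any LP solver.

```python
# Hypothetical sketch, NOT the paper's model: Charnes-Cooper turns the
# linear-fractional program
#   min (c@x + alpha) / (d@x + beta)  s.t.  A@x <= b, x >= 0
# (denominator assumed positive on the feasible set) into an LP by
# substituting t = 1 / (d@x + beta) and y = t * x.
import numpy as np
from scipy.optimize import linprog

def solve_linear_fractional(c, alpha, d, beta, A, b):
    n = len(c)
    obj = np.append(c, alpha)                     # objective: c@y + alpha*t
    A_ub = np.hstack([A, -b.reshape(-1, 1)])      # A@y - b*t <= 0
    b_ub = np.zeros(A.shape[0])
    A_eq = np.append(d, beta).reshape(1, -1)      # normalization d@y + beta*t = 1
    b_eq = np.array([1.0])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    y, t = res.x[:n], res.x[n]
    return y / t, res.fun                         # recover x = y/t (t > 0)

# Toy example: minimize (x1 + 2*x2 + 1)/(x1 + x2 + 4) subject to x1 + x2 <= 3.
x, val = solve_linear_fractional(
    c=np.array([1.0, 2.0]), alpha=1.0,
    d=np.array([1.0, 1.0]), beta=4.0,
    A=np.array([[1.0, 1.0]]), b=np.array([3.0]))
print(x, val)  # optimum x = [0, 0] with ratio 0.25
```

How a perplexity objective maps onto such a fractional form is specific to the paper and is not recoverable from this record; the sketch only shows why a fractional model with this structure can be handed to a standard LP/simplex solver.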
doi_str_mv | 10.48550/arxiv.1803.10927 |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2018-03 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_1803_10927 |
source | arXiv.org; Free E-Journals |
subjects | Algorithms; Computer Science - Learning; Cost function; Linear programming; Machine learning; Mathematics - Optimization and Control; Modelling; Nonlinear programming; Optimization; Search algorithms; Statistics - Machine Learning |
title | An LP-based hyperparameter optimization model for language modeling |