Effectiveness of Deep Networks in NLP using BiDAF as an example architecture
creator | Sarkar, Soumyendu |
description | Question answering with NLP has progressed through the evolution of advanced
model architectures such as BERT and BiDAF, building on earlier word-, character-,
and context-based embeddings. As BERT has leapfrogged the accuracy of earlier
models, one element of the next frontier may be deeper networks and effective
ways to train them. In this context, I explored the effectiveness of deep
networks, focusing on the model encoder layer of BiDAF. With its heterogeneous
layers, BiDAF offers the opportunity not only to explore the effectiveness of
deep networks but also to evaluate whether refinements made in the lower layers
are additive to refinements made in the upper layers of the architecture. I
believe the next great NLP model will fold solid language modeling, as in BERT,
into a composite, more extensively layered architecture that adds further
refinements on top of generic language modeling. I experimented with the Bypass
network, Residual Highway network, and DenseNet architectures, and I evaluated
the effectiveness of ensembling the last few layers of the network (illustrative
sketches of these techniques follow this field). I also studied the difference
character embeddings make when added to the word embeddings, and whether their
effect is additive with that of deep networks. My studies indicate that deep
networks are in fact effective in boosting accuracy, and that refinements in
lower layers such as embeddings carry over additively to the gains made through
deep networks. |
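The abstract names Bypass, Residual Highway, and DenseNet variants of the encoder stack. As a point of reference, here is a minimal PyTorch sketch of a highway-style block with a learned gate between the transformed and the carried input; the class name and sizes are illustrative, not taken from the paper. A plain residual bypass can be read as the gate-free special case y = H(x) + x.

```python
# Hypothetical sketch of a highway-style bypass block; names and sizes are
# illustrative and do not come from the paper.
import torch
import torch.nn as nn

class HighwayBlock(nn.Module):
    """Computes y = g * H(x) + (1 - g) * x, letting gradients bypass H."""

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # candidate update H(x)
        self.gate = nn.Linear(dim, dim)       # gate T(x): transform vs. carry

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.transform(x))
        g = torch.sigmoid(self.gate(x))
        # The (1 - g) * x carry path keeps deep stacks trainable.
        return g * h + (1.0 - g) * x
```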
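DenseNet-style stacking, by contrast, feeds each layer the concatenation of the input and everything computed before it, so earlier features stay directly reachable in a deep encoder. A minimal sketch under the assumption of a fixed per-layer width `dim`, again with hypothetical names:

```python
# Hypothetical DenseNet-style stack over an encoder's hidden states;
# a sketch assuming a fixed per-layer width `dim`, not the paper's code.
import torch
import torch.nn as nn

class DenseStack(nn.Module):
    """Layer i receives the concatenation of the input and all earlier outputs."""

    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        # Layer i consumes dim * (i + 1) features and emits dim new ones.
        self.layers = nn.ModuleList(
            nn.Linear(dim * (i + 1), dim) for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(torch.relu(layer(torch.cat(feats, dim=-1))))
        return feats[-1]  # or torch.cat(feats, dim=-1) for the full dense output
```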
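The abstract also reports ensembling the last few layers of the network. One plausible reading of that idea, sketched below as an assumption rather than the paper's method, is to average the predictions (for example, answer-span logits) emitted by the final k layers of the stack:

```python
# Hypothetical reading of "ensembling the last few layers": average the span
# logits produced by the final k layers. Not the paper's code.
from typing import List

import torch

def ensemble_last_layers(layer_logits: List[torch.Tensor], k: int = 3) -> torch.Tensor:
    """Average predictions from the last k layers (each tensor: batch x seq)."""
    return torch.stack(layer_logits[-k:], dim=0).mean(dim=0)
```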
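Finally, on adding character embeddings to word embeddings: in BiDAF-style models this is commonly a CNN over the characters of each word, max-pooled and concatenated with the word vector. The sketch below follows that convention; the class name, vocabulary sizes, and dimensions are placeholders, not the paper's configuration.

```python
# Hypothetical BiDAF-style embedding layer: a CNN over characters, max-pooled
# per word, concatenated with the word embedding. Sizes are placeholders.
import torch
import torch.nn as nn

class CharWordEmbedding(nn.Module):
    def __init__(self, word_vocab: int, char_vocab: int,
                 word_dim: int = 300, char_dim: int = 64):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_dim, kernel_size=5, padding=2)

    def forward(self, word_ids: torch.Tensor, char_ids: torch.Tensor) -> torch.Tensor:
        # word_ids: (batch, seq); char_ids: (batch, seq, max_word_len)
        w = self.word_emb(word_ids)                               # (batch, seq, word_dim)
        b, s, n = char_ids.shape
        c = self.char_emb(char_ids.view(b * s, n))                # (b*s, n, char_dim)
        c = self.char_cnn(c.transpose(1, 2)).max(dim=-1).values  # (b*s, char_dim)
        return torch.cat([w, c.view(b, s, -1)], dim=-1)           # word_dim + char_dim
```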
doi | 10.48550/arxiv.2109.00074 |
format | Article |
creationdate | 2021-08-31 |
rights | http://creativecommons.org/licenses/by/4.0 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Computation and Language; Computer Science - Learning |
title | Effectiveness of Deep Networks in NLP using BiDAF as an example architecture |
url | https://arxiv.org/abs/2109.00074 |