Knowledge Distillation in Document Retrieval
Complex deep learning models now achieve state-of-the-art performance on many document retrieval tasks. The best models process the query or claim jointly with the document. However, for fast, scalable search it is desirable to have document embeddings that are independent of the claim. In this paper we show that knowledge distillation can be used to encourage a model that generates claim-independent document encodings to mimic the behavior of a more complex model that generates claim-dependent encodings. We explore this approach in document retrieval for a fact extraction and verification task. We show that by using the soft labels from a complex cross-attention teacher model, the performance of claim-independent student LSTM or CNN models improves across all ranking metrics. The student models we use are 12x faster at runtime and 20x smaller in number of parameters than the teacher.
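As a rough illustration of the approach the abstract describes, here is a minimal sketch, in PyTorch, of soft-label distillation for retrieval: a claim-independent student is trained to match the softened relevance scores of a cross-attention teacher over a list of candidate documents. The dot-product scoring, tensor shapes, and temperature are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch under the assumptions noted above; not the paper's actual training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, temperature=2.0):
    """KL divergence between softened teacher and student relevance
    distributions over the candidate documents for one claim."""
    soft_teacher = F.softmax(teacher_scores / temperature, dim=-1)
    log_student = F.log_softmax(student_scores / temperature, dim=-1)
    # The usual T^2 factor keeps gradient magnitudes comparable to a hard-label loss.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: the student scores documents from claim-independent encodings
# (a dot product here), while teacher_scores stand in for the soft labels
# produced by the more expensive cross-attention teacher.
claim_emb = torch.randn(1, 128, requires_grad=True)  # student claim encoding
doc_embs = torch.randn(20, 128, requires_grad=True)  # claim-independent document encodings
student_scores = claim_emb @ doc_embs.T              # (1, 20) relevance scores
teacher_scores = torch.randn(1, 20)                  # placeholder teacher soft labels
loss = distillation_loss(student_scores, teacher_scores)
loss.backward()  # gradients reach only the student-side tensors
```

In the paper's setting the student would be an LSTM or CNN encoder and the soft labels would come from the cross-attention teacher; because the document encodings do not depend on the claim, they can be precomputed for fast, scalable search.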
Saved in:
Main authors: Shakeri, Siamak; Sethy, Abhinav; Cheng, Cheng
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language; Computer Science - Information Retrieval; Computer Science - Learning
Online access: Order full text
creator | Shakeri, Siamak; Sethy, Abhinav; Cheng, Cheng |
doi | 10.48550/arxiv.1911.11065 |
format | Article |
creationdate | 2019-11-11 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
identifier | DOI: 10.48550/arxiv.1911.11065 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Computation and Language; Computer Science - Information Retrieval; Computer Science - Learning |
title | Knowledge Distillation in Document Retrieval |