Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation
Query-specific article generation is the task of generating, given a search query, a single article that gives an overview of the topic. We envision such articles as an alternative to presenting a ranking of search results. While generative Large Language Models (LLMs) like ChatGPT also address this t...
Published in: | arXiv.org 2023-10 |
---|---|
Main authors: | Connor Lennox ; Kashyapi, Sumanta ; Dietz, Laura |
Format: | Article |
Language: | eng |
Subjects: | Clustering ; Large language models ; Queries |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Connor Lennox ; Kashyapi, Sumanta ; Dietz, Laura |
description | Query-specific article generation is the task of generating, given a search query, a single article that gives an overview of the topic. We envision such articles as an alternative to presenting a ranking of search results. While generative Large Language Models (LLMs) like ChatGPT also address this task, they are known to hallucinate new information, and their models are secret and hard to analyze and control. Some generative LLMs provide supporting references, yet these are often unrelated to the generated content. As an alternative, we propose to study article generation systems that integrate document retrieval, query-specific clustering, and summarization. By design, such models can provide actual citations as provenance for their generated text. In particular, we contribute an evaluation framework that allows each of these three components to be trained and evaluated separately before they are combined into one system. We experimentally demonstrate that a system composed of the best-performing individual components also obtains the best overall system quality in terms of F-1. |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
date | 2023-10-18 |
rights | 2023. This work is published under http://creativecommons.org/licenses/by-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2879442454 |
source | Freely Accessible Journals |
subjects | Clustering ; Large language models ; Queries |
title | Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T18%3A05%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Retrieve-Cluster-Summarize:%20An%20Alternative%20to%20End-to-End%20Training%20for%20Query-specific%20Article%20Generation&rft.jtitle=arXiv.org&rft.au=Connor%20Lennox&rft.date=2023-10-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2879442454%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2879442454&rft_id=info:pmid/&rfr_iscdi=true |
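
The description field names three stages (document retrieval, query-specific clustering, summarization) and notes that such a design can cite its sources. The following is only a minimal sketch of how those stages might chain together, assuming deliberately simple stand-ins: term-overlap retrieval, keyword-based subtopic assignment, and first-sentence extraction. None of these is taken from the paper; the function names, the toy corpus, and the `subtopics` dictionary are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(query, corpus, k=3):
    """Stand-in retriever (assumption): rank passages by query term overlap."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(text))), pid)
              for pid, text in corpus.items()]
    # Keep only matching passages; sort by score (descending), then by id.
    ranked = sorted((item for item in scored if item[0] > 0),
                    key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in ranked[:k]]


def cluster(passage_ids, corpus, subtopics):
    """Stand-in clustering (assumption): assign each passage to the
    subtopic whose keyword set it overlaps most."""
    clusters = defaultdict(list)
    for pid in passage_ids:
        terms = set(tokenize(corpus[pid]))
        best = max(subtopics, key=lambda name: len(terms & subtopics[name]))
        clusters[best].append(pid)
    return dict(clusters)


def summarize(clusters, corpus):
    """Stand-in summarizer (assumption): take the first sentence of each
    passage and append the passage id as a citation."""
    article = {}
    for name, pids in clusters.items():
        sentences = []
        for pid in pids:
            first = corpus[pid].split(". ")[0].rstrip(".")
            sentences.append(f"{first}. [{pid}]")
        article[name] = " ".join(sentences)
    return article


# Hypothetical toy data, for illustration only.
corpus = {
    "p1": "Coffee beans are roasted to develop flavor. Roasting time varies.",
    "p2": "Brewing coffee with a filter takes a few minutes.",
    "p3": "Dark roasted coffee tastes bitter. Some prefer it.",
    "p4": "Tea is steeped in hot water.",
}
subtopics = {
    "Roasting": {"roasted", "roasting"},
    "Brewing": {"brewing", "filter"},
}

retrieved = retrieve("coffee", corpus)
clusters = cluster(retrieved, corpus, subtopics)
article = summarize(clusters, corpus)

for section, text in article.items():
    print(f"{section}: {text}")
```

Because each stage is a separate function with plain inputs and outputs, any one of them could be swapped out or scored on its own before the three are combined, which is the kind of modularity the description attributes to the evaluation framework. Every bracketed id in the output points back to a retrieved passage, so the citations are traceable by construction.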