Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation

Query-specific article generation is the task of generating, given a search query, a single article that gives an overview of the topic. We envision such articles as an alternative to presenting a ranking of search results. While generative Large Language Models (LLMs) like ChatGPT also address this t...

Bibliographic Details
Published in: arXiv.org 2023-10
Main authors: Connor Lennox; Kashyapi, Sumanta; Dietz, Laura
Format: Article
Language: eng
Subjects: Clustering; Large language models; Queries
Online access: Full text
container_title arXiv.org
creator Connor Lennox
Kashyapi, Sumanta
Dietz, Laura
description Query-specific article generation is the task of generating, given a search query, a single article that gives an overview of the topic. We envision such articles as an alternative to presenting a ranking of search results. While generative Large Language Models (LLMs) like ChatGPT also address this task, they are known to hallucinate new information, and their models are secret and hard to analyze and control. Some generative LLMs provide supporting references, yet these are often unrelated to the generated content. As an alternative, we propose to study article generation systems that integrate document retrieval, query-specific clustering, and summarization. By design, such models can provide actual citations as provenance for their generated text. In particular, we contribute an evaluation framework that allows each of these three components to be trained and evaluated separately before they are combined into one system. We experimentally demonstrate that a system composed of the best-performing individual components also obtains the best overall system quality in terms of F-1.
format Article
creationdate 2023-10-18
publisher Ithaca: Cornell University Library, arXiv.org
rights 2023. This work is published under http://creativecommons.org/licenses/by-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
sourceid proquest
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-10
issn 2331-8422
language eng
recordid cdi_proquest_journals_2879442454
source Freely Accessible Journals
subjects Clustering
Large language models
Queries
title Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T18%3A05%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Retrieve-Cluster-Summarize:%20An%20Alternative%20to%20End-to-End%20Training%20for%20Query-specific%20Article%20Generation&rft.jtitle=arXiv.org&rft.au=Connor%20Lennox&rft.date=2023-10-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2879442454%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2879442454&rft_id=info:pmid/&rfr_iscdi=true