On The Persona-based Summarization of Domain-Specific Documents

ACL 2024 Findings (Association for Computational Linguistics) In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has...

Full description

Bibliographic details
Main authors:	Mullick, Ankan ; Bose, Sombit ; Saha, Rounak ; Bhowmick, Ayan Kumar ; Goyal, Pawan ; Ganguly, Niloy ; Dey, Prasenjit ; Kokku, Ravi
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language ; Computer Science - Information Retrieval

creator	Mullick, Ankan ; Bose, Sombit ; Saha, Rounak ; Bhowmick, Ayan Kumar ; Goyal, Pawan ; Ganguly, Niloy ; Dey, Prasenjit ; Kokku, Ravi
description	ACL 2024 Findings (Association for Computational Linguistics) In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has different requirements of information and hence their summarization. For example, in the healthcare domain, a persona-based (such as Doctor, Nurse, Patient etc.) approach is imperative to deliver targeted medical information efficiently. Persona-based summarization of domain-specific information by humans is a high cognitive load task and is generally not preferred. The summaries generated by two different humans have high variability and do not scale in cost and subject matter expertise as domains and personas grow. Further, AI-generated summaries using generic Large Language Models (LLMs) may not necessarily offer satisfactory accuracy for different domains unless they have been specifically trained on domain-specific data and can also be very expensive to use in day-to-day operations. Our contribution in this paper is two-fold: 1) We present an approach to efficiently fine-tune a domain-specific small foundation LLM using a healthcare corpus and also show that we can effectively evaluate the summarization quality using AI-based critiquing. 2) We further show that AI-based critiquing has good concordance with Human-based critiquing of the summaries. Hence, such AI-based pipelines to generate domain-specific persona-based summaries can be easily scaled to other domains such as legal, enterprise documents, education etc. in a very efficient and cost-effective manner.
doi_str_mv	10.48550/arxiv.2406.03986
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2406_03986</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2406_03986</sourcerecordid><originalsourceid>FETCH-LOGICAL-a676-bee2297a992909e4e33ec960aeb2342ac05e534e264c9fa7151b49098462405f3</originalsourceid><addsrcrecordid>eNotj8uKwjAYhbOZhTg-gCvzAumkubVZyeAdBAW7L38zf5iATaVVmfHprZfV4cDh8H2EjFOeqFxr_gXtX7gmQnGTcGlzMyDTXaTFL9I9tl0TgVXQ4Q89XOoa2nCDc2gibTydNzWEyA4ndMEH13d3qTGeu0_y4eHY4eidQ1IsF8Vszba71Wb2vWVgMsMqRCFsBtYKyy0qlBKdNRywElIJcFyjlgqFUc56yFKdVqpf5sr0sNrLIZm8bp8G5akNPd9_-TApnybyDmYjQmU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On The Persona-based Summarization of Domain-Specific Documents</title><source>arXiv.org</source><creator>Mullick, Ankan ; Bose, Sombit ; Saha, Rounak ; Bhowmick, Ayan Kumar ; Goyal, Pawan ; Ganguly, Niloy ; Dey, Prasenjit ; Kokku, Ravi</creator><creatorcontrib>Mullick, Ankan ; Bose, Sombit ; Saha, Rounak ; Bhowmick, Ayan Kumar ; Goyal, Pawan ; Ganguly, Niloy ; Dey, Prasenjit ; Kokku, Ravi</creatorcontrib><description>ACL 2024 Findings (Association for Computational Linguistics) In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has different requirements of information and hence their summarization. For example, in the healthcare domain, a persona-based (such as Doctor, Nurse, Patient etc.) approach is imperative to deliver targeted medical information efficiently. Persona-based summarization of domain-specific information by humans is a high cognitive load task and is generally not preferred. 
The summaries generated by two different humans have high variability and do not scale in cost and subject matter expertise as domains and personas grow. Further, AI-generated summaries using generic Large Language Models (LLMs) may not necessarily offer satisfactory accuracy for different domains unless they have been specifically trained on domain-specific data and can also be very expensive to use in day-to-day operations. Our contribution in this paper is two-fold: 1) We present an approach to efficiently fine-tune a domain-specific small foundation LLM using a healthcare corpus and also show that we can effectively evaluate the summarization quality using AI-based critiquing. 2) We further show that AI-based critiquing has good concordance with Human-based critiquing of the summaries. Hence, such AI-based pipelines to generate domain-specific persona-based summaries can be easily scaled to other domains such as legal, enterprise documents, education etc. in a very efficient and cost-effective manner.</description><identifier>DOI: 10.48550/arxiv.2406.03986</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Information Retrieval</subject><creationdate>2024-06</creationdate><rights>http://creativecommons.org/publicdomain/zero/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,778,883</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2406.03986$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2406.03986$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Mullick, Ankan</creatorcontrib><creatorcontrib>Bose, Sombit</creatorcontrib><creatorcontrib>Saha, 
Rounak</creatorcontrib><creatorcontrib>Bhowmick, Ayan Kumar</creatorcontrib><creatorcontrib>Goyal, Pawan</creatorcontrib><creatorcontrib>Ganguly, Niloy</creatorcontrib><creatorcontrib>Dey, Prasenjit</creatorcontrib><creatorcontrib>Kokku, Ravi</creatorcontrib><title>On The Persona-based Summarization of Domain-Specific Documents</title><description>ACL 2024 Findings (Association for Computational Linguistics) In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has different requirements of information and hence their summarization. For example, in the healthcare domain, a persona-based (such as Doctor, Nurse, Patient etc.) approach is imperative to deliver targeted medical information efficiently. Persona-based summarization of domain-specific information by humans is a high cognitive load task and is generally not preferred. The summaries generated by two different humans have high variability and do not scale in cost and subject matter expertise as domains and personas grow. Further, AI-generated summaries using generic Large Language Models (LLMs) may not necessarily offer satisfactory accuracy for different domains unless they have been specifically trained on domain-specific data and can also be very expensive to use in day-to-day operations. Our contribution in this paper is two-fold: 1) We present an approach to efficiently fine-tune a domain-specific small foundation LLM using a healthcare corpus and also show that we can effectively evaluate the summarization quality using AI-based critiquing. 2) We further show that AI-based critiquing has good concordance with Human-based critiquing of the summaries. Hence, such AI-based pipelines to generate domain-specific persona-based summaries can be easily scaled to other domains such as legal, enterprise documents, education etc. 
in a very efficient and cost-effective manner.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Information Retrieval</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8uKwjAYhbOZhTg-gCvzAumkubVZyeAdBAW7L38zf5iATaVVmfHprZfV4cDh8H2EjFOeqFxr_gXtX7gmQnGTcGlzMyDTXaTFL9I9tl0TgVXQ4Q89XOoa2nCDc2gibTydNzWEyA4ndMEH13d3qTGeu0_y4eHY4eidQ1IsF8Vszba71Wb2vWVgMsMqRCFsBtYKyy0qlBKdNRywElIJcFyjlgqFUc56yFKdVqpf5sr0sNrLIZm8bp8G5akNPd9_-TApnybyDmYjQmU</recordid><startdate>20240606</startdate><enddate>20240606</enddate><creator>Mullick, Ankan</creator><creator>Bose, Sombit</creator><creator>Saha, Rounak</creator><creator>Bhowmick, Ayan Kumar</creator><creator>Goyal, Pawan</creator><creator>Ganguly, Niloy</creator><creator>Dey, Prasenjit</creator><creator>Kokku, Ravi</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240606</creationdate><title>On The Persona-based Summarization of Domain-Specific Documents</title><author>Mullick, Ankan ; Bose, Sombit ; Saha, Rounak ; Bhowmick, Ayan Kumar ; Goyal, Pawan ; Ganguly, Niloy ; Dey, Prasenjit ; Kokku, Ravi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a676-bee2297a992909e4e33ec960aeb2342ac05e534e264c9fa7151b49098462405f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Information Retrieval</topic><toplevel>online_resources</toplevel><creatorcontrib>Mullick, Ankan</creatorcontrib><creatorcontrib>Bose, Sombit</creatorcontrib><creatorcontrib>Saha, Rounak</creatorcontrib><creatorcontrib>Bhowmick, Ayan Kumar</creatorcontrib><creatorcontrib>Goyal, Pawan</creatorcontrib><creatorcontrib>Ganguly, Niloy</creatorcontrib><creatorcontrib>Dey, 
Prasenjit</creatorcontrib><creatorcontrib>Kokku, Ravi</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mullick, Ankan</au><au>Bose, Sombit</au><au>Saha, Rounak</au><au>Bhowmick, Ayan Kumar</au><au>Goyal, Pawan</au><au>Ganguly, Niloy</au><au>Dey, Prasenjit</au><au>Kokku, Ravi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On The Persona-based Summarization of Domain-Specific Documents</atitle><date>2024-06-06</date><risdate>2024</risdate><abstract>ACL 2024 Findings (Association for Computational Linguistics) In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has different requirements of information and hence their summarization. For example, in the healthcare domain, a persona-based (such as Doctor, Nurse, Patient etc.) approach is imperative to deliver targeted medical information efficiently. Persona-based summarization of domain-specific information by humans is a high cognitive load task and is generally not preferred. The summaries generated by two different humans have high variability and do not scale in cost and subject matter expertise as domains and personas grow. Further, AI-generated summaries using generic Large Language Models (LLMs) may not necessarily offer satisfactory accuracy for different domains unless they have been specifically trained on domain-specific data and can also be very expensive to use in day-to-day operations. 
Our contribution in this paper is two-fold: 1) We present an approach to efficiently fine-tune a domain-specific small foundation LLM using a healthcare corpus and also show that we can effectively evaluate the summarization quality using AI-based critiquing. 2) We further show that AI-based critiquing has good concordance with Human-based critiquing of the summaries. Hence, such AI-based pipelines to generate domain-specific persona-based summaries can be easily scaled to other domains such as legal, enterprise documents, education etc. in a very efficient and cost-effective manner.</abstract><doi>10.48550/arxiv.2406.03986</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2406.03986
language	eng
recordid	cdi_arxiv_primary_2406_03986
source	arXiv.org
subjects	Computer Science - Computation and Language ; Computer Science - Information Retrieval
title	On The Persona-based Summarization of Domain-Specific Documents
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T09%3A56%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20The%20Persona-based%20Summarization%20of%20Domain-Specific%20Documents&rft.au=Mullick,%20Ankan&rft.date=2024-06-06&rft_id=info:doi/10.48550/arxiv.2406.03986&rft_dat=%3Carxiv_GOX%3E2406_03986%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true