Meeting the challenge: A benchmark corpus for automated Urdu meeting summarization


Bibliographic details
Published in: Information processing & management 2024-07, Vol.61 (4), p.103734, Article 103734
Main authors: Sadia, Bareera; Adeeba, Farah; Shams, Sana; Javed, Kashif
Format: Article
Language: English
Online access: Full text
Description
Abstract: Meeting summarization has become crucial as the world gradually shifts towards remote work. Automating the generation of meeting summaries is needed to minimize the time and effort involved. The surge in online meetings has made summarization an indispensable requirement, yet summarizing Urdu meetings poses a formidable challenge due to the scarcity of pertinent corpora, and abstractive summarization of Urdu meetings compounds this challenge. This research addresses these gaps by introducing the Center for Language Engineering (CLE) Meeting Corpus, a benchmark resource tailored for meeting summarization in administrative and technical domains where Urdu is the primary language. Comprising 240 recorded meetings, encompassing both scenario-based and natural discussions, the corpus spans approximately 7900 min (∼132 h) of meeting duration. Beyond corpus creation, the study analyzes the performance of several deep learning models on Urdu abstractive meeting summarization. The models, including ur_mT5-small, ur_mT5-base, ur_mBART-large, ur_RoBERTa-urduhack-small, and GPT-3.5 with prompting, are evaluated using both automated metrics and manual assessments based on five specific criteria. This research not only addresses the immediate challenges of Urdu meeting summarization but also contributes to advancing the capabilities of meeting summarization systems in diverse organizational contexts where Urdu is the language of communication during meetings.
• Novel Corpus: Urdu CLE Meeting Corpus, benchmark for meeting summarization.
• Model Fine-Tuning: Optimized deep learning models for Urdu meeting summarization.
• Model Evaluation: Evaluated fine-tuned models for Urdu abstractive summaries.
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2024.103734