KEMoS: A knowledge-enhanced multi-modal summarizing framework for Chinese online meetings

The demand for “online meetings” and “collaborative office work” keeps surging recently, producing an abundant amount of relevant data. How to provide participants with accurate and fast summarizing service has attracted extensive attention. Existing meeting summarizing models overlook the utilizati...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neural networks 2024-10, Vol.178, p.106417, Article 106417
Hauptverfasser: Qi, Peng, Sun, Yan, Yao, Muyan, Tao, Dan
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The demand for “online meetings” and “collaborative office work” keeps surging recently, producing an abundant amount of relevant data. How to provide participants with accurate and fast summarizing service has attracted extensive attention. Existing meeting summarizing models overlook the utilization of multi-modal information and the information offsetting during summarizing. In this paper, we develop a knowledge-enhanced multi-modal summarizing framework. Firstly, we construct a three-layer multi-modal meeting knowledge graph, including basic, knowledge, and multi-modal layer, to integrate meeting information thoroughly. Then, we raise a topic-based hierarchical clustering approach, which considers information entropy and difference simultaneously, to capture the semantic evolution of meetings. Next, we devise a multi-modal enhanced encoding strategy, including a sentence-level cross-modal encoder, a joint loss function, and a knowledge graph embedding module, to learn the meeting and topic-level presentations. Finally, when generating summaries, we design a topic-enhanced decoding strategy for the Transformer decoder which mitigates semantic offsetting with the aid of topic information. Extensive experiments show that our proposed work consistently outperforms state-of-the-art solutions on the Chinese meeting dataset, where the ROUGE-1, ROUGE-2, and ROUGE-L are 49.98%, 21.03%, and 32.03% respectively.
ISSN:0893-6080
1879-2782
1879-2782
DOI:10.1016/j.neunet.2024.106417