Multi-document keyphrase extraction using partial mutual information

Bibliographic details
Main author: Surendran, Arungunram C
Format: Patent
Language: English
Description
Summary: A keyphrase extraction system and method are provided. The system and method can be employed to create an automatic summary of a subset of document(s). The system automatically extracts a list of keyword(s), can operate on multiple documents, and works across many different domains. The system is unsupervised and requires no prior learning. A term identifier identifies candidate terms (e.g., words and/or phrases) in the document subset, which are used to form a document-term matrix. A probability computation component calculates three probability values: (1) the joint probability of a term and a document, (2) the marginal probability of the term, and (3) the marginal probability of the document. Based on these probability values, a partial mutual information metric can be calculated for each candidate term. Based on that metric, one or more of the terms can be identified as summary keyphrases.
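
The abstract names the three probabilities but does not give the formula for the metric, so the following is only a minimal sketch under stated assumptions. It assumes the document-term matrix holds raw term counts over the whole collection, that probabilities are estimated by dividing counts by the total token count, and that the "partial" mutual information of a term is the usual mutual-information summand p(w,d) log(p(w,d) / (p(w) p(d))) summed over only the documents in the subset. The function names, the unigram-only candidate terms, and the pre-tokenized input are hypothetical choices made for illustration and are not taken from the patent.

```python
import math
from collections import Counter


def partial_mutual_information(docs, subset):
    """Score terms by a partial sum of mutual-information summands.

    docs   : list of token lists, one per document in the collection
    subset : indices of the documents to be summarized

    ASSUMPTION: the formula below is one plausible reading of the
    abstract, not the definition given in the patent itself.
    """
    # Document-term matrix stored sparsely: one Counter of term counts per document.
    counts = [Counter(tokens) for tokens in docs]
    total = sum(sum(c.values()) for c in counts)

    # Collection-wide count of each term, used for the marginal p(w).
    term_totals = Counter()
    for c in counts:
        term_totals.update(c)

    scores = {}
    for d in subset:
        p_d = sum(counts[d].values()) / total      # marginal probability of the document
        for term, n in counts[d].items():
            p_wd = n / total                       # joint probability of term and document
            p_w = term_totals[term] / total        # marginal probability of the term
            summand = p_wd * math.log(p_wd / (p_w * p_d))
            scores[term] = scores.get(term, 0.0) + summand
    return scores


def top_keyphrases(docs, subset, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = partial_mutual_information(docs, subset)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat".split(),
        "the cat chased the mouse".split(),
        "the stock market fell".split(),
        "the market rallied".split(),
    ]
    # Summarize the first two documents against the whole collection.
    for term, score in sorted(partial_mutual_information(corpus, [0, 1]).items()):
        print(f"{term}: {score:.4f}")
```

Under these assumptions, a term that is spread evenly over the whole collection (such as "the" in the toy corpus) contributes a summand near zero, while a term concentrated in the subset scores higher. No training data is involved, which matches the unsupervised character described in the summary.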