Method for developing a classifier for classifying communications

Bibliographic details
Main authors: NIGAM KAMAL P, STOCKTON ROBERT G
Format: Patent
Language: English
Description
Summary: A computer-assisted/implemented method for developing a classifier for classifying communications includes roughly four stages, which are designed to be iterative: (1) a stage defining where and how to harvest messages (i.e., from Internet message boards, newsgroups and the like), which also defines an expected domain of application for the classifier; (2) a guided question/answering stage in which the computerized tool elicits the user's criteria for determining whether a message is relevant or irrelevant; (3) a labeling stage where the user examines carefully selected messages and provides feedback about whether or not each is relevant and sometimes also what elements of the criteria were used to make the decision; and (4) a performance evaluation stage where parameters of the classifier training are optimized, the best classifier is produced, and known performance bounds are calculated. In the guided question/answering stage, the criteria are parameterized in such a way that (a) they can be operationalized into the text classifier through key words and phrases, and (b) human-readable criteria can be produced, which can be reviewed and edited. The labeling phase is oriented toward an extended Active Learning framework. That is, the exemplary embodiment decides which example messages to show the user based upon what category of messages the system thinks would be most useful to the Active Learning process.
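
The labeling stage summarized above can be illustrated with a minimal sketch. It is not the patented method: the summary does not name a classifier type or a selection rule, so the sketch assumes a small naive Bayes model seeded from key phrases and substitutes plain uncertainty sampling for the "extended" Active Learning framework. All names (`KeywordNaiveBayes`, `most_uncertain`, `label_with_active_learning`, `ask_user`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class KeywordNaiveBayes:
    """Tiny two-class naive Bayes with add-one smoothing (an assumed model choice)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def add(self, text, relevant):
        """Record one labeled message, or one key phrase taken from the criteria."""
        self.word_counts[relevant].update(tokenize(text))
        self.doc_counts[relevant] += 1

    def prob_relevant(self, text):
        """Return the model's estimate of P(relevant | text)."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        n_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (True, False):
            total = sum(self.word_counts[label].values())
            # Smoothed class prior, then smoothed per-word likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(vocab) + 1))
            log_score[label] = score
        # Clamp the log-odds so math.exp cannot overflow on long messages.
        diff = max(-700.0, min(700.0, log_score[False] - log_score[True]))
        return 1.0 / (1.0 + math.exp(diff))


def most_uncertain(model, pool):
    """Uncertainty sampling: pick the message whose score is closest to 0.5."""
    return min(pool, key=lambda msg: abs(model.prob_relevant(msg) - 0.5))


def label_with_active_learning(model, pool, ask_user, rounds):
    """Repeatedly show the user the most uncertain message and learn from the answer.

    ask_user is a hypothetical callback returning True (relevant) or False.
    """
    pool = list(pool)
    for _ in range(min(rounds, len(pool))):
        msg = most_uncertain(model, pool)
        pool.remove(msg)
        model.add(msg, ask_user(msg))
    return model
```

In this sketch the key words and phrases elicited in stage (2) would be passed to `add` before any message is labeled, which gives the model a starting point for choosing which messages to show first.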