Domain-specific unstructured text retrieval
Retrieving from the Internet unstructured text related to a specified domain is described. Training data is accessed; the training data comprises unstructured text related to the specified domain. A first classifier is trained using features of the training data. It is used to classify unstructured...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Retrieving from the Internet unstructured text related to a specified domain is described. Training data is accessed; the training data comprises unstructured text related to the specified domain. A first classifier is trained using features of the training data. It is used to classify unstructured text having plurality of features, to obtain unstructured text examples related to the domain. The unstructured text examples are used to retrieve from the Internet similar examples which do not have at least some of the plurality of features. Optionally, a second classifier is trained using the similar examples. Additional unstructured text is retrieved from the Internet and the second classifier is used to label the additional unstructured text for domain relevance. |
---|