Using Particle Swarm Optimization to Improve the Precision and Recall of Taxonomy Extraction


Bibliographic details
Main authors: Syafrullah, M., Salim, N. B.
Format: Conference paper
Language: English
Description
Summary: The web offers a huge amount of information from which ontologies can be developed. The information available on the web can also be harnessed to help create correct ontologies from other documents. A taxonomy, or hierarchy of concepts, is one of the layers of an ontology. In this work, we present the use of particle swarm optimization (PSO) for the automatic acquisition of a concept hierarchy from text documents. First, all pairs of noun phrases or terms in a text document were extracted, and three different features were calculated for each pair. The first feature counts the number of times the pair occurs in is-a relationship patterns in a predefined corpus, relative to all is-a relations involving the first term in the corpus. The second feature counts the number of times the pair occurs in is-a relationship patterns in a subset of web documents containing the pair's is-a relationship patterns, relative to all is-a relations containing the first term in the subset. The third feature uses the number of paths between all senses of the two terms in WordNet, relative to all senses of the first term. Next, PSO was used to optimize the weight of each feature. Finally, the weights were multiplied by the feature values and summed to score every pair. The best-scoring pairs were selected as taxonomy candidates derived from the documents and compared with a gold-standard taxonomy developed previously with subject matter experts. The findings showed that, when each feature was used individually, the feature obtained from the web gave the best results. However, when the features were combined using weights optimized with PSO, the best result was achieved by the predefined-corpus and web features. Overall, the results for weights optimized using PSO are better than those using balanced weights for all features.
DOI: 10.1109/DASC.2011.49
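
The scoring and weight-optimization steps described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation. The term pairs, feature values, and gold standard below are made-up toy data. Using F1 over the top-k pairs as the fitness function, the cutoff `k`, and the PSO settings (a plain global-best PSO with an inertia weight) are all assumptions, since the record does not state them. The function names `score`, `f1_at_k`, and `pso` are hypothetical.

```python
import random

def score(features, weights):
    """Weighted sum of the feature values of one term pair."""
    return sum(w * f for w, f in zip(weights, features))

def f1_at_k(weights, pairs, gold, k):
    """F1 of the k best-scoring pairs against a gold-standard set (assumed fitness)."""
    ranked = sorted(pairs, key=lambda p: score(pairs[p], weights), reverse=True)
    selected = set(ranked[:k])
    hits = len(selected & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(selected)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

def pso(fitness, dim, n_particles=20, iters=50,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes `fitness` over weights in [0, 1]^dim."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_fit = [fitness(p) for p in pos]        # personal best fitness values
    g = max(range(n_particles), key=lambda i: best_fit[i])
    gbest, gbest_fit = best_pos[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp each weight to [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Toy data (invented for illustration): (first term, second term) mapped to
# (corpus feature, web feature, WordNet feature).
pairs = {
    ("dog", "animal"): (0.6, 0.8, 0.5),
    ("cat", "animal"): (0.5, 0.7, 0.4),
    ("animal", "dog"): (0.1, 0.1, 0.5),
    ("dog", "cat"): (0.2, 0.0, 0.6),
    ("oak", "tree"): (0.0, 0.6, 0.0),
}
gold = {("dog", "animal"), ("cat", "animal"), ("oak", "tree")}

# Compare balanced weights with PSO-optimized weights on the same fitness.
balanced_f1 = f1_at_k([1 / 3] * 3, pairs, gold, k=3)
best_w, best_f1 = pso(lambda w: f1_at_k(w, pairs, gold, k=3), dim=3)
print("balanced F1:", balanced_f1)
print("PSO weights:", best_w, "F1:", best_f1)
```

In this toy setup the fitness only depends on the ranking of the pairs, so many different weight vectors reach the same F1. A real experiment would use a larger candidate set and a held-out gold standard to avoid fitting the weights to the evaluation data.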