Supervised learning for building stemmers

Bibliographic details
Published in:	Journal of information science 2015-06, Vol.41 (3), p.315-328
Main author:	Karanikolas, Nikitas N.
Format:	Article
Language:	English (eng)
Subjects:	Information retrieval; Language Studies
Online access:	Full text

Description
Abstract:	This work is part of a project aiming to define a methodology for building simple but robust stemmers, given only primitive knowledge of the stemmer’s target language. The methodology starts with a very simple primary stemmer that removes the longest suffix, taken from the list of available suffixes (the primitive knowledge), that matches the ending of the examined word. Information retrieval (IR) experts then express their arguments against the results of the primary stemmer. These arguments are valuable knowledge that allows us to apply supervised learning in order to automatically produce better stemmers that conform to the arguments expressed by the IR experts. We also evaluate our supervised learning-based methodology when it builds stemmers for languages that the experts have no knowledge of.
ISSN:	0165-5515 1741-6485
DOI:	10.1177/0165551515572528
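
Illustration: the primary stemmer described in the abstract (remove the longest listed suffix that matches the end of the word) can be sketched roughly as follows. This is a minimal sketch and not the author's implementation: the function name primary_stem, the English suffix list, and the rule that a non-empty stem must remain are all assumptions made for this example. The supervised learning step is not shown, because the abstract does not describe it in detail.

```python
def primary_stem(word, suffixes):
    """Strip the longest suffix in `suffixes` that matches the end of `word`."""
    # Try longer suffixes first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Assumption (not from the abstract): never strip the whole word,
        # so a non-empty stem always remains.
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    # No listed suffix matched; return the word unchanged.
    return word


# Hypothetical suffix list, for illustration only.
EXAMPLE_SUFFIXES = ["s", "es", "ing", "ings", "ed"]

if __name__ == "__main__":
    for w in ["meetings", "boxes", "walk"]:
        print(w, "->", primary_stem(w, EXAMPLE_SUFFIXES))
```

Sorting the candidates by length before matching keeps the sketch short. A real suffix list for a target language would be supplied as the primitive knowledge the abstract refers to.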