A STEMMING ALGORITHM FOR LATIN TEXT DATABASES

This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for proc...
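
The abstract's central idea, longest-match suffix removal against two separate dictionaries so that the word's grammatical category need not be known, can be sketched as follows. This is only an illustration under stated assumptions: the suffix lists are a small illustrative subset and not the paper's tables, the minimum stem length of two characters is assumed, the recoding step is omitted, and the names `NOUN_SUFFIXES`, `VERB_SUFFIXES`, `strip_longest` and `stem` are hypothetical.

```python
# Illustrative subsets only (assumption): these are NOT the suffix
# dictionaries published in the paper.
NOUN_SUFFIXES = ["ibus", "ius", "ae", "am", "as", "em", "es", "is",
                 "os", "um", "us", "a", "e", "i", "o", "u"]
VERB_SUFFIXES = ["untur", "erunt", "ntur", "mus", "tis", "tur", "unt",
                 "nt", "m", "r", "s", "t"]

MIN_STEM_LENGTH = 2  # assumed guard against stripping very short words


def strip_longest(word, suffixes, min_stem=MIN_STEM_LENGTH):
    """Remove the longest suffix that matches and leaves a long enough stem.

    Suffixes are tried from longest to shortest; a match that would leave
    fewer than `min_stem` characters is skipped and shorter ones are tried.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no usable suffix: leave the word unchanged


def stem(word):
    """Return (noun_stem, verb_stem) without guessing the word's category."""
    word = word.lower()
    return (strip_longest(word, NOUN_SUFFIXES),
            strip_longest(word, VERB_SUFFIXES))


if __name__ == "__main__":
    for w in ["puellam", "amant", "regibus"]:
        print(w, stem(w))
```

Producing both stems for every word is one way to avoid needing the grammatical category: each stem can be looked up in its own index, and a form such as "amant" is left intact on the noun side while being reduced on the verb side.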

Detailed description

Bibliographic details
Published in: Journal of documentation, 1996-06, Vol.52 (2), p.172-187
Main authors: SCHINKE, ROBYN; GREENGRASS, MARK; ROBERTSON, ALEXANDER M.; WILLETT, PETER
Format: Article
Language: eng
Online access: Full text
container_end_page 187
container_issue 2
container_start_page 172
container_title Journal of documentation
container_volume 52
creator SCHINKE, ROBYN
GREENGRASS, MARK
ROBERTSON, ALEXANDER M.
WILLETT, PETER
description This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.
doi_str_mv 10.1108/eb026966
format Article
fullrecord <record><control><sourceid>proquest_emera</sourceid><recordid>TN_cdi_proquest_miscellaneous_85654259</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ericid>EJ526315</ericid><sourcerecordid>85654259</sourcerecordid><originalsourceid>FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</originalsourceid><addsrcrecordid>eNqF0VtLwzAUB_AgCs4L-AF8KCjqSzWX5vZYtW6TzclW0beQpilUu4vJBvrtzdjcg4g-hfD_cU7OCQBHCF4iBMWVLSBmkrEt0EKcipgTLrdBC0KMY5ggsQv2vH-FEIVAtECcRqM86_e7D-0o7bUHw27e6Ud3g2HUS_PuQ5RnL3l0m-bpdTrKRgdgp9KNt4frcx883WX5TSfuDdrdm7QXm4SReYxEKUpIhUTMCCQlqTjmnNnQNaHc8ircCqsTpqtEW401KbSkZVEwU5nSCLIPzlZ1Z276vrB-rsa1N7Zp9MROF14JymiCqfwXUk5kEjYQ4MkP-DpduEkYQiEmApKILPterJRxU--drdTM1WPtPhWCarle9b3eQE_XBbU3uqmcnpjabzyBYXiIAjteMetqs0mze4oZQTTE8Squ_dx-bHLt3hQLH0RV8ozV462UbfzC1DD483W5sXW6Kf9638nv8luoWVmRL4Vmo1E</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1683949138</pqid></control><display><type>article</type><title>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</title><source>Emerald A-Z Current Journals</source><source>Periodicals Index Online</source><creator>SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER</creator><creatorcontrib>SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER</creatorcontrib><description>This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. 
It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.</description><identifier>ISSN: 0022-0418</identifier><identifier>EISSN: 1758-7379</identifier><identifier>DOI: 10.1108/eb026966</identifier><identifier>CODEN: JDOCAS</identifier><language>eng</language><publisher>Bradford: MCB UP Ltd</publisher><subject>Algorithms ; Content analysis ; Exact sciences and technology ; Full Text Databases ; Grammar ; Indexing. Classification. Abstracting. Syntheses ; Information and communication sciences ; Information and document structure and analysis ; Information processing and retrieval ; Information science. Documentation ; Latin ; Latin literature ; Sciences and techniques of general use ; Search Strategies ; Searching ; Stem Analysis ; Suffixes ; Word Processing</subject><ispartof>Journal of documentation, 1996-06, Vol.52 (2), p.172-187</ispartof><rights>MCB UP Limited</rights><rights>1996 
INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</citedby><cites>FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/eb026966/full/pdf$$EPDF$$P50$$Gemerald$$H</linktopdf><linktohtml>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/eb026966/full/html$$EHTML$$P50$$Gemerald$$H</linktohtml><link.rule.ids>314,780,784,967,11635,27869,27924,27925,52686,52689</link.rule.ids><backlink>$$Uhttp://eric.ed.gov/ERICWebPortal/detail?accno=EJ526315$$DView record in ERIC$$Hfree_for_read</backlink><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=3091601$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>SCHINKE, ROBYN</creatorcontrib><creatorcontrib>GREENGRASS, MARK</creatorcontrib><creatorcontrib>ROBERTSON, ALEXANDER M.</creatorcontrib><creatorcontrib>WILLETT, PETER</creatorcontrib><title>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</title><title>Journal of documentation</title><description>This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. 
It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.</description><subject>Algorithms</subject><subject>Content analysis</subject><subject>Exact sciences and technology</subject><subject>Full Text Databases</subject><subject>Grammar</subject><subject>Indexing. Classification. Abstracting. Syntheses</subject><subject>Information and communication sciences</subject><subject>Information and document structure and analysis</subject><subject>Information processing and retrieval</subject><subject>Information science. Documentation</subject><subject>Latin</subject><subject>Latin literature</subject><subject>Sciences and techniques of general use</subject><subject>Search Strategies</subject><subject>Searching</subject><subject>Stem Analysis</subject><subject>Suffixes</subject><subject>Word Processing</subject><issn>0022-0418</issn><issn>1758-7379</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1996</creationdate><recordtype>article</recordtype><sourceid>K30</sourceid><recordid>eNqF0VtLwzAUB_AgCs4L-AF8KCjqSzWX5vZYtW6TzclW0beQpilUu4vJBvrtzdjcg4g-hfD_cU7OCQBHCF4iBMWVLSBmkrEt0EKcipgTLrdBC0KMY5ggsQv2vH-FEIVAtECcRqM86_e7D-0o7bUHw27e6Ud3g2HUS_PuQ5RnL3l0m-bpdTrKRgdgp9KNt4frcx883WX5TSfuDdrdm7QXm4SReYxEKUpIhUTMCCQlqTjmnNnQNaHc8ircCqsTpqtEW401KbSkZVEwU5nSCLIPzlZ1Z276vrB-rsa1N7Zp9MROF14JymiCqfwXUk5kEjYQ4MkP-DpduEkYQiEmApKILPterJRxU--drdTM1WPtPhWCarle9b3eQE_XBbU3uqmcnpjabzyBYXiIAjteMetqs0mze4oZQTTE8Squ_dx-bHLt3hQLH0RV8ozV462UbfzC1DD483W5sXW6Kf9638nv8luoWVmRL4Vmo1E</recordid><startdate>19960601</startdate><enddate>19960601</enddate><creator>SCHINKE, ROBYN</creator><creator>GREENGRASS, MARK</creator><creator>ROBERTSON, ALEXANDER M.</creator><creator>WILLETT, PETER</creator><general>MCB UP 
Ltd</general><general>Emerald</general><general>Aslib, etc</general><scope>BSCLL</scope><scope>7SW</scope><scope>BJH</scope><scope>BNH</scope><scope>BNI</scope><scope>BNJ</scope><scope>BNO</scope><scope>ERI</scope><scope>PET</scope><scope>REK</scope><scope>WWN</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>GHXMH</scope><scope>GPCCI</scope><scope>IOIBA</scope><scope>K30</scope><scope>PAAUG</scope><scope>PAWHS</scope><scope>PAWZZ</scope><scope>PAXOH</scope><scope>PBHAV</scope><scope>PBQSW</scope><scope>PBYQZ</scope><scope>PCIWU</scope><scope>PCMID</scope><scope>PCZJX</scope><scope>PDGRG</scope><scope>PDWWI</scope><scope>PETMR</scope><scope>PFVGT</scope><scope>PGXDX</scope><scope>PIHIL</scope><scope>PISVA</scope><scope>PJCTQ</scope><scope>PJTMS</scope><scope>PLCHJ</scope><scope>PMHAD</scope><scope>PNQDJ</scope><scope>POUND</scope><scope>PPLAD</scope><scope>PQAPC</scope><scope>PQCAN</scope><scope>PQCMW</scope><scope>PQEME</scope><scope>PQHKH</scope><scope>PQMID</scope><scope>PQNCT</scope><scope>PQNET</scope><scope>PQSCT</scope><scope>PQSET</scope><scope>PSVJG</scope><scope>PVMQY</scope><scope>PZGFC</scope><scope>E3H</scope><scope>F2A</scope><scope>7T9</scope></search><sort><creationdate>19960601</creationdate><title>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</title><author>SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1996</creationdate><topic>Algorithms</topic><topic>Content analysis</topic><topic>Exact sciences and technology</topic><topic>Full Text Databases</topic><topic>Grammar</topic><topic>Indexing. Classification. Abstracting. 
Syntheses</topic><topic>Information and communication sciences</topic><topic>Information and document structure and analysis</topic><topic>Information processing and retrieval</topic><topic>Information science. Documentation</topic><topic>Latin</topic><topic>Latin literature</topic><topic>Sciences and techniques of general use</topic><topic>Search Strategies</topic><topic>Searching</topic><topic>Stem Analysis</topic><topic>Suffixes</topic><topic>Word Processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>SCHINKE, ROBYN</creatorcontrib><creatorcontrib>GREENGRASS, MARK</creatorcontrib><creatorcontrib>ROBERTSON, ALEXANDER M.</creatorcontrib><creatorcontrib>WILLETT, PETER</creatorcontrib><collection>Istex</collection><collection>ERIC</collection><collection>ERIC (Ovid)</collection><collection>ERIC</collection><collection>ERIC</collection><collection>ERIC (Legacy Platform)</collection><collection>ERIC( SilverPlatter )</collection><collection>ERIC</collection><collection>ERIC PlusText (Legacy Platform)</collection><collection>Education Resources Information Center (ERIC)</collection><collection>ERIC</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Periodicals Index Online Segment 09</collection><collection>Periodicals Index Online Segment 10</collection><collection>Periodicals Index Online Segment 29</collection><collection>Periodicals Index Online</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - West</collection><collection>Primary Sources Access (Plan D) - International</collection><collection>Primary Sources Access &amp; Build (Plan A) - MEA</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Midwest</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Northeast</collection><collection>Primary Sources Access (Plan D) - Southeast</collection><collection>Primary Sources Access (Plan D) - North 
Central</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Southeast</collection><collection>Primary Sources Access (Plan D) - South Central</collection><collection>Primary Sources Access &amp; Build (Plan A) - UK / I</collection><collection>Primary Sources Access (Plan D) - Canada</collection><collection>Primary Sources Access (Plan D) - EMEALA</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - North Central</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - South Central</collection><collection>Primary Sources Access &amp; Build (Plan A) - International</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - International</collection><collection>Primary Sources Access (Plan D) - West</collection><collection>Periodicals Index Online Segments 1-50</collection><collection>Primary Sources Access (Plan D) - APAC</collection><collection>Primary Sources Access (Plan D) - Midwest</collection><collection>Primary Sources Access (Plan D) - MEA</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Canada</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - UK / I</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - EMEALA</collection><collection>Primary Sources Access &amp; Build (Plan A) - APAC</collection><collection>Primary Sources Access &amp; Build (Plan A) - Canada</collection><collection>Primary Sources Access &amp; Build (Plan A) - West</collection><collection>Primary Sources Access &amp; Build (Plan A) - EMEALA</collection><collection>Primary Sources Access (Plan D) - Northeast</collection><collection>Primary Sources Access &amp; Build (Plan A) - Midwest</collection><collection>Primary Sources Access &amp; Build (Plan A) - North Central</collection><collection>Primary Sources Access &amp; Build (Plan A) - Northeast</collection><collection>Primary Sources Access &amp; Build (Plan A) - South 
Central</collection><collection>Primary Sources Access &amp; Build (Plan A) - Southeast</collection><collection>Primary Sources Access (Plan D) - UK / I</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - APAC</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - MEA</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>Journal of documentation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>SCHINKE, ROBYN</au><au>GREENGRASS, MARK</au><au>ROBERTSON, ALEXANDER M.</au><au>WILLETT, PETER</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><ericid>EJ526315</ericid><atitle>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</atitle><jtitle>Journal of documentation</jtitle><date>1996-06-01</date><risdate>1996</risdate><volume>52</volume><issue>2</issue><spage>172</spage><epage>187</epage><pages>172-187</pages><issn>0022-0418</issn><eissn>1758-7379</eissn><coden>JDOCAS</coden><abstract>This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. 
It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.</abstract><cop>Bradford</cop><pub>MCB UP Ltd</pub><doi>10.1108/eb026966</doi><tpages>16</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0022-0418
ispartof Journal of documentation, 1996-06, Vol.52 (2), p.172-187
issn 0022-0418
1758-7379
language eng
recordid cdi_proquest_miscellaneous_85654259
source Emerald A-Z Current Journals; Periodicals Index Online
subjects Algorithms
Content analysis
Exact sciences and technology
Full Text Databases
Grammar
Indexing. Classification. Abstracting. Syntheses
Information and communication sciences
Information and document structure and analysis
Information processing and retrieval
Information science. Documentation
Latin
Latin literature
Sciences and techniques of general use
Search Strategies
Searching
Stem Analysis
Suffixes
Word Processing
title A STEMMING ALGORITHM FOR LATIN TEXT DATABASES
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T09%3A13%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_emera&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20STEMMING%20ALGORITHM%20FOR%20LATIN%20TEXT%20DATABASES&rft.jtitle=Journal%20of%20documentation&rft.au=SCHINKE,%20ROBYN&rft.date=1996-06-01&rft.volume=52&rft.issue=2&rft.spage=172&rft.epage=187&rft.pages=172-187&rft.issn=0022-0418&rft.eissn=1758-7379&rft.coden=JDOCAS&rft_id=info:doi/10.1108/eb026966&rft_dat=%3Cproquest_emera%3E85654259%3C/proquest_emera%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1683949138&rft_id=info:pmid/&rft_ericid=EJ526315&rfr_iscdi=true