A STEMMING ALGORITHM FOR LATIN TEXT DATABASES
This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for proc...
Published in: | Journal of documentation 1996-06, Vol.52 (2), p.172-187 |
---|---|
Main authors: | SCHINKE, ROBYN; GREENGRASS, MARK; ROBERTSON, ALEXANDER M.; WILLETT, PETER |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 187 |
---|---|
container_issue | 2 |
container_start_page | 172 |
container_title | Journal of documentation |
container_volume | 52 |
creator | SCHINKE, ROBYN; GREENGRASS, MARK; ROBERTSON, ALEXANDER M.; WILLETT, PETER |
description | This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words. |
doi_str_mv | 10.1108/eb026966 |
format | Article |
publisher | Bradford: MCB UP Ltd |
rights | MCB UP Limited; 1996 INIST-CNRS |
coden | JDOCAS |
ericid | EJ526315 |
tpages | 16 |
toplevel | peer_reviewed; online_resources |
fulltext | fulltext |
identifier | ISSN: 0022-0418 |
ispartof | Journal of documentation, 1996-06, Vol.52 (2), p.172-187 |
issn | 0022-0418; 1758-7379 |
language | eng |
recordid | cdi_proquest_miscellaneous_85654259 |
source | Emerald A-Z Current Journals; Periodicals Index Online |
subjects | Algorithms; Content analysis; Exact sciences and technology; Full Text Databases; Grammar; Indexing. Classification. Abstracting. Syntheses; Information and communication sciences; Information and document structure and analysis; Information processing and retrieval; Information science. Documentation; Latin; Latin literature; Sciences and techniques of general use; Search Strategies; Searching; Stem Analysis; Suffixes; Word Processing |
title | A STEMMING ALGORITHM FOR LATIN TEXT DATABASES |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T09%3A13%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_emera&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20STEMMING%20ALGORITHM%20FOR%20LATIN%20TEXT%20DATABASES&rft.jtitle=Journal%20of%20documentation&rft.au=SCHINKE,%20ROBYN&rft.date=1996-06-01&rft.volume=52&rft.issue=2&rft.spage=172&rft.epage=187&rft.pages=172-187&rft.issn=0022-0418&rft.eissn=1758-7379&rft.coden=JDOCAS&rft_id=info:doi/10.1108/eb026966&rft_dat=%3Cproquest_emera%3E85654259%3C/proquest_emera%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1683949138&rft_id=info:pmid/&rft_ericid=EJ526315&rfr_iscdi=true |
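
The abstract describes longest-match suffix stripping against two separate suffix dictionaries, arranged so that the stemmer never has to know a word's grammatical category. A minimal Python sketch of that idea follows. It rests on several assumptions: the suffix lists are a small hypothetical subset chosen for illustration and are not the paper's dictionaries, the minimum stem length is an invented guard, the recoding step mentioned in the abstract is omitted, and returning both candidate stems for every word is only one plausible way of avoiding a category decision.

```python
# Illustrative only: a small hypothetical subset of Latin endings,
# NOT the suffix dictionaries published in the paper.
NOUN_SUFFIXES = ["ibus", "orum", "arum", "ae", "am", "as", "em", "es",
                 "is", "os", "um", "us", "a", "e", "i", "o"]
VERB_SUFFIXES = ["untur", "erunt", "mus", "tis", "tur", "unt", "nt",
                 "o", "s", "t", "m"]

# Assumed guard against overstemming: never leave a stem shorter than this.
MIN_STEM_LEN = 2


def strip_longest(word, suffixes):
    """Remove the longest suffix in `suffixes` that matches the end of `word`."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[:-len(suffix)]
    return word  # no suffix matched, so the word is its own stem


def stem(word):
    """Return (noun_stem, verb_stem) without deciding the word's category."""
    w = word.lower()
    return strip_longest(w, NOUN_SUFFIXES), strip_longest(w, VERB_SUFFIXES)


if __name__ == "__main__":
    for w in ["puellarum", "amant", "Rosa"]:
        print(w, stem(w))
```

Because both stems are kept, a query word and a database word can be compared stem by stem under either dictionary, and the short suffix lists leave most of the word intact, which matches the understemming tendency the abstract describes.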