A STEMMING ALGORITHM FOR LATIN TEXT DATABASES

This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for proc...
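
As a rough illustration of the approach the abstract summarizes (longest-match suffix removal with some recoding, driven by separate noun/adjective and verb suffix dictionaries), a minimal Python sketch might look like the following. The suffix entries, the recoding of `unt` to `i`, the minimum stem length, and the names `NOUN_SUFFIXES`, `VERB_SUFFIXES`, `strip_longest`, and `stem` are illustrative assumptions and are not taken from the paper. Returning both candidate stems is shown here only as one possible way to avoid needing the word's grammatical category.

```python
from typing import Dict, Tuple

# Hypothetical, heavily abbreviated suffix dictionaries (NOT the paper's lists).
# Each suffix maps to a replacement string; "" means plain removal,
# anything else is a simple "recoding" of the ending.
NOUN_SUFFIXES: Dict[str, str] = dict.fromkeys(
    ["ibus", "orum", "arum", "ae", "am", "as", "is", "os", "um", "us",
     "a", "e", "i", "o"],
    "",
)
VERB_SUFFIXES: Dict[str, str] = {
    "untur": "i", "antur": "", "unt": "i", "ant": "",
    "mus": "", "tis": "", "nt": "", "t": "", "o": "",
}

MIN_STEM = 2  # assumed guard so very short stems are never produced


def strip_longest(word: str, suffixes: Dict[str, str]) -> str:
    """Apply the longest matching suffix rule, or return the word unchanged."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= MIN_STEM:
            return word[:stem_len] + suffixes[suffix]
    return word


def stem(word: str) -> Tuple[str, str]:
    """Return (noun/adjective stem, verb stem) without guessing the category."""
    w = word.lower()
    return strip_longest(w, NOUN_SUFFIXES), strip_longest(w, VERB_SUFFIXES)


if __name__ == "__main__":
    for example in ("puellarum", "amant", "ducunt"):
        print(example, stem(example))
```

In this sketch both the query word and each database word would be passed through `stem`, and a search could then compare the noun stems and the verb stems separately. The short suffix lists and the minimum-length guard make it err on the side of leaving words unchanged, which is in the spirit of the understemming preference described in the abstract.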

Bibliographic Details
Published in:	Journal of documentation 1996-06, Vol.52 (2), p.172-187
Main authors:	SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER
Format:	Article
Language:	eng
Subjects:	Algorithms ; Content analysis ; Exact sciences and technology ; Full Text Databases ; Grammar ; Indexing. Classification. Abstracting. Syntheses ; Information and communication sciences ; Information and document structure and analysis ; Information processing and retrieval ; Information science. Documentation ; Latin ; Latin literature ; Sciences and techniques of general use ; Search Strategies ; Searching ; Stem Analysis ; Suffixes ; Word Processing
Online access:	Full text

container_end_page	187
container_issue	2
container_start_page	172
container_title	Journal of documentation
container_volume	52
creator	SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER
description	This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.
doi_str_mv	10.1108/eb026966
format	Article
fullrecord	<record><control><sourceid>proquest_emera</sourceid><recordid>TN_cdi_proquest_miscellaneous_85654259</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ericid>EJ526315</ericid><sourcerecordid>85654259</sourcerecordid><originalsourceid>FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</originalsourceid><addsrcrecordid>eNqF0VtLwzAUB_AgCs4L-AF8KCjqSzWX5vZYtW6TzclW0beQpilUu4vJBvrtzdjcg4g-hfD_cU7OCQBHCF4iBMWVLSBmkrEt0EKcipgTLrdBC0KMY5ggsQv2vH-FEIVAtECcRqM86_e7D-0o7bUHw27e6Ud3g2HUS_PuQ5RnL3l0m-bpdTrKRgdgp9KNt4frcx883WX5TSfuDdrdm7QXm4SReYxEKUpIhUTMCCQlqTjmnNnQNaHc8ircCqsTpqtEW401KbSkZVEwU5nSCLIPzlZ1Z276vrB-rsa1N7Zp9MROF14JymiCqfwXUk5kEjYQ4MkP-DpduEkYQiEmApKILPterJRxU--drdTM1WPtPhWCarle9b3eQE_XBbU3uqmcnpjabzyBYXiIAjteMetqs0mze4oZQTTE8Squ_dx-bHLt3hQLH0RV8ozV462UbfzC1DD483W5sXW6Kf9638nv8luoWVmRL4Vmo1E</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1683949138</pqid></control><display><type>article</type><title>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</title><source>Emerald A-Z Current Journals</source><source>Periodicals Index Online</source><creator>SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER</creator><creatorcontrib>SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER</creatorcontrib><description>This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. 
It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.</description><identifier>ISSN: 0022-0418</identifier><identifier>EISSN: 1758-7379</identifier><identifier>DOI: 10.1108/eb026966</identifier><identifier>CODEN: JDOCAS</identifier><language>eng</language><publisher>Bradford: MCB UP Ltd</publisher><subject>Algorithms ; Content analysis ; Exact sciences and technology ; Full Text Databases ; Grammar ; Indexing. Classification. Abstracting. Syntheses ; Information and communication sciences ; Information and document structure and analysis ; Information processing and retrieval ; Information science. Documentation ; Latin ; Latin literature ; Sciences and techniques of general use ; Search Strategies ; Searching ; Stem Analysis ; Suffixes ; Word Processing</subject><ispartof>Journal of documentation, 1996-06, Vol.52 (2), p.172-187</ispartof><rights>MCB UP Limited</rights><rights>1996 
INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</citedby><cites>FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/eb026966/full/pdf$$EPDF$$P50$$Gemerald$$H</linktopdf><linktohtml>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/eb026966/full/html$$EHTML$$P50$$Gemerald$$H</linktohtml><link.rule.ids>314,780,784,967,11635,27869,27924,27925,52686,52689</link.rule.ids><backlink>$$Uhttp://eric.ed.gov/ERICWebPortal/detail?accno=EJ526315$$DView record in ERIC$$Hfree_for_read</backlink><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=3091601$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>SCHINKE, ROBYN</creatorcontrib><creatorcontrib>GREENGRASS, MARK</creatorcontrib><creatorcontrib>ROBERTSON, ALEXANDER M.</creatorcontrib><creatorcontrib>WILLETT, PETER</creatorcontrib><title>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</title><title>Journal of documentation</title><description>This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. 
It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.</description><subject>Algorithms</subject><subject>Content analysis</subject><subject>Exact sciences and technology</subject><subject>Full Text Databases</subject><subject>Grammar</subject><subject>Indexing. Classification. Abstracting. Syntheses</subject><subject>Information and communication sciences</subject><subject>Information and document structure and analysis</subject><subject>Information processing and retrieval</subject><subject>Information science. Documentation</subject><subject>Latin</subject><subject>Latin literature</subject><subject>Sciences and techniques of general use</subject><subject>Search Strategies</subject><subject>Searching</subject><subject>Stem Analysis</subject><subject>Suffixes</subject><subject>Word Processing</subject><issn>0022-0418</issn><issn>1758-7379</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1996</creationdate><recordtype>article</recordtype><sourceid>K30</sourceid><recordid>eNqF0VtLwzAUB_AgCs4L-AF8KCjqSzWX5vZYtW6TzclW0beQpilUu4vJBvrtzdjcg4g-hfD_cU7OCQBHCF4iBMWVLSBmkrEt0EKcipgTLrdBC0KMY5ggsQv2vH-FEIVAtECcRqM86_e7D-0o7bUHw27e6Ud3g2HUS_PuQ5RnL3l0m-bpdTrKRgdgp9KNt4frcx883WX5TSfuDdrdm7QXm4SReYxEKUpIhUTMCCQlqTjmnNnQNaHc8ircCqsTpqtEW401KbSkZVEwU5nSCLIPzlZ1Z276vrB-rsa1N7Zp9MROF14JymiCqfwXUk5kEjYQ4MkP-DpduEkYQiEmApKILPterJRxU--drdTM1WPtPhWCarle9b3eQE_XBbU3uqmcnpjabzyBYXiIAjteMetqs0mze4oZQTTE8Squ_dx-bHLt3hQLH0RV8ozV462UbfzC1DD483W5sXW6Kf9638nv8luoWVmRL4Vmo1E</recordid><startdate>19960601</startdate><enddate>19960601</enddate><creator>SCHINKE, ROBYN</creator><creator>GREENGRASS, MARK</creator><creator>ROBERTSON, ALEXANDER M.</creator><creator>WILLETT, PETER</creator><general>MCB UP 
Ltd</general><general>Emerald</general><general>Aslib, etc</general><scope>BSCLL</scope><scope>7SW</scope><scope>BJH</scope><scope>BNH</scope><scope>BNI</scope><scope>BNJ</scope><scope>BNO</scope><scope>ERI</scope><scope>PET</scope><scope>REK</scope><scope>WWN</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>GHXMH</scope><scope>GPCCI</scope><scope>IOIBA</scope><scope>K30</scope><scope>PAAUG</scope><scope>PAWHS</scope><scope>PAWZZ</scope><scope>PAXOH</scope><scope>PBHAV</scope><scope>PBQSW</scope><scope>PBYQZ</scope><scope>PCIWU</scope><scope>PCMID</scope><scope>PCZJX</scope><scope>PDGRG</scope><scope>PDWWI</scope><scope>PETMR</scope><scope>PFVGT</scope><scope>PGXDX</scope><scope>PIHIL</scope><scope>PISVA</scope><scope>PJCTQ</scope><scope>PJTMS</scope><scope>PLCHJ</scope><scope>PMHAD</scope><scope>PNQDJ</scope><scope>POUND</scope><scope>PPLAD</scope><scope>PQAPC</scope><scope>PQCAN</scope><scope>PQCMW</scope><scope>PQEME</scope><scope>PQHKH</scope><scope>PQMID</scope><scope>PQNCT</scope><scope>PQNET</scope><scope>PQSCT</scope><scope>PQSET</scope><scope>PSVJG</scope><scope>PVMQY</scope><scope>PZGFC</scope><scope>E3H</scope><scope>F2A</scope><scope>7T9</scope></search><sort><creationdate>19960601</creationdate><title>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</title><author>SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c463t-18d8d058916c81993f72776e001457e7f277bea46af4aea2a3ba95dbb6cfcdc83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1996</creationdate><topic>Algorithms</topic><topic>Content analysis</topic><topic>Exact sciences and technology</topic><topic>Full Text Databases</topic><topic>Grammar</topic><topic>Indexing. Classification. Abstracting. 
Syntheses</topic><topic>Information and communication sciences</topic><topic>Information and document structure and analysis</topic><topic>Information processing and retrieval</topic><topic>Information science. Documentation</topic><topic>Latin</topic><topic>Latin literature</topic><topic>Sciences and techniques of general use</topic><topic>Search Strategies</topic><topic>Searching</topic><topic>Stem Analysis</topic><topic>Suffixes</topic><topic>Word Processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>SCHINKE, ROBYN</creatorcontrib><creatorcontrib>GREENGRASS, MARK</creatorcontrib><creatorcontrib>ROBERTSON, ALEXANDER M.</creatorcontrib><creatorcontrib>WILLETT, PETER</creatorcontrib><collection>Istex</collection><collection>ERIC</collection><collection>ERIC (Ovid)</collection><collection>ERIC</collection><collection>ERIC</collection><collection>ERIC (Legacy Platform)</collection><collection>ERIC( SilverPlatter )</collection><collection>ERIC</collection><collection>ERIC PlusText (Legacy Platform)</collection><collection>Education Resources Information Center (ERIC)</collection><collection>ERIC</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Periodicals Index Online Segment 09</collection><collection>Periodicals Index Online Segment 10</collection><collection>Periodicals Index Online Segment 29</collection><collection>Periodicals Index Online</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - West</collection><collection>Primary Sources Access (Plan D) - International</collection><collection>Primary Sources Access & Build (Plan A) - MEA</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Midwest</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Northeast</collection><collection>Primary Sources Access (Plan D) - Southeast</collection><collection>Primary Sources Access (Plan D) - North 
Central</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Southeast</collection><collection>Primary Sources Access (Plan D) - South Central</collection><collection>Primary Sources Access & Build (Plan A) - UK / I</collection><collection>Primary Sources Access (Plan D) - Canada</collection><collection>Primary Sources Access (Plan D) - EMEALA</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - North Central</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - South Central</collection><collection>Primary Sources Access & Build (Plan A) - International</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - International</collection><collection>Primary Sources Access (Plan D) - West</collection><collection>Periodicals Index Online Segments 1-50</collection><collection>Primary Sources Access (Plan D) - APAC</collection><collection>Primary Sources Access (Plan D) - Midwest</collection><collection>Primary Sources Access (Plan D) - MEA</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - Canada</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - UK / I</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - EMEALA</collection><collection>Primary Sources Access & Build (Plan A) - APAC</collection><collection>Primary Sources Access & Build (Plan A) - Canada</collection><collection>Primary Sources Access & Build (Plan A) - West</collection><collection>Primary Sources Access & Build (Plan A) - EMEALA</collection><collection>Primary Sources Access (Plan D) - Northeast</collection><collection>Primary Sources Access & Build (Plan A) - Midwest</collection><collection>Primary Sources Access & Build (Plan A) - North Central</collection><collection>Primary Sources Access & Build (Plan A) - Northeast</collection><collection>Primary Sources Access & Build (Plan A) - South Central</collection><collection>Primary Sources 
Access & Build (Plan A) - Southeast</collection><collection>Primary Sources Access (Plan D) - UK / I</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - APAC</collection><collection>Primary Sources Access—Foundation Edition (Plan E) - MEA</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>Journal of documentation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>SCHINKE, ROBYN</au><au>GREENGRASS, MARK</au><au>ROBERTSON, ALEXANDER M.</au><au>WILLETT, PETER</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><ericid>EJ526315</ericid><atitle>A STEMMING ALGORITHM FOR LATIN TEXT DATABASES</atitle><jtitle>Journal of documentation</jtitle><date>1996-06-01</date><risdate>1996</risdate><volume>52</volume><issue>2</issue><spage>172</spage><epage>187</epage><pages>172-187</pages><issn>0022-0418</issn><eissn>1758-7379</eissn><coden>JDOCAS</coden><abstract>This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. 
It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.</abstract><cop>Bradford</cop><pub>MCB UP Ltd</pub><doi>10.1108/eb026966</doi><tpages>16</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0022-0418
ispartof	Journal of documentation, 1996-06, Vol.52 (2), p.172-187
issn	0022-0418 1758-7379
language	eng
recordid	cdi_proquest_miscellaneous_85654259
source	Emerald A-Z Current Journals; Periodicals Index Online
subjects	Algorithms ; Content analysis ; Exact sciences and technology ; Full Text Databases ; Grammar ; Indexing. Classification. Abstracting. Syntheses ; Information and communication sciences ; Information and document structure and analysis ; Information processing and retrieval ; Information science. Documentation ; Latin ; Latin literature ; Sciences and techniques of general use ; Search Strategies ; Searching ; Stem Analysis ; Suffixes ; Word Processing
title	A STEMMING ALGORITHM FOR LATIN TEXT DATABASES
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T09%3A13%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_emera&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20STEMMING%20ALGORITHM%20FOR%20LATIN%20TEXT%20DATABASES&rft.jtitle=Journal%20of%20documentation&rft.au=SCHINKE,%20ROBYN&rft.date=1996-06-01&rft.volume=52&rft.issue=2&rft.spage=172&rft.epage=187&rft.pages=172-187&rft.issn=0022-0418&rft.eissn=1758-7379&rft.coden=JDOCAS&rft_id=info:doi/10.1108/eb026966&rft_dat=%3Cproquest_emera%3E85654259%3C/proquest_emera%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1683949138&rft_id=info:pmid/&rft_ericid=EJ526315&rfr_iscdi=true