Text variability measures in corpus design for Setswana lexicography
Main author: | Otlogetswe, Thapelo J. |
---|---|
Format: | Book |
Language: | English |
Published: | Newcastle upon Tyne: Cambridge Scholars Publ., 2011 |
Edition: | 1. publ. |
Subjects: | |
Online access: | Table of contents; Blurb |
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV040424553 | ||
003 | DE-604 | ||
005 | 20180103 | ||
007 | t | ||
008 | 120919s2011 xxkd||| |||| 00||| eng d | ||
010 | |a 2010671328 | ||
020 | |a 9781443826372 |9 978-1-4438-2637-2 | ||
020 | |a 1443826375 |9 1-4438-2637-5 | ||
035 | |a (OCoLC)815899025 | ||
035 | |a (DE-599)BVBBV040424553 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
044 | |a xxk |c GB | ||
049 | |a DE-12 |a DE-703 | ||
050 | 0 | |a PL8747.4 | |
084 | |a EP 19425 |0 (DE-625)26887:230 |2 rvk | ||
100 | 1 | |a Otlogetswe, Thapelo J. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Text variability measures in corpus design for Setswana lexicography |c by Thapelo J. Otlogetswe |
250 | |a 1. publ. | ||
264 | 1 | |a Newcastle upon Tyne |b Cambridge Scholars Publ. |c 2011 | |
300 | |a XIV, 318 S. |b graph. Darst. |c 22 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Literaturverz. S. [275] - 292 | ||
650 | 4 | |a Tswana language |x Lexicography | |
650 | 0 | 7 | |a Tswana-Sprache |0 (DE-588)4256812-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Wortschatz |0 (DE-588)4126555-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Tswana-Sprache |0 (DE-588)4256812-2 |D s |
689 | 0 | 1 | |a Wortschatz |0 (DE-588)4126555-5 |D s |
689 | 0 | 2 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-025277181 |
Record in the search index
_version_ | 1805326094271774720 |
---|---|
adam_text |
Table of Contents
List of Tables.viii
List of Figures.xi
Preface.xii
Acknowledgements.xiii
Abbreviations.xv
Chapter One.1
Introduction
1.1 Background to the study
1.2 Statement of the research problem
1.3 Clarifying terms: genre, text type and varieties
1.4 Methodology
1.5 Aims of the study
1.6 Research goals
1.7 Exposition of chapters
Chapter Two.11
The Setswana Language
2.1 The Botswana language situation
2.2 The Setswana language
2.3 Setswana dialects
2.3.1 The village, cattlepost, lands and city language
2.4 Domains of Setswana language use
2.4.1 Education
2.5 Text categories
2.6 Challenges of multilingualism and diglossia
2.7 The poverty of data
2.8 Setswana language research
2.9 Conclusion
Chapter Three.27
Corpus Lexicography
3.1 Introduction
3.2 What is a corpus?
3.3 Web as corpus
3.4 Frequency profiling: frequency and type/token
3.4.1 Frequency counts
3.4.2 Type/token and problems of wordhood
3.5 Relevance of corpora to lexicography
3.6 Some pre-electronic frequency studies
3.7 Electronic-corpora studies
3.8 Keyword analysis
3.9 Business keywords
3.10 Concordance
3.11 A review of existing methods of headword list identification
3.12 A historical perspective of headword lists
3.13 Non-corpus dependant methods of dictionary compilation
3.14 Semantic domains
3.15 Corpus lexicography and Setswana dictionaries
3.16 Conclusion
Chapter Four.70
Issues in Corpus Design for Lexicography
4.1 Introduction
4.2 Balance and representativeness
4.3 Corpus annotation
4.4 Sample size
4.5 Brown Corpus and BNC review
4.6 The exploration of both corpora
4.7 Conclusion
Chapter Five.119
The Setswana Corpus Compilation
5.1 Introduction
5.2 The design strategy
5.3 Overall corpus statistics
5.4 The Zipfian distribution
5.5 Corpus components
5.6 The compilation of corpus components
5.7 Conclusion
Chapter Six.145
Measuring Text Type Diversity
6.1 Introduction
6.2 Keyword analysis
6.3 Conclusion to keyword analysis
Chapter Seven.194
Type/Token Measures of Corpus Chunks
7.1 Type/token measures
7.2 Text divisions for experiments
7.2.1 Newspaper Components type/token
7.3 Conclusion of type-token measurements
7.4 A comparison of the top 100 tokens
7.5 A direct comparison of Setswana spoken and written corpus components
7.6 Conclusion
Chapter Eight.265
Conclusion and Future Work
8.1 Future research and applications
Bibliography.275
Appendix 1: Proposed Subentries of pelo Headword.293
Appendix 2: Participation Consent Form.295
Appendix 3: Conversation Log.297
Appendix 4: Headteacher’s Letter.299
Appendix 5: Accompanying Details for Classroom Recordings.301
Appendix 6: Letter to publishers asking for text.303
Appendix 7: BNC Part-of-speech Codes.305
Index.311
This book is about the design of a Setswana corpus for lexicography. While various corpora have been
compiled and a variety of corpora-based research has been attempted in African languages, no effort
has been made towards corpus design. Additionally, although extensive analysis of the Setswana
language has been done by missionaries, grammarians and linguists since the 1800s, none of this
research is in corpus design. Most research has been largely on the grammatical study of the language.
The recent corpora research in African languages in general has been on the use of corpora for the
compilation of dictionaries and little of it is in corpus design. Pioneers of this kind of corpora research
in African languages are Prinsloo and De Schryver (1999), De Schryver and Prinsloo (2000 and 2001)
and Gouws and Prinsloo (2005).
Because of a lack of research in corpora design particularly in African languages, this book attempts to
fill that gap, especially for Setswana. It is hoped that the findings of this study will inspire similar
designs in other languages comparable to Setswana.
We explore corpus design by focusing on measuring a variety of text types for lexical richness at
comparable token points.
The study explores the question of whether a corpus compiled for lexicography must comprise a
variety of texts drawn from different text types or whether the quality of retrieved information for
lexicographic purposes from a corpus comprising diverse text varieties could be equally extracted
from a corpus with a single text type. This study therefore determines whether linguistic variability is
crucial in corpus design for lexicography.
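As an illustration only, the idea of comparing lexical richness "at comparable token points" can be sketched as a type/token count taken at the same token positions in two token streams. This is a minimal sketch under stated assumptions, not the author's procedure: the function name `type_token_curve`, the whitespace tokenisation, the lowercasing, and the toy token lists are all hypothetical and are not taken from the book or its corpus.

```python
from typing import Iterable, List, Tuple


def type_token_curve(
    tokens: List[str], checkpoints: Iterable[int]
) -> List[Tuple[int, int, float]]:
    """Return (tokens seen, distinct types seen, type/token ratio) at each checkpoint.

    Checkpoints beyond the length of the token list are ignored.
    Types are counted case-insensitively (an assumption of this sketch).
    """
    wanted = sorted({c for c in checkpoints if 0 < c <= len(tokens)})
    seen = set()
    curve = []
    next_idx = 0
    for position, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        if next_idx < len(wanted) and position == wanted[next_idx]:
            curve.append((position, len(seen), len(seen) / position))
            next_idx += 1
    return curve


# Toy placeholder tokens (hypothetical, not real corpus data).
single_type_sample = "w1 w2 w1 w2 w1 w2 w1 w2".split()
mixed_type_sample = "w1 w2 w3 w4 w1 w5 w6 w7".split()

# Measure both samples at the same token points so the ratios are comparable.
points = [4, 8]
for name, sample in [("single", single_type_sample), ("mixed", mixed_type_sample)]:
    for n_tokens, n_types, ratio in type_token_curve(sample, points):
        print(f"{name}: tokens={n_tokens} types={n_types} ttr={ratio:.3f}")
```

Taking the measurement at identical token counts matters because a raw type/token ratio falls as a sample grows, so ratios from samples of different sizes are not directly comparable.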
Thapelo J. Otlogetswe is Senior Lecturer of English Linguistics and Lexicography in the Department of English at the
University of Botswana, where he obtained a Bachelor of Arts and a Post Graduate Diploma in Education. He studied
for an MPhil in Comparative Linguistics and Philology at the University of Oxford. His doctoral studies in Corpus
Linguistics were done at the University of Brighton and the University of Pretoria. His research focuses on lexical
computing and corpus lexicography, particularly that of the Setswana language, which he is passionate about. His
research also includes computational and statistical genre and text type analysis, Setswana names, and Setswana
rhyming patterns. He has been involved in the development of a Setswana spellchecker (for OpenOffice) and the
compilation of a multi-million token Setswana corpus. Dr Otlogetswe has published a number of books, among them
English-Setswana Dictionary, Poeletso-medumo ya Setswana: A Setswana Rhyming Dictionary and M. L. A. Kgasa: A
Pioneer Setswana Lexicographer. He has also co-authored a Setswana orthography book: Mokwalo o o lolameng wa
Setswana. Dr Otlogetswe led the groundbreaking translation work on the Setswana Google Search which has made it
possible for people to access the Google search interface in the Setswana language. He is a member of the African
Association for Lexicography (Afrilex) as well as a commissioner of the Setswana Commission established by the
Academy of African Languages (ACALAN), a language arm of the African Union. |
any_adam_object | 1 |
author | Otlogetswe, Thapelo J. |
author_facet | Otlogetswe, Thapelo J. |
author_role | aut |
author_sort | Otlogetswe, Thapelo J. |
author_variant | t j o tj tjo |
building | Verbundindex |
bvnumber | BV040424553 |
callnumber-first | P - Language and Literature |
callnumber-label | PL8747 |
callnumber-raw | PL8747.4 |
callnumber-search | PL8747.4 |
callnumber-sort | PL 48747.4 |
callnumber-subject | PL - Eastern Asia, Africa, Oceania |
classification_rvk | EP 19425 |
ctrlnum | (OCoLC)815899025 (DE-599)BVBBV040424553 |
discipline | Außereuropäische Sprachen und Literaturen Literaturwissenschaft |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV040424553</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20180103</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">120919s2011 xxkd||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2010671328</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781443826372</subfield><subfield code="9">978-1-4438-2637-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1443826375</subfield><subfield code="9">1-4438-2637-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)815899025</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV040424553</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">GB</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield><subfield code="a">DE-703</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">PL8747.4</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">EP 19425</subfield><subfield code="0">(DE-625)26887:230</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Otlogetswe, Thapelo J.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text variability measures in corpus design for Setswana 
lexicography</subfield><subfield code="c">by Thapelo J. Otlogetswe</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Newcastle upon Tyne</subfield><subfield code="b">Cambridge Scholars Publ.</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIV, 318 S.</subfield><subfield code="b">graph. Darst.</subfield><subfield code="c">22 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. [275] - 292</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Tswana language</subfield><subfield code="x">Lexicography</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Tswana-Sprache</subfield><subfield code="0">(DE-588)4256812-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Wortschatz</subfield><subfield code="0">(DE-588)4126555-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Tswana-Sprache</subfield><subfield code="0">(DE-588)4256812-2</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Wortschatz</subfield><subfield code="0">(DE-588)4126555-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-025277181</subfield></datafield></record></collection> |
id | DE-604.BV040424553 |
illustrated | Illustrated |
indexdate | 2024-07-23T00:05:27Z |
institution | BVB |
isbn | 9781443826372 1443826375 |
language | English |
lccn | 2010671328 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-025277181 |
oclc_num | 815899025 |
open_access_boolean | |
owner | DE-12 DE-703 |
owner_facet | DE-12 DE-703 |
physical | XIV, 318 S. graph. Darst. 22 cm |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | Cambridge Scholars Publ. |
record_format | marc |
spelling | Otlogetswe, Thapelo J. Verfasser aut Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe 1. publ. Newcastle upon Tyne Cambridge Scholars Publ. 2011 XIV, 318 S. graph. Darst. 22 cm txt rdacontent n rdamedia nc rdacarrier Literaturverz. S. [275] - 292 Tswana language Lexicography Tswana-Sprache (DE-588)4256812-2 gnd rswk-swf Wortschatz (DE-588)4126555-5 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Tswana-Sprache (DE-588)4256812-2 s Wortschatz (DE-588)4126555-5 s Korpus Linguistik (DE-588)4165338-5 s DE-604 Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Otlogetswe, Thapelo J. Text variability measures in corpus design for Setswana lexicography Tswana language Lexicography Tswana-Sprache (DE-588)4256812-2 gnd Wortschatz (DE-588)4126555-5 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
subject_GND | (DE-588)4256812-2 (DE-588)4126555-5 (DE-588)4165338-5 |
title | Text variability measures in corpus design for Setswana lexicography |
title_auth | Text variability measures in corpus design for Setswana lexicography |
title_exact_search | Text variability measures in corpus design for Setswana lexicography |
title_full | Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe |
title_fullStr | Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe |
title_full_unstemmed | Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe |
title_short | Text variability measures in corpus design for Setswana lexicography |
title_sort | text variability measures in corpus design for setswana lexicography |
topic | Tswana language Lexicography Tswana-Sprache (DE-588)4256812-2 gnd Wortschatz (DE-588)4126555-5 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
topic_facet | Tswana language Lexicography Tswana-Sprache Wortschatz Korpus Linguistik |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT otlogetswethapeloj textvariabilitymeasuresincorpusdesignforsetswanalexicography |