Text variability measures in corpus design for Setswana lexicography

Bibliographic Details
Main author: Otlogetswe, Thapelo J. (Author)
Format: Book
Language: English
Published: Newcastle upon Tyne: Cambridge Scholars Publ., 2011
Edition: 1. publ.
Online access: Table of contents
Blurb

MARC

LEADER 00000nam a2200000 c 4500
001 BV040424553
003 DE-604
005 20180103
007 t
008 120919s2011 xxkd||| |||| 00||| eng d
010 |a 2010671328 
020 |a 9781443826372  |9 978-1-4438-2637-2 
020 |a 1443826375  |9 1-4438-2637-5 
035 |a (OCoLC)815899025 
035 |a (DE-599)BVBBV040424553 
040 |a DE-604  |b ger  |e rakwb 
041 0 |a eng 
044 |a xxk  |c GB 
049 |a DE-12  |a DE-703 
050 0 |a PL8747.4 
084 |a EP 19425  |0 (DE-625)26887:230  |2 rvk 
100 1 |a Otlogetswe, Thapelo J.  |e Verfasser  |4 aut 
245 1 0 |a Text variability measures in corpus design for Setswana lexicography  |c by Thapelo J. Otlogetswe 
250 |a 1. publ. 
264 1 |a Newcastle upon Tyne  |b Cambridge Scholars Publ.  |c 2011 
300 |a XIV, 318 S.  |b graph. Darst.  |c 22 cm 
336 |b txt  |2 rdacontent 
337 |b n  |2 rdamedia 
338 |b nc  |2 rdacarrier 
500 |a Literaturverz. S. [275] - 292 
650 4 |a Tswana language  |x Lexicography 
650 0 7 |a Tswana-Sprache  |0 (DE-588)4256812-2  |2 gnd  |9 rswk-swf 
650 0 7 |a Wortschatz  |0 (DE-588)4126555-5  |2 gnd  |9 rswk-swf 
650 0 7 |a Korpus  |g Linguistik  |0 (DE-588)4165338-5  |2 gnd  |9 rswk-swf 
689 0 0 |a Tswana-Sprache  |0 (DE-588)4256812-2  |D s 
689 0 1 |a Wortschatz  |0 (DE-588)4126555-5  |D s 
689 0 2 |a Korpus  |g Linguistik  |0 (DE-588)4165338-5  |D s 
689 0 |5 DE-604 
856 4 2 |m Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment  |q application/pdf  |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA  |3 Inhaltsverzeichnis 
856 4 2 |m Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment  |q application/pdf  |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA  |3 Klappentext 
943 1 |a oai:aleph.bib-bvb.de:BVB01-025277181 

Record in the search index

_version_ 1805326094271774720
adam_text Table of Contents
List of Tables (p. viii)
List of Figures (p. xi)
Preface (p. xii)
Acknowledgements (p. xiii)
Abbreviations (p. xv)
Chapter One: Introduction (p. 1)
1.1 Background to the study
1.2 Statement of the research problem
1.3 Clarifying terms: genre, text type and varieties
1.4 Methodology
1.5 Aims of the study
1.6 Research goals
1.7 Exposition of chapters
Chapter Two: The Setswana Language (p. 11)
2.1 The Botswana language situation
2.2 The Setswana language
2.3 Setswana dialects
2.3.1 The village, cattlepost, lands and city language
2.4 Domains of Setswana language use
2.4.1 Education
2.5 Text categories
2.6 Challenges of multilingualism and diglossia
2.7 The poverty of data
2.8 Setswana language research
2.9 Conclusion
Chapter Three: Corpus Lexicography (p. 27)
3.1 Introduction
3.2 What is a corpus?
3.3 Web as corpus
3.4 Frequency profiling: frequency and type/token
3.4.1 Frequency counts
3.4.2 Type/token and problems of wordhood
3.5 Relevance of corpora to lexicography
3.6 Some pre-electronic frequency studies
3.7 Electronic-corpora studies
3.8 Keyword analysis
3.9 Business keywords
3.10 Concordance
3.11 A review of existing methods of headword list identification
3.12 A historical perspective of headword lists
3.13 Non-corpus dependant methods of dictionary compilation
3.14 Semantic domains
3.15 Corpus lexicography and Setswana dictionaries
3.16 Conclusion
Chapter Four: Issues in Corpus Design for Lexicography (p. 70)
4.1 Introduction
4.2 Balance and representativeness
4.3 Corpus annotation
4.4 Sample size
4.5 Brown Corpus and BNC review
4.6 The exploration of both corpora
4.7 Conclusion
Chapter Five: The Setswana Corpus Compilation (p. 119)
5.1 Introduction
5.2 The design strategy
5.3 Overall corpus statistics
5.4 The Zipfian distribution
5.5 Corpus components
5.6 The compilation of corpus components
5.7 Conclusion
Chapter Six: Measuring Text Type Diversity (p. 145)
6.1 Introduction
6.2 Keyword analysis
6.3 Conclusion to keyword analysis
Chapter Seven: Type/Token Measures of Corpus Chunks (p. 194)
7.1 Type/token measures
7.2 Text divisions for experiments
7.2.1 Newspaper Components type/token
7.3 Conclusion of type-token measurements
7.4 A comparison of the top 100 tokens
7.5 A direct comparison of Setswana spoken and written corpus components
7.6 Conclusion
Chapter Eight: Conclusion and Future Work (p. 265)
8.1 Future research and applications
Bibliography (p. 275)
Appendix 1: Proposed Subentries of pelo Headword (p. 293)
Appendix 2: Participation Consent Form (p. 295)
Appendix 3: Conversation Log (p. 297)
Appendix 4: Headteacher’s Letter (p. 299)
Appendix 5: Accompanying Details for Classroom Recordings (p. 301)
Appendix 6: Letter to publishers asking for text (p. 303)
Appendix 7: BNC Part-of-speech Codes (p. 305)
Index (p. 311)

This book is about the design of a Setswana corpus for lexicography. While various corpora have been compiled and a variety of corpora-based research has been attempted in African languages, no effort has been made towards corpus design. Additionally, although extensive analysis of the Setswana language has been done by missionaries, grammarians and linguists since the 1800s, none of this research is in corpus design. Most research has been largely on the grammatical study of the language. The recent corpora research in African languages in general has been on the use of corpora for the compilation of dictionaries and little of it is in corpus design. Pioneers of this kind of corpora research in African languages are Prinsloo and De Schryver (1999), De Schryver and Prinsloo (2000 and 2001) and Gouws and Prinsloo (2005). Because of a lack of research in corpora design particularly in African languages, this book attempts to fill that gap, especially for Setswana. It is hoped that the findings of this study will inspire similar designs in other languages comparable to Setswana. We explore corpus design by focusing on measuring a variety of text types for lexical richness at comparable token points. The study explores the question of whether a corpus compiled for lexicography must comprise a variety of texts drawn from different text types or whether the quality of retrieved information for lexicographic purposes from a corpus comprising diverse text varieties could be equally extracted from a corpus with a single text type. This study therefore determines whether linguistic variability is crucial in corpus design for lexicography.

Thapelo J. Otlogetswe is Senior Lecturer of English Linguistics and Lexicography in the Department of English at the University of Botswana, where he obtained a Bachelor of Arts and a Post Graduate Diploma in Education. He studied for an MPhil in Comparative Linguistics and Philology at the University of Oxford. His doctoral studies in Corpus Linguistics were done at the University of Brighton and the University of Pretoria. His research focuses on lexical computing and corpus lexicography, particularly that of the Setswana language, which he is passionate about. His research also includes computational and statistical genre and text type analysis, Setswana names, and Setswana rhyming patterns. He has been involved in the development of a Setswana spellchecker (for OpenOffice) and the compilation of a multi-million token Setswana corpus. Dr Otlogetswe has published a number of books, amongst these: English-Setswana Dictionary, Poeletso-medumo ya Setswana: A Setswana Rhyming Dictionary, and M. L. A. Kgasa: A Pioneer Setswana Lexicographer. He has also co-authored a Setswana orthography book: Mokwalo o o lolameng wa Setswana. Dr Otlogetswe led the groundbreaking translation work on the Setswana Google Search which has made it possible for people to access the Google search interface in the Setswana language. He is a member of the African Association for Lexicography (Afrilex) as well as a commissioner of the Setswana Commission established by the Academy of African Languages (ACALAN), a language arm of the African Union.
any_adam_object 1
author Otlogetswe, Thapelo J.
author_facet Otlogetswe, Thapelo J.
author_role aut
author_sort Otlogetswe, Thapelo J.
author_variant t j o tj tjo
building Verbundindex
bvnumber BV040424553
callnumber-first P - Language and Literature
callnumber-label PL8747
callnumber-raw PL8747.4
callnumber-search PL8747.4
callnumber-sort PL 48747.4
callnumber-subject PL - Eastern Asia, Africa, Oceania
classification_rvk EP 19425
ctrlnum (OCoLC)815899025
(DE-599)BVBBV040424553
discipline Non-European languages and literatures
Literary studies
edition 1. publ.
format Book
fullrecord <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV040424553</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20180103</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">120919s2011 xxkd||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2010671328</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781443826372</subfield><subfield code="9">978-1-4438-2637-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1443826375</subfield><subfield code="9">1-4438-2637-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)815899025</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV040424553</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">GB</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield><subfield code="a">DE-703</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">PL8747.4</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">EP 19425</subfield><subfield code="0">(DE-625)26887:230</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Otlogetswe, Thapelo J.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text variability measures in corpus design for Setswana 
lexicography</subfield><subfield code="c">by Thapelo J. Otlogetswe</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Newcastle upon Tyne</subfield><subfield code="b">Cambridge Scholars Publ.</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIV, 318 S.</subfield><subfield code="b">graph. Darst.</subfield><subfield code="c">22 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. [275] - 292</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Tswana language</subfield><subfield code="x">Lexicography</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Tswana-Sprache</subfield><subfield code="0">(DE-588)4256812-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Wortschatz</subfield><subfield code="0">(DE-588)4126555-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Tswana-Sprache</subfield><subfield code="0">(DE-588)4256812-2</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Wortschatz</subfield><subfield code="0">(DE-588)4126555-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=025277181&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=025277181&amp;sequence=000002&amp;line_number=0002&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-025277181</subfield></datafield></record></collection>
id DE-604.BV040424553
illustrated Illustrated
indexdate 2024-07-23T00:05:27Z
institution BVB
isbn 9781443826372
1443826375
language English
lccn 2010671328
oai_aleph_id oai:aleph.bib-bvb.de:BVB01-025277181
oclc_num 815899025
open_access_boolean
owner DE-12
DE-703
owner_facet DE-12
DE-703
physical XIV, 318 S. graph. Darst. 22 cm
publishDate 2011
publishDateSearch 2011
publishDateSort 2011
publisher Cambridge Scholars Publ.
record_format marc
spelling Otlogetswe, Thapelo J. Verfasser aut
Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe
1. publ.
Newcastle upon Tyne Cambridge Scholars Publ. 2011
XIV, 318 S. graph. Darst. 22 cm
txt rdacontent
n rdamedia
nc rdacarrier
Literaturverz. S. [275] - 292
Tswana language Lexicography
Tswana-Sprache (DE-588)4256812-2 gnd rswk-swf
Wortschatz (DE-588)4126555-5 gnd rswk-swf
Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf
Tswana-Sprache (DE-588)4256812-2 s
Wortschatz (DE-588)4126555-5 s
Korpus Linguistik (DE-588)4165338-5 s
DE-604
Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext
spellingShingle Otlogetswe, Thapelo J.
Text variability measures in corpus design for Setswana lexicography
Tswana language Lexicography
Tswana-Sprache (DE-588)4256812-2 gnd
Wortschatz (DE-588)4126555-5 gnd
Korpus Linguistik (DE-588)4165338-5 gnd
subject_GND (DE-588)4256812-2
(DE-588)4126555-5
(DE-588)4165338-5
title Text variability measures in corpus design for Setswana lexicography
title_auth Text variability measures in corpus design for Setswana lexicography
title_exact_search Text variability measures in corpus design for Setswana lexicography
title_full Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe
title_fullStr Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe
title_full_unstemmed Text variability measures in corpus design for Setswana lexicography by Thapelo J. Otlogetswe
title_short Text variability measures in corpus design for Setswana lexicography
title_sort text variability measures in corpus design for setswana lexicography
topic Tswana language Lexicography
Tswana-Sprache (DE-588)4256812-2 gnd
Wortschatz (DE-588)4126555-5 gnd
Korpus Linguistik (DE-588)4165338-5 gnd
topic_facet Tswana language Lexicography
Tswana-Sprache
Wortschatz
Korpus Linguistik
url http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025277181&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv AT otlogetswethapeloj textvariabilitymeasuresincorpusdesignforsetswanalexicography