Can Numbers Talk? Basic Data Management of a Corpus


Bibliographic details
Published in: RELC journal 1999-06, Vol. 30 (1), p. 1-17
Main author: Kuo, Chih-Hua
Format: Article
Language: English
Online access: Full text
Description
Summary: This study attempts a series of quantitative analyses on a cornucopia of data in the Corpus of Scientific Journal Articles (CSJA), a special-purpose corpus consisting of 360 journal articles in 10 major scientific fields. Major findings include: (1) the average word length is 6.31 characters; (2) a word-form occurs 36.8 times on average; (3) a text category having a larger number of running words tends to have a higher word recurrence rate; (4) most of the 100 most frequent word-forms are function words; (5) in comparison with the COBUILD corpus and the LOB corpus, numbers and letters are much more frequently used in the CSJA than in the other two corpora; (6) only a very limited number of word-forms have a high recurrence rate while more than half of the vocabulary occurs only once or twice; (7) despite disciplinary difference, word frequency profiles of the ten scientific fields are very similar, showing that different scientific fields bear similar patterns in the use of words.
ISSN: 0033-6882, 1745-526X
DOI: 10.1177/003368829903000101
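
The measures named in the summary (average word length, average occurrences per word-form, and the share of word-forms occurring only once or twice) can be illustrated with a minimal Python sketch. This is not the study's procedure: the function name `corpus_stats`, the tokenizer (lowercased alphanumeric runs), the choice to average word length over running words, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def corpus_stats(text):
    """Compute simple word-frequency measures for a text (illustrative only)."""
    # Assumed tokenizer: lowercased runs of letters and digits.
    # The study's own tokenization rules are not given in this record.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        raise ValueError("no tokens found")

    counts = Counter(tokens)  # word-form -> number of occurrences
    rare = sum(1 for c in counts.values() if c <= 2)  # forms seen once or twice

    return {
        "running_words": len(tokens),
        "word_forms": len(counts),
        # Assumption: word length is averaged over running words, not word-forms.
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        # Average number of times a word-form occurs (running words / word-forms).
        "avg_recurrence": len(tokens) / len(counts),
        "share_rare_forms": rare / len(counts),
        "top_forms": counts.most_common(3),
    }


# Hypothetical sample text, not taken from the CSJA.
sample = "the cell and the gene and the cell wall"
stats = corpus_stats(sample)
for name, value in stats.items():
    print(name, value)
```

Under these assumptions, a corpus in which a word-form occurs 36.8 times on average would have roughly 36.8 running words for every distinct word-form.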