Can Numbers Talk? Basic Data Management of a Corpus
Published in: | RELC Journal, June 1999, Vol. 30 (1), pp. 1-17 |
---|---|
First author: | |
Format: | Article |
Language: | English |
Abstract: | This study carries out a series of quantitative analyses of a cornucopia of data in the Corpus of Scientific Journal Articles (CSJA), a special-purpose corpus consisting of 360 journal articles in 10 major scientific fields. Major findings include: (1) the average word length is 6.31 characters; (2) a word-form occurs 36.8 times on average; (3) a text category with a larger number of running words tends to have a higher word recurrence rate; (4) most of the 100 most frequent word-forms are function words; (5) compared with the COBUILD corpus and the LOB corpus, numbers and letters are used much more frequently in the CSJA; (6) only a very limited number of word-forms have a high recurrence rate, while more than half of the vocabulary occurs only once or twice; (7) despite disciplinary differences, the word frequency profiles of the ten scientific fields are very similar, showing that different scientific fields share similar patterns of word use. |
ISSN: | 0033-6882, 1745-526X |
DOI: | 10.1177/003368829903000101 |
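Several of the figures in the abstract (average word length, average occurrences per word-form, the share of word-forms occurring only once or twice, and a list of the most frequent word-forms) are simple token and type counts. The following is a minimal Python sketch of how such counts could be computed from plain text; it is not the study's method. The tokenizer pattern `TOKEN_RE`, the lowercasing, and the helper names `tokenize` and `corpus_profile` are hypothetical, since the record does not state how the CSJA defines or counts word-forms.

```python
import re
from collections import Counter

# Assumption (hypothetical rule): a word-form is a run of letters/digits,
# optionally joined by internal apostrophes or hyphens, compared case-insensitively.
# The CSJA study's actual tokenization rules are not given in this record.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Split text into lowercased running words (tokens)."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def corpus_profile(text: str, top_n: int = 100) -> dict:
    """Compute basic token/type statistics for a text (hypothetical helper)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    freq = Counter(tokens)           # word-form -> number of occurrences
    n_tokens = len(tokens)           # running words
    n_types = len(freq)              # distinct word-forms
    once_or_twice = sum(1 for c in freq.values() if c <= 2)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # average word length in characters, averaged over running words
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        # average number of occurrences per word-form (tokens / types)
        "mean_occurrences_per_type": n_tokens / n_types,
        # proportion of the vocabulary occurring only once or twice
        "share_once_or_twice": once_or_twice / n_types,
        # most frequent word-forms; ties keep first-seen order
        "top": freq.most_common(top_n),
    }


if __name__ == "__main__":
    profile = corpus_profile("The cell divides. The cell grows and the cell dies.", top_n=2)
    for key, value in profile.items():
        print(key, value)
```

For the sample sentence, the sketch counts 10 running words and 6 word-forms, giving an average word length of 4.0 characters and about 1.67 occurrences per word-form. Four of the six word-forms occur only once or twice. Applied per text category, the same counts would let one compare categories of different sizes, although whether the study's "recurrence rate" corresponds to tokens divided by types is an assumption.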