Although several corpora of Australian English currently exist, they have not been widely used because of their limited size or scope in comparison with well-known corpora such as the British National Corpus (BNC) and the American National Corpus (ANC). For a new corpus to be widely used, it needs to be comparable to those large corpora. This paper therefore reviews the designs of widely used corpora from around the world and proposes a design for the Australian National Corpus (AusNC). In doing so, it outlines what needs to be considered in compiling an Australian corpus, such as the timeline, the genres or categories to be included, and the selection criteria for texts within each genre or subgenre. Designing the corpus carefully before any data are collected or accepted for inclusion will help avoid wasting resources.
Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages
edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters