The Australian National Corpus (AusNC) should embrace the research needs of as many kinds of linguists as possible, as well as those of people who work with language in other professions, for example natural language processing (NLP) engineers, translators and interpreters, and speech pathologists. Meeting these needs calls for a core corpus consisting of targeted numbers of text-types in set categories of variation, so that the structure as a whole and its subsections can provide benchmarks for research into specialized kinds of language. In addition, a large open-ended collection of texts (a text archive) drawn from the internet and elsewhere would be useful for researchers in the humanities and NLP. Texts in languages other than English could be included in a translation subcorpus or as associated microcorpora. The underlying system needs to be capable of aligning texts with audio and video data, and common XML-based annotation systems will be vital in supporting sophisticated searching of the AusNC.
Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages
edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters