The Architecture of a Multipurpose Australian National Corpus
Pam Peters
pp. 1-9

The Australian National Corpus (AusNC) should embrace the research needs of as many kinds of linguists as possible, as well as those who work with language in other professions--for example, natural language processing (NLP) engineers, translators and interpreters, and speech pathologists. It calls for a core corpus consisting of targeted numbers of text-types in set categories of variation, so that the structure as a whole and its subsections can provide benchmarking for research into specialized kinds of language. In addition, a large open-ended collection of texts (text archive) from the internet and elsewhere would be useful for researchers in the humanities and NLP. Texts in languages other than English could be included in a translation subcorpus or as associated microcorpora. The underlying system needs to be capable of aligning texts with audio and video data, and common (XML) systems of annotation will be vital in supporting sophisticated searching of the AusNC.

Published in: Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages
edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters
ISBN 978-1-57473-435-5 library binding
vi+113 pages
publication date: 2009
published by Cascadilla Proceedings Project, Somerville, MA, USA