The Architecture of a Multipurpose Australian National Corpus

Pam Peters

Paper 2281

Pages 1-9

Abstract

The Australian National Corpus (AusNC) should embrace the research needs of as many kinds of linguists as possible, as well as those who work with language in other professions, for example natural language processing (NLP) engineers, translators and interpreters, and speech pathologists. This breadth calls for a core corpus consisting of targeted numbers of text-types in set categories of variation, so that the structure as a whole and its subsections can provide benchmarking for research into specialized kinds of language. In addition, a large open-ended collection of texts (text archive) from the internet and elsewhere would be useful for researchers in the humanities and NLP. Texts in languages other than English could be included in a translation subcorpus or as associated microcorpora. The underlying system needs to be capable of aligning texts with audio and video data, and common (XML) systems of annotation will be vital in supporting sophisticated searching of the AusNC.

Published in

Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages

edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters

ISBN 978-1-57473-435-5 library binding
vi + 113 pages
publication date: 2009
published by Cascadilla Proceedings Project, Somerville, MA, USA