Cascadilla Proceedings Project: Paper 2281 Abstract



The Architecture of a Multipurpose Australian National Corpus
Pam Peters
Pages 1-9

The Australian National Corpus (AusNC) should embrace the research needs of as many kinds of linguists as possible, as well as those who work with language in other professions, such as natural language processing (NLP) engineers, translators and interpreters, and speech pathologists. This calls for a core corpus consisting of targeted numbers of text-types in set categories of variation, so that the structure as a whole and its subsections can provide benchmarks for research into specialized kinds of language. In addition, a large open-ended collection of texts (a text archive) from the internet and elsewhere would be useful for researchers in the humanities and NLP. Texts in languages other than English could be included in a translation subcorpus or as associated microcorpora. The underlying system needs to be capable of aligning texts with audio and video data, and common (XML) systems of annotation will be vital in supporting sophisticated searching of the AusNC.



Published in:
Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages
edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters


ISBN 978-1-57473-435-5 library binding
vi+113 pages
publication date: 2009
published by Cascadilla Proceedings Project, Somerville, MA, USA

Printed edition: $190.00



Copyright © 2009 Cascadilla Proceedings Project. All rights reserved.