Language Documentation and an Australian National Corpus

Simon Musgrave; Sarah Cutfield

Paper 2282

Pages 10-18

Abstract

Corpus linguistics and language documentation are usually considered separate subdisciplines within linguistics, having developed from different traditions and often operating on different scales. The authors suggest, however, that the two have much in common: both aim to represent language use in a community, and both are concerned with managing digital data. The authors propose that the development of the Australian National Corpus (AusNC) be guided by the experience of language documentation in managing and annotating multimodal digital data, and in addressing the ethical issues involved in making such data accessible. This approach would allow an AusNC that is distributed, multimodal, and multilingual: its holdings of text, audio, and video data would be spread across multiple institutions and would include Indigenous, sign, and migrant community languages. An audit of language material held by Australian institutions and individuals is needed to gauge the diversity and volume of possible content and to inform common technical standards.

Published in

Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages

edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters

ISBN 978-1-57473-435-5 library binding
vi + 113 pages
publication date: 2009
published by Cascadilla Proceedings Project, Somerville, MA, USA