Designing a Multimodal Spoken Component of the Australian National Corpus

Michael Haugh

Paper 2290

Pages 74-86

Abstract

The builders of the largest and most comprehensive spoken corpus to date, the spoken component of the British National Corpus, were constrained by the technologies available to them in the late 1980s and early 1990s. Powerful technologies for digitizing and managing audio(visual) recordings, and for transcribing or annotating them, are now in the hands of ordinary researchers, heralding a new age in the study of spoken interaction: the creation of multimodal spoken corpora. The paper begins by outlining what constitutes a multimodal corpus and drawing a distinction between multimodal text corpora and multimodal spoken corpora, the latter being the primary focus of this paper. The case for favouring a multimodal spoken component of the Australian National Corpus (AusNC) over traditional approaches to spoken corpora is then outlined, and some of the key challenges that arise in designing a multimodal spoken corpus are explored. In light of such a complex array of challenges, it is concluded that the principles outlined in Agile Corpus Creation theory (Voormann & Gut, 2008) constitute the most pragmatic way forward in designing and building a multimodal spoken component of the AusNC.

Published in

Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages

edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters

ISBN 978-1-57473-435-5 library binding
vi + 113 pages
publication date: 2009
published by Cascadilla Proceedings Project, Somerville, MA, USA