This paper provides a history of the American National Corpus (ANC) project and summarizes its technical aspects, including the processing procedure, the representation format, and the software that has been developed for using the corpus. It recounts the shifts in goals and approach that have occurred in the decade since the ANC project was initiated, partly for practical reasons, partly as a result of experience with corpus development and distribution, and partly because of evolving attitudes within the field of computational linguistics. The participants in the project hope that their experience will provide methodologies and tools that can contribute to the development of an Australian National Corpus (AusNC) and other national corpus-building projects. Their goal is to ensure that the ANC and the AusNC are fully interoperable, so that the two corpora can be both compared and used together.
Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus: Mustering Languages
edited by Michael Haugh, Kate Burridge, Jean Mulder, and Pam Peters