Cascadilla Proceedings Project: Paper 1091

Selected Proceedings of the 7th Hispanic Linguistics Symposium
edited by David Eddington

ISBN 1-57473-403-2 library binding
v + 202 pages
publication date: 2005
published by Cascadilla Proceedings Project, Somerville, MA, USA


Abstract

Mark Davies
Vocabulary Range and Text Coverage: Insights from the Forthcoming Routledge Frequency Dictionary of Spanish
Pages 106-115

A fundamental issue facing language learners and teachers is how to maximize vocabulary acquisition effectively by focusing on the words that learners are most likely to encounter. This paper addresses that question using data from the forthcoming Routledge Frequency Dictionary of Spanish. This is the first large-scale frequency dictionary of Spanish in more than forty years, and the first to be based on a large corpus (20 million words) made up of equally sized sub-corpora of spoken Spanish, fiction, and non-fiction. The corpus was tagged and lemmatized, and the 6000 most frequent lemmas were then selected on the basis of overall frequency, range, and dispersion across the corpus. The data indicate that learners who have mastered approximately 4000 lemmas will be able to recognize about 90% of all tokens in spoken Spanish, whereas approximately 7000 lemmas are needed for 90% coverage of fiction and 8000 lemmas for non-fiction.
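Coverage figures like "4000 lemmas cover about 90% of spoken tokens" come from a cumulative count over a frequency-ranked lemma list. The sketch below is an illustration only and makes several assumptions. It assumes the corpus has already been reduced to one lemma string per token. It ranks lemmas by raw frequency alone and ignores the range and dispersion criteria the dictionary also used, so it is not the author's selection procedure. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def coverage_curve(lemma_tokens: Iterable[str]) -> list[float]:
    """Element k is the share of all tokens covered by the k+1 most
    frequent lemmas (ranking by raw frequency only; an assumption)."""
    counts = Counter(lemma_tokens)
    total = sum(counts.values())
    curve = []
    running = 0
    # most_common() with no argument yields every lemma, most frequent first
    for _, freq in counts.most_common():
        running += freq
        curve.append(running / total)
    return curve


def lemmas_needed(curve: list[float], target: float) -> int:
    """Smallest number of top-ranked lemmas whose coverage reaches target."""
    for rank, share in enumerate(curve, start=1):
        if share >= target:
            return rank
    raise ValueError("target coverage not reached")
```

On a toy list of ten lemma tokens (el ×4, ser ×3, casa ×2, perro ×1), `coverage_curve` returns `[0.4, 0.7, 0.9, 1.0]`, so `lemmas_needed(curve, 0.9)` gives 3. Running the same calculation separately on each sub-corpus would show the kind of register differences the abstract reports, with spoken text reaching a given coverage level with fewer lemmas than fiction or non-fiction.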


Copyright © 2004 Cascadilla Proceedings Project. All rights reserved. To request permission to copy any elements from our pages, or to send comments or questions about our pages, please write to webmaster@cascadilla.com and make sure to provide the URL of the particular page. This page last updated 8 December 2004.