Vocabulary Range and Text Coverage: Insights from the Forthcoming Routledge Frequency Dictionary of Spanish
Mark Davies
106-115
Abstract
A fundamental issue facing language learners and teachers is how to maximize vocabulary acquisition by focusing on the words that the learner is most likely to encounter. This paper provides insight into this question, based on data from the forthcoming Routledge Frequency Dictionary of Spanish. This is the first large-scale frequency dictionary of Spanish in more than forty years, and the first to be based on a large corpus (20 million words) with equally sized sub-corpora of spoken Spanish, fiction, and non-fiction. The corpus was tagged and lemmatized, and the 6000 most frequent lemmas were then selected based on overall frequency, range, and dispersion throughout the corpus. The data indicate that learners who have mastered approximately 4000 lemmas will be able to recognize about 90% of all tokens in spoken Spanish, whereas approximately 7000 lemmas are needed for 90% coverage in fiction, and 8000 lemmas for non-fiction.
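The coverage figures above can be read as a cumulative calculation over a frequency-ranked lemma list. The following is a minimal sketch of that idea, not the paper's actual procedure: the function names `coverage` and `lemmas_needed` and the toy counts are hypothetical, and the ranking uses raw frequency only, whereas the dictionary also weighs range and dispersion.

```python
from collections import Counter


def coverage(lemma_counts: Counter, n: int) -> float:
    """Share of all tokens accounted for by the n most frequent lemmas."""
    total = sum(lemma_counts.values())
    if total == 0:
        raise ValueError("lemma_counts is empty")
    top = lemma_counts.most_common(n)
    return sum(count for _, count in top) / total


def lemmas_needed(lemma_counts: Counter, target: float = 0.90) -> int:
    """Smallest number of top-ranked lemmas whose tokens reach the target share."""
    total = sum(lemma_counts.values())
    if total == 0:
        raise ValueError("lemma_counts is empty")
    covered = 0
    for rank, (_, count) in enumerate(lemma_counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(lemma_counts)


# Toy counts for illustration only (not data from the corpus).
toy = Counter({"de": 50, "la": 30, "casa": 10, "perro": 6, "azul": 4})
print(coverage(toy, 2))          # share of tokens covered by the top 2 lemmas
print(lemmas_needed(toy, 0.90))  # lemmas needed for 90% token coverage
```

Running the same calculation separately on the counts for each register (spoken, fiction, non-fiction) would give one threshold per register, which is the kind of comparison the abstract reports.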
Published in
Selected Proceedings of the 7th Hispanic Linguistics Symposium
edited by David Eddington
ISBN 978-1-57473-403-4 library binding
v + 202 pages
publication date: 2005
published by Cascadilla Proceedings Project, Somerville, MA, USA