It is a standard assumption in linguistics that all human languages are equally (and enormously) complex; when looked at as a whole, no language can be called "simpler" than another. Certainly, languages can differ in the distribution of their complexity, so that one might employ a richer inflectional system, or permit a more complicated gamut of syllable shapes, than another, but it is generally supposed that these differences must "even out" as one considers entire linguistic systems. A number of researchers have recently begun to approach this equal-complexity hypothesis as an empirical claim to be tested under particular definitions of complexity. Perhaps the most famous recent example is McWhorter's (2001) controversial claim that "creole grammars are the world's simplest grammars," but see also Juola (1998), Shosted (2006), Nichols (2007), and Pellegrino (2007). This paper argues for an information-theoretic approach to defining linguistic complexity and offers preliminary results for a novel method that uses the mathematical notion of Kolmogorov complexity, together with an automatic lemmatizer, to construct a numerical metric of morphological complexity.
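To make the general idea concrete, the following is a minimal illustrative sketch and not the method reported in the paper. Kolmogorov complexity is uncomputable, so the sketch assumes that the size of a zlib-compressed text can stand in as a rough upper-bound proxy. It also assumes a toy lookup table in place of an automatic lemmatizer. The names `compressed_size`, `lemmatize`, `morphological_share`, and `TOY_LEMMAS` are hypothetical and introduced here only for illustration.

```python
import zlib
from typing import Mapping

# Hypothetical toy lemma table; a real study would use an automatic lemmatizer.
TOY_LEMMAS = {"walked": "walk", "walks": "walk", "walking": "walk"}


def compressed_size(text: str) -> int:
    """Bytes after zlib compression: a crude upper-bound proxy (an assumption)
    for the Kolmogorov complexity of the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def lemmatize(text: str, lemmas: Mapping[str, str]) -> str:
    """Replace each whitespace-separated word with its lemma, if one is listed."""
    return " ".join(lemmas.get(word, word) for word in text.split())


def morphological_share(text: str, lemmas: Mapping[str, str]) -> float:
    """Fraction of the compressed size that disappears once inflection is
    stripped. Illustrative only; on short texts the value is noisy and may
    even be negative."""
    full = compressed_size(text)
    stripped = compressed_size(lemmatize(text, lemmas))
    return (full - stripped) / full
```

Under these assumptions, a text whose compressed size drops sharply after lemmatization would be read as carrying more of its information in morphology. Compression overhead dominates on short inputs, so any such comparison would need corpora of substantial and comparable size.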
Proceedings of the 26th West Coast Conference on Formal Linguistics
edited by Charles B. Chang and Hannah J. Haynie