Quantifying and Measuring Morphological Complexity

Max Bane

Abstract

It is a standard assumption in linguistics that all human languages are equally (and enormously) complex; when looked at as a whole, no language can be called "simpler" than another. Languages can certainly differ in how their complexity is distributed, so that one might employ a richer inflectional system or permit a more complicated range of syllable shapes than another, but it is generally supposed that these differences must "even out" once entire linguistic systems are considered. A number of researchers have recently begun to treat this equal-complexity hypothesis as an empirical claim to be tested under particular definitions of complexity. Perhaps the most famous recent example is McWhorter's (2001) controversial claim that "creole grammars are the world's simplest grammars," but see also Juola (1998), Shosted (2006), Nichols (2007), and Pellegrino (2007). This paper argues for an information-theoretic approach to defining linguistic complexity and offers preliminary results for a novel method that uses the mathematical notion of Kolmogorov complexity together with an automatic lemmatizer to construct a numerical metric of morphological complexity.
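As a rough illustration of the general idea, and not the paper's actual procedure, the sketch below compares the compressed size of a token sequence with the compressed size of its lemmatized form. Kolmogorov complexity itself is uncomputable, so the compressed length from a general-purpose compressor (here zlib) stands in as an upper-bound proxy. The function names, the toy lemma table, and the ratio computed are all assumptions made for this example.

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed text.

    Used here as a crude, computable stand-in for Kolmogorov complexity.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def morphological_share(tokens, lemmatize):
    """Relative change in compressed size when every token is replaced by its lemma.

    `lemmatize` is any callable mapping a word form to a lemma (hypothetical
    here; a real study would plug in an automatic lemmatizer).
    """
    inflected = " ".join(tokens)
    lemmas = " ".join(lemmatize(t) for t in tokens)
    full = compressed_size(inflected)
    return (full - compressed_size(lemmas)) / full


# Toy lemma table, invented purely for illustration.
TOY_LEMMAS = {"walked": "walk", "walking": "walk", "walks": "walk",
              "talked": "talk", "talking": "talk", "talks": "talk"}


def toy_lemmatize(word: str) -> str:
    """Look the word up in the toy table; unknown words are returned unchanged."""
    return TOY_LEMMAS.get(word, word)


if __name__ == "__main__":
    sample = "she walks and talks he walked and talked they are walking and talking".split()
    print(morphological_share(sample, toy_lemmatize))
```

On very short inputs the compressor's fixed overhead dominates, so a measure of this kind is only meaningful on sizeable corpora.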

Paper 1657

Published in