Avoiding the Comparative Fallacy in the Annotation of Learner Corpora

Marwa Ragheb; Markus Dickinson

Abstract

Annotated corpora of learner language can be useful for SLA researchers and FL teachers. Annotating words with part-of-speech (POS) information and the grammatical relations between them makes it feasible to search a learner corpus for abstract grammatical properties that a lexical search does not easily capture (e.g., headless relative clauses). One type of corpus annotation for learner language has focused on so-called errors (e.g., Granger, 2003), using specific error tags for phenomena that 'deviate' from the L2. Some of these schemes make use of target hypotheses, attempting to capture the learner's intention. These approaches risk falling into the comparative fallacy (Bley-Vroman, 1983), since they try to map specific phenomena in interlanguage to target categories in the L2. The task is even more challenging with ambiguous utterances. In the same vein, it is undesirable to bias any annotation in terms of the L1 (cf. Lakshmanan & Selinker, 2001). A recent approach is to annotate interlanguage as it appears, without focusing on errors (e.g., Díaz-Negrillo et al., 2010; Dickinson & Ragheb, 2009). In providing linguistic annotation such as POS tags or syntactic relations, one has to ensure that the annotation supports different topics of SLA research while avoiding both the comparative fallacy and inferences about learner intention. This paper discusses the ramifications of annotating syntactic properties in learner language and pinpoints where annotation designers must be aware of the comparative fallacy. Using different layers of annotation to capture variability in learner language, the authors argue that one should annotate observable linguistic properties that are clearly defined. They show that, even if one defines the properties in terms of the L2, a systematic description of learner data can support L2 syntactic studies, provide insight into interlanguage, and avoid inferring intention, putting the final interpretation of the data in the hands of SLA researchers.

Paper 2620

Published in