"Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts" (en.wikipedia.org/wiki/Culturomics, last visited 20 November 2012). The term was coined in a Science article significantly called "Quantitative Analysis of Culture Using Millions of Digitized Books". Michel and Aiden, two of the authors, helped create the Google Ngram Viewer which enables us to determine the relative frequency of any chosen word (or "n-gram") in over 5 million English books amounting to over 361 billion words (Michel et al. 2011a: 176, col. 2). In spite of its huge database the Ngram Viewer "can't sort books by genre or topic" (Nunberg 2010). Michel et al. admit, however, that subject- and genre-specific "subcorpora are an important area for future exploration" (2011b). In a case study of wrath and anger, the paper uses the genre categorization of the much more modest "European Database of Descriptors of English Electronic Texts" (EuDDEET, ca. 2,000 books at present) for such exploration. Issues to be addressed include OCR quality and representativeness in terms of genre, period, and topic/subject-matter.
Selected Proceedings of the 2012 Symposium on New Approaches in English Historical Lexis (HEL-LEX 3)
edited by R. W. McConchie, Teo Juvonen, Mark Kaunisto, Minna Nevala, and Jukka Tyrkkö