Language and Literature. A Corpus Stylistic Approach to Charles Dickens

1 Introduction

Early corpus linguistics and stylistics began with Chomsky’s approach to language. He explicitly stated that there are three levels of adequacy upon which grammatical descriptions and linguistic theories can be evaluated: observational adequacy, descriptive adequacy and explanatory adequacy[1].

This Chomskyan “revolution” laid the foundation for corpus-based analysis, a method that uses suitable examples to give insight into how a language works and how different authors use it. Corpus-based analysis offers new insights into the study of language, and new computer tools and software provide access to a wide range of electronic corpora.

In this research paper I will take a corpus stylistic approach to the language of the 19th-century author Charles Dickens. I will focus on his characteristic register and investigate his use of particular clusters, that is, recurrent combinations of words in his works. Furthermore, I will focus on A Christmas Carol (1843) as an example of how Dickens uses language patterns. This masterpiece has the smallest number of words of all his novels, namely 28,541[2], which makes it particularly challenging to analyse. Moreover, few corpus linguists have analysed it so far, which makes A Christmas Carol a nearly unexamined work of art ready to be explored.

My thesis, which will be developed in the following chapters, is that Dickens’s novels, especially A Christmas Carol, each form a unit of meaning, a textual world of their own, in which Dickens’s unique style can be identified through recurring clusters that allow effective corpus-based comparison.

I will start with a description of corpus analysis and its findings. Secondly, I will devote a chapter to three-, four- and five-word clusters in the Dickens Corpus and especially in A Christmas Carol, with particular focus on five-word clusters. Moreover, I will introduce the five categories of clusters (labels, body part, speech, time and place, and as if) and examine these categories in Dickens’s Great Expectations (1861) and A Christmas Carol as two examples for a contrastive analysis. I will conclude by summarising my findings.

My analysis is based mainly on Michaela Mahlberg’s work Clusters, key clusters and local textual functions in Dickens, which gives an interesting insight into the linguistic and stylistic dynamics of Charles Dickens’s novels.

2 Outline of Corpus Analysis

In general, a corpus is a collection of computer-readable texts, often comprising several million words. For example, the works of a specific author or group of authors can be compared statistically with texts by other authors, which makes it possible to isolate authorial ‘fingerprints’. This method, however, has been under-used by many stylisticians.[3]

A corpus consisting of a collection of texts, for instance by one specific author such as Charles Dickens or by several 19th-century authors, aims to represent discourse. The most important features of the corpus-theoretical framework are that “[…] language is seen as a social phenomenon, meaning and form are associated and a corpus linguistic description of language prioritizes lexis”[4]. Corpus linguistics focuses on repeated and typical uses of language, in which words tend to co-occur and form collocations. As a consequence, it relies on the evidence of language usage as collected and analysed in corpora.[5]

Another approach to the study of corpora is corpus stylistics. Although it is still at an early stage of development among linguists, it covers the second important feature of the corpus-theoretical framework, namely the association of meaning and form. It works with a flexible grammar and local categories of description, and it focuses on deviations from linguistic norms that lead to the creation of artistic effects.[6]

A third corpus-based approach is corpus annotation, which means investigating a particular linguistic feature by taking (or making) a corpus and conducting a thorough and exhaustive analysis of the feature as it occurs in that corpus.[7]

In general, “many corpus linguists are actively engaged in issues of language theory, and many generative grammarians have shown an increasing concern for the data upon which their theories are based, even though data collection remains at best a marginal concern in modern generative theory.”[8]

According to Noam Chomsky, there are three levels of adequacy upon which grammatical descriptions and linguistic theories can be evaluated: observational adequacy, descriptive adequacy, and explanatory adequacy.[9]

“If a theory or description achieves observational adequacy, it is able to describe which sentences in a language are grammatically well formed”[10]. To achieve descriptive adequacy (a higher level of adequacy), “[…] the description or theory must not only describe whether individual sentences are well formed but in addition specify the abstract grammatical properties making the sentences well formed.”[11] The highest level of adequacy is explanatory adequacy, “[…] which is achieved when the description or theory not only reaches descriptive adequacy but does so using abstract principles which can be applied beyond the language being considered and become a part of ‘Universal Grammar’.”[12]

Unlike generative grammarians, “[…] corpus linguists see complexity and variation as inherent in language, and in their discussions of language, they place a very high priority on descriptive adequacy, not explanatory adequacy”[13]. Additionally, corpora are excellent sources for verifying “[…] the falsifiability, completeness, simplicity, strength, and objectivity of any linguistic hypothesis”[14]. Given this difference in priorities, corpora will probably never play much of a role in generative grammar. In a corpus-based analysis, language cannot be described without literature. For this reason, I decided to concentrate on a corpus-stylistic approach to analysing Charles Dickens’s A Christmas Carol. I will focus on local textual functions and on clusters, both as initial pointers to these functions and as sequences of word forms that go together in natural discourse.

Because they stand in opposition to generative grammar and bring objectivity to any textual analysis, “[…] corpora are much better suited to functional analyses of language: analyses that are focused not simply on providing a formal description of language but on describing the use of language as a communicative tool.”[15]

3 Three-, Four- and Five-Word Clusters in the Dickens Corpus and A Christmas Carol

Before I start my analysis, I would like to give a short introduction to the first of Dickens’s Christmas Books.

A Christmas Carol, which Charles Dickens wrote in 1843 and which is the first part of The Christmas Books, sets out to present to the reader the inhumane conditions of the lower class in 19th-century England. It combines social criticism with emotional depth and personal change, placed in the context of the Christmas season.

The story introduces Ebenezer Scrooge, at first a cold-hearted and miserly character, who, after being confronted by the three spirits of Christmas Past, Christmas Present and Christmas Yet to Come, opens his heart to the magic of Christmas and becomes aware of the transience of life, enjoying the moment and helping the poor. The story is full of simple allegories and well-rounded characters. In the following chapters it will be set in the context of the whole Dickens Corpus provided by Michaela Mahlberg, compared with it and analysed from a corpus-stylistic perspective.

The corpus used for this research paper is the Dickens Corpus provided by Michaela Mahlberg in Clusters, key clusters and local textual functions in Dickens. It contains 23 texts and about 4.5 million words[16]. The texts I used for my own studies are A Christmas Carol, containing 28,541 words, and Great Expectations, which consists of 186,274 words.[17] All texts are taken from Project Gutenberg, which provides a format that can be used with WordSmith Tools Version 3.0.
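
The notion of a cluster used in this chapter can be illustrated with a short sketch. The Python code below is not the procedure implemented in WordSmith Tools; it is a minimal illustration under the assumption that a cluster is simply a sequence of n consecutive word forms that recurs in a text. The function names (tokenize, count_clusters, recurrent_clusters), the min_freq threshold and the sample sentence are hypothetical and were chosen only for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word forms.

    Assumption: a word form is a run of letters, optionally joined by
    straight apostrophes (as in "hasn't"). Real corpus tools apply
    more elaborate tokenisation rules.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def count_clusters(tokens, n):
    """Count every sequence of n consecutive word forms."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def recurrent_clusters(text, n, min_freq=2):
    """Return (cluster, frequency) pairs occurring at least min_freq times."""
    counts = count_clusters(tokenize(text), n)
    return [(c, f) for c, f in counts.most_common() if f >= min_freq]


# Invented sample sentence, used only for illustration (not a quotation).
sample = (
    "He looked as if he had been waiting, "
    "and she smiled as if he had been expected."
)

for cluster, freq in recurrent_clusters(sample, 5):
    print(freq, cluster)
```

In the invented sample, the only five-word sequence that occurs more than once is "as if he had been". Applied to a full text such as A Christmas Carol, the same counting step would produce a frequency list of candidate clusters, which would then still have to be sorted into categories by hand.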



