A Corpus-Driven Approach to Stylistic Analysis of a Lexical Richness Curve

An Analysis of Six English Novels

Scientific Study, 2017

90 Pages



Chapter One Introduction

Chapter Two Stylistics Vs. Corpus Linguistics
2.1 What is Stylistics all about?
2.1.1 The Need for Stylistics
2.1.2 Major Areas of Stylistics
2.2 Corpus Linguistics: What is it all about?
2.2.1 Features of Corpus Linguistics
2.2.2 The Goals of Corpus Linguistics
2.2.3 What is a Corpus?
2.2.4 Types of Corpora
2.2.5 The Use of Computer to Study Language
2.2.6 The Corpus-Based Approach Vs. The Intuition-Based Approach
2.2.7 Corpus Linguistics: A Methodology or an Independent Discipline
2.2.8 Corpus-Based and Corpus-Driven Approaches
2.3 Corpus Stylistics
2.3.1 Goals of Corpus Stylistics
2.3.2 Features of Corpus Stylistics
2.3.3 Limitations of Corpus Stylistics
2.3.4 The Circle of Corpus Linguistics Description and Literary Appreciation

Chapter Three Lexical Richness as a Stylistic Feature
3.1 Lexical Richness
3.2 Measuring Style
3.3 The Problem of Measuring Style
3.4 Style as Recurrence
3.5 The Relationship Between Frequency and Significance in a Corpus
3.6 The Quantitative Features of Style
3.6.1 Lexical Features
3.6.2 Character-Specific Features (N-gram Feature)
3.6.3 Syntactic Features
3.6.4 Semantic Features
3.7 Lexical Richness and Type-Token Curve
3.8 Creativity and Literary Vocabulary

Chapter Four Text Corpora and Methodology
4.1 Design Consideration
4.2 Corpus Design
4.3 Technical Preparation
4.3.1 Planning a Storage System
4.3.2 Copyright
4.3.3 Electronic Version
4.4 Text Corpora
4.5 Corpus Features
4.6 Methods Used in Analyzing the Corpus
4.6.1 WordSmith Tools Version (7.0)
4.6.2 Microsoft Office Excel
4.7 The Length of Individual Text Samples
4.8 Representativeness and Balance
4.9 Sampling Methodology
4.10 Summary of the Analysis Procedures

Chapter Five Analysis and Results
5.1 Type-Token Curve Analysis
5.1.1 Type-Token Curve Analysis of Virginia Woolf's Novels
5.1.2 Type-Token Curve Analysis of William Faulkner's Novels
5.1.3 Type-Token Curve Analysis of James Joyce's Ulysses and A Portrait of the Artist as a Young Man
5.1.4 Type-Token Curves of the Three Authors
5.2 The Results

Chapter Six Conclusions and Suggestions for Further Studies

6.1 Conclusions


Chapter One Introduction

Corpus stylistics applies corpus methods to the analysis of literary and non-literary texts, combining linguistic description with literary appreciation. It therefore rests on the combination of two disciplines: stylistics and corpus linguistics.

The field of corpus stylistics is relatively new, and the studies conducted within it are few compared with those in other fields. The present book uses corpus methods to study certain statistical aspects of literary texts.

The problem the researchers deal with can be framed as three preliminary questions:

1. What are the analytical potentialities of corpus stylistics in general?
2. How far can researchers utilize type-token curves to investigate writers' lexical repertoires?
3. Can researchers produce rigorous statistical verifications and descriptions of writers' lexical profiles, besides tracing and comparing their lexical development over the course of time?

The major goals the researchers are trying to achieve are threefold:

1. investigating the usefulness of type-token curves in measuring lexical diversity and assessing their statistical reliability in giving rigorous accounts of lexical measurements.
2. analyzing and describing the lexical richness of six stream-of-consciousness novels: Virginia Woolf's The Waves and To the Lighthouse, William Faulkner's The Sound and the Fury and Light in August, and James Joyce's Ulysses and A Portrait of the Artist as a Young Man.
3. comparing the statistical results of each novel with the others to find out the richest and the poorest ones.

It became evident that corpus stylistics constitutes a rigorous means of verifying the validity of critical impressions about writers' lexical richness and diversity. Type-token curves hold a central position within the toolkit of corpus-driven stylistic investigations. They can be used to give a highly objective measurement of lexical richness. This statistical gauge might even suggest whether the vividness of a writer's vocabulary develops or declines over the course of time, as will be shown in the analyses of Faulkner's lexical richness conducted in this study.

In order to fulfill the goals of the study and to verify its underlying hypotheses the following procedures have been followed:

1. building up a corpus of all the texts to be studied using readily available electronic data (a machine-readable corpus).
2. drawing 15 random samples from each text at intervals of approximately 1,000 tokens.
3. finding the word frequencies for each sample using WordSmith Tools (7.0), calculating the frequency of tokens and types.
4. transferring the frequencies into Microsoft Excel to format the data and produce type-token curves.
5. comparing the type-token curves of each sample with the others.
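
The sampling and counting steps above can be approximated in a short script. The following is only a minimal Python sketch, not the WordSmith Tools and Excel workflow used in the study. The function names (tokenize, type_token_curve, sample_curve), the simple regular-expression tokenizer, and the reading of the samples as non-overlapping 1,000-token chunks are all assumptions made for illustration.

```python
import random
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_curve(tokens, step=1000):
    """Return (tokens_seen, types_seen) pairs recorded every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)  # a type is a distinct word form
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


def sample_curve(tokens, n_samples=15, sample_size=1000, seed=0):
    """Pick n_samples random non-overlapping chunks (kept in text order)
    and build a cumulative type-token curve over the pooled samples."""
    n_chunks = len(tokens) // sample_size
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(n_chunks), n_samples))
    pooled = []
    for c in chosen:
        pooled.extend(tokens[c * sample_size:(c + 1) * sample_size])
    return type_token_curve(pooled, step=sample_size)
```

Each pair in the returned list is one point on the curve: the running number of tokens on the horizontal axis and the running number of distinct types on the vertical axis. A steeper curve suggests a richer vocabulary under these assumptions.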

Lexical richness investigations are quite diverse, and keeping the analysis within a specific scope is a must; otherwise, the whole study will be adrift in the criss-crossed web of statistical perspectives on this phenomenon. Consequently, the book is limited to a corpus-based stylistic analysis utilizing one particular statistical marker related to type-token frequencies. Only six English stream-of-consciousness novels are laid out for analysis: Virginia Woolf's The Waves and To the Lighthouse, William Faulkner's Light in August and The Sound and the Fury, and James Joyce's Ulysses and A Portrait of the Artist as a Young Man. The researchers conducted the analyses using one particular version of WordSmith Tools (7.0), and more specifically its WordList tool.

The book should be valuable for researchers interested in the field of corpus stylistics in general. It might be of particular significance for researchers concerned with exploring the statistical behavior and manifestations of certain linguistic markers. It is also valuable for giving postgraduate students practice with some preliminary applications of corpus linguistics in general and corpus stylistics in particular.

Needless to say, all the weaknesses revealed in this study are due to the researchers' imperfect knowledge and practice as they tiptoe across the rapidly growing area of investigation called corpus stylistics.

Chapter Two Stylistics Vs. Corpus Linguistics

This chapter is an attempt to figure out the relation that might hold between stylistics and corpus linguistics. To address this question, we first need to sketch out the major domains of stylistics and corpus linguistics. Only then is one in a position to spot any possible bonds between the two. Therefore, the researchers will shed some light upon the major areas of investigation in each field, so that the interrelationship between them becomes evident in corpus stylistic analysis.

2.1 What is Stylistics all about?

The interdisciplinary nature of stylistics makes it versatile in drawing upon theories and models from other disciplines within linguistics and beyond. Complicating matters further is the view that stylistics is a rather advanced stage of practicing linguistics, with all its divergent sub-disciplines, centered on text analysis. Though Leech and Short (2007: 11) define stylistics simply as the linguistic study of style, stylistics shares boundaries with descriptions provided by structuralists, generativists, and, more recently, cognitivists (see Jeffries and McIntyre, 2010).

Literary stylistics has, whether implicitly or explicitly, the aim of explaining the relations between language use and artistic function. From the linguist's point of view, the important question is: why does the author choose this form of expression and not the other?

Stylistics provides linguists and critics with rich potentialities of interpretation (Simpson, 2004: 2). One of the reasons why language is so crucial to stylisticians is that the various patterns of structural construction represent an indispensable index of textual function (ibid). Moreover, Jeffries and McIntyre (2010: 1) point out that stylistics is a sub-discipline of linguistics which deals with the systematic analysis of style in language and how this style can vary according to factors such as genre, context, historical period and author.

It is also within the scope of stylistics to study the ways in which meaning is produced through language, mainly but not exclusively, in literary and non-literary texts. Stylisticians use linguistic models, theories, and frameworks as their tools of analysis in order to describe and explain how and why a text works the way it does (Norgaard et al., 2010: 1). Besides, the aesthetic uses of language, particularly in literature, are quite appealing and tempting to stylistic exploration of the various manifestations of language use.

From these perspectives, one can see that there is no general agreement among linguists about what stylistics is concerned with. Given the wide variety of definitions and perspectives of stylistics, the term 'style' also has a variety of definitions and explanations, a matter that might further blur the boundaries within the territory of stylistics.

Crystal and Davy (1969: 9), for example, define style as "a selection of language habits, the occasional linguistic idiosyncrasies which characterize an individual's uniqueness". This definition gives importance to the language habits which shape individual style. Furthermore, style according to this definition consists of "occasional linguistic idiosyncrasies" which distinguish one person's style from that of others.

Moreover, the word 'style' is etymologically derived from the Latin word 'stylus', meaning 'reed', a stick used for writing (Ogidefa, 2014: 100). Later, 'stylus' metamorphosed into 'style'. So, style refers to the manner in which form is executed or the means by which the context is expressed. Interest in this conception of 'style' has grown rapidly in recent years and has reached unprecedented levels of confluence with divergent fields of research, such as cognitive psychology and even philosophy.

2.1.1 The Need for Stylistics

Stylistics has a strong connection with linguistics since the latter provides theories which help in the process of language interpretation (Simpson, 2004: 3). The claim that stylistics is concerned with literature rather than linguistics is a common point of criticism in the field. For example, Sinclair supports this claim by stressing the importance of literary stylistics for studying language (ibid).

Other stylisticians think that stylistics is concerned with both literary and non-literary texts because many forms of discourse (advertising, journalism, popular music and even casual conversation) often display a high degree of stylistic dexterity (ibid). The other reason is that stylistic analysis is as much about probing linguistic structures and their functions as it is about figuring out the architecture of literary texts.

Further support for this view is the fact that contemporary stylistics presents language and literature as integrated (Sotirova, 2007: 1). Moreover, Burke (2014: 1) states that stylistics, or literary linguistics as it is sometimes called, is the study and analysis of texts; it is in particular, although not exclusively, the study and analysis of literary texts.

The explanatory power of stylistics can help us understand in more depth the ways in which the style of texts can influence the perception of readers in everyday situations (Jeffries and McIntyre, 2010: 8). Stylistics can also draw insights from other disciplines such as discourse analysis and pragmatics, and this in turn may provide an account of the implicitly manipulative uses of language (ibid).

Stylistics is indispensable because life is negotiated through language, and this language is well described in structural, contextual, and stylistic modes. However, there is still a need for insights about textual meaning, which is explained more effectively by a discipline that draws heavily on a systematic investigation of the artistic functions of language.

2.1.2 Major Areas of Stylistics

Though diverse, the major areas of investigation in stylistics might be grouped into three issues (Hussein, 2014: 2). The following sections sketch out the main points of each.

Style as Deviation

The concept of deviation is important in the study of style because texts can be stylistically distinctive when their language traits deviate from some norm of language. The norm can be absolute (i.e. a norm related to the language as a whole), or it can be relative (i.e. a norm related to some set of texts) (Enkvist, 1985: 40).

The normal use of language carries all kinds of expectations, and the grammar of a language is a codification of such expectations (Ellis, 1974: 160). A style might be simply a deviation from what is normally expected.

The term deviation may refer to a departure from the norms of language as a whole, or it may refer to the norms of literary composition and even those within a particular text (Aquilina, 2014: 13). According to Leech (1969: 56-57), deviation can create an element of interest and surprise. Different types of linguistic deviation are usually distinguished across three levels: the realization level (i.e. phonology and graphology), the grammatical or lexical level, and the semantic level (ibid).

The concept of deviation can be subdivided into determinate deviation and statistical deviation (Enkvist, 1985: 41). Statistical deviation deals with the linguistic differences between the domain and the norm that can be quantitatively measured (ibid). Determinate deviation, by contrast, is non-quantitative. Deviation of this type is observed against what the language system allows and what usually occurs in texts in general (ibid).

Another type of deviation, mentioned by Leech (2007: 44), relies on the observation that some language features within a text may depart from the norm of the text itself. These features may stand out against a background that the text itself has established. This type is called internal deviation (ibid).

Style as Recurrence

Some approaches to style are based on the assumption that style is a function of frequency, so that style can be measured statistically. For example, Bloch and Bolling (1960: 40) underscore the distributional aspect of style throughout the targeted text, seeing it as "the message carried by the frequency distributions and transitional probabilities of its linguistic features, especially as they differ from those of the same features in the language as a whole".
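
The idea that style can be read off frequency differences between a text and a norm can be illustrated with a small calculation. This is a hedged Python sketch only: the helper names (relative_frequencies, frequency_ratio), the per-1,000-token normalization, and the use of a plain ratio as the comparison are assumptions for illustration and are not taken from the sources cited.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its frequency per 1,000 tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 1000 * count / total for word, count in counts.items()}


def frequency_ratio(text_tokens, norm_tokens, word):
    """Ratio of a word's relative frequency in the text to its relative
    frequency in the reference (norm) sample; None if the word is absent
    from the norm, where a ratio is undefined."""
    text_rf = relative_frequencies(text_tokens).get(word, 0.0)
    norm_rf = relative_frequencies(norm_tokens).get(word, 0.0)
    if norm_rf == 0.0:
        return None
    return text_rf / norm_rf
```

A ratio well above 1 would mark a feature as over-represented in the text relative to the chosen norm, and a ratio well below 1 as under-represented; what counts as a suitable norm remains a matter of research design.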

To discover what is special about the style of a particular text is to find out the frequencies of the linguistic features it contains and then to measure these frequencies against what are considered the normal frequencies of those features (Leech, 2007: 35). A distinction should be drawn between parallelism and simple repetition, although some linguists regard the latter as a limited case of the former. The repetition of whole phrases or clauses (in terms of both structure and lexical items) is a case of parallelism (Short, 1996: 13-18).

Style as Choice

There are many ways of using the repertoire of the language system to express a textual representation. In fact, one way might be artistically privileged over another. It is undeniable that choices in style are motivated, even if unconsciously, since these choices have an important impact on the way texts are structured and interpreted (Simpson, 2004: 22).

According to Leech and Short (2007: 9), the word style simply refers to "the way in which language is used in a given context, by a given person, for a given purpose and so on". To clarify this definition, they invoke Saussure's distinction between langue and parole. In this case, "langue refers to the code or system of values common to the speaker of a language, while parole is the specific uses of this system or the selection from this system" (ibid).

Other linguists do not depart far from Leech and Short's approach to style. For example, Traugott (1980: 409) stresses that "style refers to patterned choice, whether at the phonological, lexical, syntactic, or pragmatic level". Enkvist et al. (1964: 12) investigate style as a potential choice among alternative expressions.

Every writer makes certain choices from a language system, and these choices are supposed to be motivated by certain artistic and aesthetic insights. The factors that make one version more effective as a written communication are not factors of accuracy or clarity; rather, they tend to be bound up with the relationship between the writer and the reader (Kirkman, 2005: 1).

2.2 Corpus Linguistics: What is it all about?

Bonelli (2001: 1), in her book Corpus Linguistics at Work, looks at corpus linguistics as a "pre-application methodology". It comprises an empirical approach to the description of language use within a contextual, functional theory of meaning, and it makes exceptional use of new technologies.

Meyer (2002: 141), however, looks at corpus linguistics as one of the most exciting methodological developments in linguistics since the Chomskyan revolution of the 1950s. In fact, corpus linguistics makes use of recent developments in technology to make feasible the kinds of empirical analyses of language that corpus linguists wish to undertake.

Corpus linguistics is a source of evidence for improving descriptions of language use and for various applications, such as the processing of natural language by machines and understanding how to learn or teach a language. It shows how quantitative analysis can be useful for linguistic description (Jockers, 2013: 1). Moreover, Horner (1996: 11) mentions that corpus linguistics, using the computer as a tool, is currently considered a methodology involving an empiricist approach to language. It is clear, then, that the development of corpus linguistics is tied to the drastic developments taking place in computer science, which have helped it gain its present standing (ibid).

2.2.1 Features of Corpus Linguistics

Corpus linguistics owes many of its potentialities to its distinctive features. The most important of these features are the following:

1. It is an empirical (experiment-based) approach in which patterns of language use observed in real language texts (spoken or written) are analyzed (Biber et al., 1998: 9-12).
2. It uses a representative sample of the target language stored in an electronic database (a corpus) as the basis for the analysis (Riegelman, 2005: 132).
3. The internet constitutes a major source of data and methods in corpus linguistics, giving linguists a whole universe of ever-expanding data to mine (Leech, 2007: 133).
4. It draws on both quantitative and qualitative analytical techniques to interpret the findings (Biber et al., 1998: 9-12).
5. It relies on computer software to count linguistic patterns as part of the analysis (Hornby, 2010: 210).

2.2.2 The Goals of Corpus Linguistics

The goals of corpus linguistics, according to Kennedy (1998: 1), are both wide and pedagogically oriented. Corpus linguistics is not an end in itself but is one source of evidence for improving the description of the structure and use of languages, and for various applications including the processing of natural languages by machine. In addition, corpus linguistics could be defined as the empirical study of language based on electronic corpora. It is carried out with the help of computer software, so its results can be used in a range of fields such as computational linguistics or language teaching.

Corpus linguistics has become an interdisciplinary field, where different approaches have met and reinforced each other. In spite of this, the central goal of corpus linguistics is still to reach a better understanding of human language (Stein and Quirk, 1991: 3). It provides means for the empirical analysis of language. Corpus linguistics has also had much to offer to other areas by providing better means of investigation. By applying corpus linguistic techniques, other areas of study can gain insight to answer various questions with the aid of computer programs (Evison, 2010: 7).

2.2.3 What is a Corpus?

To discuss corpus linguistics is to figure out what a corpus actually is. Corpus studies are based on collections of texts usually stored and analyzed electronically. They look at the occurrence and reoccurrence of specific linguistic features to show how and where they occur in the discourse (Paltridge, 2012: 156).

A corpus is usually computer-readable and can be accessed with tools such as concordancers. The corpus has usually been designed for the purpose of the analysis (ibid). Indeed, a corpus is a collection of texts based on a set of design criteria, one of which is what the corpus aims to be representative of (Cheng, 2012: 3).

Moreover, Bennett (2010: 12) views a corpus as a principled collection of authentic texts, stored electronically, that can be used to discover information about language that may not have been noticed through intuition alone. Crystal (2008: 117) defines a corpus in a rather more rigorous way as a "collection of linguistic data, either written or a transcription of recorded speech, which can be used as a starting point of linguistic description or as a means of verifying hypotheses about a language".
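
The kind of access that concordance tools give to a machine-readable corpus can be illustrated with a keyword-in-context (KWIC) listing. The following is a minimal Python sketch under stated assumptions: the function name kwic and its window parameter are hypothetical and do not reproduce the interface of any particular concordancing program.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence
    of `keyword`, with up to `window` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines
```

Listing every occurrence of a word with its immediate context in this way makes recurring patterns visible that would be hard to notice through intuition alone.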

2.2.4 Types of Corpora

There are several types of corpora, and which type should be used depends on the purpose of the corpus as well as the amount of data involved (Cock, 2011: 109). Mahadi et al. (2010: 9) indicate that corpora can be classified from various perspectives and by various criteria. For example, they can be classified according to the amount of linguistic information added to them or the time span they cover, but the most common way to classify corpora is by their intended purpose or function (ibid). The following is a sketch of the most common types of corpora.

General Corpora

A general corpus is the widest type of corpus: it is very large, often including more than 10 million words, and it contains a wide variety of language. A general corpus tries to give the user as complete a picture of a language as possible (Bennett, 2010: 13). Nesi (2012: 156), likewise, mentions that general corpora deal with spoken or written discourse as well as the frequencies of occurrence and co-occurrence of specific aspects of language in the discourse.

Well-known general corpora include the British National Corpus (BNC) and the American National Corpus (ANC). In general corpora, the written discourse can include newspaper and magazine articles, works of fiction and nonfiction, as well as scholarly journals (Bennett, 2010: 13).

Specialized Corpora

According to Baker (2006: 26), the specialized corpus is the most important type of corpus, and it is used to study aspects of a particular variety or genre of language. Furthermore, it deals with texts of certain types; these texts can be large or small, and the corpus is often used to answer very specific research questions (Bennett, 2010: 13).

A specialized corpus is intended to be representative of a particular type of text, and it can be used to investigate a specific type of language (Hunston, 2002: 14). Likewise, it may be designed for a particular research project, for example, to study particular specialist genres of language, such as child language or English for academic purposes (Meyer, 2002: 36). Moreover, it might be used to address any particular research question.

Examples of specialized corpora include the Michigan Corpus of Academic Spoken English (MICASE), the British Academic Spoken English corpus (BASE), the British Academic Written English corpus (BAWE), and the TOEFL Spoken and Written Academic Language Corpus (Paltridge, 2012: 157-159).

Learner Corpora

A learner corpus is a type of corpus which contains written texts and spoken transcripts of language used by students who are currently acquiring the language (Bennett, 2010: 14). It is used to identify common errors that students make. A well-known learner corpus is the International Corpus of Learner English (ICLE) (Palacios and Alonso, 2006: 109).

A learner corpus is related to classroom language, and it is a collection of learners' spoken or written performances while acquiring a second language (McEnery et al., 2006: 65). Moreover, Hunston (2008: 426) stresses that a learner corpus is to be distinguished from a developmental corpus, which is produced by children acquiring their first language (L1).

Pedagogical Corpora

A pedagogical corpus deals with the language used in the classroom setting. It involves academic textbooks, transcripts of classroom interactions, and any other text, whether spoken or written, which learners encounter in an educational setting. It can be applied to make sure that students are learning from actual samples of language (Bennett, 2010: 14).

Opportunistic Corpora

An opportunistic corpus is the result of collecting all the corpora one can lay hands on (Teubert and Cermakova, 2004: 120, cited in Hussein, 2014: 61). Moreover, both special corpora and reference corpora are similar to opportunistic corpora because they neglect size, domain, and genre (Hussein, 2014: 61).

Parallel Corpora

Some linguists consider the parallel corpus a specific type of multilingual corpus in which there is a relationship between texts in different languages (Hussein, 2014: 63). Usually, the texts involved are direct translations of one another (Baker et al., 2006: 119). Sometimes a parallel corpus is called a translation corpus, as Teubert and Cermakova (2004: 122) tend to call it (cited in Hussein, 2014: 63).

Comparable Corpora

Two or more corpora can be designated comparable when they are built on the same design criteria and are of similar size (Bonelli and Sinclair, 2006: 212). Comparable corpora are few in number at present, but they are expected to be of more interest in translation in the future (Arhire, 2011: 9). They also require similar text types in terms of genre, length, time span, etc. (ibid).

Monitor Corpora

A corpus can be viewed as a static or a dynamic language database. Monitor corpora follow the dynamic language model, which is why they are sometimes referred to as dynamic. Monitor corpora are used to track rapid language change, and they cover a relatively short span of time (McEnery et al., 2006: 15). Furthermore, this type of corpus is designed and built carefully so that it can be a valid representative of the language variety within a particular period of time (Hussein, 2014: 61).

A monitor corpus is constantly updated (e.g. annually, monthly or even daily) and it increases in size, though the proportions of the text types involved in the corpus remain constant. An obvious example of this type is the "Bank of English" (BOE) (McEnery et al., 2006: 67).

Diachronic / Historical Corpora

A diachronic or historical corpus deals with texts which belong to the same language but to different periods of time (McEnery et al., 2006: 67). The period of time covered by this type is far more extensive than that covered by the "Brown/Frown" corpora or a "monitor corpus" (ibid). This type of corpus is used to show changes that take place in the evolution of a language. A well-known example of this type of corpus in English is the Helsinki Diachronic Corpus of English Texts (ibid: 65).

Virtual / On-line Corpora

Owing to the development of the internet, corpus linguistics has started to show interest in internet language (Hussein, 2014: 64). Furthermore, an on-line corpus offers instances of naturally occurring conversation for authentic language learning, and this tends to support the development of more accurate and authoritative models of spoken grammar and conversational interaction (Pope, 2012: 15).

An on-line corpus is user-friendly and easy to handle, enabling technically less competent learners to exploit corpora just as they would browse web pages. Furthermore, learners of English as a foreign language can get multiple corpus examples from different corpora (Luo et al., 2015: 39).

2.2.5 The Use of Computer to Study Language

One of the essential qualities of a modern corpus is machine-readability. An electronic corpus has qualities which are not found in its paper-based equivalent (Banasiewicz, 2015: 108). The most important advantages offered by computers in language study are the speed of processing and the ease of manipulating data; moreover, computerized corpora can be processed rapidly at minimal cost (McEnery et al., 2006: 6).

What is more, machine-readable data can be dealt with accurately and consistently by computers (Barnbrook, 1996: 11). Computers can avoid mistakes made in manual analysis, and thus their results will be more reliable and trustworthy (ibid).

Finally, computers allow a great deal of processing to be applied to the corpus, so that the text corpus can be enriched with metadata and linguistic analysis (McEnery et al., 2006: 6). Computers make an important contribution to the methodological frame of linguistic enquiry (Bonelli, 2001: 210). Corpus linguistics gives prominence to the computer because it is considered the motivating factor behind corpus linguistic techniques (Leech, 1991: 23). The techniques used in corpus linguistics are capable of identifying features of a text that cannot be discovered by mere human observation (Ho, 2011: 7).

2.2.6 The Corpus-Based Approach Vs. The Intuition-Based Approach

In the intuition-based approach, researchers can use linguistically pure examples because invented examples are free from language-external influences (Seuren, 1998: 260-262). However, the intuition-based approach should be handled carefully for several reasons (ibid).

First, intuition may be influenced by one's dialect or sociolect: what seems acceptable to one speaker may not be so to another (ibid: 261). Second, when producing examples to validate or invalidate a claim, the researcher is consciously monitoring his or her own language production, so the utterance may not represent typical language use even if the intuition is correct (Lu, 2016: 120). Finally, results based on intuition are difficult to verify (McEnery et al., 2006: 6).

The corpus-based approach relies upon authentic texts, which can yield reliable quantitative data (Gelderen, 2005: 246). It can also provide the linguist with improved reliability because it does not reject intuition, which remains important in interpreting empirical data (Leech, 1991: 14). In fact, corpus work rests on a balance between the use of such data and the use of one's intuition (ibid). Furthermore, the corpus-based and intuition-based approaches are complementary, and must be so if a range of research questions is to be addressed by linguists (see McEnery and Wilson, 2001: 19; Sinclair, 2003: 8).

2.2.7 Corpus Linguistics : A Methodology or an Independent Discipline

Linguists show no agreement about the status of corpus linguistics , some of them consider it as a methodology others consider it as an independent branch of linguistics. For example, Bonelli (2001: 1) argues that corpus linguistics "goes well beyond this methodological role" and becomes an independent discipline with its own theoretical foundations. She (ibid) adds that corpus linguistics " has become a new research enterprise and new philosophical approach to linguistic enquiry". This goes with the view shared by some linguists that corpus linguistics is an independent branch of linguistics.

On the other hand, some linguists hold the view that corpus linguistics is a methodology. For example, McEnery et al. (2006: 7) argue that "corpus linguistics is indeed a methodology rather than an independent branch of linguistics in the same sense as phonetics, syntax, semantics or pragmatics". In their view, these branches describe or explain specific areas of language use, whereas corpus linguistics is not tied to any specific area of language investigation. They add that "in spite of the fact that corpus linguistics is a system of methods and principles of how to use corpora in language, yet this theoretical status is not a theory by itself" (ibid).

The attempt to treat corpus linguistics as anything other than a methodology fails, because even the linguists who strongly argue that corpus linguistics is an independent discipline use the terms 'approach' and 'methodology' in their explanations of it (ibid: 8). Other linguists, such as Lindquist (2009: 1), support this view, stating that corpus linguistics is a methodology comprising a large number of related methods which can be used by scholars of many different theoretical leanings.

Corpus linguistics can therefore be considered a methodology, even though it is a complex system of methods and principles. It can provide researchers of different interests with rigorous methods and procedures for handling computerized linguistic data.

2.2.8 Corpus - Based and Corpus-Driven Approaches

The differences between the corpus-based and corpus-driven approaches are discussed in most introductions to corpus linguistics. The relationship between corpus work and linguistic theory is sometimes expressed in terms of this distinction. On the one hand, corpus-based linguists "adopt a confident stand with respect to the relationship between theory and data in that they bring with them models of language and description which they believe to be fundamentally adequate …" (Bonelli, 2001: 66). According to the corpus-based approach, then, corpus data can be used to modify or adjust a theory, as well as to serve as quantitative evidence (Mahlberg, 2005: 18).

McEnery and Hardie (2012: 6) likewise state that corpus-based studies use corpus data to explore a theory or hypothesis established in the current literature, so that it can be validated, refuted or refined. According to them, the corpus-based approach treats corpus linguistics as a method. A useful distinction between corpus-based and corpus-driven work has been made by Baker et al. (2006: 54). A corpus-based study uses a corpus as a source of examples, either to check the researcher's intuition or to examine the frequency and plausibility of the language contained within a smaller data set (ibid).

The corpus-driven approach is more of an inductive process, in which the corpus itself is the data and the patterns in it are noted as ways of expressing regularities (ibid). The corpus-driven approach does not treat corpus linguistics as a method; instead, it holds that the corpus itself should be the sole source of our hypotheses about language (McEnery and Hardie, 2012: 6).

Moreover, Mahlberg (2005: 18) adds that in the corpus-driven approach the evidence provided by the corpus leads to the theoretical statement. According to Bonelli (2007: 85), the methodological path can be described as "observation leads to the unification in the theoretical statement".
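The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the study and does not reproduce any procedure of the authors or of WordSmith Tools; the toy sentence, the crude tokenizer and the function names `corpus_based_count` and `corpus_driven_patterns` are all assumptions made for illustration. The first function starts from a pattern the researcher already has in mind and checks it against the data, while the second starts from the data and reports whichever word pairs recur most often.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_based_count(tokens, hypothesised_pattern):
    """Corpus-based: begin with a pattern taken from theory or intuition
    and use the corpus to check how often it actually occurs."""
    target = tuple(hypothesised_pattern)
    return sum(1 for gram in ngrams(tokens, len(target)) if gram == target)


def corpus_driven_patterns(tokens, n=2, top=4):
    """Corpus-driven: begin with no pattern in mind and let the most
    frequent n-grams in the corpus suggest what deserves attention."""
    return Counter(ngrams(tokens, n)).most_common(top)


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = "the sea was calm and the sea was grey and the sky was grey"
tokens = tokenize(toy_corpus)

print(corpus_based_count(tokens, ["the", "sea"]))
print(corpus_driven_patterns(tokens))
```

In the first call the pattern "the sea" is supplied by the researcher, whereas in the second the recurrent pairs emerge from the text itself. A real analysis would of course require a full corpus and a more careful tokenizer.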

Nevertheless, the corpus-based and corpus-driven approaches share some similarities. Both rest on the same underlying principle of conducting an empirical analysis. Both are based on corpora, and both use computer software to arrive at qualitative findings, i.e. functional interpretations of quantitative patterns (Tabbert, 2015: 57).

2.3 Corpus Stylistics

Nørgaard et al. (2010: 9) describe corpus stylistics as the combination of corpus linguistics and stylistics, or as the use of modern methods of corpus linguistics to investigate literary and non-literary texts. Corpus stylistics is widely considered an offshoot of stylistics (ibid).

Recently, an increasingly popular discipline of linguistics has emerged, called corpus stylistics. Its methods and approaches are used to study language patterns in literature in particular, and in all other types of texts (Ho, 2011: 5). Furthermore, Mahlberg (2013: 5), one of the prominent figures in corpus stylistics, holds that corpus stylistics concentrates on applying corpus methods to analyse texts by combining linguistic description with literary appreciation. Corpus stylistics is thus based on the combination of two disciplines, literary stylistics and corpus linguistics.

Starcke (2010: 1) defines corpus stylistics in the same way, with a small addition: corpus stylistics is the cooperation of corpus linguistics and stylistics (ibid), that is, the linguistic analysis of literary texts stored electronically. What is more, Biber et al. (1998: 45) add that corpus stylistics is the study of literary texts that employs corpus-linguistic methods in order to support the analysis of those texts (ibid: 180).


Thi-Qar University  (College of Arts)
Corpus Stylistics, Lexical Richness, Type- Token Curve
Quote paper
Khalid Shakir Hussein (Author), Ali Hussein Abdul-Ameer (Co-author), 2017, A Corpus-Driven Approach to Stylistic Analysis of a Lexical Richness Curve, Munich, GRIN Verlag, https://www.grin.com/document/353181

