Translation Assessment and Lexical Loss. A Corpus-Based Approach

A measure of Lexical Drain in Arabic-English translation using Type/Token Ratio and Guiraud’s Index

Scientific Study, 2019

208 Pages




List of Tables

List of Figures

List of Abbreviations


Chapter One
The Basic Argument, Aims and Values

Chapter Two
Corpus Linguistics, Stylistics & Translation
2.1 Corpus Linguistics (CL): What is it?
2.1.1 What Does 'Corpus' Mean?
2.1.2 The Aims of Corpus Linguistics.
2.1.3 Characteristics of Corpus Linguistics.
2.1.4 Corpora Typology. General Corpora. Specialized Corpora. Learner Corpora. Pedagogical Corpora. Opportunistic Corpora. Monitor Corpora. Parallel Corpora. Comparable Corpora. Virtual \ On-Line Corpora. Diachronic \ Historical Corpora.
2.1.5 The Potentialities of Corpus-Based Methods. Concordances. Frequency Lists. Keyword Lists. Collocate Lists. Dispersion Plots.
2.2 Stylistics and Style: Areas of Interest
2.2.1 Stylistics as an Advanced Level of Practicing Linguistics.
2.2.2 The Need of Stylistics in Analyzing Texts.
2.2.3 Stylistics and Aspects of Style. Style as Choice. Style as Deviation. Style as Recurrence.
2.2.4 Corpus Stylistics. The Aims of Corpus Stylistics. Characteristics of Corpus Stylistics. The Scope of Corpus Stylistics. Corpus Linguistic Circle: Linguistic Description & Literary Appreciation
2.3 Translation: What is it?
2.3.1 Theories of Translation. Philological Theories of Translation. Linguistic Theories of Translation. Sociolinguistic Theories of Translation.
2.3.2 Models of Translation. The Grammatical Model. The Cultural Model. The Interpretive Model. The Text Typological Model.
2.3.3 Translation Assessment of the Models.
2.3.4 Style and Translation. The Style of Translation. The Translation of Style.
2.3.5 Stylistic Habits and Rhetorical Choices. Literary Relevance of Stylistic Habits. Stylistic Habits and Rhetorical Choices in Translation

Chapter Three Lexical Loss as a Stylistic Marker
3.1 The Concept of Translation Loss
3.2 Lexical Translation Loss
3.3 Lexical Diversity Measures
3.3.1 TTR.
3.3.2 GI
3.4 Measuring Style in Translation
3.5 Technical Difficulties in Measuring Style
3.6 Literature Review: Some Issues in Translating the Glorious Qur'an into English
3.6.1 Lexical Issues.
3.6.2 Syntactic Issues.
3.6.3 Semantic Issues. Metaphorical Issues. Polysemic Issues. Metonymic Issues.

Chapter Four Corpus Design & Methodology
4.1 Accountability in Designing a Corpus
4.2 Designing the Corpus
4.3 Issues in Corpus Design
4.3.1 Static vs. Dynamic.
4.3.2 Representativeness & Balance.
4.3.3 Size.
4.3.4 Morphological Typology.
4.4 Sampling Methodology
4.5 Corpus Properties
4.6 Corpus Data
4.7 Methods Used in Corpus Analysis
4.7.1 WordSmith Tools.
4.7.2 Microsoft Office Excel
4.7.3 WordCounter Tools.
4.7.4 Farasa Tools.
4.8 Corpus Analysis Procedures

Chapter Five Analysis & Results
5.1 TTR Analysis
5.2 GI Analysis
5.3 TTR and GI Analyses of the Arabic Corpus
5.3.1 TTR Analysis of the Arabic Corpus (before and after using Farasa)
5.3.2 GI Analysis of the Arabic Corpus (before and after using Farasa)
5.4 TTR and GI Analyses of the English Corpus
5.4.1 TTR Analysis of the English Corpus.
5.4.2 GI Analysis of the English Corpus.
5.5 The Results

Chapter Six
Concluding Remarks



List of Tables

Table (1) Corpora Description of the Source Text (the Glorious Qur’an) and its Target Texts (Three English Translations)

Table (2) The Statistical Description of the Arabic Corpus (Before and After Using Farasa Tools).

Table (3) Tokens, Types and TTRs of (16) Textual Samples of the Glorious Qur’an (Before Using Farasa Tools)

Table (4) Tokens, Types and TTRs of (27) Textual Samples of the Glorious Qur’an (After Using Farasa Tools)

Table (5) Types, √ Tokens and GIs of (16) Textual Samples of the Glorious Qur’an (Before Using Farasa).

Table (6) Types, √ Tokens and GIs of (27) Textual Samples of the Glorious Qur’an (After Using Farasa)

Table (7) Tokens, Types and TTRs of (31) Textual Samples of Muhammad Ali’s Translation.

Table (8) Tokens, Types and TTRs of (31) Textual Samples of Pickthall’s Translation.

Table (9) Tokens, Types and TTRs of (34) Textual Samples of Yusuf Ali’s Translation

Table (10) Types, Tokens, √ Tokens and GIs of (31) Textual Samples of Muhammad Ali’s Translation

Table (11) Types, Tokens, √ Tokens and GIs of (31) Textual Samples of Pickthall’s Translation

Table (12) Types, Tokens, √ Tokens and GIs of (34) Textual Samples of Yusuf Ali’s Translation

List of Figures

Figure (1) The Philological Circle.

Figure (2) The Corpus Stylistic Circle

Figure (3) The Mathematical Calculation of Guiraud's Index.

Figure (4) A Sequence of How a Corpus Works

Figure (5) Segmenting the Arabic Word (waja’alakum) into its Basic Constituents.

Figure (6) The Order of Type/Token Ratio (TTR) of the whole Textual Samples of the Arabic Corpus (Before and After Using Farasa)

Figure (7) The Order of Guiraud’s Index (GI) of the Whole Textual Samples of the Arabic Corpus (Before and After Using Farasa)

Figure (8) The Order of Type/Token Ratio (TTR) of (96) Textual Samples of the English Corpus

Figure (9) The Order of Guiraud’s Index (GI) of (96) Textual Samples of the English Corpus.

Figure (10) The Order of Type/Token Ratio (TTR) of Arabic (Before Using Farasa) & English Corpora.

Figure (11) The Order of Guiraud’s Index (GI) of Arabic (Before Using Farasa) & English Corpora.

Figure (12) The Order of Type/Token Ratio (TTR) of Arabic (After Using Farasa) & English Corpora

Figure (13) The Order of Guiraud’s Index (GI) of Arabic (After Using Farasa) & English Corpora.

List of Abbreviations

[Figure not included in this reading sample]


Over the past 30 years, corpus-based investigations have gained unprecedented currency among modern linguistic researchers. The research possibilities are vast and can be bewildering. Applications proliferate daily, ranging from language teaching, forensic linguistics, historical studies, psycholinguistics and cross-cultural studies to translation studies. This book pursues one particular potential of corpus-based studies: the assessment of translation.

The challenges are many, starting with the scarcity of serious corpus-based studies of Arabic and ending with the complex morphological differences between Arabic and English and the way these differences should be addressed in a corpus-based assessment of translation as a process. Given the intricate nature of translation assessment, the researchers focus on one specific feature to be traced in the source text (ST) and the target text(s) (TT). This feature is lexical loss: its rate is scored in both ST and TT in order to arrive at a calculation that might measure the size of the gap between the lexicon sizes of the texts.

With the aid of corpus linguistics and stylistics, and of corpus stylistics as the area where the two meet, the researchers measure the lexicons of three English translations (those of Yusuf, Pickthall and Muhammad), which show different degrees of lexical loss in comparison with the original Arabic Quranic text. These degrees go hand in hand with the size of the linguistic repertoire each translator draws on, to such an extent that it seems promising to regard a measure of lexical loss as a trustworthy stylistic marker. That is, each translator has his own distinctive rate of lexical loss, which might be an idiosyncratic marker of his translational style.

Although loss is not necessarily limited to one linguistic level (i.e., the lexical level), these translations differ in the size of the vocabulary each one holds. After an extensive verification of data reliability, the Arabic and English corpora are segmented into (123) samples in total. The Arabic corpus is divided into (27) samples. The English corpus is divided into (96) samples: (34) for Yusuf’s translation, (31) for Pickthall’s and (31) for Muhammad’s. Each sample holds approximately (5,000) tokens.
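
The sampling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how sample boundaries were chosen, so the sketch simply cuts a token list into consecutive chunks, and the function name `split_into_samples` is hypothetical. Samples of similar size matter because TTR tends to fall as a text grows longer, so samples of very different lengths are not directly comparable.

```python
def split_into_samples(tokens, sample_size=5000):
    """Cut a token list into consecutive samples of `sample_size` tokens.

    Assumption: fixed-size consecutive chunks. The final sample holds
    whatever remains and may therefore be shorter than the others.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return [tokens[i:i + sample_size] for i in range(0, len(tokens), sample_size)]


# Toy usage: 12 tokens cut into samples of 5 give sizes 5, 5 and 2.
toy_tokens = "a b c d e f g h i j k l".split()
samples = split_into_samples(toy_tokens, sample_size=5)
print([len(s) for s in samples])
```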

The Farasa tools, specifically the segmentation module, support the morphological analysis of the Arabic corpus. Segmentation is valuable for neutralizing the morphological differences between Arabic (a synthetic language) and English (an analytic language). Equalizing the two corpora depends on the Farasa tool, so that the TTR (Type/Token Ratio) and GI (Guiraud’s Index) analyses produce more reliable results than they would without it. Both analyses are carried out quantitatively. The TTR analysis uses the WordList tool of WordSmith Tools (4.0); the GI analysis uses Excel, specifically the SQRT function. The results from these programs are then plotted in Microsoft Excel spreadsheets.
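
For readers who want to reproduce the two indices outside WordSmith Tools and Excel, a minimal Python sketch follows. It uses the standard definitions, TTR = types / tokens and GI = types / √tokens. The function names are hypothetical, and the naive lowercasing and whitespace tokenization are assumptions that will not match WordSmith's tokenization or Farasa's segmentation exactly.

```python
import math


def type_token_ratio(tokens):
    """TTR: number of distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty sample")
    return len(set(tokens)) / len(tokens)


def guiraud_index(tokens):
    """GI: types divided by the square root of tokens."""
    if not tokens:
        raise ValueError("cannot compute GI of an empty sample")
    return len(set(tokens)) / math.sqrt(len(tokens))


# Toy sample: 6 tokens, 5 types ("the" occurs twice).
sample = "the cat sat on the mat".lower().split()
print(type_token_ratio(sample))  # 5 types over 6 tokens
print(guiraud_index(sample))     # 5 types over sqrt(6)
```

Dividing by the square root of the token count, rather than by the count itself, is what makes GI less sensitive to sample length than TTR.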

The difference in lexical loss across the target texts indicates where lexical losses occur among the three translations in comparison with the original Quranic text. Finally, the most accurate and reliable English translation of the Glorious Qur’an can be identified from the difference between the size of the Arabic Quranic lexicon and those of the target texts. The findings are as follows: Yusuf's translation ranks first because it shows the least lexical loss, which makes it relatively the most accurate and reliable translation of the Glorious Qur'an; Pickthall's ranks second; and Muhammad's ranks third, with the highest degree of lexical loss.
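
The ranking logic can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: lexical loss is taken here as the source-text score minus the target-text score on the same index, and the function name, the labels TT-A to TT-C and all numbers are hypothetical placeholders rather than results from the study.

```python
def rank_by_lexical_loss(source_score, target_scores):
    """Order target texts from the smallest to the largest gap below the source.

    `source_score` and the values of `target_scores` are lexical diversity
    scores on the same index (for example, a mean GI over all samples).
    """
    losses = {name: source_score - score for name, score in target_scores.items()}
    return sorted(losses.items(), key=lambda item: item[1])


# Placeholder numbers for illustration only; they are NOT findings of the study.
ranking = rank_by_lexical_loss(10.0, {"TT-A": 8.0, "TT-B": 9.0, "TT-C": 7.5})
print(ranking)
```

The target text with the smallest gap comes first, which mirrors the book's argument that a narrower lexical loss indicates a more faithful translation.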

Needless to say, the researchers are solely responsible for any pitfalls, misunderstandings and mistakes, and would be grateful if readers would bring these to their attention.

Khalid S. Hussein

Abdul-Haq A. Al-Sahlani

Dhi-Qar University

April 2019

Chapter One


The Basic Argument, Aims and Values

Drawing on corpus-based techniques and stylistic analysis, this book argues that Lexical Translation Loss can be measured quantitatively. Loss in lexicon is inevitable because the boundaries between the source language (SL) and the target language (TL) are obscure and fuzzy. Translation, as a process, aims at communicating a meaning, with its original tone and intent, from the SL to the TL. Yet a specific message might not be carried across to the TL in full: certain elements might be lost when a translator passes a particular text from the SL into the TL. Trying to get around the lexical gap between the SL and the TL can therefore resemble a wild-goose chase. In addition, the cultural and regional differences between the SL and the TL should be taken into consideration in the process of translation.

Essentially, the basic problem caused by this gap is the number of challenges the translator faces in minimizing translation loss in the target text. This loss is of various types; one of them is ‘lexical loss’, which is the area of interest in the present study. It is the difference in lexical diversity that triggers an unavoidable lexical loss in translating an ST into a TT. Most importantly, a faithful translator should take the lexical diversity of the source text into account as far as possible.

Corpus stylistics applies corpus techniques to the analysis of literary and non-literary texts, combining linguistic description with literary appreciation. That is to say, ‘corpus stylistics’ incorporates two disciplines: stylistics and corpus linguistics. Corpus linguistics is a rather new domain of study, and few studies in Iraqi universities have drawn on it. Therefore, the present study uses corpus-based methods to study certain aspects of the lexical loss that occurs in the process of translation. The problem the researchers aim to address can be put in terms of the following four basic questions:

1. What are the indices for measuring ‘lexical loss in translation’?
2. How reliable are these indices?
3. How do they work exactly?
4. Can the measurement of lexical loss be used as an index of translation assessment, based on the difference between the size of the Source Text lexicon and that of the Target Text?

These are the major questions the present study attempts to explore and answer. The basic argument of this book is this: the narrower the lexical loss between the source text and the target text, the more faithful and accurate the translation.

As for its aims, the study investigates the utility of the type/token ratio (TTR) and Guiraud’s Index (GI) in measuring lexical diversity in the translation process, and examines their statistical reliability in giving rigorous accounts of lexical loss. With a particular focus on Arabic and English, the book analyzes and describes the lexical loss that occurs in three English translations of the Arabic Holy Quran.

The book also compares the TTR and GI values of the Glorious Qur’an (the Source Text) with those of its three English versions (the Target Texts) to determine the size of the lexical loss that occurs in the process of translation. The whole study takes its direction from the new potential of "corpus stylistics" for verifying the accuracy and faithfulness of translation, taking into consideration the basic techniques and concepts of modern corpus linguistics.

The book is hopefully useful for researchers working in corpus stylistics and its applications. A corpus-based stylistic analysis offers a large amount of information to researchers who focus on corpus linguistics, corpus stylistics and translation with regard to lexical diversity in general and lexical loss in particular.

Chapter Two

Corpus Linguistics, Stylistics & Translation

The present chapter attempts to work out the relationship among three basic areas of investigation: Corpus Linguistics (CL), Stylistics and Translation. To understand that relationship, the researcher first needs to identify the major areas of interest of these three fields. S/he will then be in a position to gain a clear view of the first two fields with respect to their meeting area, Corpus Stylistics (CS), and of Translation with respect to the concept of Style as it applies in translating one language into another.

2.1 Corpus Linguistics (CL): What is it?

Corpus Linguistics (henceforth CL) is defined as "the study of language based on examples of 'real life' language use" (McEnery & Wilson, 2001: 1). It employs computer-based analysis of huge quantities of data. CL is also "multilingual": many languages, with their varieties, can be studied by using corpus data (ibid). According to Hunston (2002: 2), linguists have always used the word corpus to describe a set of naturally occurring examples of language, consisting of anything from short sentences to a large set of written texts or even sound recordings, gathered for linguistic research.

McEnery & Wilson (2001: 206), however, classify the senses of the term by a stricter scheme based on a corpus's inherent features. Accordingly, a corpus might refer to:

(i) (loosely) any body of text; (ii) (most commonly) a body of machine-readable text; (iii) (more strictly) a finite collection of machine-readable text, sampled to be maximally representative of a language or variety (ibid).

Likewise, Francis (1982: 7) points out that the word 'corpus' should be used for a large collection of texts assumed to be 'representative' of a given language, dialect, or other subset of a language, to be used for linguistic analysis.

Kennedy (1998: 1) says that CL serves as a source of evidence for improving descriptions of language use and for various applications, such as natural language processing by machines and language learning or teaching. This shows how quantitative analysis can be useful for linguistic description (ibid).

As a methodology, CL is seen as a "pre-application methodology" (Tognini-Bonelli, 2001: 1). That is, CL involves an empirical approach to describing language use, within which a contextual and functional theory of meaning operates (ibid: 87). Meyer (2004: 141), for his part, sees CL as one of the most important methodological developments in linguistics: CL makes use of new technologies so that empirical analyses of a language become workable (ibid).

Moreover, the development of CL is tied to developments in computing, with the computer serving as a tool (Hornero et al., 2008: 11). This again makes CL a methodology with an empiricist approach to language: it is "the ability of the computer to carry out the processes of searching, retrieving and calculating linguistic data [which] allows the exploitation of corpora on a large scale with speed and accuracy" (ibid). CL is thus a research approach that facilitates empirical investigations of language use and language variation, leading to research findings that have greater validity and generalizability than would otherwise be feasible (Biber & Reppen, 2015: 1).

From a different perspective, Stubbs (1993: 23-24) argues that "a corpus is not merely a tool of linguistic analysis but an important concept in linguistic theory". Similarly, Teubert (2005: 2) describes CL as "a theoretical approach to the study of language". Thus, most scholars are torn between these two views of CL: as a methodology and as a self-sufficient discipline (Biber & Reppen, 2015: 2).

2.1.1 What Does 'Corpus' Mean?

Etymologically, Baker et al. (2006: 48) show that the word corpus comes from Latin. It is used to refer to "a body of language" or a set of texts. These texts are stored either in electronic format or on paper (ibid).

It is important to establish what a corpus actually is so that CL can be better understood. Many corpus-based studies rest on groups or series of texts that are designed and analyzed electronically. Corpus studies look at the occurrence and recurrence of particular linguistic patterns in order to show how and where they occur in discourse (Paltridge, 2012: 144).

Cheng (2012: 3) emphasizes that a corpus is a collection of texts compiled according to a set of design criteria. One of these criteria is that the corpus must be representative of what is being investigated (ibid). Thus, a corpus is usually computerized, machine-readable data that can be investigated with tools such as concordances, keyword lists and collocate lists. That is to say, a corpus is designed to serve a particular analytical purpose (Paltridge, 2012: 144).

Moreover, Crystal (2008: 117) defines a corpus as a "collection of linguistic data, either written or a transcription of recorded speech", which can be used as a "starting point of linguistic description or as a means of verifying hypotheses about a language" (ibid). Bennett (2010: 12) adds that a corpus is a principled collection of authentic texts stored electronically, which can reveal information about a language that might not be noticed through intuition alone. A corpus may also be general or specific (Desagulier, 2017: 51): a general corpus aims to represent a language as a whole, whereas a specific corpus covers only a particular variety or span of time (ibid).

2.1.2 The Aims of Corpus Linguistics

The aims of CL vary; the most important are the following:

1. According to McCarthy & O'Keeffe (2010: 7), CL and its techniques can be applied to gain insight into other domains of study and to answer different questions with the help of computer programs;
2. For some linguists, CL is no more than a new computer-aided methodology for doing linguistics (Aarts, 2001: 7; Leech, 1991: 105; Kennedy, 1998: 268);
3. Teubert (2001: 125) holds that CL is a sub-discipline in its own right. Later, Teubert & Cermakova (2004: 97) emphasize that CL aims to be a discipline in its own right, one that offers a way of looking at language with which people have not been familiar before. In this respect, CL deals with language as a social phenomenon (ibid);
4. Aijmer & Altenberg (1991: 3) say that the basic goal of CL is to arrive at a better understanding of human language. Where different methods have met and reinforced each other, CL tends to be an interdisciplinary domain in its own right (ibid); and
5. Matthews (1997: 78, cited in Ahmad, 2008: 60) mentions that CL relies heavily on linguistic calculations drawn from organized recordings of real conversations. A corpus is therefore identified as a systematic representation of speech or writing gathered from a particular language or variety of a language (ibid).

2.1.3 Characteristics of Corpus Linguistics

Linguists have identified many noteworthy features of CL. The most important are as follows:

1. Leech (2007: 133) states that "the internet provides a virtually boundless resource for the methods of corpus linguistics". This makes "corpus linguists appear to inhabit an expanding universe" (ibid).
2. According to Biber et al. (1998: 9), CL is an empirical (observation-based) approach. That is to say, the patterns of language use observed in real texts (written or spoken) are what is analysed. More precisely, "comprehensive studies of use cannot rely on intuition, anecdotal evidence, or small samples; they rather require empirical analysis of large databases of authentic texts" (ibid).
3. CL is used as a method in 'Translation Studies' (TS) to investigate translations (Biel, 2018: 25). A corpus here consists of a large amount of data in electronic form, which can be examined with suitable software (ibid: 26). In this regard, Biel (ibid) mentions that CL is "empirical, inductive and quantitative; its main advantage is the potential to reduce speculation and verify research hypotheses systematically on the basis of more extensive data".
4. According to House (2011: 206), statistics and quantitative data should represent "a starting point for continuing (re)contextualized qualitative work" and not "an end in itself". This means that the object of corpus translation studies is to understand the translation rather than to add an explanation (ibid). Accordingly, corpora are best used in translation studies as part of a quantitatively grounded qualitative approach (Biel, 2018: 26).
5. CL depends extensively on computer programs for analysing linguistic patterns in texts (Biber et al., 1998: 4; Conklin et al., 2018: 161). Accordingly, corpora should consist of electronic texts so that advanced software (e.g., concordancers) can carry out automatic searches and capture regularities in natural language (Szudarski, 2018: 4).
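
To make the idea of an automatic corpus search concrete, the following Python sketch shows what a concordancer does at its simplest: it lists every occurrence of a search word together with a few words of context on either side (a keyword-in-context, or KWIC, display). The function name `kwic`, the whitespace tokenization and the toy sentence are assumptions for illustration; real concordancers offer far more, such as sorting, wildcards and collocate statistics.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Toy text, split on whitespace; two hits for "corpus" are expected.
text = "the corpus is a body of text and the corpus is machine readable"
for left, node, right in kwic(text.split(), "corpus", window=2):
    print(f"{left:>12} | {node} | {right}")
```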

2.1.4 Corpora Typology

Many kinds of corpora are used in applied linguistics, some of them for a wide range of purposes. According to Hunston (2002: 14), the kind that should be used depends on the purpose of the corpus itself. Hyland (2011: 102), in this regard, mentions that a corpus is "always representative of itself", which depends on "the date a corpus was compiled and the purpose it was intended to serve" (ibid).

Appraising the results of the translation process depends heavily on the use of corpora (Rodriguez-Ines, 2017: 265). Two major types of corpora, parallel and comparable, are used in corpus-based translation studies (ibid). Parallel corpora are composed of source texts (ST) in a particular language arranged in alignment with their translations or target texts (TT) (ibid). Comparable corpora, on the other hand, include either original and translated texts in a single language that share certain features, or original texts in two or more languages chosen according to certain criteria, such as text type or topic (ibid).

In addition, corpora can be classified according to the amount of linguistic information added to them or the time period they belong to, but the most common way of classifying corpora is by their intended function (Mahadi et al., 2010: 9).

General Corpora

Bondi (2017: 48) states that general corpora lean toward written rather than spoken discourse because of the "greater availability of written materials" (ibid). The general corpus is the broadest type of corpus: it is very large, containing more than 10 million words and including many languages (Bennett, 2010: 13).

Many examples, such as the British National Corpus (BNC) and the American National Corpus (ANC), fall within general corpora (ibid). The written discourse in such corpora can come from magazines, articles, newspapers, works of fiction and non-fiction, and scholarly journals (ibid). Furthermore, general corpora typically serve as a base for a complete description of a language or language variety (Chitez, 2017: 15).

Specialized Corpora

A specialized corpus is designed for a specific research project; for example, it may cover particular genres of academic English, such as "class discussions" (Meyer, 2004: 36). It is meant to be representative of a specific kind of text and can be used to investigate a particular kind of language (Hunston, 2002: 14).

Baker (2006: 26) states that the specialized corpus is the most significant kind of corpus. It is used to study aspects of a specific variety or genre of a particular language. It is restricted to texts of certain kinds and may be large or small (Bennett, 2010: 13). Ultimately, it is used to answer very specific research questions (ibid).

Picton & Pascaline (2017), in their article Diastratic variation in language for specific purposes, work on two specialized corpora. The first concerns the domain of nuclear medicine and contains articles written by medical experts as well as selections from machine operator forums (ibid: 57). The second is based on websites, documents and texts created for academic staff in the domain of higher education. They find that diastratic variation is quite evident in any particular field of professional activity and that this variation cannot be overlooked by any serious terminological work (ibid).

Finally, there are many examples of this kind of corpus, among them the British Academic Spoken English corpus (BASE), the British Academic Written English corpus (BAWE), the Michigan Corpus of Academic Spoken English (MICASE), and the TOEFL Spoken and Written Academic Language Corpus (Paltridge, 2012: 157-159).

Learner Corpora

Bennett (2010: 14) defines the learner corpus as a kind of corpus that contains written texts and spoken transcripts of language produced by learners who are in the process of acquiring the language. Palacios & Alonso (2006: 750) mention that a learner corpus is used to investigate the errors that students commonly make. A well-known example is the International Corpus of Learner English (ICLE) (ibid).

In second language (L2) acquisition, the learner corpus covers classroom language as well as learners' written and spoken performance (McEnery et al., 2006: 65). Furthermore, Hunston (2008: 426-427) points to another kind of corpus, the developmental corpus, which, in contrast to a learner corpus, is concerned with children acquiring their first language (L1). By detecting both overt and covert errors, learner corpora enable the investigation of errors that "have hitherto remained underresearched" (Thewissen, 2015: 73).

Pedagogical Corpora

Some linguists have recently stressed the likely link between CL and language teaching and learning in all its forms and uses (Boulton & Landure, 2016; Bertels, 2017, cited in Zeroual & Lakhouaja, 2018: 614). From a pedagogical standpoint, they (ibid) suggest that CL should be examined for its educational use in teaching a language. Bennett (2010: 14) mentions that a pedagogical corpus can be used to make sure that learners are in fact learning from real language samples.

Accordingly, "corpora have been successfully used as a reference resource by both advanced learners majoring in the language as well as learners with lower levels of proficiency needing language for specific purposes" (Zeroual & Lakhouaja, 2018: 614). As a result, a corpus with a pedagogical aim concerns itself with the language used in classroom settings (Bennett, 2010: 14). Thus, a pedagogical corpus includes academic curricula, transcripts of classroom interactions and any other written or spoken texts from an educational context (ibid).

Opportunistic Corpora

An opportunistic (or cannibalistic) corpus rests on the claim that every corpus is imbalanced (Teubert & Cermakova, 2004: 120). It can be defined as "the result of collecting all the corpora one can lay hands upon" (ibid). Two other kinds of corpora, the special corpus and the reference corpus, are akin to the opportunistic corpus in that they disregard genre, domain and size (Hussein, 2014a: 61). Thus, such a corpus is "principally open-ended" (Teubert & Cermakova, 2004: 121). Furthermore, Otlogetswe (2011: 90) believes that this type might be ideal in Natural Language Processing (NLP). For lexicographic purposes, however, it is desirable when it offers broad coverage (ibid).

Monitor Corpora

A corpus can be viewed either statically or dynamically. The static view normally applies to a sample corpus, while the dynamic view applies to a monitor corpus (Xiao, 2010: 150). The monitor corpus is "primarily designed to track changes from different periods" (Hunston, 2002: 16). It is therefore useful for tracking language change, such as the life cycle and development of neologisms (Xiao, 2010: 150).

Monitor corpora are constantly (e.g., daily, monthly or annually) supplemented with fresh material and keep increasing in size (ibid); an example is the Bank of English (BoE), which has around (524) million words (Hunston, 2002: 15). The sample corpus, by contrast, is designed to give a static snapshot of a specific language variety at a specific point in time (Xiao, 2010: 150). Desagulier (2017: 51) likewise mentions that "a dynamic corpus is also known as a monitor corpus, i.e. a corpus designed to keep track of current changes in a language".

Parallel Corpora

Hussein (2014a: 63) mentions that some linguists regard the parallel corpus as a particular kind of multilingual corpus; that is to say, a specified relationship holds among texts in different languages (ibid). Moreover, ''parallel corpora are used as data basis for multilingual grammar induction, automatic lexicography and many other tasks in information extraction and language processing across different languages'' (Neumann et al., 2017: 1).

Such a corpus is also known as a translation corpus (Teubert & Cermakova, 2004: 122). In translation studies, the particular interest lies in identifying the features that distinguish translations from original texts (Neumann et al., 2017: 1). Thus, parallel corpora can serve translation studies in different ways, including machine translation, computational linguistics, or simply the work of the human translator (ibid).

Comparable Corpora

Tognini-Bonelli (2010: 21) states that two or more corpora can be classified as comparable when they are built to the same standards of size and design. Such corpora are few in number for the time being, but they are likely to be of greater concern in future translation research, because they offer the chance to compare original language with the language of a particular translation (Arhire, 2011: 9).

Comparable corpora are also used in Statistical Machine Translation (SMT) as pivotal resources for training translation models (Sellami et al., 2017: 659). They therefore require text types that match with regard to length, genre, time span, etc. (Arhire, 2011: 9). Corpora of this kind are widely available for general domains, such as society, art, and media (Sellami et al., 2017: 659).

Virtual \ On-Line Corpora

More recently, language corpora have become freely available online, ready-made, to ordinary internet users, learners of a particular language and relatively inexperienced students (Anderson & Corbett, 2017: 2). In this regard, CL aims to engage with the language of the internet (Hussein, 2014a: 64).

The availability of online corpora has transformed the work of linguists, especially those interested in the meanings of words and of patterns of phrases (Anderson & Corbett, 2017: 2). Nowadays, enormous bodies of data are available at the click of a mouse and the press of a key (ibid). At the beginning of the century, Hockey (2000: V) pointed out that ''The World Wide Web'' (WWW) is a means of searching for material but not a tool for analyzing that material. Since then, as Anderson & Corbett (2017: 2) note:

The resources available on the web have improved to the extent that students of language and linguistics can make considerable inroads into linguistic study solely by using freely available corpora, provided that they know where to look, have an appreciation of a few basic notions and know how to maximise the potential of different language corpora, with all their idiosyncrasies. So, as written and spoken corpora become available to ever-wider network of potential users, guidance is needed in the use of them to explore aspects of language.

Furthermore, within ''naturally-occurring conversations'' (see Jones, 2017: 161), virtual/on-line corpora provide examples for authentic language learning, which support authoritative and accurate models of conversational interaction and of spoken grammar (Pope, 2012: 13). In addition, such corpora help learners of English as a Foreign Language (EFL) to obtain multiple corpus instances from various kinds of corpora (Luo et al., 2015: 39).

Diachronic \ Historical Corpora

According to McEnery et al. (2006: 65), a ''diachronic (or historical) corpus contains texts from the same language gathered from different time periods''. Such a corpus is used to track changes in the evolution of a language (ibid). The Helsinki Diachronic Corpus of English Texts is a well-known example of this type for English (ibid).

However, a diachronic corpus cannot always be representative, even where written language is concerned (Conde-Silvestre, 2014: 191). Because only very limited groups of the population could read and write in particular periods of history, the surviving written texts record the language of only a very small section of society (ibid). For practical reasons, a diachronic corpus comprises only written language. Yet corpus designers have tried to build corpora of speech from earlier periods, such as the Corpus of English Dialogues (McEnery et al., 2006: 65).

2.1.5 The Potentialities of Corpus-Based Methods

CL has generated a number of research possibilities and methods that try to move from data to theory. As a result, linguists do not agree on the status of CL: some consider it an interdisciplinary branch of linguistics, while others regard it as a methodology. In this regard, Tognini-Bonelli (2001: 1) argues that CL ''goes well beyond this purely methodological role'' and becomes an interdisciplinary field with its own theoretical foundations. She adds that CL ''has become a new research enterprise and new philosophical approach to linguistic enquiry'' (ibid). This supports the view that CL is an interdisciplinary domain within the broad area of applied linguistics (ibid).

Other linguists, however, hold the view that CL is a methodology. For instance, McEnery et al. (2006: 7) argue that CL is not an autonomous branch of linguistics like semantics or pragmatics but a methodology, that is, a system of principles or methods for using corpora in language study, and that it is not a theory in itself (ibid: 7-8).

Likewise, Lindquist (2009: 1) mentions that CL is a methodology consisting of a large number of related methods which can be used by scholars of many different theoretical persuasions. In fact, the attempt to consider CL anything other than a methodology fails, because even those linguists who emphatically argue that CL is an interdisciplinary field of linguistics use the terms methodology and approach when they explore a particular corpus (McEnery et al., 2006: 7).

Thus, whether theoretical or practical, all approaches to investigating a language should work with digitized corpora so that they rest on an empirical foundation (Hussein, 2014a: 68). In the next few pages, the researchers will shed some light on a set of methods used in processing a corpus.

Concordances

In CL, Desagulier (2017: 87) states that ''a concordance is also known as KWIC display, where the acronym stands for 'Key Word(s) In Context' ''. A concordance table shows a word or a set of words in context (ibid). The context itself is a span of words to the left of the search item (known as the left or preceding context) and another span of words to its right (known as the right or following context) (ibid).

Sinclair (1991: 32) offers an integrated definition in terms of both form and function:

A concordance is a collection of the occurrences of a word-form, each in its own textual environment. In its simplest form it is an index. Each word-form is indexed and a reference is given to the place of occurrence in a text.

Francis & Kucera (1982: 1) define the related term lemma as ''a set of lexical forms having the same stem and belonging to the same major word class, differing only in inflection and /or spelling''. Concordances are especially useful in studying collocation, the tendency of two words to occur next to or near each other in discourse (Firth 1957, cited in Bondi, 2017: 49). In this regard, Bondi (ibid) observes that concordances retrieve ''all the occurrences of a word or expression in the corpus and showing the node word at the centre of the concordance line in their lexico-grammatical, semantic and pragmatic environment''.
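
The KWIC display described above can be illustrated with a minimal sketch in Python. It is not the implementation of any particular concordancer. The function names `tokenize` and `kwic`, the crude regular-expression tokenizer, the `span` parameter and the sample sentence are all assumptions introduced here for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, span=3):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])    # preceding context
            right = " ".join(tokens[i + 1:i + 1 + span])   # following context
            lines.append((left, tok, right))
    return lines

# Invented sample text; any tokenized corpus could be used instead.
sample = "The cat sat on the mat. The dog chased the cat across the yard."
hits = kwic(tokenize(sample), "cat", span=3)
for left, node, right in hits:
    # Right-align the left context so that the node word lines up in one column.
    print(f"{left:>20}  [{node}]  {right}")
```

Each line of output places the node word at the centre, with up to `span` words on either side, which is the layout a concordance table uses.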

As a result, the concordance is one of the most popular tools of corpus analysis and should be taken into account by any researcher concerned with corpus-based studies (Hussein, 2014a: 68).

Frequency Lists

A frequency list is the most popular corpus-linguistic tool. It is generated when someone wants to know how often words occur in a particular corpus (Gries, 2017: 12). Gries (ibid) adds that the frequency list of a corpus consists of a two-column table, with all the words occurring in that corpus in one column and their frequencies in the other. Producing a frequency list for a particular corpus depends on a specialized program (for example WordSmith Tools) which processes all the items in the corpus and produces basic statistics on the overall number of tokens and the number of types across those tokens (Evison, 2010: 123-124).

However, it is useful to draw the common distinction between types and tokens, since the concept of word is ambiguous. For example, the string ''the phrase and the word'' consists of five words/tokens (''the'', ''phrase'', ''and'', ''the'', & ''word''), but only four words/types (''the'', ''phrase'', ''and'', & ''word''), because ''the'' occurs twice in the string (Gries, 2017: 12). In this respect, ''a frequency list lists the types in one column and their token frequencies in the other'' (ibid).
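
The type/token distinction and the two-column frequency list can be reproduced with a short Python sketch using Gries's own example string. The function name `frequency_list` and the whitespace tokenization are assumptions made here for illustration. A tool such as WordSmith Tools applies its own tokenization rules.

```python
from collections import Counter

def frequency_list(tokens):
    """Return (type, token frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()

tokens = "the phrase and the word".split()   # five tokens
freq = frequency_list(tokens)
n_tokens = len(tokens)   # number of running words
n_types = len(freq)      # number of distinct word forms

for word, count in freq:
    print(f"{word}\t{count}")
print(f"tokens={n_tokens} types={n_types}")
```

The list has four rows (types), and the frequencies in the second column sum to five (tokens), as in the example above.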

More generally, the term type frequency indicates the number of different types attested in the corpus ''(or in a 'slot' such as a syntactically defined slot in a grammatical construction)'' (ibid). Moreover, Scott (2010b: 148) mentions the advantage of such a list, saying that it is useful for characterizing global features of certain texts, or even of certain languages. So, by comparing the frequency lists of two or more corpora, linguists can find out what sorts of words make up the most frequent items and how this can be linked to the text types (or ''text files'') (ibid).

Keyword Lists

Baker et al. (2006: 97) believe that it is not easy to determine what a keyword actually is. It might be a specific word which occurs more frequently in a particular corpus than would be expected, or a word which is used more frequently in a very small number of texts in a specific corpus (ibid). Accordingly, Kemppanen 2008 & Probirskaja 2009 (cited in Mikhailov & Cooper, 2016: 134) use keyword-list analysis to find the coded words in their data. They take the easiest route to a keyword list, which is to extract it with the software WordSmith Tools (WS) (ibid).

However, certain information is required before such tools can be used. Mikhailov & Cooper (ibid) list the requirements as follows:

1. The frequency of a particular word in the corpus the researcher is interested in (the research corpus).
2. The frequency of the same word in the reference corpus.
3. The size of the research corpus.
4. The size of the reference corpus.

All that is needed to create a keyword list, then, is two word lists or two frequency lists: one for the corpus the researcher is interested in and one for the reference corpus (ibid). It is even possible for a researcher to produce a keyword list manually by comparing the two word (frequency) lists in Excel: ''one simply calculates the difference between the frequencies in the test corpus and the frequencies in the reference corpus, and when the list is sorted by difference in descending order, the potential keywords will be at the beginning of the list'' (ibid).
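
The manual comparison quoted above can be sketched in Python as follows. One assumption is added that the quotation does not state: frequencies are first normalized to a rate per 1,000 tokens, so that corpora of different sizes can be compared. The function name `keyword_candidates` and the toy token lists are likewise hypothetical.

```python
from collections import Counter

def keyword_candidates(research_tokens, reference_tokens, per=1000):
    """Rank the types of the research corpus by the difference in relative frequency.

    Assumption: frequencies are normalized per `per` tokens before subtracting,
    so that the two corpora need not be the same size.
    """
    research = Counter(research_tokens)
    reference = Counter(reference_tokens)
    n_res, n_ref = len(research_tokens), len(reference_tokens)
    rows = []
    for word in research:
        rel_res = research[word] / n_res * per
        rel_ref = reference[word] / n_ref * per   # 0 if the word is absent
        rows.append((word, rel_res - rel_ref))
    # Descending order: potential keywords appear at the top of the list.
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Invented toy data standing in for a research corpus and a reference corpus.
research_tokens = "love love love the the of".split()
reference_tokens = "the the the of of love".split()
ranked = keyword_candidates(research_tokens, reference_tokens)
for word, diff in ranked:
    print(f"{word}\t{diff:+.1f}")
```

The sketch uses exactly the four pieces of information listed above: the two frequencies of a word and the two corpus sizes.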

Moreover, the output presents the significant differences as a filtered word frequency list of the analyzed corpus. These differences between the two lists are computed from the frequencies of the types occurring in both the reference corpus and the research corpus, using statistical measures such as a chi-square test or the log-likelihood ratio (Weisser, 2016: 169).
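
As an illustration of the second measure, the sketch below computes a log-likelihood keyness score from the four values listed earlier. It assumes the two-cell formulation commonly used in corpus linguistics, in which expected frequencies are derived from the two corpus sizes. Whether WordSmith Tools or Weisser computes the statistic in exactly this way is not stated in the source. The function and parameter names are hypothetical.

```python
import math

def log_likelihood(freq_research, freq_reference, size_research, size_reference):
    """Log-likelihood keyness of one word (assumed two-cell formulation).

    The expected frequencies are what each corpus would contain if the word
    were spread across both corpora in proportion to their sizes.
    """
    total = freq_research + freq_reference
    combined_size = size_research + size_reference
    expected_research = size_research * total / combined_size
    expected_reference = size_reference * total / combined_size
    score = 0.0
    for observed, expected in ((freq_research, expected_research),
                               (freq_reference, expected_reference)):
        if observed > 0:   # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Same rate in both corpora: no keyness.
print(log_likelihood(10, 10, 1000, 1000))
# Three times as frequent in the research corpus: a clearly positive score.
print(log_likelihood(30, 10, 1000, 1000))
```

The score alone does not show the direction of the difference. Whether a word is over-used or under-used in the research corpus is read off by comparing its relative frequencies in the two corpora.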

A further distinction can be made between two kinds of keywords: positive and negative keywords, ''where the former represent types with an unusually high frequency in the source corpus'' (SC); ''while the latter are types with an unusually low frequency in comparison to the target corpus'' (TC) (ibid).

Collocate Lists

Firth (1957: 14, cited in Hussein, 2014a: 76) describes the notion of collocation in terms of the fact that particular words tend to appear in combination with each other within specific linguistic contexts. A collocate, accordingly, is a word that occurs in the surrounding context of another word (Baker et al., 2006: 37). Hunston (2010: 162) states that ''the collocates of a given word can be used to narrow down the search''. Likewise, Scott (2010a: 121) mentions that collocates are the words that surround a specific search word.

A collocation is a kind of multiword expression (MWE). One of its constituents, called the base, is selected freely; the other, called the collocate, is selected by the speaker in dependence on the base in order to express a particular thought (Fonseca et al., 2017: 447). Moreover, within collocate lists, the focus is not on the search word but on the set of words distributed around it (Scott, 2010a: 121-122). This indicates that collocate lists are linked with concordances, which present the actual occurrences of a search word together with their textual setting (ibid).
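
A collocate list of this kind can be sketched in Python by counting the words that fall within a fixed span on either side of the search word. The function name `collocates`, the span of two words and the sample sentence are assumptions for illustration. They do not reproduce the settings of WordSmith Tools or any other program.

```python
from collections import Counter

def collocates(tokens, node, span=2):
    """Count the words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts.most_common()

# Invented sample sentence.
tokens = "please pay attention and pay close attention when you pay the bill".split()
collocate_list = collocates(tokens, "pay", span=2)
for word, count in collocate_list:
    print(f"{word}\t{count}")
```

The search word itself is left out of each window, so the list shows only the words distributed around it, ranked by how often they occur there.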

Identifying collocations with an electronic tool is a difficult task because the collocate often has a connotative or idiomatic meaning (Fonseca et al., 2017: 448). For example, in the collocation pay attention, pay is the collocate, and its meaning differs from the literal meaning of ''exchanging money for a service'' (ibid).

Accordingly, certain approaches have been proposed to identify collocations, such as n-grams and syntactic parsing (ibid). Regarding the first, Ma et al. (2016: 37) define an n-gram as a contiguous sequence of n items taken from a sequence of text. In the n-gram approach, a window of words is extracted, and within this window all combinations of bigrams or trigrams are generated (Fonseca et al., 2017: 448). In the syntactic parsing approach, pairs of words are extracted on the basis of particular syntactic relations. Methods based on this approach therefore depend on three kinds of word combinations: noun-noun, verb-noun, and adjective-noun (ibid).
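
The n-gram definition given above can be made concrete with a minimal Python sketch. The function name `ngrams` and the sample sentence are assumptions introduced here. The syntactic parsing approach is not illustrated, since it would require a parser.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Invented sample sentence.
tokens = "pay attention to the way they pay attention".split()
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

Counting the generated bigrams or trigrams gives the raw frequencies to which an association measure can then be applied to rank candidate collocations.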

Furthermore, Fonseca et al. (ibid) add that both approaches make use of association measures, such as log-likelihood. In the n-gram approach, association measures are used to rank the n-grams and so produce a list of the most likely candidates; in the approach based on syntactic parsing, they are likewise used to rank the best candidates (ibid). In addition, WS Tools provide the researcher with a separate window showing collocational occurrences and their frequencies arranged in rows and columns (Hussein, 2014a: 77).

Dispersion Plots

Scott (2010a: 130) states that the dispersion plot is a method that depends on the concordance and is derived from its lines. Tabbert (2015: 63) says that the ''analysis of dispersion or distribution of words in a corpus by using the dispersion plot tool allows conclusions about whether a particular word or keyword accumulates only in one part of the corpus or whether it is evenly dispersed over the corpus''.

Accordingly, the dispersion plot is particularly relevant if the corpus under investigation consists of different kinds of texts, where a specific word might be important for only one kind of text but not for the whole corpus (Baker, 2006: 68; Scott & Tribble, 2006: 45).
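
The idea behind a dispersion plot can be approximated in a few lines of Python: the text is divided into equal segments and the hits of a word are counted in each. This is a simplified sketch, not the plot produced by any particular tool. The function names `dispersion` and `plot_line`, the number of segments and the toy data are assumptions.

```python
def dispersion(tokens, word, segments=10):
    """Count the occurrences of `word` in each of `segments` equal slices of the text."""
    counts = [0] * segments
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == word:
            counts[i * segments // n] += 1   # map token position to a slice
    return counts

def plot_line(counts):
    """Draw one text-mode plot line: '|' where the word occurs, '.' where it does not."""
    return "".join("|" if c else "." for c in counts)

# Invented toy data: one text where the word clusters, one where it is spread out.
clustered = ["love"] * 3 + ["x"] * 17
spread = (["love"] + ["x"] * 4) * 4

print(plot_line(dispersion(clustered, "love", segments=4)))
print(plot_line(dispersion(spread, "love", segments=4)))
```

In the first line all the marks fall in the opening segment, while in the second they are evenly distributed, which is the contrast a dispersion plot is meant to make visible.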

Moreover, Culpeper (2009: 40) uses dispersion plots in his study of Shakespeare's Romeo and Juliet, where the analysis concentrates on the keywords in the speeches of the lovers. He finds that the word love in Romeo's speech occurs mostly in a couple of scenes, while Juliet's keywords are evenly distributed (ibid). In addition, this tool can be used to identify which keywords may be worth examining qualitatively (Tabbert, 2015: 63-64).

2.2 Stylistics and Style: Areas of Interest

Stylistics is a contested field that provokes controversy among linguists and literary critics alike. Some deny the benefits of the field; others regard it as an important branch of Applied Linguistics (AL). The controversy stems from the fact that, because the term is so broad, it is hard to determine precisely what stylistics is.

Crystal & Davey (1969: 9) note that linguistics is an academic discipline concerned with studying languages scientifically, while stylistics is a branch of that discipline which deals with certain aspects of variation within a language (ibid). Stylistics, in this sense, is a sub-discipline of linguistics interested in the systematic analysis of style in a particular language and in how style varies according to factors such as context, genre, author and historical period (Leech, 2013: 54). Wales (2001: 2, cited in Waheeb, 2017: 2121) likewise confirms that ''stylistics is a branch of applied linguistics concerned with the study of style in texts (especially but not exclusively) literary works''.

According to Jeffries & McIntyre (2010: 1), ''there is the individual style that distinguishes one writer from another, the styles associated with particular genres (e.g., 'newspaper language' or the gothic novel), or the characteristics of what might constitute literary style''. In this regard, Wales (2016: 438) states that analyzing style implies analyzing the formal characteristics of a particular text systematically and determining their functional significance for interpreting the text in question. Leech & Short (2007: 11), for their part, raise an important question about style: why does the author select this kind of expression and not another?

Accordingly, the aim of exploring the connections between language use and artistic function falls under the term literary stylistics (ibid). On such connections, Chomsky (2000: 15) writes: ''we find that words are interpreted in terms of such factors as material constitution, design, intended and characteristic use, institutional role, and so on''. In this sense, Finch (2005: 187) points out that every time people use language, they adopt a style of some sort. They make ''a selection from a range of syntactic and lexical possibilities'' according to their purpose in communicating with each other (ibid). Therefore, Leech & Short (2007: 11) define stylistics simply as the linguistic study of 'style'.

Stylistics also aims to establish relationships between linguistic properties and their effects on readers (Rundquist, 2017: xvi). Roger Fowler (1986, cited in Douthwaite et al., 2017: 3), one of the founders of stylistics, believes that stylistics is not an elite, ivory-tower intellectual exercise for training minds and characters but a means of performing the significant task of social criticism (ibid).

Norgaard et al. (2010: 1) point out that stylistics studies the ways in which meaning is created through language in literature and in other sorts of texts. In the same line of inquiry, Van Peer (cited in Martindale, 2008: 229-230) suggests that stylistics developed from Russian Formalism via Prague Structuralism, in which the concept of estrangement, or ''deviation from normal usages'' (Francis, 2017: 44), is adopted. This indicates that style involves a deviation from a norm, or from the standard use of the language, in order to achieve literary, persuasive, rhetorical or other effects (Hickey, 1993: 574). Thus, stylistics is identified as a sub-branch of linguistics which aims to study distinctive linguistic expression, or style itself (Verdonk, 2002: 2). It also aims to provide critics and linguists with wide possibilities for interpretation (Simpson, 2004: 2).

Moreover, stylistics uses linguistic methodologies to study the notion of style in a given language (Finch, 2005: 187). Norgaard et al. (2010: 4) accordingly state that the central aim of stylistics is to determine the style of specific texts, genres or authors and to refine analysts' intuitions about texts by making them aware of linguistic patterns and their features. Stylistics, as Trask (2007: 280) sees it, deals with the aesthetic uses of language, especially in literature.

Moreover, the term style originally comes from stylus, a Latin word used to refer to ''hard wood, bone, metal or reed'' (Stokes, 2011: 81), that is, to a stick used for writing. Stylus later became style, specifically style of writing (ibid). Different interests, situations, occupations and social roles call for different uses of language (Stern, 1983: 125). This gives rise to the various styles, domains, or registers adopted by any native speaker (ibid). Styles, for example, have been categorized from high to low on a five-point scale: frozen, formal, consultative, casual and intimate (Joos 1961, cited in Stern, 1983: 125). Thus, a number of concepts are used within linguistics to denote choices and functional variation within one language. One of these concepts is the notion of style (ibid).

Furthermore, Lucas (1974: 31) relates the goal of studying style to the English language. In his words, ''the study of style in education has three main objects—the appreciation of English; the mastery of English; and the purity of English. If anyone does not find these three important enough, he seems to me greedy'' (ibid). In this sense, studying a particular style is a means of characterizing the formal features of texts in order to elucidate their functional significance for interpretation, or ''to relate literary effects to linguistic 'causes' where these are felt to be relevant'' (Thornborrow & Wareing, 1998: 3). Finally, Crystal & Davey (1969: 9) explain what 'style' refers to: a selection of language habits characterized as ''an individual uniqueness'' that is based on ''occasional linguistic idiosyncrasies''. This indicates that one person's style differs from another's (ibid).

To sum up, stylistics has many areas of interest, including the investigation of linguistic features, the style of authors and their texts, and the language itself. Through these areas, stylistics proves itself to be a powerful, useful and significant approach to the study of texts, discourses and even the ways people use their own languages. Given the wide variety of perspectives on and definitions of the term stylistics, one can further notice that there is no total agreement among linguists about what stylistics attempts to achieve, because every linguist adopts his own view of the term.

2.2.1 Stylistics as an Advanced Level of Practicing Linguistics

The most important question to be addressed here is this: does stylistics hold a firm position within linguistics? Or is it to be considered an advanced level of practicing linguistics?

Since stylistics deals with the investigation of texts, spoken and written, literary and non-literary alike (Zyngier & Watson, 2006: xiv; Jeffries & McIntyre, 2010: 4; & Lambrou, 2016: 96), it is concerned with linguistic performance (Nivre, 2006: 19; & Zamojska-Hutchins, 1986: 13). The use of a particular language in a particular text reflects certain features of that language, including aspects of the situations in which it is used, turn-taking structures, narrative structures, the principles of conversation in a given discourse, the background information needed to understand the texts, and the identification of what and who is being described in extended stretches of written and spoken language (Johnston & Schembri, 2007: 253).

Alderson & Short (1989: 72) support the idea that stylistics is ''intended to help determine interpretation through the examination of what a text contains, by describing the linguistic devices an author has used, and the effect of such devices. Such an analysis is predominantly text-based, and has tended to see texts as containing meanings''. In this sense, ''stylistics is concerned with aspects of language variation'' (Fishman et al., 1994: 231). It describes and analyzes the variability of linguistic features in actual language use (Rubio, 2011: 1043). This makes it clear that stylistics deals only with ''samples of authentic texts'' (Povolna, 2005: 67). These samples represent, for example, publicistic style, administrative style, scientific prose style, and the language of stylized dialogue and conversation (ibid).

Moreover, the subject matter of stylistics is 'style', while the subject matter of linguistics is 'language'. On the one hand, the term style, ''from which stylistics is derived, has acquired a number of meanings in linguistics'' (ABioye & Ajiboye, 2014: 118). Leech & Short (2007: 9) say that style refers to the way in which language is used in a particular context. They (ibid) add that style is a selection from ''a total linguistic repertoire''. Likewise, Lawal (1997: 6, cited in ABioye & Ajiboye, 2014: 118) sees style as a particular aspect of language concerned with choices of words, phrases, or sentences, as well as with linguistic content in connection with the sociolinguistic context of a specific literary text. Burns (2006: 17), in this sense, states that style refers to the total effect ''created by the way in which words are used''.

On the other hand, the term linguistics is simply defined as the science of language, or ''the scientific study of language'' (Lyons, 1981: 37; & Alhaj, 2015a: 15). So, what is style after all? And what is linguistics after all?

Both deal with the same matter, which is 'language', but they approach it from two different angles: style concerns the way in which a language is used (Leech & Short, 2007: 9), whereas linguistics, as a general discipline, explores language scientifically (Lodge et al., 1997: 1). In this respect, ''we are all familiar with the idea of linguistic style, and most people will think first of language in literary style'' (Coupland, 2007: 2).

The same holds for the relation between sociolinguistics and stylistics. Sociolinguistics is concerned with studying language in relation to society (Hudson, 1996: 4). It is interested in ''how people use language in their everyday lives across a variety of life events and language experiences'' (Vickers & Deckert, 2011: 1). The core of sociolinguistics lies in ''the observable facts of language variation'' (Liamas et al., 2007: xv). It therefore aims at exploring ''language variation'' in terms of factors such as ''age, sex, social class, etc.'' (Skandera & Burleigh, 2011: 2). Stylistics follows the same logic: it is a closely related discipline that also deals with ''language variation'' (ibid).

One can further notice that ''stylistic variation involves alternation between, for example, casual and formal styles of speech used by an individual speaker, often reflecting differing degrees of attention to speech due to changes in topic, setting, and audience'' (Schilling-Estes 2002, cited in Pfau et al., 2012: 789). The whole idea is supported by Jeffries & McIntyre (2010: 3), who say:

Stylistics has a firm place within linguistics, providing theories of language and interpretation which complement context-free theories generated within other areas of language study.

Thus, the researchers can conclude that stylistics does hold a firm position within the realm of linguistics. In fact, it is an advanced stage of practicing linguistics.

2.2.2 The Need of Stylistics in Analyzing Texts

Stylistics and linguistics are closely linked, because linguistics in the broad sense creates theories which facilitate the processes of language interpretation (Simpson, 2004: 2). Hall (2008: 31) observes that stylistics examines texts in a systematic way in which the linguistic features are related to each other, even as it is recognized that ''a relatively fuller understanding of what and how those features mean and to who necessarily exceeds a bare and frozen textual account'' (ibid).

The claim that stylistics deals with literature rather than with linguistics itself, however, is a popular viewpoint among critics within the domain of stylistics. Simpson (2004: 3), for example, supports this claim by emphasizing the significance of literary stylistics for language studies. Other stylisticians hold that stylistics deals with both literary and non-literary texts, since many forms of discourse, such as casual conversation, journalism, advertising, and even popular music, are studied in stylistics (ibid).

However, stylistics, or more specifically contemporary stylistics, demonstrates the unity of literature and language (Lambrou & Stockwell, 2007: 1). Thus, Burke (2014: 1) mentions that stylistics, or literary stylistics, is about analyzing and studying texts, particularly literary texts. From a slightly different angle, Jeffries & McIntyre (2010: 8) state that stylistics can benefit from other branches, such as pragmatics and discourse analysis, which in turn may help to account for the manipulative uses of language.

Stylistics also aims to account for how texts project meaning, how readers construct this meaning and why readers respond to texts in the way they do (McIntyre & Jeffries, 2017: 155). The authors (ibid) add that ''the object of study of stylistics is often literature, but it is not limited to literary texts as the linguistic choices of speakers and writers in all language use can have important consequences''.

As a result, the interpretive power of stylistics can help linguists understand more deeply the ways in which the styling of texts can affect readers' perceptions in everyday situations (Jeffries & McIntyre, 2010: 8). The need for stylistics in analyzing texts is therefore significant, since so much of life is mediated through language, and this language is well described, structurally and contextually, by stylistic models (ibid).

2.2.3 Stylistics and Aspects of Style

Stylistics has a variety of aspects that bear on the term style. These aspects can be grouped into three issues, which represent the most debated linguistic definitions of the term (see Hussein, 2014b). In his book Issues in Literary Stylistics, Hussein (ibid: 2) addresses three basic issues on style, as follows:

1. Style as choice.
2. Style as deviation from a given norm.
3. Style as recurrence.

Style as choice points to the repository, or linguistic system, from which writers make their stylistic choices (ibid). Style as deviation from a given norm is especially suited to distinguishing one particular text from others (ibid: 2-3). Style as recurrence explores the co-occurrence of specific linguistic features (lexical, structural, etc.) and helps researchers to form an idea of the nature of the writer's language (ibid: 3). The researchers take up each of these three basic issues in turn below.

Style as Choice

Laying more stress on the term style, Abrams & Harpham (2015: 383) define it as ''the manner of linguistic expression in prose or verse – it is 'how' speakers or writers say whatever it is that they say''. Linguistically, Ghazala (2011: 41) states that style is a choice made by a specific author within the boundaries and resources of language or grammar. This choice is made from ''the total options available in the syntactic, semantic, phonological and pragmatic systems'' (ibid).

Relatedly, the word style refers to ''the way in which language is used in a given context, by a given person, for a given purpose and so on'' (Leech & Short, 2007: 9). In Saussurean terms, this leads to a distinction between langue and parole, where langue refers to ''the code or system of rules common to speakers of a language'' (ibid), while parole refers to the specific uses of this system, or to the selection from this system ''that speakers or writers make on this or that occasion'' (ibid). Nida & Taber (1969, cited in Farghal & Almanna, 2015: 140) touch on patterned choices in their definition of style, choices which play a decisive role in defining the style of the author. In this case, style refers to a characteristic choice made at certain linguistic levels, such as the phonological, lexical, syntactic and pragmatic levels (Traugott & Pratt, 1980: 409).

Thus, one way of drawing on the resources of the language system to account for textual features is to look at certain choices in relation to style. Choices within style, then, are ''motivated, even if unconsciously, and these choices have a profound impact on the way texts are structured and interpreted'' (Simpson, 2004: 22). Every writer therefore makes his or her own selection from the system of language use. These choices are assumed to be motivated by particular artistic and aesthetic perceptions. This entails that the factors which make a text more effective as written language are not those of clarity or accuracy but rather those that shape the relationship between the writer and the reader (Kirkman, 2005: 1).

Style as Deviation

Enkvist (1985: 40) uses the notion of deviation to study style. This notion is highly significant in exploring style, since texts become stylistically distinctive when their language deviates from certain expected norms. The norms can be of two types: relative (i.e., norms related to some set of texts) or absolute (i.e., norms related to the language as a whole) (ibid). In this respect, deviation refers to a departure from the norms of the language as a whole, from the norms established in literary composition, or from the norms set up within a specific text (Aquilina, 2014: 13).

With regard to the language people use, a language carries all sorts of normal expectations, which are recorded in its grammar (Ellis, 1974: 160). Style may thus be a departure from what is normally predictable (ibid). Traditionally, Barthes (1971b, cited in Noth, 1995: 344) identifies two semiotic dichotomies that define the notion of style: ''content vs. form (style as elocution, ornament, or 'dress', thus: form) and code vs. message (style as a deviation of the message from the coded norm)'' (ibid). Similarly, Riffaterre (cited in Wetherill, 1974: 186) describes style in a work of literature as ''a constantly repeated set of deviations from the norm''. But why is that? Why does the concept of deviation exist at all?

Deviation seems to exist in order to create a sense of interest and surprise (Leech, 1969: 56-57). In the same vein, ''language can be seen as a deviation from a given expectation'' that people have (Renkema, 2004: 148).

Moreover, various kinds of deviation can be distinguished at three levels: the level of realization (phonology and graphology), the lexical or grammatical level, and the semantic level (Leech, 1969: 57). Deviation is also subdivided into determinate deviation and statistical deviation (Enkvist, 1985: 40). The first is defined in terms of what is permitted in the system of the language and what commonly occurs in texts (ibid), while the second is concerned with the linguistic differences that occur between the norm and the domain itself (ibid). In addition, Leech (2013: 63) recognizes another kind of deviation, called internal deviation, which stands out as a point of climax. This kind can be seen where some features of language within a given text stand out against the expectations (or the background) that the text itself sets up (Leech & Short, 2007: 44).

Style as Recurrence

Recurrences in a literary form or in a language are essentially bound up with meanings and messages. For example, changes that occur in form may signal different messages (Niditch, 2017: 29). Various assumptions can be made about the term style. One of them is that style is a function of frequencies (Jones, 2012: 172). Bernard Bloch (cited in Jones, 2012: 172) supports this assumption in his definition of the term 'style' as:

[T]he message carried by the frequency distributions and transitional probabilities of its linguistic features, especially as they differ from those of the same features in the language as a whole.

So, how can one establish what makes a style special in a given text? Leech & Short (2007: 35) explain that one can detect what is special about a particular style by establishing the frequencies of the linguistic features the text contains and then measuring these figures against the normal frequencies of the same linguistic features. Furthermore, two terms should be distinguished from each other: parallelism and simple repetition. At first glance, ''parallelism seems to be simple repetition of words or phrases'' (Kolker et al., 1993: 16). On closer inspection, however, parallelism states a particular idea and then repeats it, whereas simple repetition is no more than what the first glance suggests (ibid). Thus, ''parallelism helps us maintain a balance of thought, all the time contributing to a pleasing poetic meter and rhythm'' (ibid).
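The comparison of a text's feature frequencies against normal frequencies can be illustrated with a small sketch. It is not the procedure of Leech & Short or of the present study. It assumes that the "features" are simply word forms, that the "norm" is a second, larger text supplied by the analyst, and that frequencies are normalized per 1,000 tokens. The function names and the toy sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens, per=1000):
    """Return each word's frequency normalized per `per` tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count * per / total for word, count in counts.items()}


def compare_to_norm(sample_text, reference_text, per=1000):
    """List (word, sample frequency, reference frequency) for every word in the
    sample, ordered so that words most over-used relative to the norm come first.
    The reference text stands in for the norm (an assumption of this sketch)."""
    sample = relative_frequencies(tokenize(sample_text), per)
    reference = relative_frequencies(tokenize(reference_text), per)
    rows = [(word, freq, reference.get(word, 0.0)) for word, freq in sample.items()]
    return sorted(rows, key=lambda row: row[1] - row[2], reverse=True)


if __name__ == "__main__":
    sample = "the sea the sea the ship"
    norm = "the ship and the crew and the sea and the port"
    for word, in_sample, in_norm in compare_to_norm(sample, norm):
        print(f"{word:6s} sample={in_sample:7.1f} norm={in_norm:7.1f}")
```

In this toy case the word "sea" heads the list, because its share of the sample is much larger than its share of the reference text. A real analysis would use a far larger reference corpus and a significance measure rather than a raw difference.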

2.2.4 Corpus Stylistics

McIntyre (2012, para:1) defines corpus stylistics (henceforth CS) as ''a sub-branch of stylistics that uses corpus techniques to support the analysis of stylistic effects in texts''. It has been argued that CS expanded in 1928, drawing on an investigation of certain features of data. These data are drawn from a specific type of text, such as newspaper texts, or from the texts of particular authors, such as Shakespeare's (Wales, 2014: 92). It is used ''to identify distinctive ‘clusters’ of words or systematic linguistic patterns, often against the norms of everyday language'' (ibid). In other words, CS is ''an emerging branch of corpus linguistics'' (Mason & Giovanelli, 2018: 101). It uses CL to analyze issues or features of style (ibid). Mason & Giovanelli (2018: 101-102) set out the key advantage of integrating CS and CL in exploring texts:

The key advantage of [such a combination] is that patterns can quickly and easily be identified across whole texts in a way that would be very difficult to achieve by going through a text manually. Thus, corpus stylistics offers the opportunity to uncover characteristics of an author's style or to look at the representation of a character or theme across a whole text.

They (ibid: 102) further describe how CS works in practice:

Typically, corpus stylistics involves putting an electronic copy of a literary text into a corpus program and using the software [e.g., WordSmith Tools, see 4.7.1] to either help identify patterns, such as frequent words or phrases, or to quickly support or reject patterns that seem to exist when the text is read 'manually'.
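The kind of pattern search described in this passage can be sketched in a few lines. The sketch does not reproduce WordSmith Tools or any other corpus program. It assumes a plain-text input, a very simple tokenizer, and "phrases" understood as contiguous word sequences (clusters) that recur at least twice. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, top=10):
    """Return the `top` most frequent words with their raw counts."""
    return Counter(tokens).most_common(top)


def clusters(tokens, n=2, min_count=2):
    """Count contiguous n-word sequences and keep those that occur
    at least `min_count` times, most frequent first."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(gram, count) for gram, count in counts.most_common() if count >= min_count]


if __name__ == "__main__":
    text = "It was the best of times, it was the worst of times"
    tokens = tokenize(text)
    print(frequency_list(tokens, top=5))
    print(clusters(tokens, n=3))
```

On the toy sentence, the only three-word cluster that recurs is "it was the". This is the sort of pattern a reader might suspect on a manual reading and then confirm or reject against the counts.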

An important issue must be taken seriously here: the distinction between corpus stylistics (CS) and corpus linguistics (CL). What, then, is the difference between the two?

McIntyre (2015), in his article Towards an Integrated Corpus Stylistics, sheds light on this distinction. He (ibid: 60) affirms that many works have developed CS, but that none of them gives a precise definition of what CS actually is or distinguishes it from CL.

Accordingly, he (ibid: 61) offers his own definition: CS is ''the application of theories, models and frameworks from stylistics in corpus analysis''. He (ibid: 60) also explains that CS does more than simply borrow tools from CL: it sets itself apart by using the qualitative techniques and tools of stylistics to analyze texts with the support of computational procedures. That is to say, CS ''needs to incorporate theories, models and methods from qualitative stylistic analysis to augment computational techniques'' (ibid).

According to Mahlberg (2015: 346), CS is interested in studying literary texts by using the methods of CL to support the analysis. It aims to analyze linguistically texts that are stored electronically (Stracke, 2010: 1). In this sense, corpus stylistics is a collaboration between the areas of stylistics and corpus linguistics (ibid), and it is this collaboration that gives rise to corpus stylistics (Norgaard et al., 2010: 9). CS, in turn, is widely recognized as an offshoot of stylistics that uses the modern methods of CL to explore literary and non-literary texts (ibid). Ho (2011: 5) states that language patterns in literature and all other sorts of texts are studied with the methods and approaches of CS. What is more, he (ibid: 10) argues that CS must be understood not as ''purely a quantitative study of literature'' but as a qualitative stylistic approach to the study of language patterns in literature, combined with corpus-based quantitative methods and technology.

Moreover, Mahlberg (2013: 5) observes that CS focuses on the application of corpus methods so that texts can be analyzed through a combination of linguistic description and literary appreciation. In this case, CS is the result of combining CL and literary stylistics (ibid). She (2014: 378-379) also stresses that corpus stylistics brings together principles from corpus linguistics and literary stylistics. So, with a corpus-based stylistic approach, one can uncover the creative uses of language in texts by applying quantitative methods (ibid: 380). Corpus-based analysis is by nature decidedly quantitative, but this does not exclude qualitative analysis from the analytical process (Semino & Short, 2004: 7). In other words, combining quantitative and qualitative analyses provides ample scope for the study of literary texts or other collected data (ibid).

In following corpus-based techniques, linguists may ''adopt a confident stand with respect to the relationship between theory and data in that they bring with them models of language and description which they believe to be fundamentally adequate'' (Tognini-Bonelli, 2001: 66). In this respect, Mahlberg (2005: 18) states that the data used in a corpus-based analysis might modify or adjust a specific theory, and that they might sometimes be used as a sort of ''quantitative evidence'' (ibid). McEnery & Hardie (2012: 6), for their part, point out that corpus-based studies use the data to investigate certain hypotheses or a particular theory so that these can be refined, refuted or validated. Thus, corpus-based studies use the corpus as a source of examples to check the researcher's intuition and to examine the plausibility, and even the frequency, of the language contained within a smaller set of data (Baker et al., 2006: 54).

The Aims of Corpus Stylistics

CS pursues a variety of aims. The most important are the following:

1. CS aims to define and describe style by identifying the difference between ''a sample and a norm, a text and a reference corpus'' (Halliday, 1971; Fowler, 1966; Leech & Short, 2007, cited in Mastropierro, 2018: 1).
2. Stylistics is of two types: ''descriptive'' and ''explanatory''. The former aims to describe style; the latter aims to explain something (Leech, 2013: 54). An explanatory aim, in turn, may be ''extrinsic'' or ''intrinsic''. An extrinsic aim serves to determine the author of a specific text, while an intrinsic aim serves to identify the meaning of the text (ibid). In this respect, CS has an extrinsic aim: it draws comparisons among certain texts and assesses certain linguistic properties in terms of broader linguistic patterns (Mahlberg, 2015: 347).
3. Stracke (2010: 1) mentions two aims of CS. Firstly, CS aims to study the meaning encoded in language and to develop appropriate techniques for decoding it. Secondly, it aims to study the literary meaning implied in texts.

Characteristics of Corpus Stylistics

Mahlberg (2007: 219) argues that ''corpus stylistics can make use of innovative description tools that not only fit into linguistic frameworks but also leave room to account for individual qualities of texts and thereby link in with literary interpretation''. In this case, CS is viewed as a particular way to bring the study of literature and language ''closer together'' (ibid). CS has a range of properties that play a pivotal role in its construction. The most important of these are dealt with by Norgaard et al. (2010: 9-11):

1. CS results from the collaboration between corpus linguistics and stylistics, in which the modern techniques of corpus linguistics are applied to texts.
2. It is interested in the form and function of the texts.


Thi-Qar University  (College of Arts)
lexical loss, type/token ratio, Guiraud's index, Qur'anic Text
Quote paper
Khalid Shakir Hussein (Author), Abdul-Haq Abdul-Kareem Abdullah Al-Sahlani (Author), 2019, Translation Assessment and Lexical Loss. A Corpus-Based Approach, Munich, GRIN Verlag

