Corpus Linguistics - An Introduction to the Field and its Use in Linguistics

Term Paper, 2010

12 Pages, Grade: 2


Table of Contents

1. Introduction

2. What is a corpus?

3. What is the use of a corpus?

4. What are potential risks?

5. What are the fields of application of corpus linguistics?

6. What is the use of corpora in syntax and morphology?

7. Conclusion

8. Works Cited

1. Introduction

This paper deals with corpus linguistics. It gives an overview of the topic because, although the field has already gained popularity, I think that not everyone is familiar with it.

First, I want to explain what a corpus actually is and what it is useful for. The different types of corpora will be described, as well as the potential risks of depending too much on computer-processable corpora. Then the focus will shift to the fields of application of corpus linguistics, and its use in syntax and morphology will be discussed. I will also illustrate the opportunities a corpus provides with an example.

The main aim of this paper is to give an overview of corpus linguistics and its fields of application, with particular attention to syntax and morphology.

2. What is a corpus?

Before going deeper into the topic of corpus linguistics, it has to be clear what a corpus actually is. In The BNC Handbook (Aston and Burnard 1998), several definitions of the term ‘corpus’ are given. From the options the OED offers, two fit the linguistic term:

- “A body or complete collection of writings or the like; the whole body of literature on any subject”

- “The body of written or spoken material upon which a linguistic analysis is based”

(qtd. in Aston and Burnard 1998, 4)

The latter in particular describes the sense that is relevant for this paper. According to John Sinclair, who plays an important role in corpus linguistics, a corpus is “a collection of pieces of language, selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language” (qtd. in Aston and Burnard 1998, 4). That a corpus is meant to serve as a sample of a specific language is already a significant feature of corpora, which will be dealt with in detail later in this paper.

But what does such a corpus look like? Nowadays, most corpora are computer-processed. They are huge collections of written and spoken material from different sources in all kinds of fields. The British National Corpus (BNC), for instance, contains 100 million words at the time of writing (2010).

What distinguishes a corpus from other collections of texts is that it is “an object designed for the purpose of linguistic analysis, rather than an object defined by accidents of authorship or history” (Aston and Burnard 1998, 4). The samples are collected with specific criteria in mind so that the corpus is representative of a language, or of a certain field of language.

There are many different kinds of corpora that focus on different aspects of language, e.g. geographical varieties, historical varieties, child and learner varieties, genre- and topic-specific corpora, but also spoken language or multilingual corpora (cf. Aston and Burnard 1998, 10ff.).

3. What is the use of a corpus?

As stated above, the samples in a corpus are selected according to particular criteria, depending on its purpose. The selection depends less on the content of the text samples than on “external” features such as “the situation of their production or reception” (Aston and Burnard 1998, 5).

The biggest advantage of a corpus for linguists is its size. The history of corpus linguistics only dates back to the 1960s, and the history of computer-processable corpora is even shorter. Before they were able to search large corpora for evidence of linguistic rules, linguists relied heavily on introspection. No individual commands the whole “repertoire” of a language, as there are many different fields of language with different terminologies. Corpus data can be seen as an extension of “linguistic intuition” that creates an objective impression of a language: “A corpus can enable grammarians, lexicographers, and other interested parties to provide better descriptions of a language by embodying a view of it which is beyond any one individual’s experience” (Aston and Burnard 1998, 5). For grammarians, a corpus provides, due to its size, information about the frequency of certain word combinations and about sentence structure; lexicographers use it to determine the frequency of words, which is useful, for example, in compiling dictionaries. It also yields information about different uses of words, register, diachronic varieties and different uses of language in general.

(cf. Aston and Burnard 1998, 5ff.)
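
To illustrate the kind of frequency information mentioned above, the following is a minimal sketch in Python. It is not taken from Aston and Burnard; it assumes a toy two-sentence sample in place of a real corpus and a very simple tokenizer, and the function names (tokenize, word_frequencies, bigram_frequencies) are my own and purely illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a plain regular expression is enough for this toy sample;
    real corpus tools use far more careful tokenization.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs (the lexicographer's view)."""
    return Counter(tokens)


def bigram_frequencies(tokens):
    """Count how often each pair of adjacent words occurs
    (a simple stand-in for 'combinations of words')."""
    return Counter(zip(tokens, tokens[1:]))


# Toy sample (hypothetical); a real corpus such as the BNC is vastly larger.
sample = "The cat sat on the mat. The dog sat on the rug."

tokens = tokenize(sample)
words = word_frequencies(tokens)
bigrams = bigram_frequencies(tokens)

# Show the most frequent words and word pairs in the sample.
print(words.most_common(3))
print(bigrams.most_common(2))
```

Even on such a small sample, the counts show which words and word pairs recur. On a corpus of many millions of words, the same principle provides the frequency evidence that no individual speaker could supply from intuition alone.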

In his introduction at the Nobel Symposium 82 in 1991, Jan Svartvik names several reasons for using a corpus. First, corpus-based linguistic analyses are more objective than those relying on introspection, as many different sources can be used to make a statement about a specific phenomenon.


Excerpt out of 12 pages


University of Innsbruck  (Anglistik)
Synchronic and/or Diachronic English Linguistics: English Syntax and Morphology
corpus, linguistics, introduction, field
Theresa Rass (Author), 2010, Corpus Linguistics - An Introduction to the Field and its Use in Linguistics, Munich, GRIN Verlag,

