Corpus Linguistics - Differences and similarities between German and English Fairy Tales

Table of Contents

1. Introduction

2. Theoretical Part
2.1. What is a corpus?
2.2. Tools for examining corpora

3. Practical Part
3.1. Corpus Design
3.2. Corpus analysis and results


1. Introduction

Corpus Linguistics has developed extensively in recent decades, thanks to the growing availability of computers and computer programmes.

Corpora are widely used nowadays, not only by language teachers, who use them to find out how a language works, but also by students, who use corpora themselves to compare languages.

Translators also use corpora to see how words or phrases have been rendered in earlier translations, and even social studies can draw on corpora to learn about cultural attitudes and class varieties expressed through language.

But before talking about Corpus Linguistics and the use of corpora, the question “What is a corpus actually?” should be answered.

I asked this question myself when I joined the class at the beginning of the semester, and I received an answer.

In class we learned about the meaning, importance and different types of corpora. By compiling our own corpus we gained a practical introduction to how a corpus works.

For me, the most important question was what a corpus does and what it is good for.

While working on our own small corpus I soon found an answer by analysing the word lists and concordance lines and by comparing the results for different texts.

A corpus helps you to look at a text in a different way, to focus on the language and to form your own ideas about why, for example, a certain word is used more frequently.

In this paper I will try to answer briefly the question of what a corpus is by giving the most important facts about corpora. I will show some examples of different corpora and introduce the main tools for working with a corpus.

The second part presents the results that my group obtained in the seminar while working on our corpus, which compared fairy tales of English and German origin. I will focus on the use of certain words for animals, royalty and characters that are typical of fairy tales, and I will try to explain the results.

In my conclusion I will reflect critically on my experience of working with the corpus.

2. Theoretical Part

2.1. What is a corpus?

The word corpus derives from Latin and means “body”; it can refer to any text in written or spoken form.

In recent decades, however, the word has more often been used to refer to a collection of texts and recorded speech that represents a specific use of language and is stored and accessed electronically.

To sum up, a corpus can be defined as follows: “A collection of naturally occurring language text, chosen to characterize a state or variety of a language.”[1]

There are various types of corpora. A specialised corpus includes texts of a particular type and is used to investigate a particular type of language. It may, for example, contain texts on a particular topic or conversations held in a certain setting; the texts may also be restricted to the period in which they were written.

A general corpus is usually much bigger because it includes many types of text. This type of corpus is used as reference material for language learning or translation.

Furthermore, there are comparable corpora, which consist of two or more corpora. Each corpus has to include the same number of texts of each kind (newspaper articles and so on). The texts may be written in different languages and used to identify differences and similarities between those languages. They can also be written in varieties of the same language in order to compare those varieties. A comparable corpus can also combine a corpus of texts by native speakers with a learner corpus, which includes texts written by learners of a language.

Another type that deals with differences and similarities between languages is the parallel corpus. Each of its components contains the same texts in a different language, so that equivalent expressions and differences can be identified.

There are also pedagogic corpora, historical corpora and monitor corpora.

The design of a corpus has become more important as corpora have come into more frequent use. Decisions about size have to be made: a small corpus contains fewer than one million words, a big corpus more than one million. Since computers can now store large amounts of text, big corpora have become more common, but there are also situations in which small corpora are useful.

How a corpus can be used depends on its content. By choosing texts from a particular genre or on a certain topic, the compiler determines what the corpus is representative of. Another issue that has to be considered in connection with representativeness is balance. A corpus is balanced, and in that respect representative, when it includes an equal number of words in each category under observation.

2.2. Tools for examining corpora

It might be […] proper to say that corpora are a way of collecting and storing data, and that it is the corpus access programs – presenting concordance lines and calculating frequencies – that are the tools.[2]

With the help of these programs the user can analyse the texts in terms of word frequency, collocation and phraseology.

Frequency lists give a closer look at the words used in different texts. The most frequent words are usually grammatical words such as the, of, and, a, in and to. Lexical words, however, are more interesting and more important when comparing texts or languages.
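A frequency list of this kind can be sketched in a few lines of Python. This is only an illustration, not the program used in the seminar: the sample sentence, the function names and the very short list of grammatical words are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample text; a real analysis would read the corpus files instead.
sample_text = (
    "The king and the queen lived in the castle. "
    "The fox and the wolf lived in the forest."
)

# A tiny hand-picked set of grammatical words, for illustration only.
GRAMMATICAL_WORDS = {"the", "of", "and", "a", "in", "to"}


def tokenize(text):
    """Lower-case the text and split it into runs of letters.

    Simplification: apostrophes and hyphens split words.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(text):
    """Count how often each word form occurs in the text."""
    return Counter(tokenize(text))


def lexical_frequency_list(text):
    """Count word forms, leaving out the grammatical words."""
    return Counter(t for t in tokenize(text) if t not in GRAMMATICAL_WORDS)


freq = frequency_list(sample_text)
lexical = lexical_frequency_list(sample_text)

# Print the most frequent words with and without grammatical words.
print(freq.most_common(3))
print(lexical.most_common(3))
```

Comparing the two printed lists shows why the lexical words are the more informative ones: the full list is dominated by grammatical words, while the filtered list brings out what the text is about.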

To examine how particular words are used, one can study their phraseology by means of concordance lines. These can, for example, reveal regularities in the use of a word that usually go unnoticed in running text. They can also help to confirm or reject hypotheses.
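The idea behind concordance lines can be illustrated with a minimal keyword-in-context sketch. The function name, the bracket notation for the search word and the sample sentences are assumptions made for this example; real concordance programs offer far more options.

```python
import re


def concordance(text, keyword, width=3):
    """Return one line per hit of keyword, with `width` words of context
    on each side and the keyword itself marked by square brackets."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample text, not taken from the corpus.
sample_text = (
    "The old witch lived in the forest. "
    "The children feared the witch and ran away."
)

kwic_lines = concordance(sample_text, "witch")
for line in kwic_lines:
    print(line)
```

Because every hit is shown with its immediate neighbours, recurring patterns around the search word become visible at a glance.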

Collocation shows which words co-occur with a chosen word and can therefore also give a detailed picture of how that word is used.
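A simple way to approximate collocation is to count the words that appear within a fixed span around the chosen word. The sketch below does only that: it reports raw counts, applies no statistical measure of association, and lets the span cross sentence boundaries. The function name, the span of two words and the sample text are assumptions for the example.

```python
import re
from collections import Counter


def collocates(text, node, span=2):
    """Count the words occurring within `span` words to the left or right
    of each occurrence of the node word."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node.lower():
            # Words before and after the node word; the node itself is excluded.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Hypothetical sample text, not taken from the corpus.
sample_text = (
    "The wicked witch lived alone. "
    "A wicked witch cursed the king. "
    "The witch was old."
)

witch_collocates = collocates(sample_text, "witch")
print(witch_collocates.most_common(3))
```

In a list like this, grammatical words again tend to come first, so in practice one would look at the lexical collocates, here for instance "wicked".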

3. Practical Part

The corpus we compiled during our project work included English fairy tales and fairy tales of German origin, for the most part famous tales by the Brothers Grimm. We used 21 fairy tales for each category, and even one tale that is essentially the same story with a different origin: we analysed a German and an English version of “Tom Thumb”.

3.1. Corpus Design

Our corpus is a balanced corpus with 198,614 words for the German fairy tales and 198,387 words for the English ones. Because we intended to find differences between tales of different origin, our corpus can be considered a comparable corpus of specialised texts.

The corpus consists of a folder called “Fairytales”, which is divided into two subfolders, one for the English and one for the German fairy tales.
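How close the two halves of the corpus are in size can be checked with a short calculation. The word counts are the ones given above; the variable names are chosen for this example only.

```python
# Word counts of the two sub-corpora as stated in the text.
german_words = 198614
english_words = 198387

total_words = german_words + english_words
difference = german_words - english_words

# Size difference relative to the larger sub-corpus.
relative_difference = difference / max(german_words, english_words)

# Share of the whole corpus taken up by the German tales.
german_share = german_words / total_words

print(f"Total size: {total_words} words")
print(f"Difference: {difference} words ({relative_difference:.2%})")
print(f"German share of the corpus: {german_share:.2%}")
```

The two parts differ by 227 words, so each accounts for almost exactly half of the corpus.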

3.2. Corpus analysis and results

With the help of the frequencies shown in the word lists and of the concordance lines, we made some notable observations about the animals that appear in the tales, as well as about the use of royalty and of characters typical of fairy tales, such as the witch and the fairy.

3.2.1. Animals in fairy tales

Animals are known to play a major role in German tales, so we decided to take a closer look to see whether this also applies to the English stories. The frequency lists showed that animals are mentioned even more often in the English tales than in the German ones.


