1.1. Aims
1.2. Structure

2.1. Corpus Linguistics
2.1.1. Areas of corpus linguistic research
2.1.2. What is a corpus?
2.1.3. Kinds of corpora
2.1.4. Using the corpus
2.2. Corpora used in this paper
2.2.1. British National Corpus
2.2.2. BYU Corpus of American English
2.2.3. TIME Corpus of American English
2.3. Search Tools

3.1. Earlier works on idioms
3.2. Defining idiom
3.2.1. Idiomaticity
3.2.2. Definitions
3.2.3. Properties of idioms
    Noncompositionality
    Fixedness/Transformational deficiency
3.3. Idiom variation
3.3.1. Grammatical variation
    Variation in tense
    Variation in number
3.3.2. Lexical variation
    Replacement/Substitution
    Addition
    Deletion
3.3.3. Syntactic variation
    Passivization
    Nominalization
    Particle shift
3.3.4. Variation for stylistic effect

4.1. Frozenness Hierarchy
4.2. Level 0
4.2.1. Searching for variant forms
4.2.2. Idioms that show variation
    bite off one’s tongue
    bleed one white
    build castles in the air
    dip into one’s pocket
    let off some steam
    pluck up courage
    sit on pins and needles
    turn a deaf ear to
4.2.3. Frozen idioms
    blow one’s cool
    face the music
    kick over the traces
    stew in one’s own juice
4.3. Variation across text type
4.3.1. Distribution of the selected idioms and their variant forms
4.3.2. Variation in journalism
4.3.3. Summary of the research




1.1. Aims

The central topic of the present paper is idioms in the English language. Interesting and peculiar, they are a very important part of the lexicon and exist in every language; even artificial languages may produce idioms (Čermák, 1988: 16).

Some earlier studies on idioms claimed that an idiom is nothing more than a fixed string of words whose meaning differs from the meanings of its constituent elements.

Psycholinguists have lent support to this view as well. Scholars have generally assumed that idioms exist as frozen semantic units within a speaker’s mental lexicon, in the same way that words or strings of them are represented mentally (Gibbs, 1993: 57). Thus they need separate entries in dictionaries and have to be learned by heart and remembered like single words, so that they appear to be nothing more than long lexemes.

Idioms have also commonly been thought of as metaphors that have become fixed or fossilized over time and are now “dead” expressions in a language.

Taking this into consideration, I aim to show in this paper that idioms are not as frozen and fixed as they are supposed to be and that, on the contrary, these expressions are quite “alive”: they vary, change and colour the language.

A long time has passed and a great deal of research has been done since idioms were defined as completely frozen items and kick the bucket served as the representative example of a typical idiom. Idioms are no longer considered merely expressions whose meaning cannot be understood from the meanings of their constituents. Over the last few decades, investigations in various branches of linguistics (sociolinguistics, psycholinguistics and corpus linguistics, to name a few) have shown that an idiom is much more than a simple fixed string of words with a meaning of its own. We now know that a language contains a large number of idiomatic expressions, which vary in their degree of compositionality, fixedness and opacity. Although idioms have always been counted among the fixed expressions, their absolute fixedness is now regarded as a myth. In fact, different authors in the idiom literature attach varying degrees of importance to this property. Sinclair (1996: 83) reached the conclusion that the “so-called ‘fixed expressions’ are not in fact fixed”, and Moon (1998: 2) likewise emphasized that “many fixed expressions … are not actually fixed”.

Many earlier linguists dealt with the inability of idioms to undergo various transformations (Chafe, 1968; Fraser, 1970; Newmeyer, 1972), and tests have also been developed to check their stability (Gläser, 1988).

More recent studies (Cacciari and Tabossi, 1993; Fernando, 1996; Moon, 1998; Cignoni, Coffey and Moon, 2002; Riehemann, 2001; Philip, 2005) show that variation is very common for idioms in natural language. Although variant forms of idioms are still considered exceptions to the norm, they are much more common in language corpora than the canonical forms (Philip, 2005: 2).

Thus in the present paper I will examine fixedness as one of the most important properties of idioms. There is no doubt that idioms belong to the group of fixed expressions; however, it is important to note that their fixedness is not absolute. The degree of fixedness varies from one idiomatic expression to another.

In my research I will search a corpus for variant forms of some selected English idioms and then investigate the tokens found. I aim to demonstrate the strong tendency of English idioms towards variation and to give a descriptive account of the variant forms of the idioms selected. The set of idioms that I have chosen is not random. My choice was prompted by Bruce Fraser’s (1970) highly influential contribution to the study of idiomaticity, the frozenness hierarchy. According to Fraser, idioms can be ordered into six levels of frozenness, depending on the kinds of transformations that they permit. He claims that some English idioms permit many transformations, such as nominalization, passivization and insertion, while others permit no transformations at all; the latter he called “completely frozen idioms” (Fraser, 1970: 41).

Taking Fraser’s model into consideration, I will carry out an investigation in the British National Corpus (BNC), searching for transformations and variant forms of the idioms that Fraser placed on Level 0 of his hierarchy: the group of idioms that, according to Fraser, permit no transformation or variation and are completely fixed and frozen.

For the purpose of this study I will mainly use the British National Corpus. It consists of a very large amount of text, about 100 million words of written and spoken English, and is therefore a very suitable source for my research.

As a long time has passed since Fraser wrote his work and many linguists have since commented on it, I have to point out that my main objective is not to refute Fraser’s claim that the selected idioms are inflexible. Rather, I seek to offer a descriptive study of the variation that these idioms are able to undergo. Using modern methods of Corpus Linguistics, I want to offer an “updated” view of idiom frozenness as defined by Fraser. My goal is to illuminate the phenomenon of variation, to give reasons for it and to exemplify the vividness of authentic language, to which idiomatic expressions contribute a great deal.

1.2. Structure

I will begin my paper with a brief summary of the most important aspects of Corpus Linguistics. I will define Corpus Linguistics, and the corpus in particular, and will give an account of the ways and the linguistic areas in which it can be useful for investigation. This outline is important, first, because it helps me explain my motives for choosing corpus-based research to defend my thesis and, second, because it throws light on the way in which I worked with the corpus and analyzed the results.

I will briefly describe the British National Corpus, the BYU Corpus of American English and the TIME Corpus of American English, with an emphasis on the reasons why I have chosen this set of corpora.

Searching for variants of idioms was not an easy task; it involved a great deal of random searching. I had to try to foresee all the possible idiom variations and check out in the corpora whether such variations exist and if they are used in natural language.

I will also describe the software that I used to carry out my research, Mark Davies’ interface to the BNC, which is freely available on the Internet, and I will explain how I used it to search for variant forms of the idioms in the corpora.

The next main part of this paper is dedicated to the idioms themselves. Much has been written and said about these peculiar figures of speech with scholars holding contradictory opinions about them. I will mention the most important steps in the previous research done in this area and I will give an account of the definitions of an idiom found in the literature.

Then, in separate chapters, I will concentrate on the main characteristic properties of idioms, their non-compositionality and frozenness, and I will also briefly discuss properties such as institutionalization, proverbiality and informality. A discussion of this kind is very important for my research, not only because it throws light on fixedness as a significant characteristic of idioms, but also because it explains the important role that other properties of idioms, such as compositionality, opacity and institutionalization, play in determining their degree of fixedness.

I will go on to the particular variations that an idiom may undergo, such as tense and number variations, additions, deletions and others, dividing these into three main groups: inflectional, lexical and syntactic variations.

Section 4 presents the core of this paper, the corpus research. Before presenting the results of the investigation I will summarize the most important features of Fraser’s work. I will explain the essence of his frozenness hierarchy and pay special attention to the idioms belonging to Level 0, in order to make clear why I have chosen this group of idioms for my research.

After that I will discuss every idiom from this group and its variant forms found in the BNC, and then I will draw a profile[1] of the variant forms of every idiom found in the corpus data.

In addition, I will analyze the results according to the text type in which the idioms and their variant forms appear, showing that idioms and their exploitations are common in specific genres and that many of the variant forms are ad hoc exploitations that create a stylistic effect.

To complete the investigations on the selected idioms, I will search for variant forms in journalistic discourse, in particular in the TIME Corpus of American English, hoping to discover interesting and specific variations of the selected idioms.

In the Appendix at the end of this paper I have attached the results of my research in the BNC. There are tables containing all the tokens of the investigated idioms found in the corpus. The tables closely follow the form in which the results appear in a search with Mark Davies’ interface on the Internet. I have roughly ordered the results in groups according to the variations that they show. Thus the reader may gain an idea of how many variations of a given idiom there are in the BNC and in which immediate context they appear in a text.


The role of computers in linguistics has become more and more important during the last couple of decades. At present, research on various linguistic issues, idioms among them, is very limited without the help of corpus linguistics and corpus-based approaches.

The most important reason for investigating idiom variation by means of corpus research is that only in this way do I have access to an enormous amount of authentic language. This is necessary because idioms are not as common in language as they are assumed to be (Moon, 1998; Strässler, 1982), and the more language material there is, the greater the probability of finding tokens of idioms and their variant forms.

Furthermore, the methods of Corpus Linguistics give the opportunity to approach Fraser’s model from a different perspective. It should be noted that Fraser based his assumptions on single examples and mainly on his personal perception of language. Corpus research, in this case, makes it possible to analyse the validity of these assumptions among a large number of examples produced by numerous language users.

As already pointed out, corpus research on idioms is not an easy task. Moon mentions that finding non-canonical forms of fixed expressions in corpora is a matter of good fortune (Moon, 1993: 51). However, in my opinion it is the best possible way to locate, describe and analyze idioms and their variant forms.

Thus in the following chapters I am going to summarize the most important aspects of Corpus Linguistics and explain some significant terms used in this field.

2.1. Corpus Linguistics

Corpus Linguistics is a way of studying the language used in naturally occurring texts with the help of computers and computer technology. It is very important to note that Corpus Linguistics is not just a branch of linguistics or a separate paradigm within it. It is, rather, a way of “doing linguistics” (Meyer, 2002: xi), a “methodological basis for pursuing linguistic research” (Leech, 1992: 105).

Biber, Conrad and Reppen (1998) summarize the most important characteristics of the corpus-based analysis:

- it is empirical, analyzing the actual patterns of use in natural texts;
- it utilizes a large and principled collection of natural texts, known as a “corpus”, as the basis for analysis;
- it makes extensive use of computers for analysis, using both automatic and interactive techniques;
- it depends on both quantitative and qualitative analytical techniques

(Biber, Conrad and Reppen, 1998: 4)

It is important to note that Corpus Linguistics is described as a bottom-up approach. “It analyses the evidence with the aim of finding probabilities, trends, patterns, co-occurrences of elements, features or grouping of features” (Teubert and Krishnamurthy, 2007: 6). It is in fact a very useful method for making people see and explore interrelations and features of language that they would not otherwise notice.

2.1.1. Areas of corpus linguistic research

Generally, corpus-based linguistics is very useful as a research method when searching for answers to most of the questions about language use. It offers a great number of possibilities for analysing language content in any linguistic area.

In the area of grammar, for example, corpus-based research can include examinations on various matters- from morphology and word classes to syntactic structures. Through a corpus analysis we can check for example the frequency of occurrence of given morphemes and find systematic differences in their distribution. We can also find out whether some morphemes are typical for certain types of roots and make a comparison of their distribution in different registers.

Furthermore, using corpora makes it easy to single out specific words and to locate and analyze the syntactic frame they occur in. This is especially interesting for synonyms, because an investigation may reveal specific differences in syntactic or stylistic distribution. (Biber, Conrad and Reppen, 1998: 84)

Another important area of research is the investigation and comparison of different registers in a corpus. Corpus Linguistics makes it possible to compare various text types and genres, for instance spoken and written discourse, or academic texts and fiction. The huge amount of language data gives us the opportunity to draw conclusions about language use in a specific genre. Furthermore, corpus research may be very useful in the area of language acquisition, because it makes it possible to study certain linguistic features across a large number of speakers and thus establishes a basis for generalisations across language learners (Biber, Conrad and Reppen, 1998: 12).

Historical Linguistics must also be mentioned among the areas of linguistics that can benefit a great deal from corpus research. With the help of these methods we can follow the development of a morpheme, word or construction across time and compare different historical periods.

To summarize, corpus-based methods of research are very useful in any area of linguistics; they make investigations easier and the results more reliable. These are also the main reasons why I have chosen the methods of Corpus Linguistics for the investigation presented in this paper.

2.1.2. What is a corpus?

A corpus is a linguistic database, which means a database of language use, of either spoken or written language. In other words it is a collection of a large number of natural texts. (Biber, Conrad and Reppen, 1998: 12)

Most importantly, a corpus is not just a large amount of text material; it is usually collected and constructed in some systematic way. This text material very often has to be processed further in order to be useful. Several aspects are of great importance for a corpus, among them the quantity, size, type and style of its constituent texts. Depending on these aspects, there are various types of corpora that are used for various research purposes.

2.1.3. Kinds of corpora

Many of the existing corpora are specialized ones. Examples are the CHILDES Corpus, which contains children’s speech (Meyer, 2002: 26), the Helsinki Corpus of written texts from earlier periods of English (Meyer, 2002: 20) and the COLT corpus, which contains the speech of London teenagers (Meyer, 2002: 18). Because of their content, these corpora are used for a limited number of very specialized studies. The CHILDES Corpus, for example, has been studied by psycholinguists interested in child language acquisition; historical linguists studying the evolution of English are interested in the Helsinki Corpus; and the COLT corpus helps sociolinguists studying the language of a particular age group (Meyer, 2002).

Corpora of the kind described above are also called non-balanced corpora. Balanced corpora, by contrast, are larger and consist of texts from various registers or genres, covering both written and spoken language. Examples of such corpora are the BNC and the BYU Corpus, which I have used for my investigations.

Furthermore, corpora can be either tagged or untagged. An untagged corpus is not processed at all; it simply presents raw text material. A tagged corpus, by contrast, is a corpus in which all the words have been marked in some way. They can be marked, for instance, for word category or syntactic function, which basically means that nouns are tagged as nouns, verbs as verbs, adjectives as adjectives, etc.

Besides these basic types of tagging there are also additional ones, such as semantic, discourse or problem-oriented tagging. Semantic tagging means the marking of words according to their meaning: cheeks, for example, is given the tag “Body and Body Parts” and lovely is tagged under “Aesthetic Sentiments” (Meyer, 2002: 97).

Discourse tagging includes the marking of semantic phenomena such as anaphora and helps analysts recover the discourse structure of a text. (Meyer, 2002: 97) By using a problem-oriented tagging the analyst chooses the tags that are needed for a particular research and assigns them manually to the text for analysis. (De Haan, 1984 cited in Meyer, 2002: 97-98)

Furthermore, corpora are often lemmatized. A “lemma” is synonymous with a “lexeme” (Moon, 1998: 5) and describes the base of a word, including all its inflectional changes (Biber, Conrad and Reppen, 1998: 29). Thus, if we search a lemmatized corpus for the lemma deal, the results will show all instances of the lexeme deal, including deal, deals, dealing and dealt.
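The lemma search just described can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the lookup table LEMMA_OF and the function find_lemma are hypothetical names, and the example sentence is invented; a real lemmatized corpus stores the lemma of every token rather than a small hand-written table.

```python
# Toy lemma table (hypothetical); a real lemmatized corpus records the
# lemma for every token instead of relying on a hand-written dictionary.
LEMMA_OF = {
    "deal": "deal",
    "deals": "deal",
    "dealing": "deal",
    "dealt": "deal",
}


def find_lemma(tokens, lemma):
    """Return (position, word form) for every token whose lemma matches."""
    return [
        (i, tok)
        for i, tok in enumerate(tokens)
        if LEMMA_OF.get(tok.lower()) == lemma
    ]


# Invented example sentence, not taken from any corpus.
tokens = "She dealt the cards while he deals with the dealer".split()
hits = find_lemma(tokens, "deal")
print(hits)
```

Note that dealer is not returned, since it is a different lexeme and not an inflectional form of deal.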

2.1.4. Using the corpus

In order to use and analyse corpora, one needs special computer software. There are basically two main kinds of software used for the purposes of Corpus Linguistics: concordance programs and programs written specifically to answer certain research questions (Biber, Conrad and Reppen, 1998: 15). Concordance programs are search engines which present the results of a search as text samples. Depending on how the corpus has been tagged, a concordance program can give us text samples of a specific word, a collocation, a lemma or a syntactic construction. The KWIC (Key Word In Context) concordance is very commonly used. In many cases, however, concordance programs are not sufficient for answering specific research questions. In such cases computational linguists often develop their own programs, like the one that I have used for this research, which was created by Mark Davies and accessed on the Internet.

In addition, I should briefly mention another kind of program, used mainly for the creation of a corpus: taggers and parsers. These are programs made specifically to tag raw text material. Taggers are used to designate the part of speech, while parsers designate grammatical structures such as phrases and clauses (Meyer, 2002: 81).
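How a KWIC concordance presents its results can be shown with a short Python sketch. This is a simplified illustration and not the program used for this research: the function name kwic, the width parameter and the example sentence are all assumptions made for the purpose of the example.

```python
def kwic(tokens, keyword, width=4):
    """Return one (left context, key word, right context) triple
    for every occurrence of keyword in the token list."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented example sentence, not taken from any corpus.
text = "They promised to leave no stone unturned in the search"
for left, key, right in kwic(text.split(), "stone", width=3):
    # Right-align the left context so that the key words line up in a column.
    print(f"{left:>25}  [{key}]  {right}")
```

Aligning the key word in a central column is what makes recurring patterns to its left and right easy to spot by eye.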

2.2. Corpora used in this paper

Although Hockett (1958) has claimed that almost every element of a language is an idiom or a part of an idiom, later studies (Strässler, 1982; Moon, 1998; 2001) proved that idioms are actually not as common for a language as they are expected to be.

Taking this into consideration, I had to choose a large corpus for my research in order to obtain a satisfactory number of tokens for reliable results. The most suitable corpus in this case appeared to be the British National Corpus (BNC), consisting of 100 million words, which I accessed through Mark Davies’ interface on the Internet.

This interface also makes it possible to access two other corpora, the TIME Corpus of American English and the BYU Corpus of American English. These proved very useful when I was searching for unusual exploitations of the selected idioms, and also in cases where I could find no tokens, or only very few tokens, of a given idiom in the BNC. In such situations it was possible that the idiom might be characteristic of American English, and I checked this possibility as well.

I have to stress that I have not carried out a thorough investigation of the selected idioms and their variant forms in American English, nor have I compared their use in British and American English, as this would not serve the purposes of my investigation. However, in this paper I will also present some specific and interesting examples of variant forms of the selected idioms found in American English.

2.2.1. British National Corpus /BNC/

The British National Corpus (BNC) is a 100 million-word collection of samples of written and spoken language from a wide range of sources which was completed in 1994. The BNC is monolingual and deals exclusively with modern British English from the later part of the 20th century. The corpus includes many different styles and varieties from various fields and registers. Therefore, the BNC is a balanced corpus and consists of a written (90% from the whole material) and a spoken (10%) part. The written part of the BNC includes extracts from regional and national newspapers, some journals and periodicals for all ages and interests, academic books and popular fiction, school and university essays and many other kinds of texts.

The spoken part includes a large amount of unscripted informal conversations, recorded by volunteers selected from different age, region and social classes in a demographically balanced way, together with spoken language collected in all kinds of different contexts, ranging from formal business or government meetings to radio shows and phone-ins. (source: [bnc] About The British National Corpus)

The BNC is a tagged corpus: the CLAWS part-of-speech tagger for English was used to encode it.

2.2.2. Brigham Young University (BYU) Corpus of American English

This corpus, released to the public in February 2008, was created by Mark Davies of Brigham Young University. It contains 360 million words (as of December 2007) and is expanded at least four times each year.

The texts in the corpus originate from various American sources- TV, radio, magazines, newspapers and journals published in the US. The corpus is divided into five equally-sized registers- spoken (contains transcripts of unscripted conversations from more than 50 different TV and radio programs), fiction (short stories and plays, movie and TV scripts), popular magazines (over 100 different magazines that present a mixture of specific domains- news, health, home and gardening, women, financial, religion, sports, etc), newspapers (ten newspapers from across the US) and academic journals. (Source: BYU Corpus of American English (1990- 2007). Mark Davies/ Brigham Young University)

The BYU Corpus of American English is tagged by the same tagger that is used for the BNC (CLAWS) and thus gives better opportunity for comparative searches and similar display of the results.

2.2.3. TIME Corpus of American English

The corpus is relatively new and consists of more than 100 million words of text of American English from 1923 to the present, found in the TIME magazine.

This weekly magazine has one of the largest circulations in the world and covers many kinds of topics, such as politics, world affairs, science and technology, art and entertainment and the language used in it is characterized as standard North American English. (Source: TIME Corpus of American English (1923- 2006). Mark Davies/ Brigham Young University)

2.3. Search Tools

In order to search for the selected idioms in the corpora I used an interface found on the website http://corpus.byu.edu/bnc/, created by Mark Davies, a professor of Corpus Linguistics in the Department of Linguistics and English Language at Brigham Young University in Provo, Utah.

This interface allows quick and easy search for words and phrases in the corpus by an exact word or phrase, wildcard or part of speech, or combinations of these. It also allows various searches for collocations within a ten-word window.

Furthermore, the interface enables the search in a single register- spoken, fiction, news, academic, non-fiction and other. Every register is divided into sub-registers, for example the spoken register consists of broadcast discussions, classroom and courtroom discourse, interviews, lectures, parliament and public debates and many others and the interface is designed in a way which enables the search within the sub-registers. One can also easily find the frequency of words and phrases in any combination of registers and sub- registers and make various comparisons between them.
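Because registers differ in size, raw hit counts are usually normalised before registers are compared. The following Python sketch shows one common normalisation, occurrences per million words. The function name per_million is hypothetical, and the hit counts and register sizes are placeholder figures invented purely for illustration; they are not BNC counts and say nothing about the idioms studied here.

```python
def per_million(hits, register_size):
    """Normalise a raw hit count to occurrences per million words."""
    return hits * 1_000_000 / register_size


# Placeholder figures invented for illustration only; NOT real corpus counts.
# Each entry maps a register name to (raw hits, register size in words).
toy_counts = {
    "spoken": (12, 10_000_000),
    "news": (30, 10_000_000),
}

rates = {
    register: per_million(hits, size)
    for register, (hits, size) in toy_counts.items()
}
print(rates)
```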

It was very difficult for me to predict what kinds of variation of the idioms there are in the corpus in order to search for them specifically. Moon (1993: 51) pointed out that in order to find all variations of an idiom one needs to carry out several queries, and even then some exploitations may escape. That is why I had to use every search option of the interface, and to try various combinations of searches, in order to be sure that I had not missed any token.

I have done a query by considering the key words in the idiom clause, following Moon, who claims that searches in the corpus are most successful when the query consists of two lexical words, fairly close together. (1993: 51)

For example, for the idiom to leave no stone unturned I searched for the key word stone and its surrounding words in a ten-word window. Since I was searching for the combination stone + unturned, I limited the search to adjectives surrounding stone. Unturned appeared at the 40th position on the scale; 74.2 % nearby, which means that in a ten-word window it is usually very close to the word stone.
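A collocate search of this kind, restricted to one part of speech, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function collocates, the span parameter (the number of tokens counted on each side of the key word) and the hand-tagged example are hypothetical, the tags merely imitate CLAWS-style labels, and the sketch does not reproduce how the interface computes its scale or its percentages.

```python
from collections import Counter


def collocates(tagged, node, pos_prefix, span=5):
    """Count words whose tag starts with pos_prefix and which occur
    within span tokens to the left or right of the node word."""
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tagged), i + span + 1)
        for j in range(lo, hi):
            other, tag = tagged[j]
            if j != i and tag.startswith(pos_prefix):
                counts[other.lower()] += 1
    return counts


# Invented, hand-tagged example; the tags imitate CLAWS-style labels.
tagged = [
    ("no", "AT0"), ("stone", "NN1"), ("was", "VBD"), ("left", "VVN"),
    ("unturned", "AJ0"), ("by", "PRP"), ("the", "AT0"),
    ("thorough", "AJ0"), ("search", "NN1"),
]
result = collocates(tagged, "stone", "AJ", span=5)
print(result.most_common())
```

In this toy example only unturned is counted; thorough is also an adjective, but it lies outside the window around stone.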

Furthermore, Mark Davies’ interface makes it possible to search for lemmas as well. In short, this means that I could find all the tokens of words and their inflectional forms that appear in the immediate context of a ten-word window. Thus, for example, in searching for variant forms of build castles in the air, it was necessary to search for the lemma [build] appearing close to the lemma [castle]. The results included all the inflectional forms of both lemmas: build, builds, building, built, castle and castles. This was very important in order to locate all the inflectional variant forms of a given idiom with only one search. However, as this example may suggest, the results displayed after such a search have to be processed further manually. In the case discussed above, the bulk of the results contained many examples in which the lemmas [build] and [castle] were combined in their literal meaning. Such literal examples, as well as duplicate examples, were removed from the lists of tokens.
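The logic of such a lemma proximity search, and the reason why manual filtering is still needed afterwards, can be illustrated with a small Python sketch. It is a simplified stand-in and not the interface’s actual implementation: the lemma table LEMMA_OF, the function near and the three example sentences are assumptions invented for this illustration.

```python
# Toy lemma table (hypothetical); a lemmatized corpus supplies this information.
LEMMA_OF = {
    "build": "build", "builds": "build", "building": "build", "built": "build",
    "castle": "castle", "castles": "castle",
}


def near(tokens, lemma_a, lemma_b, span=10):
    """Return the text stretch between every form of lemma_a and every
    form of lemma_b that occurs within span tokens of it."""
    lemmas = [LEMMA_OF.get(tok.lower()) for tok in tokens]
    hits = []
    for i, lemma in enumerate(lemmas):
        if lemma != lemma_a:
            continue
        for j in range(max(0, i - span), min(len(tokens), i + span + 1)):
            if lemmas[j] == lemma_b:
                hits.append(" ".join(tokens[min(i, j):max(i, j) + 1]))
    return hits


# Invented example sentences, not taken from any corpus.
sentences = [
    "He is always building castles in the air",
    "The Normans built a castle on the hill",
    "She builds bridges for a living",
]
candidates = [hit for s in sentences for hit in near(s.split(), "build", "castle")]
print(candidates)
```

The search returns both the idiomatic building castles and the literal built a castle, which shows why the candidate list still has to be checked by hand.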

In the updated version of the interface (available from January 2008) it also became possible to carry out a search of a lemma plus preposition or particle in a context, for example [blow] off + [steam], which made it easier to sort out the results.

These are in short the most important aspects of Corpus Linguistics and the ways I used the corpora and the available research tools. In the next section I am going to focus on the main subject of this paper- the English idioms.


This part of my work is devoted to idioms, which are peculiar in nature and very difficult to study and to organize according to linguistic rules. It is a fact that idioms pose many interesting problems in linguistics.

One of the well-known difficulties that one has to face when studying and interpreting idioms is finding an adequate definition of this phenomenon. Defining idioms is quite difficult, since there are numerous expressions in language, such as fixed phrases, sayings, binomials, proverbs, phrasal and prepositional verbs, etc., which are idiomatic in nature to a greater or lesser extent. These are all fuzzy categories, and many authors (Strässler, 1982; Gläser, 1986, 1988; Čermák, 1988; Nunberg et al., 1994; Barkema, 1996) have proposed various ways of separating and categorizing these expressions.

A common definition of an idiom is a group of words whose meaning is different from the meanings of the individual words. This generally means that idioms are phrasal by nature and that their overall meaning is not the simple sum of the meanings carried by each word taken separately; that is why their meaning cannot be approached in a straightforward manner according to the common rules of semantics. This linguistic phenomenon is present in all human languages (Čermák, 1988: 16); it is often used as a very strong stylistic device and reflects an important way in which people perceive the surrounding world.

However, no matter how common this phenomenon may be, idioms are idiosyncratic by nature, and one can hardly adopt a simple approach to their study in any language.

3.1. Earlier works on idioms

It is a fact that there was no real interest in idioms before the beginning of the 20th century. In 1925 Logan P. Smith published a book entitled “Words and Idioms”, a collection of essays that included one called “English Idioms”, in which the author gathered and classified a huge number of examples. He thus marked the beginning of a long era of idiom research, which over time has developed in various directions.

About 30 years later, Charles F. Hockett (1958) used the term “idiom” as a cover term for all lexicographic and syntactic phenomena whose meaning is not predictable from their composition. He wrote:

“Let us momentarily use the term Y for any grammatical form the meaning of which is not deducible from its structure. Any Y, in an occurrence in which it is not a constituent of a larger Y, is an idiom. (… ) If we are to be consistent in our use of the definition, we are forced also to grant every morpheme idiomatic status, save when it is occurring as a constituent of a larger idiom, since a morpheme has no structure from which its meaning can be deduced.” (Hockett, 1958: 172)

According to this definition, in language it is hardly to find an element, which is not an idiom or a constituent of an idiom. In fact, he distinguishes the following types of idioms: substitutes, proper names, abbreviations, phrasal compounds, figures of speech and slang expressions. (1958: 310-318) However, this definition was quite broad and too vague to satisfy the grammarians in the second half of the 20th Century.

From the late fifties until the seventies a great deal of idiom research was done, but there is no need to consider all of these works here. Worth mentioning are a number of Soviet phraseologists, among them Vinogradov, Kunin and Amosova, who contributed a great deal to the development of phraseological theory.

Of far greater significance to my paper, however, are the scholars who examined idioms within the framework of transformational-generative grammar.

In 1963 Katz and Postal distinguished between lexical idioms and phrase idioms. The first type comprises those idioms that are syntactically dominated by one of the lowest syntactic categories, such as noun, verb or adjective. The second group, the phrase idioms, are not dominated by any such category. Katz and Postal also proposed separating the dictionary into two parts according to these two types of idioms and were the first to mention some transformational deficiencies of idioms.

Uriel Weinreich (1969) also made a very important contribution to the study of idioms and idiomaticity. According to him, only multiword expressions can be called idioms, but he also emphasized that the converse does not hold, since not all multiword expressions are idioms:

“A phraseological unit that involves at least two polysemous constituents, and in which there is a reciprocal contextual selection of subsenses, will be called an idiom. Thus some phraseological units are idioms, others are not.” (1969: 42)

Supporting Katz and Postal’s proposal to divide the dictionary into a lexical and a phrase-idiom part, he further suggested a separate rule in the grammar, called the Idiom Comparison Rule or Matching Rule. This rule, according to Weinreich, compares literally processed material with material that is interpreted idiomatically. It should be noted that Weinreich also excluded syntactically ill-formed constructions from the definition of an idiom and proposed that they be treated like lexemes, listed in the lexical-item part of the dictionary.

Another linguist who should be mentioned here is Wallace Chafe. In his article “Idiomaticity as an anomaly in the Chomskyan paradigm” (1968) he argued that the principles of transformational-generative grammar cannot be applied to idioms, which should therefore be considered an “anomaly in the Chomskyan paradigm”. He claimed that the meaning of an idiom is not the sum of the meanings of its parts and that most, if not all, idioms exhibit certain transformational deficiencies. He also noted that some idioms are syntactically ill-formed (Chafe, 1968: 111).

Furthermore, according to Chafe, every well-formed idiom has a literal counterpart, but the idiom is usually used much more frequently than its literal counterpart (Chafe, 1968: 123).

Another linguist who investigated the behaviour of idioms in transformational grammar is Bruce Fraser. His paper “Idioms Within a Transformational Grammar” (1970) marked a very important step in the development of idiom theory. Fraser observed that some idioms can be transformed according to the syntactic rules and are very flexible, while others permit no change and are completely frozen in form. He therefore proposed that syntactic transformations be grouped into five classes, ordered by the extent to which idioms accept transformations from each class, and he classified idioms according to seven levels of frozenness.

I will return to Fraser’s frozenness hierarchy later in this paper, where I give a more detailed account of his work and its importance for the present investigation.

In recent decades idiom research has turned in other directions. Interest in idiomaticity has arisen in pragmatics, psycholinguistics, sociolinguistics and corpus linguistics. Jürg Strässler (1982) was the first to explain idioms as a pragmatic phenomenon, that is, as something considered from the point of view of the language user.

Interest in idioms has grown in psycholinguistics as well. Gibbs and Nayak (1989), Gibbs, Nayak and Cutting (1989), Gibbs (1993), Colombo (1993) and Cacciari and Tabossi (1993) have carried out investigations mainly concerning idiom processing by native speakers and language learners. The main psycholinguistic hypothesis regarding idioms is that there is a conceptual mapping between an expression and its meaning; Gibbs (1993) argues that this mapping is motivated by pre-existing metaphorical connections between the concepts.

Gibbs and Nayak (1989) further demonstrate that people seem to have strong intuitions that enable them to judge an idiom as decomposable or non-decomposable. Psycholinguistic research also suggests that decomposable idioms are easier for children to acquire and faster for adults to process. Special attention should be paid to the work of Reagan (1986), who, basing his assumptions on Fraser’s frozenness hierarchy, investigated why language users accept some syntactic variations of idioms but not others.

To the group of modern linguists I will add those who have examined the behaviour of idioms in natural language with the methods of corpus linguistics (Moon, 1998; Barkema, 1994, 1996b; Riehemann, 2001). Of special interest to my paper are the investigations of fixed expressions and idioms by Moon (1998) and Riehemann (2001), who demonstrate the strong tendency of these expressions towards variation.

Recent research (Coffey, 2001; Tucker, 2001) has shown that idiomatic expressions are very frequently transformed for stylistic effect by means of lexical or grammatical variation. There are also investigations of idioms across different languages or registers: Coffey (2001) and Cignoni, Coffey and Moon (2002), for example, compare idiom variation in Italian and English, and Minugh (2001) investigates idioms in the lyrics of popular songs, talk shows and TV soap operas.

3.2. Defining idioms

In order to define idioms, I will first clarify the phenomenon of idiomaticity. Most importantly, there is no consensus in the idiom literature on the classification of idiomatic expressions or on their degree of idiomaticity.

3.2.1. Idiomaticity

Idiomaticity is the tendency of phrases to take on meanings that go beyond the meanings of their parts and “is important for this reason, if no other, that there is so much of it in every language.” (Weinreich, 1969: 23) Every language has a great number of polylexemic expressions whose meaning cannot be deduced from the meanings of their constituents. Widespread as it is, this phenomenon is also highly problematic and very difficult to define and to organize according to linguistic principles and rules.

John Sinclair’s (1991) view of language is very helpful in trying to understand idiomaticity. According to him, language is governed by two main principles: the open choice principle and the idiom principle. The former views a language text as the result of a very large number of complex choices: “At each point where a unit is completed (a word or a phrase or a clause), a large range of choice opens up and the only restraint is grammaticalness.” (Sinclair, 1991: 109) This principle is also called the ‘slot-and-filler’ model, because texts are presented as a series of slots that have to be filled. The underlying assumption is that any word can occur in each of these slots, subject only to grammaticality.

However, the open choice principle alone is not enough to produce normal text. Sinclair (1991) points out that we need the so-called idiom principle to account for the restraints that the open choice principle does not capture. The idiom principle presupposes that every language user has available a large number of semi-preconstructed phrases that constitute single choices (Sinclair, 1991: 110).

Thus, every language has a great number of idiomatic phrases. Čermák claims that “if a language has metaphors, then it has idioms too” (1988: 16). He points out that idioms exist in every natural language and that even artificial languages may acquire them if they undergo some development. However, it is important to note that not all idiomatic phrases can be called idioms. Various categorizations of idiomatic expressions are found in the literature, and there is no consensus on the matter. Some scholars, for example, count phrasal and prepositional verbs as idioms while others do not; binomials and frozen similes are often counted as idioms as well.

The table below presents a categorization of idiomatic expressions, summarized by Strässler (1982: 15-16).

[Table not included in this excerpt]

Table 1. Different examples of idiomaticity, summarized by Strässler (1982:15-16)

What all these expressions have in common is that their meanings cannot be derived from the ordinary meanings of their components by the usual rules of compositional semantics. However, only a small part of them can be called idioms. In his classification, Strässler (1982) uses the label “tournure idioms” for highly non-compositional expressions such as kick the bucket or come hell or high water.

In the following section I will define the term “idiom” and describe the characteristics by which an expression qualifies as one.


[1] I used an approach similar to what Barkema (1994) describes as a Received Form Profile, that is, an inventory of the canonical forms and all the variant forms of a given idiom.

