Corpus Pattern Analysis and Sense Disambiguation. A Case Study


Bachelor Thesis, 2018

78 Pages, Grade: 2,3


Excerpt


Table of Contents

List of Figures

1. Introduction

2. Morphological Issues
2.1 Suffixation
2.1.1 The Suffix -ify
2.2 Adjectives

3. Approaches to Lexical Analysis
3.1 The Theory of Norms and Exploitation
3.1.1 Pattern Dictionary of English Verbs
3.2 Frame Semantics
3.3 Word Sense Disambiguation
3.4 Valency
3.5 Polysemy
3.6 Collocation
3.7 Colligation
3.8 Semantic Preference
3.9 Semantic Prosody

4. Corpus Linguistics
4.1 The British National Corpus
4.2 The NOW Corpus

5. Etymology and Meaning
5.1 Pretty
5.2 Vivid

6. Method and Analysis
6.1 Analysis of to prettify
6.2 Analysis of to vivify

7. Conclusion

8. Sources
8.1 Literary Sources
8.2 Electronic Sources

9. Appendix
9.1 The tokens of to prettify
9.2 The tokens of to vivify
9.3 The first pattern of to prettify
9.4 The second pattern of to prettify
9.5 The first pattern of to vivify
9.6 The second pattern of to vivify
9.7 The third pattern of to vivify

List of Figures

Figure 1. Selected Pattern Dictionary examples of the verb to say

Figure 2. Selected examples of the first pattern of to say from the Pattern Dictionary

Figure 3. to prettify - the first pattern modelled after the pattern dictionary

Figure 4. to prettify - the first pattern modelled after the model of extended lexical units

Figure 5. to prettify - the second pattern modelled after the pattern dictionary

Figure 6. to prettify - the second pattern modelled after the model of extended lexical units

Figure 7. to vivify - the first pattern modelled after the pattern dictionary

Figure 8. to vivify - the first pattern modelled after the model of extended lexical units

Figure 9. to vivify - the second pattern modelled after the pattern dictionary

Figure 10. to vivify - the second pattern modelled after the model of extended lexical units

Figure 11. to vivify - the third pattern modelled after the model of extended lexical units

1. Introduction

The aim of the present paper is to determine whether and how the target words of this study, verbs derived from adjectives with the suffix -ify that were selected from the NOW Corpus, differ in use and show different meanings in context. In order to analyse the target words to prettify and to vivify, this paper will make use of the technique promoted by Patrick Hanks. His theory of norms and exploitation (TNE) investigates meaning under the aspects of the mental lexicon and collocations. The target words will be analysed in the manner of Patrick Hanks's Pattern Dictionary approach as well as the model of extended lexical units by John Sinclair in order to give an appropriate overview of the different aspects and nuances of their meaning.

The second chapter will discuss certain morphological aspects such as suffixation in general, the suffix -ify and the topic of adjectives, as all of these topics contribute to the morphology of the target words. Following the morphological aspects, the next chapter will outline certain theories concerning lexical analysis such as the theory of norms and exploitation by Patrick Hanks. The Pattern Dictionary of English Verbs will be briefly described before the chapter continues with frame semantics, valency, polysemy as well as the model of extended lexical units, which consists of collocation, colligation, semantic preference and semantic prosody. The fourth chapter will briefly summarise the most important aspects of corpus linguistics before turning to the British National Corpus and the NOW Corpus. Chapter five will list the meanings and etymology of the target words. The sixth chapter will present the method and the findings of the analysis and evaluate the results.

The seventh and final chapter will assess whether this approach was successful in its findings and give an outlook on how this study could be expanded.

2. Morphological Issues

The following chapter will discuss certain morphological topics such as suffixation and the suffix -ify, which is of interest for this study as it is the deriving suffix of the target words. Furthermore, as both of the derivatives are formed from adjectives, the qualities of this particular word-class will also be outlined.

2.1 Suffixation

English absorbed many words from other languages and a basic knowledge of Greek and Latin may help to interpret these words correctly. Words can be complex and can include several bound elements: prefixes, suffixes or circumfixes (Stein 2007: v). This paper, though, will only concentrate on suffixation.

In English, the most common way of creating new words is by adding affixes onto another word or base. Across all languages, the suffix proves to be the most productive morphological element as it is employed for multiple purposes such as inflection or derivation (Bauer 1988: 19). Suffixes are bound elements attached to the end of a word. There are two different functions represented by two different types of suffixes. One of these types is the derivational suffix which compels the word to change its class. The second type is the grammatical suffix which purely expresses grammatical information such as tense or number (Stein 2007: vi). The suffix determines the word-class of a derivation. A change in word-class may arise when a suffix is combined with a base of a word. This is perceived as the key function of suffixation (Schmid 2016: 163).

Typically, the base does not undergo any morphological or phonological changes, but there are some derivations which diverge from this pattern. In some cases, bound elements function as bases instead of free morphemes (Schmid 2016: 163). Phonologically, a change can occur in the vowel of the stressed syllable when a suffix is added (Schmid 2016: 164).

As the suffix of interest to this study is a verb-forming suffix, only this aspect will be briefly described, while noun- and adjective-forming suffixes will not be considered due to the limited space of this thesis. There are four important verb-forming suffixes: -en, -ate, -ize, and -ify. The suffix -en is no longer as productive as it used to be, while the other three suffixes occur mainly in technical registers (Schmid 2016: 178).

2.1.1 The Suffix -ify

As the preceding chapter discussed suffixation more generally, this chapter will now outline the characteristics of the suffix -ify.

The suffix -ify can also be realised as -fy, as there are some instances where the derived word only requires the shortened version of the suffix, such as in rarefy. The suffix is of Romance origin and stems from the Latin verb facere, which can be translated into English as to make. Facere evolved into the derivational suffix -ificare, which was commonly used in later stages of the language. English borrowed it via its French descendant -fier, which turned into -(i)fy. It is commonly applied to Latin and Germanic roots such as beaut- and ugli- (Dixon 2014: 191).

Suffixes such as -ify shift the stress to the preceding syllable (Huddleston 2002: 1670). Furthermore, -ify selects predominantly monosyllabic bases or bases that consist of an iambic foot (Bauer, Lieber & Plag 2013: 271). This suffix is primarily added to nouns, adjectives and bound bases in order to form verbs. Some listed meanings of the suffix denote a change of state. “To become or to become [to a greater extent] like what is denoted by the verb” or “to make look like, to give appearance of what is denoted by the noun” as well as “to cause” are definitions listed in a dictionary of English affixes (Stein 2007: 82). In summary, when added to an adjective the derived verb usually means to make, while the combination with a noun base usually means to make into (Huddleston 2002: 1714). Some combinations carry a derogatory meaning, such as Frenchify (Stevenson 2010: 698). The suffix -ify is still productive although it has never been used to derive many words. Its continuing productivity can be seen in neologisms such as yuppify (Huddleston 2002: 1714). Another function mentioned by Dixon is its jocular use in an intimate register, as in argu-ify-ing (Dixon 2014: 191).

According to the Cambridge Grammar, -ify can occur with prefixes denoting reversal, such as declassify (Huddleston 2002: 1689). It can also be followed by another suffix such as -ion when forming nouns. This results in words such as glorification and purification (Huddleston 2002: 1701).

2.2 Adjectives

In English, finer notions of meaning are expressed by adding adjectives to a discourse. Adjectives serve the purpose of altering, clarifying or adjusting the meaning contributions of nouns and verbs (Huddleston 2002: 526). “Adjectives can be defined as a syntactically distinct class of words whose characteristic function is to modify nouns” (Huddleston 2002: 527). They typically denote properties concerning the domains of shape, colour, size, worth and age as well as physical and emotional properties such as the heaviness or kindness of an individual. Adjectives are typically gradable, appear in three main functions (attributive, predicative or postpositive) and characteristically take adverbs as dependents. Further characteristics are that they inflect for neither tense nor number. Moreover, they do not take noun phrases as complements and they cannot be modified by other adjectives (Huddleston 2002: 528).

3. Approaches to Lexical Analysis

Whereas the previous chapter summarised the morphological aspects which are significant to this study, the next chapter will outline the theory of norms and exploitation by Patrick Hanks, which heavily influences the analysis of the target words in the succeeding chapters. Additionally, a subchapter will present the Pattern Dictionary of English Verbs, which was also created by Hanks. The subsequent chapters will explain the fields of frame semantics, valency grammar and polysemy, as all of these topics have contributed to the TNE. Another approach which has also contributed is the model of extended lexical units, which consists of collocation, colligation, semantic preference and semantic prosody. The last subchapters will explain this model before the thesis turns to the subject of corpus linguistics.

3.1 The Theory of Norms and Exploitation

While different words participate more or less naturally in a vast number of different constructions, at the heart of any language lie a few very straightforward frameworks consisting of sets of simple, prototypical phraseological patterns that go to make up phrases and clauses and are used by people to make meaning (Hanks 2013: 93).

How do people use words to make meaning? There have been many different propositions to answer this question. Patrick Hanks created his own approach concerning the mental lexicon and the notion that meaning is significant.

Language users communicate with each other using word conventions which have one or more universal meanings. These shared beliefs are closely connected to phraseology and enable conversational communication, a principle set out by H.P. Grice (Hanks 2013: 105).

Although collocations and prototypes “form the very foundation of meaning in language”, they have not been sufficiently investigated by many linguists (Hanks 2009: 1). During the European Enlightenment, attempts were made to create a perfect language and a collection of precise definitions. Some approaches even continued to use invented examples of language in order to avoid certain defects such as fuzziness, vagueness and variability, all of which are more than common in natural language (Hanks 2013: 345). According to Hanks, the “supposed imperfection of natural language is in fact a basic design feature, contributing power and flexibility within a robust framework” (2013: 346). His theory centres around the idea that while acquiring natural language, a person must also acquire two competences: the first is the command of rule-governed norms and the second is the ability to exploit these norms semantically (Hanks 2013: 8). A semantic exploitation of a norm, a salient prototype of phraseology, is a use of a word which deliberately departs from an established pattern of normal word use. This is done in order to either say something old in an original way or to discuss something unusual. This phenomenon relates to linguistic creativity but is also used for reasons of economy of utterance, rhetorical effect or sheer fun. Devices such as ellipsis, anomalous collocation, semantic-type coercion, metaphor and simile contribute to these exploitations of norms (Hanks 2013: 250). It has been shown that each new generation manages to find a way to exploit norms in a different manner and therefore create new meanings, thus promoting meaning change (Hanks 2013: 282). These norm exploitations can also lead to new word meanings as they are typically unstable and can suddenly develop new meanings and subsequently new phraseological patterns and collocational preferences. In some cases, these words can have one or two dominant meanings to which the new, less frequent, meaning is added, while other new senses completely drive out the already existing ones (Hanks 2013: 171). It has been observed that secondary norms, meanings of a word which are common now, stem from past exploitations of the primary norm. Furthermore, it has also become apparent that primary norms become rare over a period of time or even die out (Hanks 2013: 303). Exploitation, though, must be distinguished from mistakes as not every new phraseological pattern is intentional or meaningful (Hanks 2013: 250).

Following these two competences, Hanks compiled salient properties of natural language in order to show how this manner of exploitation works. While analysing examples taken out of a corpus, Hanks differentiates between the normal use of a word and an exploitation of its meaning. Furthermore, the category of lexical sets, words which are linked by a common semantic type, is added. The lexical set and its semantic type are given a name, such as [[firearm]], which is a [[physical object]]. He also distinguishes between the open-choice principle and the idiom principle, which differ in the respect that there are on the one hand pre-constructed phrases and on the other hand a large variety of choices to combine words with the only constraint of grammaticality (Hanks 2013: 11-15). When conversationally employed, words can make use of their meaning potential. This is described by Hanks as being transferred from a state (the meaning potential) to an event (the actual meaning). As meanings are usually associated with context rather than words in isolation, they are defined by the interaction of contexts and meaning potentials. Nevertheless, the meaning potentials can differ depending on the context (Hanks 2013: 82f.). This approach relates to Ludwig Wittgenstein who claimed that “meaning is use” as well as J.R. Firth who “[knew] a word by the company it [kept]” (Cruse et al. 2005: 1698; Firth 1957: 11).
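The relation between lexical sets and semantic types described above can be pictured as a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the member words (gun, pistol, rifle) and the function name semantic_types are hypothetical and are not taken from Hanks or from the Pattern Dictionary; only the type names firearm and physical object come from the text.

```python
# Hypothetical toy ontology: each semantic type points to its parent type.
SEMANTIC_TYPE_PARENT = {
    "Firearm": "Physical Object",
    "Physical Object": None,
}

# Hypothetical lexical set: words linked by the common semantic type "Firearm".
LEXICAL_SETS = {
    "Firearm": {"gun", "pistol", "rifle"},
}


def semantic_types(word):
    """Return the chain of semantic types for a word, most specific first."""
    for sem_type, members in LEXICAL_SETS.items():
        if word in members:
            chain = []
            current = sem_type
            # Walk up the hierarchy until the top-level type is reached.
            while current is not None:
                chain.append(current)
                current = SEMANTIC_TYPE_PARENT.get(current)
            return chain
    return []  # the word is not a member of any known lexical set


print(semantic_types("rifle"))  # ['Firearm', 'Physical Object']
```

A word outside every lexical set simply yields an empty list, which in an analysis would flag it as a candidate for a new set or for an exploitation.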

According to Hanks, identifying a pattern is not sufficient for lexical analysis. There has to be a certain meaning that calls for an individual pattern, and this meaning has to be investigated (Hanks 2013: 96). These patterns form lexical or linguistic gestalts which typically consist of one or more prototypes of phraseology, the norms, each of which is associated with a prototypical meaning. These prototypical meanings are called implicatures. The linguistic gestalts can be either simple or complex. Usually, one or two dominant meanings exist next to some additional, less salient meanings (Hanks 2013: 302). Some norms are closely connected in their semantic and phraseological meaning, while others have no immediate connection at all (Hanks 2013: 305).

It is of importance to Hanks to find out how lexical items relate to perceptions, emotions, attitudes and conceptual representations. In order to investigate this properly, a theory must be shaped to analyse syntax, valency and dependency. Furthermore, the nature of linguistic creativity must be taken into account. “Such a theory must be based on close and careful analysis of large quantities of evidence of real language used by real people for real purposes” (Hanks 2013: 407). These real purposes can include newspaper articles, which this paper employs for analysis as well as speeches, letters and narrative fiction but “the most basic function of language is seen as interpersonal communication” (Hanks 2013: 409). Communication is predominantly built up by words following each other while syntax can be viewed as secondary. For communication in a foreign language to work, a person has to be able to communicate with words which do not have to be syntactically correct in order to convey the meaning. Subsequently, this type of speech can be seen as purely rudimentary. If the meaning has been understood, the communication can be seen as successful. This does not work the other way around as it is not possible to use syntax without words. In more sophisticated terms, words are constructs and are to be seen as events involving action between two or more participants (Hanks 2013: 409). In order to study language correctly, syntactic frameworks and selectional preferences must each be taken into account. This is where the theory of norms and exploitation begins to have increasing significance:

TNE is a ‘double-helix’ theory of language. The set of rules that govern normal conventional use of words is intertwined with a second-order set of rules that govern the ways in which these norms can be exploited and that contribute very largely to the phenomenon of language change. [...] TNE is a theory of prototype and preferences, based on extensive analysis of actual traces of linguistic behaviour - what people say and what they may be supposed to mean - as recorded in large corpora (Hanks 2013: 411).

Hanks claims that in order to analyse a verb and its use properly, the knowledge of clausal roles is crucial. In comparison to tense and aspect, they matter more when analysing lexical patterns. Sentences differ in the number of clausal roles they contain, the predicate being obligatory. Further clausal roles are the subject, the object, the complement and the adverbial. These clausal roles provide a “lexically real rather than a syntactically and logically abstract apparatus for practical analysis of words and meaning [as well as the valency of a predicate]” (Hanks 2013: 95). A basic valency framework requires at least the predicate, the subject and an adverbial (Hanks 2013: 143). Collocations are a major aspect of TNE, as meaning is dependent on and triggered by certain lexical sets and they prove to be “integral to the system that each speaker has internalized since birth” (Hanks 2013: 413).

Based on this theory, a new kind of dictionary, the Pattern Dictionary, has been proposed and is already in progress. This idea was created by Patrick Hanks and James Pustejovsky. Their method, sorting the corpus evidence and then assigning meaning to the patterns and not to the word in isolation, is recreated in this thesis (Hanks 2009: 224). The Pattern Dictionary is described in more detail in the following chapter.

3.1.1 Pattern Dictionary of English Verbs

Recently, a new technique for mapping meanings onto words has been developed. Corpus Pattern Analysis (CPA) is based on the TNE approach, which was further influenced by the Generative Lexicon by Pustejovsky, the theory of semantic preference by Wilks, the works on corpus analysis and collocations by Sinclair as well as frame semantics by Fillmore. CPA attempts to offer a systematic analysis of the meaning patterns as well as the use of each verb. This lexicocentric approach centres around prototypical syntagmatic patterns. The patterns for nouns consist of general statements including all possible collocates, while the patterns for verbs include the valency structure as well as individual semantic complements. The meanings of the target words are always investigated in context. CPA does not offer word meanings in isolation but rather prototypical sentence contexts that are grouped into semantically motivated syntagmatic patterns. The task of selecting the most general description that still captures the distinctions between the different implicatures is very challenging. These patterns help to establish an effective disambiguation. The corpus samples that form the basis for CPA consist of 250 randomly selected examples in order to focus on statistically significant collocations, not unlike the search engine for language learners, Sketch Engine. The examples include the three types of exploitation, namely semantically anomalous arguments, figurative uses and syntactically anomalous structures, thus creating an unrivalled collection of creative language use. The relative frequency of each phraseological pattern is also listed (http://www.pdev.org.uk/#about_cpa).
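The two quantitative steps mentioned here, drawing a random sample of 250 examples and listing the relative frequency of each pattern, can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the Pattern Dictionary itself: the function names, the fixed random seed and the toy pattern labels are assumptions introduced for the example, and the assignment of a pattern to each line is assumed to have been done manually beforehand.

```python
import random
from collections import Counter


def sample_concordance(lines, size=250, seed=0):
    """Draw a reproducible random sample of concordance lines.

    The sample size of 250 follows the text; the seed is an assumption
    that merely makes the sample repeatable.
    """
    lines = list(lines)
    if len(lines) <= size:
        return lines  # fewer hits than the sample size: keep them all
    return random.Random(seed).sample(lines, size)


def pattern_frequencies(pattern_labels):
    """Return the relative frequency (in percent) of each pattern label."""
    counts = Counter(pattern_labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical manual annotation of ten sampled lines.
labels = [1] * 6 + [2] * 3 + ["exploitation"]
print(pattern_frequencies(labels))
# {1: 60.0, 2: 30.0, 'exploitation': 10.0}
```

Keeping exploitations as a separate label makes it possible to report them next to the norms rather than discarding them.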

Figure not included in this excerpt

Figure 1. Selected Pattern Dictionary examples of the verb to say

An example of how these patterns are realized visually is given above. The instances are the first three examples that were listed of the verb to say. This example was chosen randomly. The list continues and consists in total of nine patterns, two of which are patterns consisting of idiomatic expressions (http://www.pdev.org.uk/#browse).

say (1)

Figure not included in this excerpt

Figure 2. Selected examples of the first pattern of to say from the Pattern Dictionary

Clicking the ‘more data’ button on the page opens a concordance list with a varying number of examples, depending on how frequently the pattern occurs (http://www.pdev.org.uk/conc.php?verb=say&patnum=1&expl=both&ssize=).

3.2 Frame Semantics

The theory of frame semantics centres around the idea that language may be used in order to create perspective as “underlying conceptualization of the world” (Geeraerts 2010: 225). Charles J. Fillmore, who plays an important role in the study of frame semantics, argued that although we do not state our view of the world explicitly, the way an utterance about the world is phrased indicates a certain perspective on how life is perceived individually. Fillmore's approach investigates how different grammatical patterns and expressions might verbally highlight or alter a situation. The term ‘frame’ covers certain aspects which all contribute to the meaning of an event. An example would be the risk frame: this frame consists of a protagonist, a possibly bad decision, a goal, the setting and possible possessions (Geeraerts 2010: 227). All these factors contribute to a situation or event which might be familiar to language users. Consequently, all language users have an understanding of a familiar situation which may only differ in certain details that depend on individual experiences.

As new word senses can be discovered in natural language use, there must be a distinction between de-contextualised coded meanings and contextualised ones which are realised in different discourse types. Therefore, the source is a starting point for a proper analysis: when encoding an utterance, a contextualisation is being created through invited inferences which are “interpretations that are not expressed explicitly but are nevertheless intended or at least allowed by the speaker” (Geeraerts 2010: 230). In order to figure out if semantic change has taken place, contextualised readings that leave traces are necessary. “[T]he traces are needed to ensure that, as a result of the growing entrenchment of peripheral readings, the internal structure of a category may change” (Geeraerts 2010: 232). Hypothetically, categories that are organised in a prototypical manner are more likely and more conventional when compared to less frequent meanings. This extensional non-equality of lexical semantic structure is highlighted by the range of the prototype theory which is dynamic, able to change and extend its core range to integrate incidental, transient changes of a word meaning (Geeraerts 2010: 233f.). Within the study of frame semantics, all possible arguments of every verb have been classified as semantic roles, which according to Fillmore can help to “identify certain types of judgements human beings are capable of making about the events [...], judgements about such matters as who did it, who it happened to, and what got changed” (1986: 24f.). Sentences can contain complements such as the agent, the experiencer, a theme or patient, a goal, a recipient, a source, a location, an instrument and a beneficiary (Riemer 2010: 338).

3.3 Word Sense Disambiguation

Lexical disambiguation is a field of computational linguistics which centres around the purpose of determining the meaning of words in context. The field is more specifically called word sense disambiguation (WSD) and is defined as “the problem of computationally determining which ‘sense’ of a word is activated by the use of the word in a particular context” (Mehler 2012: 1).

Another definition depicts disambiguation, or ambiguity resolution, as the “process of assigning a word its appropriate meaning within a given context” (Gelbukh & Kolesnikova 2013: 2). Word pairs, such as to plant a garden or to sign a contract, are called lexical functions and these lexical functions serve the purpose of generalising semantic and syntactic properties of a high number of word combinations. These word combinations are predominantly collocations, meaning that the lexical function is to show clear semantic affinities of a word regarding the selectional preferences. These are not easily analysed by a computer system as the meaning of a collocation cannot simply be derived from the meaning of its constituents (Gelbukh & Kolesnikova 2013: 5f.). This proves to be problematic as computers should be able to identify the correct meaning when they perform language processing tasks such as a Google search, information retrieval or machine translation (Stevenson 2003: 1-3).

The task of classification is defined by the classes which are the word senses for which the context provides the evidence. Each singular occurrence is assigned a preference of class, depending on which context it appears in (Mehler 2012: 1f.). A traditional view of WSD sees its purpose in finding fixed word senses as words are assumed to possess only a finite number of meanings (Mehler 2012: 2). This is suggested by traditional dictionaries which only list the most common meanings of a word. When using a dictionary, one has to consider that the author might have been biased by his choice of meaning of a specific word whether this was intentional or not (Stevenson 2003: 16).
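To make the idea of WSD as a classification task more concrete, the following Python sketch assigns an occurrence to the sense whose dictionary gloss shares the most words with the context. This is a simplified Lesk-style gloss overlap, a well-known baseline that is swapped in here purely for illustration and is not a method discussed in the sources cited above. The sense labels, the two glosses for the noun plant and the example sentence are hypothetical.

```python
def tokenize(text):
    """Lower-case a text and strip basic punctuation from each word."""
    return {word.lower().strip(".,;:!?") for word in text.split()}


def simplified_lesk(context, sense_glosses):
    """Return the sense whose gloss overlaps most with the context.

    sense_glosses maps a sense label (the class) to a gloss; the context
    is the evidence on which the classification is based.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical sense inventory for the noun "plant".
glosses = {
    "flora": "a living organism such as a tree or flower that grows in soil",
    "factory": "a building where goods are produced by machines and workers",
}
sentence = "The workers at the plant produced fewer machines this year."
print(simplified_lesk(sentence, glosses))  # factory
```

The sketch also shows the limitation noted in the text: the result depends entirely on a fixed, finite inventory of senses and on how the dictionary author chose to word each gloss.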

WSD is related to other fields in linguistics such as lexical semantics as it analyses words in context in order to find out their meaning although it never found a place in this particular field. This could be due to the fact that lexical semantics is more concerned with representational issues while WSD is more closely connected to language technology and shares common features with modern lexicography (Mehler 2012: 2).

An earlier approach relied on human sense disambiguation. A certain number of native speakers of a specific target language were asked to annotate a number of sentences which contained the word of interest. This approach is still central although the native speakers have been replaced by trained lexicographers and the data is more empirical due to the help of corpora. The problem was that, in addition to the project being costly and time-consuming, the native speakers were not as intuitive regarding semantics as they were in the field of syntax (Dobric 2014: 59f.).

In the 1950s and 1960s, computer-based disambiguation gained more importance, and resources were developed that helped with sense disambiguation, such as thesauri, dictionaries, corpora and WordNet (Dobric 2014: 60). Despite its numerous advantages, such as speed and storage space, problems with computer-based approaches exist within the quantification of a context or the objective evaluation of WSD (Dobric 2014: 63).

3.4 Valency

The valency framework emerged from dependency grammar, which was especially influenced by the works of Lucien Tesnière. Dependency grammar centres around the relations and interdependencies of different sentence parts. These interdependencies are the result of the valency of a single word that is integrated into the sentence. The verb is considered to be the most significant and central part of a sentence and therefore its most dominant determinant. Valency ideas have also influenced cognitive grammar and lexically based approaches (Faulhaber 2011: 5). Next to the verb, which forms the centre of a phrase, a valency pattern consists of complements. Whether a complement is optional or obligatory is decided by its pragmatic value and structural necessity (Faulhaber 2011: 8f.). The occurrence of a complement is necessary to a valency pattern while all non-necessary elements are called adjuncts (Uhrig 2018: 50). Valency is a common source of error in foreign language learning, which is why many valency dictionaries exist to help learners avoid common mistakes. These dictionaries provide the valency patterns, word meanings, collocational ranges as well as semantic roles. They cover the areas that are most problematic for language learners and are structured by the frequency and complexity of the valency patterns (Herbst et al. 2004: viif.).

Verbs which take no complements are called avalent. Intransitive verbs only demand one complement while transitive verbs demand two. Verbs which demand more than two complements are called ditransitive (Kroeger 2005: 69). As the term stems from chemistry, one can compare a verb to an atom, which likewise forms bonds with other atoms in order to create molecules such as H2O.
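The classification by number of complements given above can be written down as a small function. The Python sketch below is an illustration only: the function name valency_class and the toy lexicon with its complement lists are hypothetical, and the counts follow the paragraph's convention in which the subject is counted as a complement.

```python
def valency_class(complements):
    """Classify a verb by the number of complements it demands."""
    count = len(complements)
    if count == 0:
        return "avalent"
    if count == 1:
        return "intransitive"
    if count == 2:
        return "transitive"
    return "ditransitive"  # more than two complements


# Hypothetical toy lexicon; adjuncts are deliberately left out.
toy_lexicon = {
    "rain": [],
    "sleep": ["subject"],
    "prettify": ["subject", "object"],
    "give": ["subject", "indirect object", "direct object"],
}

for verb, complements in toy_lexicon.items():
    print(verb, "->", valency_class(complements))
# rain -> avalent
# sleep -> intransitive
# prettify -> transitive
# give -> ditransitive
```

Because only obligatory or optional complements enter the list, the same sentence with an added adjunct such as a time adverbial would receive the same class.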

3.5 Polysemy

The term polysemy was coined by the historical-philological semanticist Michel Bréal (Dobric 2014: 39). Following the definition of Riemer, polysemy can be understood as “the possession by a single phonological form of several conceptually related meanings” (2010: 161). Robert proposes that “because of the absence of one-to-one relations between forms and meanings in language, linguistic units are by nature polysemous” (2008: 90). It was observed that a single concept could be expressed by a number of different words and that one word could carry a number of different meanings on its own. Although different concepts can be expressed by the same word, this does not pose a threat to human communication as language users are able to select and express their message while communicating, depending on the context of the conversation (Ravin 2002: 1).

What would become polysemy was first discussed as a plurality of meaning among philosophers such as John Locke and Gottfried Wilhelm Leibniz. This debate was taken over by linguists and investigated from different perspectives. Historical linguistics looked at polysemy as a form of lexical ambiguity which resulted from the process of diachronic semantic change. While new meanings had been established, the older ones still appeared in dictionaries. Structural semantics depicted polysemy as the opposite of homonymy, which means that polysemy was simply regarded as a type of sense relation. The field of cognitive semantics found a middle ground. It did not only consider the core meaning or the stored meanings of a polysemous word but interpreted some meanings as stored senses while others were produced by the context. Prototype theory, which works with the presence and absence of given features, increased the importance of the contemporary cognitive semanticists' view of polysemy. According to cognitive semantics, some meanings are more central in the mental lexicon while others are less salient and/or contextual extensions. The central meanings are recognised as prototypical, while the decreasing level of entrenchment regarding some words, peripheral members of a group, can be seen as vague or ambiguous (Dobric 2014: 39-43). Therefore, any analysis has to look for links and paths of semantic extensions between the readings themselves (Dobric 2014: 44).

As Stephen Ullmann claims, the fact that some words have more than one meaning is “the pivot of semantic analysis” (Nerlich & Clarke 2011: 3). Some linguists share the opinion that polysemy should be of central importance to any semantic study, although it causes certain problems in the field of linguistics. One problem is that lexical ambiguity arises from both homonymy and polysemy and is not easily distinguished from vagueness. This obstacle is especially apparent in the fields of translation and lexicography (Ravin 2002: 1-4). At present, polysemy is regarded in a pragmatic context rather than as a purely lexical notion. According to Dobric, meanings should be seen as the result of a process of meaning construction through context (2014: 45). This statement is reminiscent of Geeraerts, who proposed the thesis that the context is able to alter the meaning of a word (2010: 21).

There are certain strategies of sense extension, such as amelioration, pejoration, specification or generalisation, which indicate that polysemy does not occur randomly. Nevertheless, these strategies are usually connected to semantic change and not to a synchronic account of sense relations (Brugman 1984: 35).

3.6 Collocation

Collocations are frequently occurring word-pairs or phrases, but the term can also refer to unusual word-pairs (Philip 2011: 40). When studying collocations, the word which is investigated, namely the node, primes its collocates. These co-occurring words of the target word can range from a single word to a whole phrase (Geeraerts 2010: 170). Collocations have been investigated widely, particularly by scholars such as John Firth, John Sinclair, Michael Halliday and Michael Hoey. In the 1950s, Firth gave the term collocation a key place in the linguistic sphere, and Halliday and Sinclair further developed the notion in the following decade (Sinclair, Jones & Daley 2004: ix).

Meaning by collocation is an abstraction at the syntagmatic level and is not directly concerned with the conceptual or idea approach to the meaning of words. One of the meanings of night is its collocability with dark, and of dark, of course, collocation with night (Firth 1951: 196).

Sinclair's analysis of natural language employed the model of the extended unit of lexical meaning, which consists of collocation, colligation, semantic preference and semantic prosody, in order to analyse the meanings of words in context (Philip 2011: 35). While J.R. Firth considers language a result of usage rather than a collection of words, the components of Sinclair's model form a network of meaning when considered together (Pace-Sigge 2013: 2).

Michael Hoey defines collocations in his Lexical Priming approach as a “psychological association between words [...] up to four words apart and is evidenced by their occurrence together [...] more often than is explicable in terms of random distribution” (Hoey 2005: 5).

His approach centres on individual psychological notions and on how language is used naturally. In this regard, his definition of collocations is very fitting, as Hanks is likewise an advocate of using natural language for lexical analysis.
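The window-based part of Hoey's definition can be illustrated with a minimal sketch. It only counts which words occur up to four tokens to the left or right of a node; it does not test whether the co-occurrence is more frequent than random distribution would predict, which a full analysis would require. The function name `collocates` and the sample sentence are assumptions made for illustration and are not taken from the corpus data of this study.

```python
from collections import Counter

def collocates(tokens, node, window=4):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)                # left edge of the window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # skip the node itself
                counts[tokens[j]] += 1
    return counts

# Invented sample sentence (assumption), echoing Firth's night/dark example.
tokens = "the night was dark and the dark night was long".split()
night_collocates = collocates(tokens, "night")
```

In this toy sentence, dark is counted three times as a collocate of night. Raw counts of this kind are only a starting point, since frequent function words such as the will dominate any unfiltered list.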

3.7 Colligation

Colligations concern the meaning of a word according to its habitual linguistic environment (Pace-Sigge 2013: 16). In addition to words and phrases, the nodes can also prime relational processes. Colligations do not only refer to grammatical constructions but also to the position within the sentence, paragraph or text in which the collocation occurs, which is called textual colligation (Hoey 2005: 40f.). According to Sinclair, colligation and collocation are more closely related than Firth originally suggested. Hoey, on the other hand, holds that both textual position and grammatical context are relevant when investigating colligation. This also implies that different word senses can be identified by looking at the grammatical aspects and not only the semantic ones (Pace-Sigge 2013: 18).

3.8 Semantic Preference

Semantic preference is determined by the semantic field which a certain word prefers (Pace-Sigge 2013: 5). Hoey illustrates this phenomenon with the phrase two-hour bus ride. As the collocates of the node bus are not necessarily connected in their meaning, it must be assumed that collocations operate at a wider level of meaning, connoting a certain semantic set. This specific semantic set creates the notion of a journey (Hoey 2005: 17f.). Some semantic sets are primed locally, but how these semantic associations became connected to the node cannot be traced through corpus study, as they rely on undocumented and unique experiences with language (Hoey 2005: 19).

3.9 Semantic Prosody

Semantic prosody “expresses [the] attitudinal and pragmatic meaning” of a phrase (Sinclair 2004: 174). If any two lexical items have a similar pattern of collocation, they tend to appear in similar contexts and “will generate a cohesive force if they occur in adjacent pairs” (Halliday & Hasan 1976: 286). When evaluating the semantic set, there are always additional pragmatic meanings to be found which connote the speaker's attitude (Stewart 2010: 20). The target words usually display a preference for either negative or positive connotation, though both can occur. An example would be the verb to cause, which has been analysed by Michael Stubbs and shows a strong preference for negative consequences (Stewart 2010: 28).

4. Corpus Linguistics

As has been stated before, this paper employs the British National Corpus (BNC) and, for its data, the NOW Corpus in order to investigate the different meanings of words ending in the suffix -ify. The following chapter will give a short overview of the field of corpus linguistics and of the two corpora, the BNC and the NOW Corpus.

The term ‘corpus linguistics' is somewhat misleading as it describes a method of research and not a field of study. It is furthermore associated with a usage-based perspective on linguistics, as corpora exclusively contain natural language. Additionally, it promotes the idea that changes in language occur when language users communicate with each other (Lindquist 2009: 1). Although compiling corpora can be costly as well as time-consuming, it is very advantageous for linguists, as they can cover a large amount of material, which would be impossible without technology (Lindquist 2009: 5). Furthermore, the results are efficiently presented, either in concordances or in frequency figures. Keyword-in-context (KWIC) concordances help to discover how certain words are used naturally in context, while the frequency figures give an idea of how common a certain phrase or word is (Lindquist 2009: 5f.). Some scholars argue for corpus-driven approaches instead of corpus-based approaches. The corpus-driven approach inspects language without preconceived ideas, while corpus-based studies inspect language through the lens of already existing theories (Lindquist 2009: 10). Some corpora are designed for a specific purpose. They can represent a certain variety of the target language or cover a specific period or medium. In comparison, other, more general corpora serve the purpose of unspecified linguistic research (Kennedy 1998: 19f.).
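The two kinds of output mentioned above, KWIC concordances and frequency figures, can be sketched in a few lines. The following is a minimal illustration under simplifying assumptions: the text is already split into tokens, matching is done on the lower-cased word form only, and the helper names `kwic` and `per_million` are hypothetical. It is not the interface of the BNC or the NOW Corpus. The sample sentence is the example quoted for Figure 3 in chapter 6.1.

```python
def kwic(tokens, node, width=3):
    """Return keyword-in-context lines with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def per_million(tokens, node):
    """Normalised frequency: occurrences of `node` per one million tokens."""
    hits = sum(1 for tok in tokens if tok.lower() == node)
    return hits * 1_000_000 / len(tokens)

# Example sentence from chapter 6.1, split on whitespace.
tokens = "Google is trying to prettify its Maps app on iOS".split()
concordance = kwic(tokens, "prettify")
frequency = per_million(tokens, "prettify")
```

Normalising to occurrences per million words makes frequencies comparable across corpora of different sizes. On a single sentence the figure is of course meaningless and serves only to show the calculation.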

Corpus research can be conducted with either a qualitative or a quantitative approach. While the quantitative approach centres on numbers and classifications of the tokens in order to generalise them to a larger population, the qualitative approach does not assign frequencies (McEnery 1996: 76). The qualitative approach rather concentrates on individual texts or grammatical constructions such as relative clauses (Lindquist 2009: 25). The inherent ambiguity of language is taken into consideration, and thus the qualitative approach can provide precise and rich analyses of the tokens (McEnery 1996: 77). It has to be noted that a quantitative study always needs a qualitative element, as there have to be certain categories from which the quantitative study can arise (Lindquist 2009: 25f.).

4.1 The British National Corpus

The British National Corpus is “the largest easily accessible linguistic corpus of British English” (Schröder 2011: 145). The BNC serves the purpose of representing how language was used naturally in the British Isles at the end of the 20th century. 90% of the BNC's content consists of written texts, while only 10% is spoken language (Schröder 2011: 145). Between 1990 and 1995, multiple academic and corporate institutions, such as the Universities of Oxford and Lancaster, Oxford University Press and the Longman Group (UK) Ltd., among others, shared their resources in order to “design, develop and annotate [the British National Corpus]” (Kennedy 1998: 50).

Consequently, the BNC can be considered a valuable source for linguistic studies that is freely accessible through the internet, which was not always the case because “[...] until 1996 its use was restricted to researchers within the European Community [...]” (Kennedy 1998: 53). Hanks furthermore calls it a “large and reasonably representative source” which is still, despite being over twenty years old, valid for a pattern analysis of present-day English (Hanks 2013: 93).

4.2 The NOW Corpus

The NOW in NOW Corpus is an acronym for ‘News on Web'. The corpus contains about 6.0 billion words of data from web-based newspapers and magazines, dating back to 2010. Each day, about 4-5 million new words are added to the corpus. This is particularly beneficial as the corpus contains language from the recent past, in contrast to other corpora that consist only of language gathered at a specific point in time. Although studies of such corpora are just as interesting, current uses of the target word are more relevant, for instance, for foreign learners of English who may have difficulty communicating with appropriate words in a certain context. Through the current context of their specific target word they can see how the word is properly used. It is possible to inspect keywords at a given date, for instance those relating to several of the terrorist attacks that have shaken Europe in the past years. Additionally, there is the possibility to compile a corpus from the NOW data in order to focus on chosen target words in a specific time span (https://corpus.byu.edu/now/).

5. Etymology and meaning

After the preceding chapters defined the terminology and the different concepts as well as the corpus of this analysis, the following chapter turns to the actual target words. Their etymology will be investigated, as it is important to understand the past and present meanings of the words before turning to the analysis of their derivatives.

5.1 Pretty

The Oxford English Dictionary lists the meaning of pretty as being “attractive in a delicate way without being truly beautiful” or as “pleasing to the eye” (Stevenson 2010: 1407). The word pretty was first recorded in the Old English period in the form prættig and conveyed the meaning of ‘cunning' or ‘crafty' (Durkin 2009: 72). From the fifteenth century onwards, numerous senses belonged to the word pretty, namely: “clever, skilful, able, cleverly or elegantly made or done, ingenious, artful, well-conceived, attractive and pleasing in appearance, pleasing to the senses, aesthetically pleasing, attractive or charming, considerable, sizeable” (Durkin 2009: 72). It is similar to the Dutch pertich, which means humorous or sporty, and is also related to a West Germanic base meaning ‘trick'. There is a development that ranges from ‘deceitful' and ‘cunning' to ‘pleasing' and ‘nice' that is similar to that of adjectives such as canny, fine or nice (Stevenson 2010: 1407).

5.2 Vivid

The adjective vivid is defined as producing “powerful feelings or strong, clear images in the mind” (Stevenson 2010: 1987). When occurring with a colour, the adjective can also describe the colour as intensely deep or bright. In connection with animate beings, the meaning also conveys the qualities of being lively or vigorous (Stevenson 2010: 1987).

It originated from the Latin verb vivere which can be translated into English as ‘to live' (Stevenson 2010: 1987).

6. Method and Analysis

After discussing the relevant terminology and concepts as well as the etymology of the target words, this chapter will outline the method and demonstrate the findings of the analysis.

The data was selected from the NOW Corpus, which was described in chapter 4.2. To vivify and to prettify were chosen because they were the least frequently occurring derivatives with the suffix -ify in the corpus. The investigation of infrequent verbs can help improve knowledge about the proper usage of these words, which might be beneficial for foreign learners of English.

There are 99 example sentences of the verb to prettify and 100 examples of to vivify. Each example additionally contains a marker for chronological order, the publishing date and the newspaper in which the sentence was published. Due to the size restriction of this thesis, the information about the different newspapers will not be taken into account. The examples of this study were retrieved on 16 July 2018 in order to give an overview of the different meanings of the target words up to that specific point in time. Naturally, new examples may have been added since then, as the NOW Corpus adds new words every day, thus making the meaning of these words yet again more complex.

First, the meanings of the target words will be listed, modelled after the Pattern Dictionary of English Verbs. Afterwards, their senses will be further analysed in terms of Sinclair's model of extended lexical units in order to give a structured overview of all the different nuances in meaning. Special occurrences as well as findings will be explained and discussed in the following passages.

In the manner of the Pattern Dictionary of English Verbs, the instances belonging to the specific patterns can be found in the appendix, as can the original order of the material. This material is not incorporated into the text as it would occupy too much space.

6.1 Analysis of to prettify

Prettify displays two distinct meanings, of which one is more dominant.

Figure not included in this excerpt

Example: Google is trying to prettify its Maps app on iOS.

Figure 3. to prettify - the first pattern modelled after the pattern dictionary

This first CPA model depicts the numerous complements of the target word, although the succeeding complements are more diverse. As was mentioned in chapter 2.1.1, the suffix -ify always relates to an action that is ‘making the object into' what is denoted by the derivative. When no agent is available and the construction is passive, the meaning relates to ‘become to a greater extent' what is denoted by the derivative. In the sense of pretty, this means that all objects are becoming prettier than before. In relation to a human agent, this means there is a desire to make someone or something prettier. This reveals humanity's inherent wish for improvement. In some of the examples, actions are described that are able to prettify the object. This differs slightly from the pattern in which a human is the preceding element and resembles a piece of advice on how to achieve an aim.

As was mentioned before, the succeeding complements are numerous and show that the instances to be prettified can be animate or inanimate. Furthermore, the instances describe exclusively physical complements. While an action can prettify one of the above-mentioned complements, no actions can be prettified, which further indicates that prettification relates to physical attributes that are to be optimised. The only complement which differs slightly is the ‘system'. Digital systems can be visually improved. Numerous updates for iOS and Android constantly increase the visual qualities and efficiency of a smartphone. Although these systems are not inherently physical, they are used so that devices, which are physical objects, work more efficiently, and they are therefore related by association to the quality of physicality.

The verb almost exclusively demands two complements, which makes it a transitive verb. In 3.6% of the examples (6, 26), to prettify appears as intransitive, although this is such a small share that it can be said to be an exploitation of a commonly transitive pattern.

[...]

Excerpt out of 78 pages

Details

Title: Corpus Pattern Analysis and Sense Disambiguation. A Case Study
College: University of Hannover
Grade: 2,3
Author: Hannah Koch
Year: 2018
Pages: 78
Catalog Number: V1027140
ISBN (eBook): 9783346430281
ISBN (Book): 9783346430298
Language: English
Keywords: corpus, pattern, analysis, sense, disambiguation, case, study
Hannah Koch (Author), 2018, Corpus Pattern Analysis and Sense Disambiguation. A Case Study, Munich, GRIN Verlag, https://www.grin.com/document/1027140
