
Computing Semantic Relatedness - An Overview

Scholarly Paper (Advanced Seminar), 2007, 21 Pages
Author: B.A. Niels Ott
Subject: Computer Science - Miscellaneous

Details

Category: Scholarly Paper (Advanced Seminar)
Year: 2007
Pages: 21
Grade: 1,0
Bibliography: ~ 30  Entries
Language: English
Archive No.: V91614
ISBN (E-book): 978-3-638-05182-8

File size: 274 KB

Abstract

To give a quick outline of the paper: first, we discuss the fundamentals required for computing semantic relatedness. After a definition of semantic relatedness as such, several example applications are presented. Then an introduction to linguistic resources used in the field is given. Second, we present a number of selected algorithms that compute semantic relatedness. These serve as examples for the usage of several linguistic resources such as dictionaries, lexical-semantic networks, and measures using the WWW. Third, we discuss how the quality of methods for computing semantic relatedness can be assessed by comparison to human performance. Finally, an example study is presented.


Excerpt (computer-generated)

Computing Semantic Relatedness: An Overview

Niels Ott <niels@drni.de>

September 16, 2007

Pages: 20
Course: ISCL MA
Course Semester: 1st
Seminar: Text-oriented Applications of Computational Linguistics
Semester: Winter 2006

SemanticRelatedness.tex ­ September 16, 2007

page 1 of 20



Contents

1. Introduction
2. Fundamentals for Computing Semantic Relatedness
  2.1. Definitions of Relatedness, Similarity, and Distance
  2.2. Applications of Semantic Relatedness
  2.3. Linguistic Resources
    2.3.1. Dictionaries and Thesauri
    2.3.2. Lexical-semantic Networks
    2.3.3. GermaNet
3. Selected Algorithms
  3.1. An Early Approach by Lesk: Gloss Overlaps
  3.2. Rada et al: Following Paths in a Taxonomy
  3.3. Resnik: Following Paths and Using Information Content
  3.4. Cilibrasi and Vitanyi: Querying Google
  3.5. Strube and Ponzetto: Exploiting Wikipedia
4. Evaluation Strategy
  4.1. Gold Standards, Human Agreement, and The Correlation Coefficient
  4.2. An Example Study
5. Concluding Remarks
A. Tables
  A.1. Relations in GermaNet
    A.1.1. Conceptual Relations
    A.1.2. Lexical-semantic Relations
B. List of Abbreviations


1. Introduction

As humans, we have extensive and intuitively accessible knowledge about the words we use to describe the world. If a friend says that the dinner at the restaurant was disgusting, we don't need to ask whether he would describe the evening as a pleasant experience. When we say that we lend someone money, we know that someone borrows money from us. The person sitting at the wheel of a car is the driver. The car has parts such as tires, a trunk, and an engine. Apples can be sour or sweet. Alcohol makes people drunk.

Trivial, isn't it? All of this works because we have knowledge and intuitive means of using it. Computers are not as smart as we are. Even when equipped with knowledge, they need to be programmed in a way that enables them to relate the facts they are fed. Once they can do so, they can assist users better because they have an understanding of how the world (or rather, its lexical representation) works, or at least this is what we hope for. Semantic Relatedness is one important concept that allows computers to make use of knowledge.

To give a quick outline of the paper: first, we discuss the fundamentals required for computing semantic relatedness. After a definition of semantic relatedness as such, several example applications are presented. Then an introduction to linguistic resources used in the field is given. Second, we present a number of selected algorithms that compute semantic relatedness. These serve as examples for the usage of several linguistic resources such as dictionaries, lexical-semantic networks, and measures using the WWW. Third, we discuss how the quality of methods for computing semantic relatedness can be assessed by comparison to human performance. Finally, an example study is presented.

2. Fundamentals for Computing Semantic Relatedness

2.1. Definitions of Relatedness, Similarity, and Distance

The notion of Semantic Relatedness is surrounded by the terms Semantic Similarity and Semantic Distance. Resnik (1995) distinguishes the first two as follows: "Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than say, cars and bicycles, but the latter pair are certainly more similar." Lin (1998) bases his definition of semantic similarity on three intuitions: 1) The more commonality two objects share, the more similar they are. 2) The more differences two objects have, the less similar they are. 3) The similarity of two objects is maximal if they are identical.

A more straightforward definition is given by Budanitsky (1999, p. 3), who also refers to similarity as a special case of relatedness. As the name suggests, semantic similarity aims to express how similar two objects are. This can be defined via the relations of hyponymy and synonymy. Hyponymy is also known as the Is-A relation, e.g. 'a car Is-A vehicle.' Semantic relatedness extends this concept to a wider range of relations: not only their opposites, hypernymy and antonymy, but also meronymy (Part-Of), entailment, causation, or simply association (See-Also) can be used to express semantic relatedness, to name a few.
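
The distinction between the two notions can be sketched in a few lines of Python. This is only an illustration: the tiny network, the relation labels, and the function name `neighbors` are assumptions made for this example and are not taken from any resource or algorithm discussed in this paper. Similarity is read off hyponymy and synonymy links only, while relatedness may use every relation type.

```python
# Toy lexical network as (source, relation, target) triples.
# All entries are illustrative assumptions, not data from a real resource.
EDGES = [
    ("car", "hyponymy", "vehicle"),      # a car Is-A vehicle
    ("bicycle", "hyponymy", "vehicle"),  # a bicycle Is-A vehicle
    ("car", "synonymy", "automobile"),
    ("tire", "meronymy", "car"),         # a tire is Part-Of a car
    ("car", "association", "gasoline"),  # See-Also
]

# Relation types that count towards similarity in this sketch.
SIMILARITY_RELATIONS = {"hyponymy", "synonymy"}


def neighbors(word, allowed=None):
    """Return the words directly linked to `word`.

    If `allowed` is given, only edges with those relation types are used.
    Edges are followed in both directions.
    """
    result = set()
    for source, relation, target in EDGES:
        if allowed is not None and relation not in allowed:
            continue
        if source == word:
            result.add(target)
        elif target == word:
            result.add(source)
    return result


similar_to_car = neighbors("car", SIMILARITY_RELATIONS)
related_to_car = neighbors("car")
```

In this toy network, gasoline is linked to car only through an association edge, so it shows up among the related words but not among the similar ones.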

According to Budanitsky (1999, p. 3), semantic distance represents the opposite of both relatedness and similarity: if two words are very distant, they are at the same time neither related nor similar.

Throughout this paper we use the term Semantic Relatedness according to the definition given above. Exceptions to this commitment are made where necessary, e.g. in the algorithm section (section 3) for those methods that do not yield a relatedness measure.

It is important to mention that semantic relatedness is meant to hold between objects, or rather between the word senses that describe objects. The relatedness of washer and glass may be high if washer refers to a dishwasher. However, it is rather low if washer refers to a spacer disk. Therefore, we stick to the terms Word Sense or Concept in this paper, which we use interchangeably most of the time. In a stricter view, however, a concept can be expressed by numerous word senses. In other words, a concept is represented by a set of one or more word senses.

2.2. Applications of Semantic Relatedness

A rather mundane statement: semantic relatedness can be used everywhere in natural language processing (NLP) where the meaning of words becomes relevant.

One of the countless research areas that can benefit strongly from semantic relatedness is word-sense disambiguation (WSD). As humans, we are not confused by the word organ meaning both a musical instrument and a body part. We intuitively know which is which in phrases like "Jon Lord plays the organ" or "the surgeon transplants the organ".

According to Stevenson and Wilks (2003, p. 252), Lesk (1986) was the first to present an approach to WSD that made use of a machine-readable dictionary. We also consider it the first approach that made use of any kind of knowledge or linguistic resource (more on resources in section 2.3). Lesk (1986) furthermore pioneered another concept which is nowadays basic to many WSD approaches: the context of the word in question is analyzed in order to obtain clues about the actual sense of the word. A detailed description of his work is given in section 3.1. Most WSD systems make use of semantic relatedness or similarity to solve their task (Vossen, 2003, p. 476). Considering the example above, the relatedness of organ[instrument] and transplants is likely to be much lower than that of organ[body part] and transplants.
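
The idea of choosing a sense by its relatedness to the context can be sketched as follows. This is a minimal illustration, not Lesk's algorithm (which is described in section 3.1) and not the method of any system cited here: the scores in `RELATEDNESS` are invented numbers, and the names `relatedness` and `disambiguate` are hypothetical helpers introduced only for this example.

```python
# Invented relatedness scores between word senses and context words,
# on an arbitrary scale from 0 (unrelated) to 1 (strongly related).
RELATEDNESS = {
    ("organ[instrument]", "plays"): 0.8,
    ("organ[body part]", "plays"): 0.1,
    ("organ[instrument]", "transplants"): 0.05,
    ("organ[body part]", "transplants"): 0.9,
    ("organ[instrument]", "surgeon"): 0.05,
    ("organ[body part]", "surgeon"): 0.7,
}

SENSES = ["organ[instrument]", "organ[body part]"]


def relatedness(sense, word):
    """Look up a score; pairs missing from the table count as unrelated."""
    return RELATEDNESS.get((sense, word), 0.0)


def disambiguate(senses, context):
    """Return the sense whose summed relatedness to the context words is highest."""
    return max(senses, key=lambda sense: sum(relatedness(sense, w) for w in context))


sense_a = disambiguate(SENSES, ["jon", "lord", "plays", "the"])
sense_b = disambiguate(SENSES, ["the", "surgeon", "transplants", "the"])
```

With these assumed scores, the first context selects organ[instrument] and the second selects organ[body part]. A real system would obtain the scores from one of the relatedness measures discussed later rather than from a hand-written table.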

Learners of foreign languages know that translation can be a tricky task. Staying with the organ example, there are (at least) two German words for it: Orgel refers to organ[instrument] and Organ to organ[body part]. The other direction exists as well: Germans go to Himmel ('heaven') when they die, but it is also the Himmel ('sky') that can be blue. It is not surprising that machine translation (MT) was one of the first fields of NLP to come across the WSD issue around fifty years ago (Stevenson and Wilks, 2003, p. 250).

Entry Word: desire
Function: noun
Text: a strong wish for something <a desire for adventure and excitement prompted him to travel to Africa>
Synonyms: appetite, craving, drive, hankering, hunger, itch, longing, lust, passion, pining, thirst, urge, yearning, yen
Related Words: compulsion, impulse, urge, will, zeal; liking, love, taste; eagerness, impatience; wish, want; necessity, need, requirement; avarice, cupidity, greed, rapacity
Near Antonyms: abhorrence, aversion, disfavor, disgust, dislike, distaste, hatred, repugnance, repulsion; apathy, indifference, unconcern

Figure 1: Merriam-Webster Thesaurus entry for desire

Situated somewhere between information retrieval (IR) and WSD, Kruse et al. (2005) present a wrapper around the Google search engine. Their initial example is that querying for Java returns many hits on programming but none about the island of that name or the coffee that is produced there. They present a system that looks up the query word in a semantic resource and then asks the user to choose the word sense that he or she had in mind. After that, the query to Google is expanded by relevant context words in order to produce more accurate results. Alternatively, the query is submitted unchanged, but the results from Google are ranked according to their relevance for the given word sense instead of the given word. They claim that their so-called pre-filter shows promising results.
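
The query expansion step can be illustrated with a small sketch. It is not the implementation of Kruse et al. (2005) and it does not contact any search engine: the sense inventory `SENSE_CONTEXT`, its context words, and the function `expand_query` are assumptions made up for this example.

```python
# Hypothetical semantic resource: for each query word, the available senses
# and a few context words per sense. All entries are invented.
SENSE_CONTEXT = {
    "Java": {
        "programming language": ["programming", "software"],
        "island": ["Indonesia", "island"],
        "coffee": ["coffee", "beans"],
    }
}


def list_senses(word, resource=SENSE_CONTEXT):
    """Return the senses that would be offered to the user for `word`."""
    return sorted(resource.get(word, {}))


def expand_query(word, chosen_sense, resource=SENSE_CONTEXT):
    """Append the context words of the chosen sense to the original query word.

    Unknown words or senses leave the query unchanged.
    """
    context_words = resource.get(word, {}).get(chosen_sense, [])
    return " ".join([word, *context_words])


offered = list_senses("Java")
expanded = expand_query("Java", "island")
```

Here the user would be offered three senses for Java, and choosing the island sense turns the query into "Java Indonesia island". The expanded string could then be passed on to a search engine.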

The existence of the Google wrapper leads to the following question: should the people at search engine companies develop something smarter? Müller and Gurevych (2006) report that many attempts to make IR systems more intelligent have had limited success so far. The general assumption that the use of lexical-semantic information improves IR systems does not hold. The authors use semantic relatedness as a means to judge how relevant a document is to the query. In other words, their system does not look up query words in the investigated documents; instead, it measures the relatedness of the words in the documents to the query words. They conclude that the use of semantic relatedness in IR "has the potential to outperform a traditional bag-of-words approach [...]".
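
The difference between literal word matching and relatedness-based relevance can be made concrete with a toy comparison. The sketch below is not the scoring function of Müller and Gurevych (2006): the relatedness table contains invented values, and averaging the best match per query word is just one simple choice assumed for this example.

```python
# Invented symmetric relatedness scores between word pairs (0..1).
RELATEDNESS = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "engine")): 0.7,
    frozenset(("car", "banana")): 0.05,
}


def relatedness(a, b):
    """Identical words are maximally related; unknown pairs count as unrelated."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def bag_of_words_score(query, document):
    """Count how many query words occur literally in the document."""
    return sum(1 for word in query if word in document)


def relatedness_score(query, document):
    """Average, over the query words, of the best relatedness to any document word."""
    if not query or not document:
        return 0.0
    best_matches = [max(relatedness(q, d) for d in document) for q in query]
    return sum(best_matches) / len(query)


query = ["car"]
document = ["automobile", "engine"]
literal = bag_of_words_score(query, document)
semantic = relatedness_score(query, document)
```

The document never mentions the word car, so literal matching gives it a score of 0, while the relatedness-based score is 0.9 under the assumed table because automobile is strongly related to car.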

2.3. Linguistic Resources

2.3.1. Dictionaries and Thesauri

"Dictionaries are perhaps the resource most readily associated with linguistic knowledge in people's minds," Budanitsky (1999, p. 5) writes. However, only a few sentences later in his work it becomes clear that there is more to it. A Dictionary consists of words that are defined by other words. Or: headwords are defined by other headwords plus maybe some stopwords or other symbols. There are other kinds of dictionaries as well. Wikipedia [1] names bilingual dictionaries, dictionaries with specialized domain coverage, and even glossaries. The "Dictionary of Linguistics and Phonetics" (Crystal, 2003) consists merely of headwords associated with text interwoven with other headwords, which can be called an encyclopedic dictionary [2]. Unless stated otherwise, we refer to a set of 1:n relations from headwords to other headwords when speaking of a Dictionary in this paper.
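
This view of a dictionary as 1:n relations between headwords can be sketched in a few lines. The miniature dictionary, the stopword list, and the function `headword_relations` are assumptions invented for this illustration; real dictionaries would at least require tokenization and lemmatization that the sketch leaves out.

```python
# Invented miniature dictionary: headword -> definition text.
DEFINITIONS = {
    "car": "a vehicle with an engine",
    "vehicle": "a machine for transport",
    "engine": "a machine that powers a vehicle",
    "machine": "a device",
}

# Small illustrative stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "with", "that", "or"}


def headword_relations(definitions):
    """Map every headword to the set of other headwords used in its definition.

    Words that are stopwords or that are not headwords themselves are ignored.
    """
    relations = {}
    for headword, text in definitions.items():
        tokens = text.lower().split()
        relations[headword] = {
            token
            for token in tokens
            if token in definitions and token not in STOPWORDS and token != headword
        }
    return relations


relations = headword_relations(DEFINITIONS)
```

Each key of `relations` is a headword and each value is the set of headwords it points to, so the mapping can be read directly as the edge list of a directed graph.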

This set of relations allows the construction of a simple Network (Budanitsky, 1999, p. 6). While this is not a traditional view of dictionaries, it is important for semantic relatedness algorithms. Moreover, it would be good to know about the nature of the relations, or edges, in that network. The next, more precise resource in this respect is the Thesaurus. An example from a modern thesaurus by Merriam-Webster [3] is shown in figure 1. While this thesaurus also has components of a dictionary, namely a line with an explanation and an example of the word, the main part of the entry looks rather

[1] http://en.wikipedia.org/wiki/Dictionary, August 11, 2007
[2] http://en.wikipedia.org/wiki/Encyclopedic_dictionary, August 11, 2007
[3] Taken from the online thesaurus at http://www.merriam-webster.com/.





This text can be quoted and accessed from this url:

http://www.grin.com/e-book/91614/computing-semantic-relatedness-an-overview