Scholarly Paper (Advanced Seminar), 2007, 21 Pages
Author: B.A. Niels Ott
Subject: Computer Science - Miscellaneous
Details
Institution/College: University of Tubingen (Seminar für Sprachwissenschaft)
Tags: Computing, Semantic, Relatedness, Overview, Text-oriented, Applications, Computational, Linguistics, wordnet, germanet, lexical-semantic networks, semantische netze, wikipedia, rada, lesk, resnik, cilibrasi, vitanyi, google, strube, ponzetto, dictionary, thesaurus
Year: 2007
Pages: 21
Grade: 1,0
Bibliography: ~ 30 Entries
Language: English
ISBN (E-book): 978-3-638-05182-8
File size: 274 KB
Abstract
To give a quick outlook on the paper: firstly, we discuss the fundamentals required for computing semantic relatedness. After a definition of semantic relatedness as such, several example applications are presented. Then an introduction to linguistic resources used in the field is given. Secondly, we present a number of selected algorithms that compute semantic relatedness. These serve as examples for the usage of several linguistic resources such as dictionaries, lexical-semantic networks, and measures using the WWW. Thirdly, we discuss how the quality of methods for computing semantic relatedness can be assessed by comparison to human performance. Finally, an example study is presented.
Excerpt (computer-generated)
Computing Semantic Relatedness: An Overview
Niels Ott <niels@drni.de>
Computing Semantic Relatedness:
An Overview
Niels Ott
September 16, 2007
Pages: 20
Course: ISCL MA
Course Semester: 1st
Seminar: Text-oriented Applications of Computational Linguistics
Semester: Winter 2006
SemanticRelatedness.tex September 16, 2007
page 1 of 20
Contents
1. Introduction
2. Fundamentals for Computing Semantic Relatedness
2.1. Definitions of Relatedness, Similarity, and Distance
2.2. Applications of Semantic Relatedness
2.3. Linguistic Resources
2.3.1. Dictionaries and Thesauri
2.3.2. Lexical-semantic Networks
2.3.3. GermaNet
3. Selected Algorithms
3.1. An Early Approach by Lesk: Gloss Overlaps
3.2. Rada et al.: Following Paths in a Taxonomy
3.3. Resnik: Following Paths and Using Information Content
3.4. Cilibrasi and Vitanyi: Querying Google
3.5. Strube and Ponzetto: Exploiting Wikipedia
4. Evaluation Strategy
4.1. Gold Standards, Human Agreement, and The Correlation Coefficient
4.2. An Example Study
5. Concluding Remarks
A. Tables
A.1. Relations in GermaNet
A.1.1. Conceptual Relations
A.1.2. Lexical-semantic Relations
B. List of Abbreviations
1. Introduction
As humans, we have extensive and intuitively accessible knowledge about the
words we use for describing the world. If a friend says that the dinner at the restaurant
was disgusting, we don't need to ask whether he would describe the evening as being a
pleasant experience. When we say that we lend someone money, we know that someone
borrows money from us. The person who is sitting at the wheel in the car is the driver.
The car has parts such as tires, a trunk, and an engine. Apples can be sour or sweet.
Alcohol makes people drunk.
Trivial, isn't it? All of this works fine because we have knowledge and our intuitive
means of using it. Computers are not as smart as we are. Even if equipped with
knowledge, they need to be programmed in a way that enables them to relate the facts
they are fed with. After that they can assist users in a better way because they now
have an understanding of how the world (or rather: its lexical representation) works,
or at least this is what we are hoping for. Semantic Relatedness is one important
concept that allows computers to make use of knowledge.
To give a quick outlook on the paper: firstly we discuss the fundamentals required
for computing semantic relatedness. After a definition of semantic relatedness as such,
several example applications are presented. Then an introduction to linguistic resources
used in the field is given. Secondly, we present a number of selected algorithms that
compute semantic relatedness. These serve as examples for the usage of several linguistic
resources such as dictionaries, lexical-semantic networks, and measures using the WWW.
Thirdly, we discuss how the quality of methods for computing semantic relatedness can
be assessed by comparison to human performance. Finally, an example study is presented.
2. Fundamentals for Computing Semantic Relatedness
2.1. Definitions of Relatedness, Similarity, and Distance
The notion of Semantic Relatedness is surrounded by the terms Semantic Similarity
and Semantic Distance. Resnik (1995) tries to distinguish the first two of
those as follows: "Semantic similarity represents a special case of semantic relatedness:
for example, cars and gasoline would seem to be more closely related than say, cars and
bicycles, but the latter pair are certainly more similar." Lin (1998) defines semantic
similarity as being based on three intuitions: 1) Two objects are more similar the more
commonality they share. 2) Two objects are less similar the more different they are. 3)
The similarity of two objects is maximal if they are identical.
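The three intuitions can be checked against a very simple stand-in measure. The sketch below uses plain set overlap (the Jaccard coefficient) over feature sets; this is not Lin's information-theoretic measure, and the feature sets are hypothetical values chosen only to illustrate the behaviour.

```python
def overlap_similarity(features_a, features_b):
    """Jaccard coefficient of two feature sets, a value between 0 and 1."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical feature sets (assumptions, not taken from any resource).
car = {"vehicle", "wheels", "engine", "carries people"}
bicycle = {"vehicle", "wheels", "pedals", "carries people"}
gasoline = {"liquid", "fuel", "flammable"}

print(overlap_similarity(car, bicycle))   # shared features raise the score
print(overlap_similarity(car, gasoline))  # no shared features
print(overlap_similarity(car, car))       # identical objects give the maximum
```

Under these assumptions, car and bicycle score higher than car and gasoline, and an object compared with itself reaches the maximum of 1.0.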
A more straightforward definition is given by Budanitsky (1999, p. 3), who also refers
to similarity as a special case of relatedness. As the name suggests, semantic similarity
aims to express how similar two objects are. This can be defined via the relations of
hyponymy and synonymy. Hyponymy is also known as the Is-A relation, e.g. 'a car Is-A
vehicle.' Semantic relatedness extends this concept to a wider range of relations: not
only their opposites, hypernymy and antonymy, but also meronymy (Part-Of), entailment,
causation, or simply association (See-Also) can be used to express semantic relatedness,
to name just a few.
According to Budanitsky (1999, p. 3) semantic distance represents the opposite to
both relatedness and similarity: if two words are very distant, they are at the same time
neither related nor similar.
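The difference between the three notions can be pictured with a small network of typed relations. The following is a minimal sketch, not one of the published measures: it counts edges on the shortest path as a crude distance, and it restricts the path to Is-A and synonymy edges when similarity is meant. All edges except 'car Is-A vehicle' are assumptions made for the example, as are the names `EDGES`, `SIMILARITY_RELATIONS`, and `path_length`.

```python
from collections import deque

# Toy network of (source, relation, target) triples. Only "car is_a vehicle"
# comes from the text; the other edges are illustrative assumptions.
EDGES = [
    ("car", "is_a", "vehicle"),
    ("bicycle", "is_a", "vehicle"),
    ("car", "association", "gasoline"),
    ("car", "has_part", "engine"),
]

# Relations that count towards similarity; relatedness may use all of them.
SIMILARITY_RELATIONS = {"is_a", "synonym"}


def path_length(start, goal, allowed=None):
    """Return the number of edges on the shortest path, or None if unreachable.

    Edges are followed in both directions, so is_a also covers its inverse.
    If `allowed` is given, only edges with those relation labels are used.
    """
    neighbours = {}
    for source, relation, target in EDGES:
        if allowed is not None and relation not in allowed:
            continue
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(path_length("car", "gasoline"))                        # all relations
print(path_length("car", "gasoline", SIMILARITY_RELATIONS))  # Is-A/synonymy only
print(path_length("car", "bicycle", SIMILARITY_RELATIONS))
```

In this toy network, car and gasoline are one step apart when every relation is allowed but unreachable via Is-A edges alone, while car and bicycle are connected through their shared hypernym. This mirrors the cars, gasoline, and bicycles example quoted from Resnik (1995).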
Throughout this paper we use the term Semantic Relatedness according
to the definition given above. Exceptions to this commitment are made where necessary,
e.g. in the algorithm section (3) for those methods that do not yield a relatedness
measure.
It is important to mention that semantic relatedness is meant to exist between objects,
or rather between the word senses that describe objects. The relatedness of washer and
glass may be high if washer refers to a dishwasher. However, it is rather low if washer
refers to a spacer disk. Therefore, we stick to the terms Word Sense or Concept in
this paper, which we use interchangeably most of the time. However, in a stricter
view, a concept can be expressed by numerous word senses. In other words, a concept
is represented by a set of one or more word senses.
2.2. Applications of Semantic Relatedness
A rather mundane statement: semantic relatedness can be used everywhere in natural
language processing (NLP) where the meaning of words becomes relevant.
One of the countless research areas that can strongly benefit from semantic relatedness
is word-sense disambiguation (WSD). As humans we are not confused by the word organ
meaning both a musical instrument and a body part. We intuitively know which is which
in phrases like 'Jon Lord plays the organ' or 'the surgeon transplants the organ.'
According to Stevenson and Wilks (2003, p. 252), Lesk (1986) was the first to present
an approach to WSD that made use of a machine-readable dictionary. We consider it
also the first approach that made use of any kind of knowledge or linguistic resource
(more on resources in section 2.3). Lesk (1986) furthermore pioneered another concept
which is nowadays basic to many WSD approaches: the context of the word in question
is analyzed in order to obtain clues about the actual sense of the word. A detailed
description of his work is given in section 3.1. Most WSD systems make use of semantic
relatedness or similarity to solve their task (Vossen, 2003, p. 476). Considering the
example above, the relatedness of organ[instrument] and transplants is likely to be much
lower than the one of organ[body part] and transplants.
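The idea of choosing a sense by its relatedness to the context can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Lesk (1986) or of any particular WSD system: the function `disambiguate`, the scoring table `TOY_SCORES`, and all numbers in it are hypothetical and made up for the example.

```python
def disambiguate(senses, context_words, relatedness):
    """Pick the sense with the highest total relatedness to the context."""
    def total(sense):
        return sum(relatedness(sense, word) for word in context_words)
    return max(senses, key=total)


# Hypothetical scores in [0, 1]; they are invented for illustration only.
TOY_SCORES = {
    ("organ[instrument]", "plays"): 0.8,
    ("organ[body part]", "plays"): 0.1,
    ("organ[instrument]", "transplants"): 0.1,
    ("organ[body part]", "transplants"): 0.9,
}


def toy_relatedness(sense, word):
    # Pairs that are not listed count as unrelated.
    return TOY_SCORES.get((sense, word), 0.0)


SENSES = ["organ[instrument]", "organ[body part]"]
print(disambiguate(SENSES, ["surgeon", "transplants"], toy_relatedness))
# prints: organ[body part]
```

Any relatedness measure with the same call signature could be passed in place of `toy_relatedness`; the selection step itself does not depend on how the scores are obtained.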
Learners of foreign languages know that translation can be a tricky task. Sticking to
the organ example shows that there are (at least) two words in German for it: Orgel
refers to organ[instrument] and Organ to organ[body part]. The other direction exists as well:
Germans go to Himmel ('Heaven') when they die, but it is also the Himmel ('sky') that
can be blue. It is not surprising that machine translation (MT) was one of the first
fields of NLP to come across the WSD issue around fifty years ago (Stevenson and Wilks,
2003, p. 250).
Being situated somewhere in between information retrieval (IR) and WSD, Kruse
et al. (2005) present a wrapper around the Google search engine. Their initial example
is that querying for Java returns many hits on programming but none about the island
by that name or the coffee that is produced there. They present a system that looks up
the query word in a semantic resource and then asks the user to choose a word sense
that he or she might have had in mind. After that, the query to Google is expanded
by relevant context words in order to produce more accurate results. Alternatively, the
query is submitted unchanged but the results from Google are ranked according to their
relevance for the given word sense instead of the given word. They claim that their
so-called pre-filter shows promising results.
Entry Word: desire
Function: noun
Text: a strong wish for something <a desire for adventure and excitement prompted him to travel to Africa>
Synonyms: appetite, craving, drive, hankering, hunger, itch, longing, lust, passion, pining, thirst, urge, yearning, yen
Related Words: compulsion, impulse, urge, will, zeal; liking, love, taste; eagerness, impatience; wish, want; necessity, need, requirement; avarice, cupidity, greed, rapacity
Near Antonyms: abhorrence, aversion, disfavor, disgust, dislike, distaste, hatred, repugnance, repulsion; apathy, indifference, unconcern
Figure 1: Merriam-Webster Thesaurus entry for desire
The existence of the Google wrapper leads to the following question: should the people
at search engine companies develop something smarter? Müller and Gurevych (2006)
report that many attempts to make IR systems more intelligent have had limited success
so far. The general assumption that the use of lexical-semantic information improves IR
systems does not hold. The authors use semantic relatedness as a means to judge how
relevant a document is to the query. In other words, their system does not look up query
words in the investigated documents but it measures the relatedness of the words in the
documents and the query words. They conclude that the use of semantic relatedness in
IR "has the potential to outperform a traditional bag-of-words approach [...]".
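Ranking documents by relatedness instead of by literal word matches could look roughly like the sketch below. The scoring rule (averaging, over the query words, the best relatedness value found in the document) is an assumption chosen for brevity and is not taken from Müller and Gurevych (2006); the names `document_score`, `rank`, `TOY_PAIRS`, and the score 0.9 are likewise hypothetical.

```python
def document_score(query_words, document_words, relatedness):
    """Average, over the query words, of the best match found in the document."""
    if not query_words or not document_words:
        return 0.0
    best = [max(relatedness(q, d) for d in document_words) for q in query_words]
    return sum(best) / len(best)


def rank(query_words, documents, relatedness):
    """Return document ids sorted from most to least relevant."""
    return sorted(
        documents,
        key=lambda doc_id: document_score(query_words, documents[doc_id], relatedness),
        reverse=True,
    )


# Hypothetical relatedness values; unlisted pairs count as unrelated.
TOY_PAIRS = {frozenset({"car", "automobile"}): 0.9}


def toy_relatedness(word_a, word_b):
    if word_a == word_b:
        return 1.0
    return TOY_PAIRS.get(frozenset({word_a, word_b}), 0.0)


DOCUMENTS = {
    "doc_b": ["apple", "banana"],
    "doc_a": ["automobile", "engine"],
}
print(rank(["car"], DOCUMENTS, toy_relatedness))
# prints: ['doc_a', 'doc_b']
```

Neither toy document contains the query word car, so a pure bag-of-words match would not separate them; with the assumed relatedness values, the document mentioning automobile is ranked first.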
2.3. Linguistic Resources
2.3.1. Dictionaries and Thesauri
"Dictionaries are perhaps the resource most readily associated with linguistic knowledge
in people's minds," Budanitsky (1999, p. 5) writes. However, only a few sentences later
in his work it becomes clear that there is more to it. A Dictionary consists of words
that are defined by other words. Or: headwords are defined by other headwords plus
maybe some stopwords or other symbols. There are other kinds of dictionaries as well.
Wikipedia¹ names bilingual dictionaries, dictionaries with specialized domain coverage, and
even glossaries. The "Dictionary of Linguistics and Phonetics" (Crystal, 2003) merely
consists of headwords associated with text interwoven with other headwords, which
can be called an encyclopedic dictionary.² If not stated otherwise, we refer to a set of
1:n relations from headwords to other headwords when speaking of a Dictionary in
this paper.
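This view of a dictionary as a set of 1:n relations from headwords to other headwords can be made concrete with a small sketch. The miniature dictionary, the stopword list, and the function name `build_network` are assumptions made for illustration; real dictionaries would additionally need tokenization and lemmatization, which are left out here.

```python
STOPWORDS = {"a", "an", "the", "of", "for", "that", "or", "with"}

# Made-up miniature dictionary; the definitions are illustrative assumptions.
DEFINITIONS = {
    "car": "a vehicle with an engine",
    "vehicle": "a machine for transport",
    "engine": "a machine that produces motion",
    "machine": "a device with moving parts",
}


def build_network(definitions, stopwords=STOPWORDS):
    """Map each headword to the set of other headwords used in its definition."""
    headwords = set(definitions)
    network = {}
    for headword, text in definitions.items():
        tokens = set(text.lower().split())
        # Keep only tokens that are headwords themselves; drop stopwords
        # and self-references.
        network[headword] = (tokens & headwords) - stopwords - {headword}
    return network


for headword, targets in build_network(DEFINITIONS).items():
    print(headword, "->", sorted(targets))
```

Each key with its set of targets is one 1:n relation; read as a whole, the mapping is a directed graph whose edges carry no label yet, since a plain dictionary does not say what kind of relation holds between a headword and the words defining it.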
This set of relations allows the construction of a simple Network (Budanitsky, 1999,
p. 6). While this is not a traditional view on dictionaries, it is important to semantic
relatedness algorithms. Moreover, it would be good to know about the nature of those
relations or edges in that network. The next, more precise resource in that respect is
the Thesaurus. An example from a modern thesaurus by Merriam-Webster³ is shown
in figure 1. While this thesaurus also has components of a dictionary, namely a line with
an explanation and an example of the word, the main part of the entry looks rather
1 http://en.wikipedia.org/wiki/Dictionary August 11, 2007
2 http://en.wikipedia.org/wiki/Encyclopedic_dictionary August 11, 2007
3 Taken from the online thesaurus at http://www.merriam-webster.com/.