Ontology-Based Answer Selection in Dialog Systems

Thesis (M.A.), 2006

135 Pages, Grade: 1,1



List of Figures

List of Tables


1 Introduction
1.1 Objective and Research Questions
1.1.1 Project Goal
1.2 Structure of this Work

I General Linguistics and Research Background
2 Linguistics Background
2.1 Human Computer Interaction
2.2 Natural Language Processing
2.3 Dialog and Discourse
2.3.1 Dialog as Discourse
2.3.2 Dialog as a Teleological Activity
2.3.3 Dialog as a Collaborative/Cooperative Activity
2.3.4 Speech Utterances in a Dialog
2.4 Dialog Systems
2.4.1 Evolution
2.4.2 Field of Application
2.4.3 Architecture and Components
2.4.4 Dialog Management and Important Strategies
2.4.5 Design Criteria
2.5 Metadata and Knowledge Representation
3 Semantic Web, SmartWeb, and Ontologies
3.1 Semantic Web
3.2 SmartWeb
3.3 Semantic Similarity Measures
3.4 Ontology
3.4.1 General
3.4.2 Ontologies as a Base of Modeling
3.4.3 Ontologies for Communication
3.4.4 Ontologies as Conceptual Models
3.4.5 Important Aspects/Advantages
3.4.6 RDF and RDFS
3.4.7 Ontology-Languages
3.4.8 Challenges
4 State of the Art of Answer Selection
4.1 Question Answering
4.2 Answer Selection

II Selecting Answers with Ontologies
5 Underlying Algorithms and Corpora
5.1 OntoScore
5.1.1 OntoScore Algorithm
5.2 AnswerScore
5.3 Agents and WrapperAgents
5.4 Jena
5.4.1 Jena API
5.5 Used Ontology Technology
5.5.1 EMMA
5.5.2 DOLCE
5.5.3 SUMO
5.5.4 Sport Event
5.5.5 SWIntO
6 Evaluating Semantic Relatedness for Answer Selection
6.1 OntoScore Study
6.1.1 OntoScore Analysis Part 1
6.1.2 Evaluation Score
6.1.3 OntoScore Analysis Part 2
6.2 Conclusion
7 Semantic Density Approach
7.1 ProperScore
7.1.1 General Idea
7.1.2 ProperScore Algorithm
7.2 Implementation
7.3 Results
8 Conclusion
8.1 Conclusion
A Appendix A
A.1 First Data Set
A.1.1 Question Number One
A.1.2 Answer Number One
A.2 Second Data Set
A.2.1 Question Number One
A.2.2 Answer Number One
B Appendix B
B.1 Protégé Screenshots
B.1.1 Sport Event
B.1.2 SWIntO

List of Figures

2.1 Serial architecture of a dialog system

2.2 Integrated architecture of a dialog system

3.1 Architecture of the Semantic Web [from Berners-Lee [5]]

3.2 Draft of Relevant Parts of the SmartWeb System

3.3 Draft of the Answer Candidate Selection Challenge

5.1 Taxonomy of DOLCE basic categories [from Oberle et al. [40]]

5.2 Taxonomy of SUMO [from Oberle et al. [40]]

5.3 Overview of SWIntO and SmartSUMO as UML package diagram [from Oberle et al. [40]]

6.1 Chart of the OntoScore Original Approach

6.2 Chart of the OntoScore OS1 Approach

6.3 Chart of the OntoScore OS1a Approach

6.4 Chart of the OntoScore OS1b Approach

6.5 Chart of the OntoScore OS1c Approach

6.6 Chart of the OntoScore OS2 Approach

6.7 Chart of the OntoScore OS2a Approach

6.8 Chart of the OntoScore OS2b Approach

6.9 Chart of the OntoScore OS1a Approach (Part 2)

6.10 Chart of the OntoScore OS1b Approach (Part 2)

6.11 Chart of the OntoScore OS1c Approach (Part 2)

6.12 Chart of the OntoScore OS2 Approach (Part 2)

6.13 Chart of the OntoScore OS2a Approach (Part 2)

6.14 Chart of the OntoScore OS2b Approach (Part 2)

7.1 Chart of the PropertyScore Approach (First Data Set)

7.2 Chart of the PropertyScore Approach (Second Data Set)

7.3 Chart of the InstanceScore Approach (First Data Set)

7.4 Chart of the StatementScore Approach (First Data Set)

7.5 Chart of the ProperScore Approach (First Data Set)

B.1 Protégé Screenshot of the Sport Event Ontology

B.2 Protégé Screenshot of the SWIntO Ontology

List of Tables

3.1 Improved N3 Example of the Given Statement

6.1 Analysis of OntoScore: Overview

6.2 Analysis of OntoScore

6.3 Analysis of OntoScore 1

6.4 Analysis of OntoScore 1a

6.5 Analysis of OntoScore 1b

6.6 Analysis of OntoScore 1c

6.7 Analysis of OntoScore 2

6.8 Analysis of OntoScore 2a

6.9 Analysis of OntoScore 2b

6.10 Analysis of OntoScore Ranking

6.11 Example Table of the Evaluation Ranking Score

7.1 Overview of the Final Evaluation Scores (ES)


First, I thank Hans-Peter Zorn for his excellent supervision of this thesis. His inspiration, comments, and suggestions were invaluable and contributed greatly to this thesis.

My special thanks go to Vanessa Micelli for her support and advice, our discussions, and her priceless proofreading. Without her help this thesis would not look the way it does.

I want to extend my thanks and appreciation to PD Dr. Karin Haenelt for accepting this thesis and for her great advice. Further, I wish to thank Prof. Dr. Kurt Eberle for agreeing to provide the second review.

I would also like to thank Robert Porzel for the many fruitful discussions we had during that time and Hidir Aras for providing me with data and ideas. Furthermore, I want to thank the European Media Laboratory GmbH for providing technical devices and the research background for this thesis. In this regard, I thank all EML staff members who have supported the completion of my thesis.

Finally, my warmest and most sincere thanks go to my family for their endless encouragement, patience, and love. Moreover, without their implicit trust in me, their help, and their support throughout the past years, this thesis would probably not exist.


Nowadays our daily life is more and more affected by computers, chips, electronic equipment, et cetera. Owing to this increase of technology, even in common everyday objects such as a fridge, it is necessary to find a simple and intuitive way to interact with complex technology. Natural language dialog systems could be the solution, or at least a part of it, to the question of how humans and machines can communicate or interact with each other. With the help of dialog systems people could access the information and technical functionality of computers in a natural way using linguistic input and output. One of the main tasks of such dialog systems is to provide fast and appropriate answers to users’ questions or requests. The challenge therein is how to find these answers in the flood of information.

1.1 Objective and Research Questions

In this thesis an ontology-based answer selection algorithm for dialog systems is developed, implemented, and evaluated. The main focus of this work is the calculation of ’semantic coherence’ and the ranking of semantically represented queries and answers. Since this represents a broad field of research, the interest is restricted at first to only one domain. Therein, the existing techniques are tested for their effectiveness and performance. In addition, a new algorithm, or rather an improvement, is developed and implemented. With the help of this algorithm the semantic information, the knowledge gained from this ’coherence analysis’ of the query, and the valuation of the obtained recall of the database are taken into account for a better rating of the possible answer hypotheses. These could contribute to a better answer generation and a better and more specific reply. Such an answer could convince the users of a better speech understanding, which in turn would increase their satisfaction and result in more confidence in the system.

1.1.1 Project Goal

One important task of dialog systems in general and question answering systems in particular is to provide users with optimal answers. Whether users decide to use a system depends on the benefit and advantage they are able to gain from it. To generate good answers, one has to determine the best possible answer hypothesis for a question from the web, a database, or other information sources.

Nowadays, it is not trivial to detect the correct answer for a given question. For this reason, the aim of this thesis is the development of an acceptable algorithm for ontology-based answer selection in dialog systems.

Scope

The scope of this thesis is a description of the results of my investigation of answer selection and, as a consequence, of the improved algorithm, or rather system component, for one exemplary dialog system. My analyses are confined to the state of the art of ontology-based answer selection and one concrete application. I have limited my analysis to the system used in the SmartWeb project at the European Media Laboratory GmbH (EML), since the complexity of developing a universally valid algorithm for all dialog systems is too high. Furthermore, I confined my study to the answer selection component and use the functionality implemented by the SmartWeb dialog system that is necessary for answering questions. In the SmartWeb dialog system an algorithm called OntoScore is already used for semantic coherence and semantic relatedness analyses. Therefore, the following two research questions define the scope of this thesis:

1. Is it possible to select the appropriate answer to a given question with the OntoScore algorithm?

2. If the OntoScore algorithm is not able to do this, is there an algorithm that enables answer selection?

1.2 Structure of this Work

This work is structured in two main Parts. The first Part, ’General Linguistics and Research Background’, introduces the background of this thesis and elucidates some selected concepts and terms as well as the state of the art of ontology-based answer selection. Chapter 2 covers the general linguistics background and, therewith, also the background of research. It is an introduction to human computer interaction, natural language processing, dialog, and discourse, which is necessary to understand the concept of dialog systems. With this introduction the reader’s attention is also drawn to the challenges of dialog systems and human computer interaction in a ’discourse’ manner. In addition, the idea of metadata and knowledge representation is briefly explained.

Chapter 3 is more or less an extension of Chapter 2, but the topics Semantic Web, SmartWeb, and ontologies deserve a special and more precise description, because they do not belong only to the linguistic background. Furthermore, the state of the art of semantic measures is elucidated.

In Chapter 4 I discuss the state of the art of answer selection. I also clarify the term ’answer selection’ and the differences in its usage, and define how the term is understood in the present work. Another aspect of this Chapter is an explanation of the title of this work: “Ontology-based answer selection in dialog systems”.

In the second Part of this work, called ’Selecting Answers with Ontologies’, the corpora and technologies used as well as the analysis and implementation of my approach to answer selection are introduced. Chapter 5 first explains the algorithms OntoScore and AnswerScore, which are used in a current project at the European Media Laboratory GmbH (EML). Then the concept of wrapper agents and the Jena framework are introduced. Finally, I elucidate EMMA and the ontologies used: DOLCE, SUMO, Sport Event, and SWIntO.

Chapter 6 introduces my evaluation of semantic relatedness in answer selection. Therein, both parts of my OntoScore study are explained. Furthermore, my EvaluationScore, a ranking score, is introduced. At the end of this Chapter the results of the study are presented.

These results led to the idea and implementation of my new approach in Chapter 7. Therein, I describe my semantic density approach, called ProperScore, and the implementation of a statement slot-filler for answer selection. In addition, the overall results, which show the ranking scores for all implemented approaches, are presented at the end of this Chapter.

In Chapter 8 I summarize the findings of my thesis. After a short recapitulation, my results are briefly presented and future work is outlined.

The following list presents a short overview of all Chapters in this work:

Chapter 1: The first Chapter of this work gives the reader an introduction to the topic, purpose, and aim of this Magister thesis.

Part I: General linguistic and research background

Chapter 2: The second Chapter gives an overview of the general linguistics background, such as human computer interaction, natural language processing, dialog, dialog systems, and metadata.

Chapter 3: The third Chapter briefly discusses the Semantic Web, SmartWeb, semantic similarity measures, ontologies, and RDF/RDFS.

Chapter 4: Herein the state of the art of question answering and answer selection is described.

Part II: Selecting Answers with Ontologies

Chapter 5: Chapter five gives the reader more information about the data, algorithms, and ontologies used within the SmartWeb project in general and the answer selection part of the dialog system in particular. Furthermore, the Jena framework is introduced.

Chapter 6: The sixth Chapter is a description of my OntoScore algorithm study. Two OntoScore analyses are elucidated and the evaluation score with which the semantic relatedness is evaluated is introduced. The results of these analyses are also explained therein.

Chapter 7: Chapter seven introduces my new semantic density approach, which is called ProperScore. Therein, the idea, the implementation, and the evaluation are described.

Chapter 8: The final Chapter eight gives a recapitulation of this thesis. Furthermore, a summary presents all obtained results and gives a short outlook on future research.

Appendix A: Appendix A shows exemplary question-answer pairs of the first and the second data set.

Appendix B: Appendix B shows screenshots of the Sport Event and SWIntO ontologies taken with the Protégé tool.

Appendix C: The last Appendix C presents the contents of the included CD-ROM of this thesis.

Part I General Linguistics and Research Background

Chapter 2 Linguistics Background

Part I is an introduction to the general linguistic and research background of the present work.

Chapter 2 gives the reader a brief introduction to relevant research topics and concepts. At first some general information about human computer interaction is provided, which leads to the introduction of natural language processing. Thereafter, dialog and discourse (including discourse definitions, natural speech, and speech acts) are introduced. This gives the reader a short overview of the challenges a dialog system has to deal with. The following description of dialog systems briefly presents their essential aspects, such as evolution, field of application, architecture, and design criteria. The dialog system section also shows where my research interest is located within a dialog system. At the end of this Chapter metadata and knowledge representation are described in order to make the further work and, above all, the introduction to ontologies and RDF in Chapter 3 easier to understand.

2.1 Human Computer Interaction

As the name already suggests, human computer interaction (HCI) is about any interaction between humans and computers.

One of the main focuses of this field of research is to find an intuitive and very easy way that guarantees a successful interaction. Common approaches are interfaces through which the user can communicate with the system, that is, the computer.

Humans are used to interacting with each other via natural language. Of course, they use other ways of communication, for example gestures, facial expressions, et cetera, but speech is one of the most intuitive and at the same time most complex forms of interaction [Souvignier et al. [50]].

Although HCI is still developing methods that can improve interactions with graphical, traditional (textual), or spoken input, multi-modal interfaces become more and more important. For a successful usage of these interfaces an included natural language interface (NLI) is more or less indispensable, according to Androutsopoulos et al. [1].

According to Odgen and Bernick [41], humans already have elaborate communication skills because of their own native or natural language. Therefore, there are many who believe that NLIs “can provide the most useful and efficient way for people to interact with computers.” [Odgen and Bernick [41], page 137]

“The view that interactive sessions with NLIs are instances of cooperative problem solving behavior offers a more useful perspective not only on interaction with a database but on human-machine interaction in general.” [Perault and Grosz [42], page 77]

But Odgen and Bernick [41] already observed that earlier NLIs had the same problems as all other interfaces. One reason could be the state of such systems, but another reason is the expectation of the user. The better NLIs become, the more users expect of them, such as common world knowledge and inferences.

“The difficulty is that to expand the front end so that it could answer some of these questions would give a misleading impression to the user about the capabilities of the system as a whole.” [Copestack and Sparck-Jones [13], page 230]

Whether a computer is experienced as a social object in human computer communication is still a subject of ongoing research.

As mentioned before, one of the essential parts of human computer interaction is the use of natural language. One way of transferring natural language into computer-understandable forms is natural language processing, which is explained in the next section.

2.2 Natural Language Processing

As a subfield of linguistics and artificial intelligence (AI), natural language processing (NLP) is, according to Barr and Feigenbaum [3], about transforming and handling natural language data in such a way that both humans and computer systems can understand it and work with it.

The biggest challenge in NLP is natural language understanding (NLU), which is still not sufficiently defined and, therefore, an ongoing topic in today’s research. Beside NLU there are several other tasks that belong to NLP, such as automatic summarization, information retrieval, information extraction, machine translation, natural language generation, speech recognition, text-to-speech transformation, text-proofing, and also question answering.

“There are, however, a number of challenges for speech technology, in particular, the issue of robust speech recognition in noisy environments such as a car, requiring techniques such as echo cancellation noise subtraction and other noise reduction techniques.” [McTear [37], page 10]

An important part of NLP in general and NLU in particular is the analysis of dialogs and discourses which is described in the following section.

2.3 Dialog and Discourse

To clarify what a dialog system is, it is necessary to define the notion of dialog.

“Dialogue is at the same time the most fundamental and broadly used form of language, as well as the most complex one. And most of all, dialogue is the most natural medium of communication for human beings.” [Tsovaltzi et al. [56], page 1]

According to McTear [37], dialog in everyday speech is a process in which different sides exchange opinions or meanings, mostly purpose-driven in order to solve problems or disagreements. As a rule, two or more speech partners, in the majority of cases humans, take part in a dialog.

Of course there are various kinds of dialog. A dialog which is mainly of informal nature, with the aim to establish and cherish social contacts, is often called “conversation” in the literature. The terms “conversation” and “dialog” are often used as synonyms. Accordingly, systems with human-like conversation competences are called advanced dialog systems. Research on dialog is still a large and interdisciplinary area. According to McTear [37], the four most important aspects of dialog are the following: dialog as discourse, dialog as a teleological activity, dialog as a collaborative or cooperative activity, and speech utterances in a dialog.

2.3.1 Dialog as Discourse

A discourse consists of at least two turns, one from each speaker. A coherent discourse can show discourse or speech phenomena analogous to those of text coherence, for instance anaphora, definite expressions, thematic insertions, and so forth. In order to interpret these phenomena correctly, the discourse context as well as general knowledge have to be factored into the analysis. Many of the approaches that are necessary for discourse or dialog understanding originate from the field of pragmatics, sometimes also called “computer pragmatics”, as pointed out in the following.

2.3.2 Dialog as a Teleological Activity

According to McTear [37], dialog is always a teleological activity. Often a dialog is also a social action with the purpose of establishing (cherishing, breaking up, and so on) social relationships. Furthermore, dialogs are governed by specific intents, such as information acquisition and other linguistic acts. The interpretation of these intentions, that is, of the dialog purposes, is indispensable for a correct understanding. Interpretation approaches derive from pragmatics, in which the intention of a speech act takes center stage in the analysis. In this regard the following subsections exemplify important borrowings from pragmatics.

Speech Acts

In 1962 the philosopher Austin introduced illocutionary acts in contrast to the truth conditions of speech utterances that had been common until then, according to Jurafsky and Martin [29]. In 1969 Searle augmented this with the required definition of conditions, resulting in the nowadays well-known speech act theory, which Allan transformed in 1995 into his BDI (belief, desire, and intention) approach. In speech act theory there are speech acts such as questions, orders, et cetera, which belong to different speech act classes, like assertives, directives, commissives, expressives, and declarations.

Speech acts, which run like prefabricated patterns or schemata, are determined by rules or conventions. They include the following four speech act components: locution, proposition, illocution, and perlocution.

The locutionary act describes the utterance with its particular meaning, while the proposition refers to the literal interpretation of the things and facts of the context, that is, of the surroundings and circumstances. The illocutionary act delineates how the speech act is meant as an interaction, for example asking, answering, et cetera, in uttering a sentence. The perlocutionary act refers to the aim or purpose of the speech act, which is also called the intention of the speaker.

Performative expressions make the illocution of the speaker explicit. A speech act can succeed if the illocutionary act is understood and all pragmatic conditions are observed. If the speaker is able to achieve his perlocution, the speech act works.
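The four speech act components described above can be written down as a small data structure. The following Python sketch is only an illustration and not part of the SmartWeb system; the class name, the field values, and the simplified success check (which merges understanding of the illocution and achievement of the perlocution into one test and ignores further pragmatic conditions) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechAct:
    """One utterance described by the four speech act components."""
    locution: str     # the utterance itself with its particular meaning
    proposition: str  # literal content about things and facts of the context
    illocution: str   # how the act is meant as an interaction (asking, ...)
    perlocution: str  # the aim or purpose the speaker pursues


def succeeds(act: SpeechAct, understood_illocution: str, achieved_effect: str) -> bool:
    """Simplified check: the hearer understood the illocution and the
    speaker achieved the intended perlocution."""
    return (understood_illocution == act.illocution
            and achieved_effect == act.perlocution)


# Invented example utterance, used only for illustration.
example = SpeechAct(
    locution="Who won the final?",
    proposition="someone won the final",
    illocution="question",
    perlocution="speaker learns the winner",
)

print(succeeds(example, "question", "speaker learns the winner"))
print(succeeds(example, "order", "speaker learns the winner"))
```

The second call fails because the hearer interprets the question as an order, that is, the illocution is misunderstood.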

According to Jurafsky and Martin [29], speech acts are the basis for the dialog acts of Bunt [9] in 1994 and the conversational moves of Carletta et al. [11] in 1997.

Cooperation Principle

Another philosophical achievement is Grice’s cooperation principle (1975) and his conversational implicature. His principle postulates that the speakers in a dialog behave in a cooperative manner with regard to their general conversation intentions. In this connection Grice determined the following four maxims for his cooperation principle:

1. Maxim of quantity (information): The speaker should be exactly as informative as necessary for understanding.
2. Maxim of quality (credibility): The speaker should not say anything that he does not accept as true, that he considers wrong, or for which he lacks evidence.
3. Maxim of relevance (relation): The speaker should be relevant.
4. Maxim of manner (modality): The speaker should be perspicuous, that is, neither obscure nor ambiguous nor unnecessarily prolix, and should be orderly.

In summary, speech partners rely on their vis-à-vis complying with the maxims. Both partners know that the other assumes that they are doing this. This leads to the conversational implicature, which makes it possible to explain utterances and dialogs that at first sight violate the cooperation principle. In contrast to the conventional implicature, the uttered and the meant expressions diverge in the conversational implicature. The unexpressed or unuttered statements cannot be deduced from the uttered ones; therefore, no logical conventional inferences are possible.

2.3.3 Dialog as a Collaborative/Cooperative Activity

In a dialog two or more participants try to pursue their dialog targets by acting in a collaborative, cooperative way. This collaborative activity is based on Grice’s cooperation principle, which leads to a more pragmatic dialog approach. The speech acts are extended to dialog acts, whereby different dialog acts exist for the respective dialog domains. For the structure of dialogs in general and dialog acts in particular, the conversational implicature, turn-taking, and grounding play a decisive role in conversation analysis.

Turn-taking is understood as the change or rotation of the speaking role among dialog participants. Grounding is another strategy of dialog partners to clear up misunderstandings in a collaborative manner. Grounding means a knowledge alignment in which a common ground has to be established to avoid misinterpretations or errors.

2.3.4 Speech Utterances in a Dialog

The explanation of the term speech utterance is not trivial, even if the domain of the dialog is limited, because this term is often not defined precisely in the literature. There are big differences between written and oral dialogs, each of which is characterized by different kinds of utterances.

In spoken dialogs prosody, intonation, accent, and pauses are very important, whereas in written dialogs word selection, word position, and word structure (syntax) are more relevant. Speech utterances also include ellipses, ungrammatical sentences, and so forth. In spoken dialogs, moreover, phenomena such as false starts, hesitations, pauses, confirmation words, and incomplete utterances, sentences, or fragments can be detected. Above all, silence and pauses are by now regarded as speech utterances that need an interpretation. Other non-verbal elements like gestures, facial expressions, deixis, et cetera could play a role too. These are, of course, much more important for multi-modal dialog systems.

2.4 Dialog Systems

Today there is a multitude of approaches to dialog systems. At the same time there is a considerable number of domains and fields of application, for example command-and-control systems, natural language interfaces, finite state systems, frame-based systems, plan-based systems, simulated conversations, and so on, that determine the structure and design of dialog systems [McTear [37]]. Owing to current computing power and the achievements in speech recognition, almost all present dialog systems deal with natural spoken language. These are called spoken dialog systems (SDS) according to Glass [19] and are, therefore, the focus of this work.

“Research in spoken dialogue systems has been increasing steadily over the last decades due to growing interest and demand for human-machine interaction.” [Glass [19], page 8]

“Thus, the global goal is to develop spoken dialog systems with a natural and flexible dialog flow in which the user is unrestricted ...” [Souvignier et al. [50], page 51]

In order to understand the present state of the art, it is also important to have a look at former approaches, which are explained in the following.

2.4.1 Evolution

The evolution of natural language dialog systems began more or less after the first studies on automatic (machine) translation programs and with the works on natural language processing and artificial intelligence in the 1960s. Two different approaches can be recognized in the development of dialog systems. One is the theoretically motivated dialog approach of natural language processing and artificial intelligence, for example BASEBALL (1960) and STUDENT (1968). The opposite approach is simulated conversation, that is, human computer conversation via pattern matching or similar data-driven techniques, like ELIZA (1966).

Important works like Winograd’s SHRDLU (1972), Woods’s LUNAR (1972), Schank’s SAM and PAM (1975), and Colby’s PARRY (1975) improved these approaches.

Up to the late 1980s input and output were based on written natural language utterances. Thereafter research on spoken dialog systems began, driven by speech understanding research programs, for example those of ARPA (US) with systems like HARPY and HEARSAY. A reason for this late switch to spoken language was problems with speech recognition, work on which has led to the quite good speech recognizers in today’s systems. Since the 1980s big federal projects like DARPA (US) and SUNDIAL (EU) have advanced the development, according to Giachin and McGlashan [18]. In the 1990s the first commercial products were sold and individual research projects took over as the driving force of the deployment and commercialization of dialog systems, which is still ongoing.

2.4.2 Field of Application

For dialog systems there are numerous fields of application, like information systems (media, weather, means of transportation), NLIs for databases, assistance or help systems, learning systems, games, and entertainment systems.

Up to now dialog systems have mainly been developed within distinct and isolated application domains that include no common knowledge, or only in a very limited way. Most of the time these are information systems or support systems. As a basic principle, dialog systems can be used successfully wherever manual or visual interaction is either not possible, as for instance in a car, or not desirable, according to Kellner [30].

2.4.3 Architecture and Components

The architecture of spoken dialog systems can vary heavily, namely from so-called interactive voice response (IVR) systems to completely flexible natural language ’free form’ dialog systems.

“Interactive voice response (IVR) systems allow users to call a phone number and interact with a machine.” [Yu, Dong, Acero, Alex [64], page 661]

In first-generation IVR systems users could enter only a few words or numbers via DTMF (dual-tone multi-frequency) signaling on touch-tone telephones, with the help of strictly guided menus. Present-day systems trend towards mixed-initiative dialog systems, in which the user as well as the system can lead the dialog process. Moreover, so-called conversational user interfaces, which realize a coherent interaction of users with several different applications, become more and more important. Such systems should be able to recognize the user’s intention and map it onto their own application by exchanging context information beyond application borders. The user is thereby supported within the scope of a cooperative dialog involving the interaction of several applications.
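To make the DTMF input of first-generation IVR systems mentioned above more concrete, the following Python sketch maps telephone keys to their tone pairs. The frequency grid is the standard DTMF keypad layout; the function and variable names are chosen for this illustration only and do not come from any particular IVR product.

```python
# Standard DTMF keypad layout: each key is signaled by one row (low) and
# one column (high) frequency, both given in Hz.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

DTMF = {
    key: (ROW_FREQS[r], COL_FREQS[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def dtmf_frequencies(keys: str) -> list:
    """Return the (low, high) tone pair for every key in a dialed string."""
    return [DTMF[k] for k in keys]


# A menu choice such as "press 4, then 2" becomes two tone pairs.
print(dtmf_frequencies("42"))
```

A menu-driven IVR only has to decode these sixteen tone pairs, which is why its input is far more robust, but also far more restricted, than free spoken language.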

As a general rule the structure of a dialog system is a pipeline-like architecture in which the different components are normally passed through one after another, as Figure 2.1 illustrates.

User utterances are analyzed by means of speech recognition and speech understanding components and are transformed into a semantic representation. Taking the context into account, the dialog manager plans the system actions to be performed in order to produce an adequate output. With the help of the speech generation component the system output is turned into a textual representation, which enables the speech synthesis component to output natural language.

Of course, the respective components of a dialog system can be passed through more than once (cycles).
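The serial processing chain described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: every stage is a hypothetical placeholder function invented for this example (audio is stubbed as text, and the "semantic representation" is a plain dictionary), so it shows only the order in which the components are passed through, not how any real component works.

```python
def recognize(signal: str) -> str:
    """Speech recognition: signal -> word sequence (audio stubbed as text)."""
    return signal.lower().strip()


def understand(words: str) -> dict:
    """Speech understanding: word sequence -> semantic representation."""
    intent = "question" if words.endswith("?") else "statement"
    return {"intent": intent, "text": words}


def manage(semantics: dict, context: list) -> dict:
    """Dialog manager: plan the next system action using the context."""
    context.append(semantics)
    action = "answer" if semantics["intent"] == "question" else "acknowledge"
    return {"action": action, "turn": len(context)}


def generate(action: dict) -> str:
    """Speech generation: system action -> textual representation."""
    return f"[{action['action']}] turn {action['turn']}"


def synthesize(text: str) -> str:
    """Speech synthesis: text -> output (stubbed as a tagged string)."""
    return f"<speech>{text}</speech>"


def run_turn(signal: str, context: list) -> str:
    """Pass one user utterance through all components in serial order."""
    return synthesize(generate(manage(understand(recognize(signal)), context)))


# Each user turn is one cycle through the pipeline; the context persists.
context = []
print(run_turn("Who won? ", context))
print(run_turn("Thanks", context))
```

Calling `run_turn` repeatedly with the same `context` list corresponds to the cycles mentioned above: the components are passed through once per turn while the dialog manager accumulates the dialog history.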

Beside the serial architecture there is also an integrated architecture in which a complex management component is controlling all other components this can be seen in Figure 2.2.

This integrated structure is intended to solve the error handling and correction problems of the serial architecture through fast and efficient answer generation. The advantages and drawbacks depend on the application domain, that is, on the kind of dialog system itself. The following six components are an essential part of every dialog system, and of every SDS in particular: speech recognition, speech understanding, dialog manager, communication with external systems (database), answer/speech generation and speech output. Additional components and functions can be added to improve the system, as is also shown by way of example in Figure 2.1.

illustration not visible in this excerpt

Figure 2.1: Serial architecture of a dialog system

illustration not visible in this excerpt

Figure 2.2: Integrated architecture of a dialog system

Speech Recognition

Automatic speech recognition (ASR) transforms a speech signal into a sequence of words with the help of acoustic and phonetic parameters. Depending on the SDS design, a multitude of challenging problems arise that are caused by natural language.

According to McTear [37], linguistic factors include, for example, coarticulation (context dependence of phoneme pronunciation), segmentation problems and accentuation variants (prosody). In addition, non-linguistic factors such as inter-speaker variability, intra-speaker variability, ’channel’ variability (the transmission channel, such as the telephone) and background noise cause several problems. A special challenge within dialogs is the so-called ’barge-in’, provided the SDS allows it at all. There are two types of barge-in: in the first, the system prompt is stopped by a new input signal; in the second, by confident word recognition (Lombard effect). There are various ways of handling this, whose promise depends on the SDS, such as stopping the system prompt only once words have been recognized successfully.

Normally, in the speech recognition component all incoming speech sequences are segmented down to the word or phoneme level in order to determine the most probable word or sentence hypothesis afterwards. The acoustic model is often represented by statistical methods such as Hidden Markov Models (HMM[4]). To improve the recognition rate, the acoustic model is combined with language models such as n-best lists or grammars/algorithms (depending on the state of the dialog). With regard to the speech recognition parameters, restricting the number of possible HMMs with the help of a grammar is the most promising approach for word hypothesis detection.
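The idea of combining acoustic scores with a state-dependent grammar can be illustrated with a small sketch. It assumes that the recognizer has already produced an n-best list of (sentence, score) pairs and that the "grammar" is nothing more than the set of words allowed in the current dialog state; both the data layout and the function names are hypothetical simplifications, not the mechanism of any particular recognizer.

```python
from typing import List, Optional, Set, Tuple

# (word sequence, acoustic log-score); a higher score means more probable
Hypothesis = Tuple[str, float]


def accepted(sentence: str, vocabulary: Set[str]) -> bool:
    """Toy 'grammar': every word must belong to the allowed vocabulary."""
    return all(word in vocabulary for word in sentence.split())


def best_hypothesis(n_best: List[Hypothesis],
                    vocabulary: Set[str]) -> Optional[str]:
    """Return the highest-scoring hypothesis the grammar accepts, if any."""
    candidates = [(s, score) for s, score in n_best if accepted(s, vocabulary)]
    if not candidates:
        return None  # nothing fits the current dialog state: re-prompt
    return max(candidates, key=lambda h: h[1])[0]
```

In a state that expects a yes/no answer, an acoustically better but out-of-grammar hypothesis such as "yeast" would be discarded in favor of "yes".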

Following Leech and Weisser [33], speech recognition parameters can be the different requirements placed on an SDS, such as user models, (in)dependence, size of the vocabulary, speech mode (continuous versus discrete) and speech situation (dictation versus spontaneous speech).

“Usable speech recognition would change the potential context and mode of use of natural language interfaces completely ... ”. [Copestake and Sparck Jones [13], page 31]

The overall aim is to develop speaker-independent, unrestricted, continuous and spontaneous speech recognition systems that are robust and show minimal latency as well.

Speech Understanding

In speech understanding, the structure and the meaning or intention of word or sentence hypotheses have to be determined. To this end, syntactic (constituent structure) and semantic (predicate logic) analyses are used to provide a meaning representation to the dialog manager. Nowadays, the traditional analysis approach is extended in some SDS (e.g. VoiceXML[5]-based systems) by so-called semantic grammars. With these, problems of spontaneous speech can be solved by domain-specific classifications of function and meaning instead of a syntactic categorization.

Basically, two factors are responsible for the problems in language understanding: ambiguity (lexical, structural and content-related), according to Bunt [10], and the ill-formed input of natural speech and dialogs.

Because of their robustness, bottom-up chart parsers dominate syntactic analysis in SDS. This analysis can be improved with the help of chunk parsers or lattice parsers[6]. Since parsing is already implemented in the interpreters of VoiceXML and other systems, it is only necessary to specify the semantic grammars.

Speech/Natural Language Generation

Depending on the spoken dialog system, speech generation systems, also called natural language generation systems, vary from very simple forms with predefined output sequences via alternative output systems (template filling) to artificial intelligence planning processes. According to Reiter and Dale [44], three main groups of planning are relevant: document planning, microplanning and surface realization. In general, the planning process is a representation which begins with a communicative goal and ends with a speech act.

In document planning, schemata are usually used to decide which information should be output to the user. Microplanning uses concrete logical schemata to determine how the information should be structured. Here the decision is made whether referring expressions, aggregation and lexical selection are applied. In surface realization, messages or texts are produced by selecting structural and linguistic realizations, and are then passed on to the speech synthesis component.

Speech Synthesis

Analogous to natural language generation, speech synthesis varies from predefined or previously recorded output to the synthesis of flexible, spontaneous speech utterances. In speech synthesis, the text or message representations produced by speech generation are transformed into acoustic speech output.

As a general rule, text-to-speech (TTS) systems are used for this purpose. This transformation is a two-step process in which the first step is text analysis (NLP) and the second is speech production, that is, digital speech processing. In text analysis, text is transformed into a linguistic representation with the help of linguistic analysis methods; this step is therefore also called text-to-phoneme conversion. The symbolic processing consists of four steps: text segmentation and normalization, morphological analysis, syntactic tagging and parsing, and modeling of continuous speech. The phonetic segmentation converts the script into phonetic transcriptions; in some SDS this is already included in the answer/speech generation.

Afterwards, the phoneme-to-speech conversion or speech production produces the audio/speech utterances, the final speech wave forms.

Here, prosody generation, phrase building, rhythm, accentuation, duration modeling et cetera are used to transform linguistic units, by means of signal generation, into the final speech that is output to the user.

2.4.4 Dialog Management and Important Strategies

The dialog manager is the control center of a dialog system. As the central component, it controls the acceptance of user utterances, the production of system outputs for the user and the interaction with external knowledge sources such as databases, and it manages the general control of the dialog process. It thereby deals with various tasks such as the interpretation of the user utterance in light of the dialog context and the planning and execution of different actions, such as interaction/communication with external systems and an adequate output.

Of course, there are various approaches of differing complexity to making a dialog successful and satisfactory for the user. Important strategies or technologies for this are the dialog initiative, the control technologies, clarification and grounding, and external knowledge acquisition.

Dialog Initiative

There are three categories of dialog initiatives: system-driven, user-driven and mixed-initiative.

In the system-driven dialog initiative the system guides the user through the dialog. In doing so, the system tries to fill its open information slots. Very often the system has to restrict the user input to utterances like ’yes’, ’no’, numbers et cetera.

The user-driven initiative is characterized by a user-initiated question-answer system (NLI) in which the user acts and the system tries to react in an adequate way.

The mixed-initiative dialog approaches are most similar to the changing initiative of human-human dialogs. Both system and user are able to take control of the dialog by asking questions, changing topics and domains and so on. Until now, mostly only pseudo mixed-initiative systems have been implemented. In these, the system is actually in control and the user is merely allowed to act or ask for cross-domain information.

Dialog Control Technologies

There are three control technologies which influence the dialog flow. The first is the implementation of the dialog structure with finite-state machines. Frame-based slot-filling systems are the second possibility and agent-based systems the third. Finite-state machines, or finite-state-based dialog structures (state graphs), are the common choice in commercial SDS applications. This is the case although they are dialog-specific and do not allow divergences such as negotiations.
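A finite-state dialog structure can be sketched as a transition table. The example below is a minimal illustration, assuming a hypothetical travel-booking dialog; the state names, the input classes and the step function are invented for this sketch.

```python
from typing import Dict, Tuple

# (current state, recognized input class) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ask_origin", "city"): "ask_destination",
    ("ask_destination", "city"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_origin",
}


def step(state: str, input_class: str) -> str:
    """Advance the dialog; unexpected input keeps the state (re-prompt)."""
    return TRANSITIONS.get((state, input_class), state)
```

The table makes the limitation mentioned above visible: any input that is not foreseen for the current state can only lead to a repetition of the prompt, never to a deviation from the predefined graph.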

Frame-based slot-fillers, which are similar to finite-state systems, tolerate a more open dialog arrangement. Questions do not have to appear in predefined states and orders, but can be asked flexibly. For this, three components are necessary: a frame, an advanced recognition grammar and a dialog algorithm. Frame-based control only appears to be mixed-initiative; true mixed initiative is possible with agent-based systems.
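The difference from a fixed state graph can be sketched with a frame whose slots may be filled in any order. This is a minimal illustration only: the slot names, the prompt wording and the two helper functions are hypothetical, and the recognition grammar is assumed to have already mapped the utterance to slot values.

```python
from typing import Dict, Optional

Frame = Dict[str, Optional[str]]


def update_frame(frame: Frame, values: Dict[str, str]) -> Frame:
    """Fill every open slot that the user's utterance provides."""
    filled = dict(frame)
    for slot, value in values.items():
        if slot in filled and filled[slot] is None:
            filled[slot] = value
    return filled


def next_prompt(frame: Frame) -> Optional[str]:
    """Dialog algorithm: ask for the first slot that is still open."""
    for slot, value in frame.items():
        if value is None:
            return f"Please tell me the {slot}."
    return None  # frame complete, the query can be issued
```

If the user volunteers the destination and the date in one utterance, the system only has to ask for the remaining slot, which is what makes the arrangement more open than a state graph.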

Agent-based systems use many AI technologies, such as discontinuing theorem provers, planning, distributed architectures and conversational agents, to solve the problems and tasks of a dialog collaboratively. This can provide cooperative answers, improved error recognition and correction, inferences and expectations for a flexible and collaborative management of a dialog.

Clarification and Grounding

To clear up discrepancies between system and user, so-called clarification and grounding are used, in which the system tries to verify its information, according to McTear [37].

Clarification deals with either false recognitions or wrong interpretations of user utterances. This method is often used in clarification sub-dialogs, which try to solve these problems step by step. If nothing else works, this can lead to the system asking the user to spell the input.

It is possible that the system’s beliefs or knowledge do not correspond to reality, which causes further errors. For this reason, there are explicit (yes/no questions) and implicit verification methods with which the system knowledge can be checked and, if necessary, corrected. This process is called grounding; it re-establishes the common basis of a dialog. As a rule, implicit verification methods are used today, which can nevertheless be problematic (convention, correction).

External Knowledge Acquisition

As soon as the dialog manager knows what the user wants or has enough information for a query, external or additional components are involved. A database request is a typical example. Problems here are mismatched vocabulary, ambiguities and unknown expressions.

In general, depending on the control type, a dialog manager should keep and maintain its own knowledge sources, such as a dialog history, a task history, common (background) knowledge models, domain and user models, conversational competence models etc. These can improve the HCI, that is, the communication.

Several parts of the answer selection could be partially located here, because wrapper agents mostly have to crawl external web pages. In any case, answer selection should be located in the dialog management. It takes place after the context planning and discourse history analysis and before the action and output planning, in accordance with Figure 2.1.

2.4.5 Design Criteria

Aside from these technical aspects of natural language dialog systems, intuitive usability is even more pivotal for successful mainstream use. For this reason, user expectations have to be taken into consideration at every step as well as for the system as a whole.

This can be evaluated in a target group analysis.

According to Uszkoreit [57], dialog handling or dialog modeling is an essential part.

Investigations by Bernsen et al. [6] in 1998 show that design, for example the choice of the speech output, forms of help et cetera, can influence the overall user assessment more strongly than word error rate or precision of interpretation. Naturally, the common criteria of software design apply accordingly to the design of SDS. Moreover, a few criteria are especially important for spoken dialog systems. In general, the designer should primarily choose and realize those of the methods, techniques and strategies shown above that best fulfill the particular requirements.

2.5 Metadata and Knowledge Representation

Put simply, metadata is additional information. A prominent definition in the literature says that metadata is data about data. This is simple and complicated at the same time.

Normally metadata is embedded in documents, but it can also exist separately. For instance, it can be stored in another document, in protocol headers and so forth. There is no limit to metadata: any object can be described with it, so almost everything can be enriched with metadata. A decisive facet of metadata is knowledge representation in general and knowledge representation systems in particular, which, according to Hjelm [25], intend to be general in structure and vocabulary but are not. The problematic assumption is that it is possible to represent universal ways of knowledge “within one, centrally determined framework”. [Hjelm [25], page 4]

Knowledge representation refers to information representations that enable computers to process these information objects and draw inferences from stored knowledge. The concepts of knowledge representation and metadata overlap, although they have led to different technologies. For instance, the Dublin Core[7] format is a universal format of the library community for describing papers and books in electronic catalogs. Primarily, this format is a domain-specific set of limited attributes and rules that are important for books.

The knowledge representation community, on the other hand, considers the problem in another way. In their knowledge systems or expert systems, rule-based sets use knowledge as input to provide different knowledge output sets, which means that the system has to be able to deal with the represented information.

Both approaches inspired the World Wide Web Consortium (W3C) to create an ’open domain’ format which can represent and transport information, according to Hjelm [25]. A standard method for mapping metadata is the Resource Description Framework, which is explained in more detail in Chapter 3.

Chapter 3 Semantic Web, SmartWeb, and Ontologies

The purpose of this chapter is to introduce the ’Semantic Web’ initiative and the SmartWeb project, within which the dialog system is developed. It then describes ontologies in general as well as some important details of ontology technology, such as ontology modeling and ontology languages. Furthermore, short explanations of RDF and RDFS are given in order to make the algorithms and data understandable.

3.1 Semantic Web

Nowadays, finding information as such is not the problem; the goal is rather to filter only the relevant information out of the enormous amount available. The Semantic Web is intended to meet this challenge.

The Semantic Web is an initiative of Tim Berners-Lee and the W3C[1]. It is not a separate or new web, but rather an extension of the existing World Wide Web (WWW).

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [Tim Berners-Lee [4]]

According to Tietz [54], one of the guiding ideas in the development of the Semantic Web is the provision of mechanisms for processing and interpreting data with agents. Ontologies are to serve as semantic models that make it possible to interpret data and relate them to each other. For this reason, the Semantic Web has so far been more or less a vision and thus only a guideline for future research and development of next-generation web-based applications. This vision is not limited exclusively to the WWW, which is based on the hyperlink technique and mainly consists of HTML pages. Technologies of the Semantic Web, in particular those developed for guaranteeing interoperability, are used in other areas as well. For example, they could bring substantial improvements in local knowledge management or in intranets. Another important foundation of the Semantic Web is a standardized language for describing metadata. This facilitates the exchange of documents and could make automatic understanding of information possible in the first place.

First and foremost, clients or computers of the Semantic Web have to understand information in order to draw conclusions on their own. Furthermore, they should operate globally, that is, as a worldwide information exchange, to stem the flood of information. At the same time, they have to judge authenticity. Today everyone can provide information via the internet without any control. Therefore, it becomes more and more important to verify information sources and thus also their trustworthiness.

illustration not visible in this excerpt

Figure 3.1: Architecture of the Semantic Web [from Berners-Lee [5]]

Figure 3.1 illustrates the main layers of the Semantic Web. It shows the hierarchy of the Semantic Web, in which XML and XML Schema are the base for RDF, on which ontology technologies are built. The upper layers (logic, proof, and trust) play an important role for the Semantic Web, but are less important for answer selection.

One of the approaches with the potential to form the basis for the next generation of the Web is the Semantic Web initiative, together with projects like SmartWeb. One effort of the Semantic Web is to provide tools to mark up the content of web pages. Another effort is to develop Semantic Web services that result in a Semantic Web. The W3C standard for the Semantic Web, the Resource Description Framework (RDF/S), is the basis on which SmartWeb represents machine-interpretable content. “In SmartWeb, multi-modal user requests will not only lead to automatic Web service discovery and invocation, but also to the automatic composition, interoperation and execution monitoring of Web services.” [Wahlster et al. [62]]

Within the Semantic Web initiative [4][2], several efforts and projects, e.g. SmartWeb [62] and the SEKT [14] project, use the information modeled therein. Current dialog-driven question answering systems are starting to employ ontologies for knowledge representation. The SmartWeb system uses various information sources such as:

- a static fact base, filled with knowledge obtained by linguistic information extraction out of web pages
- an open domain statistical question answering and
- web services.

To be able to deliver up-to-date answers for specific domains, wrapper agents are used to extract semantic representations from web pages.

Therefore, the Semantic Web can be summarized as “an initiative that aims at improving the current state of the World Wide Web. The key idea is the use of machine-processable Web information. Key technologies include explicit metadata, ontologies, logic and inferencing, and intelligent agents. The development of the Semantic Web proceeds in layers.” [Antoniou and Harmelen [2], page 19]

3.2 SmartWeb

In the research project SmartWeb at the EML, the dialog system has several components, one of which is specifically responsible for controlling the information of the question-answering component. There, user queries are represented as RDF triples, conforming to the Semantic Web standard, and are used for the database queries (via SERQL[3]). For cross-domain questions, question-answering systems determine the answers mostly by statistical methods. Information from the WWW, extracted by different web agents, is stored domain-specifically in an ’ontological’ database. As a rule, the answers to a database query are unorganized or unsorted. My thesis ’Ontology-Based Answer Selection in Dialog Systems’ is intended as a way of ’ranking’ these answer hypotheses in order to ascertain the most probable and best hypothesis for the user with reference to his[4] query intention. To this end, I want to test different existing procedures and, where appropriate, develop my own, which will be evaluated afterwards.
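To make the notion of a query over RDF triples more concrete, the following sketch stores a few subject-predicate-object triples as plain tuples and matches them against a pattern in which None acts as a wildcard. It is an illustration only and not SeRQL: the `ex:` resources, the FACTS list and the match function are invented for this example and do not come from the SmartWeb ontologies.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical fact base; every name below is made up for illustration.
FACTS: List[Triple] = [
    ("ex:Match1", "rdf:type", "ex:FootballMatch"),
    ("ex:Match1", "ex:winner", "ex:TeamA"),
    ("ex:Match2", "rdf:type", "ex:FootballMatch"),
    ("ex:Match2", "ex:winner", "ex:TeamB"),
]


def match(pattern: Pattern, facts: List[Triple]) -> List[Triple]:
    """Return all triples that agree with the pattern; None is a wildcard."""
    return [
        triple for triple in facts
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]
```

A pattern such as (None, "ex:winner", None) returns every winner triple in storage order, which mirrors the observation above that query results arrive unranked and still need an answer selection step.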

illustration not visible in this excerpt

Figure 3.2: Draft of Relevant Parts of the SmartWeb System

Figure 3.2 gives a rough overview of the part of the SmartWeb system relevant to this work. On the left is the user, who puts his question to the system. The system then represents this question semantically via the familiar dialog system steps (with dialog/middleware components) and with the help of ontologies. The result of this transformation, for example in RDF, is a semantic query that triggers the search engine. To search for the answer, the system considers various kinds of information sources, such as databases, a knowledge warehouse, web services and WWW wrapper agents. The results returned are then transformed into the same semantic representation. Afterwards, the query and answer representations are compared and analyzed to find the best answer to the question. After the ranking, the best answer can be used to build the system output with the dialog/middleware components.

illustration not visible in this excerpt

Figure 3.3: Draft of the Answer Candidate Selection Challenge

The missing answer decision is represented by the question mark in Figure 3.3. The figure illustrates that it is necessary but not trivial to select or rank the best answers properly. It shows three answer types for a semantically similar concept, here FiFaWorldcup. An interesting aspect is the difference between Squad, FootballMatchTeam and SwimmingTeam. Besides this, the properties inMatch and Country are responsible for the semantic difference.

3.3 Semantic Similarity Measures

To evaluate how strong the semantic link between two concepts is, one needs a reliable semantic measure. Numerous different semantic measures can be found in the literature. In this regard, Blanchard et al. [7] present a typology of ontology-based semantic measures, which I want to recapitulate briefly.

Mathematical analysis, comparison with human judgment and application-specific evaluation are three ways to validate a measure. Concentrating on the mathematical analysis, the following three characteristics can be specified, following Blanchard et al. [7]:

Information sources: A given ontology is the basis for each of the considered measures. Some definitions additionally require text corpora to add information such as the distribution of concept frequencies.

Principles: Most of the measures rest on axiomatic principles, for example as functions of the shortest path length or as functions of the information content.

Semantic class: The following distinct classes have been introduced: semantic distance, semantic similarity and semantic relatedness between two concepts in the same ontology.

“The semantic similarity evaluates the resemblance between two concepts from a subset of significant semantic links. The semantic relatedness evaluates the closeness between two concepts from the whole set of their semantic links. All pairs of concepts with a high semantic similarity value have a high semantic relatedness value whereas the inverse is not necessarily true. The semantic distance evaluates the disaffection between two concepts; it is an inverse notion to the semantic relatedness.” [Blanchard et al. [7]]

They have identified four parameters which influence the measures: the first parameter (p1) is the length of the shortest path, the second (p2) the depth of the most specific common subsumer, the third (p3) the density of the concepts on the shortest path, and the fourth (p4) the density of the concepts from the root to the most specific common subsumer.

Semantic measures:[5]

Rada et al.’s distance: This measure determines the shortest path between two concepts ci and cj (p1) in an ontology restricted to taxonomic links: the minimum path length between ci and cj is calculated. For the shortest path to be meaningful, all taxonomic links between two adjacent concepts should have the same value.

Resnik’s similarity: The underlying hypothesis of this measure is that the more information two concepts have in common, the more related or similar they are. As in Rada et al.’s measure, only taxonomic links are considered. Based on information theory, Resnik’s proposal essentially adds the notion of information content: the information shared by two concepts is indicated by the information content of their most specific common subsumer.
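The two measures above can be sketched on a toy taxonomy. The following is a minimal illustration under several assumptions: the taxonomy is a tree (every concept has at most one parent), every taxonomic link counts as 1, information content is taken as -log p(c), and both the concept names and the probabilities are made up for this example rather than taken from the ontologies or corpora used in this work.

```python
import math
from typing import Dict, List

# Toy taxonomy: child -> parent (is-a). All concept names are illustrative.
PARENT: Dict[str, str] = {
    "FootballTeam": "Team",
    "SwimmingTeam": "Team",
    "Team": "Entity",
    "Match": "Event",
    "Event": "Entity",
}

# Assumed probability of encountering each concept (or one of its
# descendants) in some corpus; the numbers are invented.
PROB: Dict[str, float] = {
    "Entity": 1.0, "Team": 0.25, "Event": 0.5,
    "FootballTeam": 0.125, "SwimmingTeam": 0.125, "Match": 0.5,
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by all of its subsumers up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_specific_common_subsumer(a: str, b: str) -> str:
    """First concept on a's path to the root that also subsumes b."""
    subsumers_of_b = set(path_to_root(b))
    for concept in path_to_root(a):
        if concept in subsumers_of_b:
            return concept
    raise ValueError("concepts share no subsumer")


def rada_distance(a: str, b: str) -> int:
    """Number of taxonomic links on the shortest path between a and b (p1)."""
    mscs = most_specific_common_subsumer(a, b)
    return path_to_root(a).index(mscs) + path_to_root(b).index(mscs)


def resnik_similarity(a: str, b: str) -> float:
    """Information content (-log p) of the most specific common subsumer."""
    return -math.log(PROB[most_specific_common_subsumer(a, b)])
```

In this toy tree the two team concepts are two links apart and share the fairly specific subsumer Team, whereas a team and a match only meet at the root, so their path is longer and their shared information content is zero.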


1 McTear[37] is in fact the basis for the dialog and dialog system sections in general

2 Cf. DARPA at http://www.darpa.mil/

3 Cf. SUNDIAL at http://acronyms.thefreedictionary.com/Speech+Understanding+and+Dialogue

4 Cf. “A tutorial on Hidden Markov Models and selected applications in speech recognition”, from L. Rabiner

5 Cf. VoiceXML at http://www.voicexml.org/

6 Cf. Lattice Parsing for Speech Recognition at http://citeseer.ist.psu.edu/chappelier99lattice.html

7 Cf. Dublin Core Metadata Initiative at http://dublincore.org/

1 Cf. World Wide Web Consortium at http://www.w3.org/

2 Cf. also Semantic Web[58]

3 Cf. The SeRQL query language at http://www.openrdf.org/doc/sesame/users/ch06.html

4 His is used in this work as a representation for ’his and her’ and contains no valuation of gender.

5 Cf. Blanchard [7]

Excerpt out of 135 pages


Ontology-Based Answer Selection in Dialog Systems
University of Heidelberg  (Seminar für Computerlinguistik; Institut für allgemeine und angewandte Sprachwissenschaft)
Bachelor of Arts / Magister Artium Christian Pretzsch (Author), 2006, Ontology-Based Answer Selection in Dialog Systems, Munich, GRIN Verlag, https://www.grin.com/document/63949

