This thesis proposes a general pipeline architecture for extracting one-on-one dialogues from many different IRC channels, extending the state-of-the-art work on the Ubuntu IRC channel. Furthermore, the thesis uses the output of the pipeline to evaluate ESA on the different extracted dialogues.
The power of an intelligent program to perform its task well depends primarily on the quantity and quality of the knowledge it has about that task. Advanced techniques and applications in Artificial Intelligence depend heavily on data, which is growing rapidly and is available on the web. However, for a computer to manipulate information, that information must be in a form that a computer can process easily. Much of the available unstructured data therefore needs to be collected and post-processed to produce structured information. Recent advances in data-driven dialogue systems have used the published conversations of the Ubuntu IRC channel to extract one-on-one dialogues for use with deep learning methods. A dialogue system can use a model trained on such dialogues to perform a best-response task. In addition, Natural Language Processing techniques such as semantic analysis have made remarkable progress. Wikipedia-Based Explicit Semantic Analysis (ESA) is one example: it improves interpretation in the presence of both polysemy and synonymy.
Table of Contents
- Chapter 1: Introduction
  - Motivation: Why is the Topic so Important?
  - Thesis Overview
    - The Problem and Contribution
    - Outline of The Thesis
- Chapter 2: Background
  - Dialogue Systems
    - Introduction
    - Data-Driven vs Other Design Approaches
  - McGill Ubuntu Dialogue Corpus
- Chapter 3: Methods and Techniques
  - Natural Language Processing (NLP)
    - Introduction
    - Wikipedia-Based Explicit Semantic Analysis (ESA)
  - Deep Learning
    - Why Deep Learning?
    - Deep Neural Networks: Definitions and Basics
    - RNN and LSTM Networks
Objectives and Key Themes
This thesis aims to develop a general pipeline architecture for extracting one-on-one dialogues from various IRC channels, building upon existing work using the Ubuntu IRC channel. The thesis also explores the application of Wikipedia-Based Explicit Semantic Analysis (ESA) on the extracted dialogues. This work contributes to the advancement of data-driven dialogue systems, particularly in the area of best response selection.
- Development of a general pipeline architecture for extracting one-on-one dialogues from multiple IRC channels.
- Application of Wikipedia-Based Explicit Semantic Analysis (ESA) on extracted dialogues.
- Evaluation of ESA's effectiveness in improving dialogue interpretation.
- Exploration of the potential of deep learning methods in dialogue systems.
- Contribution to the advancement of data-driven dialogue systems.
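The first objective, extracting one-on-one dialogues from IRC logs, can be illustrated with a small sketch. This is not the pipeline developed in the thesis. It is a simplified heuristic that assumes a hypothetical log format (`[HH:MM] <nick> message`) and treats a message as addressed to someone when it starts with a known nickname followed by `:` or `,`. The names `parse_log`, `extract_dialogues`, and `SAMPLE_LOG` are illustrative only.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log line format: "[12:01] <alice> message text"
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\] <([^>]+)> (.*)$")
# Assumed addressing convention: "nick: message" or "nick, message"
ADDRESS_RE = re.compile(r"^([^\s:,]+)[:,]\s+(.*)$")

SAMPLE_LOG = [
    "[12:01] <alice> how do I install vim?",
    "[12:02] <bob> alice: try sudo apt-get install vim",
    "[12:02] <carol> anyone seen the new kernel?",
    "[12:03] <alice> bob: thanks, that worked",
    "[12:04] <dave> carol: not yet",
]


def parse_log(lines):
    """Parse raw lines into (time, sender, text) tuples, skipping other lines."""
    messages = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            messages.append(match.groups())
    return messages


def extract_dialogues(messages):
    """Group addressed messages into two-party dialogues keyed by nick pair."""
    nicks = {sender for _, sender, _ in messages}
    dialogues = defaultdict(list)
    for time, sender, text in messages:
        match = ADDRESS_RE.match(text)
        if not match:
            continue  # unaddressed message: ignored in this simplified sketch
        recipient, body = match.groups()
        if recipient not in nicks or recipient == sender:
            continue  # first token is not another known participant
        pair = tuple(sorted((sender, recipient)))
        dialogues[pair].append((time, sender, body))
    # Keep only pairs in which both participants actually spoke.
    return {
        pair: turns
        for pair, turns in dialogues.items()
        if {speaker for _, speaker, _ in turns} == set(pair)
    }


if __name__ == "__main__":
    for pair, turns in extract_dialogues(parse_log(SAMPLE_LOG)).items():
        print(pair)
        for time, speaker, body in turns:
            print(f"  [{time}] {speaker}: {body}")
```

Note that this sketch drops the unaddressed opening question and any exchange in which only one side addresses the other. A fuller extraction procedure would need to recover such messages and to handle channel-specific log formats, which is where a general pipeline becomes useful.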
Chapter Summaries
Chapter 1: Introduction outlines the importance of the topic, provides an overview of the thesis, and details the problem addressed and the contribution made. It also presents the structure of the thesis.
Chapter 2: Background discusses the concept of Dialogue Systems, with a particular focus on Data-Driven approaches. It introduces the McGill Ubuntu Dialogue Corpus, a significant dataset used for training dialogue systems.
Chapter 3: Methods and Techniques delves into Natural Language Processing (NLP), focusing on Wikipedia-Based Explicit Semantic Analysis (ESA) as a technique for improving dialogue interpretation. The chapter also explores Deep Learning, emphasizing its potential in the field of dialogue systems and introducing the concepts of Deep Neural Networks, RNNs, and LSTMs.
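The core idea behind ESA can be shown with a toy example: a text is mapped to a weighted vector of concepts, and two texts are compared by the cosine of their concept vectors. The sketch below is only an illustration under strong assumptions. Three hand-written word lists stand in for Wikipedia articles, the weights use a simple smoothed TF-IDF variant, and all names (`CONCEPTS`, `build_index`, `esa_vector`, `cosine`) are hypothetical rather than taken from the thesis.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles (concept title -> article text).
CONCEPTS = {
    "Operating system": "kernel driver boot install package system linux",
    "Computer network": "router wifi network connection ip packet",
    "Text editor": "editor vim emacs file text edit",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(concepts):
    """Build an inverted index: word -> {concept: TF-IDF weight}."""
    n_concepts = len(concepts)
    term_counts = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    doc_freq = Counter(w for counts in term_counts.values() for w in counts)
    index = {}
    for concept, counts in term_counts.items():
        for word, tf in counts.items():
            # Smoothed IDF so that no weight collapses to zero.
            idf = math.log(1 + n_concepts / doc_freq[word])
            index.setdefault(word, {})[concept] = tf * idf
    return index


def esa_vector(text, index):
    """Represent a text as the sum of the concept vectors of its words."""
    vector = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vector[concept] += weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    index = build_index(CONCEPTS)
    question = esa_vector("how do I install a linux kernel", index)
    reply_a = esa_vector("boot the system with the new driver", index)
    reply_b = esa_vector("my wifi router drops the connection", index)
    print("reply_a:", cosine(question, reply_a))
    print("reply_b:", cosine(question, reply_b))
```

In this toy setup the question and the first reply share no words, yet they map to the same concept and so score as more similar than the question and the second reply. This is the kind of behaviour that makes concept-based representations attractive when wording varies.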
Keywords
The central focus of this thesis lies on Dialogue Systems, Data-Driven Approaches, IRC Channel Dialogue Extraction, Natural Language Processing (NLP), Wikipedia-Based Explicit Semantic Analysis (ESA), Deep Learning, and best response selection in unstructured dialogue systems.
Cite this work: Ahmed Abouzeid (Author), 2017, General Pipeline Architecture for Domain-Specific Dialogue Extraction from different IRC Channels, Munich, GRIN Verlag, https://www.grin.com/document/365283