This thesis proposes a general pipeline architecture for one-on-one dialogues extraction from many different IRC channels to extend the state of art work for the Ubuntu IRC channel. Further more, this thesis takes the advantage of the results from the pipeline and evaluates ESA on different extracted dialogues.

The power of an intelligent program to perform its task well depends primarily on the quantity and quality of knowledge it has about that task. Advanced techniques and applications in Artificial Intelligence are highly depending on data which at the same time getting highly increased and are available over the web. However, for a computer to be able to manipulate information, the latter should be in a form that makes it easy for a computer to manipulate. That is, many available unstructured data need to be collected and post-processed in order to create structured information from the unstructured ones. Recent advances in Data-Driven Dialogue Systems made use of the Ubuntu published IRC channel conversations to extract one-on-one dialogues to use in Deep Learning methods. A best response task performed by a Dialogue System can make use of a trained model on such dialogues. In addition, techniques in Natural Language Processing like Semantic Analysis had a remarkable progress, Wikipedia-Based Explicit Semantic Analysis (ESA) is an example, where the problem of interpretation was improved for both Polysemy and Synonymy.

Excerpt

Inhaltsverzeichnis (Table of Contents)

Chapter 1: Introduction

Motivation: Why is the Topic so Important?
Thesis Overview
The Problem and Contribution
Outline of The Thesis

Chapter 2: Background

Dialogue Systems

Introduction
Data-Driven vs Other Design Approaches

McGill Ubuntu Dialogue Corpus

Chapter 3: Methods and Techniques

Natural Language Processing (NLP)

Introduction
Wikipedia-Based Explicit Semantic Analysis (ESA)

Deep Learning

Why Deep Learning?
Deep Neural Networks: Definitions and Basics
RNN and LSTM Networks

Zielsetzung und Themenschwerpunkte (Objectives and Key Themes)

This thesis aims to develop a general pipeline architecture for extracting one-on-one dialogues from various IRC channels, building upon existing work using the Ubuntu IRC channel. The thesis also explores the application of Wikipedia-Based Explicit Semantic Analysis (ESA) on the extracted dialogues. This work contributes to the advancement of data-driven dialogue systems, particularly in the area of best response selection.

Development of a general pipeline architecture for extracting one-on-one dialogues from multiple IRC channels.
Application of Wikipedia-Based Explicit Semantic Analysis (ESA) on extracted dialogues.
Evaluation of ESA's effectiveness in improving dialogue interpretation.
Exploration of the potential of deep learning methods in dialogue systems.
Contribution to the advancement of data-driven dialogue systems.

Zusammenfassung der Kapitel (Chapter Summaries)

Chapter 1: Introduction outlines the importance of the topic, provides an overview of the thesis, and details the problem addressed and the contribution made. It also presents the structure of the thesis.

Chapter 2: Background discusses the concept of Dialogue Systems, with a particular focus on Data-Driven approaches. It introduces the McGill Ubuntu Dialogue Corpus, a significant dataset used for training dialogue systems.

Chapter 3: Methods and Techniques delves into Natural Language Processing (NLP), focusing on Wikipedia-Based Explicit Semantic Analysis (ESA) as a technique for improving dialogue interpretation. The chapter also explores Deep Learning, emphasizing its potential in the field of dialogue systems and introducing the concepts of Deep Neural Networks, RNNs, and LSTMs.

Schlüsselwörter (Keywords)

The central focus of this thesis lies on Dialogue Systems, Data-Driven Approaches, IRC Channel Dialogue Extraction, Natural Language Processing (NLP), Wikipedia-Based Explicit Semantic Analysis (ESA), Deep Learning, and best response selection in unstructured dialogue systems.

Excerpt out of 73 pages - scroll top

Details

Title: General Pipeline Architecture for Domain-Specific Dialogue Extraction from different IRC Channels
College: Eötvös Loránd University
Course: Master's Degree in Computer Science
Grade: 4.6/5
Author: Ahmed Abouzeid (Author)
Publication Year: 2017
Pages: 73
Catalog Number: V365283
ISBN (eBook): 9783668468634
ISBN (Book): 9783668468641
Language: English
Tags: IRC Pipeline Architecture computer ESA Ubuntu AI
Product Safety: GRIN Publishing GmbH

Quote paper: Ahmed Abouzeid (Author), 2017, General Pipeline Architecture for Domain-Specific Dialogue Extraction from different IRC Channels, Munich, GRIN Verlag, https://www.grin.com/document/365283

General Pipeline Architecture for Domain-Specific Dialogue Extraction from different IRC Channels