
Development of an automatic news summarizer for isiXhosa language

Master's Thesis, 2017, 115 Pages, Grade: 75

Author: Zukile Ndyalivana

Language Science / Linguistics
From a practical perspective, given the abundance of digital content today, a technological solution that summarizes written text without losing its message, coherence, or cohesion of ideas is essential. Such technology saves readers time and lets them focus on the content that matters most.

This is one of the research areas in natural language processing and information retrieval to which the dissertation contributes. It adapts tools and technologies developed for other languages to the automatic summarization of Xhosa news articles. Specifically, the dissertation aims to develop an extraction-based text summarizer for Xhosa news articles.

To that end, it reviews the literature on the techniques and technologies used to analyze, transform, and synthesize the contents of a written text, examines the phonology and morphology of the Xhosa language, and finally designs, implements, and tests an extraction-based automatic news article summarizer for the Xhosa language. Given comprehension and relevance of the literature review, the research design, the methods and tools and technologies used to design, implement and test the pilot system.

Two features were used to extract relevant sentences: term frequency and sentence position. The Xhosa summarizer was evaluated on a test set using both subjective and objective evaluation methods. The results of both methods are satisfactory.

Keywords: Xhosa, Automatic Text Summarization, Term Frequency, Sentence Position.

Excerpt


Table of Contents

1. INTRODUCTION AND BACKGROUND

1.0 Overview

1.1. Automatic Text Summarization (ATS)

1.2. Motivation

1.3. The Problem Statement and Justification of the Study

1.4. Research Questions

1.5. Objectives of the Study

1.5.1. Specific Objectives

1.6. Significance of the Study

1.7. Research Methodology

1.8. Literature Review

1.9. Data Source Collection and Preparation

1.9.1. Corpus Preparation

1.9.2. Manual Summary Preparation

1.10. Summarization Method and Tools used in this Study

1.10.1. Development Tools

1.10.2. The Natural Language Toolkit (NLTK)

1.10.4. Installing the NLTK data

1.10.8. Operating System

1.10.9. The Python Programming Language

1.10.10. The Numpy Library

1.10.11. PyCharm Integrated Development Environment (IDE)

1.12. Scope and Limitations of the Study

1.13. Outline of the Dissertation

CHAPTER TWO

LITERATURE REVIEW

2.0 Introduction

2.1. Automatic Text Summarization

2.2. Processes of Automatic Text Summarization

2.2.1. Summarization Parameters

2.2.2. Methods of Summarization

2.3. Linguistic Concepts to Consider

2.3.1. Coherence

2.3.2. Cohesion

2.3.3. Lexical Cohesion

2.4. News Writing Structure

2.5. Evaluation Methods used in Automatic Summarization

CHAPTER THREE

THE XHOSA LANGUAGE

3.0 Introduction

3.1. Xhosa Consonants and Vowels

3.1.1. The Vowel System

3.1.2. Consonants

3.2. Overview of Xhosa Orthography

3.3. Xhosa Morpheme Types

3.3.1. Xhosa Nouns

3.3.2. Xhosa Prefixes

3.3.3. The Xhosa Noun Stems

3.3.4. Xhosa Suffixes

3.3.5. Pronouns

3.3.6. Verbs

3.3.7. Adjectives

3.3.8. Apostrophe

3.4. Abbreviation

3.5 Summary

CHAPTER FOUR

METHODOLOGY AND SYSTEM DESIGN

4.0 Introduction

4.1. Methodology

4.2. Proposed Algorithm

4.2.1. How the Algorithm Works

4.3. Preprocessing

4.3.1. Tokenization

4.3.2. Stop Words

4.3.3. Stemming

4.6 Sentence Ranking

4.7 Summary Generation

4.8 System Design

4.10 Summary

CHAPTER FIVE

IMPLEMENTATION

5.0 Introduction

5.1. Tokenization

5.2. Stop Word Removal

5.3. Stemming

5.4. Implementation

5.4.1. The IsiXhoSum Interface

5.4.2. Modules of the Xhosa Text Summarizer

5.5. Experimentation

5.5.1. Corpus Preparation

5.5.2. Creation of Manual Summaries

5.6. Summary

CHAPTER SIX

TESTING, RESULTS, AND DISCUSSION

6.0 Introduction

6.1. Testing

6.2. Results

6.2.1. Results of Subjective Evaluation

6.2.2. Results of Objective Evaluation

6.3. Discussion of the Results

6.4. Discussion on Coherence and Cohesion

6.5. Summary

CHAPTER SEVEN

CONCLUSION AND FUTURE WORK

7.0 Introduction

7.1. Research Summary

7.2. Conclusion and Future Work

Research Objectives and Themes

The primary aim of this dissertation is to design, implement, and evaluate an automated text summarizer specifically for isiXhosa news articles. This research addresses the growing challenge of information overload for isiXhosa speakers by developing an extraction-based system that identifies and presents the most relevant content from lengthy news articles. The system is designed to function with minimal reliance on complex semantic resources, utilizing statistical techniques adapted for the specific linguistic requirements of the Xhosa language.

  • Development of an extractive-based automatic news summarizer for isiXhosa.
  • Linguistic analysis of isiXhosa, including morphology, phonology, and word structure.
  • Implementation of statistical methods such as term frequency and sentence position for ranking.
  • Subjective and objective evaluation of summary quality using native speakers and the ROUGE tool.
  • Preprocessing strategies including tokenization, stop word removal, and lightweight stemming.
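The preprocessing steps named in the last item can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the thesis's implementation: the function names, the regular expressions, and the three-word stop list are assumptions made for the example, and a usable isiXhosa stop list would need to be curated separately.

```python
import re

# Illustrative placeholder only; NOT the stop list used in the thesis.
ILLUSTRATIVE_STOP_WORDS = frozenset({"kodwa", "ukuba", "kwaye"})


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic tokens.

    Word-internal apostrophes are kept, since they occur in Xhosa spelling.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())


def remove_stop_words(tokens, stop_words=ILLUSTRATIVE_STOP_WORDS):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text, stop_words=ILLUSTRATIVE_STOP_WORDS, stem=lambda t: t):
    """Return one list of stemmed content tokens per sentence.

    `stem` defaults to the identity function; a language-specific
    stemmer can be passed in instead.
    """
    return [
        [stem(t) for t in remove_stop_words(tokenize(s), stop_words)]
        for s in split_sentences(text)
    ]
```

Keeping the stemmer as a parameter lets the same pipeline run with or without language-specific stemming, which is convenient when comparing the effect of each step.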

Excerpt from the book

1.1. Automatic Text Summarization (ATS)

The volume of information available to Internet users increases daily. In this information age, the growth of electronic information has prompted intensive research in Natural Language Processing (NLP) and Information Retrieval (IR). This rapid growth has made it difficult for many users to cope with all the text that is potentially of interest to them. As a result, systems that can automatically summarize one or more documents have recently become a focus of interest in the field of automatic summarization [1]. Automatic text summarization has become a suitable tool for helping people read large volumes of textual information.

Everyday examples of summaries include news headlines, scientific abstracts, minutes of meetings, and weather forecasts. People read these kinds of summaries on a daily basis [2].

A summary can help users to get the meaning of a complete text document within a short time. The following are some of the general reasons that support the necessity of text summarization.

Summary of Chapters

1. INTRODUCTION AND BACKGROUND: This chapter provides the research context, problem statement, and objectives, emphasizing the need for an automatic isiXhosa news summarizer.

CHAPTER TWO: This chapter covers existing research on automatic text summarization, including core processes, techniques, and linguistic concepts relevant to text analysis.

CHAPTER THREE: This section details the isiXhosa language, discussing its unique morphology, consonant and vowel inventory, and its orthographic history.

CHAPTER FOUR: This chapter describes the methodology and system architecture, focusing on the preprocessing steps and the proposed ranking algorithms used for extraction.

CHAPTER FIVE: This chapter details the technical implementation, explaining the interface development, stemmer rules, and the preparation of the corpus and manual summaries.

CHAPTER SIX: This chapter presents the testing phase, evaluating both the subjective and objective results of the IsiXhoSum system against manual summaries.

CHAPTER SEVEN: This chapter concludes the research by summarizing the findings and suggesting potential directions for future enhancements.

Keywords

Xhosa, Automatic Text Summarization, Natural Language Processing, IsiXhoSum, Term Frequency, Sentence Position, Extraction-based, Information Retrieval, Linguistic Analysis, Stemming, Corpus, Subjective Evaluation, Objective Evaluation, ROUGE, News Articles.

Frequently Asked Questions

What is the core focus of this research?

The research focuses on the development of an automated extraction-based text summarizer designed specifically for isiXhosa language news articles.

What are the primary themes addressed in this thesis?

The key themes include the linguistic analysis of isiXhosa, statistical approaches to sentence ranking, the implementation of language-specific preprocessing tools (such as a stemmer), and the evaluation of system-generated summaries.

What is the main research objective?

The objective is to design, implement, and evaluate a prototype system, named IsiXhoSum, capable of producing readable summaries of Xhosa news texts that help reduce information overload for users.

Which scientific methods were employed?

The study utilized extraction-based summarization methods, specifically incorporating term frequency analysis and sentence position algorithms, supported by linguistic rules tailored for the isiXhosa language.
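As a rough illustration of term-frequency ranking in an extractive summarizer, the sketch below scores each sentence by the average corpus frequency of its tokens and then returns the top-scoring sentences in their original order. The scoring formula, the tie-breaking rule, and the function names are assumptions made for this example; the thesis's exact weighting is not given on this page.

```python
from collections import Counter


def term_frequency_scores(tokenized_sentences):
    """Score each sentence by the mean document frequency of its tokens.

    `tokenized_sentences` is a list of token lists, one per sentence,
    ideally after stop word removal and stemming.
    """
    counts = Counter(t for sent in tokenized_sentences for t in sent)
    scores = []
    for sent in tokenized_sentences:
        if not sent:
            scores.append(0.0)  # empty sentences carry no content
        else:
            scores.append(sum(counts[t] for t in sent) / len(sent))
    return scores


def select_top_sentences(sentences, scores, k):
    """Pick the k highest-scoring sentences, kept in document order.

    Ties are broken in favour of the earlier sentence.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Dividing by sentence length keeps long sentences from winning purely because they contain more words; restoring document order after selection helps the extract read more coherently.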

What content is covered in the main body of the work?

The work covers a review of existing summarization literature, an analysis of Xhosa language structure, the design of the system's algorithm, the technical implementation using Python and NLTK, and an extensive evaluation phase.

Which keywords characterize this work?

Key terms include Xhosa, Automatic Text Summarization, Natural Language Processing, IsiXhoSum, Term Frequency, and Sentence Position.

How does the IsiXhoSum system handle Xhosa morphology?

The system uses a lightweight rule-based stemmer specifically developed to strip suffixes and prefixes from Xhosa nouns and verbs, ensuring that words with the same stem are mapped to a single form.
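A lightweight affix-stripping stemmer of this kind might look like the sketch below. The affix lists are a small illustrative sample of common Xhosa noun-class prefixes and suffixes chosen for this example; they are an assumption, not the rule set developed in the thesis, and the minimum stem length is likewise a hypothetical safeguard against over-stripping.

```python
# Illustrative samples only; NOT the rules from the thesis.
ILLUSTRATIVE_PREFIXES = ("izin", "aba", "ama", "imi", "isi", "izi", "ubu", "uku", "um")
ILLUSTRATIVE_SUFFIXES = ("ini", "eni", "ile", "ana")


def light_stem(word, prefixes=ILLUSTRATIVE_PREFIXES,
               suffixes=ILLUSTRATIVE_SUFFIXES, min_stem_len=3):
    """Strip at most one prefix and one suffix, longest match first.

    An affix is removed only if at least `min_stem_len` characters remain.
    """
    stem = word.lower()
    for prefix in sorted(prefixes, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= min_stem_len:
            stem = stem[len(prefix):]
            break
    for suffix in sorted(suffixes, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem_len:
            stem = stem[:-len(suffix)]
            break
    return stem
```

With these sample rules, the singular "umntu" and the plural "abantu" both reduce to "ntu", which is the behaviour the answer above describes: words sharing a stem are counted as one term when frequencies are computed.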

How was the system evaluated?

Evaluation was conducted using both subjective methods (rating by isiXhosa native speakers) and objective methods (comparing system output to human-generated summaries using the ROUGE 2.0 tool).
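To make the objective comparison concrete, the sketch below computes a ROUGE-N style recall: the fraction of n-grams in the human reference summary that also occur in the system summary, with counts clipped so that a repeated n-gram is not credited more often than it appears in the system output. It is a simplified stand-in for illustration and does not reproduce the ROUGE 2.0 tool or its options; the function names are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(system_tokens, reference_tokens, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    reference = ngrams(reference_tokens, n)
    system = ngrams(system_tokens, n)
    total = sum(reference.values())
    if total == 0:
        return 0.0  # reference too short to contain any n-gram
    overlap = sum(min(count, system[gram]) for gram, count in reference.items())
    return overlap / total
```

For instance, with system tokens `a b c d` and reference tokens `a b e`, two of the three reference unigrams are matched, and one of the two reference bigrams is matched.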

What role does sentence position play in this summarizer?

The system assumes that the structure of news articles follows an "inverted pyramid" style, meaning that the first sentences of an article generally contain the most critical information, which the system prioritizes during extraction.
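One simple way to encode the inverted-pyramid assumption is a position score that decreases linearly from the first sentence to the last, blended with a content score such as term frequency. The linear decay, the normalisation, and the `position_weight` parameter below are assumptions chosen for illustration; the page does not state how the thesis combines the two features.

```python
def position_scores(num_sentences):
    """Return a score per sentence index, 1.0 for the first, decaying linearly."""
    if num_sentences <= 0:
        return []
    return [(num_sentences - i) / num_sentences for i in range(num_sentences)]


def combine_scores(content_scores, position_weight=0.5):
    """Blend normalised content scores with position scores.

    `position_weight` (hypothetical parameter) is the share given to
    position; the remainder goes to the content score.
    """
    top = max(content_scores, default=0.0)
    normalised = [s / top if top > 0 else 0.0 for s in content_scores]
    positions = position_scores(len(content_scores))
    return [
        (1 - position_weight) * c + position_weight * p
        for c, p in zip(normalised, positions)
    ]
```

Normalising the content scores to the range 0 to 1 before blending keeps the two features on a comparable scale, so the weight has a predictable effect.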


Details

Title
Development of an automatic news summarizer for isiXhosa language
Course
Computer Science
Grade
75
Author
Zukile Ndyalivana (Author)
Year of publication
2017
Pages
115
Catalog No.
V442361
ISBN (eBook)
9783668861718
ISBN (Book)
9783668861725
Language
English
Tags
IsiXhosa Python NLTK
Product safety
GRIN Publishing Ltd.
Cite this work
Zukile Ndyalivana (Author), 2017, Development of an automatic news summarizer for isiXhosa language, Munich, GRIN Verlag, https://www.grin.com/document/442361