From a practical perspective, given the abundance of digital content today, a technological solution that summarizes written text without losing its message, coherence and cohesion of ideas is essential. Such technology saves readers time and lets them focus on the content that matters most.
This is one of the research areas in natural language processing and information retrieval to which the dissertation aims to contribute. It seeks to adapt tools and technologies developed for other languages to the automatic summarization of Xhosa news articles. Specifically, the dissertation aims to develop an extraction-based text summarizer for Xhosa news articles.
In doing so, it reviews the literature on the techniques and technologies used to analyze, transform and synthesize the contents of a written text, examines the phonology and morphology of the Xhosa language, and finally designs, implements and tests an extraction-based automatic news article summarizer for the Xhosa language. Building on the literature review, it sets out the research design and the methods, tools and technologies used to design, implement and test the pilot system.
Two approaches were used to extract relevant sentences: term frequency and sentence position. The Xhosa summarizer was evaluated on a test set using both subjective and objective evaluation methods. The results of both methods are satisfactory.
Keywords: Xhosa, Automatic Text Summarization, Term Frequency, Sentence Position.
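The two sentence-scoring signals named in the abstract, term frequency and sentence position, can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the equal weighting of the two scores, the naive sentence splitter and the English placeholder text are all assumptions made for illustration, and a real isiXhosa system would need language-specific tokenization and stop-word handling.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens; no stemming or morphological analysis (assumption)."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def summarize(text, num_sentences=2, tf_weight=0.5, pos_weight=0.5,
              stopwords=frozenset()):
    """Return the top-scoring sentences in their original order.

    The weights and the linear position score are hypothetical choices,
    not values taken from the thesis.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    tokens = [[w for w in tokenize(s) if w not in stopwords] for s in sentences]
    freq = Counter(w for words in tokens for w in words)
    max_freq = max(freq.values(), default=1)
    total = len(sentences)

    scored = []
    for index, words in enumerate(tokens):
        # Term-frequency score: mean document frequency of the sentence's
        # words, normalized to the range [0, 1].
        tf = sum(freq[w] for w in words) / (len(words) * max_freq) if words else 0.0
        # Position score: earlier sentences score higher, as is typical for news.
        pos = 1.0 - index / total
        scored.append((tf_weight * tf + pos_weight * pos, index))

    top = sorted(scored, reverse=True)[:num_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


if __name__ == "__main__":
    sample = (
        "The council approved the new water plan. "
        "Rain fell in the afternoon. "
        "The water plan funds new council pipes. "
        "Officials left early."
    )
    for line in summarize(sample, num_sentences=2):
        print(line)
```

Setting `pos_weight=0.0` ranks sentences by term frequency alone, which makes it easy to compare the contribution of each signal on the same article.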
Table of Contents
- Chapter 1: Introduction
- 1.1: Background
- 1.2: Problem Statement
- 1.3: Research Objectives
- 1.4: Research Questions
- 1.5: Significance of the Study
- 1.6: Scope of the Study
- 1.7: Limitations of the Study
- 1.8: Structure of the Thesis
- Chapter 2: Literature Review
- 2.1: Introduction
- 2.2: Text Summarization
- 2.2.1: Types of Text Summarization
- 2.2.2: Techniques for Extractive Summarization
- 2.2.3: Evaluation Metrics for Text Summarization
- 2.3: Natural Language Processing
- 2.3.1: Morphological Analysis
- 2.3.2: Lexical Analysis
- 2.3.3: Syntactic Analysis
- 2.3.4: Semantic Analysis
- 2.4: isiXhosa Language
- 2.4.1: Characteristics of the isiXhosa Language
- 2.4.2: Resources Available for isiXhosa
- 2.5: Related Work
- 2.6: Conclusion
- Chapter 3: Research Methodology
- 3.1: Introduction
- 3.2: Research Design
- 3.3: Data Collection and Preparation
- 3.3.1: Data Source
- 3.3.2: Data Pre-Processing
- 3.4: Development of the Automatic News Summarizer
- 3.4.1: System Architecture
- 3.4.2: Summarization Algorithm
- 3.5: Evaluation Metrics
- 3.6: Ethical Considerations
- 3.7: Conclusion
- Chapter 4: Implementation and Evaluation
- 4.1: Introduction
- 4.2: Implementation of the Summarizer
- 4.3: Evaluation of the Summarizer
- 4.3.1: Experimental Setup
- 4.3.2: Evaluation Results
- 4.4: Discussion of Results
- 4.5: Conclusion
- Chapter 5: Conclusion and Future Work
Objectives and Key Themes
This thesis focuses on developing an automatic news summarizer for the isiXhosa language. The main goal is to address the lack of such systems for this language, enabling efficient information extraction and dissemination. The study pursues this goal by implementing a system that combines text summarization techniques with natural language processing approaches. The system is evaluated against standard metrics to assess its effectiveness.
- Automatic Text Summarization for isiXhosa
- Natural Language Processing Techniques
- Development and Evaluation of a Summarizer System
- isiXhosa Language Resources and Challenges
- Contributions to Information Access and Dissemination in isiXhosa
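The text here does not name the metrics used for evaluation. As one hedged illustration of what an objective check for an extractive summarizer can look like, the sketch below compares the sentence indices chosen by a system with those chosen by a human reference, and reports precision, recall and F1. The function name and the index-based representation are assumptions, not details from the thesis.

```python
def extract_overlap(system, reference):
    """Sentence-level precision, recall and F1 between two extracts.

    `system` and `reference` are iterables of sentence indices selected
    from the same source article (hypothetical representation).
    """
    system_set, reference_set = set(system), set(reference)
    hits = len(system_set & reference_set)
    precision = hits / len(system_set) if system_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # The system picked sentences 0, 1 and 4; the reference picked 0, 2, 4 and 5.
    p, r, f = extract_overlap([0, 1, 4], [0, 2, 4, 5])
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

A subjective evaluation, by contrast, would rely on human judges rating qualities such as coherence and informativeness, which a script like this cannot capture.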
Chapter Summaries
The thesis is structured into five chapters, each exploring different aspects of the research.
- Chapter 1 provides a comprehensive introduction to the topic, outlining the background, problem statement, research objectives, and significance of the study.
- Chapter 2 delves into a thorough literature review, discussing existing research on text summarization, natural language processing, and resources available for isiXhosa.
- Chapter 3 focuses on the research methodology, outlining the design, data collection and preparation, the development of the summarizer system, and evaluation metrics.
- Chapter 4 details the implementation and evaluation of the summarizer, including experimental setup, results, and a discussion of the findings.
Keywords
The key terms that characterize the research are automatic text summarization, isiXhosa language, natural language processing, extractive summarization, evaluation metrics, and information access. The research explores the development and evaluation of a system for extracting concise summaries from isiXhosa news articles, utilizing natural language processing techniques. The focus is on contributing to the field of information retrieval and dissemination for the isiXhosa language.
- Zukile Ndyalivana (Author), 2017, Development of an automatic news summarizer for isiXhosa language, Munich, GRIN Verlag, https://www.grin.com/document/442361