The number of Amharic documents on the Web is increasing as many newspaper publishers have started providing their services electronically. Tools effective enough to extract and exploit the valuable information in Amharic text have been unavailable, and manually extracting information from a large amount of unstructured text is a tiresome and time-consuming job; this was the main motivation for this research. The overall objective of the research was to develop an information extraction system for Amharic vacancy announcement text. A total of 116 Amharic vacancy announcement texts containing 10,766 words were collected from the "Ethiopian Reporter" newspaper, which is published in Amharic twice a week. For this study, nine candidate texts were selected from the Amharic vacancy announcement text: organization, position, qualification, experience, salary, number of people required, work agreement, deadline, and phone number. Experiments were carried out on each component of the system separately to evaluate its performance; this helps to identify drawbacks and gives some direction for future work.
The experimental results show that an overall F-measure of 71.7% was achieved. In order to make the system applicable in this domain, which is Amharic vacancy announcements, [...]

Excerpt

Table of Contents

CHAPTER ONE. INTRODUCTION
- 1.1. GENERAL BACKGROUND
- 1.2. STATEMENT OF THE PROBLEM
- 1.3. OBJECTIVE OF THE STUDY
- 1.4. METHODOLOGY
- 1.5. APPLICATION OF RESULTS AND BENEFICIARIES
- 1.6. SCOPE AND LIMITATIONS OF THE STUDY
- 1.7. ORGANIZATION OF THE STUDY
CHAPTER TWO. LITERATURE REVIEW
- 2.1. INTRODUCTION
- 2.2. INFORMATION EXTRACTION (IE)
- 2.3. BUILDING INFORMATION EXTRACTION SYSTEMS
- 2.4. ARCHITECTURE OF INFORMATION EXTRACTION SYSTEM
- 2.5. PREPROCESSING OF INPUT TEXTS
- 2.6. LEARNING AND APPLICATION OF THE EXTRACTION MODEL
- 2.7. POST PROCESSING OF OUTPUT
- 2.8. RELATED NLP FIELDS TO INFORMATION EXTRACTION
- 2.9. INFORMATION EXTRACTION (IE) AND INFORMATION RETRIEVAL (IR)
- 2.10. EVALUATION OF INFORMATION EXTRACTION
- 2.11. RELATED WORKS
CHAPTER THREE. THE AMHARIC WRITING SYSTEM
- 3.1. INTRODUCTION
- 3.2. AMHARIC CHARACTER REPRESENTATION AND WRITING SYSTEM
- 3.3. AMHARIC PUNCTUATION MARKS AND NUMERALS
- 3.4. CHARACTERISTICS OF THE AMHARIC WRITING SYSTEM
- 3.5. THE MORPHOLOGY OF AMHARIC
- 3.6. GRAMMATICAL STRUCTURE OF AMHARIC
- 3.7. SENTENCES IN AMHARIC
CHAPTER FOUR. DESIGN AND IMPLEMENTATION OF AVATIES
- 4.1. INTRODUCTION
- 4.2. PROPOSED MODEL
- THE PROTOTYPE SYSTEM
CHAPTER FIVE. RESULT AND EVALUATION
- 5.1. INTRODUCTION
- 5.2. EVALUATION METRICS
- 5.3. THE DATASETS
- 5.4. EXPERIMENTAL RESULT AND EVALUATION EACH COMPONENT OF OUR SYSTEM

Objectives and Key Themes

This thesis focuses on designing and implementing an information extraction system specifically for Amharic vacancy announcement texts. The goal is to develop a system that can automatically extract relevant information from these texts, facilitating easier job searching and analysis. The key themes explored in this work include:

- Information Extraction (IE) techniques and their application in Amharic.
- The challenges and complexities of working with Amharic language data.
- The design and development of a prototype system (AVATIES) tailored for Amharic vacancy announcement texts.
- Evaluation methodologies for information extraction systems, particularly in the context of Amharic.
- The potential impact and benefits of this system for the Amharic job market.

Chapter Summaries

**Chapter One: Introduction** This chapter lays the foundation for the research, introducing the general background, problem statement, and objectives of the study. It also outlines the methodology, the application of the findings, and the scope and limitations of the research.
**Chapter Two: Literature Review** This chapter delves into the existing literature on information extraction (IE), covering various approaches, architectures, preprocessing techniques, evaluation methods, and related NLP fields like information retrieval and text summarization. It also discusses related works in information extraction from different languages, including English, Chinese, and Amharic.
**Chapter Three: The Amharic Writing System** This chapter examines the complexities of the Amharic writing system, providing insights into its character representation, punctuation, morphological structure, and grammatical rules. This understanding is crucial for developing an information extraction system for Amharic.
**Chapter Four: Design and Implementation of AVATIES** This chapter presents the proposed model for the Amharic Text Information Extraction system (AVATIES). It describes the architecture of the system, which includes data preprocessing, learning and extraction components, and post-processing. The chapter also discusses the development of the prototype system.
**Chapter Five: Result and Evaluation** This chapter presents the results of the evaluation of the AVATIES system. It discusses the evaluation metrics used, the datasets employed, and the experimental results of each component of the system. The chapter analyzes the performance of the system in terms of candidate text extraction, organization and position extraction, and overall accuracy.
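The overall result is reported as an F-measure, which in information extraction evaluation is conventionally the harmonic mean of precision and recall. The sketch below shows that standard calculation only; the function name and the counts are illustrative assumptions and are not taken from the thesis, whose exact metric definitions are not shown in this excerpt.

```python
def precision_recall_f1(correct: int, extracted: int, expected: int):
    """Return (precision, recall, F1) for one extraction run.

    correct   -- extracted items that match the gold annotation
    extracted -- all items the system extracted
    expected  -- all items present in the gold annotation
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f = precision_recall_f1(correct=30, extracted=40, expected=50)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With these made-up counts, precision is 30/40 and recall is 30/50, so the F-measure falls between the two, closer to the lower value.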

Keywords

The core keywords and focus topics of this work include Information Extraction, Amharic Text Processing, Natural Language Processing (NLP), Amharic Vacancy Announcement Texts, AVATIES, Data Preprocessing, Machine Learning, Evaluation Metrics, and Prototype System Development. The research centers on the development and evaluation of a specialized information extraction system for Amharic vacancy announcements, addressing the linguistic challenges and potential benefits for the Amharic job market.

Frequently Asked Questions

What is the main goal of the research?

The research aims to develop an information extraction system specifically for Amharic vacancy announcement texts to automate the extraction of job-related data.

What is AVATIES?

AVATIES stands for Amharic Text Information Extraction system, a prototype model designed to extract nine candidate texts like organization, position, and salary.

Why is Amharic challenging for information extraction?

Amharic has a complex writing system, unique punctuation, and a sophisticated morphological and grammatical structure that requires specialized preprocessing.
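As a small illustration of why the punctuation matters for preprocessing, the following sketch splits text on Ethiopic punctuation marks from the Unicode Ethiopic block (U+1361 to U+1368). It is a minimal example under stated assumptions, not the preprocessing component of AVATIES; the function names and the sample string are hypothetical.

```python
import re

# Ethiopic punctuation code points (Unicode Ethiopic block).
WORDSPACE = "\u1361"        # traditional word separator
FULL_STOP = "\u1362"        # sentence-final mark
COMMA = "\u1363"
SEMICOLON = "\u1364"
COLON = "\u1365"
PREFACE_COLON = "\u1366"
QUESTION_MARK = "\u1367"
PARAGRAPH_SEP = "\u1368"

SENTENCE_END = FULL_STOP + QUESTION_MARK + PARAGRAPH_SEP + "?!"
INNER_PUNCT = WORDSPACE + COMMA + SEMICOLON + COLON + PREFACE_COLON


def split_sentences(text: str) -> list[str]:
    """Split text at sentence-final marks, dropping empty pieces."""
    parts = re.split(f"[{re.escape(SENTENCE_END)}]+", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence: str) -> list[str]:
    """Replace word separators and inner punctuation with spaces, then split."""
    cleaned = re.sub(f"[{re.escape(INNER_PUNCT)}]", " ", sentence)
    return cleaned.split()


# Hypothetical sample: two short sentences using both separator styles.
sample = f"ሰላም{WORDSPACE}ዓለም{FULL_STOP} አዲስ አበባ{COMMA} ኢትዮጵያ{FULL_STOP}"
for sentence in split_sentences(sample):
    print(tokenize(sentence))
```

A real system would also need to handle numerals, abbreviations, and the morphological variation mentioned above, which this sketch does not attempt.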

What was the experimental result of the system?

The experimental evaluation achieved an overall F-measure of 71.7% in extracting relevant information from the collected datasets.

Where was the data for this study collected from?

The data consisted of 116 Amharic vacancy announcement texts collected from the "Ethiopian Reporter" newspaper.

Excerpt out of 105 pages

Details

Title: Designing an Information Extraction System for Amharic Vacancy Announcement Text
College: Addis Ababa University
Course: Natural Language Processing
Grade: Very Good
Author: Sintayehu Hirpassa (Author)
Publication Year: 2011
Pages: 105
Catalog Number: V289226
ISBN (eBook): 9783656895565
ISBN (Book): 9783656895572
Language: English
Tags: information extraction
Product Safety: GRIN Publishing GmbH

Quote paper: Sintayehu Hirpassa (Author), 2011, Designing an Information Extraction System for Amharic Vacancy Announcement Text, Munich, GRIN Verlag, https://www.grin.com/document/289226
