This paper seeks to computationally advance the Yorùbá language by designing a rule-based part-of-speech tagger for it. The work adopts Standard Theory and Principles and Parameters Theory to segment Yorùbá sentences and encode their syntactic structure for the computer system in Prolog. About one hundred Yorùbá words are coded to serve as the lexicon (dictionary).

Using these words, a set of syntactic rules is also programmed. The tagger assigns parts of speech to the words of non-derivative Yorùbá sentences. The study reveals that not all Yorùbá noun phrases (NPs) can serve as complements of prepositions in prepositional phrases (PPs). It also shows that Yorùbá words need to be reclassified so that machines such as computers can generate grammatically acceptable Yorùbá sentences.

Excerpt

1. Introduction

2. The Yorùbá Language

3. Literature Review

4. Needs for Yorùbá Tagger

5. Significance of Tagging Yorùbá Lexical Items

5.1 Computer and the Yorùbá Syntax

5.2 Communication with Computer System

5.3 Easy Computerization of Yorùbá

5.4 Provision of Word’s Information

5.5 Lexical Disambiguation

5.6 Stemming for Information Retrieval

6. Algorithms of Designing Yorùbá Parts of Speech Tagger

7. Library File

8. Tagged Yorùbá Expressions

9. Conclusion

10. Limitation of the Study

Research Objective and Core Themes

The primary objective of this paper is to computationally advance the Yorùbá language by designing and implementing a rule-based part-of-speech (POS) tagger. The research addresses the challenge of linguistically processing Yorùbá simple sentences by utilizing Prolog to encode syntactic structures and lexical knowledge, thereby enabling machines to generate and identify grammatically acceptable sentences.

Computational linguistic development of Yorùbá
Implementation of a rule-based POS tagging system using Prolog
Linguistic analysis of Yorùbá sentence structure (NP and VP)
Exploration of headedness and parametric variation in Yorùbá syntax
Development of a lexicon and syntactic database for NLP applications
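At its simplest, the lexical side of such a tagger is a lookup from each word to its category. The Python sketch below illustrates only that step, under stated assumptions: the handful of words, their glosses and the tag names (N, V, P, DET) are illustrative choices, not the roughly one hundred entries or the tagset of the paper's Prolog lexicon.

```python
# Minimal lexicon-lookup tagger: an illustrative sketch, not the author's Prolog code.
# The entries and tag labels below are assumptions chosen for the example.
LEXICON: dict[str, str] = {
    "adé": "N",    # proper noun: Adé
    "ọmọ": "N",    # child
    "ìwé": "N",    # book
    "ilé": "N",    # house
    "ra": "V",     # buy
    "lọ": "V",     # go
    "sí": "P",     # to (preposition)
    "náà": "DET",  # the/that (follows the noun)
}


def tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each whitespace-separated word; words missing from the lexicon get 'UNK'."""
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in sentence.split()]
```

For example, `tag("Ọmọ náà ra ìwé")` ("the child bought a book") returns `[("Ọmọ", "N"), ("náà", "DET"), ("ra", "V"), ("ìwé", "N")]`. A pure lookup like this cannot resolve words with more than one category, which is why the rule component matters.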

Excerpt from the Book

Significance of Tagging Yorùbá Lexical Items

Tagging the words of Yorùbá computationally offers many important advantages to the Yorùbá people and their language. Some of these include:

i. Computer and the Yorùbá Syntax: This work will help to programme computers to identify Yorùbá syntax. In the long run, this will enable computer systems to understand the Yorùbá language, so that the language can compete favorably with the so-called “mainstream languages”.

ii. Communication with Computer System: Tagging Yorùbá words will enable users, irrespective of their level of education, to use and instruct computer systems for basic necessities. All they need is to recognize the letters of their language's alphabet and simple numerals such as 1, 2, 3, et cetera. Consequently, useful information will be available to farmers and market traders who may not be able to read English.

iii. Easy Computerization of Yorùbá: Tagging Yorùbá words will undoubtedly facilitate other efforts to computerize the language. Adedjouma et al. (2013) state that “it is well known that taggers are the essential elements in the development of any serious application in this field”. This work will help by providing a stepping stone for Yorùbá Language Processing (YLP) applications and a methodology for similar work on other languages.

Chapter Summaries

Introduction: Provides a definition of POS tagging and establishes its importance as a critical component in modern Natural Language Processing (NLP) applications.

The Yorùbá Language: Offers an overview of the Yorùbá language's demographic spread, its classification within the Niger-Congo phylum, and its status as a major language in Nigeria.

Literature Review: Examines previous efforts to computerize the Yorùbá language, including limitations in existing studies and specific linguistic anomalies regarding phonetics and tonemes.

Needs for Yorùbá Tagger: Argues that integrating Yorùbá into Information and Communication Technology (ICT) is essential for the language's survival and profile in the digital age.

Significance of Tagging Yorùbá Lexical Items: Outlines the practical benefits of the tagger, including syntax identification, improved human-computer communication, and aid for linguistic research.

Algorithms of Designing Yorùbá Parts of Speech Tagger: Details the theoretical foundation of the tagger, employing Immediate Constituent Analysis (ICA) and Head Parameter (HP) theory implemented via Prolog.
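Head Parameter theory holds that a language fixes whether heads precede or follow their complements. Yorùbá is generally described as head-initial: verbs precede their objects and prepositions precede their NP complements. The Python sketch below, which is not the author's Prolog implementation, spells out one phrase structure under both settings of the parameter; the `Phrase` class and `linearize` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    """A head word plus its complement phrases (hypothetical helper)."""
    head: str
    complements: list["Phrase"] = field(default_factory=list)


def linearize(p: Phrase, head_initial: bool = True) -> list[str]:
    """Spell out a phrase; the flag plays the role of the head parameter."""
    comps = [w for c in p.complements for w in linearize(c, head_initial)]
    # Head-initial (Yorùbá): head before complements; head-final: head after them.
    return [p.head] + comps if head_initial else comps + [p.head]


# VP 'lọ sí ilé' ("go to the house"): V takes a PP, and P takes an NP.
vp = Phrase("lọ", [Phrase("sí", [Phrase("ilé")])])
```

`linearize(vp)` yields `['lọ', 'sí', 'ilé']`, the Yorùbá order, while `linearize(vp, head_initial=False)` yields the mirror order `['ilé', 'sí', 'lọ']` expected of a head-final language. Keeping the parameter separate from the constituent structure is what lets one set of phrase rules describe both orders.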

Library File: Describes the technical setup of the dictionary file, including the classification of lexical items and the use of Definite Clause Grammar (DCG) for parsing.
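A Prolog DCG rule such as `s --> np, vp.` consumes tokens from left to right and backtracks when a rule fails. The sketch below imitates that behaviour in Python with generators, pairing a category-classified lexicon (standing in for the library file) with a toy grammar. The lexicon, category labels and rules (S → NP VP, NP → N (DET), VP → V (NP | PP), PP → P NP) are assumptions made for illustration, not the paper's actual rule set.

```python
from typing import Iterator, Optional

# Hypothetical dictionary file: lexical items classified by category.
LEXICON = {
    "N": {"adé", "olú", "ọmọ", "ìwé", "ilé"},
    "V": {"ra", "lọ"},
    "P": {"sí"},
    "DET": {"náà"},
}

Parse = tuple[list[tuple[str, str]], int]  # (tagged words so far, next token index)


def word(cat: str, toks: list[str], i: int) -> Iterator[Parse]:
    """Terminal rule: consume one token if the lexicon lists it under `cat`."""
    if i < len(toks) and toks[i].lower() in LEXICON[cat]:
        yield [(toks[i], cat)], i + 1


def np(toks: list[str], i: int) -> Iterator[Parse]:
    # NP --> N ; NP --> N DET   (the determiner follows the noun)
    for n, j in word("N", toks, i):
        yield n, j
        for d, k in word("DET", toks, j):
            yield n + d, k


def pp(toks: list[str], i: int) -> Iterator[Parse]:
    # PP --> P NP
    for p, j in word("P", toks, i):
        for obj, k in np(toks, j):
            yield p + obj, k


def vp(toks: list[str], i: int) -> Iterator[Parse]:
    # VP --> V ; VP --> V NP ; VP --> V PP
    for v, j in word("V", toks, i):
        yield v, j
        for comp, k in np(toks, j):
            yield v + comp, k
        for comp, k in pp(toks, j):
            yield v + comp, k


def sentence(toks: list[str]) -> Iterator[list[tuple[str, str]]]:
    # S --> NP VP; like phrase/2, the whole input must be consumed.
    for subj, j in np(toks, 0):
        for pred, k in vp(toks, j):
            if k == len(toks):
                yield subj + pred


def tag_sentence(text: str) -> Optional[list[tuple[str, str]]]:
    """Return the tags of the first complete parse, or None if no rule covers the input."""
    return next(sentence(text.split()), None)
```

For instance, `tag_sentence("Adé lọ sí ilé")` returns `[("Adé", "N"), ("lọ", "V"), ("sí", "P"), ("ilé", "N")]`, whereas a string the toy grammar does not cover, such as `"ra Adé"`, returns `None`. Generators give the same try-then-backtrack search order that Prolog applies to DCG clauses.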

Tagged Yorùbá Expressions: Presents test results of the developed tagger applied to various non-derivative Yorùbá sentences.

Conclusion: Summarizes the findings, confirming the feasibility of rule-based tagging for Yorùbá while noting the complexity of rule specification.

Limitation of the Study: Discusses the scope of the tagger, noting that it does not yet handle certain derivational sentences and fragments.

Keywords

Yorùbá Language, Parts of Speech Tagging, Rule-based POS Tagging, Computational Linguistics, Natural Language Processing, Prolog, Syntax, Lexicon, Headedness, Immediate Constituent Analysis, Definite Clause Grammar, Language Technology, Machine Translation, Information Retrieval, Linguistic Modeling.

Frequently Asked Questions

What is the fundamental purpose of this research?

The research aims to computationally advance the Yorùbá language by creating a rule-based part-of-speech (POS) tagger to facilitate its integration into modern computing and NLP applications.

What are the primary themes discussed?

The work focuses on linguistic analysis of Yorùbá syntax, the application of syntactic theories such as Standard Theory (ST) and the Head Parameter (HP), and the practical implementation of a tagging system in Prolog.

What is the central research goal?

The primary goal is to successfully code Yorùbá syntactic structures to enable a computer to identify and tag lexical items within simple sentences.

Which scientific method is utilized in this study?

The study employs Immediate Constituent Analysis (ICA) of the Standard Theory, implemented through the Prolog programming language, specifically utilizing Definite Clause Grammar (DCG).

What topics are covered in the main section of the paper?

The paper covers the theoretical foundation of Yorùbá grammar, the technical requirements for computerization, the algorithmic design of the tagger, and an empirical evaluation of the system using sample sentences.

Which keywords characterize this work?

Key terms include Yorùbá Language, Rule-based POS Tagging, Computational Linguistics, Prolog, Syntax, and NLP.

How does the tagger handle ambiguity in Yorùbá?

The system uses lexical disambiguation by analyzing the linguistic features of a word, including its preceding and following context, to assign the correct tag.
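As an illustration of context-based disambiguation, Yorùbá "ní" can be a verb ("have", as in "Olú ní ilé", "Olu has a house") or a preposition ("at/in", as in "Olú wà ní ilé", "Olu is at home"). The Python sketch below resolves this one ambiguity from the preceding tag and the following word's candidate tags. The lexicon, tag names and the two rules are hypothetical, chosen for the example; the paper's own disambiguation rules are not given in this excerpt.

```python
# Hypothetical lexicon: each word maps to its set of candidate tags.
LEXICON: dict[str, set[str]] = {
    "olú": {"N"},
    "ilé": {"N"},
    "wà": {"V"},       # be (located)
    "ní": {"V", "P"},  # verb 'have' or preposition 'at/in'
}


def candidates(word: str) -> set[str]:
    return LEXICON.get(word.lower(), {"UNK"})


def resolve(cands: set[str], prev_tag, next_cands: set[str]) -> str:
    """Apply hypothetical context rules to a V/P-ambiguous word."""
    if cands == {"V", "P"}:
        if prev_tag == "V":                        # right after a verb: preposition
            return "P"
        if prev_tag == "N" and "N" in next_cands:  # between two nouns: verb
            return "V"
    return "AMBIG"  # no rule fired


def disambiguate(words: list[str]) -> list[tuple[str, str]]:
    tagged, prev_tag = [], None
    for i, w in enumerate(words):
        cands = candidates(w)
        nxt = candidates(words[i + 1]) if i + 1 < len(words) else set()
        tag = next(iter(cands)) if len(cands) == 1 else resolve(cands, prev_tag, nxt)
        tagged.append((w, tag))
        prev_tag = tag
    return tagged
```

With these rules, "ní" is tagged V in `disambiguate(["Olú", "ní", "ilé"])` and P in `disambiguate(["Olú", "wà", "ní", "ilé"])`.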

What role does Prolog play in this project?

Prolog serves as the operational environment where lexical facts and syntactic rules are encoded, allowing the system to parse sentences and query the database for structural information.
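In Prolog, lexical facts such as `noun('ilé').` are stored in a database and retrieved with goals such as `?- noun(X).`, where the unbound variable `X` matches any argument. The Python sketch below mimics that querying style, with `None` standing in for an unbound variable; the predicate names, the facts and the `query` function are hypothetical illustrations, not the paper's database.

```python
# Hypothetical fact base: (predicate, argument) pairs, like noun('ilé'). in Prolog.
FACTS: list[tuple[str, str]] = [
    ("noun", "ilé"), ("noun", "ìwé"), ("noun", "ọmọ"),
    ("verb", "ra"), ("verb", "lọ"),
    ("preposition", "sí"),
]


def query(pred=None, arg=None) -> list[tuple[str, str]]:
    """Return facts matching the pattern; None behaves like an unbound variable."""
    return [
        (p, a) for p, a in FACTS
        if (pred is None or p == pred) and (arg is None or a == arg)
    ]
```

`query("noun")` plays the role of `?- noun(X).` and lists every noun, `query(arg="ra")` asks which category "ra" belongs to, and an empty list such as the result of `query("verb", "ilé")` corresponds to Prolog answering `false`.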

What are the limitations of the current model?

The current tagger focuses on non-derivative simple sentences and cannot yet handle more complex fragments or specific derivational linguistic structures.


Publication Details

Title: Rule Based Parts of Speech Tagging of Yorùbá Simple Sentences
Author: Abiola Oyelere
Year of publication: 2013
Pages: 15
Catalogue number: V347022
ISBN (eBook): 9783668369788
ISBN (Book): 9783668369795
Language: English
Keywords: rule based parts speech tagging yorùbá simple sentences
Product safety: GRIN Publishing GmbH

Citation: Abiola Oyelere (Author), 2013, Rule Based Parts of Speech Tagging of Yorùbá Simple Sentences, Munich, GRIN Verlag, https://www.grin.com/document/347022
