Before researching ways to extract proper nouns, we should ask what a proper noun is. According to the British Council's learning center, proper nouns are the names of people, places, and organizations, including companies. What, then, is the benefit of proper nouns? They give the reader a surface knowledge of what a piece of content is about. Sometimes a text or sentence is worthless without the specific nouns it contains: people usually read articles for the famous people, places, or organizations they discuss.
Many studies have addressed the recognition of proper nouns in English, while in Arabic this work is still at an early stage. The code developed in this research can extract proper nouns, but its accuracy is far below 100%. Using modules that take sentence structure into account, such as Lingua::EN::NamedEntity, yields higher accuracy but still does not achieve satisfactory results. In addition, Arabic systems, whether rule-based or based on machine learning, are still not comparable to the systems developed for English, which leaves wide scope for researchers to work on NERA (Named Entity Recognition in Arabic).
Table of Contents
- Introduction
- Body Background
- Lingua-EN-NamedEntity
- Blacklist Method
- Lingua::LinkParser
Objectives and Key Themes
This research investigates the extraction of proper nouns from English and Arabic texts. It explores different approaches, evaluates their efficiency and accuracy, and aims to identify effective methods for finding proper nouns in electronic text documents.
- Extraction of proper nouns from electronically stored texts
- Comparison of different methods for proper noun extraction
- Evaluation of accuracy and efficiency of methods
- Exploration of the use of Perl and PHP programming languages
- Development of a contextual method for proper noun extraction in English
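One way to make the accuracy evaluation listed above concrete is to compare the extracted proper nouns against a hand-labelled list. The sketch below is written in Python for illustration and is not the author's Perl or PHP code. It assumes precision and recall as the metrics, which this summary does not specify, and the function name `precision_recall` and the sample lists are hypothetical.

```python
def precision_recall(extracted, gold):
    """Compare extracted proper nouns with a hand-labelled gold list.

    Precision: share of extracted items that are truly proper nouns.
    Recall: share of true proper nouns that were extracted.
    """
    extracted_set = set(extracted)
    gold_set = set(gold)
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Invented sample data, for illustration only (not results from the paper).
extracted = ["Alabama", "GOP", "Reports"]
gold = ["Alabama", "GOP", "Washington", "Senate"]
precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers separates the two failure modes mentioned in the chapter summaries: extracting words that are not proper nouns lowers precision, while missing real proper nouns lowers recall.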
Chapter Summaries
- Introduction: This chapter introduces the concept of proper nouns and their significance in text analysis. It highlights the importance of proper noun extraction for understanding text content and provides an overview of existing research in English and Arabic languages.
- Body Background: This chapter focuses on the use of Perl language modules for proper noun extraction. It discusses the importance of parsing modules and introduces the Lingua-EN-NamedEntity module as a potential tool for achieving the research objective.
- Lingua-EN-NamedEntity: This chapter presents the results of applying the Lingua-EN-NamedEntity module to a test text ("The inside story of the GOP's Alabama meltdown"). It evaluates the accuracy and limitations of the module, highlighting its inability to extract all proper nouns correctly.
- Blacklist Method: This chapter describes an attempt to improve proper noun extraction accuracy by incorporating a wordlist into the code. It outlines the challenges of using a wordlist to filter out non-proper nouns and discusses the resulting limitations.
- Lingua::LinkParser: This chapter introduces the Link Grammar approach to natural language processing and its potential for proper noun extraction. It discusses the strengths and weaknesses of regular expressions and Link Grammar in identifying and understanding proper nouns.
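The regular-expression and wordlist ideas summarized above can be sketched roughly as follows. This is a minimal Python illustration, not the author's Perl code: the pattern, the `BLACKLIST` contents, and the function name `extract_candidates` are all assumptions made for the example. It treats runs of capitalized words as candidate proper nouns and strips leading words that appear on the blacklist.

```python
import re

# Hypothetical blacklist: capitalized words that commonly open a sentence
# but are not proper nouns. A real wordlist would be far larger.
BLACKLIST = {"The", "This", "It", "In", "However", "According", "Many",
             "Sometimes", "People", "They", "He", "She", "We", "An"}

# One or more consecutive capitalized words, e.g. "British Council".
# All-caps tokens such as "GOP" are deliberately not matched by this pattern.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_candidates(text, blacklist=BLACKLIST):
    """Return capitalized word runs, minus leading blacklisted words."""
    results = []
    for match in CANDIDATE.finditer(text):
        words = match.group(0).split()
        # Drop blacklisted words at the start of the run ("The British" -> "British").
        while words and words[0] in blacklist:
            words.pop(0)
        if words:
            results.append(" ".join(words))
    return results


print(extract_candidates(
    "According to the British Council, Marwan lives in Munich. The report is new."
))
```

A sentence-initial common word that is missing from the blacklist, such as "Reports" in "Reports say Alabama voted.", still comes back as a candidate. That kind of miss is the limitation the Blacklist Method summary points to, and it is one reason a parser that looks at sentence structure is considered next.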
Keywords
This research focuses on the extraction of proper nouns from electronic texts in English and Arabic. The study uses the Perl and PHP programming languages to develop and evaluate different methods, including Lingua-EN-NamedEntity, the blacklist method, and Link Grammar. Key terms include proper noun extraction, natural language processing (NLP), Named Entity Recognition, Perl modules, regular expressions, Link Grammar, contextual method, and wordlists.
Cite this paper
- Marwan Al Omari (Author), 2018, Extracting Proper nouns in Electronically English and Arabic Texts. Theoretical Background and Practices, Munich, GRIN Verlag, https://www.grin.com/document/1215103