This thesis focuses on developing and deploying an efficient opinion mining (OM) model for automatically processing Arabizi in the context of public and private customer service providers in Lebanon. Service providers include restaurants, hotels, shopping centers, and governmental institutions, among others. An Arabizi corpus of 2635 text reviews, which is essential for building the OM model, was gathered by crawling the pages of service providers on Facebook, Google, and Zomato from April 4, 2018 to October 30, 2018.

The main aim of this thesis is to give due weight to the feelings and thoughts of Arabizi users in Lebanon by extracting sentiment, positive or negative, from the texts they write. In addition, it highlights the challenges that this language system poses, for the public and particularly for researchers, so that further studies can address them. It also characterizes Arabizi specifically in the Lebanese context, providing a starting point for other research to build on. Furthermore, the thesis tests machine capabilities on sentiment prediction and classification tasks in Lebanese Arabizi. It also aims to build a dataset of reliable Arabizi reviews that can be used, and expanded, in further research. More generally, classifying the large volume of Arabizi sentences could be of great help to media offices, government centers, research facilities, and start-ups in knowledge-making and future prediction tasks.

Because sentiment analysis (SA) tools for automatically processing Arabizi are unavailable, building one is of high importance. Such a tool would give Arabizi a recognized place in the SA field as the number of internet users who write in it grows. In addition, it would help companies, institutions, and small businesses extract positive and negative sentiment from reviews written in Arabizi much more efficiently, so that they can reflect on and improve the quality of the services they provide.

Excerpt

Chapter I

1.1 Introduction

1.2 Problem Statement

1.3 Purpose of the Study

1.4 Research Questions

1.5 Research Hypotheses

1.6 Significance of the Study

1.7 Limitations of the Study

1.8 Challenges of the study

1.9 Research contributions

1.10 Key Terms

1.10.1 Sentiment Analysis

1.10.2 Natural Language Processing (NLP)

1.10.3 Arabizi NLP

1.10.4 Classifier

1.10.5 Big Data

1.10.6 Machine Learning Classifier

1.10.7 Lexicon-based Classifier

1.10.8 Customer Review

1.11 Research Outline

Chapter II

2.1 Literature Review

2.2 Natural Language Processing (NLP)

2.3 Big Data and Sentiment Analysis (SA)

2.4 Approaches to SA

2.4.1 Lexicon-Based Approach

2.4.2 Machine Learning Approach

2.4.3 Hybrid Approach

2.5 Arabizi and the Lebanese Dialect

2.6 Sentiment Analysis and Lebanese Arabizi

Chapter III

3.1 Research Methodology

3.2 Research Design

3.3 Research Sample

3.3.1 The Challenges of Analyzing Arabizi Texts

3.4 Data Preprocessing and Filtering

3.4.1 Removal of reviews with “neutral” sentiment

3.4.2 Ratings’ Encodings

3.4.3 Data splitting for training and testing

3.4.4 Data Cleaning

3.5 Reviews Representation

3.5.1 Selected Features

3.6 Research Tools

3.6.1 Machine Learning Classifier

3.6.2 Lexicon-based Classifier

Chapter IV

3.7 Research Procedure

4.1 Experiment Preparation

4.2 Data Preprocessing

4.3 Feature Extraction

4.4 Building Classifiers

4.4.1 Machine Learning

4.4.2 Lexicon-based

4.5 Results and Evaluation

Chapter V

5.1 Research Result

5.2 Machine Learning

5.2.1 First phase (Default settings)

5.2.2 Second phase (hyperparameters tuning settings)

5.2.3 Experiment Summary

5.3 Lexicon-based

5.3.1 Experiment Summary

5.4 Discussion

Chapter VI

6.1 Conclusion

6.2 Future Work

Research Objectives and Themes

This study aims to develop and deploy an efficient sentiment analysis model tailored for the Arabizi language system, specifically within the context of customer service reviews in Lebanon. By leveraging both supervised machine learning (using Logistic Regression) and a lexicon-based approach (using the Science of Language and Communication Semantic Analysis System), the research addresses the lack of automated processing tools for this informal, Latin-scripted dialect.

Sentiment Analysis for Arabizi (informal Arabic in Latin characters).
Comparison of Machine Learning (Logistic Regression) and Rule-based (Lexicon) classification.
Processing of user-generated content from social media platforms (Facebook, Google, Zomato).
Hyperparameter tuning of Logistic Regression models for optimized classification performance.
Linguistic analysis of common Arabizi challenges such as code-switching, exaggerations, and orthographic variations.

Excerpt from the Book

1.1 Introduction

Nowadays, the huge flow of unstructured (unlabeled) data on the World Wide Web (WWW), projected to reach about forty thousand exabytes by 2020 (Gantz and Reinsel, 2012), has attracted a large number of data mining researchers who aim to extract knowledge and other useful information in order to make sense of what people feel and think in the virtual space (Waters, 2010). For such big data analysis, Sentiment Analysis (SA), or opinion mining (OM), is a major approach to analyzing and extracting opinions from texts such as reviews, discussions, and blogs (Pang and Lee, 2008). SA is a multidisciplinary research field that draws on Natural Language Processing (NLP), Computational Linguistics (CL), Information Retrieval (IR), Information Extraction (IE), Machine Learning (ML), Deep Learning (DL), and Artificial Intelligence (AI) (Feldman, 2013). For understanding and identifying emotion in Computer-mediated Communication (CMC), SA is practical and useful because it bridges the gap between machine understanding and human natural language: it enables a machine to identify sentiment in written expressions within big data by classifying each utterance into one of the predefined SA classes, for example positive, neutral, or negative (Duwairi et al., 2016).

One of the most popular social media applications in the Arab world is Facebook. Its user base has grown continuously, reaching about 116.5 million in Middle Eastern countries, and 360 thousand in Lebanon alone, at the beginning of 2018 (“Middle East Internet Statistics”, 2018). Accordingly, users generate a continuous flow of data every day, characterized as growing mountains of opinions: reviews, ratings, recommendations, and other useful information (Wright, 2009), especially on public and private services such as food, education, hotels, resorts, products, shops, and restaurants (Agarwal et al., 2015). However, this big data poses various challenges for automatic processing in NLP tasks aimed at knowledge-making and decision-making, in terms of data size, language dialect, and the complexity of linguistic form and nature (phonology, morphology, syntax, semantics, and pragmatics) (Elgendy and Elragal, 2014).

Summary of Chapters

Chapter I: This chapter provides an introduction to the research, defining the problem of lacking Arabizi sentiment analysis tools and outlining the study's research questions and objectives.

Chapter II: This chapter reviews the literature on Natural Language Processing (NLP), Big Data, and various sentiment analysis approaches, including machine learning and lexicon-based methods.

Chapter III: This chapter details the research methodology, including the design, data collection from social media, and the specific linguistic challenges associated with analyzing Arabizi texts.

Chapter IV: This chapter explains the research procedure, focusing on experiment preparation, data preprocessing, feature extraction, and the construction of both machine learning and lexicon-based classifiers.

Chapter V: This chapter presents the experimental results and evaluations, comparing the performance of different models based on precision, recall, and f1-scores.

Chapter VI: This chapter concludes the research, confirming the applicability of the proposed models and suggesting directions for future research in Arabizi natural language processing.
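The precision, recall, and F1 scores named in the Chapter V summary can be illustrated with a minimal sketch. The function name, the label strings, and the toy labels below are assumptions made for this example; they are not the thesis's evaluation code or its results.

```python
def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Compute precision, recall, and F1 for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only.
gold = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "positive", "negative"]
print(precision_recall_f1(gold, predicted))
```

Precision penalizes false alarms, recall penalizes misses, and F1 is their harmonic mean, which is why the three are usually reported together when comparing classifiers.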

Keywords

Sentiment Analysis, Natural Language Processing, Arabizi NLP, Classifier, Big Data, Machine Learning, Logistic Regression, Lexicon-based, Customer Review, Computational Linguistics, Opinion Mining, Feature Extraction, Data Preprocessing, Sentiment Classification, Arabic Dialects.

Frequently Asked Questions

What is the core focus of this thesis?

The thesis focuses on building and comparing sentiment analysis tools specifically for the Lebanese Arabizi language system, using customer reviews from public and private service sectors.

What are the primary thematic areas addressed?

The main themes include Natural Language Processing (NLP), Arabizi dialect peculiarities, machine learning classification, lexicon-based rule construction, and big data management within the context of Lebanese customer feedback.

What is the primary research goal?

The goal is to bridge the gap in sentiment analysis for Arabizi by proposing an efficient automated classification system and creating a reliable dataset for future research.

Which scientific methods are utilized?

The study utilizes an experimental, quantitative, and descriptive approach, specifically testing supervised machine learning (Logistic Regression) alongside a rule-based lexicon classifier (SLCSAS).

What is covered in the main section of the paper?

The main part of the paper details the data preprocessing techniques, the construction of BoW and TF*IDF features, the training of classification models, and the comprehensive evaluation of their performance.
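As a rough illustration of the BoW and TF*IDF features mentioned above, the following sketch builds both representations in plain Python. The three reviews are invented toy strings, the whitespace tokenizer is a simplification, and the weighting `count * ln(N / df)` is only one common TF*IDF variant; the thesis may use different preprocessing and a different formula.

```python
import math
from collections import Counter

# Invented toy reviews for illustration only (not taken from the thesis corpus).
reviews = [
    "ktir tayyeb el akel",
    "el akel mich tayyeb",
    "service ktir 7elo",
]


def tokenize(text):
    """Lowercase and split on whitespace; real Arabizi needs heavier cleaning."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each count by idf = ln(N / df), where df is the document frequency."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[c * w for c, w in zip(vec, idf)] for vec in counts]
    return vocab, weighted


vocab, bow = bag_of_words(reviews)
_, tfidf = tf_idf(reviews)
print(vocab)
print(bow[1])
print(tfidf[1])
```

In the second review, the word that occurs in only one document ("mich") receives a higher weight than words shared with other reviews, which is the effect TF*IDF is meant to produce.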

Which keywords characterize this research?

Key terms include Sentiment Analysis, Arabizi NLP, Machine Learning, Logistic Regression, Lexicon-based Classification, and Opinion Mining.

What is Arabizi?

Arabizi is an informal slang system used by Arabic speakers to write Arabic using English (Latin) characters, often encountered in computer-mediated communication like social media.

How did the author handle the lack of annotated datasets?

The author performed manual data collection and extraction from Facebook, Google, and Zomato, creating a unique dataset of 2635 reviews which were subsequently preprocessed for training and testing.

How does the Lexicon-based classifier function in this study?

The SLCSAS (Science of Language and Communication Semantic Analysis System) uses hand-crafted grammar rules, a dictionary, and a semantic map to categorize sentiment based on specific keywords and linguistic markers found in the text.
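The general idea of a dictionary plus rules can be sketched as follows. This is not the SLCSAS, whose rules and semantic map are not reproduced here; the lexicon entries, the negator list, and the single negation rule are all hypothetical and chosen only to show the mechanism.

```python
# Hypothetical mini-lexicon; the entries and scores are invented for illustration.
LEXICON = {"tayyeb": 1, "7elo": 1, "mni7": 1, "bicha3": -1, "ghale": -1}
# Hypothetical negation markers.
NEGATORS = {"mich", "ma"}


def classify(review):
    """Sum word polarities, flipping the sign of a word that directly follows a negator."""
    score = 0
    negate = False
    for token in review.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity:
            score += -polarity if negate else polarity
        # The negation rule only reaches the token right after the negator.
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"


print(classify("el akel tayyeb"))
print(classify("el akel mich tayyeb"))
```

A rule-based classifier like this needs no training data, but its coverage depends entirely on the dictionary, which is a real concern for Arabizi given its many spelling variants of the same word.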

What was the outcome of the hyperparameter tuning?

The experiments demonstrated that hyperparameter tuning, particularly for the TF*IDF-based Logistic Regression model, significantly enhanced the classifier's performance, making it the most accurate solution presented in the study.
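The answer above does not state which hyperparameters were tuned or how. As a hedged illustration of the general technique, the sketch below trains a small logistic regression by gradient descent and picks the L2 regularization strength that scores best on a held-out validation set. The choice of L2 strength as the tuned parameter, the grid values, the learning rate, and the two-feature toy data are all assumptions for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, l2, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent with an L2 penalty on the weights."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        prob = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hits += (1 if prob >= 0.5 else 0) == yi
    return hits / len(y)


def grid_search(X_train, y_train, X_val, y_val, l2_grid):
    """Train one model per grid value and return the best value with all scores."""
    scores = {}
    for l2 in l2_grid:
        w, b = train_logreg(X_train, y_train, l2)
        scores[l2] = accuracy(X_val, y_val, w, b)
    best = max(scores, key=scores.get)
    return best, scores


# Toy feature vectors (hypothetical): column 0 fires on positive reviews, column 1 on negative.
X_train = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y_train = [1, 1, 0, 0]
X_val = [[1.0, 0.0], [0.0, 1.0]]
y_val = [1, 0]
L2_GRID = [0.0, 0.01, 0.1, 1.0]

best_l2, scores = grid_search(X_train, y_train, X_val, y_val, L2_GRID)
print(best_l2, scores)
```

On real data the candidates would be compared with cross-validation rather than a single tiny validation split, and the final model would be evaluated once on a separate test set.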

End of excerpt from 109 pages

Details

Title: Feelings and thoughts of Arabizi language users in Lebanon territory. An efficient OM model for automatically processing Arabizi language system
Grade: 89
Author: Marwan Al Omari (Author)
Year of publication: 2019
Pages: 109
Catalog number: V537244
ISBN (eBook): 9783346134141
ISBN (Book): 9783346134158
Language: English
Keywords: Sentiment analysis, Natural language processing, Machine learning
Product safety: GRIN Publishing GmbH

Cite this work: Marwan Al Omari (Author), 2019, Feelings and thoughts of Arabizi language users in Lebanon territory. An efficient OM model for automatically processing Arabizi language system, Munich, GRIN Verlag, https://www.grin.com/document/537244