In today's scenario, microblogging sites such as Twitter are widely used to share feelings and emotions about current hot topics, products, services, and events. Such opinionated data needs to be leveraged effectively to gain valuable insights. This research work focused on designing a comprehensive feature-based Twitter Sentiment Analysis (TSA) framework using a supervised machine learning approach, integrated with a sophisticated negation handling approach and a knowledge-based Tweet Normalization System (TNS). We generated three real-time Twitter datasets using the search operators #Demonetization, #Lockdown, and #9pm9minutes, and also used one publicly available benchmark dataset, SemEval-2013, to assess the viability of our comprehensive feature-based Twitter sentiment analysis system. We leveraged a variety of features, including lexicon-based, POS-based, morphological, n-gram, negation, and cluster-based features, to ascertain which classifier works well with which feature group. We employed three state-of-the-art classifiers, Support Vector Machine (SVM), Decision Tree Classifier (DTC), and Naive Bayesian (NB), in our Twitter sentiment analysis framework. SVM was the best-performing classifier on all Twitter datasets except #9pm9minutes, for which DTC performed best. Moreover, our SVM model trained on the SemEval-2013 training dataset outperformed NRC Canada, the winning team of SemEval-2013 Task 2, in terms of the macro-averaged F1 score computed over the positive and negative classes only. Although state-of-the-art Twitter sentiment analysis systems report significant performance, critical aspects such as negation and tweet normalization remain challenging.
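The headline comparison uses the SemEval-2013 Task 2 scoring convention: F1 is computed separately for the positive class and the negative class, and the two values are averaged. The neutral class is not averaged in, although neutral predictions still count as errors against the other two classes. A minimal Python sketch of that metric follows; the label strings `"positive"`, `"negative"`, and `"neutral"` and the function names are assumptions for illustration, not the official task scorer.

```python
def f1_for_class(gold, pred, label):
    """F1 for one class, treating `label` as the positive class (one-vs-rest)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def semeval_f1_pn(gold, pred):
    """Average of positive-class and negative-class F1 (neutral F1 is excluded)."""
    return (f1_for_class(gold, pred, "positive")
            + f1_for_class(gold, pred, "negative")) / 2


gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "neutral", "negative", "negative"]
print(round(semeval_f1_pn(gold, pred), 3))  # 0.667
```

In the example, the tweet wrongly predicted as neutral lowers positive-class recall, and the neutral tweet predicted as negative lowers negative-class precision, so both errors affect the score even though neutral F1 is never averaged in.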
Table of Contents
- Chapter 1 Introduction
- 1.1. Levels of Sentiment Analysis
- 1.1.1 Document-level Sentiment Analysis
- 1.1.2 Sentence-level Sentiment Analysis
- 1.1.3 Aspect-level Sentiment Analysis
- 1.2. Sentiment Analysis Approaches
- 1.2.1 Knowledge-Based Approach (Lexicon-Based Approach)
- 1.2.2 Statistical Approach
- 1.2.3 Hybrid Approach
- 1.3. History of Sentiment Analysis, Specifically Twitter Sentiment Analysis
- 1.4. Twitter
- 1.5. Need for Twitter Sentiment Analysis
- 1.6. Research Objectives
- 1.6.1 Corpus Creation
- 1.6.2 Tweet Normalization
- 1.6.3 Feature Engineering
- 1.6.4 Negation Handling
- 1.6.5 Training of Classifiers
- 1.6.6 Evaluation of Classifiers
- 1.7. Thesis Organization
- Chapter 2 Literature Review
- 2.1. Twitter Sentiment Analysis
- 2.2. Data Pre-processing
- 2.3. Negation Modelling
- 2.3.1 Forms of Negation
- 2.3.2 Addressing Negation
- 2.3.2.1 Negation Scope Detection
- 2.3.2.2 Negation Handling
- Chapter 3 Experimental Setup
- 3.1. Twitter Corpus
- 3.1.1 Benchmark Twitter Dataset
- 3.1.2 Real-time Twitter Dataset
- 3.1.2.1 Real-Time Twitter Corpus Labelling
- 3.2. Linguistic Resources
- 3.3. Classification Features
- 3.3.1 N-gram Features
- 3.3.2 POS-Based Features
- 3.3.3 Morphological (Twitter-Specific) Features
- 3.3.4 Cluster Features
- 3.3.5 Lexicon-Based Features
- 3.3.6 Negation Features
- 3.4. Supervised Machine Learning Classifiers
- 3.4.1 Naive Bayesian Classifier
- 3.4.2 Support Vector Machine
- 3.4.3 Decision Tree Classifiers
- 3.5. Evaluation Metrics
- 3.5.1 Accuracy
- 3.5.2 Precision (Positive Predictive Value)
- 3.5.3 Recall (Sensitivity)
- 3.5.4 F1 score
- 3.6. Conclusion
- Chapter 4 Tweet Normalization
- 4.1. Tweet Normalization System (TNS)
- 4.1.1 Phase 1: Basic Cleaning Operations
- 4.1.2 Phase 2: Tweet Normalization
- 4.2. Evaluation of Tweet Normalization System (TNS)
- 4.2.1 TNS Evaluation Result on #Demonetization Corpus
- 4.2.2 TNS Evaluation Result on #Lockdown Corpus
- 4.2.3 TNS Evaluation Result on #9pm9minutes Corpus
- 4.2.4 TNS Evaluation Result on Twitter SemEval-2013 Dataset
- 4.3. Conclusion
- Chapter 5 Negation Handling
- 5.1. Types of Negation
- 5.2. Phases of Modelling Syntactic Negation
- 5.2.1 Negation Cue Identification
- 5.2.2 Negation Scope Detection
- 5.2.3 Handling the Negated Context Words
- 5.3. Proposed Algorithm for Negation Exception Cases
- 5.4. Evaluation of Negation Exception Algorithm (NEA)
- 5.4.1 Evaluation of NEA on #Demonetization Corpus
- 5.4.2 Evaluation of NEA on #Lockdown Dataset
- 5.4.3 Evaluation of NEA on #9pm9minutes Dataset
- 5.4.4 Evaluation of NEA on SemEval-2013 Twitter Dataset
- 5.5. Conclusion
- Chapter 6 Classifiers Training and Evaluation
- 6.1. Training of Classifiers
- 6.2. Classifier Evaluation Results on Real-time Twitter Datasets
- 6.2.1 Evaluation on #Demonetization Corpus
- 6.2.2 Evaluation on #Lockdown Corpus
- 6.2.3 Evaluation on #9pm9minutes Corpus
- 6.3. Evaluation on Benchmark SemEval-2013 Twitter Dataset
- 6.4. Contribution of the Negation Handling Approach with the Incorporated Negation Exception Algorithm to Classifier Performance
- 6.5. Contribution of Each Pre-processing Module to Classifier Performance
- 6.6. Conclusion
- Chapter 7 Conclusion
- 7.1. Corpus Creation
- 7.2. Tweet Normalization System (TNS)
- 7.3. Negation Modelling
- 7.4. Feature Engineering
- 7.5. Classification Result
- 7.6. Future Work
Objectives and Key Themes
This research aims to design a comprehensive, feature-based Twitter Sentiment Analysis (TSA) framework that effectively handles the inherent challenges of unstructured and noisy Twitter data. The framework uses supervised machine learning, incorporating sophisticated negation handling and a knowledge-based Tweet Normalization System (TNS).
- Analyzing the effectiveness of various state-of-the-art classifiers (SVM, NB, DTC) on different feature sets.
- Developing and evaluating a knowledge-based Tweet Normalization System (TNS) to clean and normalize Twitter data.
- Creating and implementing a robust negation handling approach that accounts for negation exception cases.
- Assessing the impact of each pre-processing module on classifier performance.
- Investigating the influence of negation handling on sentiment classification accuracy.
Chapter Summaries
- Chapter 1: Introduction
- Chapter 2: Literature Review
- Chapter 3: Experimental Setup
- Chapter 4: Tweet Normalization
- Chapter 5: Negation Handling
- Chapter 6: Classifiers Training and Evaluation
Chapter 1 introduces the concept of sentiment analysis, its different levels (document, sentence, and aspect level), and the main approaches to sentiment analysis: lexicon-based, statistical, and hybrid methods. The chapter then focuses on the specific challenges of Twitter sentiment analysis and the need for comprehensive frameworks that can handle the unstructured nature of tweets. The research objectives and thesis organization are also outlined.
Chapter 2 reviews existing research on sentiment analysis, with a focus on Twitter sentiment analysis. It highlights studies that have explored different techniques and features, including machine learning approaches, lexicon-based methods, and deep learning. The chapter also discusses the critical aspects of tweet pre-processing and negation handling, comparing existing methods and their limitations.
Chapter 3 details the experimental setup, outlining the datasets used: the real-time Twitter datasets and the SemEval-2013 benchmark dataset. It explains how the real-time Twitter data were labeled using automatic clustering and manual analysis. It also presents the linguistic resources developed and used, including the Tweet Normalization System, a POS tagger, and the negation handling algorithm. The chapter concludes by discussing the extracted features and the classifiers employed in the study.
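The feature groups listed for Chapter 3 include n-grams and morphological (Twitter-specific) features. This summary does not define them, so the sketch below assumes a typical, hypothetical set of surface counts (hashtags, mentions, all-caps words, elongated words, exclamation marks, simple emoticons) plus word n-grams. The function names, regular expressions, and feature keys are illustrative assumptions, not the thesis's feature definitions.

```python
import re

# Hypothetical Twitter-specific patterns, chosen for illustration only.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
ELONGATED = re.compile(r"(\w)\1{2,}")      # a character repeated 3+ times, e.g. "sooo"
EMOTICON_POS = re.compile(r"[:;]-?[)D]")   # :) ;) :-) :D
EMOTICON_NEG = re.compile(r":-?\(")        # :( :-(


def morphological_features(tweet):
    """Count simple surface cues in a raw tweet (whitespace tokenization)."""
    tokens = tweet.split()
    return {
        "n_hashtags": len(HASHTAG.findall(tweet)),
        "n_mentions": len(MENTION.findall(tweet)),
        "n_all_caps": sum(1 for t in tokens if t.isalpha() and t.isupper() and len(t) > 1),
        "n_elongated": sum(1 for t in tokens if ELONGATED.search(t)),
        "n_exclamations": tweet.count("!"),
        "has_pos_emoticon": int(bool(EMOTICON_POS.search(tweet))),
        "has_neg_emoticon": int(bool(EMOTICON_NEG.search(tweet))),
    }


def word_ngrams(tokens, n):
    """Contiguous word n-grams joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


feats = morphological_features("SO happy with #Lockdown sooo good @friend !! :)")
bigrams = word_ngrams(["not", "good", "at", "all"], 2)
```

Keeping each group in its own function mirrors the idea of testing which classifier works well with which feature group: groups can be switched on or off independently before the dictionaries are merged into one feature vector.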
Chapter 4 focuses on the Tweet Normalization System (TNS) developed for cleaning and normalizing Twitter data. The TNS consists of two phases: basic cleaning operations and normalization of non-standard words. The chapter describes the TNS in detail and reports its accuracy on the different datasets, showing its effectiveness in preparing Twitter data for sentiment classification.
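The two TNS phases can be illustrated with a small pipeline. The sketch assumes a tiny, hypothetical non-standard-word dictionary (`NSW_DICT`) and a few regex-based cleaning rules: removing retweet markers, URLs, and user mentions, stripping the `#` symbol, and squeezing characters repeated three or more times down to two. The actual TNS knowledge base and rule set are not given in this summary, so treat every rule here as an assumption.

```python
import re

# Hypothetical normalization dictionary (the real TNS knowledge base is not reproduced).
NSW_DICT = {"gr8": "great", "u": "you", "r": "are", "pls": "please", "2day": "today"}

URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
RETWEET = re.compile(r"^RT\s+")
REPEATS = re.compile(r"(.)\1{2,}")


def basic_cleaning(tweet):
    """Phase 1: drop the RT marker, URLs, and @mentions; keep hashtag words without '#'."""
    tweet = RETWEET.sub("", tweet)
    tweet = URL.sub(" ", tweet)
    tweet = MENTION.sub(" ", tweet)
    tweet = tweet.replace("#", "")
    return " ".join(tweet.split())


def normalize_nsw(tweet):
    """Phase 2: squeeze 3+ repeated characters to two, then map dictionary NSWs to standard words."""
    out = []
    for token in tweet.split():
        token = REPEATS.sub(r"\1\1", token)
        out.append(NSW_DICT.get(token.lower(), token))
    return " ".join(out)


def tns(tweet):
    return normalize_nsw(basic_cleaning(tweet))


normalized = tns("RT @user u r gr8 2day http://t.co/xyz #Lockdown sooo good")
```

Here `normalized` is `"you are great today Lockdown soo good"`. Keeping the hashtag word rather than deleting it is a deliberate choice in this sketch, since hashtags such as #Lockdown often carry topical or sentiment content.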
Chapter 5 explores how negation is modeled in Twitter sentiment analysis, describing the various forms of negation and outlining the phases of negation handling: cue identification, scope detection, and handling of negated context words. The chapter then focuses on negation exception cases, in which negation cues are present but do not convey a sense of negation. It proposes an algorithm for identifying these cases and evaluates its accuracy on the different datasets.
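The three phases named for Chapter 5 (cue identification, scope detection, and handling of negated context words) can be sketched with a common baseline: mark every token from a negation cue up to the next punctuation mark with a `_NEG` suffix. A simple exception check is added so that a cue does not open a scope when it starts a known non-negating phrase. The cue list, the exception phrases, and the punctuation-bounded scope are all hypothetical assumptions. This is not the thesis's Negation Exception Algorithm, whose rules are not given in this summary.

```python
import re

# Hypothetical resources for illustration; not the thesis's cue list or exception rules.
NEGATION_CUES = {"not", "no", "never", "cannot", "nothing", "nobody", "none", "nor"}
EXCEPTION_PHRASES = {("no", "doubt"), ("not", "only"), ("nothing", "but")}
SCOPE_END = re.compile(r"^[.,:;!?]$")   # a standalone punctuation token closes the scope


def is_cue(token):
    """Phase 1: cue identification (explicit cue words or contracted n't forms)."""
    return token in NEGATION_CUES or token.endswith("n't")


def mark_negation(tokens):
    """Phases 2 and 3: open a scope at a cue, close it at punctuation, and suffix
    in-scope words with _NEG, unless the cue starts an exception phrase."""
    out = []
    in_scope = False
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if SCOPE_END.match(tok):
            in_scope = False
            out.append(tok)
        elif is_cue(low):
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
            # Exception case: the cue is present but does not express negation.
            in_scope = (low, nxt) not in EXCEPTION_PHRASES
            out.append(tok)
        elif in_scope:
            out.append(tok + "_NEG")
        else:
            out.append(tok)
    return out


plain = mark_negation("this movie is not good , but fun".split())
exception = mark_negation("no doubt it is great".split())
```

The first call yields `good_NEG` while leaving `but fun` untouched after the comma; the second leaves every token unmarked because "no doubt" is treated as an exception. Without the exception check, "great" would wrongly receive a `_NEG` suffix, flipping its contribution to the sentiment score.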
Chapter 6 discusses the training and evaluation of three state-of-the-art classifiers (SVM, NB, and DTC) on both the real-time and the benchmark Twitter datasets. The chapter analyzes classifier performance with different feature groups and investigates the impact of negation handling, particularly the negation exception algorithm. It also analyzes the contribution of each pre-processing module to classifier performance, demonstrating the overall effectiveness of the proposed TSA framework.
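Of the three classifiers trained in Chapter 6, Naive Bayes is compact enough to sketch from scratch. The following multinomial Naive Bayes over bag-of-words tokens with add-one (Laplace) smoothing is a minimal illustration. The class name `BagOfWordsNB` and the toy training data are hypothetical, and the thesis's actual toolkit and hyperparameters are not stated in this summary. In practice, SVM and decision trees would usually come from a library such as scikit-learn (`LinearSVC`, `DecisionTreeClassifier`).

```python
import math
from collections import Counter, defaultdict


class BagOfWordsNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)   # class -> token frequencies
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, tokens):
        """Return the class with the highest log prior plus summed log likelihoods."""
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # skip words never seen in training
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


docs = [["good", "great"], ["happy", "good"], ["bad", "awful"], ["not", "good_NEG"]]
labels = ["positive", "positive", "negative", "negative"]
model = BagOfWordsNB().fit(docs, labels)
```

Because each tweet is treated as a bag of tokens, negation-marked tokens such as `good_NEG` act as features distinct from `good`. This is one simple way a negation handling step can change what the classifier learns.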
Keywords
This research focuses on Twitter Sentiment Analysis (TSA), leveraging various features, and addresses critical NLP challenges like negation and tweet normalization. Key terms and concepts include: Sentiment Analysis, Twitter Sentiment Analysis, Negation Modelling, Negation Exception Case, Negation Exception Algorithm, Tweet Normalization System, Supervised Machine Learning, Real-Time Twitter Dataset, Benchmark Twitter Dataset, Corpus-Based Statistical Approach, Reverse Polarity, Negation Cue, and Feature Engineering.
Citation
- Manu Banga (Author), A Comprehensive Approach on Sentiment Analysis & Prediction, Munich, GRIN Verlag, https://www.grin.com/document/1315485