A Comprehensive Approach on Sentiment Analysis & Prediction


191 Pages


Table of Contents



List of Figures

List of Tables


Chapter 1 Introduction
1.1. Levels of Sentiment Analysis
1.1.1 Document-level Sentiment Analysis
1.1.2 Sentence-level Sentiment Analysis
1.1.3 Aspect-level Sentiment Analysis
1.2. Sentiment Analysis Approaches
1.2.1 Knowledge-Based Approach (Lexicon-Based Approach)
1.2.2 Statistical Approach
1.2.3 Hybrid Approach
1.3. History of Sentiment Analysis, Specifically Twitter Sentiment Analysis
1.4. Twitter
1.5. Need for Twitter Sentiment Analysis
1.6. Research Objectives
1.6.1 Corpus Creation
1.6.2 Tweet Normalization
1.6.3 Feature Engineering
1.6.4 Negation Handling
1.6.5 Training of Classifiers
1.6.6 Evaluation of Classifiers
1.7. Thesis Organization

Chapter 2 Literature Review
2.1. Twitter Sentiment Analysis
2.2. Data Pre-processing
2.3. Negation Modelling
2.3.1 Forms of Negation
2.3.2 Addressing Negation
Negation Scope Detection
Negation Handling

Chapter 3 Experimental Set up
3.1. Twitter Corpus
3.1.1 Benchmark Twitter Dataset
3.1.2 Real-time Twitter Dataset
Real-Time Twitter Corpus Labelling
3.2. Linguistic Resources
3.3. Classification Features
3.3.1 Ngrams Features
3.3.2 POS-Based Features
3.3.3 Morphological (Twitter-Specific Features)
3.3.4 Cluster features
3.3.5 Lexicon-based features
3.3.6 Negation Features
3.4. Supervised Machine Learning Classifiers
3.4.1 Naive Bayesian Classifier
3.4.2 Support Vector Machine
3.4.3 Decision Tree Classifiers
3.5. Evaluation Metrics
3.5.1 Accuracy
3.5.2 Precision (Positive Predictive Value)
3.5.3 Recall (Sensitivity)
3.5.4 F1 score
3.6. Conclusion

Chapter 4 Tweet Normalization
4.1. Tweet Normalization System (TNS)
4.1.1 Phase 1: Basic Cleaning Operations
4.1.2 Phase 2: Tweet Normalization
4.2. Evaluation of Tweet Normalization System (TNS)
4.2.1 TNS Evaluation Result on #Demonetization Corpus
4.2.2 TNS Evaluation Result on #Lockdown Corpus
4.2.3 TNS Evaluation Result on #9pm9minutes Corpus
4.2.4 TNS Evaluation Result on Twitter SemEval-2013 Dataset
4.3. Conclusion

Chapter 5 Negation Handling
5.1. Types of Negation
5.2. Phases of Modelling Syntactic Negation
5.2.1 Negation Cue Identification
5.2.2 Negation Scope Detection
5.2.3 Handling the Negated Context Words
5.3. Proposed Algorithm for Negation Exception Cases
5.4. Evaluation of Negation Exception Algorithm (NEA)
5.4.1 Evaluation of NEA on #Demonetization Corpus
5.4.2 Evaluation of NEA on #Lockdown dataset
5.4.3 Evaluation of NEA on #9pm9minutes Dataset
5.4.4 Evaluation of NEA on SemEval-2013 Twitter Dataset
5.5. Conclusion

Chapter 6 Classifiers Training and Evaluation
6.1. Training of Classifiers
6.2. Classifier Evaluation Results on Real-time Twitter Datasets
6.2.1 Evaluation on #Demonetization Corpus
6.2.2 Evaluation on #Lockdown Corpus
6.2.3 Evaluation on #9pm9minutes Corpus
6.3. Evaluation on Benchmark SemEval-2013 Twitter Dataset
6.4. Contribution of Negation Handling Approach with Incorporated Negation Exception Algorithm on Classifiers Performance
6.5. Contribution of Each Pre-processing Modules on Classifiers Performance
6.6. Conclusion

Chapter 7 Conclusion
7.1. Corpus Creation
7.2. Tweet Normalization System (TNS)
7.3. Negation Modelling
7.4. Feature Engineering
7.5. Classification Result
7.6. Future Work



Due to the progress of technology, microblogging sites such as Twitter are increasingly used to share feelings and emotions about current hot topics, products, services, and events. Such opinionated data needs to be leveraged effectively to gain valuable insight. This research work focused on designing a comprehensive feature-based Twitter Sentiment Analysis (TSA) framework using a supervised machine learning approach with an integrated, sophisticated negation handling approach and a knowledge-based Tweet Normalization System (TNS). We generated three real-time Twitter datasets using the search operators #Demonetization, #Lockdown, and #9pm9minutes, and also used one publicly available benchmark dataset, SemEval-2013, to assess the viability of our comprehensive feature-based Twitter sentiment analysis system. We leveraged a variety of features, namely lexicon-based, POS-based, morphological, ngram, negation, and cluster-based features, to ascertain which classifier works well with which feature group. We employed three state-of-the-art classifiers, namely Support Vector Machine (SVM), Decision Tree Classifier (DTC), and Naive Bayesian (NB), in our Twitter sentiment analysis framework. SVM turned out to be the best-performing classifier on all the Twitter datasets except #9pm9minutes, for which DTC performed best. Moreover, our SVM model trained on the SemEval-2013 training dataset outperformed NRC Canada, the winning team of SemEval-2013 Task 2, in terms of macro-averaged F1 score averaged over the positive and negative classes only. Although state-of-the-art Twitter sentiment analysis systems report strong performance, critical aspects such as negation and tweet normalization remain challenging.
Moreover, it is not reasonable to assume a negated sense in every tweet that contains negation, because in certain cases the presence of a negation cue carries no sense of negation (such cases are known as negation exception cases). Many earlier works addressed negation, but negation exception cases have received little attention. Thus, this thesis also focused on handling negation exception cases. The main motivation is to prevent the misclassification of negation tweets by handling negation properly and by accounting for negation exception tweets. We developed a negation exception algorithm to identify such negation exception tweets and incorporated it into a sophisticated corpus-based statistical negation modelling approach. We used the Twitter-specific automatic lexicons (NRC-Hashtag and S140) to obtain the score of each word in both negated and affirmative contexts. We evaluated the effectiveness of our negation modelling approach with the incorporated negation exception algorithm by comparing it with the commonly used reverse polarity approach to negation. This was done for all the Twitter datasets and with all the classifiers. We also examined the effectiveness of the negation exception algorithm alone by comparing all models with and without negation exception rules across all the Twitter datasets used in this study. The experimental results demonstrated that our negation modelling approach with incorporated negation exception rules substantially improved the performance of all classifiers across all the datasets. We also determined the accuracy of the negation exception algorithm by comparing its output with human judgement on every Twitter dataset. For this, we randomly selected 200 tweets from each of the real-time and benchmark datasets.
We achieved considerable accuracy of the negation exception algorithm across all the datasets. Moreover, to handle noisy and unstructured tweets, we developed a knowledge-based tweet normalization system that cleans the noise and normalizes the non-standard words in a tweet. We determined the accuracy of this system by comparing its output with manual pre-processing results on every Twitter dataset. For this, we randomly selected 1500 tweets from the #Demonetization dataset and 500 tweets from each of the remaining datasets. The tweet normalization system achieved considerable accuracy across all the datasets. We also assessed the impact of each pre-processing module on classifier performance across all the Twitter datasets. Overall, our comprehensive feature-based Twitter sentiment analysis system performed well on both the real-time and benchmark datasets, owing to its handling of critical aspects such as negation and tweet normalization.

Keywords: Sentiment Analysis, Twitter Sentiment Analysis, Negation Modelling, Negation Exception Case, Negation Exception Algorithm, Tweet Normalization System, Supervised Machine Learning, Real-Time Twitter Dataset, Benchmark Twitter Dataset, Corpus-Based Statistical Approach, Reverse Polarity, Negation Cue, Feature Engineering

List of Figures

Figure 1.1: Abstract model of sentiment analysis

Figure 1.2: Workflow of Supervised Learning

Figure 1.3: Sentiment Analysis Approaches (Medhat et al., 2014)

Figure 1.4: Example tweet taken from the WHO timeline

Figure 2.1: Different approaches used in the literature for the negation cue and scope detection

Figure 3.1: SemEval-2013 benchmark corpus details (training and test set distribution)

Figure 3.2: Real-time twitter corpus generation (e.g., tweets on ‘#Demonetization’)

Figure 3.3: Workflow of real-time twitter corpus labelling

Figure 3.4: Cluster analysis procedure (cluster class assignment)

Figure 3.5: Real-time twitter corpora details

Figure 3.6: Feature engineering process

Figure 3.7: Feature engineering for our real-time as well as publicly available twitter datasets

Figure 3.8: Word cloud of positive tweets in “#Demonetization” dataset

Figure 3.9: Word cloud of negative tweets in “#Demonetization” dataset

Figure 3.10: Word cloud of neutral tweets in “#Demonetization” dataset

Figure 3.11: Word cloud of positive tweets in “#Lockdown” corpus

Figure 3.12: Word cloud of negative tweets in “#Lockdown” corpus

Figure 3.13: Word cloud of neutral tweets about “#Lockdown”

Figure 3.14: Word cloud of positive tweets about “#9pm9minutes”

Figure 3.15: Word cloud of positive tweets in SemEval-2013 corpus

Figure 3.16: Word cloud of negative tweets in SemEval-2013 corpus

Figure 3.17: Word cloud of neutral tweets in SemEval-2013 corpus

Figure 3.18: Bokeh plot: display most negative tokens in demonetization corpus in bottom right corner

Figure 3.19: Bokeh plot: display most positive tokens in demonetization corpus in top left corner

Figure 3.20: Bokeh plot: display most positive tokens in Lockdown corpus in top left corner

Figure 3.21: Top 20 ngrams from Demonetization tweets, selected by chi-square feature selection method

Figure 3.22: Top 20 ngrams from benchmark twitter corpus, selected by chi-square feature selection method

Figure 3.23: General model of supervised classifiers

Figure 3.24: SVM Model working

Figure 3.25: Working of Decision Tree algorithm

Figure 3.26: Confusion matrix for binary classification problem

Figure 4.1: Basic cleaning operations (first phase of TNS)

Figure 4.2: Tweet pre-processing phase 2: normalization

Figure 4.3: Tweet normalization system accuracy result across twitter datasets

Figure 5.1: Different forms of negation

Figure 5.2: Process for negation cue identification

Figure 5.3: Procedure for handling negation exception cases

Figure 5.4: Negation exception algorithm evaluation procedure

Figure 5.5: Negation exception algorithm accuracy result across twitter datasets

Figure 6.1: Supervised Learning Process

Figure 6.2: Classifiers performance comparison on #Demonetization test set

Figure 6.3: Results of performance comparison across SVM, NB, and DTC by the removal of one feature group at a time for #Demonetization dataset

Figure 6.4: Classifiers performance comparison on #Lockdown test set

Figure 6.5: Results of performance comparison across SVM, NB, and DTC by the removal of one feature group at a time for #Lockdown dataset

Figure 6.6: Classifiers performance comparison on #9pm9minutes test set

Figure 6.7: Results of performance comparison across SVM, NB, and DTC by the removal of one feature group at a time for #9pm9minutes test dataset

Figure 6.8: Performance comparison of several state-of-the-art classifiers across real-time twitter datasets in one plot

Figure 6.9: Comparative performances of the state-of-the-art classifiers on SemEval-2013 test tweets

Figure 6.10: Results of performance comparison across SVM, NB, and DTC by the removal of one feature group at a time for benchmark twitter dataset

Figure 6.11: Performance comparison among several classifiers showing significance of negation exception rules and double negation with respect to #Demonetization dataset

Figure 6.12: Performance comparison among several classifiers showing significance of negation exception rules and double negation with respect to #Lockdown dataset

Figure 6.13: Performance comparison among several classifiers showing significance of negation exception rules and double negation with respect to #9pm9minutes dataset

Figure 6.14: Performance comparison among several classifiers showing significance of negation exception rules and double negation with respect to SemEval-2013 benchmark dataset

Figure 6.15: Results of different negation handling strategies with #Demonetization dataset

Figure 6.16: Results of different negation handling strategies with #Lockdown dataset

Figure 6.17: Results of different negation handling strategies with #9pm9minutes dataset

Figure 6.18: Results of different negation handling strategies with SemEval-2013 dataset

List of Tables

Table 3.1: The distribution of sentiment class labels in the training and test set of SemEval-2013

Table 3.2: Statistics of unlabelled real-time twitter datasets

Table 3.3: Population of clusters (number of tweets per cluster) for real-time twitter datasets

Table 3.4: Results for clusters class assignment

Table 3.5: Statistics of labelled real-time twitter corpus

Table 3.6: Comparison of CMU tool and nltk tool in terms of tokenization of tweets

Table 3.7: Pos-tagging output of CMU Tagger

Table 3.8: Demo confusion matrix for three class classification problem

Table 4.1: Part of stop words file including apostrophe words too

Table 4.2: Part of the acronym file

Table 4.3: Part of the misspelled resource file

Table 4.4: Part of the emoticon text file

Table 4.5: Part of the negation words file

Table 4.6: Tweet normalization system demonstration (working examples)

Table 4.7: Showing the evaluation of automatic and manual pre-processing on few tweets from the “#Demonetization” corpus

Table 4.8: Showing the evaluation of automatic and manual pre-processing on few tweets from the Lockdown corpus

Table 4.9: Showing the evaluation of automatic and manual pre-processing on few tweets from the #9pm9minutes corpus

Table 4.10: Showing the evaluation of automatic and manual pre-processing on few tweets from the SemEval-2013 benchmark Twitter corpus

Table 4.11: Evaluation of TNS on several datasets

Table 5.1: Partial list of negation cues or words

Table 5.2: Negation scope resolution process in example tweets

Table 5.3: Negation handling procedure with incorporated negated exception rules algorithm

Table 5.4: Evaluation of NEA (negation exception algorithm) on “#Demonetization” tweets

Table 5.5: Evaluation of NEA (negation exception algorithm) on “#Lockdown” tweets

Table 5.6: Evaluation of NEA (negation exception algorithm) on #9pm9minutes tweets

Table 5.7: Evaluation of NEA (negation exception algorithm) on SemEval-2013 corpus

Table 5.8: Negation exception algorithm evaluation results on several datasets

Table 6.1: Statistics of training and test tweets of real-time and benchmark datasets

Table 6.2: Statistics of #Demonetization corpus (training + testing tweets)

Table 6.3: Performance of state-of-the-art classifiers on “#Demonetization” test tweets

Table 6.4: Ablation experiments result on “#Demonetization” test set with SVM, when one feature group is removed at a time

Table 6.5: Ablation experiments result on “#Demonetization” test set with DTC, when one feature group is removed at a time

Table 6.6: Ablation experiments result on “#Demonetization” test set with NB, when one feature group is removed at a time

Table 6.7: Statistics of #Lockdown twitter dataset (training+test set)

Table 6.8: Comparative performance of state-of-the-art classifiers on “#Lockdown” test tweets

Table 6.9: Ablation experiments result on “#Lockdown” test set with SVM, when one feature group is removed at a time

Table 6.10: Ablation experiments result on “#Lockdown” test set with DTC, when one feature group is removed at a time

Table 6.11: Ablation experiments result on “#Lockdown” test set with NB, when one feature group is removed at a time

Table 6.12: Statistics of #9pm9minutes twitter dataset (training+test set)

Table 6.13: Comparative performance of state-of-the-art classifiers on “#9pm9minutes” test tweets

Table 6.14: Ablation experiments result on “#9pm9minutes” test set with SVM, when one feature group is removed at a time

Table 6.15: Ablation experiments result on “#9pm9minutes” test set with DTC, when one feature group is removed at a time

Table 6.16: Ablation experiments result on “#9pm9minutes” test set with NB, when one feature group is removed at a time

Table 6.17: SemEval-2013 dataset statistics (training + test set)

Table 6.18: Comparative performances of the SVM, NB, and DTC on SemEval-2013 test set (recall and F1 in the bracket are averaged on only positive and negative classes)

Table 6.19: Confusion matrix for the baseline classifier of SemEval-2013 test set

Table 6.20: Comparative performance of best SVM model with baseline and officially submitted result on SemEval-2013 test set

Table 6.21: Ablation experiments result on SemEval-2013 test set with SVM, when one feature group is removed at a time

Table 6.22: Ablation experiments result on SemEval-2013 test set with DTC, when one feature group is removed at a time

Table 6.23: Ablation experiments result on SemEval-2013 test set with NB, when one feature group is removed at a time

Table 6.24: Significance of negation exception algorithm with each real-time twitter dataset across state-of-the-art classifiers

Table 6.25: Significance of negation exception algorithm with SemEval-2013 dataset

Table 6.26: Comparison of different negation processing strategies with real-time twitter datasets

Table 6.27: Comparison of different negation processing strategies with benchmark SemEval-2013 dataset

Table 6.28: Evaluating the impact of each pre-processing module by Loss/Gain in SVM performance across each real-time twitter dataset

Table 6.29: Evaluating the impact of each pre-processing module by Loss/Gain in SVM performance on SemEval-2013 twitter dataset



Chapter 1 Introduction

Opinions are key indicators of a person's behaviour. Subjective feelings such as opinions, emotions, and attitudes greatly influence human behaviour. Our choices and beliefs are, to a considerable extent, conditioned on how others evaluate and feel about the world. Whenever we want to make a decision, we frequently seek out others' opinions. This applies not only to individuals but also to large organizations, which are always eager to know public opinion of their services or products. Before buying a product, a person often looks at the feedback or reviews provided by its existing customers. For instance, shopping websites such as Amazon.com provide star ratings and consumer reviews for every product they sell. Such opinionated reviews act as product recommendations for the buyer. There has been remarkable growth in the use of social media such as Facebook, Twitter, and blogs by users of different nationalities to express their opinions, desires, allegations, and emotions about services, products, or any topic.

An opinion is the sentiment of people towards an entity, which can be an event, person, product, or organization. An entity may have components and features. For example, a phone is an entity: the components of the entity “Phone” include the battery and the screen, and its features include camera quality and voice quality. People can express their opinion directly about an entity or by comparing entities. For example, the phrase “Battery life of Nokia is long” expresses an opinion directly on the “battery life” feature of the Nokia phone, whereas the phrase “Battery life of Nokia is more than iPhone” compares the battery life of two entities, Nokia and iPhone.

Liu (2012) defined an opinion as a quintuple $(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$, where $e_i$ represents an entity, $a_{ij}$ represents an aspect of entity $e_i$, $s_{ijkl}$ is the sentiment expressed by holder $h_k$ at time $t_l$ on aspect $a_{ij}$ of entity $e_i$, $h_k$ is the opinion holder, and $t_l$ is the time at which the opinion holder expressed the opinion. The aim of sentiment analysis is to determine the sentiment $s_{ijkl}$ expressed by people on an entity and to categorize it as positive, negative, or neutral, possibly with its level of intensity (e.g., highly positive, or a rating on a scale from 1 to 5). All of the above components are essential for analyzing sentiment in opinionated data; otherwise, an accurate analysis cannot be done. For instance, if the time component is missing, it is difficult to analyze how the opinions expressed on an entity evolve, because people's opinions about an entity change over time. As an illustration, consider the following review, posted by the user “Hary” on 10.07.2015: “The picture quality of my new Samsung camera is great”.

In this review, the entity is “Samsung”, the aspect is “picture quality”, and the opinion expressed on the picture quality of the entity “Samsung” is positive. The opinion holder is “Hary”, and the review was posted on “10.07.2015”. Thus, following the definition of Liu (2012), the opinion quintuple is (Samsung, picture quality, positive, Hary, 10.07.2015).
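The quintuple maps naturally onto a record type. The Python sketch below encodes the example review; the class name `Opinion` and its field names are hypothetical choices that mirror Liu's (2012) symbols, and the date assumes that “10.07.2015” is written day.month.year.

```python
from dataclasses import astuple, dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """Opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l) after Liu (2012); names are illustrative."""

    entity: str     # e_i: the opinion target
    aspect: str     # a_ij: an aspect of entity e_i
    sentiment: str  # s_ijkl: sentiment on aspect a_ij, e.g. "positive", "negative", "neutral"
    holder: str     # h_k: the opinion holder
    time: date      # t_l: when the opinion was expressed


# Review by "Hary"; assumes 10.07.2015 means 10 July 2015 (day.month.year)
review = Opinion(
    entity="Samsung",
    aspect="picture quality",
    sentiment="positive",
    holder="Hary",
    time=date(2015, 7, 10),
)

print(astuple(review))
```

Storing `time` as a `date` rather than a string allows opinions on the same entity and aspect to be ordered chronologically, which is exactly why the time component matters when opinions change over time.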

In the past, before the growth of technology, people asked their friends, family, and neighbours whenever they needed an opinion about an entity (Liu, 2010). Similarly, organizations conducted opinion polls and surveys (using questionnaires) to obtain public opinion of their products or services. Opinionated data was therefore limited. The explosion of Web 2.0, however, led people to use social media (such as blogs, reviews, forums, and Twitter) to convey their opinions about anything, be it a product, an event, a famous personality, or a politician. As John Scalzi stated, “Everyone is entitled to their opinion about the things they read (or watch, or listen to, or taste, or whatever). They’re so entitled to express them online”. Thus, opinionated data is no longer limited to friends, neighbours, and relatives; it has become possible to obtain opinions from a massive pool of people. A huge amount of opinionated data is available in digital form, from which valuable information can be extracted to support a variety of decisions regarding any product, service, organization, or topic. Such data is commonly known as User-Generated Content (UGC). Information extracted from opinionated data has many potential uses. For example, organizations collect opinionated data about their services and products in the form of comments, reviews, blogs, etc., and analyze it to extract valuable information for deciding how to improve those services and products. E-commerce websites also use opinionated data to analyze their customers' buying patterns and their sentiment towards the brand, and then devise strategies to improve sales based on this analysis.

Opinionated data can even have a huge impact on shaping a company's brand loyalty and brand advocacy. Online reviews of products and services have become a significant source for consumers' purchase decisions and buying patterns. For instance, 92% of people read online reviews before buying a product (Shrestha, 2016). Thus, publicly available opinionated data has not only helped reshape business but has also influenced public emotions, which in turn greatly affect political and social systems. For instance, in 2011, the posting of opinionated data in some Arab countries led to mass mobilization for political change. Sentiment analysis has therefore become the need of the hour for monitoring and distilling the valuable information contained in opinionated data. Sentiment analysis mines people's sentiment about certain topics and events (Dave et al., 2003; Pang & Lee, 2008). Identifying the emotions expressed in a piece of text has a wide range of applications, such as customer relationship management (Bougie et al., 2003), identifying suicide risk (Cherry et al., 2012; Pestian et al., 2008), and determining product popularity and government popularity (Mohammad & Yang, 2011), among many others.

Traditional procedures for collecting and analyzing opinionated data include surveys and interviews. However, opinionated data is huge in volume and unstructured, so analyzing the sentiments in it manually is very difficult. Automated techniques are therefore needed for analyzing the sentiments in unstructured opinionated data, and various Natural Language Processing (NLP) tasks are used to analyze this massive amount of opinionated information. In particular, there is growing interest in sentiment analysis, whose aim is polarity classification, i.e., binary or multiclass classification with outputs such as negative versus positive, negative versus positive versus neutral, or thumbs up versus thumbs down (Cambria, 2016). Sentiment analysis (SA) helps to analyze not only people's sentiments or opinions but also their influence on society (Cambria, 2016; Cambria et al., 2017; Ebrahimi et al., 2017). Opinions are the subject of study in sentiment analysis. Figure 1.1 portrays the abstract model of sentiment analysis with its applications in various domains.

Figure 1.1: Abstract model of sentiment analysis (figure not included in this excerpt)

The massive amount of opinionated data has shifted researchers' focus towards sentiment analysis, which has since become a significant research area and plays a key role in the advancement of Artificial Intelligence (AI). Most notably, companies are increasingly interested in sentiment analysis because of the upsurge of social media websites such as Twitter and Facebook, the growth of recommendation websites, and the rising popularity of blogs. Such opinionated information is a valuable currency for companies and businesses in identifying new opportunities and marketing their products successfully. Sentiment analysis is an automatic technique for analyzing the sentiments or emotions that people express in blogs, reviews, comments, etc. It classifies a piece of text as positive, negative, or neutral, or determines the emotions it expresses, e.g., joy, sadness, or anger. In short, sentiment analysis (often referred to as opinion mining) is a machine learning (ML) technique that classifies the sentiments expressed by people towards any topic using Natural Language Processing, computational linguistics, and text analysis techniques (Liu, 2015; Pang & Lee, 2008). As stated by Liu (2012), “Sentiment analysis is the analysis of expressed attitudes and opinions with the goal of determining the degree to which there are positive or negative sentiments therein”. Various academic and commercial tools, such as SAS (www.sas.com/social), IBM (www.ibm.com/analytics), and SenticNet (www.business.sentic.net), provide summaries of people's opinions and trends. Nevertheless, such tools are limited to emotion or polarity classification, and they are affected by linguistic elements such as negation.
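The effect of negation on a purely lexicon-driven classifier can be seen in a few lines of Python. The sketch below sums prior word polarities from a tiny lexicon; `TOY_LEXICON` and `lexicon_polarity` are hypothetical names, the lexicon is invented for illustration, and no negation handling is attempted.

```python
import re

# Hypothetical toy lexicon: word -> prior polarity score
TOY_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "slow": -1, "expensive": -1}


def lexicon_polarity(text: str) -> str:
    """Label text positive/negative/neutral by summing word scores (no negation handling)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(TOY_LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The camera is good"))      # positive
print(lexicon_polarity("The camera is not good"))  # also positive: the cue "not" is ignored
```

The second sentence is labelled positive because the negation cue “not” has no entry in the lexicon, which is the kind of weakness that negation handling is meant to address.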

1.1. Levels of Sentiment Analysis

Generally speaking, there are three levels of sentiment analysis: document-level, sentence-level, and aspect-level.

1.1.1 Document-level Sentiment Analysis

Document-level analysis is based on the assumption that a document expresses a direct opinion on a single entity. It is therefore unsuitable where a document contains comparative opinions (as in blogs and forums), because it assigns a single overall opinion to the whole document. Several researchers have performed document-level sentiment analysis (Pang & Lee, 2008). Its major disadvantage is that it is a shallow level of analysis and does not reveal people's opinions on individual features or aspects of a product (Liu, 2012). For example, consider the following document about a Nokia phone:

(1) I bought a Nokia phone 2 months ago. (2) Camera quality is good. (3) My sister thought it’s very expensive.

The above document contains three sentences expressing different opinions on different aspects of the Nokia phone. Sentence (1) is neutral, (2) expresses a positive sentiment on the camera quality of the Nokia phone, and (3) expresses a negative opinion on its price. If document-level analysis is applied to this document, it is very difficult to obtain the opinion on each aspect of the Nokia phone, because document-level analysis yields a single sentiment for the entire document. Moreover, in reality people want to know public opinion on each feature of a product (the Nokia phone in the above example) rather than only on the product as a whole. Hence, sentence-level or aspect-level sentiment analysis is needed for a deeper analysis.
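A short sketch shows what is lost when the Nokia document is reduced to a single label. The sentence labels below are taken from the discussion above rather than computed, and summing them into one document label is a hypothetical aggregation rule chosen only for illustration.

```python
from __future__ import annotations

# Sentence-level labels for the Nokia document: (sentence, aspect, polarity)
SENTENCES = [
    ("I bought a Nokia phone 2 months ago.", None, "neutral"),
    ("Camera quality is good.", "camera quality", "positive"),
    ("My sister thought it's very expensive.", "price", "negative"),
]

SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def document_label(labels: list[str]) -> str:
    """Collapse sentence labels into one document label by summing scores (illustrative rule)."""
    total = sum(SCORE[label] for label in labels)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


doc_label = document_label([label for _, _, label in SENTENCES])
aspect_labels = {aspect: label for _, aspect, label in SENTENCES if aspect is not None}

print(doc_label)      # neutral: the positive and negative opinions cancel out
print(aspect_labels)  # {'camera quality': 'positive', 'price': 'negative'}
```

The single document label hides both opinions, while the per-aspect view keeps the positive camera-quality opinion and the negative price opinion apart.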

1.1.2 Sentence-level Sentiment Analysis

The main aim of sentence-level analysis is to determine the polarity of a sentence, i.e., whether it expresses a positive, negative, or neutral opinion. It is based on the assumption that a sentence articulates a single opinion. At this level, the first step is subjectivity classification: determining whether a sentence is subjective or objective, i.e., separating feelings, beliefs, and views from facts. Subjective sentences express opinions, feelings, etc., that is, they represent non-factual information; for instance, “Camera quality of Nokia is good”. Most subjective sentences contain sentiment-bearing words such as “love” and “good”, but a few may not express any opinion, for instance, “He came yesterday”. Such subjective text needs to be handled carefully.

Objective sentences state facts and usually convey no sentiment; e.g., “I purchased a Nokia phone 2 months before” is an objective sentence with no opinion. However, some objective sentences carry an implied opinion. For instance, “This machine has stopped working suddenly” expresses negative polarity implicitly, although it contains no sentiment-bearing words. Next, polarity classification is performed to determine whether a sentence carries positive or negative connotations (Liu, 2012).
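The two-stage procedure (subjectivity classification followed by polarity classification) can be sketched with a toy word list. This is a deliberately naive illustration, not a method from this thesis: `SENTIMENT_WORDS` and `classify_sentence` are hypothetical, a sentence is treated as subjective only if it contains a listed word, and ties are reported as "mixed".

```python
from __future__ import annotations

import re

# Hypothetical word list: sentiment-bearing word -> polarity
SENTIMENT_WORDS = {
    "good": "positive", "love": "positive", "awesome": "positive",
    "bad": "negative", "expensive": "negative", "slow": "negative",
}


def classify_sentence(sentence: str) -> tuple[str, str | None]:
    """Stage 1: subjective vs objective; stage 2: polarity of subjective sentences only."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    polarities = [SENTIMENT_WORDS[t] for t in tokens if t in SENTIMENT_WORDS]
    if not polarities:
        # Stage 1: no sentiment-bearing word, so the sentence is treated as objective
        return "objective", None
    # Stage 2: majority polarity among the sentiment-bearing words
    pos, neg = polarities.count("positive"), polarities.count("negative")
    if pos > neg:
        return "subjective", "positive"
    if neg > pos:
        return "subjective", "negative"
    return "subjective", "mixed"


for s in ["Camera quality of Nokia is good",
          "I purchased a Nokia phone 2 months before",
          "This machine has stopped working suddenly"]:
    print(s, "->", classify_sentence(s))
```

The third sentence comes out as objective with no polarity, which shows why objective sentences carrying an implied opinion need more than a word list.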

1.1.3 Aspect-level Sentiment Analysis

Aspect-level analysis is a finer-grained analysis of the individual aspects (features or attributes) of each entity, so it has more possible uses. Some earlier works refer to it as entity-level or feature-level analysis (Hu & Liu, 2004; Pang & Lee, 2008). The concept of extracting aspects from opinionated data was first introduced by Hu and Liu (2004). Identifying the sentiments expressed in a piece of text has a variety of applications, such as tracking people's mood towards movies, products, and politics, which helps improve customer relationship management. In many of these applications, it is important to determine the sentiment associated with an entity or its features (aspects), because customers generally express opinions on different aspects of the products or services they have consumed. As an illustration, consider “Camera quality of my Nokia phone is awesome”. Aspect-level analysis of this text identifies a positive sentiment on the camera quality feature of the Nokia phone. The overall sentiment towards a product or service may be positive while the sentiment towards one of its aspects or features is the opposite.

Consider one more example: “The Pizza was great but the service was slow”. In this review, the customer expresses positive sentiment towards the pizza but negative sentiment towards the restaurant’s service. Only an aspect-level (granular) analysis can disclose such hidden information, which is highly pertinent for decision makers when planning. Aspect-level sentiment analysis can generate a structured summary of sentiments on entities and their aspects, which can then be used for quantitative and qualitative analyses. Aspects can be explicit or implicit. An explicit aspect is easy to identify because it is expressed in the text through explicit words such as “resolution of camera”, “battery size”, etc. In contrast, an implicit aspect is harder to detect. For instance, in “This phone is affordable”, the customer expresses positive sentiment towards the price aspect of the entity “phone”, although the word “price” never appears.
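To make the clause-level idea concrete, the following Python sketch assigns a polarity to each aspect in the pizza example. It is a minimal illustration under strong simplifying assumptions, not a method from the cited works: the aspect vocabulary `ASPECT_TERMS`, the toy lexicon `LEXICON`, and the function `aspect_sentiments` are hypothetical, and clauses are approximated by splitting the text at “but” and at commas.

```python
import re

# Hypothetical aspect vocabulary: surface terms mapped to aspect categories.
ASPECT_TERMS = {
    "pizza": "food",
    "service": "service",
    "camera": "camera",
    "battery": "battery",
    "affordable": "price",  # implicit aspect: the word itself implies "price"
}

# Hypothetical toy polarity lexicon (+1 positive, -1 negative).
LEXICON = {"great": 1, "awesome": 1, "affordable": 1, "slow": -1, "bad": -1}


def aspect_sentiments(text: str) -> dict:
    """Assign a polarity label to each aspect mentioned in the text.

    Simplifying assumption: every aspect in a clause receives the summed
    polarity of the sentiment words in that same clause.
    """
    result = {}
    for clause in re.split(r"\bbut\b|,", text.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        score = sum(LEXICON.get(tok, 0) for tok in tokens)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for tok in tokens:
            if tok in ASPECT_TERMS:
                result[ASPECT_TERMS[tok]] = label
    return result


print(aspect_sentiments("The pizza was great but the service was slow"))
# {'food': 'positive', 'service': 'negative'}
print(aspect_sentiments("This phone is affordable"))
# {'price': 'positive'}
```

Real aspect-level systems replace the fixed clause split with syntactic or learned scope detection; the sketch only shows why a single document-level label would hide the mixed opinion.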

Various studies have addressed aspect-level analysis through machine learning approaches (Bhadane et al., 2015; Fang & Chen, 2011), lexicon-based approaches (Potdar et al., 2016) and, more recently, deep learning (Guha et al., 2015; Nguyen et al., 2017; Ray & Chakrabarti, 2019; Weichselbraun et al., 2017). A comprehensive survey on aspect-level sentiment analysis was conducted by Schouten and Frasincar (2015).

1.2. Sentiment Analysis Approaches

Sentiment analysis algorithms capture and analyze the sentiments and emotions expressed in a piece of text. Such algorithms typically include machine learning methods, lexicon-based methods, hybrid approaches (combining machine learning and lexicon-based methods), and, most recently, deep learning approaches such as neural networks, convolutional networks, etc. (Medhat et al., 2014). In a broader sense, existing approaches to sentiment analysis can be grouped into three main categories: statistical methods, knowledge-based methods, and hybrid methods (Cambria, 2016). Figure 1.3 shows the complete taxonomy of sentiment analysis approaches.

1.2.1 Knowledge-Based Approach (Lexicon-Based Approach)

The knowledge-based approach uses sentiment lexical resources (Cambria et al., 2018) and exploits syntactic patterns (Poria et al., 2015) to analyze the sentiments in a piece of text. Sentiment lexicons contain word attributes such as polarity (positive, negative, or neutral) (e.g., the Bing Liu lexicon, Hu & Liu, 2004) and polarity strength (strongly positive, weakly negative, etc.) (MPQA, Wilson et al., 2005). Some sentiment lexicons assign real-valued scores rather than discrete polarities to sentiment-bearing words, such as the lexicons introduced by Kiritchenko et al. (2014b) (NRC Hashtag and S140 lexicons), which contain sentiment scores for words in negated and affirmative contexts. The lexicon-based approach typically makes use of words and expressions annotated with positive and negative labels (Taboada et al., 2011). As an illustration, some positive sentiment-bearing (opinion) words are beautiful, good, and excellent, and some negative opinion words are bad, horrible, and sucks. In other words, affective dictionaries are employed to estimate the sentiment expressed in a piece of text. A number of lexicon resources have been developed either semi-automatically or automatically (Zhou et al., 2013), such as SentiWordNet (SWN) (Baccianella et al., 2010), General Inquirer (Stone et al., 1966), MPQA (Wilson et al., 2005), and many more. Lexicon resources (Esuli & Sebastiani, 2006; Hu & Liu, 2004) have been considered useful for sentiment analysis (Agarwal et al., 2011; Cambria, 2016; Choi & Cardie, 2008; Gupta & Joshi, 2019; Kiritchenko et al., 2014b; Mohammad et al., 2013).

Based on how the opinion lexicon is generated, the knowledge-based approach is further categorized into corpus-based (Turney, 2002) and dictionary-based approaches (Liu, 2012; Pang & Lee, 2008). The corpus-based approach relies on syntactic (co-occurrence) patterns found in corpora. The dictionary-based approach relies on bootstrapping from a small set of seed opinion terms and a dictionary such as WordNet, which is used to expand the seed set via synonyms and antonyms. Seed words are a small number of manually collected opinion words with a strong positive or negative orientation.
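The dictionary-based bootstrapping idea can be sketched as a breadth-first expansion over synonym and antonym links. In this minimal Python sketch, the small `THESAURUS` dictionary is a hypothetical stand-in for WordNet (a real system would query WordNet itself), and `expand_seeds` is an illustrative name: synonyms inherit the polarity of the word they were reached from, and antonyms receive the opposite polarity.

```python
# Hypothetical miniature thesaurus standing in for WordNet:
# word -> (synonyms, antonyms).
THESAURUS = {
    "good": (["nice", "fine"], ["bad"]),
    "nice": (["pleasant"], ["nasty"]),
    "bad": (["poor", "awful"], ["good"]),
    "awful": (["terrible"], []),
}


def expand_seeds(seeds: dict, thesaurus: dict, max_iter: int = 3) -> dict:
    """Grow a polarity lexicon from seed words by following synonym and
    antonym links breadth-first, for at most max_iter rounds."""
    lexicon = dict(seeds)  # word -> +1 (positive) or -1 (negative)
    frontier = list(seeds)
    for _ in range(max_iter):
        next_frontier = []
        for word in frontier:
            synonyms, antonyms = thesaurus.get(word, ([], []))
            for syn in synonyms:
                if syn not in lexicon:  # the first label assigned wins
                    lexicon[syn] = lexicon[word]
                    next_frontier.append(syn)
            for ant in antonyms:
                if ant not in lexicon:
                    lexicon[ant] = -lexicon[word]
                    next_frontier.append(ant)
        if not next_frontier:  # nothing new was reached: stop early
            break
        frontier = next_frontier
    return lexicon


lexicon = expand_seeds({"good": 1, "bad": -1}, THESAURUS)
print(sorted(lexicon.items()))
```

Capping the number of rounds is a design choice: polarity tends to drift the further a word is from the seeds, so practical systems usually limit or weight the expansion depth.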

The traditional use of a sentiment lexicon is to sum up the polarities of the sentiment-bearing words in a text that are present in the lexicon (Hu & Liu, 2004; Turney, 2002). Although the lexicon-based method is very simple, it has shown surprisingly good performance in several state-of-the-art sentiment analysis systems (Kiritchenko et al., 2014b).
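A minimal Python sketch of this summation scheme follows. The six-word `LEXICON` is a hypothetical toy built from the example opinion words above, standing in for a real resource such as the Bing Liu lexicon, and the decision rule (a score above zero is positive, below zero is negative, zero is neutral) is an assumption.

```python
import re

# Hypothetical toy lexicon built from the example opinion words above.
LEXICON = {"beautiful": 1, "good": 1, "excellent": 1,
           "bad": -1, "horrible": -1, "sucks": -1}


def lexicon_polarity(text: str) -> str:
    """Sum the polarities of the lexicon words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The screen is beautiful and the camera is excellent"))  # positive
print(lexicon_polarity("This phone sucks"))                                      # negative
print(lexicon_polarity("The camera is not good"))                                # positive
```

The last call shows the weakness discussed next: “not good” is scored as positive because the summation ignores the negation cue.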

A few studies (Agarwal et al., 2011; Wilson et al., 2005) have exploited more sophisticated lexicon features, such as the counts of positive and negative words and the maximum, minimum, and total scores. Such lexicon-based features turned out to be highly effective in the SemEval sentiment analysis tasks (Mohammad et al., 2013). However, they cannot deal with semantic compositionality (Polanyi & Zaenen, 2006; Taboada et al., 2011), which involves intricacies such as polarity flipping (not good), polarity shifting (not excellent), and negation combined with intensification (not very nice). Thus, a few studies moved beyond the Bag-of-Words (BOW) model when leveraging sentiment lexicons (e.g., Gupta & Joshi, 2019; Kiritchenko et al., 2014b; Muhammad et al., 2016) and addressed this issue by incorporating linguistic knowledge such as negation, intensification, etc. into the lexicon-based approach.
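The lexicon features listed above can be computed as in the following Python sketch. The real-valued `SCORES` dictionary is hypothetical (its values are invented for illustration, in the style of score-based lexicons such as NRC Hashtag or S140), and the feature names are assumptions rather than the exact features of the cited systems.

```python
import re

# Hypothetical real-valued lexicon; the scores are invented for illustration.
SCORES = {"good": 1.5, "excellent": 2.8, "nice": 1.2, "bad": -1.9, "slow": -0.8}


def lexicon_features(text: str) -> dict:
    """Counts and max/min/total scores of the lexicon words in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    matched = [SCORES[tok] for tok in tokens if tok in SCORES]
    return {
        "pos_count": sum(1 for s in matched if s > 0),
        "neg_count": sum(1 for s in matched if s < 0),
        "max_score": max(matched, default=0.0),
        "min_score": min(matched, default=0.0),
        "total_score": sum(matched),
    }


print(lexicon_features("Excellent camera but slow and bad battery"))
print(lexicon_features("not good") == lexicon_features("good"))  # True
```

Because the features are computed over an unordered bag of tokens, “not good” produces exactly the same features as “good”, which is the compositionality problem described above.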

The lexicon-based approach is computationally efficient, scalable, and simple, and is therefore widely used in general sentiment (opinion) analysis (Hogenboom et al., 2015; Medhat et al., 2014). However, various researchers have noted that the knowledge-based approach sometimes suffers from low recall (Giatsoglou et al., 2017). Moreover, its effectiveness depends on the accuracy and coverage of the lexicon (Giachanou & Crestani, 2016). In many cases the results are unsatisfactory, because relying solely on sentiment-bearing words is insufficient (Cambria et al., 2013). Thus, a few researchers combined lexicon-based and machine learning methods to improve accuracy (Mudinas et al., 2012; Zhang et al., 2011).

1.2.2 Statistical Approach

The statistical approach includes machine learning and deep learning methods for sentiment analysis. These methods typically use either the Bag-of-Words (BOW) technique, in which a text is represented by a vector of word counts that ignores word position (and suffers from sparsity and high dimensionality), or, more recently, word embeddings (commonly used with deep learning), which map each word of a text to a fixed-length dense vector encoding the syntactic and semantic properties of the term. Because BOW ignores word order, it cannot capture the context of a word. Thus, researchers typically enhance BOW with linguistic knowledge to obtain considerable performance gains (Chikersal et al., 2015). Owing to these limitations of BOW, word embeddings have recently become quite popular and are widely used in sentiment analysis tasks (Ebrahimi et al., 2017; Laskari & Sanampudi, 2017; Ren et al., 2016; Rezaeinia et al., 2019; Saroufim et al., 2018; Tang et al., 2014). The two most widely used word embedding methods are GloVe and Word2Vec, which convert text into meaningful real-valued vectors; such methods need a large corpus for training.
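The following minimal Python sketch builds Bag-of-Words count vectors from scratch (in practice a library vectorizer would normally be used). The function names are illustrative. The example shows the loss of word order: two sentences with opposite meanings receive identical vectors.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus: list) -> list:
    """Sorted list of all distinct tokens in the corpus."""
    return sorted({tok for doc in corpus for tok in tokenize(doc)})


def bow_vector(text: str, vocab: list) -> list:
    """Count of each vocabulary word in the text; word order is discarded."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


corpus = ["the movie was good not bad", "the movie was bad not good"]
vocab = build_vocabulary(corpus)
print(vocab)                         # ['bad', 'good', 'movie', 'not', 'the', 'was']
print(bow_vector(corpus[0], vocab))  # [1, 1, 1, 1, 1, 1]
print(bow_vector(corpus[1], vocab))  # [1, 1, 1, 1, 1, 1]
```

Each vector has one dimension per vocabulary word, so for a realistic tweet corpus the vectors are long and mostly zero, which is the sparsity and dimensionality problem mentioned above.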

The machine learning approach is further divided into supervised and unsupervised learning.

- Supervised Learning: A supervised machine learning classifier needs a labelled training corpus; the trained algorithm is then used to classify an unseen test corpus. Put simply, the main aim of the supervised learning approach is to classify unseen data based on labelled training data. It tries to infer, from the labelled training data, a function that maps an input to an output:

Y = f(X), where X is the input variable and Y is the output variable.

The aim of supervised learning is to approximate this function so that the output Y can be predicted for any new input X.

Figure 1.2 portrays the workflow of supervised learning. Some well-known and widely used supervised classifiers are Support Vector Machine (SVM), Decision Tree Classifier (DTC), Random Forest (RF), Naive Bayesian (NB), Logistic Regression, and many more. Go et al. (2009) were the first to tackle Twitter sentiment analysis (TSA) through a supervised learning approach, treating it as a binary classification task. Supervised learning is further divided into classification and regression.

- Classification is a type of supervised learning whose outcome is discrete: binary (e.g., positive or negative) or multiclass (e.g., positive, negative, or neutral). Twitter sentiment analysis, for example, is a classification task.
- Regression is also a type of supervised learning, but its outcome is continuous, e.g., predicting the sales of a company.


Figure 1.2: Workflow of Supervised Learning
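As a concrete instance of learning Y = f(X) from labelled data, the Python sketch below trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on four toy tweets and then labels unseen text. The training tweets, the tokenizer, and the class name `MultinomialNB` are illustrative assumptions (the name mirrors, but is not, scikit-learn’s class), and this is not necessarily the classifier configuration used in this thesis.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # class -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        """Return the class with the highest log posterior for the text."""
        best_label, best_logp = None, -math.inf
        for label, n_class in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_class / self.n_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are skipped
                    logp += math.log((counts[tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical labelled training tweets (X) and their classes (Y).
train_x = ["I love this phone", "great camera and awesome battery",
           "I hate the slow service", "terrible battery and awful screen"]
train_y = ["positive", "positive", "negative", "negative"]

model = MultinomialNB().fit(train_x, train_y)
print(model.predict("love the awesome camera"))  # positive
print(model.predict("awful slow service"))       # negative
```

Log probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on longer texts.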

- Unsupervised Learning: It does not require labelled training data and tries to infer structure from the data, for example through clustering, as in customer segmentation. Only input data (X) is present, with no corresponding output variable. The most common unsupervised learning approach is clustering, which divides unlabelled data into clusters based on patterns, such that objects within a cluster are more similar to each other than to objects in other clusters. K-means is one of the popular unsupervised clustering algorithms.
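The clustering idea can be illustrated with a plain-Python K-means (Lloyd’s algorithm) sketch. The two-dimensional points are hypothetical customer features invented for illustration (visits per month and spend), and `kmeans` is an illustrative function, not a library API; it requires Python 3.8 or later for `math.dist`.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid re-estimation until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct initial points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabelled customers: (visits per month, spend in hundreds).
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(customers, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

No labels are supplied at any point; the two customer groups emerge only from the distances between points, which is what distinguishes this setting from the supervised example.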

Statistical methods (machine learning and deep learning) have shown significant performance in polarity detection (Majumder et al., 2019; Zadeh et al., 2018). Most earlier works on sentiment analysis addressed the problem through machine learning and, more recently, deep learning. However, the machine learning approach to sentiment analysis suffers from some issues. Firstly, it needs a massive amount of training data. Secondly, it is domain dependent: a model trained on one domain might perform poorly when tested on other domains. Lastly, parameter tuning might lead to variation in results. Such issues are significant in NLP because they prevent models from achieving human-like performance. Thus, a few recent studies have focused on another statistical approach, deep learning, to capture syntactic patterns from the corpus.

1.2.3 Hybrid Approach

The hybrid approach to sentiment analysis combines the knowledge-based and statistical approaches to get the benefits of both techniques (Jurek et al., 2015). In this technique, the output of the lexicon-based approach acts as input to a machine learning algorithm. A number of researchers have explored hybrid approaches in the past, combining varied techniques rather than following a standard approach, and have reported considerable performance (e.g., Appel et al., 2016; Gupta & Joshi, 2019; Khan et al., 2014; Muhammad et al., 2016; Poria et al., 2014).
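One simple way to realize “lexicon output as input to a machine learning algorithm” is to append lexicon counts to Bag-of-Words features and train a linear classifier on the result. The Python sketch below does this with a perceptron; the word lists, toy training tweets, and function names are hypothetical, and the perceptron is chosen only for brevity, not because the cited hybrid systems use it.

```python
import re

# Hypothetical sentiment word lists (the knowledge-based component).
POS_WORDS = {"good", "great", "love", "awesome"}
NEG_WORDS = {"bad", "slow", "hate", "awful"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def hybrid_features(text, vocab):
    """Bag-of-Words counts followed by two lexicon-derived counts."""
    tokens = tokenize(text)
    bow = [tokens.count(word) for word in vocab]
    lexicon = [sum(t in POS_WORDS for t in tokens),
               sum(t in NEG_WORDS for t in tokens)]
    return bow + lexicon


def train_perceptron(X, y, epochs=20):
    """Classic perceptron; labels must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified: move the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Hypothetical labelled tweets: +1 positive, -1 negative.
train = [("good phone", 1), ("great camera", 1),
         ("bad battery", -1), ("slow service", -1)]
vocab = sorted({t for text, _ in train for t in tokenize(text)})
X = [hybrid_features(text, vocab) for text, _ in train]
y = [label for _, label in train]
w, b = train_perceptron(X, y)

print(predict(w, b, hybrid_features("awesome screen", vocab)))  # 1
print(predict(w, b, hybrid_features("awful screen", vocab)))    # -1
```

Neither “awesome” nor “awful” occurs in the training tweets, so their Bag-of-Words features are all zero; the predictions come entirely from the lexicon features, which is the benefit a hybrid design aims for.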


Figure 1.3: Sentiment Analysis Approaches (Medhat et al., 2014)

1.3. History of Sentiment Analysis, Specifically Twitter Sentiment Analysis

Sentiment analysis has been a subject of research for many years. Efforts to determine people’s opinions and attitudes towards events or topics have fueled researchers’ interest in the field (Ebrahimi et al., 2017). Publicly available opinionated data has become a necessity in decision making for small and large organizations, and even for individuals. Various organizations, and even governments, have started funding sentiment analysis projects and have integrated it into their marketing strategies. Analyzing the sentiments expressed on social media in various forms can also bring valuable business opportunities. It has great potential to enhance the capabilities of recommendation systems and customer relationship management. For example, it helps determine the features customers enjoy and, at the same time, can help a recommendation system exclude features that receive negative comments from consumers. Although the field has recently witnessed a huge surge of research activity, there has been solid interest in it for quite a while.

Sentiment analysis was originally used to extract opinions from written documents in the 1950s. However, the explosion of Web 2.0 led to the frequent use of sentiment analysis on opinionated data available on the internet. Although Natural Language Processing (NLP) and linguistics have a long history, there was little research on sentiments and opinions before the year 2000. Before awareness of sentiment analysis became widespread, most earlier works focused on exploring affect, metaphor, points of view, and more (e.g., Sack, 1994; Wiebe, 1990; Wiebe & Bruce, 1995). One reason may be the limited amount of opinionated data in digital form before the World Wide Web (WWW). Before the internet matured, people often relied on the traditional method of Word-of-Mouth (WOM) to learn others’ opinions and feelings on a topic or event. Stokes and Lomax defined WOM as “interpersonal communication regarding product or services where the receiver regards communicator as impartial”. Before the WWW became widespread, organizations used traditional polls, surveys, or interviews to collect people’s opinions (Liu, 2010). The upsurge of Web 2.0, however, made sentiment analysis an interesting and active research area, because people now have an unprecedented channel through which to share their opinions on any number of topics and events. There are a number of reasons for researchers’ increasing interest in sentiment analysis. One reason may be its wide range of applications in almost every domain, such as healthcare (Carrillo-de-Albornoz et al., 2018; Goeuriot et al., 2012), politics, social events, financial services, and many more.
Another obvious reason is the huge amount of publicly available opinionated data, which, if leveraged effectively, can provide tremendous benefits to organizations and governments. In other words, the growth of social media coincides with the growth of research in sentiment analysis.

Since 2000, sentiment analysis has grown rapidly and has spread from computer science to other fields as well (e.g., political science, management science, and many more). Pang and Lee (2008), in their survey on sentiment analysis, stated that the year 2001 or so appears to mark the beginning of widespread awareness of the opportunities raised by sentiment analysis, which subsequently led to the publication of a large number of papers in the field. It has been applied extensively to the polarity classification of product reviews available on the web (e.g., Pang et al., 2002; Potdar et al., 2016). It is worth noting that the majority of existing studies in sentiment analysis employed machine learning for review polarity classification.

There are a number of sources of opinionated data, such as blogs, chats, discussion rooms, and various social media websites. However, in the late 2000s researchers’ attention shifted from sentiment analysis in general to Twitter sentiment analysis specifically, because Twitter is a goldmine of opinionated data containing real-time messages. It is a platform for collecting high-throughput opinionated data, which can be used to ascertain industry-wide trends, reveal current stories on hot topics, and take appropriate action in time. Twitter users range from common people to renowned personalities from different countries. Thus, Twitter is a rich source of opinionated data, offering researchers exciting opportunities to gain insight from an immense volume of opinions. Twitter sentiment analysis can be seen as the next step of sentiment analysis and is more challenging because of the unstructured nature of tweets. A tweet is a short message posted by a Twitter user. Because Twitter restricts the length of a tweet (originally to 140 characters, raised to 280 characters in 2017), users frequently use informal language such as slang, misspelled words, emoticons, punctuation, etc. when posting messages. Such highly informal and unstructured language makes Twitter sentiment analysis a challenging task, and techniques used for the sentiment analysis of structured reviews might not work for tweets. Moreover, people tweet about virtually every current hot topic and event (incredible topic coverage), which makes the task even more challenging.

1.4. Twitter

Twitter is one of the most popular social media platforms for expressing views and exchanging information. It is an informal platform where people from all over the world can post and broadcast messages. It played an important role in spreading awareness of events such as the Arab Spring (Kumar et al., 2014) and of natural disasters such as Hurricane Sandy. Twitter was founded in 2006 and was originally created as a platform for SMS-style text messages, but it has grown into much more, as social media influencers and famous personalities use it to reach their target audiences for business and personal purposes alike. Statistics (https://learn.g2.com/Twitter-statistics) show that there are more than 321 million active users, sending an average of 500 million tweets daily. Twitter is one of the top-rated microblogging sites to date, with millions of users posting millions of tweets every day. It gives companies an opportunity to grow their business (brand), because they can reach out to and understand their customers through its opinionated data. Moreover, more than 80% of data is unstructured in nature (https://learn.g2.com/structured-vs-unstructured-data). A message posted by a Twitter user is known as a tweet. Because a tweet is limited to 280 characters (140 before 2017), users use informal language while tweeting. Some of the linguistic peculiarities of tweets are:

- The @ sign indicates a username (a mention). For instance, “@KimKardashian this #iPhoneCase is LIT YASSS for #Kimoji case #iPhone7 #Blogger”.
- The # (hashtag) symbol indicates the topic on which a person is tweeting. For instance, a tweet on demonetization would be “Grasshopper holds a survey to show #Demonetization was a disaster but from the voting he must be disappointed now :P https://t.co/GXISJ3Pk57”.
- URLs (web links) are used to refer to external sources. For example, “The App isn't working. Hello !!! #iphone7 #samsung @ Nassau, Bahamas https://t.co/GSfJxc0yWv”.
- People often use repeated punctuation such as !!! to express their sentiments. For instance, “This Smartphone is stupid . The App isn't working . Hello !!! #iphone7 #samsung @ Nassau , Bahamas https://t.co/GSfJxc0yWv”.
- Elongated words (words in which a letter is repeated many times), such as looooove and hateeeee, are used to emphasize a particular word. For instance, “Loooooove Iphone7... #Iphone7”.
- People also often use capitalized words to put emphasis on certain words. For instance, “Apple Pay arrived in Japan and I'm actually really glad it did . LOVE it !!!!! #iPhone7 #iPhone7Plus #Apple #ApplePay #Suica”.
- Twitter users frequently use slang such as lol, rofl, and many more to keep their messages short. Slang terms often indicate the user’s mood; for instance, lol means laughing out loud. Consider a tweet on the lockdown: “Lol even in my game I am not allowed to travel #SocialDistancing #Lockdown #QuarantineLife https://t.co/dggG4fkNxt”.
- People frequently use misspelled words and non-standard abbreviations while tweeting, partly due to the length restriction. For example, “Most of d Bnk's ATM's were closed at d tym of #Demonetization , I hope next tym Post Bank of India's ATM's were fully busy . @IndiaPostOffice”.
- Nowadays, one of the most common ways users express their mood in a tweet is through emoticons such as a happy face, sad face, wink, and many more. For instance, “#Covid_19 #StayAtHomeAndStaySafe #Lockdown #naturelovers ... Mother nature we want your happiness too :) https://t.co/KkK5mC5l4y”.
- RT indicates a retweet, i.e., that the user is reposting another user’s tweet. For instance, “Samsung sabotage! RT @eNCA: Woman burnt after falling asleep on her #iPhone7 https://t.co/rKUuMdj9Zr https://t.co/4t7iLMdvtK”.
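The peculiarities listed above can be detected with regular expressions, as in the following Python sketch. The patterns are simplified assumptions (for instance, the emoticon pattern covers only a few Western emoticons, and elongation is detected only for letters), and `tweet_peculiarities` is an illustrative name rather than part of any existing tool.

```python
import re

# Simplified patterns for the peculiarities listed above (assumptions,
# not a complete Twitter tokenizer).
PATTERNS = {
    "mentions": re.compile(r"@\w+"),
    "hashtags": re.compile(r"#\w+"),
    "urls": re.compile(r"https?://\S+"),
    "repeated_punct": re.compile(r"[!?.]{2,}"),
    # a word containing one letter repeated three or more times in a row
    "elongated": re.compile(r"\b[a-zA-Z]*([a-zA-Z])\1{2,}[a-zA-Z]*\b"),
    "all_caps": re.compile(r"\b[A-Z]{2,}\b"),
    "emoticons": re.compile(r"[:;=]-?[()DPp]"),
}


def tweet_peculiarities(tweet: str) -> dict:
    """Return every match of each pattern, plus a retweet flag."""
    found = {name: [m.group(0) for m in pattern.finditer(tweet)]
             for name, pattern in PATTERNS.items()}
    found["is_retweet"] = re.search(r"\bRT\b", tweet) is not None
    return found


print(tweet_peculiarities("Loooooove Iphone7... #Iphone7"))
print(tweet_peculiarities(
    "Samsung sabotage! RT @eNCA: Woman burnt after falling asleep "
    "on her #iPhone7 https://t.co/rKUuMdj9Zr"))
```

`finditer` with `group(0)` is used instead of `findall` because the elongation pattern contains a capturing group, and `findall` would return only the repeated letter rather than the whole word.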


Figure 1.4 portrays a tweet taken from the timeline of the World Health Organization (WHO). This tweet contains several linguistic peculiarities of tweets, such as a username (“@DrTedros”) and topics (#StayHome and #COVID19). It is on the hot trending topic of “coronavirus”.


Figure 1.4: Example tweet taken from the WHO timeline

1.5. Need for Twitter Sentiment Analysis

Twitter sentiment analysis is a significant research topic and is booming nowadays. It helps governments and companies understand the effectiveness of their systems from the tons of opinionated data publicly available on the web. Twitter, a microblogging website, is frequently used by people to express their feelings, opinions, attitudes, etc. towards particular topics or events. It is used not only by common people but also by politicians and famous personalities. Whenever an event happens, such as the Lok Sabha elections, the decision on Ram Mandir, the lockdown decision, or demonetization, Twitter suddenly floods with tweets related to those events or hot topics. Such tweets represent individuals’ state of mind on current events and topics and are very informative.

Thus, Twitter is considered a valuable repository of informative data for prediction and opinion mining in a wide variety of fields, including finance, i.e., determining the correlation between tweet sentiment and stock market fluctuations (Bollen et al., 2011; Brown, 2012; Sanchez-Rada et al., 2014; Sohangir et al., 2018; Zhang, 2013), politics, i.e., predicting election results (Angaria & Guddeti, 2014; El Alaoui et al., 2018; Jose & Chooralil, 2016; Mejova et al., 2013; Mohammad et al., 2015; O’Connor et al., 2010; Ramteke et al., 2016), sports (Barnaghi et al., 2016; Branz & Brockmann, 2018; Mukherjee & Jansen, 2018), health (Hanson et al., 2013; Korkontzelos et al., 2016; Mahata et al., 2018), and much more (Akhtar et al., 2018).

Twitter is even being used for tracking and predicting natural disasters (Doan et al., 2011) and terror incidents. Moreover, there are studies that explore the sentiments associated with tweets on temporal and spatial scales (Bertrand et al., 2013). Some studies analyze the dynamics (Thelwall et al., 2011) and geography (Mitchell et al., 2013) of the sentiments associated with tweets. Thus, not only the tweet text but also other attributes of a tweet are being used to gauge public sentiment. For instance, Bertrand et al. (2013) combined their Twitter sentiment analysis approach with user-provided geotagging and observed that people’s mood is lowest at transportation hubs and highest in public parks. However, Ebrahimi et al. (2017) stated that constructing accurate sentiment analysis models for dynamic events, such as predicting election results, is still a challenging task.

A huge amount of opinionated data is available in the form of tweets, but it is a formidable task to manually collect such a sheer volume of data, extract the relevant information, and summarize the extracted opinions in a usable form. Thus, automated sentiment analysis is needed. Although the collective opinions of different people are more informative than a single opinion, such opinionated data in the form of tweets is highly unstructured. Moreover, 80% of the data available on the web in digital form (https://learn.g2.com/structured-vs-unstructured-data) is unstructured and unorganized. Such data needs to be leveraged effectively, so Twitter sentiment analysis is needed to distil and extract valuable information from sentiment-bearing data, which in turn supports decision making for the betterment of society. The main focus of Twitter sentiment analysis is to extract the sentiments associated with a piece of text, specifically tweets. Some of the advantages of Twitter sentiment analysis are discussed below:

- Real-time analysis: Twitter users usually post real-time messages on events and topics that are currently happening. Twitter sentiment analysis can identify sudden shifts in customer emotions or mood and can help in taking action before a problem escalates. Nowadays, almost all shopping websites monitor purchase actions in real time and recommend products to customers according to their shopping patterns.
- Scalability: Twitter sentiment analysis is an automatic way of collecting and monitoring tweets. Thus, cost-effective results can be obtained on a very large corpus of tweets in a short span of time. In contrast, manual analysis of a large number of tweets is very time-consuming and practically impossible to scale, and traditional polls and surveys are cost- and time-intensive.
- Consistency: Twitter sentiment analysis leads to more accurate and consistent results than manual processing, in which different people might classify the same tweet into different categories according to their perception.

Many earlier works addressed Twitter sentiment analysis through different techniques (machine learning, lexicon-based, or hybrid approaches) and varied features, and researchers have reported significant performance. Various sentiment analysis tools, such as Sentimentor (Spencer & Uchyigit, 2012) and SentiStrength (Thelwall et al., 2012; Thelwall et al., 2010), have also been developed to mine the valuable information embedded in opinionated data. Most research in Twitter sentiment analysis focuses on supervised machine learning, in which classifiers are trained on labelled training data. However, limited research has compared different state-of-the-art classifiers with varied features. The main motivation of this thesis is to address that gap by presenting a performance comparison among several state-of-the-art classifiers with respect to features, in order to determine which classifier works best with which feature group. We shall evaluate the performance of several classifiers on varied domains, including real-time Twitter datasets (real-time tweets extracted from Twitter) as well as a general-purpose, publicly available benchmark dataset (SemEval-2013 dataset, Nakov et al., 2013). Chapter 3 presents complete details on the collection and annotation of the real-time Twitter datasets. We shall present an exhaustive analysis of the contribution of different feature groups with different classifiers on both real-time tweets and publicly available benchmark tweets.

Twitter sentiment analysis is evidently much more complicated than sentiment analysis in general (such as sentiment analysis of Amazon product reviews) because of the complicated, unstructured nature of tweets. In Minsky’s “suitcase” metaphor, sentiment analysis is a big suitcase that entails various subtasks such as negation handling, word sense disambiguation, metaphor resolution, etc. Though earlier works (e.g., Agarwal et al., 2011; Reitan et al., 2015) reported significant performance in Twitter sentiment analysis, it is still challenging to deal with various linguistic phenomena such as negation, tweet normalization, etc. Neglecting such phenomena while performing sentiment analysis might lead to low performance.

Moreover, it is insufficient to handle all negation tweets in the same way without considering whether the negation in a tweet actually carries a sense of negation. There is very limited research on handling negation whose presence does not actually negate the sentiment associated with the tweet. Besides our main motivation, another research gap addressed in this work is this existing problem of negation handling. We shall develop an algorithm for handling negation tweets in which the presence of negation does not necessarily imply negation. We shall present ablation experiments to assess the effect of handling negation with the incorporated negation exception algorithm on classifier performance. Moreover, this thesis also deals with the unstructured nature of tweets through a knowledge-based approach known as the tweet normalization system.
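A common baseline for negation handling appends a `_NEG` suffix to every token between a negation cue and the next punctuation mark. The Python sketch below implements that baseline and adds a small exception check for phrases in which the cue does not negate the sentiment. It only illustrates the idea: the cue set and the `NON_NEGATING` phrase list are hypothetical and are not the negation exception algorithm developed in this thesis.

```python
import re

NEGATION_CUES = {"not", "no", "never", "nothing", "cannot"}
# Hypothetical exception list: (cue, next word) pairs in which the cue
# does not negate the sentiment that follows.
NON_NEGATING = {("not", "only"), ("no", "doubt"), ("nothing", "but")}
PUNCTUATION = re.compile(r"^[.,!?;:]+$")


def is_cue(token: str) -> bool:
    return token in NEGATION_CUES or token.endswith("n't")


def mark_negation(tokens: list) -> list:
    """Append _NEG to tokens between a negation cue and the next
    punctuation mark, unless the cue starts an exception phrase."""
    out, in_scope = [], False
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if PUNCTUATION.match(tok):
            in_scope = False  # punctuation closes the negation scope
            out.append(tok)
        elif is_cue(low):
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
            in_scope = (low, nxt) not in NON_NEGATING
            out.append(tok)
        else:
            out.append(tok + "_NEG" if in_scope else tok)
    return out


print(mark_negation("this phone is not good , but cheap".split()))
# ['this', 'phone', 'is', 'not', 'good_NEG', ',', 'but', 'cheap']
print(mark_negation("not only fast but also cheap".split()))
# ['not', 'only', 'fast', 'but', 'also', 'cheap']
```

The suffixed tokens become separate features, so a classifier can learn that good_NEG behaves differently from good, while the exception check leaves phrases such as “not only” untouched.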

1.6. Research Objectives

Opinionated data is widely available on different platforms such as blogs, news articles, reviews, microblogging sites, and many more. Although such data is very informative, it is also very complex to handle effectively. Even in this era of technological advancement, data complexity and overwhelming data size make it very difficult to extract the needed information. Clearly, it is not feasible to handle data from all platforms simultaneously. Moreover, among the different sources of sentiment-bearing data, Twitter is currently a goldmine and the most opportune. Twitter generates tons of data every day related to hot and current topics. Thus, this thesis focuses on Twitter sentiment analysis, specifically message-level supervised polarity classification, i.e., classifying a tweet into the positive, negative, or neutral class. Our main aim is to present an exhaustive analysis of different state-of-the-art classifiers with different groups of features, i.e., to analyze which classifier works effectively with which feature group so that significant classification performance can be achieved. While exploring the different classifiers with varied features, this work also addresses the unstructured nature of tweets through a knowledge-based approach, as well as the challenging task of negation handling. That is, we shall also look into critical aspects of NLP, such as tweet normalization and negation, while performing Twitter sentiment analysis. To achieve our objective and overcome the above-mentioned challenges, we propose the following experiments.

1.6.1 Corpus Creation

A Twitter corpus is a collection of tweets, either related to a particular topic or in general. A high-quality, comprehensive corpus is essential for any successful Twitter sentiment analysis system. In the past few years, various evaluation datasets have been released to evaluate the performance of Twitter sentiment analysis, and a number of publicly available benchmark Twitter corpora can be used directly. Benchmark Twitter datasets include the SemEval datasets (Nakov et al., 2016; Nakov et al., 2013), the Stanford Twitter dataset (Go et al., 2009), and many more (Kouloumpis et al., 2011; Shamma et al., 2009). In most datasets, tweets are labelled with positive, negative, or neutral classes (Go et al., 2009; Nakov et al., 2013; Shamma et al., 2009). However, a few datasets include additional labels such as irrelevant, mixed, etc. (Speriosu et al., 2011). Moreover, in some datasets each tweet is given a numerical rating from -5 to +5 (Thelwall et al., 2012; Thelwall et al., 2010). In a survey by Saif et al. (2013), the authors presented eight different datasets widely used in the existing Twitter sentiment analysis literature. Nevertheless, with such datasets it is very difficult to determine public sentiment on current hot topics or ongoing events, mainly because these corpora were created long ago and contain general tweets. Therefore, in addition to performing Twitter sentiment analysis on a benchmark dataset of general tweets (SemEval-2013), we aim to perform it on tweets related to particular events or topics. Thus, we propose to generate our own corpus (real-time Twitter dataset).
For this, we shall collect tweets on various topics, such as tweets on #demonetization, trending tweets on #lockdown, and tweets on the “#9pm9minutes” decision by Prime Minister Narendra Modi.

1.6.2 Tweet Normalization

Tweet normalization is the cleaning of unstructured tweets to obtain a noise-free corpus. It is needed because people often write tweets in informal language, with many acronyms, misspelled words, emoticons, punctuation marks, and so on. Normalization cleans this notoriously noisy, unstructured data and makes the dataset ready for sentiment classification. Tweet pre-processing is a primary and important step in sentiment analysis for two main reasons. First, pre-processing improves dataset quality by determining which features are included in the feature vector representation, which is one of the important factors affecting the success of any machine learning classifier. Second, it reduces dimensionality by removing irrelevant elements that do not contribute to sentiment, such as scripts, HTML tags, URLs, and punctuation other than exclamation and question marks. In the supervised classification approach, the feature vector on which the classifier is trained is generated from the corpus. For tweets, each token (word) contributes to the feature vector, which increases its dimensionality. High-dimensional data can make a model overfit the training corpus so that it does not generalize to unseen data. By removing such uninformative elements from the Twitter corpus, data pre-processing helps prevent the model from overfitting. This significance of data pre-processing in Twitter sentiment analysis motivates us to explore the work already done on it.
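The Python sketch below roughly illustrates the basic cleaning step described above. It removes URLs, HTML tags, user mentions, and punctuation other than exclamation and question marks. Hashtag symbols and apostrophes are kept for later hashtag and negation handling. The regular expressions are assumptions chosen for illustration, not the exact rules of the proposed system. Emoticons are assumed to have been replaced by tokens beforehand, because the punctuation pattern would otherwise erase them.

```python
import re

# Illustrative patterns only (assumptions, not the thesis's resource files).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
MENTION_RE = re.compile(r"@\w+")
# Everything except word characters, whitespace, '#', apostrophes, '!' and '?'.
PUNCT_RE = re.compile(r"[^\w\s#'!?]")
SPACE_RE = re.compile(r"\s+")


def basic_clean(tweet: str) -> str:
    """Strip URLs, HTML tags, mentions and most punctuation, then tidy spaces."""
    text = URL_RE.sub(" ", tweet)       # URLs first: they contain ':', '/', '.'
    text = HTML_TAG_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()
```

For example, basic_clean("@user Loved the <b>new</b> policy!!! http://t.co/xyz #lockdown") returns "Loved the new policy!!! #lockdown".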

Twitter sentiment analysis requires a non-traditional approach because of tweet characteristics such as slang and misspelled words. The informal language of tweets therefore poses new challenges for NLP, beyond those encountered when working with structured text such as reviews. We therefore propose a knowledge-based Tweet Normalization System that not only removes noise (punctuation, usernames, stop words, etc.) from a tweet but also normalizes acronyms, misspelled words, emoticons, negation, and so on. For this purpose, we shall create resource files for slang, misspelled words, emoticons, and negation. We shall also analyze the impact of each pre-processing module on the classification performance of different classifiers, which helps in understanding the effectiveness of each module. Moreover, we shall perform an experiment to determine the effectiveness of the proposed tweet normalization system itself, i.e., how well it cleans and normalizes tweets automatically, by comparing its output with manual pre-processing.
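The knowledge-based normalization can be pictured as dictionary lookups against the resource files. The sketch below is a minimal version built on several assumptions:

- The four tables (EMOTICONS, NEGATIONS, SLANG, MISSPELLINGS) are tiny hypothetical stand-ins for the real resource files.
- Tokenization is plain whitespace splitting.
- Characters repeated three or more times are reduced to two.

```python
import re

# Hypothetical miniature resource files; the real slang, misspelling,
# emoticon and negation files would be much larger and loaded from disk.
EMOTICONS = {":)": "EMO_POS", ":-)": "EMO_POS", ":D": "EMO_POS",
             ":(": "EMO_NEG", ":-(": "EMO_NEG"}
NEGATIONS = {"dont": "do not", "don't": "do not", "cant": "can not",
             "can't": "can not", "isnt": "is not", "isn't": "is not"}
SLANG = {"gr8": "great", "u": "you", "r": "are", "lol": "laughing out loud"}
MISSPELLINGS = {"becuase": "because", "definately": "definitely"}

# A word character repeated three or more times, e.g. "soooo".
ELONGATION_RE = re.compile(r"(\w)\1{2,}")


def normalize_tweet(tweet: str) -> str:
    """Map each whitespace-separated token through the lookup tables."""
    normalized = []
    for token in tweet.split():
        # Emoticons are matched before lowercasing so ":D" is still recognized.
        if token in EMOTICONS:
            normalized.append(EMOTICONS[token])
            continue
        word = ELONGATION_RE.sub(r"\1\1", token.lower())  # "soooo" -> "soo"
        for table in (NEGATIONS, SLANG, MISSPELLINGS):
            if word in table:
                word = table[word]
                break
        normalized.append(word)
    return " ".join(normalized)
```

With these tables, "I dont like it :( u r not gr8" becomes "i do not like it EMO_NEG you are not great". Reducing elongations to two characters rather than one keeps legitimate double letters, so "goooood" becomes "good". A real system would still need a spelling resource for outputs such as "soo".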

1.6.3 Feature Engineering

Feature engineering is the generation of features from the cleaned and normalized corpus. Most of the pioneering works in Twitter sentiment analysis (Go et al., 2009; Pak & Paroubek, 2010) generated the most common features, such as ngrams, POS-based features, and Twitter-specific features. More recent works in Twitter sentiment analysis also used other features, such as lexicon-based features. However, very few works have provided classifiers with a complete, wide, and varied set of features. Thus, in this thesis, we shall generate various types of syntactic and semantic features from tweets. These include Twitter-specific automatic lexicon-based features, manual lexicon-based features, ngrams, negation features, POS-based features, emoticon features, and hashtag features. We shall create four or five feature sets from these features by grouping related features. We shall then perform a series of experiments to analyze the effectiveness of each feature group with each of the state-of-the-art classifiers, namely Support Vector Machine, Naive Bayesian, and Decision Tree. This will give a better understanding of which combinations of features and classifiers work well.
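The following sketch makes the idea of grouped feature sets concrete. It builds binary ngram-presence features and a few aggregate lexicon features, and lets an experiment switch whole groups on or off by name. The lexicon values, feature names, and group names are hypothetical. The aggregate lexicon features are only loosely modelled on the kinds of lexicon-based features mentioned above.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Hypothetical manual lexicon (word -> prior polarity); illustrative values only.
LEXICON = {"good": 1.0, "great": 1.5, "bad": -1.0, "terrible": -1.5}


def ngram_features(tokens: List[str], n_max: int = 2) -> Features:
    """Binary presence of every n-gram from unigrams up to n_max-grams."""
    feats: Features = {}
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["ngram=" + " ".join(tokens[i:i + n])] = 1.0
    return feats


def lexicon_features(tokens: List[str]) -> Features:
    """Aggregate prior-polarity scores of the tokens found in LEXICON."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return {
        "lex_pos_count": float(sum(1 for s in scores if s > 0)),
        "lex_neg_count": float(sum(1 for s in scores if s < 0)),
        "lex_total": float(sum(scores)),
        "lex_last": scores[-1] if scores else 0.0,
    }


# Feature groups that an experiment can switch on or off by name.
FEATURE_GROUPS: Dict[str, Callable[[List[str]], Features]] = {
    "ngrams": ngram_features,
    "lexicon": lexicon_features,
}


def extract_features(tokens: List[str], groups: List[str]) -> Features:
    """Merge the features of the selected groups into one sparse dict."""
    feats: Features = {}
    for name in groups:
        feats.update(FEATURE_GROUPS[name](tokens))
    return feats
```

For ["not", "a", "good", "movie"], the two groups together produce 7 ngram features (4 unigrams and 3 bigrams) plus 4 lexicon features.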

1.6.4 Negation Handling

Sentiment analysis touches nearly every facet of Natural Language Processing (NLP), such as word sense disambiguation, negation, and coreference resolution, which makes the task more challenging. Negation is one of the critical aspects of NLP because it can entirely change the sentiment orientation of a piece of text. The polarity of a piece of text is often identified through its opinionated words or phrases. However, their polarities depend on contextual elements such as negation words, which either flip the polarity or change its intensity. According to Christopher Potts (http://sentiment.christopherpotts.net/lingstruc.html), "sentiment words behave very differently under the semantic scope of negation". Earlier works addressed negation in various ways, such as reversing polarity, shifting the scores of words affected by negation, or using a corpus-based statistical approach. A few works handled negation very simply by replacing negation words with the tag "NOT", and some implemented negation as features. However, a negation word does not always negate. In various situations its presence has an entirely different meaning, so it is necessary to distinguish these meanings and handle them accordingly. A few works in the literature have discussed such cases of negation, but to the best of our knowledge none has presented a working prototype for handling them.

In this thesis, we shall therefore address negation through a corpus-based statistical approach. Each word in a negated context gets the appropriate score from the Twitter-specific automatic lexicons developed by Kiritchenko et al. (2014b), so the same word receives different scores in negated and affirmative contexts. Our main focus in negation handling, however, is on negation tweets in which the negation word does not carry a negating sense. We shall analyze such tweets to identify patterns and propose a set of rules for detecting them, so that no negation handling is applied to them. We shall then perform experiments to demonstrate the performance improvement gained by handling these tweets.
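A minimal sketch of context-dependent scoring follows. For illustration, it makes three assumptions:

- A negated context runs from a negation cue to the next punctuation token.
- Tokens in that context are tagged with a _NEG suffix.
- Each word has separate scores in an affirmative lexicon and a negated lexicon.

The cue list, the two exception bigrams, and all scores are hypothetical placeholders. They are neither the lexicons of Kiritchenko et al. (2014b) nor the exception rules proposed in this thesis.

```python
import re
from typing import List

NEGATION_CUES = {"not", "no", "never", "cannot", "nothing", "nobody"}
# Hypothetical exception patterns in which the cue does not negate the
# following sentiment; illustrative only, not the rule set proposed here.
EXCEPTION_BIGRAMS = {("not", "only"), ("no", "doubt")}
CLAUSE_END_RE = re.compile(r"^[.,:;!?]+$")

# Hypothetical scores for the same word in affirmative and negated contexts.
AFFIRMATIVE_LEXICON = {"good": 0.9, "bad": -0.8}
NEGATED_LEXICON = {"good": -0.6, "bad": -0.2}


def mark_negated_context(tokens: List[str]) -> List[str]:
    """Append _NEG to tokens between a negation cue and the next punctuation."""
    marked, negated = [], False
    for i, tok in enumerate(tokens):
        if CLAUSE_END_RE.match(tok):
            negated = False  # punctuation closes the negated context
            marked.append(tok)
        elif tok in NEGATION_CUES:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if (tok, nxt) not in EXCEPTION_BIGRAMS:
                negated = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked


def context_score(marked: List[str]) -> float:
    """Sum lexicon scores, choosing the lexicon by each token's context."""
    total = 0.0
    for tok in marked:
        if tok.endswith("_NEG"):
            total += NEGATED_LEXICON.get(tok[: -len("_NEG")], 0.0)
        else:
            total += AFFIRMATIVE_LEXICON.get(tok, 0.0)
    return total
```

With these placeholder values, "this is not good , but fine" is marked as ["this", "is", "not", "good_NEG", ",", "but", "fine"] and scores -0.6. By contrast, "no doubt it is good" is left unmarked and scores 0.9.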

1.6.5 Training of Classifiers

Supervised machine learning classifiers are used to perform Twitter sentiment analysis. A classifier is trained on the features extracted from a training corpus, and the trained classifier then predicts the sentiment of tweets in an unseen test corpus. Existing studies on Twitter sentiment analysis have used different types of classifiers in their supervised approaches. Since this thesis is based on a supervised machine learning approach, we shall use some of the state-of-the-art classifiers, namely SVM, NB, and DTC. This will help us analyze their advantages and disadvantages on tweets, as well as their feasibility for this task. We shall perform a series of experiments with each classifier and different features to determine which classifier works well with which features. Moreover, we shall perform experiments with each classifier to determine the effectiveness of the proposed negation strategy with the incorporated negation exception algorithm.
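The sketch below illustrates the train-then-predict cycle with one of the three classifiers. It implements a multinomial Naive Bayes classifier over token lists with add-alpha (Laplace) smoothing, in plain Python. It is a textbook sketch that assumes tweets are already normalized and tokenized. It is not the implementation used in the experiments, which would normally rely on a library implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.token_totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens: List[str], c: str) -> float:
        """Unnormalized log P(c | tokens) = log P(c) + sum of log P(w | c)."""
        log_p = math.log(self.class_counts[c] / self.n_docs)
        denom = self.token_totals[c] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                log_p += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return log_p

    def predict(self, tokens: List[str]) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))
```

For instance, trained on six toy tweets (two per class), it labels ["good", "movie"] as positive. The unseen word "movie" is simply ignored.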

1.6.6 Evaluation of Classifiers

Evaluating a trained classifier requires a test corpus that the classifier has not seen. Otherwise, the evaluation gives an overly optimistic estimate (possibly close to 100% accuracy) that does not reflect performance on new data. In this thesis, we shall therefore ensure that our test corpora are completely unseen by the trained classifier. For this, we shall use a train-test split function, which randomly divides the entire corpus into training and test sets. We shall use the test corpora to determine the effectiveness of our proposed Twitter sentiment analysis system with the incorporated tweet normalization system and negation handling. The same feature sets will be generated from the test corpora, and we shall then record the sentiments predicted by the trained classifier. To evaluate classifier performance, we shall use various evaluation metrics, such as accuracy, precision, recall, and F-measure. Our primary metrics will be recall and F-measure, which are better suited than accuracy to the class imbalance problem (when the numbers of positive, negative, and neutral tweets are unequal).
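The split-then-evaluate procedure and the metrics named above can be sketched as follows. The seed, the 80/20 default split, and the macro-averaging of recall and F1 over the three classes are assumptions for illustration. A stratified split may be preferable when the classes are imbalanced.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def train_test_split(items: Sequence, test_size: float = 0.2,
                     seed: int = 42) -> Tuple[list, list]:
    """Shuffle a copy of the corpus and cut it into disjoint train/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_size)))
    return shuffled[:cut], shuffled[cut:]


def evaluate(gold: List[str], pred: List[str],
             labels: Sequence[str] = ("positive", "negative", "neutral")) -> Dict:
    """Accuracy, per-class precision/recall/F1, and macro-averaged recall and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # true class g was missed
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "macro_recall": sum(m["recall"] for m in per_class.values()) / len(labels),
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels),
        "per_class": per_class,
    }
```

Macro-averaging gives each class the same weight, so a classifier that ignores a small class is penalized even when its accuracy looks high.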

1.7. Thesis Organization

The rest of this thesis is structured as follows.

Chapter 2 presents earlier work in Twitter sentiment analysis. We first review initial efforts in sentiment analysis in general and then move on to Twitter sentiment analysis specifically. This chapter also reviews research on tweet normalization and negation handling.

Chapter 3 gives complete details of the experimental setup for the proposed Twitter sentiment analysis framework. We describe the real-time corpora that we collected and the publicly available benchmark corpus used in this thesis, and then present the labelling of the real-time corpora through an automatic clustering approach. The chapter further describes the tool used for tokenization and POS tagging, the classifiers used in this work, several basic to advanced features, and the resource files used for tweet normalization. It also gives a brief description of the developed tweet normalization system and the negation exception rules.

Chapter 4 presents our tweet normalization system for cleaning tweets, along with the accuracy of this knowledge-based system in experiments on real-time as well as benchmark Twitter corpora.

Chapter 5 discusses the negation modelling process used in this thesis, which incorporates the negation exception rules. It also presents the accuracy of our proposed negation exception algorithm on real-time and benchmark Twitter datasets.

Chapter 6 describes in detail the training and testing of several state-of-the-art classifiers on real-time and benchmark datasets and compares which classifier works well with which feature group. It also discusses the significance of our negation exception algorithm and tweet normalization system for classifier performance, and compares our proposed negation modelling approach with the traditional reverse-polarity approach.

Chapter 7 concludes the work done in this thesis and outlines possible future directions.

Chapter 2 Literature Review

In the early stages of the World Wide Web, content was published by website owners and was mostly objective, containing facts rather than opinions. The explosion of Web 2.0 services such as microblogs and blogs changed the situation entirely by giving users a platform to share their opinionated views on products, services, events, or topics. This led to a drastic increase in the volume of text that conveys not only facts but also opinions. Traditional techniques such as topic classification and information retrieval are insufficient for handling and analyzing such a massive amount of subjective information on social media platforms. Automated sentiment analysis techniques are therefore required to extract valuable insights from opinionated data. Initial work in sentiment analysis focused mainly on structured customer reviews of products and services. However, researchers' attention has since shifted to unstructured data because real-time messages are now available in the form of tweets. Opinionated information extracted from tweets is considered particularly valuable. A number of sentiment analysis tools, such as Sentimentor, have been developed specifically for analyzing sentiment in tweets, and many studies have been conducted in the field of Twitter sentiment analysis. In this chapter, we discuss various state-of-the-art studies in sentiment analysis, specifically Twitter sentiment analysis. Based on the research goals of this thesis presented in Section 1.6, we then review research on tweet pre-processing and negation handling.

2.1. Twitter Sentiment Analysis

Sentiment analysis is a technique for evaluating textual data to extract valuable knowledge using machine learning and natural language processing (NLP) approaches. It has recently received considerable attention from researchers because of its promising applications in domains such as politics, healthcare, finance, marketing, and business. There was little research on sentiment analysis before 2000. However, a few earlier works analyzed adjectives (Hatzivassiloglou & Mckeown, 1997), affects, viewpoints, and so on (Wiebe, 1990; Wiebe & Bruce, 1995). For instance, Hatzivassiloglou and Mckeown (1997) presented an algorithm for determining the polarity of adjectives. They used a 21-million-word corpus, evaluated their algorithm on 1336 manually labelled adjectives, and achieved accuracies ranging from 78% to 92%.

Research on sentiment analysis started in the early 2000s and first appeared in the works of Das and Chen (2001), Pang et al. (2002), and Turney (2002), as the use of review sites and online discussion groups grew rapidly at that time. People posted articles on such sites expressing their overall sentiment towards a subject. Researchers' attention therefore shifted towards analyzing the overall sentiment of text rather than topic categorization (sorting documents according to their subjects, such as politics or sports).

Although research on sentiment analysis started with movie reviews (Pang et al., 2002) and product reviews (Turney, 2002), there are studies in other domains too. These include news (Balahur & Steinberger, 2009; Godbole et al., 2007; Kale et al., 2018; Nguyen et al., 2017; Reis et al., 2019; Souma et al., 2019), finance (Sohangir et al., 2018), health (Carillo-de-Albornoz et al., 2018), travel reviews (Valdivia et al., 2017), and blogs. The work of Pang et al. (2002) is considered pioneering in sentiment analysis. It serves as a baseline for other researchers who applied the same approach across diverse domains. They analyzed three machine learning classifiers (SVM, NB, and MaxEnt) for sentiment analysis of movie reviews, i.e., classifying whether a review is positive or negative, using unigram, bigram, and POS-based features. They found SVM with unigram features to be the best-performing classifier, with 82.9% accuracy, and observed ngram presence to be more effective than ngram frequency. They also inferred from their experiments that, for sentiment analysis of movie reviews, the machine learning approach performs much better than human-generated baselines. Moreover, they concluded that sentiment analysis is more challenging than topic categorization because people can express opinions in a subtle manner. For instance, it is difficult to determine the sentiment of the review "How can anyone sit through that movie" because it contains no sentiment-bearing words. Unlike Pang et al. (2002), Dave et al. (2003) observed bigrams and trigrams to be the most useful features for sentiment analysis of product reviews.

While most researchers (Pang et al., 2002; Pang & Lee, 2004) focused on document-level sentiment analysis, a few studies addressed finer-grained tasks. Examples include determining the polarity of words using a few linguistic rules or a set of seed words (Hatzivassiloglou & Mckeown, 1997; Turney, 2002), and identifying subjective sentences (Pang & Lee, 2004), topics (Yi et al., 2003), and subjective expressions (Kim & Hovy, 2003; Wilson et al., 2005). Others include summarizing customer reviews (Hu & Liu, 2004; Potdar et al., 2016). Pang and Lee (2004) showed that such finer-grained analysis improves classification performance.

Turney (2002) presented a simple algorithm for sentiment analysis of movie, car, and travel reviews. Although this work is closest to that of Pang et al. (2002), it used an unsupervised approach based on Pointwise Mutual Information (PMI), a measure of the strength of association between words. The approach extracts relevant bigrams and uses a search engine's NEAR operator to count how often each bigram co-occurs with seven positive and seven negative words. In this way, it infers the semantic orientation of adjectives and adverbs. A review is then classified by summing the inferred polarities of its sentiment-bearing words (adjectives and adverbs). They evaluated their algorithm on 3596 words (including adjectives, nouns, adverbs, and verbs) and obtained an accuracy of 80%. Their work is related to the study of Hatzivassiloglou and Mckeown (1997) on determining the semantic orientation of adjectives and achieved comparable accuracy.
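The semantic-orientation idea can be written as follows:

SO(w) = sum over positive seeds p of PMI(w, p) - sum over negative seeds n of PMI(w, n)

where PMI(a, b) = log2[P(a, b) / (P(a) P(b))].

The sketch below computes SO(w) from a small corpus instead of search-engine hit counts. Co-occurrence within the same document stands in for the NEAR operator. The smoothing constant that avoids log(0) and the toy seed lists are assumptions.

```python
import math
from typing import Iterable, List


def so_pmi(word: str, docs: List[List[str]], pos_seeds: Iterable[str],
           neg_seeds: Iterable[str], smoothing: float = 0.01) -> float:
    """Semantic orientation: PMI with positive seeds minus PMI with negative seeds."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]

    def count(*words: str) -> int:
        # Number of documents that contain all the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    def pmi(a: str, b: str) -> float:
        p_ab = (count(a, b) + smoothing) / n
        p_a = (count(a) + smoothing) / n
        p_b = (count(b) + smoothing) / n
        return math.log2(p_ab / (p_a * p_b))

    return (sum(pmi(word, p) for p in pos_seeds)
            - sum(pmi(word, q) for q in neg_seeds))
```

On a toy corpus where "friendly" occurs only alongside "excellent" and "rude" only alongside "poor", SO("friendly") is positive and SO("rude") is negative. A word that co-occurs equally with both seeds gets an orientation of zero.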

Most research on sentiment analysis focused on product analysis, i.e., analyzing people's opinions on products such as phones, hotels, or movies. Other studies (Bhadane et al., 2015; Guha et al., 2015; Hu & Liu, 2004; Kiritchenko et al., 2014a; Nguyen et al., 2017; Potdar et al., 2016; Ray & Chakrabarti, 2019; Weichselbraun et al., 2017) extended it to a deeper level: extracting people's overall opinions on product features rather than on the entire product. This led to the growth of sentiment analysis in commercial environments. Marketing companies use it to monitor customers' opinions on their products, product features, and services, and to analyze customer trends, market buzz, and so on.

Even government agencies use sentiment analysis to assess threats to their nation. For instance, as reported in the New York Times (2006), the United States government spent 2.4 million dollars funding a sentiment analysis project that monitors online activities. Another important use of sentiment analysis is election prediction, which helps politicians gauge their popularity and better understand their voters. In another example, Crimson Hexagon, a well-known Massachusetts company, used sentiment analysis to analyze public opinion on the oil spill in the Gulf of Mexico. Its analysis showed that people living near the gulf were less inclined to assign blame and focused instead on relief efforts (New York Times, 2010).

Since 2000, sentiment analysis has grown rapidly and is seen as one of the most active research fields in NLP. It has proved very useful for recommender systems, companies, and editorial sites in summarizing people's opinions and experiences. Existing studies have applied different techniques:

- NLP (Yi et al., 2003)
- machine learning (Pang & Lee, 2004; Pang et al., 2002; Reis et al., 2019)
- unsupervised lexicon-based approaches (Hu & Liu, 2004; Kim & Hovy, 2004; Liu, 2010; Taboada et al., 2010; Turney, 2002)
- more recently, deep learning (Nguyen et al., 2017; Qian et al., 2016; Rezaeinia et al., 2018; Ray & Chakrabarti, 2019; Sohangir et al., 2018; Souma et al., 2019; Wang et al., 2016)

A large amount of work over the last decade has explored different aspects of sentiment analysis: subjective vs. objective classification, polarity classification, and emotion classification (e.g., joy, fear, sadness). Pang and Lee (2008) and Liu and Zhang (2012) surveyed these approaches. The studies by Pang and Lee (2008) and Owoputi et al. (2013) examined the effectiveness of various features, such as POS, topic-based features, and negation handling.

Traditionally, researchers addressed sentiment analysis as a binary classification task, i.e., classifying a piece of text as positive or negative (Hu & Liu, 2004; Kim & Hovy, 2004; Turney, 2002). However, a few earlier works (Pang & Lee, 2004; Wilson et al., 2005) went beyond binary classification and performed subjectivity classification. Text is first classified as subjective or objective, and subjective text is then classified as polar (positive or negative) or non-polar (neutral). Pang and Lee (2004) considered only the negative and positive classes during subjectivity classification, whereas Wilson et al. (2005) also considered the neutral class. Pang and Lee (2004) argued that lexical items alone are insufficient for accurate sentiment prediction. They therefore used machine learning techniques (SVM and NB), as Pang et al. (2002) did, to classify documents. Removing objective sentences significantly improved their polarity classification, from 82.8% to 86.4%.

In summary, research in sentiment analysis ranges from the document level (Pang & Lee, 2008) to determining phrase and word polarity (Esuli & Sebastiani, 2006; Hatzivassiloglou & Mckeown, 1997). Moreover, most researchers traditionally focused on sentiment analysis of longer, structured text. Examples include blogs, movie reviews (Pang & Lee, 2004; Pang et al., 2002), product reviews, and even reviews of services provided by organizations such as hotels and shops (Siqueira & Barros, 2010).

One notable work in the field of sentiment analysis is that of Kiritchenko et al. (2014a), who participated in the SemEval-2014 shared task on aspect-level sentiment analysis. They used an in-house sequence tagger to detect aspect terms, i.e., explicit features of a product or service, such as the camera or battery of a phone. They used supervised classifiers (SVM) to detect aspect categories and the sentiment towards aspect terms and categories. An aspect category is a group of similar aspect terms; for example, all food items can be grouped into the category food. They ranked first in detecting aspect categories and the sentiment towards categories, third in aspect term detection, and first in determining sentiment towards aspect terms in the laptop domain.

In 2016, Potdar et al. presented a tool called SAMIKSHA (a review bot) that creates a factual summary of public reviews of a product, helping buyers get a picture of public opinion on it. The tool provides average ratings of product features.

Twitter Sentiment Analysis

While most earlier studies focused on longer text, such as reviews of products and services, recent research has addressed short, unstructured microblog text. Twitter was launched in 2006, so there was no study of Twitter sentiment analysis before then. Although the field is only about a decade old, it has grown rapidly. It deals with the computational treatment of sentiments, opinions, and subjectivity in text. However, because of the informal linguistic style of tweets, different approaches have been used in the literature. Bermingham and Smeaton (2010) experimented with tweets, micro-reviews, blogs, and movie reviews. They observed that sentiment analysis of tweets is easier than that of longer text, but that the situation reverses when higher-order ngrams are considered. Twitter sentiment analysis is nonetheless generally considered a tougher problem than sentiment analysis of longer text such as reviews. A variety of studies in Twitter sentiment analysis explore different techniques and varied features. Most of them use a machine learning approach, in which algorithms are trained on features extracted from a corpus and the model is then evaluated on a test dataset (Agarwal & Sabharwal, 2012; Agarwal et al., 2011; Angaria et al., 2014; Barbosa & Feng, 2010; Barnaghi et al., 2016; Chikersal et al., 2015; Davidov et al., 2010; Garg & Chatterjee, 2014; Go et al., 2009; Godea et al., 2015; Gupta & Joshi, 2019; Jiang et al., 2011; Kolchyna et al., 2015; Kouloumpis et al., 2011; Mukherjee & Bhattacharyya, 2012; Neethu & Rajasree, 2013; Pak & Paroubek, 2010; Saif et al., 2012a, 2012b; Spencer & Uchyigit, 2012; Srivastava et al., 2019; Wakade et al., 2012; Zhang, 2013; Zhou et al., 2011). More recently, studies have used deep learning techniques, in which neural networks learn complex features automatically (Alharbi & de Doncker, 2019; Mahata et al., 2018; Ren et al., 2016; Saroufim et al., 2018; Tang et al., 2014).

Go et al. (2009), Pak and Paroubek (2010), and Spencer and Uchyigit (2012) were among the first to address Twitter sentiment analysis specifically, and they did notable work exploring microblogs. They all used a distant supervision approach to collect and label training tweets automatically. A tweet containing a positive emoticon is marked as positive, while a tweet containing a negative emoticon is marked as negative. Pak and Paroubek (2010) and Spencer and Uchyigit (2012) tested their proposed techniques on 216 manually annotated tweets, while Go et al. (2009) evaluated theirs on 359 manually annotated tweets. Using distant supervision, Go et al. (2009) collected 1.6M tweets and Pak and Paroubek (2010) collected 300k tweets. Go et al. (2009) evaluated the same three classifiers as Pang et al. (2002) (MaxEnt, NB, and SVM), but on a Twitter corpus rather than movie reviews, exploring ngrams (unigrams and bigrams) and POS-based features. They reported accuracies of up to 81% on their test tweets with NB, and the highest accuracy of 83% with the MaxEnt classifier. They found unigrams+bigrams to be the best-performing features across all three classifiers. Moreover, POS features were not found to be useful, which is consistent with the findings of Pang et al. (2002). Their work is limited to binary classification, as they did not consider objective tweets.
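The emoticon-based distant supervision described above can be sketched as follows. The emoticon sets are assumptions, and matching is done on whitespace-separated tokens only. Tweets with no emoticon, or with both positive and negative emoticons, are discarded. The emoticons are stripped from the kept text, so a classifier trained on it must learn from the remaining words.

```python
from typing import Optional, Tuple

# Assumed emoticon lists; the original studies used their own sets.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def distant_label(tweet: str) -> Optional[Tuple[str, str]]:
    """Label a tweet from its emoticons, or return None if it must be discarded."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:
        return None  # no emoticon at all, or conflicting emoticons
    # Remove the emoticons so the classifier cannot rely on them alone.
    text = " ".join(t for t in tokens if t not in ALL_EMOTICONS)
    return text, ("positive" if has_pos else "negative")
```

Emoticons glued to words (for example "great:)") are missed by this token-level matching. A real collector would need a tokenizer that splits them off.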

In contrast, Pak and Paroubek (2010) and Spencer and Uchyigit (2012) also considered objective tweets, which they collected by querying the accounts of newspapers and magazines. Pak and Paroubek (2010) explored unigrams, bigrams, trigrams, and POS features with a multinomial NB classifier. Spencer and Uchyigit (2012) explored unigrams and bigrams, with and without POS-based features, by developing a web-based tool called Sentimentor, which uses an NB classifier. Both studies found bigrams to be more useful than other ngrams. Through linguistic analysis of their corpora, they found POS tags to be strong indicators of sentiment, which is inconsistent with the findings of Go et al. (2009). However, some findings of Spencer and Uchyigit (2012) from their linguistic analysis of the Twitter corpus are not consistent with those of Pak and Paroubek (2010), specifically regarding positive and negative tweets.

In another study, Garg and Chatterjee (2014) investigated the relevance of negation detection and of a two-step classification process (subjective vs. objective, then positive vs. negative). Their negation detection approach is based on the left and right connectivity of a word with the nearest negation cue. They experimented with NB and MaxEnt models using unigrams, bigrams, trigrams, negation features, and their combinations. Their best accuracy, 75.33%, came from the NB classifier trained on the combination of unigrams, bigrams, trigrams, and negation features. The studies of Go et al. (2009) and Pak and Paroubek (2010) are considered notable works in the field of Twitter sentiment analysis.

The success of Twitter sentiment analysis depends on the variety of features extracted from the Twitter corpus. The works described above are limited to ngrams and POS-based features. Many studies from that period (Agarwal et al., 2011; Bakliwal et al., 2013; Barbosa & Feng, 2010; Kouloumpis et al., 2011) therefore explored varied and new features to obtain better results than traditional ngrams and POS-based features alone. These include:

- Twitter-specific features (Barbosa & Feng, 2010; Kouloumpis et al., 2011)
- scores from different sentiment lexicons (e.g., AFINN, the Bing Liu lexicon, MPQA, SentiWordNet)
- semantic features (Saif et al., 2011)
- sentiment features (Saif et al., 2012a)

Twitter-specific features include elongated words, capitalized words, emoticons, hashtags, and punctuation.
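The sketch below illustrates the Twitter-specific features listed above. It counts all-caps words, elongated words, hashtags, mentions, exclamation and question marks, and emoticons in a raw tweet. The emoticon sets and feature names are hypothetical. The counts are taken from the raw, un-normalized tweet, because normalization would erase several of these cues.

```python
import re
from typing import Dict

# Hypothetical emoticon sets for illustration.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # a word character repeated 3+ times


def twitter_specific_features(tweet: str) -> Dict[str, int]:
    """Count surface cues of a raw tweet over whitespace-separated tokens."""
    tokens = tweet.split()
    return {
        "all_caps": sum(1 for t in tokens if t.isalpha() and t.isupper() and len(t) > 1),
        "elongated": sum(1 for t in tokens if ELONGATED_RE.search(t)),
        "hashtags": sum(1 for t in tokens if t.startswith("#") and len(t) > 1),
        "mentions": sum(1 for t in tokens if t.startswith("@") and len(t) > 1),
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "pos_emoticons": sum(1 for t in tokens if t in POSITIVE_EMOTICONS),
        "neg_emoticons": sum(1 for t in tokens if t in NEGATIVE_EMOTICONS),
    }
```

For "SOOO happy with #results!!! @team :)", each of the counts for all-caps words, elongated words, hashtags, mentions, and positive emoticons is 1, and the exclamation count is 3.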

Barbosa and Feng (2010) argued that using only ngram features may hamper classification performance because of sparsity (infrequent terms in tweets). They therefore explored Twitter-specific features (replies, hashtags, retweets, emoticons, and punctuation) and lexical features. SVMs trained on these microblogging features achieved 2.2% higher accuracy than SVMs trained on unigrams only. Moreover, unlike Kouloumpis et al. (2011), they performed a deeper analysis using two different classifiers: a subjective/objective classifier and a positive/negative classifier.

Kouloumpis et al. (2011) reported similar findings for ngrams. They investigated the usefulness of Twitter-specific features (letter repetition, all-caps, emoticons, and abbreviations) and lexical features for Twitter sentiment analysis. They also explored using existing hashtags, such as #epicfail, to automatically collect and label training data. This differs from the emoticon-based distant supervision approach used by Go et al. (2009) and Pak and Paroubek (2010). For training, they also used the emoticon dataset created by Go et al. (2009), while the manually annotated iSieve dataset was used to evaluate the classifier. They evaluated an AdaBoost model trained on the hashtagged and hashtagged+emoticon datasets with various feature combinations: ngram+lexicon, ngram+POS, ngram+lexicon+microblogging features, and all four. The best performance came from the model trained on the hashtagged dataset alone with the ngram+lexicon+microblogging combination. Like Go et al. (2009), they found POS features not to be useful in the microblogging domain, possibly because of the poor quality of the POS tagger.

Davidov et al. (2010) built their training data in the same way as Kouloumpis et al. (2011), but focused on binary rather than three-way classification.

Agarwal et al. (2011) also explored POS, Twitter-specific, and lexicon features. They studied three models (unigram, feature-based, and tree kernel) on binary and three-way classification tasks. The tree kernel model outperformed both other models, by 4.02% and 4.29% respectively. They also proposed combining the unigram and feature-based models, as well as the feature-based and tree kernel models, and both combinations outperformed the unigram model by 4%. After rigorous analysis, they found that the most important features are those that combine the prior polarity of words with their POS tags. They proposed 100 new features that can improve classifier accuracy. Furthermore, they reported that combining the best feature set with unigrams outperforms the tree kernel model. It is worth noting that Kouloumpis et al. (2011) and Barbosa and Feng (2010) emphasized Twitter-specific features. In contrast, Agarwal et al. (2011) and Pak and Paroubek (2010) argued for the importance of POS features, with or without prior polarity.

Most state-of-the-art studies of Twitter sentiment analysis follow the machine learning approach suggested by Pang et al. (2002). However, a few earlier studies, such as Joshi et al. (2011), presented unsupervised lexicon-based approaches. Joshi et al. (2011) presented a rule-based system, C-Feel-It, which performs binary classification of tweets based on the sentiment-bearing words they contain.
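A rule-based classifier of this kind can be sketched as a majority vote over sentiment-bearing words. The word lists are hypothetical, and returning no decision on a tie is an assumption of this sketch rather than a documented C-Feel-It behaviour.

```python
import re

# Hypothetical sentiment-bearing word lists; C-Feel-It drew on existing lexical resources.
POSITIVE_WORDS = {"good", "great", "love", "awesome", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad", "terrible"}


def classify_by_word_counts(tweet: str):
    """Majority vote over sentiment-bearing words; returns None on a tie."""
    words = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # no decision; this tie policy is an assumption


print(classify_by_word_counts("Great phone, I love it!"))  # positive
print(classify_by_word_counts("Awful battery, sad day"))   # negative
```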

In another study, Speriosu et al. (2011) focused on graph construction rather than incorporating Twitter-specific features directly. Their graph has a few Twitter-specific features, such as emoticons and hashtags, together with ngrams, tweets, and users as its nodes. Nodes are connected according to the type of link that exists between them (e.g., user nodes are connected to tweet nodes). For sentiment classification, they applied a label propagation algorithm and observed that it outperformed the MaxEnt model, obtaining 84.7% accuracy on a subset of the Twitter test dataset introduced by Go et al. (2009).

Jiang et al. (2011), on the other hand, performed a finer-grained analysis: their work also tried to determine the strength of polarity on a scale from 1 to 5. Like Kouloumpis et al. (2011), they also performed three-way classification (objective/positive/negative). Traditionally, objective text is considered to contain only facts, devoid of opinions (Pang & Lee, 2004; Wilson et al., 2005). Thus, earlier works focused on two-way classification only (Pang & Lee, 2004; Pang et al., 2002), but later studies such as Barbosa and Feng (2010), Wilson et al. (2005), and Pak and Paroubek (2010) stated that microblogs such as tweets often state facts, so incorporating a neutral class is necessary; they therefore performed three-way classification. However, Barbosa and Feng (2010), Jiang et al. (2011), and Kouloumpis et al. (2011) performed three-way classification without exploring a cascaded design. For example, Barbosa and Feng (2010) developed two separate classifiers, one for subjective/objective classification and one for positive/negative classification, but evaluated each individually and did not explore their combination.
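The label propagation idea can be illustrated with a generic iterative scheme: seed nodes (here, emoticons) keep fixed labels, and every other node repeatedly takes the average label distribution of its neighbours. This is a simplified textbook variant, not necessarily the exact algorithm Speriosu et al. (2011) used, and the toy graph of tweets, words, and a user is hypothetical.

```python
from collections import defaultdict


def label_propagation(edges, seeds, labels=("positive", "negative"), iterations=20):
    """Propagate label distributions over an undirected graph.

    edges: iterable of (node_a, node_b) pairs
    seeds: dict mapping node -> label for nodes whose label stays fixed
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    uniform = {lab: 1.0 / len(labels) for lab in labels}
    dist = {}
    for node in neighbours:
        if node in seeds:
            dist[node] = {lab: float(lab == seeds[node]) for lab in labels}
        else:
            dist[node] = dict(uniform)

    for _ in range(iterations):
        new_dist = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                new_dist[node] = dist[node]  # seed labels stay clamped
                continue
            # Each unlabelled node takes the mean distribution of its neighbours.
            new_dist[node] = {
                lab: sum(dist[n][lab] for n in nbrs) / len(nbrs) for lab in labels
            }
        dist = new_dist
    return dist


# Hypothetical graph: tweets t1..t3, word nodes, one user u1, two emoticon seeds.
edges = [("t1", ":)"), ("t1", "love"), ("t2", ":("), ("t2", "hate"),
         ("u1", "t1"), ("u1", "t3"), ("t3", "love")]
result = label_propagation(edges, {":)": "positive", ":(": "negative"})
print({node: round(d["positive"], 2) for node, d in result.items()})
```

Tweet t3 contains no emoticon, yet it ends up leaning positive through its links to u1 and "love".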

Agarwal and Sabharwal (2012) filled this gap and built three cascaded designs on top of one another (objective vs. subjective, polar vs. non-polar, and positive vs. negative) to label tweets as positive, negative, objective, or neutral. They used the feature set from their previous work (Agarwal et al., 2011) and found the best design to be PNP-objective-neutral, where objective and neutral tweets are treated as non-polar. They extended the work of Wilson et al. (2005) by comparing their four-way classifier with their cascaded design. Wilson et al. (2005) conducted one of the earliest studies of cascaded design, building two cascaded models: polar vs. non-polar and positive vs. negative.
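A polar/non-polar cascade of the kind described above can be sketched by composing binary classifiers. The keyword-based stage functions are hypothetical stand-ins for trained models; only the routing logic reflects the cascaded design.

```python
from typing import Callable

Classifier = Callable[[str], str]


def make_cascade(is_polar: Classifier, polarity: Classifier,
                 non_polar_type: Classifier) -> Classifier:
    """Route polar tweets to a positive/negative classifier and
    non-polar tweets to an objective/neutral classifier."""
    def classify(tweet: str) -> str:
        if is_polar(tweet) == "polar":
            return polarity(tweet)
        return non_polar_type(tweet)
    return classify


# Toy keyword "classifiers" standing in for trained models (hypothetical).
def toy_is_polar(tweet: str) -> str:
    return "polar" if any(w in tweet.lower() for w in ("love", "hate")) else "non-polar"


def toy_polarity(tweet: str) -> str:
    return "positive" if "love" in tweet.lower() else "negative"


def toy_non_polar(tweet: str) -> str:
    return "objective" if any(ch.isdigit() for ch in tweet) else "neutral"


classify = make_cascade(toy_is_polar, toy_polarity, toy_non_polar)
for t in ("I love it", "I hate Mondays", "Match starts at 9pm", "Heading out now"):
    print(t, "->", classify(t))
```

A tweet misrouted by the first stage never reaches the second, which is the error-propagation risk of cascades.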

It can be concluded from the above discussion that some Twitter sentiment analysis studies (e.g., Go et al., 2009) treated sentiment analysis as a binary classification task (positive vs. negative), while a few treated it as a two-tier task: the text is first classified as subjective or objective, and only the subjective text is analyzed further for polarity (Barbosa & Feng, 2010; Choi & Cardie, 2010; Jiang et al., 2011). This design can propagate errors; for instance, if the system labels a subjective sentence as objective, that sentence never reaches the polarity classifier. Hence, in this thesis we model Twitter sentiment analysis as a three-way classification task: positive, negative, and neutral.

The earlier works described above (e.g., Agarwal et al., 2011; Barbosa & Feng, 2010; Go et al., 2009; Jiang et al., 2011) explored various effective features and techniques for Twitter sentiment analysis. Although they obtained good classification performance, typical classifiers such as NB suffered from data sparsity because of the unstructured nature and short length of tweets. Users can tweet about anything, so tweets contain a large number of infrequent named entities, which makes the data sparse and in turn degrades classification performance. Some of these entities are semantically related; for example, the named entities “iPad” and “iPod” are both related to the concept “Apple product”. Moreover, the sentiment expressed by a word often depends on the context in which it appears. Several studies (Muhammad et al., 2016; Saif et al., 2014b; Saif et al., 2011) focused on exploring semantics (contextual and conceptual) in Twitter sentiment analysis. Contextual semantics deals with local context, such as negation, inferred from neighbouring words. Conceptual semantics is handled through semantic knowledge bases (e.g., ontologies). One of the pioneering works on semantics in sentiment analysis was done by Saif et al. (2014b), who had also published several earlier studies on semantically related concepts.

Saif et al. (2011) addressed the problem of data sparsity and proposed semantic smoothing to alleviate it. They explored a new set of features, called semantic features, based on the fact that named entities are semantically related. They extracted hidden concepts from the training data using AlchemyAPI and then incorporated the extracted concepts into the unigram language model of the NB classifier as an additional feature through interpolation. They found interpolation to be a more effective way of incorporating features than the augmentation approach used by Agarwal et al. (2011) and Kouloumpis et al. (2011). They used the Twitter dataset of Go et al. (2009), selecting a balanced set of 60,000 tweets for training, and evaluated their approach on the same test set of 359 tweets used by Go et al. (2009). Interpolating concepts as an additional feature outperformed the NB classifier without smoothing.
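One common way to write such an interpolation mixes the class-conditional unigram probability with a concept-based estimate: P_s(w|c) = (1 - alpha) P_u(w|c) + alpha * sum_f P(w|f) P(f|c). The sketch below assumes this form; the weight alpha = 0.1 and all probability tables are hypothetical.

```python
def smoothed_word_prob(word, cls, p_word_given_class, p_word_given_concept,
                       p_concept_given_class, alpha=0.1):
    """Interpolate a unigram model with a concept-based model.

    P_s(w|c) = (1 - alpha) * P_u(w|c) + alpha * sum_f P(w|f) * P(f|c)
    """
    unigram = p_word_given_class[cls].get(word, 0.0)
    concept = sum(
        p_word_given_concept[f].get(word, 0.0) * p_f
        for f, p_f in p_concept_given_class[cls].items()
    )
    return (1 - alpha) * unigram + alpha * concept


# Hypothetical tables: "ipod" never occurs in positive training tweets,
# but its concept "apple_product" does.
p_wc = {"positive": {"ipad": 0.02}}
p_wf = {"apple_product": {"ipad": 0.5, "ipod": 0.5}}
p_fc = {"positive": {"apple_product": 0.2}}
print(smoothed_word_prob("ipod", "positive", p_wc, p_wf, p_fc))  # ≈ 0.01
```

The unseen entity "ipod" receives non-zero probability through its concept, which is the sparsity relief the text describes.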

In 2012, Saif et al. (2012b) explored a further feature set, sentiment-topic features, in addition to the semantic feature set (Saif et al., 2011). For the sentiment-topic features, they used the joint sentiment-topic model to extract topics and their associated sentiment, and then augmented the original feature set with these sentiment topics. They evaluated both proposed feature sets on the Stanford Twitter dataset (Go et al., 2009). The results showed that an NB classifier trained on the proposed features outperformed the unigram-only baseline, and that the sentiment-topic features alone gave better results with fewer features than the semantic features. To compare their approach with existing work, they tested the NB classifier trained on their sentiment-topic features on the original Stanford test set (Go et al., 2009) and obtained 86.3% accuracy, which clearly outperforms the result of Go et al. (2009).

In 2014, Saif et al. (2014b) presented another interesting study that focused on the contextual semantics (inferred from neighbouring words) and conceptual semantics (inferred from ontologies) of opinionated words. They proposed SentiCircle, a lexicon-based approach for Twitter sentiment analysis, and evaluated it on three datasets (PMD, HCR, and STS-Gold) using three sentiment lexicons (MPQA, SWN, and Thelwall). They compared their approach with lexicon baseline methods and with SentiStrength (Thelwall et al., 2012; Thelwall et al., 2010), observing a 20% improvement in accuracy and a 30-40% improvement in F-score over the lexicon baselines. SentiCircle outperformed SentiStrength in accuracy but not in F-score. SentiStrength is a lexicon-based approach proposed by Thelwall et al. (2012). It assigns a strength to text in the range -5 to +5, where +5 is extremely positive and -5 is extremely negative. It copes with the ill-formed and dubious grammar of tweets through various lexical rules for negation (e.g., no, not), intensifiers (e.g., very), emoticons, and booster words (e.g., extremely). However, various studies (Gupta & Joshi, 2019; Muhammad et al., 2016; Saif et al., 2014b) have pointed out that lexicon-based approaches do not deal with the context of a word; that is, polarities are assigned irrespective of the word's context. Thelwall et al. (2012) therefore tried to improve the previous version of SentiStrength (Thelwall et al., 2010) with an algorithm that can update the polarity of sentiment-bearing words in the lexicon; however, that algorithm must be trained on manually labelled corpora.
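The kinds of lexical rules mentioned above (negation, boosters, emoticons, and a bounded strength scale) can be illustrated with a small scorer. This is only loosely inspired by the rules described in the text: the lexicons, the one-step booster increment, and the single score clipped to -5..+5 are assumptions, not SentiStrength's actual algorithm or resources.

```python
# Hypothetical miniature resources; SentiStrength's real lexicons and rules differ.
TERM_STRENGTH = {"good": 2, "like": 2, "love": 3, "bad": -2, "hate": -4}
BOOSTERS = {"very", "extremely", "really"}
NEGATORS = {"no", "not", "never", "don't"}
EMOTICONS = {":)": 1, ":(": -1}


def lexical_score(tweet: str) -> int:
    """Sum rule-adjusted term strengths and clip the result to -5..+5."""
    score = 0
    negate = False
    boost = 0
    for token in tweet.lower().split():
        if token in NEGATORS:
            negate = True  # applies to the next sentiment-bearing term
        elif token in BOOSTERS:
            boost += 1
        elif token in EMOTICONS:
            score += EMOTICONS[token]
        elif token in TERM_STRENGTH:
            s = TERM_STRENGTH[token]
            # A booster moves the term one step further from zero.
            s += boost if s > 0 else -boost
            if negate:
                s = -s
            score += s
            negate, boost = False, 0
    return max(-5, min(5, score))


print(lexical_score("I really love this :)"))    # 5
print(lexical_score("not very good"))            # -3
print(lexical_score("I don't like this movie"))  # -2
```

Because every rule looks only at fixed word lists, the scorer still ignores wider context, which is exactly the limitation the text attributes to lexicon-based approaches.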

Some earlier studies (Mukherjee & Bhattacharyya, 2012; Taboada et al., 2008) performed deeper analysis by addressing the importance of discourse relations such as conditionals (if, if-else, until, unless, etc.), connectives (but, and, or, etc.), modals, and negation in sentiment analysis, since such linguistic constructs can alter the sentiment of a piece of text. Taboada et al. (2008) focused on discourse relations, but their techniques are established for structured text, not microblogs.

Furthermore, most previous Twitter studies (e.g., Go et al., 2009; Kouloumpis et al., 2011) used the bag-of-words model with features such as ngrams, POS, Twitter-specific features, and lexicon-based features. Although these performed reasonably well, they ignored the impact of discourse relations such as connectives and conditionals, removing such constructs as stop words during feature creation.

Mukherjee and Bhattacharyya (2012) carried the discourse relation work of Taboada et al. (2008) over to the sentiment analysis of noisy, unstructured tweets and explored the impact of such linguistic constructs. They incorporated discourse information into the BOW model to improve classification performance, and validated their discourse-based BOW model on three datasets using both supervised (SVM) and lexicon-based classification. To evaluate its effectiveness, they used 8,507 manually annotated tweets and 15,204 automatically annotated tweets (following Go et al., 2009 and Pak & Paroubek, 2010 for automatic creation of the training set). They also used a travel review dataset to show its effectiveness on structured text. Their discourse-based BOW model improved accuracy by 2-4%: specifically, by 4% over Joshi et al. (2011) in lexicon-based classification and by 2% over the baseline SVM in supervised classification.
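One way to keep a connective such as "but" in a bag-of-words model, instead of discarding it as a stop word, is to let it reweight the words that follow. The sketch assumes a single rule (tokens after a contrast connective count double) and a hypothetical connective list; it is not Mukherjee and Bhattacharyya's exact rule set.

```python
from collections import defaultdict

CONTRAST_CONNECTIVES = {"but", "however", "yet"}  # hypothetical list


def discourse_weighted_bow(tweet: str, after_weight: float = 2.0) -> dict:
    """Bag of words where tokens after a contrast connective get extra weight."""
    weights = defaultdict(float)
    current = 1.0
    for token in tweet.lower().split():
        if token in CONTRAST_CONNECTIVES:
            # Assume the clause after "but" carries the dominant sentiment.
            current = after_weight
            continue
        weights[token] += current
    return dict(weights)


print(discourse_weighted_bow("the phone is good but battery is bad"))
```

Here "bad" outweighs "good", reflecting the intuition that the clause after "but" usually decides the overall sentiment.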

In another study, Chikersal et al. (2015) also enhanced supervised learning by using rules for conjunctions and conditionals to modify ngrams. They trained an SVM on the 1.6 million tweets provided by Go et al. (2009) and tested it on the 1,794 tweets provided by Nakov et al. (2013). In addition, an unsupervised rule-based classifier proved beneficial for changing the predictions of the SVM classifier when its decision score was low. They used SenticNet (Cambria et al., 2012) for this purpose. SenticNet contains 14k fine-grained concepts obtained from the Open Mind corpus, labelled with their semantic orientation. It has proved valuable for unstructured text such as reviews; however, unlike SentiStrength, it is not tailored specifically to tweets.
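The low-confidence fallback described above can be sketched as follows. The SVM is represented by any function returning a signed decision value (for example, the output of scikit-learn's `LinearSVC.decision_function`); the threshold of 0.5, the binary labels, and the toy rule-based classifier are assumptions for illustration.

```python
from typing import Callable, Optional


def hybrid_predict(tweet: str,
                   svm_decision: Callable[[str], float],
                   rule_classifier: Callable[[str], Optional[str]],
                   threshold: float = 0.5) -> str:
    """Use the SVM label unless its decision value is close to zero."""
    score = svm_decision(tweet)
    label = "positive" if score > 0 else "negative"
    if abs(score) < threshold:
        # Low-confidence SVM decision: defer to the rules if they give a label.
        rule_label = rule_classifier(tweet)
        if rule_label is not None:
            return rule_label
    return label


# Hypothetical stand-ins for a trained SVM and a SenticNet-based rule classifier.
def toy_svm_low(tweet: str) -> float:
    return 0.2


def toy_svm_high(tweet: str) -> float:
    return 1.3


def toy_rules(tweet: str) -> Optional[str]:
    return "negative" if "not" in tweet.split() else None


print(hybrid_predict("this is not good", toy_svm_low, toy_rules))   # negative
print(hybrid_predict("this is not good", toy_svm_high, toy_rules))  # positive
```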

In 2016, another notable work in Twitter sentiment analysis was done by Muhammad et al. (2016). Like Saif et al. (2014b), they performed a deeper study by considering contextual polarity, stating that the polarity of a term depends on the context in which it appears. For example, consider the text “I don’t like this movie”. A context-free lookup might label this text as positive because of the word “like”, but the negation cue “don’t” reverses the polarity of “like”, so the text should be rendered as negative. Hence, they presented SMARTSA, a lexicon-based system that takes into account both the local and global contextual polarities of terms. They hybridized the SentiWordNet lexicon with domain-specific knowledge to handle global context, presented strategies for handling local context elements such as negation, intensifiers, and modifiers, and observed better performance than the state-of-the-art system SentiStrength (Thelwall et al., 2012). In their previous work (Muhammad et al., 2013), they investigated whether negation and valence shifters should be used as sentiment-bearing words or only as modifiers. Their experimental results showed improved performance when negation was treated as sentiment-bearing and valence shifters as modifiers only.

In the past few years, the SemEval (Semantic Evaluation) workshop has organized shared tasks on Twitter sentiment analysis and provided training, development, and test datasets. The organizers also provided an out-of-domain dataset, such as SMS messages, to check how well a system performs on a domain other than the one it was trained on. In these shared tasks, teams were compared with each other on performance. The organizers felt that research on Twitter sentiment analysis was hindered by the lack of suitable Twitter datasets: existing Twitter datasets were either copyrighted and small (e.g., the iSieve corpus, Kouloumpis et al., 2011) or collected using noisy labels such as emoticons and hashtags (Go et al., 2009). Their aim was therefore to promote Twitter sentiment analysis research by providing the SemEval corpus to participants, who were invited to perform Twitter sentiment analysis at various levels, such as message level, expression level, and tweet quantification. For instance, Nakov et al. (2013) organized SemEval-2013 Task 2: Sentiment Analysis in Twitter, which contained two subtasks, expression level and message level. The task attracted 44 teams with 149 submissions. The task has been hosted annually since 2013, with some subtasks carried over to the following year. It is worth noting that most of the top participating teams used machine learning (the most popular classifiers being SVM, NB, and MaxEnt) or deep learning approaches.

Some of the noteworthy state-of-the-art approaches are the NRC-Canada system (Kiritchenko et al., 2014b; Mohammad et al., 2013; Zhu et al., 2014) and the deep learning-based approach of Socher et al. (2013). In this literature review, we discuss only the studies of the top participants in the SemEval competition.

One of the noteworthy state-of-the-art approaches for Twitter sentiment analysis is the NRC-Canada system (Mohammad et al., 2013). The team participated in SemEval-2013 Task 2, organized by Nakov et al. (2013), using an SVM classifier with varied features for both subtasks (term level and message level), and also evaluated the trained classifier on the SMS dataset. They assessed the effectiveness of different feature groups (lexicon-based, Twitter-specific, ngram, POS-based, cluster-based, and negation features) by removing one feature group at a time. For the lexicon-based features, they used three manually created lexicons, MPQA, Bing Liu, and NRC Emotion (Mohammad & Turney, 2010; Mohammad & Yang, 2011), and two automatically created tweet-specific lexicons of their own, NRC Hashtag and Sentiment140. Lexicon-based features were the most important contributor to SVM performance, providing a gain of more than 8.5% in the message-level subtask. Moreover, the automatic tweet-specific lexicons contributed more than the manual lexicons when the SVM was tested on Twitter data; on the SMS dataset, however, the manual lexicons contributed more because SMS data is more structured. Their system ranked first in both subtasks on tweets, with an F-score of 69.02 in the message-level subtask and 88.93 in the expression-level subtask.
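NRC-Canada's automatic lexicons score each word as PMI(w, positive) - PMI(w, negative). Because the corpus size and the word's total frequency cancel between the two PMI terms, the score reduces to a log ratio of the word's relative frequency in positive versus negative tweets. The sketch below computes this score and a few per-tweet features in the spirit of those described by Mohammad et al. (2013); the smoothing constant, the exact feature set, and the toy corpus are assumptions.

```python
import math
from collections import Counter


def pmi_lexicon(tweets, smoothing=0.5):
    """Build score(w) = PMI(w, pos) - PMI(w, neg) from (tokens, label) pairs."""
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in tweets:
        (pos_counts if label == "positive" else neg_counts).update(tokens)
    total_pos = sum(pos_counts.values())
    total_neg = sum(neg_counts.values())
    lexicon = {}
    for w in set(pos_counts) | set(neg_counts):
        # N and freq(w) cancel in the PMI difference, leaving a log ratio.
        # Additive smoothing (an assumption) avoids log(0) for one-sided words.
        p = (pos_counts[w] + smoothing) / total_pos
        n = (neg_counts[w] + smoothing) / total_neg
        lexicon[w] = math.log2(p / n)
    return lexicon


def lexicon_features(tokens, lexicon):
    """Per-tweet features: count and last score of positive-scoring tokens,
    total score, and maximum score."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    positive = [s for s in scores if s > 0]
    return {
        "count_pos": len(positive),
        "total": sum(scores),
        "max": max(scores, default=0.0),
        "last_pos": positive[-1] if positive else 0.0,
    }


corpus = [(["love", "this"], "positive"), (["love", "it"], "positive"),
          (["hate", "this"], "negative")]
lex = pmi_lexicon(corpus)
print(lexicon_features(["love", "this"], lex))
```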

Many systems (e.g., Balikas & Amini, 2016; Giorgis et al., 2016; Plotnikova et al., 2015; Zhu et al., 2014) base their feature vector on the top-ranked SemEval-2013 system (Mohammad et al., 2013), whose feature vector has served as the state-of-the-art feature vector for many studies since 2013. We also use this feature vector in this thesis but improve it by introducing sophisticated negation exception rules.

Later, in 2014, Kiritchenko et al. (2014b) improved the performance of the system (Mohammad et al., 2013) with which they had participated in SemEval-2013 Task 2: Sentiment Analysis in Twitter. They obtained an F-score of 70.45 in the message-level task and 89.50 in the term-level task. This boost in performance is due to negation handling through the creation of Twitter-specific negated-context lexicons (S140 NegLex and NRC Hashtag NegLex) and affirmative-context lexicons (S140 AffLex and NRC Hashtag AffLex) from the base lexicons (S140 and NRC Hashtag).
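Using separate lexicons requires deciding, for each token, whether it lies in a negated or an affirmative context. The sketch below assumes the common convention that a negated context begins at a negation word and ends at the next punctuation token, and then looks each token up in the matching lexicon. The negation word list and both lexicons are hypothetical stand-ins for resources such as S140 AffLex and S140 NegLex.

```python
import re

NEGATION_WORDS = {"not", "no", "never", "don't", "can't", "isn't"}  # hypothetical, incomplete
CLAUSE_END = re.compile(r"^[.,:;!?]+$")


def split_contexts(tokens):
    """Tag each token as affirmative ('AFF') or negated ('NEG')."""
    tagged = []
    negated = False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negated = False  # punctuation closes the negated context
        elif tok.lower() in NEGATION_WORDS:
            negated = True  # a negation word opens a negated context
        tagged.append((tok, "NEG" if negated else "AFF"))
    return tagged


def context_score(tokens, aff_lex, neg_lex):
    """Sum scores, looking each token up in the lexicon matching its context."""
    total = 0.0
    for tok, ctx in split_contexts(tokens):
        lexicon = neg_lex if ctx == "NEG" else aff_lex
        total += lexicon.get(tok.lower(), 0.0)
    return total


tokens = ["I", "don't", "like", "it", ".", "Great", "though"]
aff_lex = {"like": 1.0, "great": 1.5}  # hypothetical affirmative-context scores
neg_lex = {"like": -0.8}               # hypothetical negated-context scores
print(split_contexts(tokens))
print(context_score(tokens, aff_lex, neg_lex))
```

Scoring "like" from a separate negated-context lexicon, rather than simply flipping its sign, lets the data decide how a word behaves under negation.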


Manu Banga (Author), A Comprehensive Approach on Sentiment Analysis & Prediction, Munich, GRIN Verlag, https://www.grin.com/document/1315485

