Emotions are not only the foundation of human life; they also influence all decisions in modern markets. Grasping the reasoning behind our choices is a key element of economic sciences. Sentiment analysis, a tool for extracting emotions from text, is used in this thesis to analyze customers’ opinions in various markets. The calculations run on a server architecture designed to scale to massive input taken directly from social networks. The system computes the sentiment score in a flexible multi-stage process and provides several methods of accessing the results.

The thesis then demonstrates the system’s capabilities by implementing various commercial use cases, including geographical and demographic analysis. The system can also provide near-real-time results.

Lastly, the thesis concludes by performing several correlation analyses on the collected data. These illustrate how the intensity of emotions varies with the maturity and form of the economic market and how it affects the companies participating in these markets.

Excerpt

1. Introduction

1.1. Goal of the Thesis

1.2. Chapter Outline

2. Sentiment Analysis

2.1. Goals of Sentiment Analysis

2.2. Use Cases of Sentiment Analysis

2.3. Challenges of Processing Natural Language

2.3.1. Negation

2.3.2. Deontic Irrealis

2.3.3. Languages

2.3.4. Emoticons, Acronyms, and Further Improvements

2.4. Domain-specific Language

2.5. Algorithmic Principles of Sentiment Analysis

2.6. Measuring Sentiment Analysis Accuracy

2.6.1. Precision, Recall and Accuracy

2.6.2. F-score

2.6.3. Supervised Machine-Learning Algorithms

2.6.4. Unsupervised Algorithms

2.6.5. Dictionary-based Algorithms

2.6.6. Comparison of the Sentiment Analysis Algorithms

2.7. Sentiment Analysis in Social Networks

3. Large-Scale Sentiment Analysis System

3.1. System Architecture Overview

3.2. Data Scraping

3.2.1. Data Structure

3.2.2. Twitter’s Streaming API

3.3. Scalability with Hadoop

3.3.1. Scalable Data Storage: HDFS and HBase

3.3.2. Scalable Computation: MapReduce and Derivatives

3.4. Sentiment Analysis Implementation

3.4.1. Overview of Data Computing Process

3.4.2. Pre-Processing: Cleaning, Tokenization and POS-Tagging

3.4.3. Processing: Computing the Basic Sentiment Score

3.4.4. Post-Processing: Enhancing Sentiment Analysis Score Precision

3.4.5. Scheduling and Modularity for Near-Real-Time Predictions

3.5. Evaluation of Sentiment Analysis Accuracy

3.6. Evaluation of Speed and Scalability

3.7. Providing Results – Using the Calculated Sentiments

3.7.1. Aggregated Sentiment Values and Near-Real-Time Analysis

3.7.2. Preparing for Basic Online Analytical Processing

4. Commercial Use Cases: Attaining Competitive Advantages

4.1. Business Brand Value Monitoring

4.1.1. Near-Real-Time Monitoring

4.1.2. Historical Analysis

4.1.3. Demographic and Geolocation-based Analyses

4.2. Sentiments of Brands, Products and Markets

4.3. Sentiments of Product Features

5. Scientific Use Cases: Understanding Competitive Market Behavior

5.1. Fundamentals of Economic Markets

5.1.1. Market Forms

5.1.2. Market Maturity

5.2. Sentiments in Competitive Markets

5.2.1. First Hypothesis

5.2.2. Second Hypothesis

5.2.3. Third Hypothesis

6. Conclusion

6.1. Limitations of Findings and Further Work

7. Bibliography

Research Objectives and Thematic Focus

This thesis aims to develop a scalable, near-real-time sentiment analysis system that leverages public social media data to interpret consumer behavior and competitive market dynamics. The research addresses the technical challenges of big data processing and explores how sentiment analysis can provide actionable business intelligence and scientific insights into economic market behaviors.

Development of a scalable, multi-stage sentiment analysis architecture using big data technologies.
Implementation of near-real-time data scraping and processing workflows for social media streams.
Evaluation of system performance, scalability, and sentiment analysis accuracy.
Application of sentiment analysis to commercial use cases such as brand monitoring and feature analysis.
Scientific exploration of the correlation between competitor sentiments in various market forms and maturity levels.

Excerpt from the Book

2.3.1. Negation

One of the most important linguistic devices is negation. Negators (e.g. “not,” “without,” and “lacks”) reverse the polarity of a sentence and are a complicating factor in sentiment analysis. The following excerpt from a movie review [26] demonstrates the power of negators:

This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.

The quote features several clear-cut positive words, yet a single, easily overlooked negator reverses the sentiment of the entire review. Negation is a very powerful tool in human language, but it is extremely hard for algorithms to detect and understand [27]. Research agrees that negators play an important role in overall accuracy [28], [29].

However, the main goal of this master’s thesis is to analyze the overall sentiment value of a company’s brand in a specific time frame. The sentiment of a single text is less critical, since thousands of sentiment values are aggregated into one score. Thus, in this specific use case, negation has less impact, because its effect is expected to largely offset when the sentiment values are aggregated. Nonetheless, a negation analysis module should be added in the future.
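To make the idea of a negation analysis module more concrete, the following is a minimal sketch of one common heuristic: flipping the polarity of sentiment words that appear within a few tokens after a negator. It is not the thesis’s implementation. The lexicon, the negator list, the window size, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon; the dictionary used in the thesis is not part of this excerpt.
LEXICON = {"brilliant": 1.0, "great": 1.0, "good": 1.0, "bad": -1.0, "boring": -1.0}

# Assumed negator list, extending the examples named in the text.
NEGATORS = {"not", "no", "never", "without", "lacks", "can't", "cannot", "isn't", "don't"}

# Assumed window: number of tokens after a negator whose polarity is flipped.
WINDOW = 3


def tokenize(text):
    """Lowercase the text and split it into words, keeping contractions such as "can't"."""
    normalized = text.replace("’", "'").lower()
    return re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)


def score(text, handle_negation=True):
    """Sum lexicon values; optionally flip the values of tokens that follow a negator."""
    total = 0.0
    flips_left = 0
    for token in tokenize(text):
        if handle_negation and token in NEGATORS:
            flips_left = WINDOW
            continue
        value = LEXICON.get(token, 0.0)
        if flips_left > 0:
            value = -value
            flips_left -= 1
        total += value
    return total


if __name__ == "__main__":
    print(score("The plot is not very good"))
    print(score("The plot is not very good", handle_negation=False))
```

A fixed window like this only covers negators that directly precede the sentiment word. It would not capture the movie review quoted above, where the negator appears in a later sentence than the positive words, which illustrates why negation is hard to handle algorithmically.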

Summary of Chapters

1. Introduction: Outlines the significance of human sentiment in modern economic markets and defines the goals of the thesis regarding sentiment analysis implementation.

2. Sentiment Analysis: Explores fundamental principles, objectives, and challenges of sentiment analysis, including linguistic nuances like negation and the comparison of various classification algorithms.

3. Large-Scale Sentiment Analysis System: Details the design and architecture of a scalable system utilizing Hadoop for scraping and processing massive streams of social network data in near-real-time.

4. Commercial Use Cases: Attaining Competitive Advantages: Demonstrates how the system can be applied to business scenarios such as brand value monitoring, historical analysis, and product feature sentiment extraction.

5. Scientific Use Cases: Understanding Competitive Market Behavior: Analyzes the correlations of sentiment between competitors in different market forms, testing hypotheses on how market maturity affects sentiment dynamics.

6. Conclusion: Summarizes the achievements in building a scalable system and reflects on the limitations, suggesting future research directions such as multi-lingual support and deeper conversation modeling.

7. Bibliography: Lists the academic references and sources utilized throughout the thesis.

Keywords

Sentiment Analysis, Big Data, Hadoop, Twitter API, Near-Real-Time, Market Behavior, Consumer Behavior, Scalability, MapReduce, Business Intelligence, Sentiment Classification, Data Scraping, Competitive Advantage, Correlation Analysis, Natural Language Processing

Frequently Asked Questions

What is the core focus of this research?

The research focuses on designing and implementing a scalable architecture to perform sentiment analysis on social media data, specifically Twitter, to extract actionable insights for both business and scientific applications.

What are the primary themes covered in the thesis?

The thesis covers sentiment analysis theory, big data architecture (using Hadoop), commercial applications like brand monitoring, and scientific investigation into how competitor sentiments correlate within different market structures.

What is the ultimate goal of the system?

The primary goal is to provide near-real-time sentiment analysis that can handle massive input data, allowing users to gain a competitive advantage by monitoring consumer opinions as they happen.

Which methodology is employed for sentiment analysis?

The author implements a dictionary-based sentiment analysis algorithm, supported by pre-processing steps like cleaning, tokenization, and POS-tagging, as it offers the best balance of speed and scalability for this project.
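As a rough illustration of what a dictionary-based approach with pre-processing can look like, here is a minimal sketch covering cleaning, tokenization, and scoring. POS-tagging is left out for brevity. The lexicon entries, the cleaning rules, and the averaging of matched values are assumptions for this example and are not taken from the thesis.

```python
import re

# Hypothetical lexicon with integer polarity values; not the thesis's dictionary.
LEXICON = {"love": 2, "great": 1, "good": 1, "bad": -1, "terrible": -2, "hate": -2}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean(tweet):
    """Remove URLs and @mentions, keep hashtag words, and lowercase the text."""
    tweet = URL_RE.sub(" ", tweet)
    tweet = MENTION_RE.sub(" ", tweet)
    return tweet.replace("#", " ").lower()


def tokenize(text):
    """Split cleaned text into word tokens."""
    return re.findall(r"[a-z']+", text)


def sentiment_score(tweet):
    """Average the lexicon values of all matched tokens; 0.0 if nothing matches."""
    tokens = tokenize(clean(tweet))
    hits = [LEXICON[token] for token in tokens if token in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    print(sentiment_score("I love my new phone! http://t.co/xyz @brand #great"))
```

Because each tweet is scored independently with a dictionary lookup, a function of this shape can be applied to records in parallel, which is one reason dictionary-based methods suit large-scale processing.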

What is discussed in the main body of the work?

The main body details the technical architecture of the system, the challenges of processing natural language (like negation), the evaluation of system speed and accuracy, and various use cases ranging from business brand monitoring to testing economic hypotheses.

Which keywords best characterize this thesis?

Key terms include Sentiment Analysis, Big Data, Hadoop, Scalability, Near-Real-Time Processing, Market Behavior, and Social Network Data.

How does the system handle "Negation"?

The author notes that while negation is a significant linguistic challenge, it has less impact in this specific project because the system aggregates thousands of tweets, which causes individual sentiment inaccuracies to offset each other.

Why was Twitter chosen as the data source?

Twitter was chosen because it provides a public, structured stream of data via an open API, and its format (140 characters) is well-suited for high-speed processing in a large-scale architecture.

What were the findings regarding market competition?

The thesis found that, contrary to some expectations, sentiment correlations between competitors are generally low in stable markets, but become more pronounced when analyzing specific, high-intensity events within young, emotional markets like smartphones.
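A correlation between two competitors’ sentiments can be computed along the following lines. This sketch assumes the Pearson correlation coefficient applied to two equally long series of aggregated sentiment scores; the page does not state which correlation measure the thesis uses, and the sample values below are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented daily aggregated sentiment scores for two hypothetical competitors.
    brand_a = [0.12, 0.15, 0.08, 0.20, 0.18]
    brand_b = [0.05, 0.02, 0.07, 0.01, 0.03]
    print(round(pearson(brand_a, brand_b), 3))
```

Values near 0 indicate little linear relationship between the two series, while values near 1 or -1 indicate that the competitors’ sentiments move together or in opposite directions.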

How does the system scale to handle peak activity?

The system uses a modular scheduling approach that prioritizes live incoming data and uses a temporary buffer to handle bursts in activity, effectively leveraging the "scale-out" capabilities of the Hadoop ecosystem.


Details

Title: Exploration of Competitive Market Behavior Using Near-Real-Time Sentiment Analysis
College: Otto-von-Guericke-University Magdeburg (Faculty of Computer Science)
Grade: 1.1
Author: Norman Peitek (Author)
Publication Year: 2014
Pages: 110
Catalog Number: V286583
ISBN (eBook): 9783656868682
ISBN (Book): 9783656868699
Language: English
Tags: Sentiment Analysis Hadoop
Product Safety: GRIN Publishing GmbH

Quote paper: Norman Peitek (Author), 2014, Exploration of Competitive Market Behavior Using Near-Real-Time Sentiment Analysis, Munich, GRIN Verlag, https://www.grin.com/document/286583
