From Valence to Emotions

How Coarse versus Fine-Grained Online Sentiment Can Predict Real-World Outcomes


Diploma Thesis, 2012

80 Pages, Grade: 1,7


Excerpt

Table of Contents

Abstract

List of Abbreviations

List of Figures

List of Tables

1 Introduction

2 Structure of Thesis

3 The Need of Automated Prediction Using Online Sentiments

4 What are the Different Prediction and Sentiment Detection Approaches and Techniques based on User-Generated-Content?
4.1 User Generated Content and its Technical Background
4.1.1 Social Media vs. Web 2.0
4.1.2 Online Community
4.1.3 Social Networking Service
4.1.4 Weblog
4.1.5 Review Site
4.2 Online Word-of-Mouth
4.2.1 Appearance of Online Word-of-Mouth
4.2.1.1 Scale Rating
4.2.1.2 Tweets
4.2.1.3 Review Texts
4.2.1.4 Blog Posts
4.2.2 Forms of Online Sentiments
4.2.2.1 Volume
4.2.2.2 Valence
4.2.2.3 Emotions
4.3 Sentiment Classification
4.3.1 Machine Learning Techniques
4.3.1.1 Naïve Bayes
4.3.1.2 Maximum Entropy
4.3.1.3 Support Vector Machines
4.3.2 Semantic Orientation Approach
4.3.2.1 Pointwise Mutual Information and Information Retrieval
4.3.2.2 Latent Semantic Analysis

5 How Consistent are Prediction Results Based on Online Sentiments?
5.1 Predictive Power of Online Sentiments
5.1.1 Stock Markets
5.1.1.1 Predictive Sources
5.1.1.2 Methods and Findings
5.1.1.3 Consistency
5.1.1.4 Limitations
5.1.2 Sales Volume
5.1.2.1 Predictive Sources
5.1.2.2 Methods and Findings
5.1.2.3 Consistency
5.1.2.4 Limitations
5.1.3 Box Office Revenues
5.1.3.1 Predictive Sources
5.1.3.2 Methods and Findings
5.1.3.3 Consistency
5.1.3.4 Limitations

6 Do Fine-Grained Sentiments Generate New Insights and Better Prediction Results Than Coarse Sentiments?

7 Conclusion

8 Managerial Implications

Bibliography

Abstract

The growing volume of user-generated content online has produced a huge amount of data that can be used for scientific research. This thesis investigates the prediction of certain human-related events using valences and emotions expressed in user-generated content with due regard to past and current research. First, the theoretical framework of user-generated content and of sentiment detection and classification methods is explained, before empirical literature is categorized into three specific prediction subjects. This is followed by a comprehensive analysis including a comparison of prediction methods, consistency, and limitations with respect to each of the three prediction subjects. It was found that the research results and prediction accuracies analyzed differ significantly from each other depending on the data sources and prediction methods employed. In addition, a comparison of fine-grained and coarse sentiments as predictive data sources shows that fine-grained sentiments improve prediction accuracy. Theoretical concepts are also used for evaluation purposes because empirical data on fine-grained sentiment approaches is scarce.

List of Abbreviations

illustration not visible in this excerpt

List of Figures

Figure 1: Research Framework

Figure 2: Seven Functional Building Blocks of Social Media

Figure 3: Functional Building Blocks of Online Communities

Figure 4: Functional Building Blocks of Social Networking Services

Figure 5: Functional Building Blocks of Weblogs

Figure 6: Amazon’s 5-Star Scale Example

Figure 7: Amazon’s Review Text Example

Figure 8: Hyperplane in a Binary Classification Problem

Figure 9: Corresponding Compressed Row Vectors

List of Tables

Table 1: Selected Studies on Stock Market Prediction

Table 2: Selected Studies on Sales Volume Prediction

Table 3: Selected Studies on Box Office Revenue Prediction

1 Introduction

Social media have undergone huge growth in the past ten years. Consumers no longer use social media only as sources of information; they actively create and share content expressing their thoughts, opinions, and experiences, which is known as user-generated content (UGC). Opinions, thoughts, and experiences are usually stated numerically in the form of ratings (such as star scales) or as textual content in the form of product reviews, forum posts, or weblog entries. All of this content has the power to continuously affect trends in all areas, from technologies and politics to entertainment and lifestyle (Bothos, Apostolou, and Mentzas 2010, p. 50). In fact, Liu et al. (2007, p. 607) argue that UGC can influence product sales, which is why understanding the opinions and sentiments expressed online is very important.

Given the huge and unstructured amount of UGC that is created online day by day, quantifying the information it contains is difficult. Innovative methodologies have been developed in a number of disciplines, ranging from finance to information science, to filter and structure UGC in order to better understand consumer behavior or predict certain human-related events. Prediction techniques are based on various automatic text sentiment detection tools such as machine learning techniques or linguistic approaches, and an increasing amount of research has been devoted to examining prediction based on online sentiments in social media. For instance, Das and Chen (2007) tried to predict stock returns by gathering stock message board data, and Tirunillai and Tellis (2012) focused on online product reviews to predict the same. Further frequently explored prediction subjects are sales ranks (e.g., Chevalier and Mayzlin 2006; Moe and Trusov 2011) and box office revenues (e.g., Dellarocas, Zhang, and Awad; Liu 2006). In the field of marketing, most research has been conducted on social media and UGC literature.

This thesis contributes to the current research in several ways. Based on an extensive literature review, this thesis seeks to shed light on the unstructured vastness of research dealing with sentiments in UGC and its predictive power on real-world outcomes. The aim of this thesis is to structure, classify, and compare existing research. Identifying and comparing different approaches of sentiment identification and classification used in past and current research seems to be overdue. Literature that captures the full picture of different approaches of prediction based on online sentiments is not up to date.

As a contribution to further research in the field of prediction using sentiments from social media, this thesis aims to answer the following three specific research questions. First, what are the different prediction and sentiment detection approaches and techniques based on UGC sentiments? As there are plenty of different approaches using miscellaneous techniques, such as the abovementioned machine learning techniques or linguistic approaches, this question aims to provide a structured overview of sentiment detection, which is essential for the prediction of real-world outcomes. Second, how consistent are the prediction results based on online sentiments? This question focuses on a deeper analysis of research results and their differences. Third, do fine-grained sentiments generate new insights and better prediction results than coarse sentiments do? The significance of this question lies in the fact that a simple classification of review texts as either positive or negative may not provide a comprehensive measurement of online sentiments, as Liu et al. (2007, p. 607) argued.

2 Structure of Thesis

After introducing the need for automatic prediction using online sentiments in Chapter 3, this thesis first discusses the theoretical background of UGC in order to answer the first research question. In this context, Chapter 4 includes the introduction of specific designs of UGC platforms categorized according to the seven functional building blocks of Kietzmann et al. (2011), an introduction to Online Word-of-Mouth (OWOM) and its different forms, and, lastly, a discussion of relevant types of online sentiments. Before answering the second research question, different sentiment classification techniques from the field of machine learning as well as linguistic approaches are described to complete the theoretical background. With regard to existing research literature on prediction using online sentiments, Duan, Gu, and Whinston (2008, p. 1008) and Dhar and Chang (2009, p. 301) identified mixed results in past research studies. This thesis takes account of this finding by answering the second research question in Chapter 5. To do so, past and current research literature is classified by prediction subject: stock market prediction, sales volume prediction, and box office revenue prediction. For each identified prediction category, the predictive sources used, the prediction methods, and the findings are discussed. Furthermore, consistencies and limitations within and among the analyzed literature are identified. Finally, Chapter 6 investigates possible improvements from applying fine-grained online sentiments instead of coarse sentiments. For this purpose, empirical results using fine-grained online sentiments and theory-based literature on fine-grained sentiment classification are analyzed. Chapter 7 provides a conclusion and Chapter 8 some managerial implications.

Figure 1: Research Framework

illustration not visible in this excerpt

3 The Need of Automated Prediction Using Online Sentiments

The boom of social media and with it the growth in UGC opens up the possibility of access to opinions, feelings, ideas, or user preferences stated online. This vast amount of information is mostly generated voluntarily and is freely accessible. Prediction based on human agents using social media, especially experts, seems to be more precise in some cases (Pang, Lee, and Vaithyanathan 2002, p. 79), but there are clear advantages for predicting the future using IT-based techniques to analyze and process social media data and online sentiments.

One reason is that humans tend to overvalue small probabilities and undervalue high probabilities, as stated in psychology and economics literature (Wolfers and Zitzewitz 2004, p. 117). This disparity may lead to poor and biased personal predictions of future events. Moreover, as Wolfers and Zitzewitz (2004, p. 118) point out, desires and interests influence people’s decisions and therefore they do not judge objectively. Using automated prediction methods avoids such behavioral biases and leads to an objective prediction (Bothos, Apostolou, and Mentzas 2010, p. 56). Another advantage is the cost-efficiency of prediction using IT systems (Godes and Mayzlin 2004, p. 548, 558). For example, reading every single online review and handling the information contained therein manually would be a very costly and time-consuming task with the additional risk of interpretational mistakes because of human biases (Chevalier and Mayzlin 2006, p. 348). Automatic prediction can also handle greater volumes of data and process them more quickly (Mishne and de Rijke 2006, p. 1). Finally, researchers have shown that automatically generated predictions can outperform human-produced predictions (Pang, Lee, and Vaithyanathan 2002, p. 83; Zhang and Varadarajan 2006, p. 51). Researchers have recognized these advantages, which has led to increased research and effort to improve automated prediction using UGC.

4 What are the Different Prediction and Sentiment Detection Approaches and Techniques based on User-Generated-Content?

In line with the research framework introduced above, this chapter examines the different prediction approaches and techniques and gives an introduction to the technical background of online sentiments, their appearance, and their forms using a top-down approach.

Because online sentiments emerge through UGC, fundamental background knowledge on UGC and its technical concept is presented first. Second, OWOM, a special form of UGC, is explained and characterized. Third, different forms of online sentiments that appear within OWOM are illustrated. Finally, technical and mathematical sentiment classification and detection tools are introduced.

4.1 User Generated Content and its Technical Background

Many researchers have developed miscellaneous approaches to automatically predict real-world outcomes using UGC. These techniques depend on a sound basis of available online data created by online users. According to an OECD (2007) study, three distinct characteristics define UGC. First, UGC must be published over the Internet and be available to the public. E-mails and instant messages are not publicly available, which is why they are not classified as UGC. Second, the content has to involve some creative effort. This excludes mere replication of existing content. Third, UGC has to be created by non-professionals with a non-professional intention (OECD 2007, p. 9). The last condition excludes content with a commercial purpose. In summary, UGC is understood as content published online that is created by private users with a non-commercial intention.

The following sections introduce the basic concepts of Web 2.0 and social media applications, their characteristics, and the different forms of online data they contain.

4.1.1 Social Media vs. Web 2.0

The development of Web 2.0 has led to a flood of data while providing the technical basis for various platforms to communicate, to express, or to rate subjects online. Web 2.0 defines applications and services that use the World Wide Web to enable UGC without the need to download any software. Compared to the first generation (Web 1.0), the content of Web 2.0 is mainly created by its users (Kaplan and Haenlein 2010, p. 61). This includes writing texts, evaluating and commenting on other articles, posting and sorting pictures and videos, or creating and fostering a social network (Alpar, Blaschke and Keßler 2007, p. 4-7).

Distinguishing the characteristics of social media from the seemingly interchangeable concept of Web 2.0 is a challenging task. Social media and Web 2.0 are very similar in nature, which causes confusion among researchers and managers (Kaplan and Haenlein 2010, p. 60). To draw a line between the blurred boundaries, the coherent definitions of Kietzmann et al. (2011) and Kaplan and Haenlein (2010), which are based on the historic development of each term, are used. As stated above, Web 2.0 mainly concerns the technical aspect. Hence, Web 2.0 can be seen as an ideological and technical platform for social media, while UGC is created and exchanged on social media (Kaplan and Haenlein 2010, p. 61). According to Kietzmann et al. (2011), social media consist of seven functional blocks: identity, conversations, sharing, presence, relationships, reputation, and groups (Kietzmann et al. 2011, p. 243-248). Figure 2 illustrates the seven blocks of social media, where each block represents a specific facet of social media.

Figure 2: Seven Functional Building Blocks of Social Media

illustration not visible in this excerpt

According to Kietzmann et al. (2011, p. 243)

Because social media applications differ, not all seven blocks have to be present at the same time in a given social media activity, as the examples in the following subchapters will demonstrate.

The identity block corresponds to the degree to which users reveal their identity within a social media application. This may include information about name, age, and gender as well as personal behavioral information like hobbies or interests (Kietzmann et al. 2011, p. 243). Users can share identity-related information intentionally or unintentionally in the form of subjective information. This information can be contained in online postings or comments expressing the user’s thoughts and feelings (Kaplan and Haenlein 2011, p. 62).

The degree to which users communicate with each other is represented by the conversation block. Each social media application has a different focus on communication and user interaction. Users can express themselves, talk, discuss, and connect online. Because communication data may contain individual opinions, thoughts, and feelings, the rich data of online conversation is of high relevance to researchers and firms (Kietzmann et al. 2011, p. 244). Whether it is used to gain new insights into customer needs and behavior or to predict real-world outcomes, conversation data is an extensive and valuable source.

The third block, sharing, represents the exchanging, distributing, and receiving of online content in a social media application. Depending on the platform, users can share movies, pictures, music, or texts in order to build relationships with each other (Kietzmann et al. 2011, p. 245).

Presence refers to the extent of information about a user’s accessibility. The information ranges from knowing whether another user is online or available in the virtual world to detailed location information about where a user actually is in the real world. The presence block therefore closes the gap between the online and the real world due to its possible connectivity on the move (Kietzmann et al. 2011, p. 245).

In connection with the conversation block, Kaplan and Haenlein (2010) explain that a higher level of social presence is likely to make online conversations more influential, which shows a direct connection between different blocks of social media.

Relationships form the fifth block of social media. It refers to the way in which, and the intensity with which, users connect and relate to other users online. Relationships can be built on common interests or associations and lead to conversation, sharing objects of sociality, or following each other online. Depending on the social media platform and the specific value of identity, online relationships can be more or less intensive (Kietzmann et al. 2011, p. 246).

Another block is reputation, which refers to the standing of each user in a social media setting. Considering the standing of other users as well as one’s own standing, different technical methods exist to create reputation. For instance, click rates, star rankings, or likes and dislikes can give information about a user’s trustworthiness and reputation (Kietzmann et al. 2011, p. 247).

The last functional building block of social media that Kietzmann et al. (2011) define is groups. Groups represent the extent to which users can form subgroups or communities within a social media setting. Groups can exist as closed groups with restricted access or as publicly accessible groups. Furthermore, individuals can group their online contacts to control the online content that is shared within a social media setting (p. 247-248).

The seven blocks of social media explain and define in a very concrete manner the diversity of a social media setting. Subsequently, specific social media applications or platforms will be introduced and characterized.

4.1.2 Online Community

Current research literature has established plenty of definitions to specify online communities, due to their varying social and technical structures, with more or less fuzzy boundaries. To be able to differentiate clearly between the following social media applications, a precise definition of online communities will be used. According to Preece (2000, p. 10), online communities consist of a socially interacting group of people coming together for a shared purpose online. Their members often share common interests, values, and characteristics, and they keep to agreed rules concerning community membership. Hence, they are governed by the communities’ individual norms and policies. With a deeper focus on the communication process, Bagozzi and Dholakia (2002, p. 3) define online communities as “mediated social spaces […] that allow groups to form and be sustained primarily through ongoing communication processes.” Thus, members of online communities have the opportunity to constantly access and comment on the opinions of socially relevant peers (Miller, Fabian, and Lin 2009, p. 305). Moreover, online communities are subject to a constantly dynamic process and evolve and change over time (De Souza and Preece 2004, p. 580).

Online communities focus on interest-related topics such as technical, social, or economic interests, ranging from expert knowledge forums to shared-interest Web sites (Mühlenbeck and Skibicki 2007, p. 15-18). Furthermore, the wide range of online forums includes file-sharing communities and consumer communities. Within file-sharing communities, users are able to upload and download media data such as movies, music, and pictures (e.g., Flickr and YouTube). Other users are able to comment on and rate the uploaded content, which reflects the community character (Walsh, Kilian, and Hass 2011, p. 11). Consumer communities serve as exchange platforms for consumer insights on particular products or services (e.g., Ciao and Epinions). Community members can express their experiences and recommend or advise against products or services (Walsh, Kilian, and Hass 2011, p. 11).

In terms of the seven functional building blocks of social media, sharing, conversations, groups, and reputation are the blocks that characterize online communities, as understood in this thesis, most strongly.

Figure 3: Functional Building Blocks of Online Communities

illustration not visible in this excerpt

According to Kietzmann et al. (2011, p. 248)

4.1.3 Social Networking Service

In general, a social network comprises a group of persons or organizations and the relationships between them. Social networks exist in every area, such as economics, politics, science, and the general public (Bommes and Tacke 2006, p. 34). The members of such a social structure can be different kinds of social entities, ranging from individual persons and political groups to families or organizations. The connections within these social networks are built on specific relationships, interests, or interactions and are characterized by, for example, information exchange or emotional closeness (Hollstein 2006, p. 14).

With the advent of social media, social networking service (SNS) applications started to enable users to connect with friends or colleagues by providing convenient features. By creating a shareable profile containing personal information such as birthday, hobbies, and preferences, or photos, videos, and audio files, users can express themselves. Furthermore, they can invite friends to access their profiles, and in some SNSs they are able to send mails and instant messages or post messages on other profiles inside the bounded system of the SNS (Kaplan and Haenlein 2010, p. 63). In comparison with online communities, SNSs allow much more self-expression in the form of a personal profile, and other users are commonly able to see the friends to whom a user is connected. SNSs therefore allow users to convey real-life networks online and to build new connections based on interests and activities (Ahn et al. 2007, p. 835). SNSs differ with regard to their target group of members. LinkedIn and Xing are SNSs focusing on business contacts only. Hence, personal profiles and information in these domains are mainly focused on education, work experience, and interests. Facebook and Google+, in contrast, are leisure networks used for connecting with friends or acquaintances. Accordingly, most personal information on these SNSs is oriented toward private interests (Cyganski and Hass 2011, p. 83). Besides business- and leisure-oriented SNSs, other SNSs differ according to their user base, interest focus, or features, such as dating or classmate networks (Boyd and Ellison 2008, p. 214). Access to an SNS can be open or restricted. In the case of restricted access, users need an invitation from an existing user of this SNS. In the case of LinkedIn, for example, users need an invitation because LinkedIn is an “invitation only” network.

In terms of the seven functional building blocks of social media, SNSs consist mainly of the following elements: relationships, conversations, identity, and reputation.

Figure 4: Functional Building Blocks of Social Networking Services

illustration not visible in this excerpt

According to Kietzmann et al. (2011, p. 248)

4.1.4 Weblog

Open Diary was the first website in the mid-1990s uniting online diary writers; with its foundation, the term “weblog” was introduced, which is now commonly shortened to “blog” (Kaplan and Haenlein 2010, p. 60). The diary characteristic is still typical of today’s blogs. Blogs are online journals which list texts, pictures, videos, or all of them in reverse chronological order (OECD 2007, p. 36; Liu et al. 2007, p. 607). Users (bloggers) can either run a blog on their own server, which requires installation of software (e.g., WordPress), or use a blog hosting service, such as myspace.com or livejournal.com, which avoids software issues because the blog application is provided online (OECD 2007, p. 36). The freely accessible software or online application for blogs makes it an easy-to-use tool for publishing texts, pictures, or videos online (Walsh, Kilian, and Hass 2011, p. 10). Due to its diary character, the content of blogs is updated frequently and is usually strongly related to the blogger’s personal life or interests (Balog, Mishne, and de Rijke 2006, p. 207). Furthermore, the structure and design of blogs are simple, and readers are able to comment on blog entries (posts), which distinguishes blogs from regular Web pages (Walsh, Kilian, and Hass 2011, p. 11). The topics range from conventional subjects (e.g., holidays, movies, sports, products, food) to special and detailed issues (Liu et al. 2007, p. 607). The authors of blog articles often express their moods, thoughts, and feelings in these posts, which makes the content highly subjective. Within the blogosphere, which describes the interconnection of all blogs, bloggers can follow other blogs and link to their articles, which creates a collaborative atmosphere among bloggers (Walsh, Kilian, and Hass 2011, p. 11).

Today, blogs are no longer only private diaries in which individual bloggers talk about their lives and experiences. Among the authors are ordinary people as well as professionals and celebrities (Kietzmann et al. 2011, p. 242). Industries have realized the power of blogs because of how quickly their content spreads among Internet users. Mainstream media adopt blog content, and some blogs are also able to influence industries because of their large and loyal readership (Walsh, Kilian, and Hass 2011, p. 11).

In line with the seven functional building blocks of social media and the definition of weblogs given above, sharing, conversations, relationships, and reputation are the four characterizing blocks of weblogs.

Figure 5: Functional Building Blocks of Weblogs

illustration not visible in this excerpt

According to Kietzmann et al. (2011, p. 248)

4.1.5 Review Site

Even though review sites cannot be seen as a social media application in themselves, they have to be considered in this context because they are a valuable data source for obtaining online sentiments to predict real-world outcomes. Reviews appear in a variety of forms: they are either embedded into commercial Web pages (e.g., Amazon.com, ebay.com) or appear on dedicated review Web sites specialized in professional or user reviews (e.g., Epinions.com, Cnet.com) (Dave, Lawrence, and Pennock 2003, p. 519; Dellarocas 2003, p. 1408). The area of product or service reviews is broad and includes, for example, car, electronics, book, or movie reviews. On commercial Web sites, the review is directly linked to the product that can be purchased online, whereas on pure review Web sites, reviews are sorted by product type and products cannot be purchased directly. Online reviews take the form of either a numerical or graphical rating scale (e.g., 5-star rating), a free text, or a combination of both (Luo and Zhang 2011, p. 13; Dave, Lawrence, and Pennock 2003, p. 521). Through online user ratings, users can recommend or advise against products by posting their experiences and opinions, thereby supporting and influencing other users’ purchase decisions (OECD 2007, p. 35-40).

Because review Web sites cannot be seen as a social media application, a definition according to the seven functional building blocks of social media is not reasonable.
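The block profiles named in Sections 4.1.2 to 4.1.4 can be compared side by side with a small data structure. The following Python sketch is only an illustration: the variable names are hypothetical, and the sets simply restate the characterizing blocks listed in the text for each platform type, not a classification taken from Kietzmann et al. (2011) beyond that.

```python
# Seven functional building blocks of social media (Kietzmann et al. 2011).
SOCIAL_MEDIA_BLOCKS = frozenset({
    "identity", "conversations", "sharing", "presence",
    "relationships", "reputation", "groups",
})

# Characterizing blocks per platform type, as stated in Sections 4.1.2-4.1.4.
# Review sites are deliberately absent: the text does not assign them blocks.
PLATFORM_BLOCKS = {
    "online community": {"sharing", "conversations", "groups", "reputation"},
    "social networking service": {"relationships", "conversations",
                                  "identity", "reputation"},
    "weblog": {"sharing", "conversations", "relationships", "reputation"},
}

# Blocks that characterize every platform type considered here.
common_blocks = set.intersection(*PLATFORM_BLOCKS.values())

# Blocks that are not named as characterizing for any platform type.
unused_blocks = SOCIAL_MEDIA_BLOCKS - set.union(*PLATFORM_BLOCKS.values())

print(sorted(common_blocks))   # ['conversations', 'reputation']
print(sorted(unused_blocks))   # ['presence']
```

Under this reading, conversations and reputation are shared by all three platform types, which fits the emphasis on conversation data as a source of online sentiments.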

4.2 Online Word-of-Mouth

By analyzing the purchase behavior of different household goods, Katz and Lazarsfeld (1955) were the first researchers to find evidence that word-of-mouth (WOM) is the most influential source for consumers to switch a brand, compared to several other means of advertising such as newspapers or radio. WOM is a communication process conveying details about and experiences with a product or service among consumers. Because of the sender’s independence of the market, WOM is considered more trustworthy than commercial advertisements or salespersons’ advice (Brown, Broderick, and Lee 2007, p. 4; Jansen et al. 2009, p. 2169; Liu 2006, p. 74). With social media applications, WOM is no longer spread only among friends, colleagues, and family members “offline” in a face-to-face manner. Consumers now exchange experiences, recommendations, and knowledge with strangers online by generating OWOM. Review sites, blogs, or social networks, for example, are used to obtain and create accounts of product or service experiences. Hence, Dhar and Chang (2009, p. 303) value OWOM generated by consumers as “the truest form of word of mouth.” Furthermore, OWOM allows exchanging information anonymously or confidentially, which is why it is difficult to control. Thus, OWOM is very important for corporations and organizations with regard to brand management (Jansen et al. 2009, p. 2169).

In the following, the different forms of OWOM and their technical appearance are introduced and explained.

4.2.1 Appearance of Online Word-of-Mouth

OWOM appears in different forms based on different social media applications on the Internet. Users are able to share their product or service experience in a graphical way or can give advice to other consumers in the form of free text. But not only product experiences are shared online. Trading advice for stock traders can be found within social media applications, as can cooking or car maintenance instructions.

The following sections describe the most common OWOM sources, which are also the sources used in the empirical studies analyzed later, to give a basic understanding of how OWOM appears today.

4.2.1.1 Scale Rating

Scale rating gives online users the opportunity to rate a specific product or service quickly and easily. In the form of a graphical star scale (e.g., Amazon.com, ebay.com) or a numeric scale (e.g., Pitchforkmedia.com), users can rate their product or service experiences. Commonly, the scale ranges from 1 to 5 stars, but scales from 1 to 10 stars can also be found online; the fewer stars are given, the worse the consumer’s product or service experience. Scale ratings indicate the valence of a review, that is, whether it is positive, negative, or neutral, and they can be interpreted by users easily and without much effort (Dhar and Chang 2009, p. 303; Chevalier and Mayzlin 2006, p. 346; Moe and Trusov 2011, p. 445-446). Figure 6 shows a typical 5-star scale of Amazon.com.

Figure 6: Amazon’s 5-Star Scale Example

illustration not visible in this excerpt

Screenshot of 5-star scale on www.amazon.com[1] (Accessed: August 1, 2012)
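How a scale rating translates into a coarse valence label can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited studies: the function name is hypothetical, and treating the scale midpoint as neutral is an assumption, since the text only states that a rating can be positive, negative, or neutral.

```python
from statistics import mean


def star_valence(stars: int, scale_max: int = 5) -> str:
    """Map a star rating to a coarse valence label.

    Assumption (not from the source): ratings above the scale midpoint
    count as positive, ratings below it as negative, and a rating exactly
    at the midpoint as neutral.
    """
    if not 1 <= stars <= scale_max:
        raise ValueError(f"rating must be between 1 and {scale_max}")
    midpoint = (1 + scale_max) / 2
    if stars > midpoint:
        return "positive"
    if stars < midpoint:
        return "negative"
    return "neutral"


# Hypothetical ratings for one product on a 5-star scale.
ratings = [5, 4, 1, 3, 5]
labels = [star_valence(r) for r in ratings]
average_rating = mean(ratings)

print(labels)          # ['positive', 'positive', 'negative', 'neutral', 'positive']
print(average_rating)  # 3.6
```

Note that under this midpoint rule a 10-star scale has no neutral rating, because its midpoint (5.5) is not a valid star value; a study would have to define its own neutral band.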

4.2.1.2 Tweets

In contrast to scale ratings, where only a positive, negative, or neutral opinion can be expressed, Twitter, the most popular online microblogging application, allows users to express their thoughts, feelings, and opinions in short comments (tweets). Tweets are limited to 140 characters, including hyperlinks, and are sent to connected friends (followers) and the public via instant messages, the Web, cell phones, or e-mail (Jansen et al. 2009, p. 2170). As a special type of weblog, microblogs also allow users to create profiles and share interests, thoughts, and feelings in the form of tweets. Furthermore, users can connect with other users, celebrities, or companies by following their messages or news postings and communicating with them (retweet) (Zhang, Fuehres, and Gloor 2011, p. 55). The significance of tweets for business, marketing, and research purposes is suggested by the number of tweets that are posted each day. By June 2011, 200 million tweets were being posted on Twitter each day, containing personal thoughts, feelings, opinions, and emotions as well as professionally created content with a commercial purpose (Twitter 2011). The brevity of the messages, in particular, leads to a higher posting frequency compared with blog posts. Furthermore, the high availability of posts and the flexibility of posting from almost anywhere make Twitter unique among all OWOM sources (Jansen et al. 2009, p. 2170). However, as Go, Bhayani, and Huang (2009, p. 2) found, precisely this flexibility leads to more misspellings and more use of slang compared with other OWOM sources. Furthermore, they identified an average tweet length of 14 words or 78 characters.

The election of the German Federal President illustrates the attention that is paid to tweets as well as their high flexibility and availability. During the election in 2010, a member of the German Parliament posted the final result on Twitter before the German Parliament officially announced it to the public. Within seconds, media outlets spread the news they had received via Twitter without making use of their usual news sources.

Researchers from different disciplines are also paying increasing attention to Twitter and its tweets as a valuable source of data (Jansen et al. 2009; Bollen, Mao, and Zeng 2010; Zhang, Fuehres, and Gloor 2011).

4.2.1.3 Review Texts

Review texts appear in growing numbers within different forms of social media applications as well as on Web sites (Dellarocas 2003, p. 1408; Tang, Tan, and Cheng 2009, p. 10761). Web pages like epinions.com or consumerreports.org focus solely on consumer reviews, ranging from electronics, cars, and appliances to travel and music. Amazon.com, the well-known online retailer, combines its products with reviews written by consumers. Online review texts reflect customers’ experiences, thoughts, and advice on products or services. According to Moe and Trusov (2011, p. 444), online reviews may help consumers exchange experiences and facilitate purchase decisions for the undecided, and they can be seen as a “sales assistant”. As mentioned above, review texts can appear in combination with a scale rating or solely as text. Two review formats can be found online. The first is a structured format in which users have to describe the pros and cons of a product separately, followed by a summary; this format is requested on cnet.com and epinions.com. The second lets consumers formulate review texts without any restrictions on text length or guidelines for content; this format is used by Amazon.com (Liu, Hu, and Cheng 2005, p. 343). In the free format, consumers can thus express their opinions, feelings, thoughts, and experiences in depth. Furthermore, some sites allow pictures and videos to be uploaded to supplement the free text (e.g., Amazon.com), and some let the readership evaluate review texts in the form of comments (e.g., epinions.com) or ratings (e.g., Amazon.com). Given the vast number of online reviews, this evaluation of reviews allows a fast and uncomplicated search for useful, high-quality reviews. Review texts are a very strong source of OWOM with a considerable influence on other users. Figure 7 shows a typical review text taken from Amazon.com.

Figure 7: Amazon’s Review Text Example

illustration not visible in this excerpt

Screenshot of a review text on www.amazon.com[2] (Accessed: August 1st 2012)

4.2.1.4 Blog Posts

Mishne and Glance (2005, p. 155) describe blog posts as the “voice of the public” because of their comprehensive subjects and discussions that can include “a wide range of opinions and commentary about products.” Blogs therefore represent the public opinion of millions of customers, as the following example shows (Liu et al. 2007, p. 607). In August 2005, the well-known US blogger Jeff Jarvis expressed on his blog buzzmachine.com his disappointment with the product quality and customer service of the US-based computer hardware manufacturer Dell. His post spread through the Web like wildfire and received over 700 comments.[3] This negative OWOM was particularly bad publicity for Dell and shows the power of OWOM (Mishne and Glance 2005, p. 155).

More generally, blog posts can include OWOM, but unlike review texts or review sites they are not specifically focused on sharing experiences and giving advice. Blogs tend instead to focus on special interests and to cover general issues related to these interests. Still, because bloggers express their opinions, blog posts are a valuable source of OWOM (Liu et al. 2007, p. 607).

4.2.2 Forms of Online Sentiments

Classifying the huge amount of data that online users generate every day is a challenging, nearly impossible task. Classification or clustering methods based on subjects or keywords neglect the sentiments expressed within OWOM (Feng et al. 2011, p. 281). The sources of OWOM introduced above have in common that each post, tweet, or article reflects sentiments through the thoughts and feelings it states (Mishne and Glance 2005, p. 155). Sentiments can express the “overall opinion towards the subject matter – for example whether a product review is positive or negative.” (Pang, Lee, and Vaithyanathan 2002, p. 79) In addition to classifying sentiment as a positive, negative, or neutral attitude, researchers also try to identify and classify attitudes in a finer-grained manner using emotions such as worried, happy, or anxious. The following subsections illustrate in which form and to what extent sentiments appear online. They focus on the forms that are used most often in the sentiment classification literature.

4.2.2.1 Volume

Although volume is not a sentiment, it is regularly used in the context of sentiment mining. In this context, volume refers to how frequently a specific rating or post appears and shows the degree of attention a product or service receives (Tirunillai and Tellis 2012, p. 202). In the case of ratings or reviews, for example, volume counts the number of ratings or postings that contain the same sentiment or opinion (Moe and Trusov 2011, p. 445; Duan, Gu, and Whinston, p. 1008), or, as Liu (2006, p. 75) states: “Volume measures the total amount of WOM interactions.” Even though other measures exist, such as dispersion, intensity, or duration, only volume is introduced here because it is a more important measure than the others (Liu 2006, p. 76). In line with the title of this thesis, volume can be seen as a coarse measurement.
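To make the notion of volume concrete, the following minimal sketch counts OWOM messages per product and, within one product, per sentiment label. The data, the product names, and the function names `volume_by_product` and `volume_by_sentiment` are hypothetical illustrations and are not taken from the cited studies.

```python
from collections import Counter

# Hypothetical toy data: (product, sentiment label) pairs.
posts = [
    ("kindle", "positive"),
    ("kindle", "negative"),
    ("kindle", "positive"),
    ("tv", "neutral"),
]


def volume_by_product(posts):
    """Total number of OWOM messages per product (overall volume)."""
    return Counter(product for product, _ in posts)


def volume_by_sentiment(posts, product):
    """Number of messages per sentiment label for a single product."""
    return Counter(label for p, label in posts if p == product)


total = volume_by_product(posts)
kindle = volume_by_sentiment(posts, "kindle")
print(total["kindle"], kindle["positive"], kindle["negative"])
```

Both functions only count messages. They say nothing about how strongly an opinion is held, which is why volume is treated here as a coarse measurement.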

4.2.2.2 Valence

Besides volume, valence is one of the most important measures of OWOM; it denotes whether an OWOM message is positive, negative, or neutral (Liu 2006, p. 75). Since valence can be expressed through opinions, feelings, and thoughts, different methods and techniques exist to identify valence within OWOM. Those techniques will be introduced in the following subchapter. The easiest and fastest way to extract the valence from OWOM is to use numerical or graphical ratings, such as 5-star ratings (Chevalier and Mayzlin 2006, p. 345; Forman, Ghose, and Wiesenfeld 2008, p. 293). Because valence measures are either positive, negative, or neutral, valence cannot capture the intensity of a sentiment. Therefore, within this thesis, valence is understood as a coarse measurement of online sentiment.
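Extracting valence from a star rating can be sketched as a simple mapping. The thresholds below are an assumption made for illustration and are not prescribed by the cited literature: the scale midpoint is treated as neutral, ratings above it as positive, and ratings below it as negative. The function name `valence_from_stars` is likewise hypothetical.

```python
def valence_from_stars(stars, scale_max=5):
    """Map a star rating to a coarse valence label.

    Assumption (not from the cited literature): the scale midpoint
    counts as neutral, ratings above it as positive, and ratings
    below it as negative.
    """
    if not 1 <= stars <= scale_max:
        raise ValueError("rating lies outside the scale")
    midpoint = (1 + scale_max) / 2
    if stars > midpoint:
        return "positive"
    if stars < midpoint:
        return "negative"
    return "neutral"


for stars in (1, 3, 4, 5):
    print(stars, valence_from_stars(stars))
```

Under this mapping, a 4-star and a 5-star rating both yield "positive", which illustrates why valence alone cannot express intensity. On a 10-star scale the midpoint is 5.5, so no whole-star rating would be classified as neutral under the same assumption.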

[...]


[1] http://www.Amazon.com/Kindle-eReader-eBook-Reader-e-Reader-Special-Offers/dp/B0051QVESA/ref=sr_tr_sr_1?ie=UTF8&qid=1344162228&sr=8-1&keywords=kindle

[2] http://www.Amazon.com/Samsung-UN46EH6000-46-Inch-1080p-HDTV/product-reviews/B0071O4EKU/ref=sr_1_cc_1_cm_cr_acr_txt?ie=UTF8&showViewpoints=1

[3] www.buzzmachine.com/2005/08/17/dear-mr-dell/

Excerpt out of 80 pages

Details

Title
From Valence to Emotions
Subtitle
How Coarse versus Fine-Grained Online Sentiment Can Predict Real-World Outcomes
College
University of Cologne  (Lehrstuhl für Handel und Kundenmanagement)
Course
Business economics
Grade
1,7
Author
Year
2012
Pages
80
Catalog Number
V215495
ISBN (eBook)
9783656440918
ISBN (Book)
9783656443261
File size
1243 KB
Language
English
Quote paper
Robert Kohtes (Author), 2012, From Valence to Emotions, Munich, GRIN Verlag, https://www.grin.com/document/215495
