This study examines how positive and negative terms in the titles of YouTube videos influence user behavior in terms of views, likes, dislikes and comments. For this purpose, daily records of the top trending YouTube videos in Germany were analyzed. It was found that positive terms have a positive influence on liking and viewing trend videos, whereas negative terms have an influence on disliking and commenting. Furthermore, it was examined which words were used most frequently in successful and less successful trend videos. The study shows that YouTube is used for the consumption of entertainment series, music videos and sports content. In addition, videos with Turkish titles make up a significant part of the best-placed YouTube videos in Germany. These results were obtained via chi-squared tests and word clouds.
YouTube is a platform for uploading and consuming video content. In addition, this content can be shared, liked and disliked. Founded in 2005, YouTube is currently the third most visited website in the world. Videos can be uploaded by amateurs, companies, media agencies and others. Above all, the growing use of mobile devices with larger data allowances has led to increased video consumption.
The video portal is used by one billion active consumers every month. On average, 14- to 49-year-olds in Germany used YouTube for eleven minutes per day in the fourth quarter of 2016. The total proportion of YouTube users in Germany in 2016 was 69 percent. More than half of the young people consumed YouTube in the form of music videos several times a week in the same year. Let's play videos are also very popular among this age group. 
YouTube offers the possibility to access trend videos directly via a tab, without a dedicated search. These include not only the music and Let's Play videos mentioned above, whose success is more likely due to high demand, but also unexpected viral videos. The trend list tries to address a broad range of consumers. It does not depend on a user's previous usage; instead, the same trend list is presented to all users in a country. The ranking of the trend videos is updated every 15 minutes. The higher a video is placed, the better its position. Google, the parent company of YouTube, names the following four criteria that determine which videos end up in the trends:
- Number of views
- Non-misleading titles regarding content
- Content which shows what is currently relevant in the world
- Novelty or a surprise effect
Moreover, no YouTube content creator is preferred in the selection of the trend videos. Payments for a placement are also not possible. 
What motivates users to consume videos on YouTube and to participate actively in the community is a phenomenon that needs to be analyzed. In this paper, the usage behavior of YouTube viewers is therefore analyzed and evaluated in a manner similar to a sentiment analysis. To this end, we examined the titles of YouTube trend videos. Companies, for example, can use these evaluations to their advantage in marketing. One approach is to determine the relationship between positive and negative terms and the success of a video in the trends. In addition, the frequency of individual words relative to each other is considered and evaluated. The goal is the successful placement of YouTube videos based on the video title.
The relevance of this objective results from the potential increase in efficiency of a company's YouTube marketing, in addition to the psychological aspect. As a result, one can also draw conclusions about politically motivated slogans. This is particularly relevant for video producers who are trying to maximize the number of views. The research question is therefore: Are there certain terms that contribute to a video being particularly successful in the trends? Successful in this work means that a trend video stands out from the others in terms of the number of views, likes, dislikes and comments.
This paper is organized as follows. Section 2 describes the state of the art of current research in this field. Section 3 describes the methodology used to achieve the goals of this paper. The research results are presented in Section 4, and the findings are discussed in Section 5. Section 6 concludes this paper with a short summary and avenues for future research.
2. Literature review
Sentiment analysis examines subjective information expressed in language, such as attitudes and opinions. Different methods can be used for this. The beginning of sentiment analysis can be traced back to the period after World War II and was politically influenced. Around the turn of the twenty-first century, the popularity of this method increased greatly. 99% of all papers on sentiment analysis have been published after 2004. By 2017, nearly 7,000 papers on sentiment analysis had been published.
Furthermore, there is already research on popularity prediction on YouTube, e.g. Figueiredo, Flavio: On the prediction of popularity of trends and hits for user generated videos, or Figueiredo, Flavio et al.: The tube over time: Characterizing popularity growth of YouTube videos. However, this research usually deals with questions such as how popularity evolves over time, how to predict the future popularity of individual videos, or how to characterize the growth patterns of video popularity. On the other hand, there are hardly any papers dealing with the influences that contribute to videos becoming so successful that they are listed in the YouTube trends.
3.1. Identifying successful topics
In this paper various methods are used to identify successful topics in trend videos: firstly, chi-squared tests are carried out to check whether there is a significant correlation between the success of a video and the polarity of the terms in the video titles so that recommendations can be derived for the creators of videos. Secondly, multiple word clouds, visual representations of the most prominent terms in text data, are created to determine the most common words in the titles of successful and non-successful videos so that interesting trends can be recognized. These methods are described in more detail below.
First, the chi-squared tests. The following datasets are to be used:
- a daily record of the top trending YouTube videos with different attributes and
- a data set containing words bearing positive and negative connotations
As preparatory work, these datasets are pre-processed (described in more detail in Section 3.3). They are then combined into a joint data set in order to filter for those videos that are successful or unsuccessful and at the same time contain positive or negative terms in their titles.
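The combination step can be illustrated with a minimal Python sketch. This is not the authors' R code: the word sets and titles below are made-up placeholders, and the function name `title_polarity` is hypothetical.

```python
# Hypothetical miniature word lists standing in for the SentiWS word forms.
POSITIVE = {"gut", "liebe", "gewinn"}
NEGATIVE = {"schlecht", "angst", "verlust"}


def title_polarity(title, positive=POSITIVE, negative=NEGATIVE):
    """Return (has_positive, has_negative) for a pre-processed title."""
    tokens = set(title.lower().split())
    return bool(tokens & positive), bool(tokens & negative)


# Made-up titles; each one is flagged for both polarities independently.
titles = ["liebe auf den ersten blick", "angst vor dem verlust", "neues video"]
flags = {t: title_polarity(t) for t in titles}
```

Keeping two independent flags means a title that contains both a positive and a negative term is counted in both groups, which is one possible reading of the "at least one term" criterion.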
The videos defined as successful are those whose count in a given category exceeds the arithmetic mean of all videos in the data set for that category. The following categories are considered:
- Number of views of a video (attribute views)
- Number of positive ratings of a video (attribute likes)
- Number of negative ratings of a video (attribute dislikes)
- Number of comments on a video (attribute comment_count)
Videos with numerous dislikes are also considered successful, as a high number of negative ratings indicates a great deal of attention that the video has generated.
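The success criterion above can be sketched in a few lines of Python. The records are invented for illustration, the attribute names follow the data set, and `mark_success` is a hypothetical helper rather than part of the original analysis.

```python
from statistics import mean

# Made-up example records; the attribute names follow the data set.
videos = [
    {"title": "a", "views": 100, "likes": 10, "dislikes": 1, "comment_count": 5},
    {"title": "b", "views": 900, "likes": 20, "dislikes": 9, "comment_count": 1},
    {"title": "c", "views": 200, "likes": 60, "dislikes": 2, "comment_count": 3},
]
CATEGORIES = ("views", "likes", "dislikes", "comment_count")


def mark_success(records, categories=CATEGORIES):
    """Map each category to the titles whose value exceeds the category mean."""
    result = {}
    for cat in categories:
        threshold = mean(r[cat] for r in records)
        result[cat] = [r["title"] for r in records if r[cat] > threshold]
    return result


successful = mark_success(videos)
```

Success is evaluated separately per category, so the same video can be successful for views but unsuccessful for likes.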
Thus, to carry out the chi-squared tests, four 2x2 contingency tables are created, each of which contains the following characteristics (see Tables 1-4):
- Polarity of a video with the two levels:
"Video contains at least one positive term in the title" and "Video contains at least one negative term in the title"
- Success of a video with the two levels:
"Video is successful" and "Video is not successful"
Then the expected frequency of each cell, i.e. (row total × column total) / grand total, is determined. Based on this, the non-Yates-corrected chi-squared values and p-values are calculated in order to find correlations between positive or negative words and the success of a trend video.
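The calculation can be sketched as follows in plain Python (the original analysis was carried out in R, so this is an illustration, not the authors' code). The function `chi_squared_2x2` is a hypothetical helper; it computes the expected frequencies, the chi-squared statistic with an optional continuity correction, and the p-value for one degree of freedom. The observed counts are those reported for the attribute views in Section 4.1.

```python
import math


def chi_squared_2x2(table, yates=False):
    """Chi-squared test for a 2x2 table given as [[a, b], [c, d]].

    Returns (expected, statistic, p_value). With one degree of freedom,
    the upper-tail p-value equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected frequency of a cell: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = 0.0
    for obs_row, exp_row in zip(table, expected):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            statistic += diff ** 2 / exp
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value


# Rows: positive / negative term; columns: successful / not successful.
views_table = [[677, 6176], [286, 3484]]
expected, chi2_plain, p_plain = chi_squared_2x2(views_table)
_, chi2_yates, p_yates = chi_squared_2x2(views_table, yates=True)
```

With these counts, the expected frequencies round to 621, 6232, 342 and 3428. The uncorrected statistic comes out at about 15.51 and the continuity-corrected one at about 15.23; in both cases the p-value is below 0.1%.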
Next, for the identification and illustration of important terms in the video titles, several word clouds are created. To this end, the column views, which indicates the number of views of a video, is used to divide the data set into four quartiles. The reason for the selected partitioning is the density distribution of all views, as shown below:
[Figure not included in this excerpt]
Figure 1. Density plot of attribute views (interval 0 to 1,000,000)
The abscissa shows the number of views up to a maximum of 1,000,000 within the data set; the ordinate shows the density of the view counts. The distribution is clearly right-skewed, since most of the trend videos have fewer than 200,000 views. The data are thus unevenly distributed. For this reason, the videos in the word clouds are divided according to their quartiles (Q.25, Q.50 a.k.a. median and Q.75) and not according to the arithmetic mean.
Finally, the 100 most common terms of the lower quarter of the data set (views < Q.25) as well as the upper quarter (views > Q.75) are determined. These terms are then plotted in two different word clouds according to their quartiles so that trends in the videos can be determined.
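A minimal Python sketch of this quartile split and term count is given below. The titles and view counts are invented, `top_terms` is a hypothetical helper, and the interpolation method used for the quartiles is an assumption, since the text does not state which one was applied.

```python
from collections import Counter
from statistics import quantiles

# Made-up (title, views) pairs; titles are assumed to be pre-processed.
videos = [
    ("musik video neu", 50), ("musik live", 80), ("sport heute", 120),
    ("sport finale", 300), ("serie folge", 700), ("serie finale", 900),
    ("musik finale", 1500), ("serie folge neu", 4000),
]

views = [v for _, v in videos]
# n=4 yields the three cut points Q.25, Q.50 (median) and Q.75.
q25, q50, q75 = quantiles(views, n=4, method="inclusive")


def top_terms(records, keep, limit=100):
    """Count the terms of all titles whose view count satisfies `keep`."""
    counts = Counter()
    for title, n_views in records:
        if keep(n_views):
            counts.update(title.split())
    return counts.most_common(limit)


lower_terms = top_terms(videos, lambda v: v < q25)  # views < Q.25
upper_terms = top_terms(videos, lambda v: v > q75)  # views > Q.75
```

The two resulting lists of (term, frequency) pairs are the kind of input a word cloud generator would take, one list per cloud.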
3.2. Data sets
The first data set used in this paper is a daily record of the top trending YouTube videos and includes several months of data on daily trending videos uploaded on YouTube. Data is included for five countries: United States, Great Britain, Germany, Canada and France. Each country maintains its own trend list with up to 200 listed trending videos per day.
Each region’s data is stored in a separate CSV (comma-separated values) file. For every video, the data set includes the attributes video title, channel title, publish time, tags, views, likes, dislikes, video description and comment count. The data set was collected using the YouTube API (Application Programming Interface).
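Reading such a regional file could look roughly like the following Python sketch. The inline sample replaces a real file so that the example is self-contained; the rows are made up, and the column names `title` and `channel_title` are assumptions, while `views`, `likes`, `dislikes` and `comment_count` are the attribute names used in this paper.

```python
import csv
import io

# Tiny inline sample standing in for a regional CSV file (rows are made up).
SAMPLE = """title,channel_title,views,likes,dislikes,comment_count
Erstes Video,Kanal A,1200,40,3,7
Zweites Video,Kanal B,560,12,1,0
"""

NUMERIC = ("views", "likes", "dislikes", "comment_count")


def read_videos(handle):
    """Read trend-video rows and convert the count columns to integers."""
    rows = []
    for row in csv.DictReader(handle):
        for column in NUMERIC:
            row[column] = int(row[column])
        rows.append(row)
    return rows


videos = read_videos(io.StringIO(SAMPLE))
# For a real file, pass open(path, encoding="utf-8", newline="") instead.
```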
The data set was made available on Kaggle.com, a platform for predictive modelling and analytics competitions where data sets are regularly uploaded and made public for data science purposes. The data set is regularly updated with more recent trend videos. For the purposes of this paper, version 71 of the German data set was used, which contains the German trend videos from November 14, 2017 to May 3, 2018. All in all, the data set contains approximately 34,000 trend videos.
The data set was released under CC0: Public Domain, which means it is allowed to copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. 
Furthermore, in order to find out which of the words in the video titles contain positive and negative connotations, the data set called SentimentWortschatz, in the following abbreviated to SentiWS, was used.
SentiWS can be freely downloaded on Kaggle.com as well and is a publicly available German-language resource that can be used for sentiment analysis or opinion mining, among other purposes.
SentiWS contains two lists of words: one includes those bearing positive and the other one those bearing negative polarity. Every word is weighted within the interval of [-1; 1] and is annotated with its part-of-speech tag and, if applicable, its inflections, e.g. “Kompliment”, “Komplimente”, “Kompliments”, “Komplimentes” and “Komplimenten” (English: “compliment”).
The current version of the data set comprises 1,650 positive and 1,818 negative words. Including inflections, this amounts to 15,649 positive and 15,632 negative word forms. SentiWS is organised in two UTF-8 (Unicode Transformation Format) encoded text files and was last updated in March 2012.
SentiWS was first published in the following paper: R. Remus, U. Quasthoff & G. Heyer: SentiWS - a Publicly Available German-language Resource for Sentiment Analysis. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10), 2010.
Finally, the data set is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. This means that the data set can be remixed, transformed and built upon under the condition that appropriate credit is given to the author and it is not used for commercial purposes.
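Turning such a file into a set of word forms might look like the Python sketch below. The line layout `word|POS<TAB>weight<TAB>comma-separated inflections` is an assumption about the file format and should be checked against the downloaded files; the weight in the sample line is a placeholder, not the real SentiWS value, and both function names are hypothetical.

```python
def parse_sentiws_line(line):
    """Parse one line of the assumed layout 'word|POS<TAB>weight<TAB>forms'."""
    parts = line.rstrip("\n").split("\t")
    word, pos_tag = parts[0].split("|")
    weight = float(parts[1])
    # The inflection column is treated as optional.
    inflections = parts[2].split(",") if len(parts) > 2 and parts[2] else []
    return word, pos_tag, weight, inflections


def word_forms(lines):
    """Collect the lowercase base forms and inflections of all entries."""
    forms = set()
    for line in lines:
        word, _, _, inflections = parse_sentiws_line(line)
        forms.update(f.lower() for f in [word, *inflections])
    return forms


# Illustrative line; the weight is a placeholder value.
sample = ["Kompliment|NN\t0.5\tKomplimente,Kompliments,Komplimentes,Komplimenten"]
positive_forms = word_forms(sample)
```

Lowercasing the word forms here keeps them comparable with video titles that have been converted to lowercase during data preparation.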
3.3. Data preparation
Several steps are carried out to prepare the two data sets for the subsequent analysis. First of all, it is necessary to check that both data sets are encoded in the UTF-8 format, so that special characters such as ß and umlauts such as ä, ö, ü, which are characteristic of German, are recognized correctly when reading the data. For this purpose, the freely downloadable program Notepad++ was used. Another important point in the preparation of the data is the conversion of words from the former German spelling into the new spelling, e.g. “Entschluß” (old) to “Entschluss” (new).
After the two data sets have each been read into the statistical computing program RStudio, any numbers and blank lines that still occur are removed from the data set. Afterwards, punctuation marks are removed and all words are converted to lowercase letters, so that the frequency of a word is determined regardless of upper and lower case. Another important step is to remove German and English stopwords from the data set, i.e. words such as “der”, “die”, “das” and “the”, from which no relevant insights can be drawn.
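The cleaning steps can be sketched in Python as follows. This is an illustration rather than the R code used in the study; the stopword set is a small made-up sample, and `preprocess` is a hypothetical helper.

```python
import re

# Small illustrative stopword set; a real run would use full German and
# English stopword lists.
STOPWORDS = {"der", "die", "das", "the", "und", "and"}


def preprocess(title, stopwords=STOPWORDS):
    """Lowercase a title, drop digits and punctuation, remove stopwords."""
    text = title.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    text = text.replace("_", " ")         # \w also matches the underscore
    return [tok for tok in text.split() if tok not in stopwords]


tokens = preprocess("Die 10 größten Überraschungen: Das Finale!")
```

Because Python's `\w` is Unicode-aware for text strings, umlauts and ß survive the punctuation filter, which matters for German titles.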
4.1. Chi-squared tests
The results of the chi-squared tests described above are presented below. For those videos where success was defined as a high number of views, the test showed the following results:
Table 1. Chi-squared test for attribute views
[Table not included in this excerpt]
Of the videos with at least one positive term in the title, 677 are successful and 6176 are unsuccessful. Of the videos with at least one negative term in the title, 286 are successful and 3484 are unsuccessful.
On the other hand, those videos with at least one positive term in their titles were expected to have 621 successful ones and 6232 unsuccessful ones. For the videos with at least one negative term in their titles, 342 successful ones and 3428 unsuccessful ones were expected. In this case, χ² is 15.231 and the p-value is below 0.1%, so there is a significant correlation.
For those videos where success was defined as a high number of likes, the chi-squared test showed the following results:
Table 2. Chi-squared test for attribute likes
[Table not included in this excerpt]
Of the videos with at least one positive term in the title, 444 are successful and 6409 are unsuccessful. Of the videos with at least one negative term in the title, 321 are successful and 3449 are unsuccessful.
In contrast, those videos with at least one positive term in their titles were expected to have 494 successful ones and 6359 unsuccessful ones. For the videos with at least one negative term in their titles, 271 successful ones and 3499 unsuccessful ones were expected. In this case, χ² is 14.778 and the p-value is below 0.1%, so there is also a significant correlation.
For those videos where success was defined as a high number of dislikes, the chi-squared test showed the following results:
Table 3. Chi-squared test for attribute dislikes
[Table not included in this excerpt]
Of the videos with at least one positive term in the title, 348 are successful and 6505 are unsuccessful. Of the videos with at least one negative term in the title, 307 are successful and 3463 are unsuccessful.
However, those videos with at least one positive term in their titles were expected to have 423 successful ones and 6430 unsuccessful ones. For the videos with at least one negative term in their titles, 232 successful ones and 3538 unsuccessful ones were expected. In this case, χ² is 34.966 and the p-value is below 0.1%, so the correlation is even more clearly significant.
Citation: Robert Komorowsky (Author), 2018, Analysis of Terms Which Contribute to the Success of YouTube Trend Videos, Munich, GRIN Verlag, https://www.grin.com/document/437786