
## Introduction

Roughly a year ago, the US population elected the billionaire Donald Trump as their President, making him one of the most powerful men on the planet. He is well known for numerous scandals and provocative statements, but also for his frequent use of the social media platform Twitter. Many journalists discuss whether Donald Trump's online activities may harm or strengthen the US economy and its markets.

Trump's term of office has been accompanied by many controversies: North Korea, fake news, the Muslim ban, and the replacement of people in important political positions are only a few of the subjects that are broadly discussed. According to North Korea's foreign minister, Donald Trump's Twitter posts even amount to a declaration of war. It is debatable whether his social media activity should be regarded as official presidential statements or not.

Without a doubt, Donald Trump has a certain influence on world affairs. In this context, we tried to analyze whether there is any relationship between Donald Trump's Twitter posts and indicators of the US economy.

## Tidy Data and Text Mining

#### Trump Twitter Posts

First, we searched for all posts by Donald Trump since election day, November 8, 2016. This date was chosen because becoming president-elect normally boosts a person's impact on society drastically. As a next step, we consulted a database that included all the posts we wanted, even those which had been deleted after posting. Among other things, the database listed the time, retweets, favorites and wording of each post. To make use of this data, we applied text mining methods provided by the `tidytext` and `tm` packages. By combining the methods of these packages, we were able to process our data efficiently.

After counting single-word occurrences, we created a word cloud using the `wordcloud` package and plotted a bar chart showing which words Donald Trump used most since election day.

Furthermore, we looked at word combinations, namely bigrams, trigrams and quadgrams (combinations of two, three and four words), and created bar charts showing the combinations that were used most frequently.
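As a minimal illustration of how such n-gram counts can be produced, the base-R sketch below counts bigrams in a handful of invented example tweets. It is not the `tidytext` pipeline we actually used, only a demonstration of the counting idea:

```r
# Count n-grams in a character vector of tweets (base-R sketch).
# The example tweets below are invented for illustration.
tweets <- c("make america great again",
            "america great again today",
            "fake news media")

count_ngrams <- function(texts, n = 2) {
  grams <- unlist(lapply(strsplit(tolower(texts), "\\s+"), function(w) {
    if (length(w) < n) return(character(0))
    sapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "))
  }))
  sort(table(grams), decreasing = TRUE)
}

bigrams <- count_ngrams(tweets, n = 2)
head(bigrams)  # "america great" and "great again" each occur twice
```

The same function with `n = 3` or `n = 4` yields the trigram and quadgram counts that feed the bar charts.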

```{r, include=FALSE }

library("tidyverse")

library("tidytext")

library("gutenbergr")

library("wordcloud")

# load data ---------------------------------------------------------------

load("tidywords.Rdata")

load("Plots/LMT_plot1.RData")

t_tweet = as.POSIXct("2016-12-22 17:26:00")  # tweet under analysis

# Creation of Wordcloud ---------------------------------------------------

set.seed(1001)
```

```{r, echo=FALSE}

# par(mfrow=c(1,2))

wordcloud(words = tidywords$word, freq = tidywords$freq, min.freq = 12,

max.words=80, random.order=FALSE, rot.per=0.35,

colors=brewer.pal(8, "Dark2"),scale = c(0.7,0.7))

tidywords %>%

filter(freq >= 100) %>%

mutate(word = reorder(word, freq)) %>%

ggplot(aes(word, freq)) +

geom_col() +

xlab(NULL) +

ylab("frequency") +

theme_bw() +

labs(title = "Word count of Trump tweets (n > 100)") +

coord_flip()
```

#### Trump Tweets - Predictor Variables

After looking at the text Trump had written during the past year, we started preparing our predictor matrix. To the quantified variables that were already included (`favorite_count`, `retweet_count`), we added a column counting the number of posts. Additionally, for every tweet we counted the number of words, exclamation marks, question marks, full stops and characters. Finally, we used our text mining data to create a text variable containing the count of the most frequently used words per post. We named the resulting dataset `TT_summary`.

After processing the data, we were able to aggregate it over an arbitrary time horizon. Since we expected to use daily rates, we decided as a starting point to aggregate the data on a daily level, creating a "per day" predictor matrix.
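The daily aggregation step can be sketched as follows in base R. The toy data and column names are illustrative stand-ins for our per-tweet variables, not the real dataset:

```r
# Aggregate per-tweet variables into a "per day" predictor matrix (sketch).
# The toy data is invented; column names mirror our real variables.
tweets <- data.frame(
  day            = as.Date(c("2016-12-22", "2016-12-22", "2016-12-23")),
  retweet_count  = c(12000, 45000, 9000),
  favorite_count = c(30000, 90000, 20000),
  n_words        = c(18, 24, 10)
)

# Sum the count variables per day and add the number of posts per day.
per_day <- aggregate(cbind(retweet_count, favorite_count, n_words) ~ day,
                     data = tweets, FUN = sum)
per_day$n_posts <- as.vector(table(tweets$day))
per_day
```

Changing the grouping variable (e.g. to week) would aggregate the same matrix over a different time horizon.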

#### Economic Data - Response Variables

As response variables, we chose the Standard & Poor's 500 [SP500] and the Dow Jones Industrial Average [DJ] as domestic stock indices representing the US economy. For each index, we looked at the daily value, volume and intraday range, as well as the corresponding growth rates. These are the most common stock indices of the US economy and differ by the number of companies included. As external variables, we took a closer look at the USD/JPY exchange rate and the USD trade-weighted exchange rate, as well as their corresponding growth rates. The JPY is known for its safe-haven status; we therefore hoped to eliminate result dilutions caused by exchange rate volatility.

## Linear Forecaster

Initially, we decided that, given the structure of the dependent and independent variables, forecasting with multiple linear regression models might supply first results. We used the SP500 and the DJ as our first set of dependent variables, which we called `D1`.

Surprisingly, after a few tests, some significant p-values were found in the regression outcomes. On the other hand, the F-statistics only provided values close to one, a first indication that under these circumstances there might be no significant relationship between the tweet variables and the US economic data. While performing a forward selection of the variables, we observed that no variable added real value to the model or strongly improved the R² value. The resulting R² values ranged between 0.03 and 0.09, and the adjusted R² values were partly negative. These results show that our dataset might have no explanatory power. Thus, we decided to build a forecaster with all variables, reasoning that the risk of potential overfitting might not be large given the corresponding R² values.
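The train/test evaluation behind these numbers can be sketched like this in base R. The data is simulated (standardized predictors, a response unrelated to them, as our results suggest for the real data); the variable names are illustrative:

```r
# Fit a linear forecaster on a training split and compare train/test MSE.
# Simulated data stands in for the scaled tweet predictors and index returns.
set.seed(42)
n <- 200
X <- data.frame(retweets = rnorm(n), favorites = rnorm(n), words = rnorm(n))
y <- rnorm(n)  # a response unrelated to the predictors

train <- 1:150
fit <- lm(y ~ ., data = cbind(X, y = y), subset = train)

mse <- function(obs, pred) mean((obs - pred)^2)
train_mse <- mse(y[train], predict(fit, X[train, ]))
test_mse  <- mse(y[-train], predict(fit, X[-train, ]))
c(train = train_mse, test = test_mse)
```

With uninformative predictors, the test MSE can by chance come out below the train MSE, which is exactly the symptom discussed below.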

````{r, include=FALSE }

load("MSE_Model1_D1.Rdata")

load("MSE_Model1_D2.Rdata")

library(knitr)

load("PLOT1.Rdata")

library(corrplot)

library(Hmisc)

library("scales")

library(png)

````

````{r, fig.show='hold',echo= FALSE }

kable(MSE_Model1_D1)

````

Looking at the mean squared error [MSE] on the training and testing sets, it can be seen that their absolute values are substantial, keeping in mind that we scaled the data to a mean of zero and a variance of one. This observation matched our expectations. The fact that the testing error is sometimes smaller than the training error highlights that our forecaster does not produce valuable predictions.

We did the same analysis with the exchange rates dataset as the dependent variable, calling it `D2`. As before, the F-statistics did not indicate any significance. But performing the forward selection with only two variables (`retweet_count` and `favorite_count`) resulted in better p-values and F-statistics. The R² value was still small, but at least the adjusted R² value was not negative. These observations only improved for the actual exchange rates, not the growth rates. We therefore fitted the model with only these two variables. Although this measure slightly improved our statistical results, they remained weak.

````{r, include=FALSE }

load("PLOT1.Rdata")

````

````{r, echo=FALSE }

kable(MSE_Model1_D2)

````

As we saw when observing the MSE, our presumably best-performing model still scored a relatively high testing error. The presumably worse-performing models became much better in testing, but we are still at a low level of predictive power. We then used the same model on the same dataset, but multiplied each Twitter variable by the number of retweets; retweets can be considered a measure of a post's reach. In one case, we obtained slightly better F-statistics, and the R² value even reached 0.15, using only three variables. But again, this model performed worst in predicting the testing sample. Concerning the exchange rate sample, the values did not change much.

As we wanted to examine our forecaster further for regularities, we calculated the squared error of each variable for each day. To reduce random hits and mutual neutralization, we then added the squared error of the growth rate to the squared error of the corresponding economic variable instead of multiplying the plain values. Afterwards, we built a correlation matrix to examine the forecaster's performance for regularities.
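This error-correlation step can be sketched in base R. The error matrices below are simulated stand-ins for our daily squared errors; the column names are illustrative:

```r
# Combine squared errors of values and growth rates, then correlate (sketch).
set.seed(7)
days <- 100
vars <- c("SP500", "DJ", "volume")
err_value  <- matrix(rnorm(days * 3)^2, ncol = 3, dimnames = list(NULL, vars))
err_growth <- matrix(rnorm(days * 3)^2, ncol = 3, dimnames = list(NULL, vars))

# Add (rather than multiply) so random hits do not cancel each other out.
total_err <- err_value + err_growth
cormat <- cor(total_err)
round(cormat, 2)
```

In the real analysis, significance filtering (p < 0.01) was applied on top of such a matrix before plotting.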

```{r, out.width="200px", out.height="200px", echo=FALSE, fig.align="center"}

knitr::include_graphics("CORMAT_D1.png")

```

The correlation matrix omits correlations whose p-values are not below 0.01. There is a pattern in the prediction errors between both models and both indices in value, volume and range. We took a closer look at the best and worst days for predicting the above-mentioned values. Our forecaster performs well when there is little deviation from the trend in the stock market and Trump uses Twitter regularly; it often performs poorly when either the stock markets or the Twitter usage move out of their regular ranges.

It becomes clear that although some covariates, such as "words per day" and "characters per day" from our Twitter variables, received relatively high regression coefficients for describing the growth rates and volatility of the Dow Jones, they achieved insignificant p-values. Their forecasting ability is therefore limited.

## Linear Forecaster with Lasso Regression

#### Lasso Using Daily Rates

As our `TT_summary` dataset did not provide any significant results, we decided to enhance our predictor matrix by adding words and word combinations to it. To be able to evaluate the importance of single words and word combinations, we bound our lists of occurrences of monograms, bigrams, trigrams and quadgrams per tweet to the predictor matrix, resulting in a bit more than 26,000 variables that could be checked for relevance. This set of variables was named `TT_all`.

Of course, we were not able to perform a classical regression analysis with over 26,000 predictors, which far outnumber the observations, so we needed a machine-learning-based predictor selection algorithm. We decided to use the so-called Lasso regression, which uses lambda, a penalty/tuning parameter, to prevent the model from overfitting. In addition, the Lasso calculates coefficients that assign every predictor a level of relevance. The algorithm is therefore very useful in an environment with many predictors ("large-p problem") because, in contrast to ridge regression, the Lasso sets coefficients to exactly zero instead of only close to zero. In this model, we performed cross-validation to find the lambda for which the MSE is minimized, and used this lambda value to create a Lasso forecaster which chooses the predictors and coefficients. We did this analysis for both economic datasets using daily rates, with the "per day" predictor matrix mentioned above in the text mining chapter.
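The mechanism that lets the Lasso set coefficients to exactly zero is the soft-thresholding operator S(z, λ) = sign(z) · max(|z| − λ, 0). The base-R sketch below illustrates only this operator on made-up coefficients; it does not reproduce the cross-validated Lasso fit we actually ran (which a packaged implementation handles, including leaving the intercept unpenalized):

```r
# The soft-thresholding operator behind the Lasso's variable selection:
# S(z, lambda) = sign(z) * max(|z| - lambda, 0).
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Illustrative (made-up) coefficients for three tweet variables.
coefs <- c(retweets = 0.04, favorites = -0.02, words = 0.30)
soft_threshold(coefs, lambda = 0.05)
# retweets and favorites are set to exactly 0; words shrinks towards 0
```

This is why, unlike ridge regression, the Lasso can report that a predictor carries no relevance at all, as happened with all our Trump variables below.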

```{r, out.width="450px", out.height="200px", echo=FALSE, fig.align="center"}

knitr::include_graphics("Plots/gSP_VA.png")

```

Looking at the results, even though the MSE improved in some cases, no significant improvements were detectable. Examining the economic variable for which the smallest MSE was realized, we found that all coefficients had been set to zero and only the intercept was used to predict the economic data. This means that the Lasso set the importance of all our Trump variables to zero.

Examining other economic variables whose predictions returned higher MSEs, we saw that the Lasso regression returned coefficients that were not zero. However, these were extremely close to zero and did not perform well at all in testing.

When we visually investigated the variables whose coefficients had not been set to zero, we could not recognize any patterns; they seemed to be chosen quite "randomly" and did not make sense.

````{r, include=FALSE }

load("DT_S_1.Rdata")

load("prunetree.Rdata")

library(ISLR)

library(tree)

````

````{r, echo=FALSE }

kable(DT_S_1)

````

Turning to decision trees, it can be seen that our model is not always better than just guessing, and even our best-performing decision tree does not seem to yield a reasonable structure.

````{r, echo=FALSE }

plot(prune.tree)

text(prune.tree)

````

We applied the same decision tree method to the more detailed Trump dataset, but the results did not differ significantly, so we do not show any further figures for these. We decided against applying more methods to the decision trees because we did not expect major improvement under the circumstances.

## Individual Analysis

Eventually, we wanted to look for specific cases where we could display some patterns. But even when looking at posts that mentioned North Korea or a global risk, no daily pattern could be found.

## Linear Forecaster with Lasso Regression Intraday

#### Lasso Using Intraday Rates

As explained in the preceding paragraphs, our predictions were not successful using daily economic data as dependent variables. To investigate whether the characteristics of Donald Trump's tweets had an impact on the stock market, we then tried to use intraday economic data.

We formulated the hypothesis that the effects of a Trump tweet could be seen within the hour after that tweet was posted. We picked this short interval because, as mentioned before, the potential amount of noise and the influence of potential control variables that have not been considered is nearly unmanageable.

Using SP500 growth rates from the months of September and October (the only data available for download), we picked a dependent variable that gives a general image of the condition of an economy without being too volatile due to specific influences.

Furthermore, we wanted to exploit the law of large numbers. The idea was to collect a large number of "one-minute stock-market observations" at given points in time after Trump tweeted. We then used the Lasso regression to recognize patterns of market behaviour in these one-minute intervals and tried to describe those patterns with the quantitative variables we gained from the published tweet, again determining which variables have the biggest influence on market development.

To implement our findings, we built a loop of sixty Lasso functions, one for each minute, treating the SP500 growth rates for every minute after a tweet as sixty dependent variables. One of the main problems with this method was the different times and frequencies in which our Twitter and our S&P data were embedded. For example, Trump tweets posted in the evening did not have any "economic-variable counterpart". We tried to solve this problem of few common denominators by only using dates and times where Trump tweeted while there was an actual stock price for that exact minute. This method of matching tweets with intraday stock rates by time has the advantage that it omits all posts outside the opening hours of the stock exchange, yielding a clear market reaction. On the other hand, we neglected the tweets which did not match an exact publication of a stock market price on the time axis, which represents a considerable loss of potentially valuable data. As this meant a trade-off in which we preferred accuracy over quantity, our observation set became relatively small. It can be said that our analysis developed from a large-p problem into a small-n problem.
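The matching step can be sketched in base R by flooring each tweet's timestamp to the minute and inner-joining it with the minute bars, which automatically drops tweets outside trading hours. The toy data and column names below are illustrative, not our actual tables:

```r
# Match tweets to the S&P 500 minute bar of the same timestamp (sketch).
tweets <- data.frame(
  tweet_time = as.POSIXct(c("2016-12-22 17:26:33",
                            "2016-12-22 21:10:05"), tz = "UTC"))
sp_minutes <- data.frame(
  bar_time = as.POSIXct(c("2016-12-22 17:26:00",
                          "2016-12-22 17:27:00"), tz = "UTC"),
  growth   = c(-0.12, 0.03))

# Floor tweet timestamps to the minute, then keep only tweets with a bar.
tweets$bar_time <- as.POSIXct(trunc(tweets$tweet_time, units = "mins"))
matched <- merge(tweets, sp_minutes, by = "bar_time")  # inner join
nrow(matched)  # 1: the evening tweet has no market counterpart
```

The same join, repeated with the bar at +1, +2, ..., +60 minutes, yields the sixty dependent variables described above.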

Regarding the results of this analysis, the Lasso regression produced better MSEs compared to our daily-rates Lasso. Still, the prediction results leave much to be desired: there is no significant pattern in the chosen coefficients. For each of our dependent variables (minutes), the coefficients that were not set to zero were mostly different and, furthermore, extremely close to zero.

Based on those results, we examined the MSE of all testing tweets; that is, we now considered each single tweet as an observation, in order to see whether there were tweets for which our predictions were more suitable than for others. But there, too, the results left much to be desired. Even though some predictions seemed better, in general they were highly volatile.

We can conclude by stating that a higher number of observations might have brought better results. Unfortunately, we were not able to obtain intraday data for the whole time period we were looking at.

```{r, out.width="450px", out.height="200px", echo=FALSE, fig.align="center"}

knitr::include_graphics("Plots/plot_lasso1.png")

```

```{r, out.width="450px", out.height="200px", echo=FALSE, fig.align="center"}

knitr::include_graphics("Plots/Plot_LASSO2.png")

```

````{r, include=FALSE }

load("MSE_lasso.Rdata")

````

````{r, echo=FALSE }

kable(Solution2)

````

## Individual Analysis Intraday

After having found no evidence for a significant relationship between Donald Trump's tweets and the U.S. stock market, we turned to observing a single event. Unlike former Presidents of the United States, Donald Trump often addresses tweets to certain companies or industry sectors directly. Trump's tweets can be seen more as personal notes than official statements, may be controversial, and can have a large influence on the markets. The impact of President Trump's tweets on the U.S. equities market is a frequently debated topic throughout the financial community. Therefore, we tried to examine the influence of certain tweets on companies' stock prices, based on high-frequency data from the New York Stock Exchange (NYSE).

### Lockheed Martin

Lockheed Martin is a global security and aerospace company. In December 2016, Trump addressed various tweets towards the aerospace industry. On 22 December 2016 at 17:26, Trump tweeted the following statement: "Based on the tremendous cost and cost overruns of the Lockheed Martin F-35, I have asked Boeing to price-out a comparable F-18 Super Hornet!".

```{r, echo=FALSE, warning=FALSE, fig.align="center"}

ggplot(Dplot, aes(DateTime, rt)) +

geom_point(color = "blue", cex = 1, size = 0.5) +

labs(x = "", y = "Return (in %)",

title = "Returns of Lockheed Martin over one day",

subtitle = "(measured per minute, 22 December 2016)") +

# caption = "Data source: Wharton Research Data Services") +

geom_vline(xintercept = t_tweet, lty = "dashed") +

scale_x_datetime(breaks = date_breaks("1 hour"),

labels = date_format("%H:%M", tz = "GMT-1")) +

theme_bw()
```

We can see in the figure above that the returns of Lockheed Martin showed volatile behavior, with large increases and decreases just after the presidential tweet. The shares initially dropped 1.39% within five minutes of the tweet, which equals an estimated value of USD 1.2 billion, and reached their minimum one hour after the tweet. Apparently, Lockheed Martin is strongly exposed to short-term pricing volatility created by Trump tweets. In this example, we could clearly observe that Trump tweets do in fact affect the U.S. equity market.

### New York Times

The New York Times (NYT) has been routinely criticized by the U.S. President for its coverage of him. However, the stock price of the NYT even rose significantly after harsh tweets from Trump. After analyzing what might be the reason, we noticed that the NYT had released an excellent quarterly statement at that very moment.

As we can see, there are many more influences that affect markets. Even though Donald Trump is one of the most powerful men, he may still not be able to impact markets significantly on his own using short statements on a social media platform. But as we showed in the last part of our research, there can be exceptions.

## Conclusion and Limitations

In a nutshell, we used various techniques to find a relationship between U.S. economic indicators and Donald Trump's Twitter posts since election day. We began by looking at daily economic data, continued by analyzing intraday rates, breaking them down to the minutes after a Twitter post, and finished by observing specific cases.

Under the given circumstances, we could neither find any generally significant relationships nor a general way in which Donald Trump influences the market environment with his posts. As we saw in our individual analysis, there are situations where Donald Trump has a certain influence on stock market prices. But as we also learned, there are many other factors that need to be taken into account, for example the release of financial statements. Because of that, it might be possible that Donald Trump actually influences markets regularly, but his posts may only adjust trends slightly. In the following we outline the limits of our analysis. Even though we looked at each word of his posts and at our artificial Twitter variables, our model neither understands the meaning of a post nor can it look at the bigger picture. Another factor is that Twitter is just a social media platform with 330 million users and not a first-choice financial news ticker. In this context, it would be interesting to look at tweets that were quoted, for example by Bloomberg. Our model neglected the qualitative analysis and interpretation of the contents of Trump's tweets. In addition, the way recipients understand those posts may vary widely.

Another factor is that Donald Trump might be one of the most powerful people on this planet, but he is still only one person, and it is highly unlikely that the whole economy depends on a single individual. Taking a closer look at our models and data, we have to admit that we only used posts made on days when the markets were actually open; for the intraday rates, we only used posts published during the main trading hours. For further research, we would prefer individual analyses of posts that have also been included in financial tickers.

- Cite this work
- Tristan Breyer (Author), Florian Huber (Author), Sebastian Kuhn (Author), Julian Zürcher (Author), 2017, Donald Trump. The Twitter-President and the US-Economy, München, GRIN Verlag, https://www.grin.com/document/384525
