## CONTENTS

ACKNOWLEDGEMENT

ABBREVIATIONS

LIST OF TABLES

ABSTRACT

CHAPTER 1 INTRODUCTION

1.1. Background

1.2. Rationales

1.3. Research Objectives and Research Questions

1.3.1. Research objectives

1.3.2. Research questions

1.4. Significance of the study

1.5. Scope and limitation of the study

1.6. Organization of the study

CHAPTER 2 LITERATURE REVIEW

2.1. Stock Market Return

2.2. Financial ratios, macroeconomic variables and hypotheses

2.3. Empirical researches

2.3.1. Empirical researches on stock returns and stock price movements

2.3.2. Empirical researches using Linear Regression and Logistic Regression

CHAPTER 3 METHODOLOGY

3.1. Data Collection

3.2. Ratios and variables explanation

3.3. Stock performance

3.4. Sample selection

3.5. Tests used for two models

3.5.1. Linear Regression

3.5.2. Logistic Regression

3.6. Detecting the multicollinearity

3.7. Validation for the model

CHAPTER 4 DATA ANALYSIS AND DISCUSSION

4.1. Descriptive statistics

4.2. Linear Regression Model

4.2.1. Discussions for the general model

4.2.2. Discussions for each industry

4.3. Pearson Correlation Coefficient

4.4. Logistic Regression Model

4.4.1. Discussion for the general model

4.4.1.1. Summary of cases and coding of Dependent Variable

4.4.1.2. Selection and evaluation of dependent variables

4.4.1.3. Evaluating the Goodness-of-fit of the model

4.4.1.4. Evaluating classification table

4.4.1.5. Validation of model

4.4.2. Discussion for each industry

4.4.2.1. Logistic Regression Model

4.4.2.2. Classification Table

4.4.2.3. Validation of model

4.5. Findings and Implications

4.5.1. General Model

4.5.2. Industrials, Basic Materials and Consumer Goods

CHAPTER 5 CONCLUSION

5.1. Conclusion

5.2. Recommendation

REFERENCES

APPENDIX

## ACKNOWLEDGEMENT

First of all, I would like to express how lucky I am to be a student of International University – Vietnam National University Ho Chi Minh City. Thanks to the lessons from our beloved lecturers in the School of Business Administration, I have gained a solid foundation of knowledge and skills that allowed me to complete this thesis.

I would like to express my gratitude to my advisor, Mr. Phan Trường Quốc, who has always provided me with precious knowledge and the greatest support whenever I needed it. He has been willing to offer good solutions to most of my questions, not to mention his excellent recommendations. Besides my advisor, I would like to give special thanks to all members of my thesis proposal and thesis committees for their objective and insightful comments. Moreover, I am very grateful to all my teachers at International University, who have given me valuable experience that supported me in this thesis.

It is my honor to send my sincerest appreciation to my best friend, who always tries his best to answer my queries and cheer me up any time I need it. I also want to thank all of my International University seniors and friends who helped me unconditionally.

Last but not least, my dearest thanks go to my beloved family and my boyfriend. They are the ones who spent the most difficult times with me and gave me their best encouragement. I would like to thank them for trusting in me and supporting me during my research as well as in my life.

## ABBREVIATIONS

HOSE – Ho Chi Minh Stock Exchange

CAPM – Capital Asset Pricing Model

TAT – Total Asset Turnover

RENA – Retained Earnings/Net Assets

EPS – Earnings Per Share

ROA – Return on Assets

ROE – Return on Equity

RIR – Real Interest Rate

INF – Inflation Rate

VIF – Variance Inflation Factor

## LIST OF TABLES

Table 1: Summary of independent variables

Table 2: Descriptive Statistics

Table 3: ANOVA

Table 4: Coefficients

Table 5: Model Summary

Table 6: Summary of linear regression models of three sectors

Table 7: Pearson Correlation Coefficient

Table 8: Case Processing Summary

Table 9: Dependent Variable Encoding

Table 10: Variables in the Equation

Table 11: Variables not in the Equation

Table 12: Variables in the Equation

Table 13: Variables not in Equation

Table 14: Model if Term Removed

Table 15: Model Summary

Table 16: Hosmer and Lemeshow Test

Table 17: Classification Table

Table 18: Classification table

Table 19: Summary of Variables in Equation of three sectors

Table 20: Summary of Classification table

Table 21: Summary of Validation table for model

## ABSTRACT

The predictive power of financial and macroeconomic variables has been the subject of many previous studies. This study examines the relationship between stock excess returns and seven financial ratios and macroeconomic variables. Only companies listed on the Ho Chi Minh Stock Exchange in three industries (Industrials, Basic Materials and Consumer Goods) are studied, over the period 2013-2017. The results show that the chosen factors, namely Total Asset Turnover, Earnings Per Share and Return on Equity, predict stock excess returns with an accuracy of over 62%. In contrast, the relationship between the macroeconomic factors and stock excess returns is barely observable, so these factors are excluded from the final model.

This thesis uses a quantitative method with Linear Regression and Logistic Regression as the main models, both estimated with the stepwise method. In addition, CAPM is used to classify stocks into the GOOD and POOR groups. Microsoft Excel and SPSS are used together throughout the process of collecting and analyzing data.

## CHAPTER 1

## INTRODUCTION

### 1.1. Background

In a modernizing economy, the stock market is the place where medium- and long-term securities are traded. The Ho Chi Minh Stock Exchange (HOSE) officially began operating on July 20th, 2000, and held its first trading session on July 28th, 2000. Subsequently, the Vietnam Securities Depository center was officially launched in May 2006. In the early years, Vietnamese investors still knew little about the "stock market", although the term was already a hot topic in other countries. At first, the trading volume was low, at around 55,000 shares per day.

More specifically, trading on the Vietnamese securities market is still administered by the Ministry of Finance and the State Securities Commission. The Ho Chi Minh Stock Exchange (HSX) is regarded as the trading venue for the stocks of relatively large corporations, known as blue-chip stocks. Regarding its development history, in 2005 the VN-Index increased significantly and the market had a total value of about 40,000 billion VND, which accounted for 0.69% of the country's GDP at that time. In 2006, while the VN-Index increased 2.5 times compared to the beginning of the year, Ho Chi Minh trading market climbed to 144. In August 2007, HOSE was officially launched after years of growing under the name of HASTC. However, during 2007-2009, Vietnamese stocks fell noticeably as the bubble burst, with inflation reaching up to 22% in 2008. Since then, the Vietnamese securities market has remained unstable. In 2017, the Vietnamese stock market performed excellently and set many records, including a gain of more than 45% in the VN-Index.

### 1.2. Rationales

This study concentrates on the Vietnamese stock market and attempts to build a model of the relationship between stock market returns and selected internal and external factors on the Vietnamese stock exchange. The study is also expected to help investors recognize the economic variables and financial ratios on which they should concentrate when investing in stocks and making investment decisions: the better investors understand the market, the more risk they can avoid. Last but not least, this study hopes to produce results that shed light on the findings of prior studies and that can serve later studies.

### 1.3. Research Objectives and Research Questions

#### 1.3.1. Research objectives

The main objective of the research is to create a model for predicting stocks with excess returns in Vietnam, specifically on HOSE.

More precisely, Linear Regression and Logistic Regression are applied to a combination of financial ratios and macroeconomic variables in order to develop a model for stocks that CAPM divides into two groups: Good or Poor. The characteristics of the Good group and the Poor group are also examined in order to check whether they differ significantly. Furthermore, the Linear Regression model and especially the Logistic Regression model are tested to see how many of the chosen factors can be used to predict stock returns and assist with stock selection.

#### 1.3.2. Research questions

Based on the objectives of this study, the main question is raised as follows:

Do financial ratios and macroeconomic variables impact negatively or positively on stock excess returns?

### 1.4. Significance of the study

On the stock exchange, predicting stock returns is very important for both firms and investors. Managers can control their portfolios more effectively with this research, as it raises awareness of the stocks they have invested in; managers can therefore enhance the profit from the financial market and from their own business. For investors, this model is a useful tool for constructing trading strategies based on the movement of stock prices before making further decisions. With a few adjustments, this prediction model could also be applied to other stock exchanges and help investors arrive at sound investment and trading decisions.

Furthermore, few studies have examined both financial ratios and macroeconomic variables with Linear Regression and Logistic Regression models. Hence, this paper may contribute by pointing out possible relationships between the selected factors that can be useful for future studies.

### 1.5. Scope and limitation of the study

This empirical study examines the reaction of stock returns mostly to internal conditions captured by ratio analysis; other aspects, such as announcements or one-time events, are not considered as factors that influence the stock price. Stock returns may therefore sometimes change because of effects other than the chosen factors. Another key point is that the study is conducted within the Ho Chi Minh Stock Exchange only, so the VN-Index is chosen as the benchmark for market returns. Using secondary data, the time horizon of this study is 5 years, from 2013 to 2017, due to the limitation of time. Consequently, the chosen companies must have been listed on HOSE for these 5 years.

Since this study is confined to the Ho Chi Minh Stock Exchange, the largest stock market, which includes a large number of companies with a high degree of liquidity, it does not take into account companies listed on the Ha Noi stock exchange or the Up-Com market. Moreover, the reliability of the research might be reduced by the use of only secondary data, which may contain errors despite being audited.

Due to the limitation of time and conditions, the sample, which covers only 60 companies listed on HOSE, may not be ideally representative. Hence, the conclusions may not generalize to the whole stock market in Vietnam or to a broader context.

### 1.6. Organization of the study

This paper is divided into five main parts:

The first part, the introduction, describes the research objectives and research questions as well as the scope and limitations of the study.

Secondly, the literature review in the following chapter reviews previous empirical studies and theoretical literature relating to stock return prediction. Initial definitions of key terms such as stock market, stock price performance, Linear Regression and Logistic Regression are also given.

In Chapter 3, the methodology is presented alongside the data processing. In particular, this chapter provides specific information on the design of the research methods and the sampling techniques.

Chapter 4 presents the regression results with data analysis and discussion. In this part, statistical tests assess the validity and reliability of the model, followed by a data analysis and the results of the tested model, which lead to the answers to the research questions and hypotheses.

Finally, Chapter 5 gives the conclusions and recommendations based on the research questions and the limitations of the research.

## CHAPTER 2

## LITERATURE REVIEW

### 2.1. Stock Market Return

The most elusive and important goal in trading is to predict stock market returns properly, which means knowing how good or bad the profit is that investors gain. According to Bilha (2012) and Odera (2005), the stock market return is the performance of the overall market, based on listed companies in many industries as well as on the stock market index. Furthermore, various researchers have used the stock market index to determine the stock market return because of its influence on market prices (Zhang, 2009; Quan and Titman, 1996). For the Vietnamese stock exchange, the VN-Index represents the stock return and works as a benchmark for investors to observe the performance of the stock market as well as of their own portfolios. Besides, stock price movements can be predicted by technical analysis based on historical data of the VN-Index.

Market prices likewise adjust to the economic environment when the expectations of investors change. In the marketplace, stock returns reflect the trends in price changes. In particular, stock returns may respond to any new information that indicates a change in the economic environment. Besides macroeconomic variables, stock returns can also be determined by financial information from financial reports, which is used to estimate the risk and return of an investment (Y. Amadi & W. Amadi, 2014).

### 2.2. Financial ratios, macroeconomic variables and hypotheses

**Table 1: Summary of independent variables**

[Table not included in this excerpt]

Financial statements such as cash flow statements or balance sheets have always been considered important sources for predicting stock returns and movements. More specifically, in several countries during 2002, investors believed that it was right to make decisions about the stock market based on accounting disclosures and financial ratios (McKinsey Institute). According to Essays, UK (2013), a study applying Simple Linear Regression found a positive relation between retained earnings and share price. Given this relationship, we may predict that the Retained Earnings/Net Assets ratio also has a positive relationship with stock returns. A relationship was also found between EPS and the stock price: when EPS increased, the share price did not increase by as much, but the two were still positively related (Islam, M., Khan, T. R., Choudhury, T. T., & Adnan, A. M., 2014). In addition, results from a paper by Maryyam Anwaar (2016) showed that ROA had a significant positive impact on stock returns, while earnings per share had a significant negative impact on stock returns.

Some notable papers, such as those by Vinh Phuc (2013) and Dutta, Bandopadhyay and Sengupta (2012), studied financial ratios such as cash earnings per share, return on equity and return on assets. They obtained good results, with about half of the selected factors in their research shown to be influential in predicting stock returns. More specifically, Total Asset Turnover was positively related to stock performance (Dutta, Bandopadhyay and Sengupta, 2012), whereas Vinh Phuc (2013) found that ROE had the opposite sign to stock performance. Thus, investors can use their financial knowledge efficiently to make investment decisions on the Ho Chi Minh Stock Exchange. Overall, financial ratios are a valuable tool for identifying development trends and measuring progress against predetermined internal objectives, a specific competitor, or the general industry.

How macroeconomic factors affect stock returns is likewise a hot topic in academia. Analysts around the world have chosen various macroeconomic factors, study periods and methodologies with the same goal of examining this relationship in different countries. Numerous studies have been conducted to identify the impact of macroeconomic factors on stock returns in both developed and developing countries.

Some macroeconomic variables have also proven able to explain the variation in stock returns. Md. Mahmudul Alam (2009) obtained results that did not reject the theoretical argument of a negative relationship between stock prices and the prevailing interest rate. Hsing (2004), who adopted a structural VAR model that allows for the simultaneous determination of several variables including the real interest rate, found an inverse relationship between stock prices and the interest rate. Dr Ahmed Uwubanmwen & Igbinovia L. Eghosa (2015) found that the inflation rate had a negative but weak effect on stock returns in Nigeria, which means that rising inflation tends to affect stock returns negatively.

Therefore, the hypotheses here would be:

- H1: The Total Asset Turnover ratio has a positive impact on stock excess returns.

- H2: The Retained Earnings/Net Assets ratio has a positive impact on stock excess returns.

- H3: The Earnings Per Share ratio has a positive impact on stock excess returns.

- H4: The Return on Assets ratio has a positive impact on stock excess returns.

- H5: The Return on Equity ratio has a positive impact on stock excess returns.

- H6: The Real Interest Rate has a negative impact on stock excess returns.

- H7: The Inflation Rate has a negative impact on stock excess returns.

### 2.3. Empirical researches

#### 2.3.1. Empirical researches on stock returns and stock price movements

For the past few decades, the study of stock performance has been one of the most prevalent topics around the world. Numerous financial experts and researchers have carried out studies to determine the correlation between stock returns and various variables. For the U.S. stock exchange, Fama and French (1988), Campbell and Shiller (1988), and McMillan, David, Wohar and Mark (2013) found strong evidence of the predictability of stock returns when using variables and ratios such as earnings yield, cash flow yield, bond-equity yield, dividend yield and price-earnings. Additionally, Hartmann and Pierdzioch (2007) demonstrated a strong positive link between international equity flows and contemporaneous U.S. stock returns, based on the results of in-sample and out-of-sample predictability tests. The outcome was confirmed by the more recent investigations of Campbell and Yogo (2006) and Lewellen (2004), which applied more robust econometric techniques and still found evidence of predictability.

In 2010, Mostafa applied two neural network models, the multi-layer perceptron (MLP) and generalized regression neural networks, to forecast the movements of the closing price on the Kuwait stock exchange. The results demonstrate that neuro-computational models are valuable tools for forecasting stock exchange movements in emerging markets. Furthermore, because of their robustness and flexibility, the researcher anticipated that they would outperform traditional statistical techniques in forecasting stock exchange price movements. In addition, using data gathered from the Istanbul Stock Exchange, Karymshakov and Abdykaparov (2012) applied artificial neural networks to forecasting stock market index movements. Their result is that artificial neural networks do predict the direction of movement with a high level of accuracy.

Applying the single-variable predictive regression framework of Lewellen (2004) and Campbell and Yogo (2006), Aono and Iwaisako (2010) examined the ability of the dividend yield to forecast Japanese aggregate stock returns. In their later paper, they analyzed the predictive potential of another common financial ratio, the price-earnings ratio, because several studies using US data found that smoothed market price-earnings ratios have better forecasting ability than dividend yields. Following the methodology pioneered by Robert Shiller (1989, 2005), Aono and Iwaisako carefully constructed Japanese price-earnings ratios and examined their ability to predict aggregate stock returns. The outcome demonstrates a consistent preference of the predictive ability of the price-dividend ratio over the price-earnings ratio.

In conclusion, many studies by different researchers and business analysts have demonstrated the predictive power of financial ratios in different markets. In the search for evidence on stock returns and price movements, a wide variety of financial ratios has been used, from prominent ratios such as the dividend yield and the price-to-earnings ratio to a few less mainstream ones. Numerous methods and models have been applied to clarify the connection between these ratios and stock price performance, and their results vary from nation to nation.

#### 2.3.2. Empirical researches using Linear Regression and Logistic Regression

Linear regressions are common in the finance literature. For instance, they have been used to test whether past prices, financial ratios, interest rates, and a variety of other macroeconomic variables can predict stock returns. This section reviews the properties of predictive regressions, borrowing liberally from Stambaugh (1986, 1999). Moreover, Campbell and Yogo (2006) developed a new inference methodology within the linear regression framework of Stambaugh (1999) and they found that the predictive power of the dividend yield is considerably weakened but that the predictive power of the short rate is robust.

From another angle, John Y. Campbell and Samuel B. Thompson (2015) used their dataset to obtain initial coefficient estimates and then calculated an out-of-sample R² statistic that can be compared with the usual in-sample R² statistic. However, like Goyal and Welch (2007), they found poor results for the usual linear regression both in-sample and out-of-sample. Consequently, Linear Regression is unlikely to be a perfect model for every prediction, and a better method may exist.
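For reference, an out-of-sample R² statistic of this kind is commonly written as shown below. The notation is introduced here for illustration only and is not taken from this thesis: $r_t$ is the realized return, $\hat{r}_t$ is the forecast from a regression estimated on data up to $t-1$, and $\bar{r}_t$ is the historical average return up to $t-1$.

```latex
R^2_{OS} = 1 - \frac{\sum_{t=1}^{T} \left( r_t - \hat{r}_t \right)^2}{\sum_{t=1}^{T} \left( r_t - \bar{r}_t \right)^2}
```

A positive value means the predictive regression has a lower mean squared forecast error than the historical average; a negative value means it does worse.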

Logistic Regression is a multivariate analysis model (Lee, 2004) that is useful for predicting the presence or absence of a characteristic or outcome based on the values of a set of predictor variables.

More than ten years later, Logistic Regression was used by Ohlson (1980) to create a default prediction model; he characterizes the assumption behind default prediction as an equal payoff state. This study concentrated on the ability of the models to rank defaulted and non-defaulted firms accurately according to their default probability. The prediction of financial distress and bankruptcy with Logistic Regression was subsequently taken up by other researchers, for example Zavgren (1985) and Zmijewski (1984).

In order to enhance the precision of financial distress prediction models, Chen (2011) conducted research on the Taiwan Stock Exchange Corporation and gathered 100 listed companies as the initial sample. With 37 ratios comprising financial and non-financial ratios, the empirical study used factor analysis to extract suitable variables. Decision tree classification methods and Logistic Regression methods were then used to build the financial distress prediction model. The outcome supported the feasibility and validity of the proposed approach for predicting the financial distress of listed companies.

Afterwards, Vinh Phuc (2013) and Dutta, Bandopadhyay and Sengupta (2012) likewise studied stock price performance in the stock market using Logistic Regression. In the former paper, three financial ratios (Cash earnings per share, Price/Cash earnings per share and ROE) out of eight considered ratios could classify stocks into two categories, based on their arithmetic rate of return, with an accuracy of more than 60% (Vinh Phuc, 2013). In the research of Dutta, Bandopadhyay and Sengupta (2012), 8 financial ratios of 30 large market capitalization companies over a four-year period were calculated and put into the model to examine their predictive power. The result was tested on the group of data used to construct the model and showed a 74.6% level of accuracy.

## CHAPTER 3

## METHODOLOGY

### 3.1. Data Collection

To approach the research problem, this paper uses a quantitative method that focuses mostly on econometric models. As mentioned, companies from three industries on HOSE are chosen, and their financial statements and ratios from 2013 to 2017 are analyzed. More specifically, only the audited annual reports of the companies, taken from reliable sites such as VNDirect, Vietstock or Investing, are used to obtain the financial ratios and other types of data such as historical prices or news.

A large number of firms are currently listed on the Ho Chi Minh Stock Exchange; however, this paper considers only companies that belong to three main sectors: Industrials, Basic Materials and Consumer Goods. The appendix lists the names and stock IDs of the selected companies. The 7 candidate financial ratios and macroeconomic variables were then chosen based on:

- Their popularity in the literature.

- Potential relevancy to the study.

The 7 potential variables include:

X1 = Total Asset Turnover

X2 = Retained earnings/ Net Assets

X3 = Earnings per share

X4 = Return on assets

X5 = Return on equity

X6 = Interest rate

X7 = Inflation rate

### 3.2. Ratios and variables explanation

**X1: Total Asset Turnover**

[Formula not included in this excerpt]

The Total Asset Turnover ratio is calculated by dividing sales by average total assets over a certain period, which is 5 years for this paper. This ratio measures how efficiently a firm uses its assets to generate sales. A high total asset turnover means that the company's total assets are used efficiently. Consequently, investors may form expectations about the stock price from this ratio, as it evaluates the operating productivity of the company.

**X2: Retained Earnings/Net Assets**

[Formula not included in this excerpt]

This ratio indicates the proportion of assets funded by the retained earnings of a business. It shows how far the business retains its profits to finance assets, activities and operations rather than paying them out as dividends. A firm with a higher ratio is likely to have more potential to grow, and so is its stock.

**X3: Earnings per share**

[Formula not included in this excerpt]

Earnings per share is the portion of profit allocated to each outstanding share of common stock. It is a popular measure of the overall profitability of a company. A higher EPS shows that the company is in good health and has enough profit to pay out more money to its shareholders. Therefore, investors can judge whether a company has potential based on its EPS.

**X4: Return on assets**

[Formula not included in this excerpt]

ROA, one of the most common ratios used to evaluate a stock, indicates the firm's profit relative to its total assets, which are financed by both debt and equity. This ratio reflects how effectively assets are turned into income. A high ROA is usually desirable, as it means the company can generate more profit from fewer assets.

**X5: Return on equity**

[Formula not included in this excerpt]

Like ROA, ROE is one of the popular ratios used in finance. ROE measures the ability of a company to generate profit from equity, such as the investment from shareholders. A higher ROE is better because it means the company uses equity effectively, which creates a good balance between using equity and debt to attract investment and expand operations.
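The five ratios described above can be illustrated with a short Python sketch. The formulas themselves are not reproduced in this excerpt, so the code assumes common textbook definitions. Only the Total Asset Turnover definition (sales divided by average total assets) comes from the text; using average total assets for ROA is likewise an assumption. The `FirmYear` class, the `financial_ratios` function and all input figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Hypothetical annual figures for one firm, all in the same currency unit."""
    sales: float
    total_assets_begin: float
    total_assets_end: float
    retained_earnings: float
    net_assets: float            # assumed: total assets minus total liabilities
    net_income: float
    shares_outstanding: float
    shareholders_equity: float


def financial_ratios(f: FirmYear) -> dict:
    """Compute X1..X5 under assumed textbook definitions."""
    avg_assets = (f.total_assets_begin + f.total_assets_end) / 2
    return {
        "TAT": f.sales / avg_assets,                  # X1: sales / average total assets
        "RENA": f.retained_earnings / f.net_assets,   # X2: retained earnings / net assets
        "EPS": f.net_income / f.shares_outstanding,   # X3: profit per outstanding share
        "ROA": f.net_income / avg_assets,             # X4: profit relative to total assets
        "ROE": f.net_income / f.shareholders_equity,  # X5: profit relative to equity
    }


# Made-up numbers, for illustration only.
example = FirmYear(
    sales=1200.0, total_assets_begin=900.0, total_assets_end=1100.0,
    retained_earnings=150.0, net_assets=500.0, net_income=100.0,
    shares_outstanding=50.0, shareholders_equity=500.0,
)
ratios = financial_ratios(example)
print(ratios)
```

With these made-up inputs, average total assets are 1,000, so TAT is 1.2 and ROA is 0.1.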

**X6: Interest rate**

The interest rate is one of the most essential macroeconomic variables affecting stock returns. Various earlier studies found that an increasing interest rate leads to a fall in stock prices and in investor expectations. For this study, only the real interest rate is taken into account, as it gives the interest rate after removing the effect of inflation. A rise in the interest rate raises the cost of borrowing for firms, which in turn decreases stock prices and the expectations of investors in the stock market. Accordingly, the interest rate is expected to be negatively associated with stock prices.
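The thesis does not state how the real interest rate was derived, so the following is only a sketch of one standard way to remove the effect of inflation from a nominal rate, the Fisher relation. The function names and the input rates are hypothetical.

```python
def real_interest_rate(nominal: float, inflation: float) -> float:
    """Exact Fisher relation: (1 + nominal) / (1 + inflation) - 1."""
    return (1 + nominal) / (1 + inflation) - 1


def real_interest_rate_approx(nominal: float, inflation: float) -> float:
    """Common approximation for small rates: nominal minus inflation."""
    return nominal - inflation


# Hypothetical rates: 7% nominal interest and 3.5% inflation.
rir_exact = real_interest_rate(0.07, 0.035)
rir_approx = real_interest_rate_approx(0.07, 0.035)
print(f"exact: {rir_exact:.4%}, approximate: {rir_approx:.4%}")
```

The two versions differ only slightly when both rates are small, which is why the simple difference is often used in practice.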

**X7: Inflation rate**

Besides the interest rate, inflation is also frequently mentioned as a factor that has a strong relationship with stock returns. The inflation rate describes the rise in the general price level of goods and services over a certain period of time. This rate can have either a negative or a positive impact on stock returns, depending on government policy and company strategy. In any case, investors still believe that inflation carries information for predicting stock returns.

### 3.3. Stock performance

The CAPM equation expresses the relationship between the expected return and the risk premium as follows:

**E (Ri) = Rf + βi(E(Rm) - Rf)**

Where: E (Ri) is the expected return of stock or portfolio i

E (Rm) is the expected return of the market portfolio

Rf is the risk-free asset return

βi is a measure of systematic risk on stock or portfolio i

The equation states that the expected return of a stock or portfolio equals the risk-free return plus a risk premium, which is the product of the systematic risk of that stock or portfolio and the difference between the expected return of the market portfolio and the risk-free return.

To carry out a multiple logistic regression analysis, researchers need a technique for classifying the performance of companies as GOOD or POOR (or even AVERAGE), as no definitive classification exists. A simple and objective method has been used: if the return on a company's stock over a given year exceeded the market return, it is labelled a "GOOD" investment option; otherwise, it is labelled a "POOR" investment option (Upadhyay, Bandyopadhyay, & Dutta, 2012).

Similarly, in this paper, CAPM is the method applied to calculate expected returns and classify stock performance accordingly. A positive stock excess return is coded as “1”, representing the GOOD group, and a negative excess return is coded as “0”, representing the POOR group. This classification provides the binary dependent variable for the Logistic Regression.
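The coding rule can be illustrated with a minimal Python sketch. The function names and all input values are hypothetical and chosen only to show the arithmetic; treating a zero excess return as POOR is likewise an assumption, since the text only covers positive and negative excess returns.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """Expected return under CAPM: Rf + beta * (E(Rm) - Rf)."""
    return risk_free + beta * (market_return - risk_free)


def code_performance(actual_return: float, expected_return: float) -> int:
    """Return 1 (GOOD) for a positive excess return, otherwise 0 (POOR).

    Assumption: an excess return of exactly zero is coded as 0.
    """
    excess_return = actual_return - expected_return
    return 1 if excess_return > 0 else 0


# Hypothetical inputs, not taken from the thesis data.
expected = capm_expected_return(risk_free=0.05, beta=1.2, market_return=0.10)
label = code_performance(actual_return=0.14, expected_return=expected)
print(round(expected, 4), label)
```

Here the stock's actual return is above its CAPM expected return, so it falls in the GOOD group.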

### 3.4. Sample selection

As mentioned, CAPM can be applied to separate stocks into two types, GOOD or POOR, by comparing expected returns with the actual rates of return computed from closing prices.

The yield on government bonds is assumed to be the risk-free rate because the government is not expected to default: it can print money to repay its debt. In the Vietnamese context, we use the 5-year Vietnam government bond yield as the risk-free rate. The VN Index from HOSE is the proxy for the market return, and there are 745 observations. Only 596 observations are chosen as the training sample; the remaining 149 form the holdout sample.

The daily closing stock prices are collected, and the average rate of return is calculated from the daily rates of return using the following formula:

**R(t+1) = (P(t+1) - P(t)) / P(t)**

Where:

P(t+1) = stock price at day t+1

P(t) = stock price at day t

The market average rate of return is calculated from the daily VN Index using the formula below:

**Rm(t) = (VN Index(t) - VN Index(t-1)) / VN Index(t-1)**

Where:

VN Index(t): VN Index at day t

VN Index(t-1): VN Index at day t-1
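As an illustration, the daily and average rates of return can be computed as in the sketch below. It assumes simple (not logarithmic) returns and an arithmetic mean, which is what the definitions above suggest; the price series is hypothetical. The same functions apply to a series of VN Index values.

```python
def daily_returns(prices):
    """Simple daily returns: (P(t+1) - P(t)) / P(t) for consecutive days."""
    return [(nxt - cur) / cur for cur, nxt in zip(prices, prices[1:])]


def average_return(prices):
    """Arithmetic mean of the daily returns of a price (or index) series."""
    returns = daily_returns(prices)
    return sum(returns) / len(returns)


# Hypothetical closing prices for four trading days.
closing_prices = [100.0, 102.0, 101.0, 104.0]
print(daily_returns(closing_prices))
print(average_return(closing_prices))
```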

For Linear Regression, the excess return of each stock over the five years is used directly as the dependent variable. The actual rate of return is calculated with the formula above, and the expected rate of return is calculated with CAPM.

**Excess return = R_{Actual} - R_{Expected}**

Where:

R_{Actual} = Actual rate of returns

R_{Expected} = Expected rate of returns

However, binary outcomes must be used as the dependent variable for Logistic Regression. As stated before, CAPM is used to classify each stock as GOOD (1) or POOR (0), which gives the dependent variable in this case.

### 3.5. Tests used for two models

#### 3.5.1. Linear Regression

Firstly, the linear regression model is applied to test the impact of the chosen variables on the stock excess returns.

The model of linear regression can be expressed as below:

**Y = β0 + β1X1 + β2X2 + … + β7X7 + ε**

Where:

Y = the dependent variable (stock excess return)

Xi (i = 1, 2, …, 7) = the independent variables

β0 = the Y intercept of the regression surface

βi (i = 1, 2, …, 7) = the slopes of the regression surface

ε = the error term

An important step in testing hypotheses about the regression coefficients obtained in linear regression is the t-test. The null hypothesis (H0) assumes that there is no relationship between the independent variables and the dependent variable, so every β equals zero. By contrast, the two-tailed alternative hypothesis (H1) assumes that a relationship between the independent variables and the dependent one exists, so β is not equal to zero. Normally, we reject the null hypothesis if the p-value is less than the 0.05 level of significance, and fail to reject it otherwise.

- H_{0}: β_{1} = β_{2} = … = β_{7} = 0

- H_{1}: at least one of β_{1}, β_{2}, …, β_{7} ≠ 0

Apart from the t-test, R squared (R2) is also very important, as it summarizes how well the independent variables explain the dependent variable. In our linear regression model, R2 is reported in the Model Summary. Beyond that, we pay attention to the adjusted R2, as it is more reliable in assessing the model. The adjusted R2 accounts for the number of independent variables in its formula whereas R2 does not. For example, if the adjusted R2 of a model is 0.8, about 80% of the variation in the dependent variable is explained by the independent ones and the remaining 20% is not.
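The way the adjusted R2 penalizes additional predictors can be seen in a short sketch of the standard formula. The inputs are hypothetical; in particular, the sample size and number of predictors are placeholders rather than figures taken from the linear regression output.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical values: R^2 = 0.20 with 596 observations and 5 predictors.
print(round(adjusted_r_squared(0.20, 596, 5), 4))
```

With a large sample and few predictors the adjustment is small, which is why R2 and adjusted R2 can lie close together.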

Although linear regression is often used by researchers to predict future outcomes, in this research we believe that logistic regression is more appropriate. For that reason, linear regression models for the three industries are built in order to show that the ability of this type of analysis to predict stock excess returns is not as good as that of logistic regression.

#### 3.5.2. Logistic Regression

Logistic regression has been used, for example, to indicate the statistical relationships between land use type and a set of driving factors (Geoghegan et al., 2001; Serneels and Lambin, 2001). Tabachnick and Fidell (1996) define Logistic Regression as a kind of regression analysis that helps predict a discrete outcome. It yields coefficients for each independent variable based on a sample of data (Huang, Chai and Peng, 2007). More specifically, in India, Avijan Dutta (2012) used logistic regression with various financial ratios as independent variables to investigate the indicators that significantly affect the performance of stocks actively traded on the Indian stock market. Furthermore, logistic regression models with two or more explanatory variables are widely used in practice (Haines and Others, 2007).

The population logistic regression is given by:

[Figure not included in this excerpt]

Where:

Y = the categorical dependent variable

Xi (i = 1, 2, …, k) = the independent variables

β0 = the Y intercept of the regression surface

βi (i = 1, 2, …, k) = the slopes of the regression surface

ε = the error term

In particular, the major difference between Logistic Regression and other regression models lies in the equation that transforms the model output into a probability estimate, a step specific to Logistic Regression. This equation is slightly more complicated than those of multiple regression because the Logistic Regression model is nonlinear.

[Figure not included in this excerpt]

After the estimated probability is calculated, it is compared to a certain cutoff value so as to determine which category the outcome belongs to. Usually, outcomes with an estimated probability higher than the cutoff point are classified in the first group, and the second group contains outcomes with an estimated probability lower than the cutoff point. As mentioned, Logistic Regression is relatively free of restrictions: it can handle a wide range of predictors, and the variety and complexity of data sets that can be analyzed are nearly boundless.
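For reference, the probability transformation and the cutoff rule can be written in their standard textbook form. This is a sketch of the usual logistic formulation, not a reproduction of the omitted figure; the symbols z and c (the cutoff) are introduced here for illustration.

```latex
z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k, \qquad
P(Y = 1) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}, \qquad
\hat{Y} =
\begin{cases}
1 & \text{if } P(Y = 1) > c \\
0 & \text{otherwise}
\end{cases}
```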

For Logistic Regression, the hypotheses are set up in the same way as in the Linear Regression model. If the significance value is less than 0.05, the null hypothesis is rejected, which means there is a relationship between the variables.

- H_{0}: β_{1} = β_{2} = … = β_{7} = 0

- H_{1}: at least one of β_{1}, β_{2}, …, β_{7} ≠ 0

After that, the Forward Selection method, a stepwise method in SPSS, is chosen to select the variables and build the logistic regression model. An alternative, the Enter method, brings all variables into the analysis at once. In this paper, we believe that the stepwise selection method is the better option because it adds variables to the model one at a time until the best model is obtained. In this way, the change in the score of each variable can be observed after every new factor is added.

This method begins at Step 0 with no predictors, only the intercept, and identifies the variable with the highest score to start the model. As a result, Step 0 itself is not of concern to researchers. At every step, the factor with the largest score and a p-value below 5% enters the model. Alongside the Variables in the Equation table, there is also a Variables not in the Equation table, which lists the factors that are not yet in the model but may be added later if they meet the conditions of a high score statistic and a significance value lower than 0.05. The process continues until every eligible variable has entered the final model and no significant variable is left.

Unlike Linear Regression, this type of model does not treat R2 as an important measure of the significance or performance of the final model. Instead, the classification table indicates how well the model can predict the stock excess returns. This table works as a confusion matrix: it compares the number of GOOD (1) cases predicted by the logistic regression model with the number actually observed, and likewise the number of POOR (0) cases predicted with the number actually observed.
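A classification table of this kind can be sketched in a few lines of Python. The labels below are hypothetical and the function names are illustrative; the sketch assumes both groups contain at least one observed case.

```python
def classification_table(observed, predicted):
    """Count cases by (observed, predicted) pair for 0/1 labels."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for obs, pred in zip(observed, predicted):
        table[(obs, pred)] += 1
    return table


def percent_correct(table):
    """Percentage correct for the POOR group, the GOOD group, and overall."""
    poor_total = table[(0, 0)] + table[(0, 1)]
    good_total = table[(1, 0)] + table[(1, 1)]
    correct = table[(0, 0)] + table[(1, 1)]
    return {
        "POOR": 100.0 * table[(0, 0)] / poor_total,
        "GOOD": 100.0 * table[(1, 1)] / good_total,
        "overall": 100.0 * correct / (poor_total + good_total),
    }


# Hypothetical labels for eight stocks (1 = GOOD, 0 = POOR).
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
table = classification_table(observed, predicted)
print(table)
print(percent_correct(table))
```

The diagonal entries, (0, 0) and (1, 1), are the correctly predicted cases.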

### 3.6. Detecting the multicollinearity

An important step in building a logistic regression model is to test for multicollinearity among the independent factors. Some multicollinearity may be present among the seven financial and economic variables in this paper. Multicollinearity arises when the independent variables in a regression model are strongly correlated with one another, that is, when they have a linear relationship. Put another way, when two independent variables are very strongly related to each other, they effectively measure the same thing even though they enter the model as two variables. Multicollinearity violates the assumption of classical linear regression models that the independent variables have no linear relationship. In sum, multicollinearity inflates the standard errors of the coefficients and makes the estimates sensitive to even small changes in the model.

For the Linear Regression model, the variance inflation factor (VIF) is used to test for multicollinearity. The VIF measures how much the variance of an estimated regression coefficient is inflated compared to when the predictor variables are not linearly related. A commonly given rule of thumb is that VIFs of 10 or higher may indicate multicollinearity.
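The rule of thumb is easier to read with the standard definition of the VIF in mind. In the sketch below, `r_squared` stands for the R2 obtained by regressing one predictor on all the others; the function name and the sample values are illustrative only.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    predictor j on all the other predictors."""
    return 1.0 / (1.0 - r_squared)


for r2 in (0.0, 0.5, 0.9):
    print(r2, vif_from_r_squared(r2))
```

A VIF of 10 therefore corresponds to a predictor whose variation is about 90% explained by the other predictors.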

Besides multicollinearity, we also need to test whether there is a linear relationship between the dependent variable and the independent ones. That is why we use the Pearson Correlation Coefficient test for the Logistic Regression model. The Pearson test measures the linear correlation between two variables, and it should be examined before interpreting the results of the logistic regression analysis. To test for multicollinearity, predictors that are highly correlated with each other are flagged. Conversely, the higher the correlation of an independent variable with the dependent variable, the greater its potential to be a good predictor.

### 3.7. Validation for the model

The SPSS Stepwise method does not by itself validate the model. For this reason, researchers usually divide their sample into a training group and a holdout group: the first group is used to build the model and the second is used to validate the final model. In this paper, we divide the sample into a 2013 to 2016 group and a 2017 group. The model is built from the 2013 to 2016 group, and the 2017 group is used for validation.

Once we have the final model, the required data are substituted into the equation, which then classifies each stock as GOOD (1) or POOR (0) based on the threshold of 0.5. This result is compared with the classification previously obtained from CAPM: if they match, the case is coded as 1, and otherwise as 0. In this way, we can calculate the percentage of accurately classified cases and compare it with the result from the Classification Table of the Logistic Regression model.

## CHAPTER 4

## DATA ANALYSIS AND DISCUSSION

### 4.1. Descriptive statistics

Financial data of HOSE-listed companies in three industries were collected from 2013 to 2017. The figures were mainly gathered from VNDirect, Vietstock and Investing. The descriptive statistics of the firms in our sample are given in Table 2.

Descriptive statistics summarize the features of a data set, describing the characteristics of the population by way of the sample. A descriptive statistics table is very useful for presenting a large amount of data in a more meaningful way. The table below shows only the main descriptive statistics: the mean as the measure of central tendency, the maximum, the minimum and the standard deviation.

**Table 2: Descriptive Statistics**

[Table not included in this excerpt]

In fact, these numbers are not very impressive for this century. Research by the OECD (2011) showed that between 1988 and 1997 Asian listed companies had a high average ROE of 17.2%, compared with only about 8.22% for the Vietnamese firms here. Furthermore, they had the lowest asset turnover ratio, around 0.2, which is close to the 0.22 of the data set in this research. Lastly, the Asian multinationals had a mean ROA of 5.6%, as low as the 6.14% of the Vietnamese firms.

Specifically, asset turnover and retained earnings/net assets may be regarded as favorable factors, with fairly high values. By contrast, ROA and ROE are rather low, so they may have a negative impact on stock excess returns. EPS stays at a fairly normal level. The real interest rate and inflation rate are low, which is a good sign in practice, but there is not yet enough evidence to show whether they are important variables.

### 4.2. Linear Regression Model

#### 4.2.1. Discussions for the general model

Researchers frequently use the Linear Regression Model for prediction. Linear Regression assumes a continuous output value through which a “best” line can be fitted. However, this kind of model may not be the best option here, since using categorical outcomes coded as dummy values 0 and 1 seems to be a better idea. Therefore, in this research, linear regression is still tested to illustrate that its predictive ability is limited compared to the logistic one.

First of all, we examine the ANOVA table. The indicators are good, and all significance values (p-values) are lower than 5%. We can therefore reject the null hypothesis that all slope coefficients equal zero and conclude that the regression model as a whole is statistically significant.

**Table 3: ANOVA**

[Table not included in this excerpt]

Applying the Stepwise method for the Linear Regression model in SPSS, five of the seven factors are selected and added to the model: EPS, Real Interest Rate, Total Asset Turnover, ROE and Inflation, as shown in Table 4.

**Table 4: Coefficients**

[Table not included in this excerpt]

It can be seen that all of their p-values are below 0.05, which is significant. The Stepwise method adds the factors to the model one step at a time. After each variable is added, the betas of the other factors change, and the last step represents the final model. The largest beta value in the model belongs to total asset turnover, so it can be considered one of the strongest potential predictors. What is more, the VIF (variance inflation factor) column shows no symptoms of multicollinearity, as the values for the variables stay between about 1 and 3, much smaller than 10.

**Table 5: Model Summary**

[Table not included in this excerpt]

a. Predictors: (Constant), EPS

b. Predictors: (Constant), EPS, RIR

c. Predictors: (Constant), EPS, RIR, TAT

d. Predictors: (Constant), EPS, RIR, TAT, ROE

e. Predictors: (Constant), EPS, RIR, TAT, ROE, INF

Nevertheless, the model summary shows that every Adjusted R2 is notably smaller than 0.5 (0.113, 0.131, 0.149, 0.163 and 0.172). Adjusted R squared reflects how much of the variation in the dependent variable the independent variables explain. The conclusion here is that the model has little predictive power when linear regression is applied, even though the model seems consistent with the data set, with small gaps between R2 and Adjusted R2. Therefore, logistic regression is applied instead, as it is believed to be better suited to testing the ability of financial and macroeconomic variables in predicting stock excess returns.

#### 4.2.2. Discussions for each industry

**Table 6: Summary of linear regression models of three sectors**

Firstly, the Industrials sector is tested with the linear regression model. In Table 6, the Adjusted R2 is very small at every step (0.105, 0.129 and 0.142). This means the independent variables in this industry explain only about 10.5%, 12.9% and 14.2% of the variation in stock excess returns, while the remainder may be influenced by other factors. In sum, linear regression is not applied to this case since it does not give the model workable predictive power.

For the second industry, Basic Materials, the Adjusted R2 in Table 6 is again quite low in both steps of the model: 19.6% and 22.2% respectively. These values are not strong enough to make the model effective, as they do not even reach 50%. Hence, this paper does not adopt linear regression as the main model, owing to its poor predictive power in the Basic Materials sector.

Finally, the last sector, Consumer Goods, is used to test whether linear regression is a suitable choice for building the predictive model (Table 6). For this industry, linear regression also proves to be ineffective in building a predictive model for the chosen dependent variable. The R2 values range only from 0.086 to 0.202, which is considered too low even though R2 does increase after each step.

All in all, the binary logit model is expected to work better than the linear regression one. In the tests for each industry and for the general model alike, this traditional type of regression shows weak predictive power, with low percentages of explained variation. Before turning to the logistic regression, the Pearson test is used to check for multicollinearity again.

### 4.3. Pearson Correlation Coefficient

Among the basic assumptions of the logistic regression model, two important ones concern the dependent and independent variables. Firstly, the dependent variable should be well correlated with the independent ones, so that the independent variables can have a significant effect in predicting the dependent factor. The second assumption states that there should be no multicollinearity among the predictors.

This research selects seven variables to test their ability to predict stock excess returns, and since the financial ratios are collected and calculated from the annual financial statements of companies listed on HOSE, they might contain some shared or related components. In other words, multicollinearity may appear if there are strong connections between independent variables, which weakens the accuracy of our model. Consequently, conducting a Pearson Correlation Coefficient test before proceeding to the final model is important and necessary. The test not only detects multicollinearity but also helps us anticipate the strongest variables.

This test is also known as the product-moment correlation coefficient or bivariate correlation. The Pearson test measures the linear correlation between two variables. In principle, Pearson's correlation describes how closely a straight line matches the relationship between two variables. It is widely used in the sciences, and was developed by Karl Pearson after Francis Galton presented a related idea in the 1880s. The coefficient only takes values ranging from -1 to +1. Normally, the condition for a significant correlation is that the sig. value is lower than 0.05. Here we assume that predictors with high coefficients in absolute value are highly correlated.
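For concreteness, the coefficient itself can be computed as in the following sketch. The data are hypothetical and the function is a plain implementation of the textbook formula, not the SPSS procedure used in this study (which also reports the sig. value).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y rises with x, z falls with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
z = [10.0, 8.0, 6.0, 4.0, 2.0]
print(pearson_r(x, y))
print(pearson_r(x, z))
```

A value close to +1 or -1 indicates a strong linear relationship, positive or negative respectively.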

Firstly, multicollinearity is tested via high correlations between independent variables. As marked by the blue rectangle, the highest positive coefficient in the table (r = 0.774) belongs to ROA and EPS. It can thus be concluded that there is multicollinearity between these two variables. This multicollinearity seems hard to interpret, as EPS and ROA do not at first appear to be related. However, one possible explanation is that when a company uses its assets well to generate profit, it earns more money, which leads to higher earnings per share for shareholders. Furthermore, RIR and INF also have a high negative correlation, with r = -0.854. Both variables reflect the systematic risk of the stock, and when the interest rate goes down, companies may tend to borrow more money, which leads to higher inflation.

From Table 7, we can see that only RENA, marked by the red rectangles, has a weak correlation with the Const, at -0.006. As a consequence, RENA can be expected to have little power in forecasting stock excess returns. What is more, we conclude that EPS, TAT and RIR are the variables with the most potential to be added to the final model, with r = 0.259, 0.178 and -0.156 respectively. Given the multicollinearity detected above, ROA and INF are expected to drop out of the model rather than EPS and RIR because of their high sig. values. However, the correlations between those variables have not reached 0.9, and the Variance Inflation Factor test in the linear regression model earlier also indicated no multicollinearity. Moreover, Belsley, Kuh, & Welsch (2005) stated that a fairly high correlation between variables is not always problematic. Thus, ROA, EPS, RIR and INF do not necessarily have to be dropped in the process of building the model.

**Table 7: Pearson Correlation Coefficient**

[Table not included in this excerpt]

### 4.4. Logistic Regression Model

#### 4.4.1. Discussion for the general model

##### 4.4.1.1. Summary of cases and coding of Dependent Variable

**Table 8: Case Processing Summary**

[Table not included in this excerpt]

As mentioned earlier in this paper, not all cases are used to build the model. Instead, only the sample from the period 2013 to 2016, comprising 596 cases, is selected as the training sample to build the model. The remaining cases serve as the holdout sample to validate the final model.

**Table 9: Dependent Variable Encoding**

[Table not included in this excerpt]

The way the outcome categories are coded determines the direction of the odds ratios as well as the sign of the B coefficient. Most software programs solve the logistic regression equation for a dichotomous outcome coded as 0 and 1. This paper is no exception: POOR performance is coded as 0 and GOOD performance is coded as 1.

##### 4.4.1.2. Selection and evaluation of dependent variables

Overall, SPSS presents five steps in the process of choosing variables. The first, called Step 0, contains no predictors but only the intercept, and as a result this model is of little interest to researchers.

**Table 10: Variables in the Equation**

[Table not included in this excerpt]

The Score test is used to indicate whether an independent variable would be significant in the model. In this paper, we use the usual alpha range from 0.05 to 0.10 for the p-value. The last column, labelled Sig., provides the p-values; we can see that most of the predictors would be statistically significant because their p-values are less than 0.1. However, Retained Earnings/Net Assets has an especially high p-value of 0.890, which may lead to this variable being left out of the model. ROE and the inflation rate also have somewhat higher p-values, 0.164 and 0.168 respectively, which means there is some chance that they will be left out of the model.

**Table 11: Variables not in the Equation**

[Table not included in this excerpt]

From Table 11, we can expect EPS to be added to the model first because it has the largest score (40.044). The next variables to enter the model may be ROA (22.113), TAT (18.837) and finally RIR (14.591). The others may not enter the model due to their large p-values.

**Table 12: Variables in the Equation**

[Table not included in this excerpt]

In Step 1 of Table 12, EPS is the first variable added to the model, exactly as we predicted. The value of the Wald statistic is 35.594 and the p-value is extremely small (reported as 0), which means that EPS is statistically significant for the dependent variable. In the first step, the coefficient of EPS equals 0.194.

The model after Step 1 is presented as follows:

Y = -0.429 + 0.194 X3

**Table 13: Variables not in Equation**

[Table not included in this excerpt]

Table 13 lists the variables not included in the model after each step. After Step 1, there are 6 remaining variables. From the table above, it is evident that variable X6, RIR, has the highest Score (17.351) and its p-value is very small (0.000).

Therefore, RIR enters the model in Step 2. After it enters the model, the p-value of variable X6 stays the same, so it is still less than the removal level of 0.05, and we reject the null hypothesis of the Wald test that the coefficient of variable X6 equals 0. The model in Step 2 is presented as follows:

Y= 1.337 - 32.564X6 + 0.2X3

The remaining variables, which are not included in the first two steps, are presented in Table 13. Now, the variable with the highest score is variable X7, INF, with a Score value of 12.279. The p-value of the inflation rate is 0, which satisfies the condition that the p-value must be less than 0.05, and as a result it is the third variable chosen to enter the model.

Y = 4.932 - 28.763X7 - 78.537X6 + 0.2X3

The p-value for the Wald test of EPS, 0.01, is higher than the previous ones but still significant as it is lower than 0.05. Variable X7 also satisfies the condition that the p-value is less than alpha, and therefore we can reject the null hypothesis that the coefficient of X7 equals 0.

Finally, TAT is the last factor added to the model, having the largest score (9.123) among the remaining variables and the smallest significance value (0.03), which is lower than 0.05. Furthermore, the coefficient of X1 is also significant in the Wald test. As a result, the final model contains TAT, EPS, RIR and INF.

Y = 4.677 + 0.323X1 - 29.409X7 - 79.198X6 + 0.18X3

After Step 4, there are three variables not included in the model. The highest score is 1.085, which belongs to variable X2, RENA. However, its p-value is 0.298, far greater than both alpha levels of 5% and 10%. For the other remaining variables, the p-values are also large, which indicates insignificance. Thus, no variables enter the model after Step 4.

**Table 14: Model if Term Removed**

[Table not included in this excerpt]

In Table 14, the log likelihood values behave well, rising after each step. The log likelihood is always negative, and higher values indicate a better fit, which this model shows. In addition, variables chosen by the forward stepwise method should all produce significant changes in the -2 log likelihood, a likelihood-ratio measure representing the unexplained variation in the outcome variable. As we can see, the variables entering the model at each step all produce significant changes in the -2 log likelihood.
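The -2 log likelihood can be illustrated with a small sketch. The outcomes and predicted probabilities below are hypothetical; the point is only that better-calibrated probabilities raise the log likelihood and therefore lower the -2 log likelihood.

```python
import math


def minus_two_log_likelihood(observed, probabilities):
    """-2 * log-likelihood of 0/1 outcomes given predicted P(Y = 1)."""
    log_likelihood = 0.0
    for y, p in zip(observed, probabilities):
        log_likelihood += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_likelihood


# Hypothetical outcomes with two sets of predicted probabilities.
observed = [1, 0, 1, 1, 0]
intercept_only = [0.6] * 5  # every case gets the sample share of 1s
with_predictor = [0.8, 0.3, 0.7, 0.9, 0.2]
print(minus_two_log_likelihood(observed, intercept_only))
print(minus_two_log_likelihood(observed, with_predictor))
```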

All in all, four independent variables are added to the equation, and all of them passed the Wald test. These variables are Total Asset Turnover, EPS, Real Interest Rate and Inflation Rate.

Y = 4.677 + 0.323X1 - 29.409X7 - 79.198X6 + 0.18X3

Of the initial seven financial ratios and macroeconomic variables, chosen on the basis of their popularity in the literature and their potential relevance to the study, only four variables remain in the final prediction model. All of them match our expectations about their predictive potential. Another point worth noting is that the Pearson Correlation Coefficient test correctly identified the most promising variables. Earnings per share, Total Asset Turnover and Real Interest Rate seemed promising because they showed good correlations with the dependent variable in the test, and all of them are indeed included in the final model.

##### 4.4.1.3. Evaluating the Goodness-of-fit of the model

For regression models with a categorical dependent variable, like logistic regression, it is not possible to compute an R2 that has all the characteristics of R2 in the linear regression model, because the model estimates from a logistic regression are maximum likelihood estimates and are not calculated to minimize variance. Instead, two analogous R2 measures are presented in Table 15. The values of these R2 measures improve from step to step. We can conclude that the model at Step 4 is the best because it has the largest R2 values (0.174 and 0.131) in comparison with the models in the previous steps.
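The text does not name the two analogous measures. Assuming they are the Cox & Snell and Nagelkerke R2 that SPSS commonly reports for logistic regression, they can be sketched from the log likelihoods of the intercept-only and fitted models as follows. The log-likelihood values are hypothetical and are not taken from Table 15.

```python
import math


def cox_snell_r2(ll_null: float, ll_model: float, n_obs: int) -> float:
    """Cox & Snell R^2 = 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_obs)


def nagelkerke_r2(ll_null: float, ll_model: float, n_obs: int) -> float:
    """Nagelkerke R^2: Cox & Snell R^2 divided by its maximum attainable value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell_r2(ll_null, ll_model, n_obs) / max_cox_snell


# Hypothetical log-likelihoods for a sample of 596 cases.
ll_null, ll_model, n_obs = -410.0, -370.0, 596
print(round(cox_snell_r2(ll_null, ll_model, n_obs), 3))
print(round(nagelkerke_r2(ll_null, ll_model, n_obs), 3))
```

Under this assumption the Nagelkerke value is always at least as large as the Cox & Snell value, because it rescales the latter to reach a maximum of 1.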

**Table 15: Model Summary**

[Table not included in this excerpt]

Goodness-of-fit statistics help to determine whether the model describes the data adequately. Table 16 shows the results of the Hosmer and Lemeshow goodness-of-fit test at each step. In binary logistic regression, the Hosmer and Lemeshow test is regarded as the most reliable test of model fit: the observations are aggregated into groups of similar cases, and the statistic is then computed from these groups. The Hosmer-Lemeshow statistic indicates a poor fit if the significance value, or p-value, is less than 0.05. Here, the model fits the data reasonably well except at Step 2, perhaps due to small interaction terms in the model. At the final step, however, the model fits the data well because the significance value is much greater than 0.05.

**Table 16: Hosmer and Lemeshow Test**

[Table not included in this excerpt]

For a closer look at the Hosmer and Lemeshow test, the contingency table is provided in the appendices.

##### 4.4.1.4. Evaluating classification table

**Table 17: Classification Table**

[Table not included in this excerpt]

The classification table shows the practical results of using the logistic regression model for prediction. For each case, the predicted response is GOOD if that case's model-predicted probability is greater than the 0.5 cutoff value specified in the dialogs. If the predicted value matches the observed one, the case is recorded as a correct prediction. In the classification table, the cells on the diagonal are therefore the correct predictions of the model.

Y = 4.677 + 0.323X1 - 29.409X7 - 79.198X6 + 0.18X3

The estimated probability that the ith case is in one of the binary outcomes is calculated by using the probability transformation of the logistic regression.

[Figure not included in this reading sample]
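Since the transformation itself is not reproduced here, the sketch below uses the standard logistic function, P = 1 / (1 + e^(-Y)), together with the coefficients of the equation for Y given above. The variable labels follow the text (X1 Total Asset Turnover, X3 Earnings Per Share, X6 Real Interest Rate, X7 Inflation Rate). The input values and their units are assumptions made only for illustration, in particular that the two rates are entered as decimal fractions.

```python
import math

# Coefficients as reported in the equation for Y.
INTERCEPT = 4.677
COEFFICIENTS = {"X1": 0.323, "X3": 0.18, "X6": -79.198, "X7": -29.409}


def logit(case):
    """Linear predictor Y for one case given as a dict of X values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * case[name] for name in COEFFICIENTS)


def probability(case):
    """Standard logistic transformation of Y into a probability."""
    return 1.0 / (1.0 + math.exp(-logit(case)))


# Hypothetical firm-year; values and units are assumed, not from the thesis.
hypothetical_case = {"X1": 1.2, "X3": 2.5, "X6": 0.04, "X7": 0.03}
y_value = logit(hypothetical_case)
p_good = probability(hypothetical_case)
predicted_group = "GOOD" if p_good > 0.5 else "POOR"
print(f"Y = {y_value:.5f}, P = {p_good:.3f}, predicted group = {predicted_group}")
```

The large negative coefficients on X6 and X7 mean that small changes in the two rates move the predicted probability much more than similar changes in the financial ratios.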

In general, the overall percentage, which measures the predictive accuracy of the model, remains at a similar level through the final step. After the Real Interest Rate is added to the equation in Step 2, the overall percentage increases to 67.4%. In the next step, when the Inflation Rate enters the model, the accuracy stays almost the same at 67.3%. Finally, when the last factor, Total Asset Turnover, is added, the overall percentage declines slightly to 66.6%, with 67.5% for the POOR group and 65.8% for the GOOD group. This overall percentage is considered quite high, and the prediction accuracy of the two groups shows no large disparity.

However, the overall percentage of 66.6% is obtained on the training sample, so it is biased to a certain extent: the 596 training cases that were used to build the model are also used to test its accuracy. This is why a holdout sample, the 2017 data, is necessary. The final equation is tested on a separate group of cases that was not used earlier; therefore, the overall percentage for the 2017 cases is believed to be more reliable.

To provide a visual demonstration of the correct and incorrect predictions, the classification plot, or histogram of predicted probabilities, is another very useful piece of information. The X axis is the predicted probability, from 0 to 1, of the dependent variable being classified as 1. The Y axis is the frequency, that is, the number of cases classified. There are four steps in finding the right model, and the classification plots of all steps are provided in the appendix.

##### 4.4.1.5. Validation of model

Unlike the training sample, for which the logistic regression model was estimated in SPSS, Excel is used for the holdout sample to validate whether the resulting model is effective. The method is to take the data at the end of 2016 and put them into the model to predict stock excess returns in 2017.

To be more specific, in the general model we input the 2016 values of the model's independent variables (Total Asset Turnover, EPS, Real Interest Rate and Inflation Rate) into the above logistic regression model to predict the stock excess return Y for 2017. The outcome Y is then coded as “1” or “0” using the 0.5 threshold and serves as the predicted value for 2017. It is then compared with the actual stock excess return, which is the same dependent variable used in the training sample. The next step is to build a classification table with the same layout as the SPSS classification table presented earlier.
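The pairing of one year's predictors with the following year's outcome can be sketched as follows. This is only an outline of the procedure described above, written in Python rather than Excel; the tickers, the single `score` field and the stand-in scoring function are hypothetical and do not represent the estimated model.

```python
def pair_lagged(features, outcomes, feature_year, outcome_year):
    """Match each firm's predictors in feature_year with its outcome in outcome_year."""
    pairs = []
    for (ticker, year), values in features.items():
        key = (ticker, outcome_year)
        if year == feature_year and key in outcomes:
            pairs.append((ticker, values, outcomes[key]))
    return pairs


def hit_rate(pairs, predict_probability, cutoff=0.5):
    """Share of pairs whose predicted code (1 if probability > cutoff) equals the outcome."""
    hits = sum(
        1 for _, values, actual in pairs
        if (1 if predict_probability(values) > cutoff else 0) == actual
    )
    return hits / len(pairs)


# Hypothetical holdout data keyed by (ticker, year).
features = {
    ("AAA", 2016): {"score": 0.8},
    ("BBB", 2016): {"score": 0.3},
    ("CCC", 2016): {"score": 0.6},
    ("AAA", 2015): {"score": 0.1},  # ignored: not the feature year
}
outcomes = {("AAA", 2017): 1, ("BBB", 2017): 0, ("CCC", 2017): 0}

pairs = pair_lagged(features, outcomes, feature_year=2016, outcome_year=2017)
# Stand-in for the fitted model: it simply returns the stored score.
rate = hit_rate(pairs, lambda values: values["score"])
print(f"{len(pairs)} holdout cases, hit rate = {rate:.1%}")
```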

**Table 18: Classification table**

[Figure not included in this reading sample]

In the general model, 80.9% of the POOR cases in 2017 are predicted correctly, but only 6.7% of the GOOD cases.

An overall accuracy of 51% is not a perfect number for a prediction model, and it is quite low compared with the training sample, which has an overall percentage of about 66%. The out-of-sample accuracy may be lower than the in-sample accuracy because of the smaller sample size. It is noticeable that the model predicts the POOR group much better than the GOOD group in the holdout sample, whereas the percentages are similar for both groups in the training sample.
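The overall percentage is a weighted average of the two group percentages, with the group sizes as weights, which is why a very low GOOD hit rate pulls the total down to about one half. The short check below illustrates this. The underlying counts are not reported in this excerpt, so the counts used here are an assumption: they were chosen only because they add up to 149 cases and reproduce the reported percentages after rounding.

```python
def hit_rates(correct_poor, n_poor, correct_good, n_good):
    """Return (POOR rate, GOOD rate, overall rate) as fractions."""
    rate_poor = correct_poor / n_poor
    rate_good = correct_good / n_good
    overall = (correct_poor + correct_good) / (n_poor + n_good)
    return rate_poor, rate_good, overall


# Assumed counts, not taken from the thesis tables.
n_poor, correct_poor = 89, 72
n_good, correct_good = 60, 4

rate_poor, rate_good, overall = hit_rates(correct_poor, n_poor, correct_good, n_good)

# The same overall rate written as a weighted average of the group rates.
total = n_poor + n_good
weighted = (n_poor / total) * rate_poor + (n_good / total) * rate_good

print(f"POOR {rate_poor:.1%}, GOOD {rate_good:.1%}, overall {overall:.1%}")
```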

#### 4.4.2. Discussion for each industry

Similarly to the general model built for all industries, a logistic regression model is also built for each sector to examine the features of the different industries.

##### 4.4.2.1. Logistic Regression Model

**Table 19: Summary of Variables in Equation of three sectors**

[Figure not included in this reading sample]

Table 19 summarizes Step 1 of the variables in the equation for all three sectors, each estimated on the same kind of training sample. For the first industry, Industrials, EPS is once again the first variable added to the model, with an extremely low p-value (reported as zero), which shows that this variable is significant. After the variables to enter the model have been chosen, their p-values are still tracked. Following the same stepwise method as in the General Model, RIR and INF eventually enter the model. The estimated probability that the ith case is in one of the binary outcomes is then calculated by using the probability transformation of the logistic regression.

[Figure not included in this reading sample]

In the Basic Materials sector, EPS is again the first independent variable to enter, with a high Wald value of 15.258 and a very small p-value (reported as zero), so the null hypothesis that the coefficient of X3 equals 0 can be rejected. After EPS has entered the model as in Table 19, two more variables are added, ROE and TAT, which had previously been identified as promising candidates. Finally, the logit model with the independent variables chosen by the stepwise method (EPS, ROE and TAT) is used to calculate the percentage of accurate forecasts.

[Figure not included in this reading sample]

For the last industry, Consumer Goods, 200 of the 596 cases form the sample used to build the logistic regression model. TAT enters first, followed by RIR and EPS in the second and third positions of the final model. Following the same rules as for the prior industries, the Variables not in the Equation table for the Consumer Goods sector adds and eliminates variables one step at a time based on their scores and significance values. The final model contains the independent variables EPS, RIR and TAT.

[Figure not included in this reading sample]

Moreover, all the p-values of the independent variables are less than 0.05, which means we can reject the hypotheses that their coefficients equal zero and conclude that they are statistically significant.

##### 4.4.2.2. Classification Table

**Table 20: Summary of Classification table**

[Figure not included in this reading sample]

For Industrials, which accounts for 280 of the 596 cases, the overall percentage improves step by step in Table 20, with 62.9%, 63.9% and 69.6% respectively. For this first model, the correct percentage of the GOOD group is notably higher than that of the POOR group. Therefore, the model may be applied to Industrials companies when they earn good stock excess returns. However, the overall percentage of 69.6% is obtained on the training sample from 2013 to 2016. The final equation will be tested later on a separate holdout sample of 2017 cases, which will give a more trustworthy result.

For the Basic Materials industry, the training-sample accuracy goes up after every step, from 67.2% to 70.7% and then 71.6%. These percentages are quite impressive for a logit prediction model. Furthermore, the model predicts the POOR group more accurately, with percentages over 80%. In brief, predictions for this industry are more likely to be accurate when companies have low stock excess returns.

Consumer Goods is the last sector to be considered. Step by step, the prediction accuracy of the model is 59.5%, 64.5% and 67%. This result can be regarded as good, since the accuracy increases steadily and reaches a high level. Like Basic Materials, this model tends to predict the POOR group more accurately than the GOOD group.

##### 4.4.2.3. Validation of model

For the three sectors in this paper, the Excel-based validation method used for the general model is likewise applied to each industry.

First, the Industrials sector contains 70 of the 149 observations in 2017, which makes it the largest of the three industries. Its model consists of three factors, Inflation Rate, Real Interest Rate and EPS, and the same type of data is used to validate it.

**Table 21: Summary of Validation table for model**

[Figure not included in this reading sample]

The hit rate for correctly predicted cases in the Industrials sector is 52.9% overall, with 35.9% for the POOR group, roughly half of the 74.2% for the GOOD group. The logistic regression model for Industrials therefore tends to predict better for companies with GOOD stock excess returns.

In the second industry, the sample used to validate the model is narrowed down to 29 cases from the latest year, which is the smallest of the three. EPS, Return On Equity and Total Asset Turnover are the significant variables chosen to enter the final model for the Basic Materials sector. This model gives a good prediction: the accuracy reaches 86.2%, that is, 25 of the 29 cases are forecast correctly in Table 21. The 71.6% accuracy obtained on the training sample also indicates that this model fits the industry well for predicting stock excess returns, especially for the POOR group.

Last but not least, the sample used to validate the Consumer Goods model is narrowed down to 50 cases from the latest year. The final model again contains familiar variables: EPS, Real Interest Rate and Total Asset Turnover. The model records a remarkably high rate of correct predictions, 80%, or 40 of the 50 cases observed in 2017. Moreover, the POOR group tends to be predicted more accurately, at 85.3%.

Overall, these results are in line with the models estimated on the training sample (2013-2016), which indicates a good fit to the data and reliable models.

### 4.5. Findings and Implications

#### 4.5.1. General Model

Overall, there are four significant independent variables in the final logistic regression model: Real Interest Rate, Inflation Rate, Total Asset Turnover and Earnings Per Share. The other three variables, which were not added, also play important roles in a company's financial situation and are widely used in corporate finance, but they did not show statistical significance for the performance of the company's stock.

The research of Dutta, Bandopadhyay and Sengupta (2012) pointed out that one of their independent variables, Sales/Net Assets, had an impact on stock performance. Moreover, A. A. V. I. Wijesundera, D. A. S. Weerasinghe, T. P. C. R. Krishna, M. M. D. Gunawardena and H. R. I. Peiris (2015) found that EPS had a significant positive relationship with stock return, which they followed with a simple equation to predict future stock returns. This evidence supports the finding that the Total Asset Turnover and EPS ratios in this paper likewise affect stock excess returns. However, little research has combined macroeconomic variables and financial ratios to predict stock excess returns with a binary logistic regression model. This research found that not only the financial variables mentioned above but also economic variables such as Inflation Rate and Interest Rate had a significant influence on stock excess returns.

The general model illustrates that Real Interest Rate (X6) and Inflation Rate (X7) have a large negative effect on stock excess returns, especially Real Interest Rate. Studies such as Alam and Uddin (2009) and Breen, Glosten and Jagannathan (1989) found that interest rates have a strong negative relationship with stock prices or stock returns, mostly over short periods such as five years. Rising interest rates increase costs and decrease the profits of corporations. For example, during the period from 1980 to 2012, the Federal Funds rate line in Figure 1 clearly moves in the opposite direction to the S&P 500 Index line, which indicates an inverse relationship between them. Furthermore, interest rate hikes affect firms' ability to bear risk, so a low real interest rate would be a good indicator for stock excess returns.

Besides, the Inflation Rate also has a notable impact on stock excess returns. High inflation means that input prices have increased; consumers can therefore buy fewer goods, which leads to lower profits for companies. In conclusion, the government should make an effort to adjust and balance the Real Interest Rate and the Inflation Rate appropriately, so as to partially reduce their adverse effects on the stock excess returns of firms and on the stock market as a whole. On the other hand, Total Asset Turnover (X1) has a positive influence on stock excess returns. The explanation is that the higher the total asset turnover, the better the company utilizes its assets in generating revenue, so well-performing companies can have better stock excess returns. To improve asset turnover, a company can either increase sales or manage its assets more efficiently.

Last but not least, Earnings Per Share (X3) is included in the model because of its correlation with stock excess returns. Seetharaman and John Rudolph Raj (2011) report a very strong positive correlation between Public Bank Berhad's EPS and its stock prices, and a significant impact of earnings announcements on Public Bank Berhad's stock prices. To improve EPS, a company can decrease the number of shares outstanding through buybacks. However, this method does not always receive a good reaction from investors, since they may suspect that EPS is being manipulated. Instead, boosting revenue growth can be a better choice for improving EPS.
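The two routes to a higher EPS can be compared with simple arithmetic. The sketch below uses the basic definition, net income divided by shares outstanding, and ignores preferred dividends and the cash spent on the buyback; all figures are hypothetical.

```python
def earnings_per_share(net_income, shares_outstanding):
    """Basic EPS: net income divided by the number of shares outstanding."""
    return net_income / shares_outstanding


# Hypothetical company, for illustration only.
net_income = 100_000.0
shares = 50_000

eps_before = earnings_per_share(net_income, shares)

# Route 1: buy back 10,000 shares, earnings unchanged.
eps_after_buyback = earnings_per_share(net_income, shares - 10_000)

# Route 2: grow earnings by 10%, share count unchanged.
eps_after_growth = earnings_per_share(net_income * 1.10, shares)

print(eps_before, eps_after_buyback, eps_after_growth)
```

In the buyback case EPS rises although the company earns no more than before, which is the reason investors may read such an increase with caution.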

On the whole, the equation balances the percentage of accurate predictions for the POOR (67.5%) and GOOD (65.8%) groups. Therefore, this logistic regression model can be applied to firms with almost any kind of return.

#### 4.5.2. Industrials, Basic Materials and Consumer Goods

In the first industry, Industrials, Earnings Per Share, Real Interest Rate and Inflation Rate are the variables in the prediction model. Total Asset Turnover, Earnings Per Share and Return On Equity are the key variables in the model for the Basic Materials industry. For the Consumer Goods sector, Total Asset Turnover, Earnings Per Share and Real Interest Rate affect stock excess returns. Earnings Per Share is the most significant factor, as it enters all three sector models. Furthermore, a macroeconomic factor, Real Interest Rate, and another financial ratio, Total Asset Turnover, also occur more than once.

In these sectors, a positive Earnings Per Share appears to be a sine qua non for predicting stock excess returns, probably for the same reason as mentioned for the General Model. Consequently, increasing firms' revenue could also be applied as a solution to earn good stock excess returns.

Total Asset Turnover seems important only in the Basic Materials and Consumer Goods sectors. A possible explanation is that firms in these industries usually operate in competitive markets, so their profit margins may be lower than in a sector like Industrials. These firms therefore need to apply a cost leadership strategy, which means turning assets over faster to generate better profit. Moreover, the Return On Equity factor in the Basic Materials equation is also related to Total Asset Turnover. Because Return On Equity measures the rate of return that the owners of a company's common stock receive on their shareholding, improving asset turnover is one good way to enhance Return On Equity.
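The link between the two ratios can be made explicit with the DuPont identity, which writes Return On Equity as net profit margin times total asset turnover times the equity multiplier. The thesis does not present this decomposition; it is added here as a sketch, with hypothetical balance-sheet figures.

```python
def dupont(net_income, revenue, total_assets, equity):
    """Return (margin, asset turnover, equity multiplier, ROE) via the DuPont identity."""
    margin = net_income / revenue
    turnover = revenue / total_assets
    multiplier = total_assets / equity
    return margin, turnover, multiplier, margin * turnover * multiplier


# Hypothetical firm with a thin 5% margin.
base = dupont(net_income=50.0, revenue=1000.0, total_assets=800.0, equity=400.0)

# Same margin, assets and equity, but assets are turned over faster.
faster = dupont(net_income=60.0, revenue=1200.0, total_assets=800.0, equity=400.0)

print(f"turnover {base[1]:.2f} -> ROE {base[3]:.3f}")
print(f"turnover {faster[1]:.2f} -> ROE {faster[3]:.3f}")
```

With margin and leverage held constant, ROE rises in proportion to asset turnover, which is consistent with both ratios appearing in the Basic Materials model.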

Real Interest Rate is observed to affect the Industrials and Consumer Goods industries. An increase in interest rates may lead consumers to increase savings, since they can receive higher rates of return. As a result, people spend less on goods, which leads to losses for companies in the Consumer Goods sector. For Industrials companies, a high interest rate, especially over a short-term period, can make it difficult to expand the company and even to pay off liabilities. The findings of one study revealed that an inverse relationship exists between interest rates and industrials in Nigeria (Okonkwo N. Osmond and Egbulonu K. Godslove, 2016), and another found that industrial sectors in France and Germany are exposed to significant levels of interest rate risk. Besides, the Inflation Rate also has an effect: if this rate increases, the price of materials for Industrials companies may also increase, which leads to lower profit.

## CHAPTER 5

## CONCLUSION

### 5.1. Conclusion

This paper aims to investigate the factors that significantly affect the stock price performance of companies in the stock market, specifically on the Ho Chi Minh Stock Exchange. In particular, financial ratios and macroeconomic variables are taken into account with the help of logistic regression in order to evaluate and forecast stock excess returns. These factors have been analyzed in various studies, and financial ratios have proved their ability to forecast stock returns in many studies by scholars and economists all over the world.

The data of all manufacturing companies currently listed on HOSE are collected and processed with the purpose of building a model with dichotomous outcomes. Two financial ratios and two economic variables, out of seven variables in total, are able to classify stocks into two categories based on their excess rate of return with an accuracy of more than 66%. The four statistically significant factors are Total Asset Turnover, Earnings Per Share, Real Interest Rate and Inflation Rate.

From the investor's point of view, it can be concluded that it is possible to predict outperforming shares by using these ratios with logistic regression. Therefore, investors as well as financial companies can benefit from applying this model to evaluate the performance of a certain stock in a given period before making specific investment decisions. In addition, portfolio managers can use this model as a tool in managing their stock diversification by raising awareness of the stocks invested in. The more tools and information are available as references, the higher the probability of earning a profit and benefiting from investments. The model in this paper can contribute to a certain extent.

### 5.2. Recommendation

Limited research time has led to several restrictions in this research. Small sample size is one of the limitations of the study: the number of companies in the three target industry sectors cannot be considered large enough for in-depth research. In addition to the sample size, the time span of the study also needs improving. These two limitations reduce the quality of the study's results; hence, the estimates obtained from the study become less practical. Moreover, although considering additional relevant factors may yield a more accurate outcome, retrieving such a large database was not possible within the limited research time. Hence, further studies should focus on addressing these aspects so that the research can be more complete.

For future research, enlarging the sample size should be considered first. For example, in addition to firms listed on the Ho Chi Minh Stock Exchange (HOSE), including corporations on the Ha Noi Stock Exchange (HNX) could improve the quality of the results. The second feasible improvement is extending the time span, for example to a ten-year period. Such an improvement can strengthen the reliability of the findings. Another prospect to consider is adding other variables that are expected to be significant to the model so that the results can be more precise and more useful in empirical work.

## REFERENCES

1. Alam, M., & Uddin, G. S. (2009). Relationship between interest rate and stock price: empirical evidence from developed and developing countries.

2. Altman, E. (1968). Financial ratios, discriminant analysis, and the prediction of corporate bankruptcy. *Journal of Finance*, 23, 589-609.

3. Aono, K., & Iwaisako, T. (2010). On the predictability of Japanese stock returns using dividend yield. *Asia-Pacific Financial Markets*, 17(2), 141-149.

4. Belsley, D. A., Kuh, E., & Welsch, R. E. (2005). *Regression diagnostics: Identifying influential data and sources of collinearity* (Vol. 571). John Wiley & Sons.

5. Breen, W., Glosten, L. R., & Jagannathan, R. (1989). Economic significance of predictable variations in stock index returns. *The Journal of Finance*, 44(5), 1177-1189.

6. Campbell, J. Y., & Yogo, M. (2006). Efficient tests of stock return predictability. *Journal of financial economics*, 81(1), 27-60.

7. Campbell, J. Y., & Thompson, S. P. (2008). Predicting excess stock returns out of sample: Can anything beat the historical average? *Scholarly Articles* 2622619, Harvard University Department of Economics.

8. Charles, A., Darné, O., & Kim, J. H. (2017). International stock return predictability: Evidence from new statistical tests. *International Review of Financial Analysis*, 54, 97-113.

9. Chen, N. R. (1986). Economic forces and the stock market. Journal of Business.

10. Clare, A. D. (1994). Macroeconomic factors, the APT and the UK stockmarket. *Journal of Business Finance & Accounting*.

11. Dimitrova, D. (2005). The relationship between exchange rates and stock prices: Studied in a multivariate model. *Issues in Political Economy*, 14(1), 3-9.

12. Dutta, Bandopadhyay, & Sengupta (2012). Prediction of stock performance in the Indian stock market using logistic regression. Durgapur.

13. Essays, UK (November 2013). Retained Earnings and Share Price Relationship in Pakistan.

14. Fama, E. F., & French, K. R. (1988). Permanent and temporary components of stock prices. *Journal of Political Economy*, 96, 246-73.

15. Tabachnick, B. G., & Fidell, L. S. (1996). *Using multivariate statistics* (3rd ed.). New York.

16. Goyal, A., Welch, I., 2003. Predicting the equity premium with dividend ratios. *Management Science* 49, 639–654.

18. Habib, Y., Kiani, Z. I., & Khan, M. A. (2012). Dividend policy and share price volatility: Evidence from Pakistan. *Global Journal of Management and Business Research*, 12(5).

19. Hsing, Y., 2004. Impacts of Fiscal Policy, Monetary Policy, and Exchange Rate Policy on Real GDP in Brazil: A VAR Model, *Brazilian Electronic Journal of Economics* 6: 1-12.

20. Islam, M., Khan, T. R., Choudhury, T. T., & Adnan, A. M. (2014). How earning per share (EPS) affects on share price and firm value. *European Journal of Business and Management*, 6(17), 97-108.

21. Ky, T. T. (2012). Testing the capital asset pricing model (CAPM) in the context of Vietnam stock market. HCMC: *IU Library*.

22. Lee, S.; J. Ryu; and L. Kim. 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: Case study of Youngin, Korea, Landslides 4: 327–338.

23. Hosmer, D. W., & Lemeshow, S. (2000). *Applied Logistic Regression*, 2nd edition. Wiley Interscience.

24. Li, Hui, et al. 2010. Predicting business failure using classification and regression tree: An empirical comparison with popular classical statistical methods and top classification mining methods, *Expert Systems with Applications* 37(8), 5895-5904.

25. Maryyam Anwaar. 2016. “Impact of Firms’ Performance on Stock Returns (Evidence from Listed Companies of FTSE-100 Index London, UK)”. *Global Journal of Management and Business Research: D Accounting and Auditing*, Vol. 16, Issue 1, Version 1.0.

26. Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage Publications. *Series: Quantitative Applications in the Social Sciences*, No. 106.

27. Min, Jae H., and Chulwoo Jeong. 2009. A binary classification method for bankruptcy prediction, *Expert Systems with Applications* 36(3), 5256-5263.

28. Mostafa, Mohamed M. 2010. Forecasting stock exchange movements using neural networks: Empirical evidence from Kuwait, *Expert Systems with Applications* 37(9), 6302-6309.

29. Nepal, S.K. 2003. Trail impacts in Sagarmatha (Mt. Everest) National Park, Nepal: A logistic regression analysis, Environmental Management 32(3), 312-321.

30. Neter, J.; W. Wasserman; C.J. Nachtsheim; and M.H. Kutner. 1996. *Applied Linear Regression Models*, 3rd ed., Chicago: Irwin.

31. Öğüt, Hulisi, et al. 2009, Detecting stock-price manipulation in an emerging market: The case of Turkey, *Expert Systems with Applications* 36(9), 11944-11949.

32. Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. *Journal of Accounting Research*, 18, 109-31.

33. Peter Irungu Macharia & Simon Kamau Gatuhi. 2013. “Effect of Financial Performance Indicators on Market Price of Shares in Commercial Banks of Kenya”. *International Journal of Management & Business Studies (IJMBS)*, Vol. 3, Issue 3, July-Sept 2013, pp. 57-60.

34. Phuc, N. V. (2013). PREDICTION MODEL FOR STOCK PRICE. HCMC: IU Library.

35. Campbell, J. Y., & Shiller, R. J. (1988). Stock Prices, Earnings, and Expected Dividends. *The Journal of Finance*, 43, 661-676.

36. Tsai, Chih-Fong, et al. 2011. Predicting stock returns by classifier ensembles, *Applied Soft Computing* 11(2), 2452-2459.

37. Upadhyay, A., Bandyopadhyay, G., & Dutta, A. (2012). Forecasting stock performance in indian market using multinomial logistic regression. *Journal of Business Studies Quarterly*, 3(3), 16.

38. Van, E., and J. Robert. 1997. The application of neural networks in the forecasting

39. Zavgren, C. V. (1985). Assessing the vulnerability to failure of American industrial firms: a logistic analysis. *Journal of Business Finance & Accounting*, 12(1), 19-45.

40. Zmijewski, M. E. (1984). Methodological issues related to the estimation of financial distress prediction models. *Journal of Accounting research*, 59-82.

## APPENDIX

**List of listed companies**

[Figure not included in this reading sample]

Contingency Table for Hosmer and Lemeshow Test for General Model

[Figure not included in this reading sample]

Contingency Table for Hosmer and Lemeshow Test for Industrials

[Figure not included in this reading sample]

Contingency Table for Hosmer and Lemeshow Test for Basic Materials

[Figure not included in this reading sample]

Contingency Table for Hosmer and Lemeshow Test for Consumer Goods

[Figure not included in this reading sample]

**[...]**

- Quote paper
- An Nguyen (Author), 2018, Prediction Model for Stock Excess Returns on Ho Chi Minh City Stock Exchange, Munich, GRIN Verlag, https://www.grin.com/document/494351
