The Impact of Corporate Textual Disclosure on Capital Markets

Seminar Paper, 2011

36 Pages, Grade: 1.3


Table of Contents

List of figures

List of tables

List of abbreviations

1 Introduction

2 Theoretical Framework
2.1 Importance of textual analysis
2.2 Information content of textual disclosure
2.3 Characteristics of textual disclosure
2.4 Approaches of textual analysis
2.4.1 Manual vs. computer-based approach
2.4.2 Dictionary approach vs. statistical approach

3 Recent Studies
3.1 Feldman et al. (2010)
3.1.1 Data, Sample Selection and Variable Definition
3.1.2 Results
3.1.3 Business Impact
3.2 Hales et al. (2011)
3.2.1 Hypotheses and Design of Experiment 1
3.2.2 Results of Experiment 1
3.2.3 Hypotheses and Design of Experiment 2
3.2.4 Results of Experiment 2
3.2.5 Business Impact

4 Conclusion and future challenges



List of figures

Figure 1: Number of potential and sample filings

Figure 2: Participants’ earnings growth forecast; Experiment 1

Figure 3: Participants’ earnings growth forecast; Experiment 2

List of tables

Table 1: Correlation matrix

Table 2: Regression of SEC filing returns on various signals

Table 3: Regression of drift returns on various signals

Table 4: Earnings Growth Forecasts; Experiment 1

Table 5: Hypothesis tests: Panel A; Experiment 1

Table 6: Hypothesis tests: Panel B; Experiment 1

Table 7: Earnings Growth Forecasts; Experiment 2

Table 8: Hypothesis tests: Panel A; Experiment 2

Table 9: Hypothesis tests: Panel B; Experiment 2

Table 10: Classification of the recent literature

Table 11: Sample construction procedure

Table 12: Prediction of future SUE

List of abbreviations


1 Introduction

Each year, firms disclose information that is analyzed and eventually reflected in market prices. Sources of such information include annual reports, earnings announcements and press releases. In the past, financial accounting research focused primarily on the numerical financial information disclosed (cf. Hales et al. 2011, 224).1 Interestingly, research has shown that asset price movements can only partly be explained by this quantitative information, which implies that other factors also influence prices (cf. Demers/Vega 2010, 2).

Since only a small fraction of corporate disclosure consists of quantitative data and the dominant part consists of textual information (cf. Li 2011, 1)2, and since language is the natural medium through which people communicate, financial accounting research has started to focus on the analysis of textual disclosure (cf. Hales et al. 2011, 224). The results of these studies show that different aspects of textual disclosure, such as its tone (how information is written or expressed) or its readability, can influence, for example, market prices or analyst behavior (e.g. Li 2010; Tetlock/Saar-Tsechansky/Macskassy 2008).

This paper focuses on research on tone as an important characteristic of corporate textual disclosure. Its aim is to provide an overview of the most recent approaches and of the challenges that researchers face.

The remainder of this paper proceeds as follows. Section 2 discusses the importance of textual analysis and the information content of textual disclosure. It also provides an overview of different approaches to characterizing textual disclosure and a tabular classification of the recent literature. Since this paper focuses on the tone of textual disclosure, different approaches to measuring tone are discussed as well. Section 3 discusses two recent studies in detail, and section 4 concludes with a summary of the main results of this paper and suggestions for future research.

2 Theoretical Framework

2.1 Importance of textual analysis

Whenever investors receive two or more signals containing price-relevant information, they should incorporate all of them into the price formation process (cf. Demers/Vega 2010, 7). Dye/Sridhar (cf. 2004, 59 f.) develop a model showing that, if investors are provided with both hard and soft information, efficient markets have a unique linear equilibrium in which the price is a linear function of both the soft and the hard information.3 In the context of this paper, this means that numerical and textual information complement each other. However, Hirshleifer/Teoh (cf. 2003, 378) find that, in practice, investors often neglect textual information, which ultimately leads to mispricing. For this reason, it is important to find ways to systematically analyze the textual content of corporate disclosure and to use the results to make more efficient investment decisions (cf. Feldman et al. 2010, 916).

2.2 Information content of textual disclosure

A potential reason why investors often prefer numerical data is that it is not clear whether textual disclosure is informative or merely “boilerplate” (SEC, 2002) disclosure that contains no additional valuable or credible information. Furthermore, many of these disclosures are voluntary and unaudited, so there is a concern that management could use them to manipulate investors. Although the SEC regulates, for example, what an MD&A should include, its content is largely discretionary (cf. Henry 2008, 364 f.). Moreover, the only regulation regarding the style of language is that it should be written in “plain English” (SEC, 2002). Especially when analyzing the tone of corporate disclosure, it should therefore be kept in mind that, given the lack of strict regulation, managers could use their language to manipulate investors.4

2.3 Characteristics of textual disclosure

According to Li (cf. 2010, 1054), at least three characteristics of textual disclosure are of importance for researchers: quantity, readability and tone. In general, two types of studies dealing with quantity can be distinguished. One type measures the amount of voluntarily disclosed information and examines how markets respond to supplementary voluntary disclosure (e.g. Hutton/Miller/Skinner 2003; Langberg/Sivaramakrishnan 2010). The other type uses the length of a particular section or of the whole document as a measure of complexity (e.g. Miller 2010; You/Zhang 2009). Studies that measure the readability of the disclosed information examine, for example, whether textual disclosure is harder to read when it contains negative information (e.g. Li 2008; Lehavy/Li/Merkley 2010).
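To illustrate how quantity and readability can be operationalized, the following Python sketch computes a document's length in words and the Gunning Fog index, a readability measure used, for example, by Li (2008). The Fog index is 0.4 times the sum of the average number of words per sentence and the percentage of “complex” words (three or more syllables). The tokenizer, the sentence splitter and the syllable-counting heuristic below are simplifying assumptions and are not the procedures used in the cited studies.

```python
import re

def tokenize_words(text: str) -> list[str]:
    # Alphabetic tokens only; numbers and punctuation are ignored (assumption).
    return re.findall(r"[A-Za-z]+", text)

def count_sentences(text: str) -> int:
    # Naive split on ., ! and ?; abbreviations such as "Inc." would be miscounted.
    parts = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    return max(len(parts), 1)

def count_syllables(word: str) -> int:
    # Heuristic: number of vowel groups, minus a silent final "e".
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def document_length(text: str) -> int:
    # Quantity measure: number of words in the section or document.
    return len(tokenize_words(text))

def fog_index(text: str) -> float:
    # Fog = 0.4 * (words per sentence + percentage of complex words),
    # where complex words have three or more syllables.
    words = tokenize_words(text)
    if not words:
        return 0.0
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    words_per_sentence = len(words) / count_sentences(text)
    return 0.4 * (words_per_sentence + 100 * complex_words / len(words))
```

A higher Fog value indicates text that is harder to read; short sentences made of short words, such as “The cat sat. The dog ran.”, yield a value of about 1.2.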

The main focus of research on textual disclosure, and also of this paper, is tone. Tone is a natural feature of language and provides information over and above the literal meaning of the words used (cf. Pennebaker/Mehl/Niederhoffer 2003, 550). For this reason, analyzing textual disclosure with regard to tone has become very important (cf. Davis/Piger/Sedor 2008, 1). Researchers have taken different approaches to defining what tone actually is and how it can be measured. Most researchers, such as Davis/Piger/Sedor (2008) and Demers/Vega (2010), distinguish between positive (optimistic) and negative (pessimistic) tone. Additionally, Demers/Vega (2010) introduce certainty as a new language construct, and Hales et al. (2011) investigate the effect of vivid and pallid language on investor judgment.

Appendix A contains a classification of several recent studies according to quantity, tone and readability. This classification is not exhaustive; it is included to provide a better understanding of the different fields of research that have evolved over the last years. It closely follows a similar classification by Li (cf. 2011, Table 1) and extends it in certain aspects.

2.4 Approaches of textual analysis

The challenge of textual analysis is to transform textual information into numerical information that can be used for further analysis (cf. Loughran/McDonald 2011, 37). In recent years, researchers from computer science, psychology and linguistics in particular have developed tools and methods to quantify textual information in a relatively objective way (cf. Feldman et al. 2010, 916 f.). This section provides an overview of the most important methods and their strengths and weaknesses.

2.4.1 Manual vs. computer-based approach

One fundamental distinction in textual analysis is between the manual and the computer-based approach. Researchers using the former collect data manually or develop word lists manually to analyze corporate disclosure (cf. Larcker/Zakolyukina 2010, 5). One example of a manually developed word list is a study by Li (2006), in which he examines the impact of risk sentiment in annual 10-K filings on future stock returns. To measure risk sentiment, he selected and counted words related to risk (e.g. “risky”) and uncertainty (e.g. “uncertain”) (cf. Li 2006, 31). An advantage of this approach is that researchers can select words that are related to their specific research construct and thus achieve a high level of precision (cf. Larcker/Zakolyukina 2010, 5).
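Applying a manually developed word list of this kind takes only a few lines of code. The Python sketch below counts words that begin with the stems “risk” or “uncertain” and scales the count by the total number of words. The stem list is a hypothetical illustration in the spirit of Li (2006), not his exact list or procedure.

```python
import re

# Hypothetical, manually chosen stems in the spirit of Li (2006):
# "risk" matches risk, risks, risky; "uncertain" matches uncertain, uncertainty, ...
RISK_STEMS = ("risk", "uncertain")

def risk_sentiment(text: str, stems: tuple[str, ...] = RISK_STEMS) -> tuple[int, float]:
    """Return the number of risk-related words and their share of all words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for t in tokens if t.startswith(stems))
    share = hits / len(tokens) if tokens else 0.0
    return hits, share
```

For the sentence “Our business is risky and future demand is uncertain; these risks remain.” the function returns three hits out of twelve words, a share of 0.25. Prefix matching is deliberately simple; it would also match words such as “riskless”, which shows how the researcher's choices directly shape the measure.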

A major disadvantage is that manually developed word lists can be biased by the researcher’s subjectivity. This limits the possibilities for replication and generalization and thus also leads to fewer follow-up studies (cf. Li 2011, 4). Related to the potential lack of objectivity in the selection of words is the risk that the researcher misses important dimensions that a psychosocial dictionary would capture (cf. Larcker/Zakolyukina 2010, 5). Studies in which researchers collect data manually have the further disadvantage that manual collection is costly. For this reason, these studies have relatively small samples. As a result, their generalizability is limited and their tests mostly have low power (cf. Li 2011, 4).

By using a computer-based approach, researchers can analyze large numbers of available texts cost-efficiently. The resulting larger samples improve generalizability and increase the power of the tests. Furthermore, this approach may lead to more follow-up research, since replication becomes easier (cf. Li 2011, 5).

2.4.2 Dictionary approach vs. statistical approach

Having chosen the computer-based over the manual approach, researchers can use either a dictionary approach or a statistical approach. The former is based on a mapping algorithm in which the computer program classifies words into different categories according to predefined rules (cf. Li 2010, 1058). These rules and categories are usually based on psychosocial dictionaries developed by psychologists. The most widely used are the General Inquirer (GI), Linguistic Inquiry and Word Count (LIWC) and DICTION.5 In most studies, they are used to scan texts for positive and/or negative words and to determine whether the overall tone is optimistic or pessimistic (cf. Larcker/Zakolyukina 2010, 5; Loughran/McDonald 2011, 35 f.).

The advantage of using one of these dictionaries is that the researcher cannot bias the composition of the word list (cf. Loughran/McDonald 2011, 35). On the other hand, the approach has several weaknesses, and many researchers have criticized the use of these dictionaries for the textual analysis of corporate disclosure. One reason is that the dictionary approach cannot take the context of a sentence into account. For example, the word “increase” is treated as positive regardless of whether it describes the development of revenues or of costs (cf. Li 2011, 6).
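The mapping logic of the dictionary approach, and its blindness to context, can be illustrated with a minimal Python sketch. The two word lists below are tiny hypothetical stand-ins; they are not taken from the GI, LIWC, DICTION or any other published dictionary. The net-tone formula (positive minus negative words, divided by all words) is likewise only one simple choice among several used in the literature.

```python
import re

# Tiny hypothetical word lists (not from any published dictionary).
POSITIVE = {"increase", "increased", "growth", "improve", "improved", "strong"}
NEGATIVE = {"decline", "declined", "loss", "losses", "weak", "impairment"}

def dictionary_tone(text: str) -> tuple[int, int, float, str]:
    """Count positive/negative words and classify the overall tone."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    net = (pos - neg) / len(tokens) if tokens else 0.0
    if net > 0:
        label = "optimistic"
    elif net < 0:
        label = "pessimistic"
    else:
        label = "neutral"
    return pos, neg, net, label
```

Both “Revenues increased.” and “Operating costs increased sharply.” are labeled “optimistic”, because “increased” is counted as positive without regard to what increased; this is exactly the context problem described above.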

Furthermore, many words have several meanings; therefore, it is not possible to develop word lists that are universally valid (cf. Loughran/McDonald 2011, 36).

According to Berelson (cf. 1972, 164 f.), the crucial success factor of content analysis is categorization. If the categories are not appropriate for a particular setting, the analysis may produce noise or wrong results. Since most dictionaries and word lists were built for psychology, researchers have pointed out that their categorization might not work well for corporate disclosure (cf. Li 2011, 6).6 Loughran/McDonald have dedicated a study to the question of whether the H4N list can be used for corporate disclosures. They conclude that this word list should not be used in this context, or only very cautiously, since it can lead to a high rate of misclassified words (cf. Loughran/McDonald 2011, 62).7 Based on these findings, they developed “Fin-Neg”8, a tailored list of negative financial words (cf. Loughran/McDonald 2011, 36). Another of their findings, consistent with what is known as “Zipf’s law”, is that in general a small number of words has a very large influence on the overall result. To allow less frequently appearing words to have a greater impact, they include a weighting scheme in the Fin-Neg list (cf. Loughran/McDonald 2011, 50).
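The idea behind such a weighting scheme can be sketched with a generic tf-idf weight: a word's log-scaled frequency in a document is multiplied by the logarithm of the inverse share of documents containing it, so that words appearing in almost every filing receive little weight. The Python sketch below is an illustrative assumption; the exact normalization used by Loughran/McDonald (2011) may differ, and the function names are hypothetical.

```python
import math

def tfidf_weight(tf: int, doc_words: int, n_docs: int, doc_freq: int) -> float:
    """Weight of one word in one document.

    tf:        occurrences of the word in the document
    doc_words: total number of words in the document (at least 1)
    n_docs:    number of documents in the corpus
    doc_freq:  number of documents containing the word at least once
    """
    if tf == 0 or doc_freq == 0:
        return 0.0
    term = (1 + math.log(tf)) / (1 + math.log(doc_words))  # damped, length-scaled frequency
    idf = math.log(n_docs / doc_freq)                      # 0 for words found in every document
    return term * idf

def weighted_negative_score(counts: dict[str, int], doc_words: int,
                            word_list: set[str], n_docs: int,
                            doc_freq: dict[str, int]) -> float:
    """Sum of tf-idf weights over the words of a (negative) word list."""
    return sum(tfidf_weight(counts.get(w, 0), doc_words, n_docs, doc_freq.get(w, 0))
               for w in word_list)
```

Under this scheme, a single occurrence of a word that appears in only a few filings can outweigh several occurrences of a word that appears in nearly all of them, which is the intended effect of down-weighting very common words.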

The advantage of discipline-specific, tailored word lists is that they reduce measurement error. This increases the power of the tests and reduces the associated attenuation bias in the parameter estimates (cf. Loughran/McDonald 2011, 44). The idea of tailored word lists is also taken up by Li, who recommends that future researchers use either tailored dictionaries or the statistical approach to avoid the problems that arise when traditional dictionaries are used to analyze corporate disclosure (cf. Li 2011, 7).
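Why lower measurement error matters can be shown with a short simulation of the classical errors-in-variables result: if a regressor is measured with independent noise, the OLS slope shrinks towards zero by the factor var(x) / (var(x) + var(noise)). In the Python sketch below, “tone” and “returns” are simulated, hypothetical variables; with equal variances of true tone and noise, the estimated slope should be roughly half of the true one.

```python
import random

def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(beta: float = 1.0, noise_sd: float = 1.0,
             n: int = 50_000, seed: int = 0) -> tuple[float, float]:
    """Return OLS slopes using the true and the noisily measured regressor."""
    rng = random.Random(seed)
    true_tone = [rng.gauss(0.0, 1.0) for _ in range(n)]                # variance 1
    returns = [beta * t + rng.gauss(0.0, 1.0) for t in true_tone]
    measured_tone = [t + rng.gauss(0.0, noise_sd) for t in true_tone]  # measurement error
    return ols_slope(true_tone, returns), ols_slope(measured_tone, returns)
```

With `noise_sd=1.0` the slope on the noisy measure comes out close to 0.5 instead of the true 1.0; shrinking the noise, which in this analogy corresponds to a better-suited word list, moves the estimate back towards the true value.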

Statistical approaches were developed by mathematicians and computer scientists. They use algorithms to calculate statistical correlations between certain key words and, for example, the type of a document (cf. Li 2011, 6). In one of his studies, Li (2010) uses a Naïve Bayesian technique: First, the sentences are reduced to words, and these words are weighted by their frequency. Then the algorithm classifies each sentence into one of several predefined categories: positive, negative, neutral or uncertain. The “naïve” aspect of this approach is the assumption that, given the category, the occurrence of one word is independent of the occurrence of the other words in the text (cf. Li 2010, 1059 f.). A drawback of this sophisticated approach is that it might be too complex to be used outside of accounting research.
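The mechanics of such a classifier can be sketched in Python. The example below is a generic multinomial Naive Bayes classifier with add-one (Laplace) smoothing, trained on a handful of hypothetical sentences. Li's (2010) actual training data, preprocessing and implementation details are not reproduced here.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(sentence: str) -> list[str]:
    return re.findall(r"[a-z]+", sentence.lower())

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_sentences = len(labels)
        return self

    def log_score(self, sentence: str, label: str) -> float:
        # log P(category) + sum of log P(word | category); the "naive"
        # independence assumption lets the word probabilities simply multiply.
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / self.n_sentences)
        for w in tokenize(sentence):
            if w in self.vocab:  # words unseen in training are ignored
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, sentence: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(sentence, c))

# Hypothetical toy training sentences (not Li's data).
TRAIN = [
    ("Revenues grew strongly", "positive"),
    ("Margins improved strongly", "positive"),
    ("Revenues declined sharply", "negative"),
    ("Losses increased sharply", "negative"),
    ("The report covers the year", "neutral"),
    ("The company is based in Ohio", "neutral"),
    ("Results may vary", "uncertain"),
    ("Outcomes are uncertain", "uncertain"),
]
```

After `NaiveBayesClassifier().fit(...)` on these sentences, “Sales declined sharply” is assigned to the negative category and “Results are uncertain” to the uncertain category. Real applications need a much larger, carefully coded training sample.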

Overall it can be concluded that the research on how to measure tone is very broad and constantly developing. This is necessary since the most frequently used method (to use psychosocial dictionaries) has critical weaknesses.

3 Recent Studies

In this section, two recent studies that belong to the field of research on how tone impacts market participants are discussed in detail.

3.1 Feldman et al. (2010)

The first study discussed in detail is by Feldman et al. (2010). It was selected because it extends the existing research: rather than the level of tone, it measures tone change and its impact on immediate and delayed market reactions.9 By tone change, the authors mean the difference in positive/negative tone of the current filing compared with the previous filings of the same firm (cf. Feldman et al. 2010, 915 f.).

There are several reasons why Feldman et al. measure tone change (and not the level of tone) in the MD&A section. First, management often uses the prior year’s MD&A section as the basis for the current year’s filing. This leads to a high autocorrelation between the tone levels of the previous and the current MD&A section, so a large part of the current tone level is predictable and therefore expected.10 A further reason is that the drawbacks of the dictionary approach are less severe when successive filings are measured with the same dictionary. When two MD&As of the same company are compared, industry-specific factors, for example, that would bias the measurement of the tone level are constant over the years and therefore do not bias the measured change (cf. Feldman et al. 2010, 926 f.).

3.1.1 Data, Sample Selection and Variable Definition

Feldman et al. use a database of Charter Oak Investment Systems Inc., based on original Compustat entries, to derive “as first reported” (AFR) information on earnings, cash flows and accruals. They stress the advantage of this information: only AFR figures reflect the data that was actually known to market participants at the time (cf. Feldman et al. 2010, 928). The sample selection process consisted of six steps. First, all filings in the SEC database as of June 2008 whose form type started with 10-K, 10K, 10-Q or 10Q were included. This initial number (382,435) was reduced to 153,988, for example by excluding MD&As with fewer than 30 words or filings that appeared before Q3/1993 or after Q2/2007 (cf. Feldman et al. 2010, 929 f.).11
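Part of the selection process described above can be expressed as a simple filter. The Python sketch below applies only the three criteria mentioned in the text: the form type prefix, a minimum MD&A length of 30 words, and a filing date between the start of Q3/1993 and the end of Q2/2007. The `Filing` record and its fields are hypothetical, using the filing date to decide when a filing “appeared” is an assumption, and the remaining steps of the six-step procedure are not modeled.

```python
from dataclasses import dataclass
from datetime import date

PERIODIC_PREFIXES = ("10-K", "10K", "10-Q", "10Q")
FIRST_DAY = date(1993, 7, 1)   # start of Q3/1993
LAST_DAY = date(2007, 6, 30)   # end of Q2/2007

@dataclass
class Filing:  # hypothetical record layout
    form_type: str
    filing_date: date
    mdna_words: int

def passes_screens(f: Filing) -> bool:
    """Apply the three screens named in the text; other steps are omitted."""
    return (f.form_type.startswith(PERIODIC_PREFIXES)
            and f.mdna_words >= 30
            and FIRST_DAY <= f.filing_date <= LAST_DAY)
```

A sample would then be formed as `[f for f in filings if passes_screens(f)]`. Note that prefix matching also admits variants such as “10-K405”, which is consistent with the criterion that the form type merely starts with one of the listed strings.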


Figure 1: Number of potential and sample filings (Source: Feldman et al. 2010, 930).

Figure 1 shows the number of potential filings and of sample filings (153,988 in the full sample) for the fourth quarter of each year from 1994 to 2006. Overall, Feldman et al. (cf. 2010, 935) conclude that the number of observations in each quarter is sufficiently high to allow meaningful portfolio construction.

As explained above, Feldman et al. want to explore the impact of tone change on market pricing beyond the impact of SUE and accruals. For this reason, their independent variables are positive (Pos), negative (Neg) and differential tone change (Pos-Neg, i.e. the change in optimism net of pessimism), with SUE and accruals as control variables (cf. Feldman et al. 2010, 927). SUE and accruals are measured in accordance with the prior literature: accruals are estimated as earnings minus net operating cash flows, and the calculation of SUE depends on whether analyst forecasts are available (cf. Feldman et al. 2010, 931).

As a first step in measuring tone change, they use Loughran/McDonald’s word lists to count the number of positive and negative words.12 They then define negative tone change as “the change in the proportion of negative words among all words in the MD&A relative to the average pessimistic signal in all periodic SEC filings made in the prior 400 days (scaled by the standard deviation of the signal in the same period)” (Feldman et al. 2010, 927).13 Hence, negative tone change is higher, the higher the current proportion of negative words is relative to its recent average. Positive tone change is defined analogously. The third measure, differential tone change, is “the change in the difference of the positive and negative words divided by total words in the MD&A relative to the average of this measure in all periodic SEC filings made in the prior 400 days (scaled by the standard deviation of the signal in the same period)” (Feldman et al. 2010, 927).
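The quoted definitions amount to a standardized difference: the current proportion minus the mean of the proportions in prior periodic filings within 400 days, divided by their standard deviation. The Python sketch below follows this reading under stated assumptions: the history contains the same firm's earlier filings, the sample standard deviation is used, and no signal is computed when fewer than two prior filings are available. The excerpt does not specify these details, and the function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, stdev
from typing import Optional

def negative_share(n_negative: int, n_words: int) -> float:
    return n_negative / n_words

def differential_share(n_positive: int, n_negative: int, n_words: int) -> float:
    return (n_positive - n_negative) / n_words

def tone_change_signal(current_value: float, current_date: date,
                       history: list[tuple[date, float]],
                       window_days: int = 400) -> Optional[float]:
    """Standardized change of a tone measure relative to the prior window.

    history holds (filing_date, value) pairs of the same firm's earlier
    periodic filings (assumption); only filings inside the window count.
    """
    cutoff = current_date - timedelta(days=window_days)
    prior = [v for d, v in history if cutoff <= d < current_date]
    if len(prior) < 2:
        return None          # standard deviation not defined (assumption)
    sd = stdev(prior)        # sample standard deviation (assumption)
    if sd == 0:
        return None
    return (current_value - mean(prior)) / sd
```

For example, prior negative shares of 1.0 %, 1.2 % and 1.4 % and a current share of 1.8 % give a signal of (0.018 − 0.012) / 0.002 = 3, a strong increase in pessimism; filings older than 400 days are ignored.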

The dependent variable is the excess return of the buy-and-hold portfolios they construct. To calculate this excess return, Feldman et al. first calculate the buy-and-hold return on the security over the holding period and then subtract the buy-and-hold return of a benchmark portfolio over the same holding period (cf. Feldman et al. 2010, 931).
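The excess buy-and-hold return is the compounded return of the security minus the compounded return of the benchmark over the same holding period. The following minimal Python sketch assumes simple (not log) periodic returns; the choice of benchmark portfolio is left open because the excerpt does not specify it.

```python
from math import prod

def buy_and_hold_return(period_returns: list[float]) -> float:
    """Compound simple periodic returns: prod(1 + r) - 1."""
    return prod(1 + r for r in period_returns) - 1

def excess_bhr(stock_returns: list[float], benchmark_returns: list[float]) -> float:
    """Buy-and-hold return of the security minus that of its benchmark."""
    if len(stock_returns) != len(benchmark_returns):
        raise ValueError("both series must cover the same holding period")
    return buy_and_hold_return(stock_returns) - buy_and_hold_return(benchmark_returns)
```

For instance, a security returning 10 % and then −5 % compounds to 4.5 % (1.10 × 0.95 − 1), while a benchmark returning 2 % and 1 % compounds to 3.02 %, giving an excess buy-and-hold return of about 1.48 percentage points. Compounding rather than summing the returns is what distinguishes the buy-and-hold measure from cumulative abnormal returns.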

Overall, Feldman et al. expect high (low) scores on positive and differential tone change to lead to high (low) immediate and delayed returns. For negative tone change, they expect that the higher the score, the lower the returns. For the control variables, they expect the same results as prior studies: high (low) SUE should lead to high (low) returns, and high (low) accruals should lead to low (high) returns (cf. Feldman et al. 2010, 933).

3.1.2 Results

Table 1 shows the correlations among the regression variables. For the short window around the filing (BHR-filing)14, SUE is significantly15 positively correlated with the excess return (0.059), whereas accruals show no significant correlation with the excess return (0.001). The correlations of the excess return from the filing until the next quarter’s earnings announcement (BHR-drift) with accruals and SUE are in line with expectations.


1 The authors Hales/Kuang/Venkataraman are abbreviated as Hales et al. in this paper.

2 I thank Feng Li who kindly agreed to send me a copy of this study before it was published.

3 In their study, soft information is manipulable by management whereas hard information is not (cf. Dye/Sridhar 2004, 72).

4 Approaches and results of research on the credibility of textual information cannot be discussed in detail in this paper. Interested readers may refer to, for example, Demers/Vega 2010 or Hüfner 2007.

5 The GI is based on the negative word list HARVARD-IV-4 TagNeg (H4N). See http://www.wjh. for further reference on the General Inquirer (GI). See for further reference on LIWC and for further reference on DICTION.

6 Li has provided a very illustrative example to back this concern: According to the General Inquirer the sentence “In addition, the Company has experienced attrition of its medicare and commercial business in 1998 and 1999 and expects additional attrition.” (Li 2010, 1059) contains no negative words and 10.53 % positive words (“experience” and “expect”). This is a good example since the tone of this sentence is obviously negative but this fact is not detected by the dictionary (cf. Li 2010, 1059).

7 They base their conclusion on the key finding that 73.8 % of all negative words of the Harvard list are not necessarily negative when they appear in corporate disclosures. Examples of words that frequently appear in corporate disclosures and that are negative according to this dictionary but not in the context of corporate disclosures are “vice” or “capital”. Furthermore, words like “cancer” or “mine” generally relate to specific industries rather than to negative financial events (cf. Loughran/McDonald 2011, 36).

8 This and 5 other lists they developed can be found at:

9 Feldman et al. (cf. 2010, 925 f.) also measure the impact of earnings surprise (SUE) and accruals on market prices because they want to find out whether the impact of tone change is also visible over and above the impact of earnings surprises and accruals.

10 As with earnings (a large proportion of which is also expected), it is common in the accounting literature to measure market reactions to the change rather than to the level (cf. Feldman et al. 2010, 926).

11 Appendix B contains further details on the sample selection process.

12 In a prior version of their study, they used the Harvard list. The overall results do not differ substantially, which was to be expected, since Feldman et al. do not explore the level of tone (in which case the results would depend on the choice of word list) but the change in tone (cf. Feldman et al. 2010, 932).

13 Feldman et al. call negative/positive/differential tone change: negative/positive/differential signal.

14 BHR-filing is the 3-day-period centered on the SEC filing.

15 All numbers in bold letters are statistically different from zero with a significance level below 5 %.

University of Mannheim
Saskia Jarick (Author), 2011, The Impact of Corporate Textual Disclosure on Capital Markets, Munich, GRIN Verlag.

