To examine state influences on personality assessments, 112 female subjects received anxiety priming in a pre-post design under laboratory conditions. The Big Five were assessed with the NEO-FFI. The treatment had a uniform effect on four of the Big Five scales. As hypothesized, neuroticism scores increased due to the treatment. Mean scores for extraversion, agreeableness, and conscientiousness decreased significantly in the post-measurement. Furthermore, an increase in reliability in the post-measurement of the Big Five scales was expected and observed, but it was significant only for the conscientiousness scale. Underlying processes leading to the uniform shift in mean scores, as well as implications for diagnostic practice and future research, are discussed.


The measurement of personality traits has always been of vital diagnostic interest. Personality traits are associated with a wide range of behavioral variables, and they are seen as rather stable over the lifespan. Thus, a precise measurement of personality traits provides valuable data for prognostic purposes of various kinds. In clinical, forensic, and business psychology settings, decisions are made on the basis of various diagnostic instruments, usually including personality measures. Personality is often regarded as indicating a person's future path within the frame of purpose-specific data. Therefore, the relevance of an accurate measurement of personality traits cannot be emphasized enough.

The Big Five personality traits are considered to be the fundamental dimensions of personality and have been replicated in most Western cultures. There are a number of self-report inventories for measuring the Big Five personality traits. The most established among them are the revised NEO Personality Inventory (NEO-PI-R; Costa and McCrae, 1992) and the NEO Five Factor Inventory (NEO-FFI; Costa and McCrae, 1985; 1992), which is a shortened form of the NEO-PI-R. The construction of these inventories is based on the principles of classical test theory (CTT), meaning that an observed score consists of the true score of the trait and a measurement error that is completely random. Whether such measures are inherently free of situational influences has been a controversially discussed topic (Deinzer et al., 1995). That is to say, there might be a third component besides the true score and the error influencing the observed score. Situational circumstances of the measurement might act as a bias by adding systematic variance. This state influence, reflecting temporary conditions as well as possible interactions between situations and persons, has to be regarded as confirmed for personality measures since Deinzer et al. (1995). The authors were able to quantify the state bias in personality measures without actually having to control it. Hence, for the most part, it is unknown what types of situational influences there are. Exploring possible situational factors that influence personality assessments seems crucial on the path to improving diagnostic tools and bridging the gap between diagnostic practice and theory, which has existed since a state bias in personality measurements was confirmed. In the present study, the field of possible situational factors will be narrowed down, and eventually the effect of state anxiety will be explored by manipulating the mood of the participants and observing its influence on the assessment of the Big Five.

Assessment of Personality

The common approaches to human personality are trait-based models. Thus, personality is defined as relatively stable over time and as differing among persons. Nowadays the most prevalent approach to personality is the five-factor model, which contains the traits Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness. To assess these personality traits, it is common practice to use self-report inventories like the NEO Five Factor Inventory (NEO-FFI; Costa and McCrae, 1985; 1992). In these inventories, people rate the degree to which they agree with statements like “I like having a lot of people around me” (Item 2 [Extraversion] from the NEO-FFI; Costa and McCrae, 1985; 1992) on a Likert scale. Each trait is measured by multiple items; the item scores are aggregated into the corresponding scale score. The measurements comply with the paradigm of classical test theory. Hence, it is assumed that the observed score is a composite of the true score and the measurement error. Since the measurement error is presumed to be completely random, the mean error approaches zero as the number of test replications approaches infinity. Thus, the value of the true score can be estimated. The reliabilities of the inventories are known and sufficient.
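
The CTT assumption described above can be illustrated with a minimal simulation sketch. It is not part of the original study: the function names, the true score of 30, and the error standard deviation of 4 are hypothetical values chosen only to show that purely random error averages out over many replications.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_replications, seed=0):
    """Return observed scores under CTT: observed = true score + random error.

    All numeric inputs are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_replications)]


def estimate_true_score(observed_scores):
    """Estimate the true score as the mean of the observed scores."""
    return statistics.fmean(observed_scores)


# With few replications the estimate can be noticeably off;
# with many replications the random error averages out.
few = simulate_observed_scores(true_score=30.0, error_sd=4.0, n_replications=5)
many = simulate_observed_scores(true_score=30.0, error_sd=4.0, n_replications=100_000)
print("estimate from 5 replications:      ", round(estimate_true_score(few), 2))
print("estimate from 100,000 replications:", round(estimate_true_score(many), 2))
```

The sketch only holds as long as the error term is the sole disturbance. A systematic situational component would not average out in this way.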

A prerequisite for that paradigm is that nothing but the trait has a systematic influence on the observed score. That this presumption might not always hold has been part of a scientific discussion known as the person-situation interaction debate (e.g., Anastasi, 1983; Bowers, 1973; Epstein and O’Brien, 1985, as cited in Deinzer et al., 1995). “[…] The idea that a person’s score in any given test is at least partially determined by situational circumstances” (Ziegler et al., 2009, p. 345) implies that there might as well be an interaction effect between situations and persons. Both the impact of the situation and a possible interaction effect would then add systematic variance. But in terms of the CTT, all systematic variance is seen as trait variance. In other words, state variance would then falsely be represented in the trait variance. Note that reliability in the paradigm of the CTT is defined as the ratio of true trait variance to overall variance. Since the trait variance is most likely compromised by an unknown proportion of state variance, this concept of reliability no longer applies. It therefore becomes necessary to separate the trait variance from the situational variance. This in turn is a problem that cannot be addressed in terms of classical test theory unless all situational aspects can be defined and measured. Given that this task is rather difficult to fulfill, it seems reasonable to look for a methodological solution “in which it is possible to distinguish between the different variance sources without defining or measuring situational aspects” (Ziegler et al., 2009, p. 345).
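
The following sketch puts numbers on this argument. It assumes, purely for illustration, that the observed variance splits additively into a trait, a state, and an error component; the function name and the variance values are hypothetical. It shows how a reliability coefficient that counts all systematic variance as trait variance overstates the share that is actually due to the trait.

```python
def variance_shares(trait_var, state_var, error_var):
    """Compare the apparent CTT reliability with the actual trait share.

    Assumes an additive decomposition: total = trait + state + error.
    All inputs are hypothetical illustration values.
    """
    total = trait_var + state_var + error_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    # CTT cannot tell trait and state apart: both count as systematic variance.
    apparent_reliability = (trait_var + state_var) / total
    # Share of the total variance that is really due to the stable trait.
    trait_share = trait_var / total
    return {
        "apparent_reliability": apparent_reliability,
        "trait_share": trait_share,
        "overestimate": apparent_reliability - trait_share,
    }


# Hypothetical scale: 6 units trait, 2 units state, 2 units error variance.
print(variance_shares(trait_var=6.0, state_var=2.0, error_var=2.0))
```

When the state variance is set to zero, the two quantities coincide, which corresponds to the case in which the CTT assumption holds.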

Latent-State-Trait-Models: A brief outline

A theoretical framework that meets these requirements is latent-state-trait (LST) theory. LST theory enables us to separate trait variance from variance due to the situation and the person-situation interaction, as well as from measurement error, by computing specific coefficients containing information on the consistency, occasion specificity, and reliability of an instrument in a given sample and situation (Ziegler et al., 2009). These parameters can be estimated via structural equation modeling without manipulating and even without observing the situations in which the measurements are made (Deinzer et al., 1995). Yet a single test occasion is not sufficient to estimate the LST coefficients; at least two test occasions are needed.
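
To make the logic of these coefficients more tangible, the sketch below simulates two test halves at each of two occasions and recovers consistency, occasion specificity, and reliability from covariances alone. This is a deliberately simplified method-of-moments shortcut, not the structural equation modeling used in the cited studies. It assumes equal loadings, no method factors, and independent components, and all names and variance values (0.6 trait, 0.2 state, 0.2 error) are hypothetical.

```python
import math
import random


def covariance(x, y):
    """Sample covariance of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def simulate_lst_data(n_persons, trait_var, state_var, error_var, seed=0):
    """Simulate two test halves (h1, h2) at two occasions (t1, t2).

    Each score = stable trait + occasion-specific state residual + random error.
    The situation itself is never recorded, only its effect on the scores.
    """
    rng = random.Random(seed)
    data = {"h1_t1": [], "h2_t1": [], "h1_t2": [], "h2_t2": []}
    for _ in range(n_persons):
        trait = rng.gauss(0.0, math.sqrt(trait_var))
        for occasion in ("t1", "t2"):
            state = rng.gauss(0.0, math.sqrt(state_var))  # shared by both halves
            for half in ("h1", "h2"):
                error = rng.gauss(0.0, math.sqrt(error_var))
                data[f"{half}_{occasion}"].append(trait + state + error)
    return data


def lst_coefficients(data):
    """Estimate LST coefficients from covariances (simplified sketch)."""
    total = sum(covariance(v, v) for v in data.values()) / len(data)
    # Halves from the same occasion share trait and state variance.
    within = (covariance(data["h1_t1"], data["h2_t1"])
              + covariance(data["h1_t2"], data["h2_t2"])) / 2
    # Halves from different occasions share only trait variance.
    across = (covariance(data["h1_t1"], data["h2_t2"])
              + covariance(data["h2_t1"], data["h1_t2"])) / 2
    return {
        "consistency": across / total,
        "occasion_specificity": (within - across) / total,
        "reliability": within / total,
    }


data = simulate_lst_data(n_persons=20_000, trait_var=0.6, state_var=0.2, error_var=0.2)
for name, value in lst_coefficients(data).items():
    print(f"{name}: {value:.2f}")
```

The sketch also shows why a single occasion is not enough: without the cross-occasion covariances, trait and state variance cannot be told apart, and only their sum (the reliability) is identified.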

Taking situations into account in psychological measurement and decomposing the observed score into three variance sources does not actually contradict classical test theory. Rather, it is a generalization of it, in which the CTT represents the exceptional case of an observed score that is unbiased by the situation. However, it would be unjustified to assume that exceptional case, as “[…] there is no psychological measurement in the situational vacuum” (Deinzer et al., 1995, p. 7). Being able to distinguish between the state and trait variance of a given instrument provides valuable information for the selection of a suitable test. For example, when assessing moods or affects, one would wish for a relatively high state variance and a low trait variance. When assessing personality, the opposite would be preferable. But information about the state-trait ratio of a diagnostic instrument is not only useful in selecting the most appropriate test. In addition, the results of a test have to be evaluated against the background of the variance distribution.

State influence on personality measurement

As outlined above, the LST approach allows the state variance to be quantified without having to define, observe, and control situational circumstances. Accordingly, Deinzer et al. (1995) applied “LST theory to three well known [personality] trait questionnaires: the Freiburg Personality Inventory, the NEO Five-Factor Inventory and the Eysenck Personality Inventory” (Deinzer et al., 1995, p. 1). The authors found that the majority of the variance of all scales is explained by the respective trait. However, some scales showed up to 20% state variance. In other words, scales that were supposed to measure a personality trait had a situational bias that was responsible for up to 20% of the overall variance of that scale. Furthermore, the proportion of variance explained by situational influences was not stable across occasions. For instance, Deinzer et al. could not find any significant state influence on the conscientiousness scale of the NEO-FFI at the second occasion, while the state accounted for 20% of the variance at the first occasion. Within the NEO-FFI scales, neuroticism had the most stable state impact across time, while the state influence on all other scales fluctuated notably between occasions. These findings raise doubts regarding common diagnostic practice. To make comparisons across individuals or groups, scores are required to be free from uncontrolled biases (Reeve and Bonaccio, 2008). Making long-term predictions based on state-biased scores would not be acceptable either. Thus, the authors concluded with regard to the NEO-FFI that the observed scores “cannot be regarded as pure personality measures” (Deinzer et al., 1995, p. 18). This study has to be seen as a remarkable success, since it shed light on a long-discussed topic by providing clear empirical evidence. However, the implications for diagnostic practice remain to some extent unclear.
The LST approach enabled the researchers to quantify the state influence; this, however, is not always possible in the field, mainly because a repeated measurement is needed. Based on these findings, the foremost goal for applied personality diagnostics seems to be improving, controlling, or weighting the setting of the measurement or the influence it has on observed scores. Even though it does not look like an easy endeavor, there is no way around examining the situational influence with regard to content. The question has to be raised: Which situational factors influence which traits, and to what extent? In the following, known associations with the Big Five are reviewed, with the aim of finding a reasonable starting point for further exploring the influence of situational circumstances on personality measurement.


