The Concept of Empathy and Its Methods of Measurement: A Critique

Theoretical Review of the Concept of Empathy

The exact definition of the construct of empathy has been debated ever since Edward Titchener coined the term in 1909 (Titchener, 1909; Batson, 2009; Gerdes, Segal, & Lietz, 2010). He introduced it as a translation of the older German concept of “Einfühlung” (literally, “feeling into”), which referred to projecting one’s own feelings onto an external object (Duan & Hill, 1996).

Titchener’s concept of empathy was broader: it included awareness of another person’s inner state, that is, their thoughts and emotions, as well as sharing that person’s feelings. Another theorist, Edmund Husserl, defined empathy as putting oneself in another person’s shoes (Husserl, 1962). However, George Herbert Mead’s definition is the most basic one used today: the ability to role-take, or understand another person’s current situation, and to adjust one’s own behaviour in response, for example, by acting prosocially (Mead, 1934; Thompson & Gullone, 2003). At a broad level, most definitions of empathy include an affective component, that is, feeling other people’s emotions, whether positive or negative. For example, one may feel happy because somebody else is laughing or otherwise showing outward signs of happiness. Empathy also includes a cognitive component: setting aside one’s own current point of view and using one’s imagination to see situations from another person’s vantage point (Hoffman, 2007; Hein & Singer, 2008; Wispé, 1986).

Many theorists view empathy as a trait or ability that differs among people; that is, some people are more empathic than others (Book, 1988; Feshbach & Feshbach, 2009). Other researchers view empathy as situation-specific: regardless of one’s empathic abilities or development, one’s empathic experience differs depending on the situation. For example, one may cry along with others at a funeral, yet not cry along with a friend over the death of a pet fish (Barrett-Lennard, 1993; Hoffman, 1984). Some theorists define empathy more strictly. For example, Bird and Viding (2014) argue that the perceiver of another’s affective state must feel exactly the same feelings and think the same thoughts for the experience to count as affective empathy. Other scholars are more lenient and require only that the perceiver respond appropriately to another’s affective expressions (Baron-Cohen & Wheelwright, 2004). For instance, the perceiver should show sadness at a funeral rather than smiling or laughing with joy.

The construct of empathy is similar to many other concepts. Its cognitive component has been likened to Theory of Mind, which refers to understanding others’ mental states in general. Mental states include thoughts, beliefs, desires, emotions, and intentions. Cognitive empathy, that is, trying to understand or interpret another person’s affective state, can therefore be called “Emotional Theory of Mind” (Walter, 2012). Empathy is also related to mimicry, which means matching one’s own affective expressions, such as smiles, cries, body postures, and other overt signs of emotion, to another person’s emotional expressions (Hatfield, Cacioppo, & Rapson, 1994; Walter, 2012). Emotional contagion is also similar to empathy and has been labelled its primitive, or reflexive, version, in that one can literally “catch” another person’s emotions, much as one catches a common cold (Le & Rapson, 2009). For example, babies cry when they hear other babies crying. However, mimicry and emotional contagion differ from empathy, which depends on being self-aware and on understanding that other people are distinct from oneself, with their own mental world of thoughts and emotions separate from one’s own. For example, when babies mimic each other’s crying, they do so without understanding that the other babies are separate humans with their own wants and needs. They cry reflexively rather than actually sharing the other babies’ emotional states and understanding their mental processes.

Two further concepts related to empathy are sympathy and compassion (Decety & Lamm, 2009; Preston & de Waal, 2002). Sympathy means “feeling for” another person, such as feeling pity for a homeless person. Compassion goes a step further: along with recognizing that another person is in distress and feeling sorry for them, one also acts to alleviate or reduce their distress, for instance by giving food to a starving homeless person, or even taking them into one’s home and providing shelter and warmth (Singer & Lamm, 2009). Another feature that distinguishes empathy from sympathy and compassion is that in empathy one matches the emotions of another person (e.g., feeling sad because another person feels sad), whereas in the other two concepts one does not have to share the same emotions. For instance, someone who sees a homeless person who is sad because of hunger need not feel sad themselves; rather, a sympathetic person may feel pity, while a compassionate person may also try to make the homeless person feel better, such as by giving them money or help.

Critique of Measurement Tools of Empathy

Interpersonal Reactivity Index (IRI)

This test was created by Mark H. Davis (1980) to measure cognitive and affective empathy. It originally consisted of 50 items, some of them borrowed from the Emotional Empathy Scale by Mehrabian and Epstein (1972) and the Fantasy-Empathy Scale (Stotland, 1978). The current version consists of 28 items divided into four subscales: the Personal Distress Scale (PD), which measures anxiety and other negative emotions in stressful situations with other people; the Fantasy Scale (FS), which measures one’s ability to imagine how one would feel in the place of a fictional character in a book or movie; the Empathic Concern Scale (EC), which measures one’s sympathy and compassion for others; and the Perspective-Taking Scale (PT), which measures one’s ability to see situations from other people’s point of view.

The PD, EC, and FS subscales measure affective empathy, and the PT subscale measures cognitive empathy. Each item is a statement, which the test-taker rates on a 5-point Likert scale ranging from 1 = “does not describe me at all” to 5 = “describes me very well”. Nine of the items are reverse-worded (i.e., negatively worded), such as item 14: “Other people’s misfortunes do not usually disturb me a great deal”. The test measures trait empathy, that is, the dispositional empathy that is characteristic of one’s personality. Both a total IRI score and a score for each of the four subscales are obtained.


Test-retest reliability: Davis (1980) and Fernandez and Dufey (2011) tested their participants twice, 60 to 75 days apart. Test-retest correlations ranged from 0.61 to 0.79 for males and were slightly higher for females (0.62 to 0.89).

Test-retest reliability was highest for the EC subscale (0.89 for males and 0.81 for females), and the PT subscale had the lowest test-retest reliability (r = 0.67 for both males and females) (Fernandez & Dufey, 2011). The IRI has also shown high test-retest reliability in samples from other countries. For example, Huang et al. (2012) tested Chinese teachers and found test-retest correlations ranging from 0.70 to 0.78.

Internal reliability: Across several studies, the Cronbach’s alphas of the IRI subscales have generally been in the 0.70s, ranging from 0.61 to 0.81 (Konrath, 2013; Fernandez & Dufey, 2011; Davis, 1980). Which subscale was most reliable differed between studies: Fernandez and Dufey (2011) reported that the PT subscale had the lowest internal consistency and EC the highest, whereas Gilet and Mella (2013) found that the EC subscale had the lowest internal consistency and FS the highest.

Translated versions of the IRI also show good internal consistency. Huang and colleagues (2012) reported subscale alphas ranging from .61 to .85 in their Chinese sample. The Portuguese version’s four subscales had Cronbach’s alphas between 0.65 and 0.79 (Manarte, 2017), and the Dutch version’s between 0.73 and 0.83 (de Corte & Buysse, 2007).

According to Cliffordson (2002), the internal consistency of the total IRI score is somewhat higher (α = 0.83) than that of the subscales (α = 0.71 to 0.80).
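The alpha values reported above follow the standard Cronbach's alpha formula, α = k/(k - 1) × (1 - Σσᵢ²/σ²ₜ), where k is the number of items, σᵢ² is the variance of item i, and σ²ₜ is the variance of respondents' summed scores. A minimal Python sketch, assuming a complete respondents-by-items matrix in which reverse-worded items have already been rescored; the function name and the data are hypothetical, chosen only for illustration.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of scored responses.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Assumes no missing data and reverse-worded items already rescored.
    """
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(responses[0])  # number of items
    items = list(zip(*responses))  # transpose: one tuple of ratings per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 4-item subscale answered by 5 respondents on a 1-5 scale
data = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(data), 2))
```

Because the formula scales with k, a longer total scale tends to reach a higher alpha than its shorter subscales when the items are similarly intercorrelated, which is consistent with the total-versus-subscale pattern reported by Cliffordson (2002).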


Factor analysis: Most studies have found support for the four-factor model originally proposed by Davis (1983). Hawk and Keijsers (2013) conducted confirmatory factor analyses of the IRI’s four factors, and the results revealed a good fit for the model: CFI values ranged from .956 to .962, RMSEA values were below 0.08 (ranging from .049 to .065), and SRMR values ranged from .037 to .050, indicating close agreement between the observed correlations and those implied by the model. Gilet and Mella (2013) found partial support for these results: the chi-square value was less than three times its degrees of freedom (χ²(344) = 789, p < .01), the RMSEA was below 0.08 (0.065), and the SRMR was 0.07. However, the CFI was below 0.90 (a mere 0.81).

Murphy and Costello (2018) did not replicate Hawk and Keijsers’s (2013) results: the CFI was below 0.90 (0.870), the RMSEA was above 0.08 (0.11), and the chi-square value was more than three times its degrees of freedom (χ² = 1899.96, df = 344, p < .001).
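The three rules of thumb invoked in the two paragraphs above (a chi-square value less than three times its degrees of freedom, an RMSEA below 0.08, and a CFI of at least 0.90) can be checked mechanically against reported statistics. The Python sketch below does only that, using the values quoted in the text; the `FitStats` class and `check_fit` function are hypothetical names, SRMR is left out because no cutoff is stated here, and these cutoffs are conventions rather than significance tests.

```python
from dataclasses import dataclass


@dataclass
class FitStats:
    """Reported fit statistics for one confirmatory factor analysis."""
    chi2: float
    df: int
    rmsea: float
    cfi: float


def check_fit(s: FitStats) -> dict[str, bool]:
    """Apply the conventional cutoffs cited in the text to reported statistics."""
    return {
        "chi2/df < 3": s.chi2 / s.df < 3,
        "RMSEA < .08": s.rmsea < 0.08,
        "CFI >= .90": s.cfi >= 0.90,
    }


# Values as reported above for the four-factor IRI model
gilet_mella_2013 = FitStats(chi2=789, df=344, rmsea=0.065, cfi=0.81)
murphy_costello_2018 = FitStats(chi2=1899.96, df=344, rmsea=0.11, cfi=0.870)

for name, stats in [("Gilet & Mella (2013)", gilet_mella_2013),
                    ("Murphy & Costello (2018)", murphy_costello_2018)]:
    # Print the chi-square/df ratio alongside the pass/fail result of each rule
    print(name, round(stats.chi2 / stats.df, 2), check_fit(stats))
```

Laid out this way, Gilet and Mella's model passes two of the three rules and fails only on CFI, whereas Murphy and Costello's model fails all three.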

Convergent validity: The IRI correlates positively with several other measures of empathy. For example, it correlates positively with the Jefferson Scale of Physician Empathy (JSPE), a test used to measure three factors in physicians and other healthcare providers: general perspective taking, compassionate care, and a more specific factor called “standing in the patient’s shoes” (Hojat & Mangione, 2009). The total IRI score correlates at r = 0.45 with the JSPE (Murphy & Costello, 2018), and its correlations with the individual JSPE factors range from r = 0.40 for Compassionate Care to a weaker r = 0.22 for the third factor.

The four IRI subscales correlate quite differently with the total JSPE score, ranging from a moderate correlation of r = 0.48 for the EC subscale to a non-significant r = 0.02 for the Personal Distress subscale. In fact, the Personal Distress subscale does not correlate with any of the JSPE factors either. The Fantasy subscale correlates at r = 0.37 with the Compassionate Care factor of the JSPE and at r = 0.24 with its general perspective-taking factor.

The IRI correlates moderately (r ≈ 0.49) with the Empathy Quotient (Melchers et al., 2015) and weakly (r ≈ .35) with the cognitive subscales of the Basic Empathy Scale (BES) (Jolliffe & Farrington, 2006) and of the Affective and Cognitive Measure of Empathy (ACME) (Vachon & Lynam, 2016). However, the individual IRI subscales correlate differently with different tests. For example, the EC subscale correlates strongly with the affective subscale of the ACME (r = 0.80), at r = 0.60 with the Questionnaire Measure of Emotional Empathy (QMEE), at r = 0.51 to 0.63 with the Empathy Quotient (EQ), and at r = 0.18 to 0.59 with the affective subscale of the BES (Baldner & McGinley, 2014; Davis, 1983; Melchers et al., 2015; Vachon & Lynam, 2016). The FS subscale correlates moderately with the QMEE (r = .52) and with the Empathy Quotient (r = .36 to .46) (Baldner & McGinley, 2014; Melchers et al., 2015).

In two studies, the PD subscale of the IRI correlated on average at r = 0.24 with the QMEE and at r = .33 with the BES (Davis, 1983; Baldner & McGinley, 2014).

However, other studies found that it correlates poorly with the Empathy Quotient (EQ) (r = -0.06 to 0.20; Melchers et al., 2015) and almost not at all with the Toronto Empathy Questionnaire (TEQ) (Baldner & McGinley, 2014). These authors reasoned that the Personal Distress subscale does not actually measure empathy at all, but only one’s focus on oneself.

Discriminant validity: The Perspective Taking and Empathic Concern subscales of the IRI have a weak negative correlation with aggression and a lack of moral behaviour, that is, behaviour that does not conform to socially accepted rules (r = -.24, p < .001; Wang & Wang, 2017). The Personal Distress subscale is negatively related to Extraversion and Agreeableness; the authors reasoned that people who tend to experience intense negative reactions to other people’s problems distance themselves from people altogether to avoid such stress (Hawk & Keijsers, 2013). The IRI is also negatively correlated with psychopathy, as measured by the Levenson Self-Report Psychopathy Scale (LSRP) (Tamura & Sugiura, 2016). Primary psychopathy has weak negative correlations with the PT and EC subscales (r = -0.25 and -0.27, respectively), and secondary psychopathy has a moderate negative correlation with the PT subscale (r = -0.47), which makes sense, since secondary psychopaths tend to be more aggressive. Finally, all four IRI subscales (EC, FS, PD, and PT) have weak negative correlations with narcissism (r = -.15, -.14, -.30, and -.17, respectively) (Chukwuorji & Uzuegbu, 2018).

Basic Empathy Scale (BES)

The BES was created by Jolliffe and Farrington (2006) and is based on Cohen and Strayer’s (1996) definition of empathy as experiencing another person’s affective state (affective empathy) and understanding their emotions (cognitive empathy).

The BES is a 20-item test that measures one’s propensity to experience and understand other people’s anger, sadness, fear, or happiness. Eleven items measure affective empathy and nine measure cognitive empathy. Each item is presented as a statement, and the test-taker rates, on an ordinal scale ranging from 1 (Strongly Disagree) to 5 (Strongly Agree), how well the statement reflects their own affective or cognitive empathic tendencies.

A total score is obtained by summing the item scores. Positively worded items, such as item 16 (“I can usually realize quickly when a friend is angry”), are scored as given, while negatively worded items, such as item 19 (“I am usually not aware of my friend’s feelings”), are reverse-scored: for example, a response of 1 (Strongly Disagree) becomes 5 (Strongly Agree). Separate scores for the affective and cognitive components are also calculated.
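The scoring rule described above (sum the item ratings, reverse-scoring negatively worded items so that 1 becomes 5, 2 becomes 4, and so on) can be sketched in Python as follows. This is an illustrative sketch only: apart from items 16 and 19, which are the examples quoted in the text, the item numbers and their assignment to the affective and cognitive subscales are hypothetical placeholders, not the published BES scoring key.

```python
def reverse_score(rating: int, low: int = 1, high: int = 5) -> int:
    """Flip a rating on a low..high scale, e.g. 1 -> 5 and 2 -> 4 on a 1-5 scale."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} outside {low}-{high}")
    return low + high - rating


def score_bes(responses: dict[int, int],
              reverse_items: set[int],
              subscales: dict[str, set[int]]) -> dict[str, int]:
    """Sum item ratings into subscale and total scores, reverse-scoring keyed items."""
    # Rescore negatively worded items; keep positively worded ones as given
    scored = {item: reverse_score(r) if item in reverse_items else r
              for item, r in responses.items()}
    result = {name: sum(scored[i] for i in items) for name, items in subscales.items()}
    result["total"] = sum(scored.values())
    return result


# HYPOTHETICAL partial key for illustration only: items 16 and 19 are the examples
# quoted in the text; items 2 and 5 and every subscale assignment are placeholders.
key_reverse = {19}
key_subscales = {"affective": {2, 5}, "cognitive": {16, 19}}
answers = {2: 4, 5: 3, 16: 5, 19: 1}
print(score_bes(answers, key_reverse, key_subscales))
```

With the full published key in place of the placeholders, the same function would yield the 20-item total and the two subscale scores. The `reverse_score` helper applies equally to any reverse-worded item rated on a 1 to 5 scale, such as item 14 of the IRI version described earlier.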


