The Social Origins of Music

Insights from Empirical Studies with Preschool Children

Doctoral Thesis / Dissertation, 2011

179 Pages, Grade: magna cum laude


Table of Contents





1 General Introduction
1.1 Overview
1.2 Terminology
1.3 Universals of Musical Behavior
1.4 Systems of Transmission
1.5 Finding Verifiable Hypotheses
1.6 The Developmental Approach
1.7 Goals of the Thesis

2 Study 1: Prosocial Effects of Joint Music Making
2.1 Introduction
2.2 Methods
2.3 Results
2.4 Discussion

3 Study 2: Ontogeny of Rhythmic Entrainment
3.1 Introduction
3.2 Methods
3.3 Results
3.4 Discussion

4 Study 3: Individual Differences in Rhythmic Entrainment
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Discussion

5 Study 4: Local Differences in Musical Enculturation
5.1 Introduction
5.2 Methods
5.3 Results
5.4 Discussion

6 General Discussion
6.1 Overview
6.2 Evolutionary Functions of Music
6.3 Evolutionary History of Music
6.4 Biological Prerequisites for Music
6.5 Universal Structures of Music
6.6 Applications in Music Education and Therapy
6.7 Conclusion


Appendix A: Supplementary Material for Study

Appendix B: Supplementary Material for Study

Appendix C: Questionnaire Used in Study

Statement of Independence



Creating music is a distinctive and unique behavior of our species: Humans of all cultures occasionally gather to make music together or to simply listen to the music of others. However, there is currently some discussion amongst scientists as to whether such behavior is a manifestation of innate psychological mechanisms that evolved as adaptations for music, or whether the diversity of musical behavior simply exemplifies the flexibility and inventiveness of the human mind, in which case music should be considered a product of human culture. Scholars advocating the latter view can be divided further into those who believe that music is an evolutionary by-product, existing only for hedonistic reasons, and those who argue that certain musical behaviors emerged because they serve some culturally adaptive function. The goal of my dissertation was to add new empirical data to this debate by conducting behavioral studies with preschool children. In study 1 I tested the hypothesis that certain musical rituals emerged to foster social bonding, ultimately increasing prosocial in-group behavior. Specifically, I found that joint music making enhances subsequent spontaneous cooperative and helpful behavior among pairs of 4-year-old children. In study 2 I investigated the ontogeny of rhythmic entrainment, the human ability to move in synchrony to a musical beat. In accordance with the hypothesis that synchronous behavior first occurred during social activities, I found that children spontaneously entrain their movements to an external drum beat at earlier ages and with higher accuracy if that beat is presented in a social context. Study 2 revealed large inter-individual differences in synchronization accuracy. To explain these differences, I hypothesized that rhythmic entrainment develops via social learning processes during early musical enculturation. 
To test this hypothesis I extended the original design by adding a new condition (joint drumming without visual access to the partner’s movements) and a parental questionnaire about the participants’ musical experience. I collected comparable data from Leipzig, Germany (study 3), and Salvador, Brazil (study 4), assuming that children from those two cities gain qualitatively different experience with music. I found that in both samples the individual differences in synchronization accuracy correlated with those in musical experience. Furthermore, children from Leipzig tended to drum out of synchrony if the partner’s movements were audible yet hidden from view, whereas children from Salvador spontaneously synchronized their movements even without visual access to the partner’s drumming. I discuss my results in light of the above scenarios and come to the conclusion that music’s apparent adaptiveness to various instances of its use can be best explained by cumulative cultural evolution. The existence of evolved psychological mechanisms specific to particular musical skills is not a necessary prerequisite.


During the past years, I have had the privilege of working together and sharing ideas with many open-minded and inspiring people, whose advice and helping hands have contributed substantially to this dissertation.

Michael Tomasello, I owe you a debt of gratitude for fulfilling my wish to be able to prepare this dissertation on the evolutionary origins of music under your supervision at the psychology department of the Max Planck Institute for Evolutionary Anthropology in Leipzig. Ever since our first meeting, you responded to my research proposals with healthy skepticism on the one hand and a great deal of confidence on the other. Your balanced, respectful and provident advice guided me safely all the way from spinning the first vague ideas of potential empirical studies to their successful implementation and well-placed publication.

Peter Hammerstein, I am highly appreciative of your continuous interest in my research. Upon our first meeting two years ago, you agreed without hesitation to supervise my candidature for a doctoral degree at the Humboldt University of Berlin, a job you couldn't have done any better. I am further grateful for the insights I gained from our scientific discussions and your constructive comments on previous drafts of this thesis, boosting the writing process considerably.

Peter Keller, ever since I started as a doctoral student in Leipzig, I have been able to count on your scientific counsel as an expert on music cognition. I sincerely appreciate that you always acknowledged my ideas, reflected on them in critical discussions, and introduced me to many great people from the academic community. It goes without saying how much I welcomed your comments on a pre-version of the thesis and your agreement to be a member of my committee.

This thesis also owes much to my department colleagues, who have contributed to its genesis. Anna Albiach, Grace Fletcher, Johannes Grossmann, Gerlind Große, Katharina Hamann, Daniel Haun, Robert Hepach, Alenka Hribar, Henrike Moll, Roger Mundry, Katrin Riedl, Marco Schmidt, Jasmin Steinwender, Amrisha Vaish, Felix Warneken, Martina Wittig, Emily Wyman, you have all been of help in your own ways, either by discussing my research ideas, improving study designs, interpreting results, reviewing manuscripts, or simply by enduring my grumbling at those times when the testing or writing appeared to be stagnating. Jasmin, I especially acknowledge your valuable comments on a previous version of the thesis. Roger, I owe a further debt to your patience and concern during our countless statistics sessions. The numerous analysis tools we developed together and applied successfully to my data pushed the quality of this dissertation to a much higher level.

Sincere thanks are also due to the research coordinators, without whom testing children would have been impossible. Katharina Haberl, Jana Jurkat, Angela Loose, Elena Rossi, Claudia Salomo, and Eva Siegert, you taught me how to properly design and run behavioral studies with children back when I was working as a research assistant myself. Kathrin Greve, Mareike Sera, and Gesa Volland, your professional support later helped me to productively carry out my own studies. Waldemar Beser, Petra Jahn, Vladislava Nadova, Manuel Reinartz, Annett Witzmann, and Henriette Zeidler, thank you for all your great help on the administrative side. Finally, the lunatic mission of testing 50 Brazilian children over a period of only 2 weeks would have never been accomplished without the considerable support provided by Beatriz Ilari, Angelita and Alexandre Schultz. I am deeply grateful for your commitment and stamina. Bea, I further appreciate your invaluable help in analyzing and interpreting this data set.

Besides my immediate colleagues and collaborators, I have been privileged to share ideas with many brilliant scientists, who have lastingly influenced the way I think about music. Eckart Altenmüller, Gisa Aschersleben, Emma Cohen, Ian Cross, Ellen Dissanayake, Robin Dunbar, Tecumseh Fitch, Sven Grawunder, Erin Hannon, Tommi Himberg, Stefan Koelsch, Reinhard Kopiez, Edward Large, Devin McAuley, Daniel Mietchen, Katie Overy, Ani Patel, Jessica Phillips-Silver, Bruno Repp, Adena Schachner, Stefanie Stadler Elmer, Laurel Trainor, and Sandra Trehub, the substance of my publications and this thesis benefited greatly from our discussions at conferences, symposia or numerous informal meetings. Furthermore, I want to acknowledge that this research would not have been possible without the generous financial support of the German National Academic Foundation, the FAZIT Foundation, and the Max Planck Society.

Finally, I owe a profound debt of gratitude to my family. In every sense, I would not have been here were it not for my sister, grandparents and parents. Throughout my life, you have remained an unfailing and generous source of love, counsel and inspiration. But the one person who stood by me closest throughout the ups and downs of this whole endeavor is my wife Katharina. I am deeply grateful for your infinite patience, constant encouragement and wholehearted love. Last of all, I would like to dedicate this dissertation to my son Laurin. May the fortune of always being able to choose the occupation you are enthusiastic about escort you through your life, as well.


From the perspective of a musician, which I am, creating and listening to music seems the most natural activity. According to my parents, I used to sing along to their lullabies even before I learned how to speak properly. My early childhood was full of rhythm, song and my mother’s guitar playing. During ten years of violin lessons, I never really learned how to read sheet music, instead surviving the weekly rehearsals by secretly copying my teacher’s recital. However, my teenage years were to have the strongest impact on my musical identity, during which I taught myself how to play the guitar and express my feelings and experiences through melodies and lyrics, finally comprehending why most songs from the radio are about a crush you either cannot reach or you missed out on. However, I found that creating or listening to music on your own could never top the awesome experience of blending your intentionally created sounds with those of others, be it as part of a choir, a punk rock band or a symphony orchestra.

Thus, my favorite questions from the perspective of a musician, inspiring the research I have carried out for this dissertation, are: What motivates young children to sing and dance? Why does music create affective experience better than any other art? How can joint music making produce such a strong feeling of togetherness? Why is this passion for music only shared by some, but not all members of society?

From the perspective of a biologist, which I am as well, music does, however, rank among the most bizarre kinds of human behavior. Music is universal, a significant feature of every known culture, and it involves major investment of resources. And yet, it does not serve an obvious, uncontroversial function for those who create it or listen to it. In this regard, it stands in sharp contrast to most other enjoyable types of human behavior (eating, sleeping, talking, sex etc.), which do have clear adaptive functions. In fact, music was one aspect of human behavior that Charles Darwin was uncertain he could explain, writing in The Descent of Man, and Selection in Relation to Sex (1871): ‘As neither the enjoyment nor the capacity of producing musical notes are faculties of the least use to man ... they must be ranked amongst the most mysterious with which he is endowed.’ Moreover, such behavior is absent in our closest living relatives, the great apes: Chimpanzee pant hooting and gorilla chest beating are generally not considered to be musical.

Thus, I join the ranks of many biologists since the time of Darwin who have been puzzled by the evolutionary origins of music, asking: Why then is music so pervasive in human life? Are we musical today because music helped our ancestors survive? Have the human mind and body been shaped by natural selection for music or is it basically a cultural invention?

1 General Introduction

‘Music is a product of the behavior of human groups, whether formal or informal: It is humanly organized sound.’

John Blacking, How Musical is Man?, 1973, p. 10

1.1 Overview

The primary aim of this dissertation was to add new empirical data to the scientific debate on the evolutionary origins of music by conducting behavioral studies with preschool children. The goal of the first chapter is to introduce the reader to the theoretical concepts and research strategies I used as a basis for developing the empirical studies presented in Chapters 2-5. In the final chapter I discuss the results of my studies against a broader backdrop of music evolution theory.

This general introduction is further structured as follows: The first three sections cover the theoretical concepts underlying my research. I start by emphasizing the obligatory distinction between music as sound, culturally learned musical behavior, and the genetically based human faculty to create music (1.2). Next, I examine ethnographic sources for evidence of universal musical structures and typical contexts where music occurs, since their identification may allow me to delineate the scope of musical behavior that should be prioritized in an evolutionary approach (1.3). Then I argue that music is not just a conveyer of structured sound, but in itself a socially learned and culturally transmitted system, drawing the conclusion that musical behaviors are constrained by three transmission systems operating simultaneously, albeit on different time-scales: genetic evolution, cultural evolution and individual social learning (1.4).

After my theoretical concepts have been explained, the remaining three sections introduce the research strategies I applied. I start by categorizing the theoretical questions about the origins of music being put forward in related literature and suggesting some constraining strategies for conceiving empirical studies (1.5). Then, I introduce and defend the methodological approach I used to verify particular hypotheses on the origins of music or components thereof: the investigation of young children’s musical behavior (1.6). In the final section of the general introduction, I outline the individual goals of each of the four empirical studies included in this dissertation (1.7).

1.2 Terminology

Any empirical approach to music that employs evolutionary theory should try to characterize music as rigorously and completely as possible. Only then does it seem feasible to attempt both to understand how music relates to other aspects of human life and to formulate proposals about evolutionary roots of human musical behavior. When looking at the types of evidence that are available to help shape theories of music evolution, the primary source should, of course, be music itself, and it is crucial that researchers investigating the evolution of music pay attention to developments in ethnomusicology, the science which aims to provide a precise and universally valid account of the phenomenon we are trying to understand (Kirby, 2007). The identification of musical properties that are observed across cultures and historic periods is most essential, because such analyses allow us to delineate the scope of activities under evolutionary discussion and prevent us from inadvertently defining music from the perspective of contemporary Western forms (Livingstone and Thompson, 2006).

Alan P. Merriam, one of the founders of modern ethnomusicology, suggested that music can best be explored in terms of a tripartite model that embraces music (1) as ‘sound’ (exploring how music is structured in terms of rhythm, harmony, melody, dynamic, or timbre), (2) as ‘behavior’ (which embraces the musical and non-musical acts of musicians, and the activities in which the production of music is embedded, like dance and ritual) and (3) as ‘concept’ (how people think about music in terms of its powers and its relations to other domains of human life) (Cross, 2009, Merriam, 1964). However, with the present research in focus, I will restrict the word music to Merriam’s first notion, namely when referring to the auditory outcome of musical behavior. The term musical behavior will instead be used to encompass vocal and instrumental melodic and percussive human activities, including dance and other musical rituals.

John Blacking, another pioneer of modern ethnomusicology, emphasized the interplay of human biology and culture in the creation of music. He wrote that ‘music is a synthesis of cognitive processes which are present in culture and in the human body: The forms it takes, and the effects it has on people, are generated by the social experiences of human bodies in different cultural environments.’ (Blacking, 1973, p. 89). To distinguish the biological roots of human musicality from their implantation in human culture, I will use the term music faculty (Hauser and McDermott, 2003) whenever referring to the mosaic of innate anatomical features and cognitive skills utilized for producing and processing music.

Music, on the one hand, is indisputably an invention of human culture, having been transmitted, reshaped and adapted according to every generation’s needs over tens of thousands of years. The human music faculty, on the other hand, can be seen as those elements of human biology that did and do confine the possible spectrum of musical behaviors and thus the spectrum of music itself. I believe that current scientific disagreement as to the possible innateness and adaptive significance of music often runs into a false dichotomy, when the above definitions get mixed up (see McDermott and Hauser, 2005, and the response by Trainor, 2006). For example, a certain type of musical behavior can be highly adaptive in a particular cultural context, even if most or perhaps all of the innate anatomical features and cognitive skills necessary to produce the behavior were originally selected for other adaptive purposes. Consequently, as Fitch (2006a) argues, if one discusses music as an undifferentiated whole, or as a unitary cognitive ‘module’, one risks overlooking the fact that musical behavior integrates a wide variety of domains (cognitive, emotional, perceptual, motor), may serve a variety of functions (entertainment, mother-infant bonding, mate choice, group cohesion) and may share key components with other behavioral systems like spoken language or gestural communication. Thus, questions like ‘When did music evolve?’ or ‘What is music for?’ seem unlikely to have simple, unitary answers (Fitch, 2006a).

1.3 Universals of Musical Behavior

Musical traditions differ from each other, and these differences have no obvious correlations with genetic differences in their creators. Musical variation is thus a hallmark of those aspects of musical behavior that are learned (Kirby, 2007). However, this variation seems to be constrained in various ways. In other words, there exist certain universals that become obvious when a large number of musical behaviors across different cultures are examined, or when historically distant musical traditions are compared. Finally, Nettl (2000) pointed out that universals need not apply to all musical behaviors. A feature that is found in four out of five musical styles in the world is nevertheless of great interest to anyone studying the evolution of music.

How these constraints on variation arise is somewhat debatable - they could either reflect those aspects of musical behavior that are not learned (i.e. those that are innate), or they could result from universal properties of the way music is used (Kirby, 2007, McDermott and Hauser, 2005). Sometimes, this controversy suffers from additional confusion if the terms ‘use’ and ‘function’ are mixed up or treated synonymously (Justus and Hutsler, 2005). However, in evolutionary theory, a function of a given structure is defined as the purpose for which it was originally selected, whereas a use is a purpose that this structure allows but which was not the one for which it was originally selected (Williams, 1966). In any case, the identification of current universal structures and uses of music may allow us to delineate the scope of musical behavior that should be prioritized in an evolutionary approach, and to disentangle the many circulating theories on music’s evolutionary structures and functions.

When examining ethnographic sources for evidence of universal structures and uses of music, particular emphasis should be placed on traditional small-scale societies, especially if their economies are based on hunting and gathering (Morley, 2003). This was presumably the only subsistence strategy employed by human societies since the appearance of Homo sapiens some 200,000 years ago, being replaced by agriculture only gradually with the spread of the Neolithic Revolution some 10,000 years ago (Bettinger, 1991). Although traditional small-scale societies of the present should not be considered to be a direct analogy for Paleolithic hunter-gatherer societies (O’Connell, 1995), their methods of subsistence, states of technology, and population sizes may be similar. Since these three parameters define their lifestyle and social organization to a large degree, the contexts within which musical behavior occurs may also be similar to those of Paleolithic hunter-gatherer societies (Morley, 2003).

Making Music is Social Behavior

Merriam, Blacking and many other ethnomusicologists have provided considerable field data from around the world to underscore that musical behavior is usually embedded in a social context. In many, perhaps most traditional small-scale societies, music making involves overt action and active group engagement, and is employed not only in caregiver-infant interaction, entertainment and courtship, but also in ritual ceremonies, particularly at times of significant life transitions, such as during initiation rites, weddings or funerals (Cross, 2009). This social embeddedness (Tolbert, 2001) seems almost to exclude Western forms of listening to music, which often occur passively and in a solitary setting, rather than in a social one. However, Cross (2003) argued that during such solitary listening experiences the music itself constitutes a trace of human activity - that music conveys a ‘sense of agency’ (Overy and Molnar-Szakacs, 2009) - with which a private listener may virtually interact.

Furthermore, human music typically occurs in a performative context: Particular songs or dances recur in specific rituals, often stressing supernatural or mystical themes (Arom, 2000, Nettl, 2000). These contexts vary considerably from culture to culture, but all cultures seem to differentiate celebratory music from dirges or laments, adult’s music from children’s music, lullabies from work songs, or draw some similar distinctions (Fitch, 2006a).

Finally, ethnomusicological research indicates that musical behavior might be better conceived of as a mode of interaction among people, with fluid boundaries between creators and listeners. This view stands in sharp contrast to how musical performances are conventionally perceived in contemporary Western societies, namely that one group of people (the performers) actively creates music for another, passive group of people (the audience). This suggests that evolutionary sciences need to find means of addressing the study of music’s proximate and ultimate functions which are inherent or emergent in processes of musical interaction (Cross and Tolbert, 2009).

Joint Music Making Implies Collaboration

What may at first sound rather trivial turns out to be a critical feature of musical group behavior, with far-reaching consequences for the interpretation of ongoing cognitive processes: A musical group performance can only be successful if the participants agree on the shared goal to sing, drum or dance together at the same time (Koelsch, 2010). Although individuals may decide to join a musical performance based on different immediate interests, and certain musical group phenomena emerge as by-products of such independent but simultaneous decisions (e.g. dancers in a nightclub), it is far more typical that performers need to adapt to a greater or lesser extent to their partners’ behavior, i.e. by synchronizing and coordinating body movements with one another (Melis and Semmann, 2010).

Psychological research suggests that mutual cooperation in humans is special with regard to the underlying proximate mechanisms, above all powerful skills of intention reading and a motivation to share psychological states with others (Tomasello et al., 2005). These mechanisms seem to allow humans to employ cooperative strategies more flexibly, more efficiently and in a wider range of situations than other species (Melis and Semmann, 2010).

So in the case of music, when individuals form the shared goal of singing in a choir or playing in a band together, they also want their partners to be committed to that goal and to be successful in their roles in order to accomplish the goal (similar to Gräfenhain et al., 2009). Finally, it is of critical importance to distinguish such inter-dependent music making based on shared intentionality (Tomasello and Carpenter, 2007) from musical encounters in which participants view the other performers as mere service providers who are there to allow them to achieve their own individual goals (e.g. the attendees of a rock concert just want to dance to the music).

Human Song: Discrete Pitches and Blending of Voices

Another universal manifestation of musical behavior is singing (Brown et al., 2004, Lehmann et al., 2009/2010). However, human song differs from most animal song systems in that its harmonic structure relies on a discrete set of pitches - a scale - from which notes are chosen to build melodies (Nettl, 2000). This is a key feature which also differentiates human song (with discrete pitch) from human speech (with continuously variable pitch), although there are exceptions on both sides. Furthermore, octaves are perceived as equivalent in almost all cultures, and virtually all scales of the world consist of seven or fewer pitches per octave (Stevens and Byron, 2009).

The utility of a rather small set of discrete pitches becomes obvious during joint singing - a universal manifestation of musical behavior - as it allows harmonious blending of two or more voices. This leads us to another salient difference between song and speech: Singing in a group of two or more people normally involves overlap in auditory production whereas human speech usually occurs in turn-taking conversations (Brown et al., 2004).

A third design feature of song that is shared with speech is that the underlying pitch structures are transposable: A melody is recognized as being the same when it is performed or sung on a higher starting note. This is because, in human music, a melody is defined by the relationships between notes, not just the absolute frequencies of the individual notes (Fitch, 2006a). Thus, two singers with very different pitch ranges (e.g. a father and his child) can still sing the same melody together, despite using a different set of pitches to do so.
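The notion of transposability can be made concrete with a minimal sketch. It assumes, purely for illustration, that pitches are encoded as MIDI note numbers (one unit per semitone); this encoding, the function names and the example melody are hypothetical and not taken from the studies reported here. The sketch shows that shifting every note by the same amount changes the absolute pitches but leaves the sequence of intervals, and hence the melody's identity, unchanged.

```python
def intervals(pitches):
    """Return the successive pitch differences (in semitones) of a melody."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def transpose(pitches, semitones):
    """Shift every pitch of a melody by the same number of semitones."""
    return [p + semitones for p in pitches]


# Hypothetical example melody, encoded as MIDI note numbers (assumption).
melody = [60, 60, 67, 67, 69, 69, 67]

# The same melody sung one octave (12 semitones) lower, e.g. by a deeper voice.
lower_voice = transpose(melody, -12)

# The absolute pitches differ, but the relationships between notes do not.
same_melody = intervals(melody) == intervals(lower_voice)
```

Under this encoding, two singers with different ranges produce different pitch lists but identical interval lists, which is one simple way to express why listeners treat both renditions as the same melody.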

Although humans are by far the most complex singers in nature, the human song system demonstrates many features that are shared by other singing species, like the capacity for phonatory invention and improvisation, the organization into repetitive repertoires, or the importance of imitative vocal learning for song acquisition (Fitch, 2006a). However, vertically integrated, multi-part harmonic singing is absent in non-human species. This is a key feature, rendering the human song system different from that of other species, being specialized as it is for coordinated multi-person vocal blending (Brown et al., 2004).

Isochronic Pulse and Rhythmic Entrainment

Another important design feature that distinguishes music both from animal song systems and human speech is its use of isometric rhythms (Fitch, 2006a). Music tends to be isochronic, meaning that there is a regular periodic pulse (also termed the beat, or tactus) which provides a reference framework for other temporal features of the music (Arom, 2000). However, Fitch (2006a) notes that isochrony is a relative feature: Virtually no music is perfectly isochronic, and some musical styles rather freely vary the underlying pulse (e.g. in classical music). Still, the utility of a mutually perceived isochronic pulse becomes evident during group dancing - another universal manifestation of musical behavior - as it allows the synchronization of movements of two or more people at once.
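The idea that isochrony is a matter of degree can be illustrated with a small sketch. It assumes that a performance is reduced to a list of note or beat onset times in seconds, and it uses the coefficient of variation of the inter-onset intervals as one possible index of how regular the pulse is. Both the index and the example values are illustrative assumptions, not the measure used in the empirical chapters.

```python
from statistics import mean, pstdev


def inter_onset_intervals(onsets):
    """Return the time gaps between successive onsets."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def isochrony_cv(onsets):
    """Coefficient of variation of the inter-onset intervals.

    0.0 means a perfectly regular pulse; larger values mean a freer pulse.
    """
    iois = inter_onset_intervals(onsets)
    return pstdev(iois) / mean(iois)


# Hypothetical onset times in seconds (assumption, for illustration only).
strict_pulse = [0.0, 0.5, 1.0, 1.5, 2.0]       # metronome-like beat
free_pulse = [0.0, 0.45, 1.05, 1.5, 2.1]       # expressive timing

strict_cv = isochrony_cv(strict_pulse)
free_cv = isochrony_cv(free_pulse)
```

A value near zero corresponds to a strictly isochronic pulse, while styles that vary the underlying pulse more freely would yield larger values on such an index.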

Music’s capability to enable participants to dance in a group or to play in a band together is based on the human capacity of rhythmic entrainment (Clayton et al., 2004). Rhythmic entrainment should be distinguished from the ability of most animals (including humans) to move in a metric, alternating fashion. What is special about humans is not only their capacity to move rhythmically, but their ability to entrain their movements to an external timekeeper, such as a beating drum. This is a key feature of both music and dance, and evolutionary accounts of music must explain the emergence of this ability of humans to synchronize their movements in a rhythmic fashion with those of conspecifics or other external timekeepers (Brown et al., 2000).

Music Involves Dance

As already indicated, music obviously shares many features with dance: They are both temporally organized, and described in terms of rhythm, tempo, beat, pace, and movement (Mitchell and Gallaher, 2001, Morley, 2003). In fact, one rarely exists without some form of the other. Since the concept of music is interwoven with that of dance in the majority of cultures and music is viewed as action in much ethnomusicological literature (Stevens and Byron, 2009), Cross (2007) suggested that it would be parsimonious to treat music and dance either as intrinsically related or simply as different manifestations of the same behavioral phenomenon.

Music Induces Affect

Another highly salient universal of music is its association with affect (Juslin and Sloboda, 2001). Despite huge cross-cultural variability in the kinds of association, some universals are indeed observable. For example, emotional excitement in music is generally expressed and induced through loud, fast, accelerating, and high-registered sound patterns (Brown et al., 2000). However, not everyone agrees on the significance of such associations. Cross (2003) has questioned conceptions of music that connect it solely and by necessity with affect, arguing that its uses are not restricted to the expression or induction of emotion. Furthermore, affective interaction is not specific to music, but is relevant to many expressive forms of human behavior, for example, theatrical performance, preaching, or poetry. Identifying and defining the scope of affective cues in music remains a fundamental challenge that requires perceptual, cross-cultural, and cross-modal studies (Livingstone and Thompson, 2006).


Music has a Generative Structure

The fact that music uses a finite, usually small, set of elements to generate limitless pattern variety is an interesting formal property - its generativity (Merker, 2006). To do so it needs - just like language - ‘particulate elements’ (Abler, 1989), that is, distinct elements that do not blend to an average when combined, and this it achieves by ‘discarding’ the greater part of the continua of pitch and duration to retain only sets of individual pitches (creating a particular scale) and discrete durations with proportional values (creating a particular rhythm). This provides the entry point to music as ‘self diversifying’ (Abler, 1989) or a ‘generative system’ (Lerdahl and Jackendoff, 1983), details on which are discussed by Merker (2002).
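The combinatorial consequence of using particulate elements can be made concrete with a small worked example. The five pitches and two durations below are arbitrary illustrative choices, not values taken from Merker or Abler, and `count_patterns` is a hypothetical helper name.

```python
from itertools import product

# Assumed toy inventory: five discrete pitches and two proportional durations.
pitches = ["C", "D", "E", "G", "A"]
durations = [1, 2]  # e.g. short and long, in a 1:2 ratio

# Each particulate element pairs one pitch with one duration.
elements = list(product(pitches, durations))

def count_patterns(n_elements, length):
    """Number of distinct sequences of a given length over n_elements."""
    return n_elements ** length

# All two-element motifs can still be listed exhaustively.
two_note_motifs = list(product(elements, repeat=2))

# Eight-element phrases are already too numerous to list conveniently.
eight_note_phrases = count_patterns(len(elements), 8)
```

With only ten elements, the number of possible sequences grows exponentially with length, which is one simple way of reading the claim that a small discrete inventory supports limitless pattern variety.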

Music is Culturally Transmitted

Musical signals (like linguistic signals) are much more complex than any of the various innate forms of vocalization available to our species, like groans, sobs, laughter, and shouts (Fitch, 2006a). Thus, musical skills and behaviors are not passed on biologically from the parents to their offspring, but are rather learned through experience and participation (Lamont, 2009). Sloboda (1985) distinguished between two types of music acquisition: enculturation (autonomous and effortless) and training (specialized, deliberate and conscious). Whereas the majority of developmental studies available have focused on the mechanisms and effects of formal musical training in Western societies, most children across the world acquire their musical knowledge via everyday exposure and participation (Hannon and Trainor, 2007). Evolutionary scientists should therefore focus on the process of musical enculturation.

1.4 Systems of Transmission

Hannon and Trainor (2007) emphasized that early music learning can proceed normally with little feedback in the way of explicit reinforcement (e.g. correction of a child’s wrong tune or rhythm by its parents). In other words, musical behavior can be reliably acquired purely through observation and participation. Thus, during musical enculturation in early childhood, music - like language - can transmit sufficient information about the covert rules that guide its own construction and context-specific usage.

In language acquisition, this enculturation process has been termed iterated learning to reflect the fact that linguistic behavior is transmitted through observational learning from other members of the culture who underwent the same learning process during their own childhood (Kirby, 2007, Kirby et al., 2004).

If we accept that, in a similar way, music is not only a conveyer of structured sound, but in itself a socially learned and culturally transmitted system, then an individual’s musical knowledge is the result of observing and joining in the musical behavior of others (Vygotsky, 1978, Tomasello, 1999, Tomasello et al., 1993). This view opens up a much broader avenue for investigating the origins of music, whereby its design features are confined to three transmission systems operating simultaneously, albeit on three different time-scales: genetic evolution, cultural evolution and individual social learning (Christiansen and Kirby, 2003). Importantly, these three dynamical systems interact in non-trivial ways (see Figure 1).

Figure 1. Three interacting transmission systems of human music, following Christiansen & Kirby (2003), who originally proposed this tripartite model to explain the evolution of language.

[Figure not included in this excerpt]

First, the mechanisms for learning music (whether being domain-general or music-specific) are part of our biological heritage, and are thus subject to genetic evolution. Therefore, any attempt to define the human music faculty by listing its components should pay considerable attention to the social learning mechanisms applied during early music acquisition (Trehub and Hannon, 2006).

Second, these social learning mechanisms in turn impact on and shape the cumulative cultural evolution of music. In other words, if anything like a human instinct to learn music exists it will be shared by all humans and set the broader limits within which musical behaviors can persist through the alternation of generations and diversify as cultural entities together with the many other social practices of any given culture (Fitch, 2006b).

Third, the musical behaviors and skills that emerge from the dynamics of this cultural diversification may themselves potentially affect the biological, or better ‘cultural’ fitness (Mesoudi et al., 2006) of the individuals possessing them and thus may ultimately impact on the evolutionary trajectory of the innate learning mechanisms for them. In other words, the complex circle of interactions between the three dynamical systems opens the possibility of gene-culture co-evolution (Richerson and Boyd, 2005), which might have not only led to musical behaviors that are especially efficient in creating desirable effects but also to an innate proclivity for learning and producing exactly these kinds of behaviors (Trehub and Hannon, 2006).

1.5 Finding Verifiable Hypotheses

Theoretical interest in the evolutionary origins of music has increased considerably over the last two decades. Scholars from various disciplines ranging from archaeology to neuroscience, from biology to philosophy have contributed to the debate by proposing hypothetical scenarios to explain the existence of human music in general or particular manifestations thereof. However, before diving into the subject, it is worth reflecting on the sorts of questions that theorists are trying to answer. It may be that some confusion in the current debate actually arose because different questions have been asked (Kirby, 2007).

I suggest that questions regarding the evolution of music can be roughly divided into four categories. Note that it is tempting to compare these categories with Tinbergen’s (1963) four famous evolutionary why questions. However, my categorization is rather based on the foci of interest that appear in current literature on music evolution (Brown, 2000, Cross, 2009, Fitch, 2006a, Wallin et al., 2000, Vitouch and Ladinig, 2009/2010).

Types of Questions Regarding the Origins of Music:

1. Function: How could music evolve? What, if any, were the selective pressures involved? Should music be regarded as a genetic adaptation, a cultural technology (Patel, 2008), a behavioral fossil (Fitch, 2005), or dismissed as a spandrel?
2. Evolutionary History: What is the evolutionary story of music? When did it evolve? Were there intermediate stages? What was the impact of biological and cultural evolution?
3. Biological Prerequisites: What are the innate components of the music faculty? Which of them are shared with other species? Which of them are domain-general mechanisms, only exploited by human music? Which components, if any, are specific to music and thus may have been shaped by natural selection for music?
4. Structure: Why is music the way it is and not some other way? How can an evolutionary approach explain the great variation in musical styles, but at the same time particular musical universals we observe?

I agree with Fitch (2006b) that questions concerning the origin of music are worth asking by scientists first and foremost if they are experimentally verifiable. As a current example, Patel (2006) hypothesized that the human capacity for rhythmic entrainment is not a direct genetic adaptation for music, but rather a spandrel of vocal mimicry, relying on the same neural circuits that provide the tight link between auditory processing and motor production necessary for complex vocal imitation in humans. Patel’s hypothesis predicts that other vocal mimicking species could learn how to synchronize their movements to a musical beat, if raised by humans. Indeed, three years later two independent studies reported evidence for rhythmic entrainment to music in enculturated members of the parrot clade, whose members are known to possess open-ended vocal learning skills (Patel et al., 2009, Schachner et al., 2009). This example shows how contemporary comparative research can provide useful insights into the neurobiology and evolution of human music.

As a counterexample, Mithen (2005) presented a book-length argument that today’s music is a relic of a former adaptation, namely an earlier hominin, song-like communication system (an idea that was originally proposed by Darwin, 1871). Based on his expertise in human paleontology and archaeology, Mithen developed a detailed hypothetical picture of the evolutionary history of a kind of protolanguage he dubbed ‘Hmmmm’, meaning ‘holistic, manipulative, multimodal, musical and mimetic’.

However, the direct fossil evidence of musical behavior is sparse: The earliest indisputable musical artifacts of Homo sapiens found to date are flute-like bone pipes from Geissenklösterle in southern Germany, dating back ca. 40,000 years (Conard et al., 2009), and belonging to the Aurignacian culture. This date more or less coincides with the arrival of modern humans in Europe and suggests that something like music was of high significance to the new immigrants. Yet, a simpler form of music, maybe limited to singing, dancing, and drumming, might have predated the invention of flutes in our species by several tens of thousands of years, but would have left no fossil record at all (Fitch, 2006a).

Given the transitory nature of musical performance, we are extremely unlikely to ever know what kind of musical behavior our hominin ancestors engaged in. Did Australopithecines dance? Did Neanderthals sing? These questions, however fascinating, will probably never be answered with any certainty. Instead, the proper role of evolutionary hypothesizing should be to spur ideas for contemporary empirical studies (Fitch, 2006b). For example, Darwin’s hypothesized link between music and language evolution suggests comparing the genetic, neural, and developmental mechanisms underlying musical and linguistic abilities in modern humans. Indeed, Koelsch and colleagues (2002) have demonstrated that processing musical syntax engages a very similar network of brain regions to that activated by language processing. Thus, the genetic processes involved in setting up and controlling these brain regions affect both music and language, which can be interpreted as evidence for their shared evolutionary history (Trainor, 2006).

In conclusion, I believe that the most important role of evolutionary theorizing about music is to generate testable predictions, to act as a guide for designing modern empirical research and creating new data in a range of fields like comparative biology, anthropology, genetics, the neurosciences, or - and this is the focus of this dissertation - developmental and cross-cultural psychology. However, one should be aware that new empirical data supporting a hypothesis raised from any of the four question categories (on Function, History, Prerequisites, and Structure) should be compatible with the interpretation of data previously created for the purpose of answering questions from any of the other categories (Kirby, 2007). This highlights an important and difficult challenge facing the study of music evolution: the need for cooperation between researchers working on different aspects of human music.

1.6 The Developmental Approach

The goal of my dissertation was to add new empirical data to the debate on the evolution of music by conducting behavioral studies with preschool children. My interest in investigating the musical behavior of young children arose for a number of reasons.

First and foremost, in order to fully understand a complex behavioral phenomenon like music making, it is crucial to not only look at its full manifestation at adulthood, but also investigate its developmental trajectory (Tinbergen, 1963). Developmental research suggests that young infants are already sensitive to many of the musical universals that are the foundation of musical styles worldwide. For example, they are tuned to consonant and harmonic patterns, to melodic vocalizations, and to metric rhythms (Trehub, 2001). Parents also adapt their usual singing and rocking styles in ways that are convenient to their infant listeners. Later, during early childhood, active musical engagement becomes particularly important: Imitating the basic musical practices of their culture, most young children love to sing, dance and drum (McPherson, 2006). But of course, this developmental trajectory and the level of musical competence achieved depend heavily on individual experience, which - according to the tripartite model of music transmission outlined above - deserves a careful scientific exploration.

Second, kindergarten-aged children are better subjects than adults if one wants to rule out normative knowledge and experience in complex social institutions as sources for their interpretation of a particular kind of musical manipulation and for their decision making during any subsequent dependent measures (Olson and Spelke, 2008). More specifically, when studying the musical behavior of pre-school children, one can basically exclude formal musical training and rather focus on informal musical enculturation (Hannon and Trainor, 2007) when discussing the causes of any observed effects.

Third, if a study is carefully designed, embedded in an appropriate cover story, and - ideally - carried out in an environment familiar to the children (e.g. the kindergarten), the subjects don’t even realize that they are being tested and thus behave truly naturally and spontaneously. This pre-condition can, of course, also be reached in studies with adults, but their higher skepticism and scrutiny makes it much harder to avoid over-interpretation of the experimental manipulation.

1.7 Goals of the Thesis

The empirical part of this dissertation consists of four independent studies, each focusing on a different aspect of music, driven by a different question from the list outlined above. Yet, I connected all four studies by placing them in the same evolutionary framework. The ages I studied ranged from 2 to 4 years, an age window suggested by relevant prior work and determined by the question being addressed. I hope that this strategy - the inclusion of multiple approaches and a relatively wide age range - will contribute substantially to our understanding of music’s ontogenetic and phylogenetic origins.

Study 1: Prosocial Effects of Joint Music Making

How could music evolve? Questions regarding the selective pressures involved in the emergence of human music have traditionally stirred up much theoretical debate, but only little empirical research. Therefore, the first study (Chapter 2) was aimed at testing predictions from the currently most popular hypothesis about a culturally adaptive function of music, namely that certain forms of musical group behavior emerged as tools that create joint commitment (Gilbert, 1989), foster social bonding and group cohesion (Cartwright, 1968), and ultimately increase prosocial in-group behavior and cooperation. Specifically, I predicted that ‘joint music making’ increases subsequent spontaneous cooperative and helpful behavior even among pairs of 4-year-old children, relative to a carefully matched control condition with the same level of social and linguistic interaction, but no music.

Study 2: Ontogeny of Rhythmic Entrainment

Why is the rhythm of most human music based on an isochronic pulse? What are the evolutionary origins of the human tendency to move in synchrony with such rhythmic music? To answer these questions, the aim of the second study (Chapter 3) was to investigate the early ontogeny of human rhythmic entrainment. I hypothesized that rhythmic entrainment first occurred in human cultures as a fundamentally social activity, predicting that children spontaneously synchronize their body movements to an external beat at earlier ages and with higher accuracy if that stimulus is presented in a social context. To test this prediction, I invited children at three different ages (2.5, 3.5, and 4.5 years) to drum along with either an adult human partner, a drumming machine, or a drum sound coming from a speaker, and then compared their spontaneous tempo adjustment and synchronization accuracy between age groups and conditions. Using circular statistics to calculate synchronization accuracy proved to be a convenient method when handling the variable drumming behavior of young children, because it avoids the subjective exclusion criteria for response beats that conventional linear statistics would require.
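As a rough illustration of how circular statistics can quantify synchronization without excluding any response beats, the following sketch maps each drum hit onto a phase angle within the stimulus beat cycle and computes the mean resultant length, a standard circular measure. This is a generic example under stated assumptions, not the analysis procedure used in Chapter 3: the function name `sync_accuracy`, its parameters, and the sample tap times are all hypothetical.

```python
import cmath
import math

def sync_accuracy(tap_times, beat_period, beat_offset=0.0):
    """Return (R, mean_phase) for taps relative to an isochronous beat.

    R is the mean resultant length (0 = taps spread evenly over the cycle,
    1 = every tap at the same point of the cycle). mean_phase is the mean
    angle in radians; it is only meaningful when R is clearly above 0.
    """
    # Map every tap to an angle on the unit circle; no tap is discarded.
    phases = [
        2 * math.pi * (((t - beat_offset) % beat_period) / beat_period)
        for t in tap_times
    ]
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector), cmath.phase(mean_vector)

# Hypothetical taps (seconds) that lag a 1-second beat by a constant 100 ms.
r_late, phase_late = sync_accuracy([0.1, 1.1, 2.1, 3.1], beat_period=1.0)

# Hypothetical taps spread evenly across the beat cycle.
r_spread, _ = sync_accuracy([0.0, 1.25, 2.5, 3.75], beat_period=1.0)
```

In this sketch, a child who drums consistently but slightly late still obtains a high R with a non-zero mean phase, whereas drumming unrelated to the beat yields an R near zero.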

Study 3: Individual Differences in Rhythmic Entrainment

How can an evolutionary approach explain particular universals in musical skills (like the human capacity for rhythmic entrainment), but at the same time the great variation in individual competence observable in (particularly Western) societies? Indeed, study 2 revealed very large inter-individual differences in synchronization accuracy in 2- to 4-year-old German children. To explain these differences, I hypothesized that musical skills, like the capacity for rhythmic entrainment, do not represent music-specific biological adaptations, but emerge via social learning processes during enculturation. As mentioned before, Patel (2006) suggested that the human capacity to move in rhythmic synchrony with music is an evolutionary spandrel, relying on the neural circuitry for complex vocal learning. But then the question arises: Which are the cultural constraints that lead to the first really synchronized musical behavior? I hypothesized that after infants experienced entrainment while being passively moved to music (Trainor, 2007), they learn via imitation during musical interactions that actively synchronizing movements to music is part of the cultural convention. Thus, in light of the results from study 2, I predicted that 3-year-old children who grow up in an environment with presumably more musical practice in relevant social settings should spontaneously synchronize better during the joint drumming task established in study 2. Therefore, one aim of the third study (Chapter 4) was to test this specific prediction, by including a detailed background survey.

The second aim of study 3 was to follow up on another remaining question from study 2, namely whether the increase in synchronization accuracy in the social condition was due to the shared intention of drumming together in synchrony, or rather due to the child’s enhanced visual perception while drumming with a human model. Therefore, I extended the design of study 2 by way of a new experimental condition, in which the child drummed together with the experimenter in a social context, but this time without perceiving any additional visual rhythmic cues (by introducing a visual barrier between the two drum partners).

Third, after finding in study 1 that joint music making indeed increases prosocial behavior in 4-year-old children, I wanted to test whether this effect was due solely to the interpersonal synchrony created during the musical manipulation, as would be suggested by two recently published studies with adults (Hove and Risen, 2009, Wiltermuth and Heath, 2009). To do so, I further extended the design of study 3 by introducing two transfer-tests, measuring the children’s willingness to help and share resources with the experimenter depending on whether they had just drummed together with him or drummed on their own, along with a pre-recorded beat, while the experimenter was doing something else.

Study 4: Local Differences in Musical Enculturation

How does the local culture shape musical competence in childhood? The aim of the fourth study (Chapter 5) was to extend my methodological spectrum by adding a cross-cultural perspective. Comparative cultural studies are an established tool to test predictions on how local cultural practices shape the ontogeny of psychological competences (Heine and Norenzayan, 2006). Furthermore, I assumed that if the effects I observed in study 3 are also evident in a different culture, then they are more likely to be universal.

For these reasons, I decided to repeat the experiment from study 3 in Salvador da Bahia, a coastal city in Northeastern Brazil, known for its rich heritage of Afro-Brazilian musical practices and a large concentration of live-performing musical ensembles, many of which are based on drumming and percussion (Reiter, 2009). I assumed that children growing up in Salvador have many opportunities to experience rhythmic music making both at home and in public, probably more than those residing in Leipzig, Germany, where performances of rhythmic music - either at home or in public - are supposedly not as frequent and cultivated (Wingerter, 2005). The latter assumption might explain why the German children from study 3 drummed with less synchrony if the experimenter’s movements were hidden from view compared to joint drumming vis-à-vis.

I predicted that children from Salvador would spontaneously synchronize during joint drumming not only better than children from Leipzig, but also regardless of whether they see the co-drummer or just hear him. This prediction was based on the assumption that, through regular observation of and participation in rhythmic activities in their daily lives, they have learned the convention that drumming is a collective endeavor that includes synchrony (Young and Ilari, in press).


2 Study 1: Prosocial Effects of Joint Music Making

2.1 Introduction

The evolutionary origins of music are a puzzle, since music lacks any obvious adaptive function (Darwin, 1871, Wallin et al., 2000). Some theorists have speculated that it actually has no adaptive function, but rather that music was invented as a pure pleasure stimulant and that all components of the human music faculty originally evolved for non-musical purposes, such as language, fine motor control, or emotional communication (James, 1890, Pinker, 1997). This spandrel hypothesis provides a plausible theoretical rationale for the initial step in music evolution at the point where our human ancestors produced the first music-like behaviors, but it does not preclude the possibility that music could have later acquired some adaptive function, either biological or cultural.

As music making is an omnipresent behavior across all cultures (Merriam, 1964), with deep roots in human ontogeny (Trehub, 2001), an ancient history of at least 40,000 years (Conard et al., 2009), and powerful psychological effects on mood and emotions (Juslin and Sloboda, 2001), other theorists have proposed various adaptive functions of musical behaviors - at least at some stage of human evolution. Such adaptive theories are not necessarily mutually exclusive, since each may account for certain aspects of human music today which could have evolved at different evolutionary periods. In addition, it is important to be clear about whether music in its biological or cultural dimension is at issue: Particular styles of music are products of cultural evolution, whereas the innate and universal components of the human music faculty are products of genetic evolution.

Darwin (1871) proposed that music, and song in particular, once had an adaptive function that it no longer has (i.e. it is a behavioral fossil, Fitch, 2005). In this view, major components of the human music faculty originated in an ancient, pre-linguistic, songlike communication system comprising learned and complex acoustic signals (Brown, 2000b, Mithen, 2005, Richman, 1993). At a later stage in human evolution this communication system was upgraded to a more efficient one - human language - leaving our species with the innate predisposition to create today’s music. Analogous to birdsong during courtship, Darwin suggested that song-like forms of behavior first evolved by means of sexual selection, as individuals advertised for mates (for an extended argument see Miller, 2000).

The idea that music originally evolved as a display was also put forward by Merker (2000), who proposed that synchronous chorusing by hominin males served to display coalition strength, helping to defend territory and at the same time attract migrating females. Similarly, Hagen and Bryant (2003) suggested that group music making and dancing originated in between-group displays, which eventually evolved into a signal system communicating internal stability and the group’s ability to act collectively, thereby establishing meaningful relationships - whether cooperative or hostile - between groups (see Hagen and Hammerstein, 2009/2010, for an extension of the argument).

Such theories, however, have a hard time explaining how such a signaling system could be invented and stabilized in the first place within large groups of often non-related individuals, since it appears rather open to cheating. For example, individuals might participate in the musical group performance, but only pretend to share the group’s coalition agreement, later taking personal advantage of the others’ commitment. Furthermore, these between-group signal theories are supported by rather sparse ethnomusicological evidence and do not account for the majority of musical encounters observed across cultures today, where musical behavior is part of peaceful, in-group ceremonies outside any sexual or competitive context (Clayton, 2009).

Another set of theories treats music not as a signal system but as a behavioral tool - thereby circumventing the problem of cheating. For example, Dissanayake (2000) and Falk (2004) have advocated a kin-selected function for an ancient musical communication system in mother-infant bonding: Prosodic utterances might have served to keep mothers and their infants in psychological contact when they were physically separated, for example, while mothers prepared food or made tools. Indeed, the use of lullabies to soothe infants is considered a human universal (Trehub, 2001) and when it comes to communicating emotion through infant-directed speech, ‘the melody is the message’ (Fernald, 1989).

However, the hypothesis currently receiving the strongest scientific interest is that music making and dancing, once invented, turned out to be effective tools to establish and maintain social bonds and joint commitment among the members of social groups, ultimately increasing cooperation and prosocial in-group behavior (Huron, 2001, McNeill, 1995, Roederer, 1984). As outlined in detail in the general introduction, music in traditional small-scale societies is typically produced for pragmatic reasons (Bohlman, 2000), integrated into ritual group ceremonies that are usually considered to be essential for the maintenance of the group’s identity, with the music being an indispensable part of it (Clayton, 2009, Dissanayake, 2006).

On the proximate level, several universal features of human music (Fitch, 2006a, Stevens and Byron, 2009) - like its ritualized context, periodic pulse (beat), discrete pitches, and a highly repetitive repertoire - may contribute to solving the proposed adaptive problem of maintaining group cohesion. Specifically, they all make music more predictable than, for example, language and thus facilitate coordination between multiple individuals at once via synchronization of body movements and blending of voices.

This hypothesis of music as a behavioral tool for supporting group cohesion predicts that group music ultimately increases joint commitment (Gilbert, 1989) and fosters subsequent cooperation among the performers. Indeed, Anshel and Kipper (1988) found that adult Israeli males cooperated better in a prisoner’s dilemma game and scored higher on a questionnaire on trust after a group singing lesson, compared to passive music listening, active poetry reading, or just watching a film together. Likewise, Wiltermuth and Heath (2009) showed that US students scored higher on a weak-link coordination-exercise and a public-goods game after joint singing along with a song played over headphones, compared to no singing or forced asynchronous singing (via playing the same stimulus at individual tempi). However, adding synchronous limb movement (by moving plastic cups from side to side on a table) to the synchronous singing condition did not improve the scores in the subsequent economic games.

Yet, for the evolutionary argument, much stronger evidence would be provided if similar prosocial effects could be shown in young children. Kindergarten children are presumably not engaged in sexual advertising, nor do they have to form coalitions out of fear of encountering rival neighboring groups. In terms of the group cohesion hypothesis, kindergarten children are a better test than adults because children this young, especially in Western cultures, have had little experience of institutionalized music occurring for external pragmatic reasons. Therefore, one can probably ignore normative knowledge as a source for their interpretation of the manipulation phase and for their decision making during the dependent measures (Olson and Spelke, 2008).

But since all human children have musical preferences and skills (Trehub and Hannon, 2006, Zentner and Eerola, 2010), it would be very telling if involvement in joint music making and dancing were to somehow influence children’s spontaneous altruistic and cooperative tendencies.

In this first study, therefore, I had pairs of four-year-old children participate in a three-minute episode of interactive play. Using the same set-up, procedure, and cover story, children either interacted with one another (and an adult) in the context of traditional music making - that is, with dancing, singing, and playing percussion instruments to a novel, but easy-to-learn children’s song (Musical condition) - or they interacted with one another (and an adult) during basically the same joint activity but without singing, dancing, or playing instruments (Non-musical condition). Immediately after this manipulation phase, each pair participated in two social interactions designed to test their willingness to (1) help their partner and (2) cooperate on a problem-solving task. I predicted, according to the group cohesion hypothesis, that prior engagement in joint music making should make children behave in a more prosocial manner, i.e. they would spontaneously help each other more and solve a task jointly rather than alone.

2.2 Methods


A final sample of 96 four-year-old children was included in the study (48 males and 48 females; M = 4.5 years, range: 4.0 to 4.99 years). Participants were selected from a database of children whose parents had previously agreed to their children’s participation in infant studies. Children from this database came from mixed socio-economic backgrounds, although the majority came from middle and lower-middle classes. The participants for the current study were recruited and tested at 16 different urban daycare centers. The two children in each pair were always recruited from the same kindergarten group. This way I could assume that the peers knew each other from frequent previous interaction. In addition, both children were asked whether they knew the other peer, and they had to agree to play the new game together before the experimenter brought them to the testing room. I paired children who were familiar with each other in order to create a situation analogous to those found in traditional small-scale societies. To ensure that each pair was nonetheless randomly assigned to one of the two conditions, the order of conditions was defined prior to testing. Importantly, I ensured that the children in this study were not aware that they were participating in a scientific study; instead, they joined the experimenter in the belief that they were going to play some novel games the experimenter had brought along. Another 15 pairs took part in the study but were not included in the final sample, either because they did not pass the warm-up task (three pairs), because they did not pass the cooperation test (two pairs), or because they did not pass the helping test (10 pairs). The exclusion criteria for each test are detailed below.
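The sample figures in the paragraph above can be cross-checked with a few lines of arithmetic. The following is only an illustrative tally: the variable names are hypothetical, and the total number of pairs tested is derived under the assumption that the included and excluded pairs listed here were the only pairs that took part.

```python
# Counts taken from the participants paragraph; all names are illustrative.
INCLUDED_PAIRS = 48
CHILDREN_PER_PAIR = 2
EXCLUDED_PAIRS = {
    "warm-up task": 3,
    "cooperation test": 2,
    "helping test": 10,
}

# Derived quantities (assumption: no pairs beyond those listed were tested).
included_children = INCLUDED_PAIRS * CHILDREN_PER_PAIR
excluded_total = sum(EXCLUDED_PAIRS.values())
tested_pairs = INCLUDED_PAIRS + excluded_total
exclusion_rate = excluded_total / tested_pairs

print(f"included children: {included_children}")
print(f"excluded pairs: {excluded_total}")
print(f"pairs tested overall: {tested_pairs}")
print(f"share of pairs excluded: {exclusion_rate:.1%}")
```

The tally reproduces the 96 included children and the 15 excluded pairs reported in the text.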

General Study Design

In addition to the musical vs. non-musical manipulation, this study had gender as a predictor variable. For simplification I only paired children of the same gender. As a result, this study had a between-participants, 2 x 2 design, with two predictor variables (condition & gender) and two dependent measures (voluntary helping & spontaneous cooperative problem-solving). The final sample size was 48 pairs with 12 pairs for each condition-gender combination. Each session lasted about 20 minutes and consisted of four main episodes: (1) experimental manipulation phase, (2) dependent measure one, (3) manipulation phase repeated, and (4) dependent measure two. My reason for running the manipulation phase again before the second dependent measure was to reinforce any prosocial effects of music in case they were transient and faded while the children were concerned with the tasks that followed. During data acquisition I alternated the condition from session to session, and counterbalanced the order of dependent measurements within sessions (either helping test first or cooperation test first).
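To make the design concrete, the sketch below generates one possible session schedule that satisfies the constraints described above: 12 pairs per condition-gender cell, conditions alternating from session to session, and the order of the two dependent measures counterbalanced. It is a minimal sketch under stated assumptions, not the schedule actually used: the function name, the grouping of sessions by gender, and the rule of switching the test order every second session are all hypothetical.

```python
from collections import Counter

CONDITIONS = ("Musical", "Non-musical")
GENDERS = ("female", "male")
TEST_ORDERS = ("helping first", "cooperation first")
PAIRS_PER_CELL = 12  # pairs per condition-gender combination


def build_schedule():
    """Return one dict per same-gender pair for a balanced 2 x 2 design."""
    schedule = []
    for gender in GENDERS:
        for i in range(PAIRS_PER_CELL * len(CONDITIONS)):
            # Alternate the condition from one session to the next.
            condition = CONDITIONS[i % 2]
            # Assumed rule: switch the test order every second session so
            # that each order occurs equally often within each condition.
            test_order = TEST_ORDERS[(i // 2) % 2]
            schedule.append(
                {"gender": gender, "condition": condition, "test_order": test_order}
            )
    return schedule


schedule = build_schedule()
cells = Counter((s["condition"], s["gender"]) for s in schedule)
orders = Counter((s["condition"], s["gender"], s["test_order"]) for s in schedule)

for cell, n_pairs in sorted(cells.items()):
    print(cell, n_pairs)
```

Under these assumptions every condition-gender cell contains 12 pairs, and each test order occurs six times within each cell.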


All testing was done by the same experimenter. He recruited the participants in the kindergarten groups and administered all sessions. A technical assistant was present in the testing room, but only spoke during a short introduction to the children and briefly as part of a demonstration of the cooperation test (see below). In order to avoid any third-party/authority biases or bystander effects, the experimenter and the technical assistant were not present during either dependent measure, so that the two children had to solve both tasks on their own.

Manipulation Phase

I designed both conditions such that the tasks were equally difficult, plausible, and motivating for the children; that the children had to follow exactly the same sequence of actions in order to reach the same joint goal; and that the amount of movement as well as gestural and verbal interaction was leveled across conditions. To do so, the experimenter strictly followed the same script to demonstrate the consecutive actions and instruct the children, either embedded as dance and sung phrases during the Musical condition or as non-dancing movement and spoken phrases during the Non-musical condition.

I developed a detailed background story to keep the children motivated throughout the whole session and to cover each condition as experimenter-guided pretend play. For a detailed protocol of the set-up and procedure, please refer to Appendix A. The cover story of the manipulation phase included a ‘garden pond’ (an oval blanket) inhabited by nine colored frogs, sitting in trios on three lily pads at the pond’s edge (Figure 2). Each frog could be used - according to the condition - either as a normal toy by letting it hop up and down around the floor, or as a musical instrument (idiophone) by scraping its back with an additional stick. At the beginning of the first manipulation phase, the experimenter introduced one extra frog to the children and - as a warm-up task - asked each child to hold the frog by him or herself and copy the experimenter’s actions according to the condition. In order to pass the warm-up task each child had to voluntarily pick up the frog at least once and imitate the experimenter’s action. After the warm-up, the experimenter pretended that the nine frogs in the pond were still asleep and needed to be woken up either by a ‘morning song’ (Musical condition) or by some ‘morning exercise’ (Non-musical condition). After one round of demonstration, where the children only had to watch the experimenter, he invited the children to pick up a frog by themselves in order to help wake them up. Then, he again demonstrated the task sequence according to condition, this time asking the children to ‘do as I do’. During the next three minutes of semi-guided play the children voluntarily imitated the experimenter’s actions and copied his utterances.

In the Musical condition, children had the opportunity to follow the experimenter in walking around the pond while synchronizing their steps to the pulse of the music and singing a novel, but easy-to-learn song (see Appendix A for details of the song) along to pre-recorded guitar chords, using the frogs as instruments in synchrony with the song’s lyrics. Children usually picked up the lyrics quickly and spontaneously imitated the experimenter’s singing and dancing. In the Non-musical condition, the experimenter walked and crawled around the pond while letting the frogs jump in non-synchronized intervals with accompanying utterances. In this condition, too, the children spontaneously copied the experimenter’s utterances and imitated his actions properly. The whole group song/exercise was performed three times during one manipulation phase until all frog trios were ‘awake’.

Figure 2. Manipulation phase of study 1. During the Musical condition, two children danced around a pond (blue blanket) together with the experimenter, while singing a novel children’s song and playing percussion instruments (wooden bullfrogs) in time with their singing and additional background music. The Non-musical condition was based on the same setup, procedure and cover story. However, I omitted any musical features while moving around the pond: The frogs were introduced as simple toys and only spoken language was used for communication.

[Figure not included in this excerpt]

The experimenter followed basically the same script as listed in Appendix A during both conditions, such that the order and timing of events were identical during the whole manipulation phase. Furthermore, in order to constantly control the level of joint activity during play, I repeatedly integrated short tasks that required the joint action¹ of all three participants:

(1) picking up the frogs from one lily pad at exactly the same moment, (2) putting them back simultaneously, or (3) tapping the frogs together to represent three pretend ‘kisses’ once every round (see Appendix A for details). All of these joint action tasks were triggered by certain key words in the experimenter’s sung / spoken instructions and could only be accomplished in time if both children constantly paid attention to their play partner’s behavior and coordinated their actions accordingly. Finally, the experimenter’s verbal instructions during the Non-musical playing had exactly the same content and were as frequent as those instructions embedded in the lyrics of the song during the Musical playing. In order to sound natural during both conditions, the instructing phrases were highly repetitive and rhymed (to be applied as lyrics in the song of the Musical condition), but at the same time had a straightforward grammatical structure (to be applied as ‘ordinary’ verbal instructions in the Non-musical condition).

The only difference between the two conditions was that the musical manipulation phase included most of the universal features of music listed in the general introduction, which are - in this combination - not shared with speech and other forms of social interaction (Fitch, 2006a, Nettl, 2005). First, a periodic pulse underlying the children’s song functioned as a shared reference for the children to synchronize their body movements, i.e. (1) the scraping of the frogs with the sticks and (2) the footsteps while ‘dancing’ around the pond. For this reason I played the song at a tempo of 115 beats per minute, which is close to the spontaneous tempo of human locomotion (MacDougall and Moore, 2005). Second, the use of discrete pitches and a highly repetitive melodic structure allowed the children to reproduce the song easily and sing in chorus with the experimenter. Third, the discretization of time and pitch in music made the children’s actions and utterances more predictable and ritualized in the Musical condition, creating a joint performative context. Finally, the integration of music into the interactive game created an additional expressive mode that lies beyond the referential and propositional use of words in language. This ‘a-referential expressiveness of music’ (Fitch, 2006a) has been shown to effectively induce affect in the listener (Juslin and Sloboda, 2001, Juslin and Västfjäll, 2008).
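For readers who think in time intervals rather than tempo, the stated 115 beats per minute can be converted into the interval between successive beats, which is the period the children's steps and frog scrapes would have to match. The snippet below is a small illustrative conversion; the function names are hypothetical and are not part of the study materials.

```python
def inter_beat_interval(bpm: float) -> float:
    """Return the time between successive beats in seconds."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / bpm


def beat_onsets(bpm: float, n_beats: int) -> list[float]:
    """Return onset times (in seconds) of the first n_beats beats."""
    interval = inter_beat_interval(bpm)
    return [k * interval for k in range(n_beats)]


SONG_TEMPO_BPM = 115  # tempo stated in the text

interval_s = inter_beat_interval(SONG_TEMPO_BPM)
print(f"{interval_s * 1000:.0f} ms between beats")
print([round(t, 3) for t in beat_onsets(SONG_TEMPO_BPM, 4)])
```

At 115 beats per minute the pulse therefore recurs roughly every 522 ms.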

Figure 3. First dependent measure of study 1: spontaneous helping. I created a situation in which one child (Victim) had a sudden accident, which presented the other child (Responder) with the free choice of either helping actively, waiting or continuing to pursue his or her own play activity. Both children were instructed to carry tubes filled with colored marbles from a rack towards a novel apparatus, where each tube could be applied in an attractive game (Panel A). During the actual test, this apparatus served as a distracter, creating costs for the Responder, because he or she had to postpone his or her own goals in order to wait for and/or help the Victim. The accident happened when both children lifted their last tubes: The bottom lid of the Victim’s tube fell off such that the marbles in it spilled all over the floor. I only coded the Responder’s behavior between the moment he or she noticed the accident and the moment he or she continued using his or her tube with the distracter. I categorized this child’s response according to the effort he or she made to fix the Victim’s tube and/or collect the marbles (Panel B).

[Figure not included in this excerpt]


1 Use of Footnotes: I decided to use footnotes whenever a new term required definition for those readers who are not acquainted with the respective subject. My intention was to spare the expert such digressions in terminology. Thus he or she can ignore particular footnotes, causing minimal interruption to the flow of reading. Only if a term is of central importance for the dissertation have I placed its definition in the main text.

1 So-called small-scale societies are distinct cultural groups consisting of less than a few hundred people, usually dependent on hunting and gathering, pastoralism, or non-intensive farming. They do not have cities or complex economic and political systems. In contrast, large-scale societies consist of between thousands and millions of people often living in big cities, and depend on intensive agriculture and a complex political and economic infrastructure.

2 The Neolithic Revolution describes the prehistoric shift from hunting and gathering to agriculture and settlement. This transition was ultimately necessary for the rise of modern civilization by creating the foundation for the later process of industrialization and sustained economic growth. Archaeological data indicate that various forms of domestication of plants and animals arose independently in six separate locales worldwide ca. 10,000-7,000 years ago.

1 In anthropology, a ritual is defined as a formalized, predetermined set of symbolic actions generally performed in a particular environment at a regular, recurring interval. The set of actions that comprise a ritual often include recitations, singing, group processions, repetitive dance, manipulation of sacred objects, etc. The general purpose of rituals is to express some fundamental truth or meaning, evoke spiritual, numinous emotional responses from participants, or engage a group of people in unified action to strengthen their communal bonds.

1 According to Izard (1971), a definition of emotion should take into account the following three components: a) the subjective experience or conscious feeling of emotion, b) the processes that occur in the brain and nervous system, and c) the observable expressive patterns of emotion. With relevance to the present work, Darwin (1872) saw the adaptive function of emotional expressions (component c) in communicating the organism’s psychological state and behavioral intentions (component a), which becomes particularly relevant in social species, ultimately promoting social bonding or solution of conflicts, and consequently group activities such as cooperative breeding, hunting or food sharing.

1 In psychology, enculturation is defined as the process through which a child learns the accepted behaviors, norms, and values of the culture by which he or she is surrounded, mainly through experience and observation.

1 Cultural evolution is the structural development (change) of a society over time. In this sense, it is the cultural equivalent of genetic evolution, though the mechanisms invoked are different: If we define culture as the sum of traditions and information that vary among groups, the transmission of these differences across generations rests on social interactions (imprinting, imitation, learning, or teaching) that can permanently change the phenotype. Culture therefore consists of non-genetic, heritable differences among populations and requires overlapping generations that allow intergenerational transmission of phenotypic traits. For a critical discussion see Mesoudi et al. (2006).

1 Not all current aspects of human cognition, behavior, or morphology are the result of genetic adaptation. Spandrels (also called evolutionary by-products) are characteristics that did not solve any recurring problem and have not been shaped by natural selection, but are a consequence of being associated with some adaptation (Buss, 2008). The navel would be an example of a morphological by-product in humans as in all mammals.

1 The Hominini (Engl. hominins) is a tribe of Homininae that comprises one species of the genus Homo (Humans), and two species of the genus Pan (the Common Chimpanzee and the Bonobo), their ancestors, and all extinct lineages of their common ancestor, who lived approximately 5 to 6 million years ago.

2 The Aurignacian is an archaeological culture of the Upper Paleolithic, located in Europe and southwest Asia. It lasted from approximately 45,000 to 35,000 years ago. The Aurignacian is also known for the oldest known examples of sophisticated stone needles, harpoons, jewelry, figurative art and cave paintings.

1 This bias has been dubbed the Hawthorne effect, defined as a form of reactivity whereby subjects improve or modify an aspect of their behavior being experimentally measured simply in response to the fact that they are being studied, not in response to any particular experimental manipulation.

1 Gilbert (1989) argued that human collective actions rest on a special kind of interpersonal commitment, what she called a joint commitment - a single pledge to whose creation each participant makes a contribution. As a result of this commitment all members of the party gain certain obligations and rights: No one should suddenly break off from the joint activity without checking with the others, and everyone can complain about the other’s sudden interruption.

2 Group cohesion is the force bringing group members closer together. It has an emotional and a task-related dimension. Emotional cohesiveness is derived from the connection that members feel to other group members and to their group as a whole. Task-cohesiveness refers to the degree to which group members share group goals and work together to meet these goals (Cartwright, 1968).

1 In sociology, a convention can refer to any kind of social rule that is commonly adhered to and accepted by the members of a particular society. These rules are not necessarily written in law or otherwise formalized; instead, they are socially constructed and inherited.

1 In behavioral ecology, a display is a genetically fixed and thus rather reflexive behavior, evoked by particular stimuli or emotional states, that in some way attracts attention and affects the behavior of others. Examples can be found in the context of territory defense, courtship and food competition.

2 Displays should be distinguished from intentionally created, communicative signals, which are learned usually via ritualization and (in humans) imitation and are thus flexible in both form and purpose, being used strategically for particular social goals.

3 Infant-directed speech, or ‘motherese’ is a nonstandard form of speech, often unconsciously used by adults when talking to infants and young children. It is usually delivered with a ‘cooing’ pattern of inflection which is different from normal adult speech: high in pitch and with many exaggerated, glissando-like rises and falls.

1 Sebanz et al. (2006) defined joint action as ‘any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment. [They] propose that successful joint action depends on the abilities (1) to share representations, (2) to predict actions, and (3) to integrate predicted effects of own and others’ actions.’

Humboldt-University of Berlin  (Institut für Theoretische Biologie)
Music, Evolution, Children, Social, Development, Psychology, Biology, Culture, Art, Human, Behavior
Quote paper
Sebastian Kirschner (Author), 2011, The Social Origins of Music, Munich, GRIN Verlag,

