Carry-over or prediction? Investigating the predictive coding model using an auditory listening task

Bachelor Thesis, 2018

22 Pages, Grade: 1,0



Researchers have proposed that fluent perception fundamentally relies on top-down predictions that arise before the actual external stimulus appears. These predictions lead to very specific modulations of our perceptual units, which facilitate perception. The theory behind this framework, predictive coding, has attracted increasing research interest and could provide a better understanding of how we cope with perceiving a complex environment. This study focuses on the auditory domain. A recent study by Demarchi et al. (2018) found evidence supporting the predictive coding framework. By analyzing MEG data, they showed that predictions are so sharply tuned that they contain specific tonotopic information about an upcoming tone. Because they trained a classifier on pre-stimulus data to decode post-stimulus data, their results are confounded with a carry-over effect (activity still present from previous stimuli). The purpose of the present study is to support their findings and to rule out the carry-over effect as the only explanation for them. We therefore conducted a follow-up experiment with a modified paradigm that included conditions with fixed and random stimulus omissions. Since no prediction activity should be found when the omission is fixed, a higher mean decoding accuracy in the random omission condition would directly point to a tone-specific prediction. Our MEG experiment yields exactly this result and thereby provides further evidence for the findings of Demarchi et al. (2018).

1. Introduction

The prediction of future events is one of the most interesting abilities of our brain. Our “predictive brain” helps us to “look into the relevant future” (Bar, 2007, 2011; Bubic, 2010). The mechanism our brain uses to create these predictions has been the subject of many studies (Huang & Rao, 2011). Several authors claim that, instead of passively waiting to be activated by external stimulation, our brain proactively generates predictions and an internal model of our environment by constantly using information about current and past events (Arnal & Giraud, 2012; Bar, 2007, 2011; Kveraga, Ghuman, & Bar, 2007). Generating these predictions is a great challenge for our brain, since natural signals can be highly redundant (Huang & Rao, 2011). For example, in natural images the intensities of neighboring pixels are likely to be correlated (Field, 1987; Huang & Rao, 2011). In an analogous manner, pixel intensities are also likely to be correlated over time because objects persist (Dong & Atick, 1995; Ruderman & Bialek, 1994). It would therefore be very inefficient for our brain to represent a raw image directly by activating multiple receptors. Accordingly, it has been proposed that the function of early sensory processing is to reduce the redundancy of natural input and to recode the incoming sensory information into a more efficient form (Huang & Rao, 2011; Marsh & Campbell, 2016). To understand these mechanisms, researchers have developed a model referred to as predictive coding (Ekman, Kok, & De Lange, 2017; Huang & Rao, 2011). Predictive coding proposes that neural networks learn the statistical regularities of our environment, remove the predictable input, and transmit only the information that was not predicted (Dürschmid et al., 2016; Wacongne, Changeux, & Dehaene, 2012).

Most research on the mechanism of predictive coding in the sensory system has been done in the visual domain, and there is convincing evidence that the model describes visual processing in the brain well (Huang & Rao, 2011). For example, Rauss et al. (2011) state that early V1 activity in humans is strongly modulated by top-down cognitive control. Less research has been carried out to date on the auditory domain of sensory processing (Huang & Rao, 2011). Our auditory system often deals with sequential sensory input (Bendixen et al., 2012). Hence, similar to visual processing, it benefits from predictions that facilitate sequential processing. According to the predictive coding model, if our brain expects a specific auditory input, it creates a neural prediction of this input before it is actually heard (Iria SanMiguel, Saupe, & Schröger, 2013; Wacongne et al., 2011). Regarding the neural code of the prediction, two assumptions are proposed in the literature. First, the better a stimulus can be predicted, the more closely the top-down modulated neural activity of the prediction matches the actual neural activity once the stimulus is perceived (Bendixen, Schroger, & Winkler, 2009; Bendixen et al., 2012). Second, if a stimulus is predicted by our brain but the actually perceived input differs, there should be a detectable mismatch in the neural activity (Arnal & Giraud, 2012; Iria SanMiguel et al., 2013). In other words, predicted input should produce weaker neural activation than unexpected input. Haenschel et al. (2005) provide evidence for the first assumption. They repeatedly presented the same (standard) sound to participants, so the brain's confidence in expecting the standard sound should increase over time. Using EEG, they found an increasing positivity over frontocentral electrodes that correlated with the number of repetitions.
More evidence was found, for example, by Vuust et al. (2009). Moreover, Smith and Lewicki (2011) found a striking similarity between representations learned from natural sounds and the impulse response of nerves. One way of testing the second assumption is to use regularity-violating stimuli (Bendixen et al., 2012). The easiest way is to present one standard tone repeatedly. Once the neural system has learned this regularity, it expects the standard tone to be heard. If the standard tone is occasionally replaced by an unexpected, deviant tone, there should be a mismatch between the prediction and the neural activity evoked by the actual input. This difference can be found in EEG data. Several studies show that a deviant sound normally produces a mismatch negativity (MMN) in the event-related potential (ERP) (Kujala, Tervaniemi, & Schröger, 2007; Näätänen, 1990), which is widely accepted as an indicator of auditory prediction (Alho, 1995; Cowan, Winkler, Teder, & Näätänen, 1993; Duncan et al., 2009; R. Näätänen, Paavilainen, Rinne, & Alho, 2007). Studies using a repetition suppression design found further evidence (Baldeweg, 2006; Todorovic & de Lange, 2012). However, it is not entirely certain whether repetition suppression reflects expectation alone or is also confounded with adaptation (Grill-Spector, Henson, & Martin, 2006).

An interesting study was performed by SanMiguel et al. (2013). They investigated the neural code of sensory prediction in auditory listening tasks by using omission trials. They hypothesized that if the specific characteristics of an upcoming stimulus are unknown to our top-down modulation, the sensory system should not be able to generate any prediction regarding the stimulus, because it lacks the means to represent it. Their results supported this hypothesis. However, comparing ERPs to detect indicators of prior prediction activity is an indirect way to investigate predictions (Hansen & Pearce, 2014; Willems, Frank, Nijhof, Hagoort, & Van Den Bosch, 2016). Still, the approach using omission trials is of great interest. When a highly expected stimulus is omitted, participants ‘hear’ the absence. Previous studies have found that in such circumstances a time-locked neural response to the omitted sound can be observed (Yabe, Tervaniemi, Reinikainen, & Näätänen, 1997). Using these omission responses is a good way to study top-down predictions without the interference of bottom-up activity (Yabe et al., 1997). In other words, the neural activity in these short silent time windows cannot be explained by feedforward propagation caused by the presence of a stimulus (Demarchi et al., 2018).

Keeping the above theoretical considerations in mind, Demarchi et al. (2018) investigated whether auditory predictions carry specific information about the upcoming auditory event. In contrast to Todorovic and de Lange (2012), they did not use a repetition design but varied the regularity of a sound sequence consisting of four different carrier frequencies by modulating its entropy level, using a parameter described by Nastase et al. (2014). Four conditions were produced, ranging from high entropy (low predictability of the upcoming sound) to low entropy (high predictability of the upcoming sound), all including omission trials. By training classifiers on magnetoencephalography (MEG) data at different time points, they decoded the frequency of the tone that was heard. The most interesting result, and the starting point of this follow-up study, was that a classifier trained on unexpected omission trials could be used to decode the carrier frequency of tones if entropy was low. More precisely, a classifier trained on pre-omission data and tested on post-stimulus data was able to decode above chance. In other words, a classifier trained on pre-stimulus omission data may be able to predict post-stimulus sound data. As mentioned, omission trials allow top-down prediction activity to be decoupled from bottom-up activity (Yabe et al., 1997). If the classifier is capable of this, it means that the prediction activity of the brain measured during the omission trial contains enough information for the classifier to decode the frequency of a tone. Since pre-omission activity is mostly top-down activity, this would mean that the neural code of the prediction is similar to the activity actually produced by sensory input, which fits the predictive coding model. The study of Demarchi et al. (2018) has a vital advantage over previous studies using the omission response paradigm.
Most results on prediction in this field are confounded with prediction error activity (I. SanMiguel, Widmann, Bendixen, Trujillo-Barreto, & Schroger, 2013). Because these studies use post-stimulus or post-omission data, the brain is already surprised when a predicted stimulus does not occur. Hence, for the omission response, they cannot differentiate between prediction and prediction error. This is called a backward-looking approach, since the prediction cannot be analyzed directly and is only detected through indicators after the fact. Since Demarchi et al. (2018) used pre-stimulus data to predict post-stimulus data, they accessed the prediction directly. Furthermore, they elegantly bypassed the problem of the prediction error confounding the prediction-related data, because a prediction can only be violated after the omission of a predicted stimulus and not before. This experimental setup is called a forward-looking approach (Willems et al., 2016).

However, the above study has one crucial problem in its design, which motivates the present study. Since the omissions were always unpredictable for the brain, it cannot be ruled out that the findings are due only to a carry-over effect. By adjusting the design, this study attempts to replicate the above findings and to determine whether prediction activity can be found on top of the carry-over effect. To test this, we created four conditions that vary in entropy level (high and low) and in the predictability of the omission (random and fixed). For the low entropy conditions, if the brain can predict the absence of a tone (i.e., an omission), there should not be any sound-specific prediction activity (Bendixen et al., 2012; SanMiguel et al., 2013), whereas if the omission is not predictable, we expect sound-specific prediction activity. Hence, we hypothesize that in a condition with low entropy and random omissions, omission-to-sound decoding accuracy is higher than chance (H1). Nevertheless, we do not rule out the possibility that decoding in a condition with low entropy and fixed omissions is slightly above chance. As Demarchi et al. (2018) state, this can be caused by a carry-over effect and does not reflect prediction. To clarify whether a neural prediction is present on top of a carry-over effect, we need to compare the two low entropy conditions with each other. As mentioned, if the omission is predictable, no sound-specific prediction is expected (Bendixen et al., 2009; Bendixen et al., 2012; SanMiguel et al., 2013), meaning that the information content of the brain activity should be lower. This leads us to the main hypothesis: we hypothesize that omission-to-sound decoding is better in a condition with low entropy and low omission predictability than in a condition with low entropy and high omission predictability (H2). If entropy is high, we do not expect good omission-to-sound decoding, regardless of whether the omission is predictable.

2. Materials and Methods

2.1 Participants

A total of 21 volunteers (12 females, 9 males, all right-handed) participated in the experiment. Two subjects (both female) were excluded due to outlying data. All subjects gave written informed consent and filled out a sociodemographic questionnaire before entering the MEG. At the time of the experiment, the remaining participants were between 19 and 37 years old (M = 23.32, SD = 4.16). No participant reported any previous neurological or psychiatric disorder. The experimental protocol was approved by the ethics committee of the University of Salzburg and was carried out in accordance with the Declaration of Helsinki.

2.2 Experimental design and data acquisition

We used a SPF4 - 22-design (Cochran & Cox, 1950), presented in Figure 1. All participants took part in each of the four conditions. The participants watched a silent nature documentary to stay alert while the auditory stimuli, to which they listened passively, were presented binaurally via in-ear headphones (SOUNDPixx, VPixx Technologies, Canada). The four sinusoidal tones ranged between 200 and 2000 Hz and were spaced logarithmically. Each tone lasted 100 ms (5 ms linear fade in / out), with 333 ms between two sounds. For exact timing of the stimuli, the Psychophysics Toolbox (PTB-3) running on Matlab was used (Brainard, 1997; Kleiner et al., 2007). Overall, the participants were exposed to eight blocks of 2000 stimuli each, undergoing every condition twice. In every block, 10% of the sounds were omitted, yielding 200 omission trials (50 per sound frequency). An omission was never followed by another omission. The number of tones per frequency was balanced within the blocks. To modulate entropy between conditions, different transition matrices (Nastase, Iacovella, & Hasson, 2014) were used (see Figure 1). In the random-sound-random-omission condition and in the random-sound-fixed-omission condition, all four tones had the same probability, so the upcoming tone is not predictable (high entropy). In the ordered-sound-random-omission condition and the ordered-sound-fixed-omission condition, entropy is low since the upcoming sound is predictable. In the random-sound-random-omission and ordered-sound-random-omission conditions the omissions are not predictable, whereas in the random-sound-fixed-omission and ordered-sound-fixed-omission conditions they are.
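The entropy manipulation and the two omission schedules described above can be illustrated with a short Python sketch. It is not the stimulus code of this study (which ran in Matlab with the Psychophysics Toolbox), and several details are assumptions made only for illustration: the ordered transition matrix with a probability of 0.75 for the next tone in a fixed cycle, the placement of fixed omissions at every tenth position, and all function names. The matrices actually used are those of Figure 1, and the balancing of tones per frequency is not enforced here.

```python
import numpy as np

N_TONES = 4  # four carrier frequencies, as in the design above


def random_matrix():
    """High entropy: every tone is equally likely after every tone."""
    return np.full((N_TONES, N_TONES), 1.0 / N_TONES)


def ordered_matrix(p_next=0.75):
    """Low entropy (hypothetical values): tone i is mostly followed by i + 1."""
    m = np.eye(N_TONES) * (1.0 - p_next)
    for i in range(N_TONES):
        m[i, (i + 1) % N_TONES] = p_next
    return m


def mean_row_entropy(matrix):
    """Average Shannon entropy (bits) of the next-tone distribution."""
    safe = np.where(matrix > 0, matrix, 1.0)  # log2(1) = 0 for zero entries
    return float(-(matrix * np.log2(safe)).sum(axis=1).mean())


def markov_sequence(matrix, n, rng):
    """Draw a tone sequence of length n from the transition matrix."""
    seq = np.empty(n, dtype=int)
    seq[0] = rng.integers(N_TONES)
    for i in range(1, n):
        seq[i] = rng.choice(N_TONES, p=matrix[seq[i - 1]])
    return seq


def omission_mask(n, rate, fixed, rng):
    """Boolean mask of omitted positions; no two omissions are adjacent."""
    n_omit = int(round(n * rate))
    mask = np.zeros(n, dtype=bool)
    if fixed:
        step = n // n_omit  # assumption: one omission every `step` stimuli
        mask[step - 1::step] = True
        return mask
    # Random schedule: draw distinct slots, then shift the k-th slot by k so
    # that neighbouring omissions are at least two positions apart.
    picks = np.sort(rng.choice(n - n_omit + 1, size=n_omit, replace=False))
    mask[picks + np.arange(n_omit)] = True
    return mask


rng = np.random.default_rng(0)
for name, matrix in [("random", random_matrix()), ("ordered", ordered_matrix())]:
    tones = markov_sequence(matrix, 2000, rng)
    omitted = omission_mask(2000, 0.10, fixed=False, rng=rng)
    print(name, "entropy (bits):", round(mean_row_entropy(matrix), 3),
          "omissions:", int(omitted.sum()))
```

With four equally likely tones the entropy is 2 bits, the maximum for this design; any matrix that concentrates probability on one successor, such as the hypothetical ordered matrix here, yields a lower value.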

Figure not included in this excerpt

Figure 1. Experimental design: ordered-sound-random-omission condition (OSRO), ordered-sound-fixed-omission condition (OSFO), random-sound-random-omission condition (RSRO), random-sound-fixed-omission condition (RSFO)

The positions of five head position indicator (HPI) coils, anatomical landmarks (nasion, preauricular points) and around 300 head shape points on the scalp were sampled with a Polhemus FASTRAK digitizer before the participant entered the MEG cabin. We used a whole-head MEG (Elekta Neuromag Triux, Elekta Oy, Finland) in a standard passive magnetically shielded room to record the magnetic signal at 1000 Hz (hardware filters: 0.1 to 330 Hz). Signals were recorded with 102 magnetometers and 204 orthogonally placed planar gradiometers at 102 different positions. Two bipolar electrooculographic (EOG) channels and one bipolar electrocardiographic (ECG) channel, sampled at 1 kHz, were attached to the subject. Before the start of the experiment, hearing thresholds were determined for all subjects using the 2000 Hz sound and a Bayesian adaptive method (Kontsevich & Tyler, 1999; Sanchez, Lecaignard, Otman, Maby, & Mattout, 2016). The toolbox used was the variational Bayesian approach (VBA) toolbox running on Matlab (Daunizeau, Adam, & Rigoux, 2014). The 2000 Hz sound was presented 40 dB above threshold, and the remaining three sounds were amplified accordingly.
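As an illustration of the stimulus parameters given in this section, the following Python sketch synthesizes one tone with a 5 ms linear fade in and out and applies a gain in decibels. It is only a sketch under stated assumptions: the audio sample rate of 44100 Hz, the inclusion of both 200 Hz and 2000 Hz as endpoints of the logarithmic spacing, and the names `make_tone` and `FREQS_HZ` are not taken from the experiment, which generated its stimuli in Matlab.

```python
import numpy as np

# Four logarithmically spaced carrier frequencies between 200 and 2000 Hz
# (assumption: both endpoints are part of the set).
FREQS_HZ = np.geomspace(200.0, 2000.0, 4)


def make_tone(freq_hz, sample_rate=44100, duration_s=0.100,
              fade_s=0.005, gain_db=0.0):
    """Return a sinusoid with linear fade in / out, scaled by gain_db."""
    n = int(round(duration_s * sample_rate))
    t = np.arange(n) / sample_rate
    tone = np.sin(2.0 * np.pi * freq_hz * t)

    # Linear ramps at both ends avoid clicks at tone onset and offset.
    n_fade = int(round(fade_s * sample_rate))
    envelope = np.ones(n)
    ramp = np.linspace(0.0, 1.0, n_fade)
    envelope[:n_fade] = ramp
    envelope[-n_fade:] = ramp[::-1]

    # A level change of gain_db corresponds to an amplitude factor of
    # 10 ** (gain_db / 20); +40 dB is therefore a factor of 100.
    return tone * envelope * 10.0 ** (gain_db / 20.0)


# Amplitudes here are relative to a hypothetical threshold amplitude of 1.
tones = [make_tone(f, gain_db=40.0) for f in FREQS_HZ]
print(np.round(FREQS_HZ, 1), len(tones[0]))
```

The gain is expressed relative to an arbitrary reference amplitude; in the experiment the reference was each participant's measured hearing threshold.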

2.3 Analysis

The general data analysis was done using MNE (Gramfort et al., 2014), an open source Python software for analyzing and visualizing MEG and EEG data. To reduce external noise (mainly 16.6 Hz and 50 Hz plus harmonics) and to standardize the head position in the MEG with respect to head movement, the Maxfilter program was used. It applies the signal space separation (SSS) method (Taulu, Kajola, & Simola, 2004; Taulu, Simola, & Kajola, 2005). In addition, eye-blink, EOG and ECG artifacts were detected and removed (separately for magnetometers and gradiometers) using signal-space projection (SSP) (Uusitalo & Ilmoniemi, 1997). A windowed-sinc finite impulse response (FIR) filter (Widmann & Schröger, 2012) with a passband from 0.1 Hz to 20 Hz was applied to the continuous data. The data were then segmented from 200 ms before to 400 ms after stimulus onset and down-sampled to 50 Hz. For the export of the data to an Excel file, self-repetitions were excluded from the data of all four conditions. The classifier pipeline consisted of a standardizer (scaling all channels to zero mean and unit variance) and a linear support vector machine (SVM) classifier (Suykens & Vandewalle, 1999). Scikit-learn (Pedregosa et al., 2012), an open source machine learning software for Python, was used to train the SVM classifier on omission data and to test it on sound data. The omission trials were labeled according to the carrier frequency of the omitted sound. Classification accuracy was averaged for each participant and each condition. The average was used to depict the classifier's capability to decode over time (i.e., time-generalization analysis at sensor level; for details see Demarchi et al., 2018). To test our expectation that pre-stimulus data contain predictions referring to post-stimulus input, we set a time window of interest.
We therefore extracted the decoding accuracy of the classifier trained between 100 ms before stimulus onset and stimulus onset and tested between 80 ms and [...]. The hypotheses were tested using SPSS (IBM). An exploratory data analysis was performed, and two outlying subjects (4 and 15) were excluded from the data, as presented in Figure 2.
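The omission-to-sound decoding described in this section can be sketched in Python with scikit-learn as follows. This is a minimal illustration on simulated data, not the analysis code of this study. The simulated "MEG" arrays, the channel count, the helper names (`time_generalization`, `window_mean`, `simulate`) and in particular the upper bound of the test window (0.20 s) are placeholders chosen for the example; the text above does not state that bound.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def time_generalization(X_train, y_train, X_test, y_test):
    """Train at every time point of X_train, test at every time point of X_test.

    Arrays have shape (n_trials, n_channels, n_times). Returns accuracies
    of shape (n_train_times, n_test_times).
    """
    scores = np.zeros((X_train.shape[2], X_test.shape[2]))
    for i in range(X_train.shape[2]):
        # Standardizer followed by a linear SVM, as in the pipeline above.
        clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        clf.fit(X_train[:, :, i], y_train)
        for j in range(X_test.shape[2]):
            scores[i, j] = clf.score(X_test[:, :, j], y_test)
    return scores


def window_mean(scores, times, train_window, test_window):
    """Average accuracy inside a train-time by test-time window (seconds)."""
    eps = 1e-9  # tolerance for floating-point time stamps
    tr = (times >= train_window[0] - eps) & (times <= train_window[1] + eps)
    te = (times >= test_window[0] - eps) & (times <= test_window[1] + eps)
    return float(scores[np.ix_(tr, te)].mean())


# --- Simulated data (assumption): 20 channels, epochs from -0.2 to 0.4 s at 50 Hz
rng = np.random.default_rng(0)
times = np.linspace(-0.2, 0.4, 31)
n_channels, n_classes = 20, 4
patterns = rng.normal(size=(n_classes, n_channels))  # one pattern per tone


def simulate(n_trials, active):
    """Noise plus a tone-specific pattern at the time points marked active."""
    y = rng.integers(n_classes, size=n_trials)
    X = rng.normal(size=(n_trials, n_channels, times.size))
    X[:, :, active] += patterns[y][:, :, None]
    return X, y


# Omission trials carry the pattern before onset, sound trials after 80 ms.
X_om, y_om = simulate(200, (times >= -0.1 - 1e-9) & (times <= 1e-9))
X_snd, y_snd = simulate(400, times >= 0.08 - 1e-9)

scores = time_generalization(X_om, y_om, X_snd, y_snd)
acc = window_mean(scores, times, train_window=(-0.1, 0.0),
                  test_window=(0.08, 0.20))  # upper bound is a placeholder
print("window accuracy:", round(acc, 3), "chance:", 1.0 / n_classes)
```

With four tone frequencies, chance level is 25%. One window average per participant and condition, computed in this way, is the kind of value that would then enter the statistical comparison between conditions.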

Figure not included in this excerpt

Figure 2. Exploratory data analysis


University of Salzburg
Predictive Coding, Auditory, MEG, Python
Quote paper
Nicolas Neef (Author), 2018, Carry-over or prediction? Investigating the predictive coding model using an auditory listening task, Munich, GRIN Verlag,

