
## Contents

1 Introduction

2 Experimental results

2.1 Phase locking

2.2 Reliability with fluctuating stimulus

2.3 Time precision in the LGN

3 Temporal encoding

3.1 Time encoding window

3.2 Stimulus reconstruction

3.3 Distinction between coding strategies

4 Information Theory

4.1 General Principles

4.2 Applying the direct method

4.3 Quantifying time resolution

4.4 Temporal patterns

5 Discussion

5.1 Outlook

A Appendix

A.1 Calculation of the reconstruction filter

A.2 Encoding distinction formula

B Bibliography

## Abstract

The nervous system must encode and process information using the action potentials of neurones. There is considerable controversy as to how this is achieved. Most neuroscientists assume that the rate of discharge of action potentials is all that carries information; however, more recent work suggests that the precise timing of the discharge events could also carry information. The essay will address the theoretical basis for these two contrasting views of neural encoding. It may also address how experimental data can be used to address this problem, since the analysis and interpretation of such data is far from straightforward.

## 1 Introduction

One of the most fundamental questions of neuroscience is how our brain encodes information. This paper aims firstly at clarifying the notions of rate and time coding and secondly at explaining how information theory can quantify how much information each coding strategy carries. This quantification will show the relevance of time coding strategies. Furthermore, we will identify the significant questions with respect to time coding.

Perception, awareness and behavioural output are all represented in the spike trains of neurons. Hence, understanding the neural code in spike trains is one of the key problems. Moreover, the quality of the neural code can reveal the degree to which biological systems are optimised. Here, we will concentrate on the more restricted question of the nature of the neural code. What kind of complex features are in the spike train which go beyond the mean firing rate? What is their role?

We will start with a short discussion of the concepts of rate and time coding, showing their equivalence according to the conventional definition. In the second part, we will review experiments indicating the importance of precise timing. However, using the stimulus reconstruction analysis technique, we will identify a time encoding which goes beyond precise timing. In the last part, we will use information theory to quantify the coding efficiency with respect to the time resolution. Additionally, we will see that a certain part of the information is in temporal patterns.

Conventionally, the rate of a spike train is defined by counting the spikes in a certain time window. Admittedly, the choice of the time window is arbitrary and hence, by choosing small time windows, the measure attains a high time resolution. In general, a stimulus s(t) causes a response, the spike train. The spike train is described accurately by the arrival times of the spikes, t1, t2, . . ., tn or in short {ti}. Using the language of probability, a complete characterization of the neural response is contained in the conditional probability distribution P [{ti}|s(t)] which measures the likelihood that spikes will arrive at the set of times {t1, t2, . . . , tn} given a certain stimulus s(t). Clearly, we can use this as a characterization of a time code in a reasonable sense. But this conditional distribution is closely related to time dependent rate coding. More precisely, we define the counting function

illustration not visible in this excerpt

where

illustration not visible in this excerpt

so that f (x) counts whether a particular spike of the spike train {ti} has occurred in the time bin Δt centered on time t. The time dependent rate is defined by dividing the counting function by the width of the time bin Δt, averaging n(t), and then taking the limit Δt → 0.

illustration not visible in this excerpt

This is a Dirac delta function. Hence,

illustration not visible in this excerpt

For the last step, we have to account for the randomness of spike trains. This randomness is described by a probability distribution, which is the conditional probability distribution P [{ti}|s(τ )] in this case. We have to integrate over all possible arrival times in order to average over arrival times. Additionally, we sum over all possible numbers of events N.

The rate is the mean of the conditional distribution which describes arrival times. Therefore the rate is time dependent and can track changes in stimulus parameters with high time precision. However, higher moments of the distribution may carry additional information. Later, we will show how to quantify this extra amount of information. But we will begin with experimental evidence of ‘time coding’ and continue with a proper distinction between time encoding and time dependent rate coding.
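As a rough illustration of the definition above, the time dependent rate can be estimated from repeated trials by counting spikes per bin, averaging over trials and dividing by the bin width. The following Python sketch assumes spike times given in seconds; the function name and the example data are hypothetical and are not taken from the experiments discussed in this essay.

```python
def time_dependent_rate(trials, duration, dt):
    """Estimate r(t) in spikes/s from trials (each a list of spike times in s)."""
    n_bins = int(round(duration / dt))
    counts = [0] * n_bins
    for spike_times in trials:
        for t in spike_times:
            k = int(t // dt)          # index of the bin containing this spike
            if 0 <= k < n_bins:
                counts[k] += 1
    # average over trials, then divide by the bin width
    return [c / (len(trials) * dt) for c in counts]


# Hypothetical data: three trials of 50 ms, binned at 10 ms
trials = [[0.012, 0.031], [0.011, 0.048], [0.013]]
rate = time_dependent_rate(trials, duration=0.05, dt=0.01)
```

With a finite bin width this is only an approximation of the limit taken in the text; smaller bins give a higher time resolution but need more trials for a stable average.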

## 2 Experimental results

In this chapter we will present evidence for the importance of precise timing.

### 2.1 Phase locking

Figure 1: Phase locking. The neuron fires with highest probability at high stimulus amplitude. Thus, interspike intervals are often integer multiples of the stimulus period, as illustrated in the picture. At high frequencies phase locking becomes ambiguous. Source: Darwin, 1994

illustration not visible in this excerpt

Phase locking is a typical example of a time dependent firing rate. The requirement is that the stimulus is periodic in time, for example a sinusoidal pressure wave at the eardrum (see figure 1), which is a pure tone. Each auditory neuron has a characteristic frequency, at which the least energy is needed to stimulate it. At the appropriate frequency the firing rate of the neuron changes with the phase of the stimulus. Hence the probability of firing peaks once per stimulus period, i.e. every 2π in phase. Naturally, this does not mean that the neuron fires in every period. In fact, the time dependent firing rate is

r(t) = r0 + A sin(ωt + φ)

where the case A > r0 is subject to half wave rectification. Although the mean firing rate (over large time windows) is constant until a certain sound intensity threshold is reached, the firing is phase locked and the frequency is encoded in the timing of the spikes. At low frequencies, the intensity of sound is the same at both ears and location is encoded only by time differences between auditory neurones. At high frequencies, the time differences become ambiguous but shadowing effects give information about the sound location. In this context it is worth noting that the mammalian ear encodes frequency to a certain extent by locus: specific small frequency ranges activate specific neurons by tectorial membrane deflection. Pitch is encoded in higher cortical regions. Evidence for this is that it is sufficient to present successive harmonics to different ears in order to reconstruct the fundamental (Houtsma and Goldstein, 1972). But if only high harmonics (of order > 12) are presented, coincident phases support reconstruction (Houtsma and Smurzynski, 1990). Hence the purpose of phase locking is not only to locate sounds by measuring phase differences; phase locking also contributes to the identification of pitch.
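The sinusoidal rate with half wave rectification can be sketched in a few lines of Python. This is only an illustration under assumed parameter values (the rates, amplitudes and the 100 Hz tone are hypothetical); it shows that the mean rate over one period equals r0 as long as A does not exceed r0, and rises above r0 once rectification sets in.

```python
import math


def phase_locked_rate(t, r0, amplitude, freq_hz, phase=0.0):
    """r(t) = r0 + A sin(2*pi*f*t + phase), clipped at zero (half wave rectification)."""
    r = r0 + amplitude * math.sin(2.0 * math.pi * freq_hz * t + phase)
    return max(r, 0.0)


def mean_rate(r0, amplitude, freq_hz, n_samples=1000):
    """Average of r(t) over one stimulus period, sampled at n_samples points."""
    period = 1.0 / freq_hz
    total = sum(phase_locked_rate(k * period / n_samples, r0, amplitude, freq_hz)
                for k in range(n_samples))
    return total / n_samples


unrectified = mean_rate(r0=50.0, amplitude=30.0, freq_hz=100.0)  # A < r0
rectified = mean_rate(r0=20.0, amplitude=50.0, freq_hz=100.0)    # A > r0
```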

### 2.2 Reliability with fluctuating stimulus

Figure 2: Time precision of firing pattern with constant (A) and fluctuating (B) stimulus. The constant stimulus, with little information to be encoded, evokes an irregular firing behaviour. The repeated fluctuating stimulus evokes the same response with high reliability. Top: superimposed responses, 10 trials. Bottom: raster plot of spike trains, 25 trials. Source: Mainen and Sejnowski, 1995

illustration not visible in this excerpt

Mainen and Sejnowski (1995) discovered strikingly divergent spiking behaviour in a cortical neuron. A constant current pulse (stimulus) evoked an irregular response with high variability. The neuron was stimulated with the same stimulus 25 times but showed significantly different spike train patterns (figure 2, A). However, when the neuron was stimulated repeatedly with a fluctuating current (Gaussian white noise, i.e. a continuous and uniform frequency spectrum), with the same stimulus applied in all trials, the patterns were highly reliable and precisely timed (figure 2, B). This experiment demonstrated that the behaviour of cortical neurons is consistent with time dependent coding strategies. In particular, the fluctuating stimulus is a more natural and realistic condition than the constant pulse. Consequently, at least in this case, neurons work with a higher reliability under natural conditions than under simple but unrealistic conditions.

### 2.3 Time precision in the LGN

Figure 3: Temporal coding in the LGN. Different unique stimuli yield a flat average count per time bin, which is represented in the PSTH. In contrast, the repeated stimulus displays prominent peaks. Source: Reinagel and Reid, 2000

illustration not visible in this excerpt

The lateral geniculate nucleus (LGN) of the thalamus is part of the visual pathway and forwards visual information to the cortical level (primary visual cortex, V1). For accurate perception, reliable transmission is expected. Reinagel and Reid (2000) analysed the statistical features of LGN spike trains.

Spatially uniform visual stimuli with random time varying luminance were presented, while spike trains from 11 well isolated individual neurons in the LGN of anaesthetized cats were recorded. One white noise stimulus was repeated 128 times and the recordings were compared with recordings from unique stimuli. For this comparison, the average number of spikes per time bin is counted and plotted as the peristimulus time histogram (PSTH). Figure 3 shows the PSTH of unique stimuli at the top and the PSTH of the repeated stimulus below it, with the shape of the stimulus drawn underneath. Some peaks of the lower PSTH have a width of 1 ms, hence a given stimulus feature is encoded with high time precision.

We will return to the quantitative analysis of this experiment in chapter 4.

## 3 Temporal encoding

Let us now introduce an alternative and mathematically rigorous concept of a time code, called time encoding, which was developed by Theunissen and Miller (1995). First of all, we introduce the idea of the time encoding window. Based on this, we can distinguish between rate and time encoding by comparing the frequency spectra of signal and response (spike train). This distinction shows us intrinsic time scales of the neural system and alternative encoding and decoding mechanisms.

### 3.1 Time encoding window

The time encoding window is defined as the duration of a neuron’s spike train which corresponds to a single symbol in the neural code. A signal where the time scale of changes in the stimulus parameter is much longer than the time scale of the behavioral response can be regarded as nearly stationary. In order to increase acuity, the time encoding window should be as large as possible. In this scenario, the time scale of the behavioral response is the limiting factor. One example is the coding of the shape of a stationary object by the visual system.

In contrast, for a constantly moving visual stimulus the time scale of variation of the stimulus parameters can be shorter than the behavioral response or decision time. In this case, the size of the encoding time window is limited by the rate at which the relevant stimulus parameters are changing. In fact, the intrinsic time scale of neural computation is the limiting factor and the dynamic variation in the stimuli can only be encoded up to a certain rate. As an example, suppose a neuron is to encode a 100 Hz component of a signal. One period is then 10 ms, and a time encoding window of 5 ms is needed. From the sum of the counts in two neighbouring windows we can infer the amplitude. The phase can be measured by the ratio of the counts in these two neighbouring windows. The general requirement of at least two samples per period is known as the Nyquist theorem.
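The arithmetic of the 100 Hz example generalises directly: two encoding windows must fit into one stimulus period. A minimal Python sketch, with hypothetical function names, makes the relation explicit in both directions.

```python
def encoding_window(freq_hz):
    """Largest encoding window (in s) that still gives two windows per period."""
    return 1.0 / (2.0 * freq_hz)


def max_encodable_frequency(window_s):
    """Highest stimulus frequency (in Hz) resolvable with a given window (in s)."""
    return 1.0 / (2.0 * window_s)


window_100hz = encoding_window(100.0)          # 0.005 s, i.e. 5 ms
limit_1ms = max_encodable_frequency(0.001)     # 500 Hz for a 1 ms window
```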

For each spike, we can determine the preceding stimulus waveform. By averaging over many preceding stimulus waveforms, we construct by definition the reverse correlation function. The duration of the reverse correlation function corresponds to the integration window, which is defined as the time preceding a certain point in the response pattern in which a change in the signal significantly affects this response pattern. Physiological integration processes are responsible for this integration window. Taking the 100 Hz signal from above, the reverse correlation function extends over several periods of the signal, but the encoding time window is only half a period. We see that the encoding time window can be much shorter than the integration window. In fact, the integration time gives an upper bound on the time over which variations of the stimulus and the response function are correlated. The integration time can be regarded as the electrophysiological reaction time, and hence the encoding time window is always smaller than the integration time window.
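The reverse correlation function described above is commonly computed as a spike-triggered average. The following Python sketch assumes a stimulus sampled on the same time grid as the spike bins; the function name, the window length and the toy data are hypothetical.

```python
def spike_triggered_average(stimulus, spike_bins, window):
    """Average the `window` stimulus samples that precede each spike."""
    sta = [0.0] * window
    n_used = 0
    for k in spike_bins:
        if k >= window:                      # skip spikes without a full history
            segment = stimulus[k - window:k]
            for i, s in enumerate(segment):
                sta[i] += s
            n_used += 1
    if n_used == 0:
        raise ValueError("no spike has a complete preceding window")
    return [x / n_used for x in sta]


# Toy data: a ramp stimulus and spikes in bins 1, 3 and 5
stimulus = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
sta = spike_triggered_average(stimulus, spike_bins=[1, 3, 5], window=2)
```

The spike in bin 1 is ignored because fewer than two samples precede it; the remaining two segments, [1, 2] and [3, 4], average to [2, 3].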

For further analysis of the meaning of the encoding time window, we need the method of stimulus reconstruction which is a recently developed tool of stochastic system analysis.

### 3.2 Stimulus reconstruction

In general, the methods from stochastic systems analysis give a relationship between stimulus and response.

Natural stimuli have many different frequency components and each of them might be encoded in corresponding encoding time windows. The analysis gets very complex as each spike could be part of all the different encoding time windows. The conventional approach of stochastic system analysis is to construct an operator which transforms the signal into the response. Usually, the response is given in the form of the probability distribution of the spike train given the stimulus, P [{ti}|s(t)]. Of course, for this construction, we have to average over many trials. But the organism has only one single spike train as a sample and has to decide on this basis. We would like to know more about the predictive power of this single spike train. Furthermore, which part of the frequency spectrum of the spike train is relevant for reconstructing the stimulus? If we don’t know the encoding model, the reverse approach must be taken. This approach was developed by Bialek et al (1991). The aim is to transform the spike train into an optimal estimate of the stimulus. The idea is that we treat the neural system as a black box.

We write the estimate of the stimulus s_est in the linear form

illustration not visible in this excerpt

where h1(τ ) is the linear response function. More generally,

illustration not visible in this excerpt

and hn(τ ) associates the output (response) to the input (stimulus) in nth order. Optimising the estimate is done by calculating the minimum of the error function

illustration not visible in this excerpt

The solution of this problem is sketched in Appendix A.1.
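To make the first-order estimate concrete, the following Python sketch places a copy of a filter at every spike and sums the copies. It assumes that the linear form omitted above is s_est(t) = Σi h1(t − ti), which is the usual first-order form of this reconstruction method; that form, the toy filter, the `offset` convention and the function names are assumptions made for illustration, and the filter is not the optimised one of Appendix A.1.

```python
def reconstruct(spike_bins, kernel, offset, n_bins):
    """First-order estimate: add one copy of `kernel` per spike.

    kernel[j] is the filter value at lag (j - offset) bins relative to the
    spike, so a positive offset lets the filter extend before the spike.
    """
    s_est = [0.0] * n_bins
    for ti in spike_bins:
        for j, h in enumerate(kernel):
            k = ti + j - offset
            if 0 <= k < n_bins:
                s_est[k] += h
    return s_est


def mean_squared_error(s, s_est):
    """Error between a stimulus and its estimate (both sampled per bin)."""
    return sum((a - b) ** 2 for a, b in zip(s, s_est)) / len(s)


s_est = reconstruct([2, 5], kernel=[1.0, 2.0, 1.0], offset=1, n_bins=8)
error_vs_zero = mean_squared_error([0.0] * 8, s_est)
```

Optimising the filter then amounts to choosing the kernel values that minimise this error averaged over the recorded data.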

In particular, with this method we can analyze the encoding of each frequency component of the stimulus independently from all other frequency components and hence, solve the problem of the multiple overlapping of the encoding time windows. To see this we transform the optimal estimate we found in A.1 into the frequency spectrum (see A.2). In this description, we can quantify the contribution of each part of the spike train’s frequency spectrum for each frequency component of the signal.

### 3.3 Distinction between coding strategies

The distinction between rate and temporal encoding is based on the comparison between the frequency spectra of signal and response. From the discussion above, we know that each stimulus frequency needs a proper encoding time window. The size of this encoding time window is half of the corresponding period, e.g. 5 ms for a 100 Hz component.

If a certain frequency component of the signal is related only to the mean spike count in the corresponding encoding time window, we call this rate encoding. Using the stimulus reconstruction, rate encoding is defined as follows: each frequency component of the signal can only be composed of terms that involve the same or lower frequency components of the spike train response pattern. From the point of view of the encoding distinction formula (A.2) this means that the term containing the higher frequency components of the response pattern gives no significant contribution to the frequency component of the optimal stimulus estimate. Evidently, the number of spikes in each encoding time window does not have to depend linearly on the stimulus parameter, and hence in general we have a nonlinear rate encoding scheme.

In contrast, a temporal encoding scheme is one with additional correlation between a frequency component of the stimulus and patterns in the response function on a time scale shorter than the encoding time window. In this case, the second part of the distinction formula of the optimal estimate in A.2 gives a significant contribution to the description of the stimulus parameter. In other words, the time dimension of the spike train is really used for information encoding and does not only represent the time scale of stimulus changes. In general, not only information which is non-temporal in nature but also information which is temporal in nature can be encoded using the temporal dimension. The second possibility would imply a rescaling of time scales.

An interesting example of these encoding schemes is described by Lemon and Getz (2000). In the cockroach Periplaneta americana, olfactory information is represented by a short time scale rate code in olfactory sensory neurons. A relatively small number of projection neurons carry this information to higher brain areas. Surprisingly, spike train analysis suggests that these projection neurons carry a higher rate of information by using a temporal encoding scheme.

We emphasize that high temporal precision is not equal to temporal encoding. High temporal precision in the spike train can instead mean encoding of high frequency components of the stimulus. The phase locking example (2.1) is clearly a rate code, as temporal patterns on time scales shorter than the stimulus period do not occur. In contrast, the example with data from LGN neurons (2.3) has a high time resolution of up to 1 ms whereas the stimulus was updated only every 7.8 ms. Consequently, this is an instance of a time encoding scheme. Nevertheless, one should also take into account that sharp edges in luminance occur at the frame boundaries. These edges correspond to high frequency components in the frequency domain. In any case, we will see below that even temporal encoding cannot account for all information in this data.

Of course, following the stimulus changes with high time resolution reduces the size of the encoding time window. Given the intrinsic duration of an action potential (∼ 1 ms) and the absolute refractory period (∼ 1 ms), variations of temporal patterns inside such a small encoding time window are no longer noticeable.

In ensembles of cells spike train patterns are not limited by refractory periods and in fact, the number of possible patterns is much greater. Experiments conducted on behaving monkeys (Abeles, 1993) and on anesthetized cats (Engel, 1992) suggest temporal encoding in spike train patterns. Regardless of this most interesting topic, we continue our review with the analysis of the single neuron’s behaviour.

## 4 Information Theory

How can the vague notion of information be described in mathematical terms? Also, how can the information content of a specific spike train be evaluated? First of all, we will give a rough answer to these questions. Subsequently, we will introduce the concept of mutual information. We will use the direct method to estimate the information of the spike trains measured in the LGN (see 2.3). We will see how the coding efficiency depends on the time resolution. This will give us a quantitative estimate of the significance of precise timing. Moreover, we will also quantify the information coded in temporal patterns.

### 4.1 General Principles

The foundations of information theory were developed by Shannon in 1948. He derived a measure of uncertainty or entropy. This corresponds to information as information is, roughly speaking, a decrease in uncertainty.

Each spike train {ti} has a probability P [{ti}] of being observed. A neuron has a certain set of possible spike trains. When we measure a spike train, we gain information proportional to the ‘surprise’ of observing this particular spike train out of the set of all possible spike trains. Hence we expect entropy and information to be decreasing functions of P [{ti}]. Additionally, when recording two spike trains {ti1} and {ti2} from two independent neurons, we expect the gain of information to be additive. The probability of observing these two spike trains is P [{ti1}]P [{ti2}] and therefore the additivity condition gives for the information I

I(P [{ti1}]P [{ti2}]) = I(P [{ti1}]) + I(P [{ti2}])

But the logarithm is the function singled out by monotonicity and additivity. Hence these two conditions determine the information measure completely, up to an arbitrary constant factor and the base of the logarithm (which can be summarized in one constant). Conventionally, information is defined in units of bits

I(P [{ti}]) = −log2(P [{ti}])

and the entropy according to Shannon is this measure, averaged over all possible responses

illustration not visible in this excerpt

Figure 4: Binning. A binary string code is attributed to a spike train. This is one approach to make the neural response measurable. Source: Strong et al, 1998

illustration not visible in this excerpt
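The information of a single response and its average, the entropy, can be checked numerically. The Python sketch below uses the standard Shannon definitions, with the entropy taken as the probability-weighted average of −log2 P; the function names and the example probabilities are hypothetical.

```python
import math


def information_bits(p):
    """Information (surprise) in bits of an event with probability p."""
    return -math.log2(p)


def entropy_bits(probabilities):
    """Shannon entropy: information averaged over all possible responses."""
    return sum(p * information_bits(p) for p in probabilities if p > 0.0)


# Additivity for two independent responses with probabilities 1/4 and 1/8
joint = information_bits(0.25 * 0.125)
separate = information_bits(0.25) + information_bits(0.125)

# Four equally likely responses carry 2 bits; a certain response carries none
uniform_entropy = entropy_bits([0.25, 0.25, 0.25, 0.25])
certain_entropy = entropy_bits([1.0])
```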

In theory, the entropy, which measures the range of all possible spike trains, would give us the information capacity of a neuron. But we have already observed in experiments that a repeated stimulus does not lead to exactly the same response every time; in fact, there is a certain variability in spike trains. This variability is called noise and limits the information capacity of a neuron. Therefore, we can find the true information capacity Im by subtracting the entropy of the noise Hnoise from the full response entropy H

Im = H − Hnoise

The quantity Im is also called mutual information. The noise entropy is calculated by averaging over all possible responses at a given stimulus s

illustration not visible in this excerpt

For the practical computation of these quantities, we have to write down all measured spike trains. One method is to divide the time axis into small time bins of size Δτ . Whenever a spike occurs in a particular time bin, the value 1 is assigned to this bin. Otherwise the bin is labeled with the value 0 (see figure 4).
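The binning step can be sketched as follows in Python. Spike times and bin size are given in milliseconds; the function name and the example spike train are hypothetical.

```python
def binarize(spike_times, duration, dt):
    """Turn spike times into a 0/1 sequence with bins of width dt."""
    n_bins = int(round(duration / dt))
    letters = [0] * n_bins
    for t in spike_times:
        k = int(t // dt)
        if 0 <= k < n_bins:
            letters[k] = 1       # several spikes in one bin still give a 1
    return letters


# Three spikes within 6 ms, binned at 1 ms
letters = binarize([0.7, 3.1, 3.4], duration=6.0, dt=1.0)
```

Two of the spikes fall into the same bin and are merged, which is why the bin size is normally chosen smaller than the refractory period.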

This characterization of the entropy depends not only on the time resolution Δτ but also on the length T of the spike train being considered. The fundamental problem of finding the mutual information is the large amount of data needed to specify the relevant probability distributions. Recently, Strong et al (1998) explained how to achieve a good approximation of entropy and mutual information (see below).

### 4.2 Applying the direct method

Three general approaches can help to estimate the information in spike trains, giving us a lower bound, an upper bound and a direct estimate of the information.

The first approach was described by Bialek et al (1991) and is extensively discussed in Rieke et al (1997). They derived a lower bound of the information Rinfo which is based on the signal to noise ratio (SNR) of the stimulus estimate:

illustration not visible in this excerpt

This procedure relies on the stimulus reconstruction method described above, which gives us an optimal stimulus estimate. As the stimulus estimate is derived from the response, the stimulus estimate cannot contain more information about the stimulus than the response. This is the data processing inequality. Hence by calculating the information in the stimulus estimate we derive a lower bound for the information in the response. For the signal to noise ratio, we take s_est as the signal and n = s − s_est as the noise.

Second, when assuming that neuronal response and neuronal noise are independent and both Gaussian distributed, we can obtain an upper bound of the information. Here, the mean neuronal response is taken as the signal, and the deviations of each individual response from the mean are the noise. With these, we can use the formula above again. The information obtained is an upper bound because, for a given variance, a Gaussian distribution has the maximal possible entropy. In particular, the ratio between lower and upper bound quantifies the quality of the model we used to derive the optimal estimate. Moreover, we can identify those stimulus parameters (frequencies) which are encoded preferentially (using the Fourier transform of s_est, A.2). For derivations of the formulas see Rieke et al (1997) and Borst and Theunissen (1999).
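Both bounds use the same signal to noise formula, which is not reproduced in this excerpt. The Python sketch below assumes the standard Gaussian channel form, in which log2(1 + SNR(f)) is integrated over frequency; this form, the function name and the flat example spectrum are assumptions for illustration and should be checked against the cited derivations.

```python
import math


def information_rate_from_snr(snr, df):
    """Approximate the integral of log2(1 + SNR(f)) df by a sum over bins.

    snr: SNR values, one per frequency bin of width df (in Hz).
    Returns an information rate in bits/s.
    """
    return sum(math.log2(1.0 + x) for x in snr) * df


# Hypothetical spectra: SNR = 1 across a 100 Hz band, and no signal at all
flat_band = information_rate_from_snr([1.0] * 100, df=1.0)
no_signal = information_rate_from_snr([0.0] * 100, df=1.0)
```

Whether the result is a lower or an upper bound depends only on how signal and noise are defined, as described in the two paragraphs above.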

Nonetheless, here we will concentrate on the direct method as specified by Strong et al (1998), which measures the information directly. This approach is simpler as we don’t need to identify the relevant stimulus parameters and can use spike train statistics only. The direct method is more satisfying than deriving only bounds, as it estimates the information itself. On the other hand, it can be difficult to accumulate a sufficient amount of data.

Figure 5: Entropy rate against the reciprocal word length. At high word length, the entropy rate estimation breaks down due to insufficient data. The true entropy rate is obtained by extrapolating the more reliable data. Source: Reinagel and Reid, 2000

illustration not visible in this excerpt

The idea behind the estimate of the information crucially depends on the construction of spike train words where the binary states of the time bins correspond to the single letters. T is still the entire duration of the spike train and is assumed to be very large. In addition, we define Q(t), which is a sequence of duration L consisting of L/Δτ zeros and ones. Q(t) is called a word. Here, t denotes the time of the first bin of the word Q. The probability that a word Q occurs at any time during the entire spike train is P(Q). We would like to measure the information independently of the length of the sequence and hence introduce the entropy rate, i.e. the word entropy divided by the word length L. Then

illustration not visible in this excerpt

which we call the word length dependent entropy estimate.
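A naive version of this word length dependent estimate can be sketched in Python as follows. The sketch counts words in a sliding window over a binary spike train and divides the resulting entropy by the word duration; the function name is hypothetical, the word length is given in bins rather than in time, and no correction for limited data is applied.

```python
from collections import Counter
import math


def word_entropy_rate(letters, word_bins, dt):
    """Plug-in entropy rate (bits/s) from words of `word_bins` consecutive bins."""
    n_words = len(letters) - word_bins + 1
    words = [tuple(letters[i:i + word_bins]) for i in range(n_words)]
    counts = Counter(words)
    entropy = -sum((c / n_words) * math.log2(c / n_words)
                   for c in counts.values())
    return entropy / (word_bins * dt)


# Toy spike train: strictly alternating bins of 1 ms
letters = [0, 1] * 50
rate_1 = word_entropy_rate(letters, word_bins=1, dt=0.001)
rate_2 = word_entropy_rate(letters, word_bins=2, dt=0.001)
```

For this perfectly regular train the estimate halves when the word length doubles, because longer words expose the correlation between neighbouring bins.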

However, if there is any correlation between successive intervals, e.g. between Q(t) and Q(t + L), then part of the information of Q(t + L) can be predicted from Q(t) and vice versa. Hence, our measure includes redundant information and our calculation of H gives an upper bound of the information rate. Admittedly, the upper bound depends on L, and with L → ∞ the redundancy becomes less and less important. Basically, we can neglect ‘neighbouring’ effects because the boundary of a d dimensional object grows with its size to the power d − 1, and a word behaves like a string, which is a one dimensional object: its boundary stays fixed while its length grows. On the other hand, with larger L we need more data in order to specify the probability distribution P (Q) properly.

The following explicit calculations are based on data from the LGN neurons as described in 2.3. In figure 5, the entropy rate H(Δτ = 0.6 ms, L) has been calculated for different word lengths and plotted against 1/L. We can observe a linear dependence of the entropy rate on 1/L for small L. When L grows larger than 12 ms (which corresponds to 20 bins 0.6 ms wide) the dependence changes due to the sampling problem: there is not enough data available. But we can extrapolate S [illustration not visible in this excerpt]

illustration not visible in this excerpt

where CΔτ is a constant. By inserting the entropy estimates with sufficient data sizes (i.e. when 1/L is sufficiently large) we can extrapolate the entropy estimate to infinitely large word length and thus find the true entropy S(Δτ ).
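The extrapolation can be sketched as a straight line fit of the entropy rate against 1/L, with the intercept at 1/L = 0 taken as the true entropy rate. The Python sketch below assumes the linear form H(Δτ, L) = S(Δτ) + CΔτ /L, which is consistent with the linear dependence and the constant mentioned in the text; the function name and the synthetic data points are hypothetical.

```python
def extrapolate_entropy(inverse_lengths, entropy_rates):
    """Least-squares line through (1/L, H); returns (intercept S, slope C)."""
    n = len(inverse_lengths)
    mean_x = sum(inverse_lengths) / n
    mean_y = sum(entropy_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in inverse_lengths)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(inverse_lengths, entropy_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, noise-free points generated from S = 150 bits/s and C = 20
inverse_lengths = [0.5, 0.25, 0.125]
entropy_rates = [150.0 + 20.0 * x for x in inverse_lengths]
s_true, c_const = extrapolate_entropy(inverse_lengths, entropy_rates)
```

In practice only the points with sufficient data, i.e. the short words, would be passed to the fit.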

For Hnoise, one can apply the same method using data from repeated trials with the identical white noise stimulus. The difference between the extrapolated entropy and the extrapolated noise entropy gives the mutual information, which for the LGN spike trains is 102 bits/s. This is an extraordinarily high rate of information compared with other spike train analyses. At this rate about one bit is transmitted every 10 ms, which means that the cat could distinguish between two alternative signals after 10 ms of spike train.

### 4.3 Quantifying time resolution

We were particularly interested in a quantitative measurement of ‘conventional’ time coding. With information theory as a tool, we can give a precise answer now. For different time resolutions, we compute several estimates as a function of word length L and then extrapolate to infinite word length (figure 6). As expected, we get more information with increased time resolution. For data sampling reasons, the smallest bin size was 0.6 ms, and we cannot state at what time resolution the information rate plateaus. But even this 0.6 ms time resolution implies that timing is more precise than the smallest interspike interval (here, the absolute refractory period is estimated to be ∼ 2.7 ms).

Figure 6: Time resolution. We estimate the information rate for different time resolutions as explained above. The mean rate over time windows of 64 ms carries small but significant information. The information rate does not level off down to a bin size of 0.6 ms, which shows the significance of high time resolution. Source: Reinagel and Reid, 2000

illustration not visible in this excerpt

### 4.4 Temporal patterns

As we observe refractory periods and bursts, there is certainly temporal structure in spike trains. But what is the quantitative relevance of these patterns in coding?

In the entropy estimation above, we considered long words of length L which include temporal structure. In contrast, if we estimate the information with L=1, we assume independence between bins. In fact, this information (L=1) corresponds to all information which is contained in the peristimulus time histogram (PSTH). But as the entropy rate changes with word length, the independence assumption is certainly wrong.

Figure 7: The mutual information rate is plotted against the reciprocal word length. The true data reveal that mutual information increases with word length, which means that temporal patterns are significant. The temporal patterns are also encoded locally: scrambling the time bins makes the mutual information nearly independent of the word length. Because of higher time precision, the scrambled model carries more information than the Poisson model. Source: Reinagel and Reid, 2000

illustration not visible in this excerpt

Fortunately, we can quantify the dependence easily (Reinagel and Reid, 2000) by introducing a new quantity Z, which evaluates the information in temporal patterns. Z is the difference between the total entropy and the entropy under the independence assumption.

illustration not visible in this excerpt

The data from the LGN recordings give Z(Δτ = 0.6 ms) = 25 bits/s. Of the 11 cells, 9 had a positive Z value. This could be explained by an external noise source which has a long enough time scale to affect more than one spike. Alternatively, electrophysiological effects of one spike, like the refractory period, could affect the timing of the following spike.

On the other hand, one cell had a significantly negative Z value. This coincides with a prevalent occurrence of bursts. Bursts are very stereotyped structures and hence introduce redundancy (Z < 0).

With a total mutual information of 102 bits/s, Z(Δτ = 0.6 ms) = 25 bits/s implies that one quarter of the information is in temporal patterns, whereas three quarters can be obtained by evaluating the PSTH alone. One can also show that the information in temporal patterns is local. For this, the words were not composed of L/Δτ succeeding bins but of bins which were separated in time. As shown in figure 7 (scrambled), the estimate of the information rate depended only very weakly on the word length in this case. Taking the real data from the PSTH of the neuron, we can generate spike trains according to a time dependent Poisson process which gives the same temporal precision. Here the estimate of information is independent of L, so Z = 0 (figure 7). But as can be observed from the figure, even if the cell does not encode temporal patterns (L=1, scrambled), the information rate is higher than predicted by the Poisson model. In fact, only part of the discrepancy between the Poisson model and the real data is due to temporal patterns. The other part can be explained by the exact spiking, which is more precise than expected from a Poisson model. More accurately, the ratio between variance and mean of the spike count is considerably lower than 1, the value characteristic of a Poisson process.
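The ratio between variance and mean of the spike count is often called the Fano factor. A minimal Python sketch of its computation from per-trial spike counts is given below; the function name and the counts are hypothetical and are not the LGN data.

```python
def fano_factor(counts):
    """Variance-to-mean ratio of spike counts across trials (sample variance)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical spike counts in the same window over repeated trials
regular = fano_factor([3, 4, 5, 4])     # more regular than Poisson
poisson_like = fano_factor([2, 4, 6])   # variance equals mean
```

Values well below 1, as in the first example, indicate spiking that is more precise than a Poisson process with the same rate.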

## 5 Discussion

In our quest to understand the exact meaning of coding schemes, we finally arrive at a stage where we can distinguish between different classes of coding schemes:

1. rate encoding

2. temporal encoding

3. temporal pattern encoding

Referring to Theunissen and Miller (1995), we defined rate encoding as a scheme where a given frequency component of a stimulus is completely described by the same or lower frequency components in the spike train spectrum. In contrast, in the temporal encoding scheme higher frequency components of the spike train spectrum were needed to characterize a given frequency component of the stimulus. For these two definitions, the stimulus reconstruction method allowed a proper distinction between signal and response spectra.

The language of information theory revealed a third coding scheme. Introducing words describing parts of spike trains allowed us to estimate entropy and information rate. Moreover, we were able to distinguish between the total entropy and the entropy assuming independence between neighbouring bins (the word length is only one bin long). The difference, labeled Z, is that part of the information which is in temporal patterns.

As discussed above, part of the information in the LGN recordings is temporally encoded. However, temporal encoding can be described by a time-varying rate of arbitrarily high time resolution. This time-varying rate is usually drawn in the form of a peri-stimulus time histogram (PSTH). But as shown above, temporal patterns cannot be characterized by the information contained in a PSTH. As the temporal pattern evaluation goes beyond the conventional PSTH, this method is arguably more powerful than the more technical temporal encoding distinction. Rather, we see the purpose of the temporal encoding procedure in clarifying the landscape of coding schemes. Historically, the term time coding referred to a diversity of phenomena. In particular, high time resolution and temporal pattern coding were both described as temporal coding. We identified two ‘true’ temporal coding schemes: temporal encoding and temporal pattern coding. High time resolution alone can be merely time-dependent rate coding. Nonetheless, we were also able to quantify time resolution (see figure 7).

In a more extensive discussion, natural time scales like the mean interspike interval, refractory periods and the integration window would deserve more attention. Explaining the role of information theory in spike train analysis would need a more thorough treatment of upper and lower bounds with respect to all kinds of assumptions (Borst and Theunissen, 1999). Instead, we concentrated on a few benchmarks, hoping that this will be sufficient for an introduction to this topic.

### 5.1 Outlook

We investigated certain aspects of the behaviour of the single neuron’s spike train. However, Victor (2002) suggests an alternative method to estimate information of spike trains. Instead of binning, he preserves the topological structure of spike trains and the entropy estimate is based on the distance to the closest neighbor spike. Numerical results indicate that this approach is more robust and more rapidly converging than conventional binning.

The true challenge is the understanding of coding in ensembles of neurons. There is a controversy over the contribution of temporal patterns in ensembles of neurons to information representation. Some suggest that correlation carries significant information. Riehle et al. (1997) observe a synchronization of individual spikes during stimulus expectancy and real performance in the motor cortex of monkeys. However, Shadlen and Newsome (1998) suggest that the high variability in spike trains of cortical neurons allows only ensemble rate coding. Panzeri et al. (1999) present an approach towards quantifying the correlation between spike trains on short time scales. For this, they expand the expression for the mutual information and break down the second-order term into three parts. One represents only a firing rate term, another shows noise-dependent correlations and the last presents stimulus-dependent correlations. Their results show that most information is carried by the firing rate alone. Other approaches beyond information theory exist, but this area is still largely unexplored. As methods and applications have been developed only over the last few years, this area promises further insights.

## A Appendix

### A.1 Calculation of the reconstruction filter

Our task is to find the optimal estimate $s_{\text{est}}$ of the stimulus $s$. In general, we can write the estimate as an expansion of functionals, called a Volterra series.

illustration not visible in this excerpt
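For orientation, a Volterra series is commonly written in the following generic form. This is a sketch in assumed notation: the kernel names $h_0$, $h_1$, $h_2$ are not taken from the original and the author's formula may differ in conventions.

$$
s_{\text{est}}(t) = h_0 + \int d\tau \, h_1(\tau)\, x(t-\tau) + \int\!\!\int d\tau_1 \, d\tau_2 \, h_2(\tau_1,\tau_2)\, x(t-\tau_1)\, x(t-\tau_2) + \ldots
$$

The first-order term is an ordinary linear filter applied to the input; the higher-order terms add products of the input at several lags.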

Here, the input $x$ is the response function of the neuron (the spike train) [illustration not visible in this excerpt]. Of course, the formula above is sufficiently general and we could write output $y$ instead of $s_{\text{est}}$. We expand all filters in power series of causal functions, e.g.

illustration not visible in this excerpt

and causality is preserved by $f_k(\tau) = 0$ for $\tau < 0$. Hence,

illustration not visible in this excerpt

We optimize our estimate by minimizing the mean square error of the difference between stimulus and stimulus estimate

illustration not visible in this excerpt

Here, $s$ is the observed stimulus. We introduced the delay time $\tau_{\text{delay}}$ to account for the finite time between stimulus and response, and hence between stimulus and stimulus estimate. The description of the kernels is complete once the coefficients $\alpha$ are characterized. We start with the linear coefficients only and generalize afterwards.

illustration not visible in this excerpt

With

illustration not visible in this excerpt

one can write

illustration not visible in this excerpt

By inversion this gives the solution

illustration not visible in this excerpt

However, we can easily generalize to

illustration not visible in this excerpt

where the vector α is

$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{11}, \alpha_{12}, \ldots, \alpha_{22}, \ldots)$ ,

and the vector R

illustration not visible in this excerpt

Of course, for computational purposes the series expansion must be truncated after a finite number of terms. The quality of the results can be checked by comparing them with the acausal filter. The acausal filter is derived in a similar way and is the optimal linear approximation. For further explanation see A.8.1 in Rieke et al. (1997).
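The linear case can be illustrated in discrete time. The following is a minimal sketch, assuming that the linear filter is expanded in a small set of causal basis functions and that the coefficients are obtained by solving the normal equations of the mean square error. The exponential basis, the delay, the function names and the noise-free synthetic data are hypothetical choices for illustration, not the author's setup.

```python
import numpy as np


def causal_regressors(x, basis):
    """R[k, t] = sum over tau >= 0 of basis[k, tau] * x[t - tau]."""
    n = x.size
    # keeping the first n samples of the full convolution keeps the filter causal
    return np.stack([np.convolve(x, f)[:n] for f in basis])


def fit_coefficients(x, s, basis, delay):
    """Least-squares coefficients so that sum_k alpha_k R_k(t) approximates s(t - delay)."""
    R = causal_regressors(x, basis)
    target = np.roll(s, delay)               # target[t] = s[t - delay]
    start = max(delay, basis.shape[1])       # drop samples affected by the edges
    R, target = R[:, start:], target[start:]
    A = R @ R.T / target.size                # correlations between regressors
    b = R @ target / target.size             # correlation of regressors with stimulus
    return np.linalg.solve(A, b)             # "inversion" of the normal equations


rng = np.random.default_rng(1)
x = (rng.random(5000) < 0.1).astype(float)   # surrogate spike train, one bin per sample
tau = np.arange(100)
basis = np.stack([np.exp(-tau / 5.0), np.exp(-tau / 20.0)])  # assumed causal basis

alpha_true = np.array([1.5, -0.5])
delay = 10
# Synthetic stimulus constructed so that the model is exact (no noise)
s = np.roll(alpha_true @ causal_regressors(x, basis), -delay)

alpha_hat = fit_coefficients(x, s, basis, delay)
```

Because the synthetic stimulus is generated from the model itself, `alpha_hat` recovers `alpha_true` up to numerical precision; with real data the residual error indicates how much of the stimulus the truncated expansion fails to capture.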

### A.2 Encoding distinction formula

We transform the stimulus estimate into its frequency spectrum. Afterwards, we will be able to tell which part of the response spectrum takes part in the construction of a specific component of the stimulus estimate. In this way, we can characterize the nature of the encoding.

We know

illustration not visible in this excerpt

with

illustration not visible in this excerpt

deriving the Fourier transform of each kernel, e.g.

illustration not visible in this excerpt

and executing the inverse Fourier transform $F^{-1}$ yields (1). However, we can evaluate (2) at one specific frequency and get

illustration not visible in this excerpt

Substituting ω2 = ω − ω1 and similarly for higher terms, one gets

illustration not visible in this excerpt

We can group together the right hand side of (3) into the lower frequency component up to ω and the upper frequency component above ω.

illustration not visible in this excerpt

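where the first group (e.g. labeled $s^{\text{rate}}_{\text{est}}(\omega)$) is sufficient for rate encoding while the second group $s^{\text{temp+}}_{\text{est}}(\omega)$ is needed in addition to characterize temporal encoding schemes. Therefore,

$$
s_{\text{est}}(\omega) = s^{\text{rate}}_{\text{est}}(\omega) + s^{\text{temp+}}_{\text{est}}(\omega)
$$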
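To make the substitution ω2 = ω − ω1 concrete, the first two terms of a Volterra estimate can be written in the frequency domain as follows. This is a sketch under assumed conventions: $X(\omega) = \int dt \, x(t)\, e^{-i\omega t}$, the kernel transforms $H_1$ and $H_2$ defined with the same sign convention, and ω ≠ 0 so that a constant term does not contribute. The symbols $H_1$, $H_2$ and $X$ are not taken from the original.

$$
s_{\text{est}}(\omega) = H_1(\omega)\, X(\omega) + \frac{1}{2\pi} \int d\omega_1 \, H_2(\omega_1, \omega - \omega_1)\, X(\omega_1)\, X(\omega - \omega_1) + \ldots
$$

The linear term uses only the response component at the same frequency ω, whereas the second-order term integrates over response frequencies ω1; it is this integration range that is divided into the part up to ω and the part above ω.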

## B Bibliography

Comments

This paper is written as an essay for Part III in Mathematics. The main difficulty was defining the problem. After working through several books (Rieke et al., Dayan and Abbott, Gerstner) and consulting Dr Baker, I singled out the question of time coding. The most important papers (which I found) were those of Theunissen and Miller (1995), Strong et al. (1998), Borst and Theunissen (1999) and Reinagel and Reid (2000). The derivation in A.1 includes some arguments of my own but is related to A.8.2 in Rieke et al. (1997). The derivation in A.2 is similar to Theunissen and Miller (1995).

Acknowledgment

I would like to express gratitude to my supervisor Dr. Stuart Baker whose guidance was crucial for the successful completion of this project.

1 Abeles, M., H. Bergman, E. Margalit and E. Vaadia (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys, J. Neurophys. 70:1629-1638

2 Bialek, W., F. Rieke, R.R. de Ruyter van Steveninck and D. Warland (1991). Reading a neural code, Science 252, 1854-1857

3 Borst, A. and F.E. Theunissen (1999). Information theory and neural coding, Nature Neurosci. 2(11), 947-957

4 Darwin, C. (1994). Perception: Ear and Auditory Nerve, www.biols.susx.ac.uk/home/Chris Darwin

5 Engel, A.K., P. König, A.K. Kreiter, T.B. Schillen and W. Singer (1992). Temporal coding in the visual cortex: New vistas on integration in the nervous system. TINS 155:218-226

6 Houtsma, A.J.M. and J.L. Goldstein (1972). The central origin of the pitch of complex tones: Evidence from musical interval recognition, J. Acoustical Soc. of America, 51, 520-529

7 Houtsma, A.J.M. and J. Smurzynski (1990). Pitch identification and discrimination for complex tones with many harmonics, J. Acoustical Soc. of America, 87, 304-310

8 Lemon, W. and W. Getz (2000). Rate code input produces temporal code output from cockroach antennal lobes, Biosystems 58, 151-158

9 Mainen, Z.F. and T.J. Sejnowski (1995). Reliability of spike timing in neocortical neurons, Science 268, 1503-1506

10 Panzeri, S., S.R. Schultz, A. Treves and E.T. Rolls (1999). Correlations and the encoding of information in the nervous system, Proceedings of the Royal Society B

11 Reinagel, P. and R.C. Reid (2000). Temporal coding of visual information in the thalamus, J. Neurosci. 20(14) 5392-5400

12 Riehle, A., S. Gruen, M. Diesmann and A. Aertsen (1997). Spike synchronization and rate modulation differentially involved in motor cortical function, Science 278: 1950-1953

13 Rieke, F., D. Warland, R.R. de Ruyter van Steveninck and W. Bialek (1997). Spikes: exploring the neural code. Cambridge, MA:MIT

14 Shadlen, M.N. and W.T. Newsome (1998). The variable discharge of cortical neurons: implications for connectivity, computation and coding, J. Neurosci. 18(10):3870-3896

15 Shannon, C.E. (1948). A mathematical theory of communication, Bell Sys. Tech. J. 27, 379-423, 623-656 (republished at cm.bell-labs.com)

16 Strong, S.P., R. Koberle, R.R. de Ruyter van Steveninck and W. Bialek (1998). Entropy and information in neural spike trains, Phys Rev Lett 80:197-200

17 Theunissen, F. and J.P. Miller (1995). Temporal encoding in nervous systems - a rigorous definition, J. Comput. Neurosci. 2:149-162

18 Victor, J.D. (2002). Binless strategies for estimation of information from neural data, Phys Rev E 66, 051903

- Quote paper
- Creutzig, Felix (Author), 2003, Information Encoding in Small Neural Systems, Munich, GRIN Verlag, https://www.grin.com/document/107825
