Information Encoding in Small Neural Systems
Felix Creutzig
April 8, 2003
Contents
1 Introduction
2 Experimental results
2.1 Phase locking
2.2 Reliability with fluctuating stimulus
2.3 Time precision in the LGN
3 Temporal encoding
3.1 Time encoding window
3.2 Stimulus reconstruction
3.3 Distinction between coding strategies
4 Information Theory
4.1 General Principles
4.2 Applying the direct method
4.3 Quantifying time resolution
4.4 Temporal patterns
5 Discussion
5.1 Outlook
A Appendix
A.1 Calculation of the reconstruction filter
A.2 Encoding distinction formula
B Bibliography
Abstract
The nervous system must encode and process information using the action
potentials of neurones. There is considerable controversy as to how this is
achieved. Most neuroscientists assume that the rate of discharge of action
potentials is all that carries information; however, more recent work suggests
that the precise timing of the discharge events could also carry information.
The essay will address the theoretical basis for these two contrasting views
of neural encoding. It may also address how experimental data can be used
to address this problem, since the analysis and interpretation of such data is
far from straightforward.
1 Introduction
One of the most fundamental questions in neuroscience is how the brain
encodes information. This paper aims firstly at clarifying the notions of
rate and time coding and secondly at explaining how information theory can
quantify the contribution of different coding strategies. This quantification
will show the relevance of time coding strategies. Furthermore, we will
identify the significant questions with respect to time coding.
Perception, awareness and behavioural output are all represented in the spike
trains of neurons. Hence, understanding the neural code in spike trains is
one of the key problems. Moreover, the quality of the neural code can reveal
the degree to which biological systems are optimised. Here, we will
concentrate on the more restricted question of the nature of the neural code.
What kinds of complex features in the spike train go beyond the mean firing
rate? What is their role?
We will start with a short discussion of the concepts of rate and time
coding, showing their equivalence according to the conventional definition.
In the second part, we will review experiments indicating the pertinence of
precise timing. However, using the stimulus reconstruction analysis
technique, we will identify a time encoding which goes beyond precise timing.
In the last part, we will use information theory to quantify the coding
efficiency with respect to the time resolution. Additionally, we will see
that a certain part of the information is in temporal patterns.
Conventionally, the rate of a spike train is defined by counting the spikes
in a certain time window. Admittedly, the choice of the time window is
arbitrary and hence, by choosing small time windows, the measure has a high
time resolution. In general, a stimulus $s(t)$ causes a response, the spike
train. The spike train is described accurately by the arrival times of the
spikes, $t_1, t_2, \ldots, t_n$, or in short $\{t_i\}$. Using the language of
probability, a complete characterization of the neural response is contained
in the conditional probability distribution $P[\{t_i\}|s(t)]$, which measures
the likelihood that spikes will arrive at the set of times
$\{t_1, t_2, \ldots, t_n\}$ given a certain stimulus $s(t)$. Clearly, we can
use this as a characterization of a time coding in a reasonable sense.
But this conditional distribution is closely related to time dependent rate
coding. More precisely, we define the counting function

$$n(t) = \sum_i f\left(\frac{t - t_i}{\Delta t}\right)$$

where

$$f(x) = \begin{cases} 1 & \text{if } -\tfrac{1}{2} \le x \le \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

so that $f$ counts whether a particular spike of the spike train $\{t_i\}$
has occurred in the time bin of width $\Delta t$ centered on time $t$. The
time dependent rate is defined by averaging $n(t)$, dividing by the width of
the time bin $\Delta t$ and taking the limit $\Delta t \to 0$:

$$r[t; s(\tau)] \equiv \lim_{\Delta t \to 0} \frac{1}{\Delta t} \langle n(t) \rangle$$

In this limit, $f((t - t_i)/\Delta t)/\Delta t$ becomes a Dirac delta
function. Hence,

$$r[t; s(\tau)] = \left\langle \sum_i \delta(t - t_i) \right\rangle
= \sum_N \int_0^T dt_1 \int_0^T dt_2 \cdots \int_0^T dt_N \, P[\{t_i\}|s(\tau)] \sum_{i=1}^{N} \delta(t - t_i)$$
For the last step, we have to account for the randomness of spike trains.
This randomness is described by a probability distribution, which in this
case is the conditional probability distribution $P[\{t_i\}|s(\tau)]$. We
integrate over all possible arrival times in order to average over arrival
times. Additionally, we sum over all possible numbers of events $N$.
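In practice the limit and the ensemble average are replaced by a finite bin
width and an average over repeated trials. The following is a minimal sketch
of that estimate, assuming NumPy; the function name `time_dependent_rate` and
the three example trials are hypothetical and only serve as illustration.

```python
import numpy as np

def time_dependent_rate(trials, t_max, dt):
    """Estimate r(t) as the trial-averaged spike count per bin divided by dt."""
    n_bins = int(round(t_max / dt))
    edges = np.linspace(0.0, t_max, n_bins + 1)
    counts = np.zeros(n_bins)
    for spike_times in trials:
        # n(t): number of spikes of this trial that fall into each bin
        counts += np.histogram(spike_times, bins=edges)[0]
    centers = edges[:-1] + dt / 2
    # average over trials, then divide by the bin width
    return centers, counts / (len(trials) * dt)

# Three hypothetical trials (spike times in seconds), not measured data
trials = [[0.012, 0.031], [0.011, 0.029, 0.045], [0.013]]
centers, rate = time_dependent_rate(trials, t_max=0.05, dt=0.01)
```

With a smaller `dt` the estimate follows faster changes of the stimulus, but
more trials are needed to keep the average per bin reliable.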
The rate is the mean of the conditional distribution which describes arrival
times. Therefore the rate is time dependent and hence, can track changes
in stimulus parameter with high time precision. However, higher moments
of the distribution may carry additional information. Later, we will show
how to quantify this extra amount of information. But, we will begin with
experimental evidence of `time coding' and continue with a proper distinction
between time encoding and time dependent rate coding.
2 Experimental results
In this chapter we will present evidence for the importance of precise timing.
2.1 Phase locking
Figure 1: Phase locking. The neuron fires with highest probability at high
stimulus amplitude. Thus, interspike intervals are often integer multiples of
the stimulus period, as illustrated in the picture. At high frequencies phase
locking becomes ambiguous. Source: Darwin, 1994

Phase locking is a typical example of a time dependent firing rate. The
requirement is that the stimulus is periodic in time, e.g. a sinusoidal
pressure wave at the eardrum (see figure 1), which is a pure tone. Each
auditory neuron has a characteristic frequency, at which the least energy is
needed to stimulate it. At the appropriate frequency the neuron's firing rate
changes with the phase. Hence the probability of firing is highest once per
period, i.e. every $2\pi$ in phase. Naturally, this does not mean that the
neuron fires every period. In fact, the time dependent firing rate is

$$r(t) = r_0 + A \sin(\omega t + \varphi)$$

where the case $A > r_0$ is subject to half wave rectification. Though the
mean firing rate (over large time windows) is constant until a certain sound
intensity threshold is reached, the firing is phase locked and the frequency
can be decoded from the timing of the spikes. At low frequencies, the
intensity of sound is the same at both ears and location can only be decoded
from time differences in auditory neurones. At high frequencies, the time
differences become ambiguous, but shadowing effects give information about
the sound location.
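To illustrate the rectified sinusoidal rate, the sketch below draws spikes
from it and records the stimulus phase at which each spike occurs. It assumes
NumPy; the parameter values (50 and 100 spikes/s, a 100 Hz tone) and the
Bernoulli approximation of a Poisson process are illustrative assumptions,
not values taken from the experiments.

```python
import numpy as np

def phase_locked_rate(t, r0, amp, freq_hz, phase=0.0):
    """r(t) = r0 + A sin(2*pi*f*t + phase), half-wave rectified at zero."""
    rate = r0 + amp * np.sin(2 * np.pi * freq_hz * t + phase)
    return np.maximum(rate, 0.0)

def sample_spikes(rate, dt, rng):
    """Bernoulli approximation of an inhomogeneous Poisson process:
    each bin holds a spike with probability rate * dt (needs rate * dt << 1)."""
    return rng.random(rate.size) < rate * dt

dt = 1e-4                      # 0.1 ms bins
freq_hz = 100.0                # hypothetical pure tone
t = np.arange(0.0, 1.0, dt)
rate = phase_locked_rate(t, r0=50.0, amp=100.0, freq_hz=freq_hz)
spikes = sample_spikes(rate, dt, np.random.default_rng(0))
spike_phase = (t[spikes] * freq_hz) % 1.0   # phase in cycles, 0..1
```

Because the rate is largest in the first half of each cycle, most entries of
`spike_phase` fall below 0.5, although the neuron does not fire in every
period.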
In this context it is worth noting that the mammalian ear encodes frequency
to a certain extent by locus: specific small frequency ranges activate
specific neurons by tectorial membrane deflection. Pitch is encoded in higher
cortical regions. Evidence for this is that it is sufficient to present
successive harmonics to different ears in order to reconstruct the
fundamental (Houtsma and Goldstein, 1972). But if only high harmonics (of
order > 12) are presented, coincident phases support reconstruction (Houtsma
and Smurzynski, 1990). Hence the purpose of phase locking is not only to
locate sounds by measuring phase differences; phase locking also contributes
to the identification of pitch.
2.2 Reliability with fluctuating stimulus
Figure 2: Time precision of firing pattern with constant (A) and fluctuating
(B) stimulus. The constant stimulus, with little information to be encoded,
evokes an irregular firing behaviour. The repeated fluctuating stimulus
evokes the same response with high reliability. Top: superimposed responses,
10 trials. Bottom: raster plot of spike trains, 25 trials. Source: Mainen and
Sejnowski, 1995
Mainen and Sejnowski (1995) discovered two highly divergent modes of spiking
in a cortical neuron. A constant current pulse (stimulus) evoked an irregular
response with high variability: the same neuron was stimulated with the same
stimulus 25 times but showed significantly different spike train patterns
(figure 2, A). However, when the neuron was stimulated repeatedly with a
fluctuating current (Gaussian white noise, i.e. with a continuous and uniform
frequency spectrum) that was the same in all trials, the spike patterns were
reproduced reliably and with high time precision (figure 2, B). This
experiment demonstrated that the behaviour of cortical neurons is consistent
with time dependent coding strategies. In particular, the fluctuating
stimulus is a more natural and realistic condition than the constant pulse.
Consequently, at least in this case, neurons work with higher reliability
under natural conditions than under simple but unrealistic conditions.
2.3 Time precision in the LGN
Figure 3: Temporal coding in the LGN. Different unique stimuli yield a flat
average count per time bin, which is represented in the PSTH. In contrast,
the repeated stimulus displays prominent peaks. Source: Reinagel and Reid,
2000
The lateral geniculate nucleus (LGN) of the thalamus is part of the visual
pathway and forwards visual information to the cortical level (primary visual
cortex, V1). For accurate perception, reliable transmission is expected.
Reinagel and Reid (2000) analysed the statistical features of LGN spike
trains.
Spatially uniform visual stimuli with random time varying luminance were
presented, while spike trains from 11 well isolated individual neurons in the
LGN of anaesthetized cats were recorded. One white noise stimulus was
repeated 128 times and the recordings were compared with recordings from
unique stimuli. For this, the average number of spikes per time bin is
counted and plotted as the peristimulus time histogram (PSTH). Figure 3 shows
the PSTH of unique stimuli at the top and, below it, the PSTH of the repeated
stimulus, with the stimulus waveform drawn underneath. Some peaks of the
lower PSTH have a width of 1 ms; hence certain stimulus features are encoded
with high time precision.
We will return to the quantitative analysis of this experiment in chapter 4.
3 Temporal encoding
Let us now introduce an alternative and mathematically rigorous concept of a
time code, called time encoding, which was developed by Theunissen and Miller
(1995). First of all, we introduce the idea of the time encoding window.
Based on this, we can distinguish between rate and time encoding by comparing
the frequency spectra of signal and response (spike train). This distinction
shows us intrinsic time scales of the neural system and alternative encoding
and decoding mechanisms.
3.1 Time encoding window
The time encoding window is defined as the duration of a neuron's spike train
which corresponds to a single symbol in the neural code. A signal where the
time scale of changes in the stimulus parameter is much longer than the time
scale of the behavioral response can be regarded as nearly stationary. In
order to increase acuity, the time encoding window should be as large as
possible. In this scenario, the time scale of the behavioral response is the
limiting factor. One example is the coding of the shape of a stationary object
by the visual system.
In contrast, for a constantly moving visual stimulus the time scale of
variation of the stimulus parameters can be shorter than the behavioral
response or decision time. In this case, the size of the encoding time window
is limited by the rate at which the relevant stimulus parameters are
changing. In fact, the intrinsic time scale of neural computation is the
limiting factor and the dynamic variation in the stimuli can only be encoded
up to a certain rate. As an example, suppose a neuron is to encode a 100 Hz
component of a signal. One period is then 10 ms and a time encoding window of
5 ms is needed. By summing the counts of two neighbouring windows we can
estimate the amplitude. The phase can be measured by the ratio of the counts
of these two neighbouring windows. In general, this requirement is known as
the Nyquist theorem: a component must be sampled at least twice per period.
For each spike, we can determine the preceding stimulus waveform. By
averaging over many preceding stimulus waveforms, we construct, by
definition, the reverse correlation function. The duration of the reverse
correlation function corresponds to the integration window, which is defined
as the time preceding a certain point in the response pattern in which a
change in the signal significantly affects this response pattern.
Physiological integration processes are responsible for this integration
window. Taking the 100 Hz signal from above, the reverse correlation function
extends over several periods of the signal, but the encoding time window is
only half a period. We see that the encoding time window can be much shorter
than the integration window.
In fact, the integration time gives an upper bound on the time over which
variation of the stimulus and the response function are correlated. The
integration time can be regarded as an electrophysiological reaction time,
and hence the encoding time window is always smaller than the integration
time window.
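The reverse correlation described above amounts to a spike-triggered average
of the stimulus. A minimal sketch, assuming NumPy and a stimulus sampled on
the same time grid as the spikes; the function name, the toy stimulus and the
spike positions are hypothetical.

```python
import numpy as np

def reverse_correlation(stimulus, spike_bins, window):
    """Average the `window` stimulus samples that precede each spike.

    Spikes closer than `window` samples to the start are skipped because
    their preceding waveform is incomplete.
    """
    segments = [stimulus[b - window:b] for b in spike_bins if b >= window]
    return np.mean(segments, axis=0)

# Toy stimulus and spike positions (sample indices), for illustration only
stimulus = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
sta = reverse_correlation(stimulus, spike_bins=[3, 5, 1], window=2)
```

The length of `window` over which the average differs noticeably from the
mean stimulus gives an estimate of the integration window.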
For further analysis of the meaning of the encoding time window, we need
the method of stimulus reconstruction which is a recently developed tool of
stochastic system analysis.
3.2 Stimulus reconstruction
In general, the methods from stochastic systems analysis give a relationship
between stimulus and response.
Natural stimuli have many different frequency components and each of them
might be encoded in a corresponding encoding time window. The analysis gets
very complex as each spike could be part of all the different encoding time
windows. The conventional approach of stochastic system analysis is to
construct an operator which transforms the signal into the response. Usually,
the response is given in the form of the probability distribution of the
spike train given the stimulus, $P[\{t_i\}|s(t)]$. Of course, for this
construction, we have to average over many trials. But the organism has only
one single spike train as a sample and has to decide on this basis. We would
like to know more about the predictive power of this single spike train.
Furthermore, which part of the frequency spectrum of the spike train is
relevant for reconstructing the stimulus? If we don't know the encoding
model, the reverse approach must be taken. This approach was developed by
Bialek et al (1991). The aim is to transform the spike train into an optimal
estimate of the stimulus. The idea is that we treat the neural system as a
black box.
We write the estimate of the stimulus $s_{est}$ in the linear form

$$s_{est}(t) = \int d\tau \, h_1(\tau) \sum_{i=1}^{N} \delta(t - t_i - \tau) = \sum_{i=1}^{N} h_1(t - t_i)$$

where $h_1(\tau)$ is the linear response function. More generally,

$$s_{est}(t) = \sum_i h_1(t - t_i) + \sum_{ij} h_2(t - t_i, t - t_j) + \ldots$$

and $h_n$ associates the output (response) with the input (stimulus) in
$n$th order. The estimate is optimised by minimising the error function

$$E[s(t), s_{est}(t)] = \langle |s(t) - s_{est}(t)|^2 \rangle \equiv \chi^2$$

The solution of this problem is sketched in Appendix A.1.
In particular, with this method we can analyze the encoding of each fre-
quency component of the stimulus independently from all other frequency
components and hence, solve the problem of the multiple overlapping of the
encoding time windows. To see this we transform the optimal estimate we
found in A.1 into the frequency spectrum (see A.2). In this description,
we can quantify the contribution of each part of the spike train's frequency
spectrum for each frequency component of the signal.
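The first order reconstruction places one copy of the filter at every spike
time and sums the copies. The sketch below shows only this step, assuming
NumPy; the Gaussian kernel is a hypothetical stand-in for the optimal filter
of Appendix A.1, and the spike times are made up for illustration.

```python
import numpy as np

def reconstruct(spike_times, kernel, t):
    """First order estimate s_est(t) = sum_i h1(t - t_i)."""
    est = np.zeros_like(t)
    for t_i in spike_times:
        est += kernel(t - t_i)
    return est

def mean_squared_error(s, s_est):
    """Error function: the average of |s - s_est|^2 over time."""
    return np.mean(np.abs(s - s_est) ** 2)

def h1(tau):
    # Hypothetical kernel: Gaussian bump of width 5 ms, not a fitted filter
    return np.exp(-0.5 * (tau / 0.005) ** 2)

t = np.linspace(0.0, 0.1, 101)            # time axis in seconds, 1 ms steps
s_est = reconstruct([0.02, 0.05], h1, t)
```

In an actual analysis the kernel would be chosen to minimise
`mean_squared_error` between the true stimulus and `s_est`.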
3.3 Distinction between coding strategies
The distinction between rate and temporal encoding is based on the com-
parison between the frequency spectra of signal and response. From the
discussion above, we know that each stimulus frequency needs a proper en-
coding time window. The size of this encoding time window is half of the
corresponding period, e.g. 5 ms for a 100 Hz component.
If a certain frequency component of the signal is related only to the mean
spike count of the corresponding encoding time window, we call this rate
encoding. Using the stimulus reconstruction, rate encoding is defined as
follows: each frequency component of the signal can only be composed of terms
that involve the same or lower frequency components of the spike train
response pattern. From the point of view of the encoding distinction formula
(A.2), this means that the term containing the higher frequency components of
the response pattern gives no significant contribution to the frequency
component of the optimal stimulus estimate. Evidently, the number of spikes
in each encoding time window does not have to depend linearly on the stimulus
parameter, and hence in general we have a nonlinear rate encoding scheme.
In contrast, a temporal encoding scheme is one with additional correlation
between a frequency component of the stimulus and patterns in the response on
a time scale shorter than the encoding time window. In this case, the second
part of the distinction formula of the optimal estimate in A.2 gives a
significant contribution to the description of the stimulus parameter. In
other words, the time dimension of the spike train is really used for
information encoding and does not only represent the time scale of stimulus
changes. In general, both information which is not temporal in nature and
information which is temporal in nature can be encoded using the temporal
dimension. The second possibility would imply a rescaling of time scales.
An interesting example of these encoding schemes is described by Lemon and
Getz (2000). In the cockroach Periplaneta americana, olfactory information is
represented by a short time scale rate code in olfactory sensory neurons. A
relatively small number of projection neurons carry this information to
higher brain areas. Surprisingly, spike train analysis suggests that these
projection neurons carry a higher rate of information by using a temporal
encoding scheme.
We emphasize that high temporal precision is not the same as temporal
encoding. High temporal precision in the spike train can instead mean
encoding of high frequency components of the stimulus. The phase locking
example (2.1) is clearly a rate code, as temporal patterns on time scales
shorter than the period of the stimulus do not occur. In contrast, the
example with data from LGN neurons (2.3) has a time resolution of up to 1 ms
whereas the stimulus was updated only every 7.8 ms. Consequently, we have
here an example of a time encoding scheme. Nevertheless, one should also take
into account that sharp edges in luminance occur at the frame boundaries.
These edges correspond to high frequency components in the frequency domain.
In any case, we will see below that even temporal encoding cannot account for
all information in this data.
Of course, following the stimulus changes with high time resolution reduces
the size of the encoding time window. When taking into account the intrinsic
duration of an action potential (about 1 ms) and the absolute refractory
period (about 1 ms), variations of temporal patterns inside such a small
encoding time window are no longer noticeable.
In ensembles of cells, spike train patterns are not limited by refractory
periods and the number of possible patterns is in fact much greater.
Experiments conducted on behaving monkeys (Abeles, 1993) and on anesthetized
cats (Engel, 1992) suggest temporal encoding in spike train patterns. Leaving
this most interesting topic aside, we continue our review with the analysis
of the single neuron's behaviour.
4 Information Theory
How can the vague notion of information be described in mathematical terms?
And how can the information content of a specific spike train be evaluated?
First of all, I will give a rough answer to those questions. Subsequently, I
will introduce the concept of mutual information. We will use the direct
method to estimate the information of the spike trains measured in the LGN
(see 2.3). We will see how the coding efficiency depends on the time
resolution. This will give us a quantitative estimate of the significance of
precise timing. Moreover, we will also quantify the information coded in
temporal patterns.
4.1 General Principles
The foundations of information theory were developed by Shannon in 1948.
He derived a measure of uncertainty or entropy. This corresponds to infor-
mation as information is, roughly speaking, a decrease in uncertainty.
Each spike train $\{t_i\}$ has a probability $P[\{t_i\}]$ of being observed.
A neuron has a certain set of possible spike trains. When we measure a spike
train, we gain information proportional to the `surprise' of observing this
particular spike train out of the set of all possible spike trains. Hence we
expect entropy and information to be a decreasing function of $P[\{t_i\}]$.
Additionally, when recording two spike trains $\{t_i^1\}$ and $\{t_i^2\}$
from two independent neurons, we expect the gain of information to be
additive. The probability of observing these two spike trains is
$P[\{t_i^1\}]P[\{t_i^2\}]$ and therefore the additivity condition gives for
the information $I$

$$I(P[\{t_i^1\}]P[\{t_i^2\}]) = I(P[\{t_i^1\}]) + I(P[\{t_i^2\}])$$

The logarithm is the function characterised by monotonicity and additivity;
thus we have determined the form of the information completely, up to an
arbitrary constant factor and the base of the logarithm (which can be
absorbed into one constant). Conventionally, information is defined in units
of bits

$$I(P[\{t_i\}]) = -\log_2(P[\{t_i\}])$$

and the entropy according to Shannon is this measure, averaged over all
possible responses

$$H = -\sum_{\{t_i\}} P[\{t_i\}] \log_2 P[\{t_i\}]$$

where the sum runs over all possible spike trains.

Figure 4: Binning. A binary string code is attributed to a spike train. This
is one approach to make the neural response measurable. Source: Strong et al,
1998
In theory, the entropy, which measures the range of all possible spike
trains, would give us the information capacity of a neuron. But we already
observed in experiments that a repeated stimulus does not lead to exactly the
same response every time; in fact, there is a certain variability in the
spike trains. This variability is called noise and limits the information
capacity of a neuron. Therefore, we can find the true information capacity
$I_m$ by subtracting the entropy of the noise $H_{noise}$ from the full
response entropy $H$

$$I_m = H - H_{noise}$$

The quantity $I_m$ is also called mutual information. The noise entropy is
calculated by averaging over all possible responses at a given stimulus $s$

$$H_{noise} = -\sum_{\{t_i\}} P[\{t_i\}|s] \log_2 P[\{t_i\}|s]$$

and the mutual information is

$$I_m = H - H_{noise} = -\sum_{\{t_i\}} P[\{t_i\}] \log_2 P[\{t_i\}] + \sum_{\{t_i\}} P[\{t_i\}|s] \log_2 P[\{t_i\}|s]$$
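As a toy illustration of these definitions, the sketch below computes the
response entropy, the noise entropy and their difference for a small discrete
set of responses. It assumes NumPy. Averaging the noise entropy over the
stimuli with weights `p_stim` is an assumption made here so that a single
number results; the text defines the noise entropy for one given stimulus.
The probability tables are invented examples.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy -sum p log2 p; zero-probability entries contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_resp_given_stim, p_stim):
    """I_m = H - H_noise for a table with rows P[response | stimulus]."""
    p_resp_given_stim = np.asarray(p_resp_given_stim, dtype=float)
    p_stim = np.asarray(p_stim, dtype=float)
    p_resp = p_stim @ p_resp_given_stim            # marginal P[response]
    h_total = entropy_bits(p_resp)
    # assumption: noise entropy averaged over the stimulus ensemble
    h_noise = sum(ps * entropy_bits(row)
                  for ps, row in zip(p_stim, p_resp_given_stim))
    return h_total - h_noise

# Two equally likely stimuli, two possible responses (invented tables)
noiseless = mutual_information([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
pure_noise = mutual_information([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5])
```

A response that identifies the stimulus without error carries one bit here,
while a response that is independent of the stimulus carries none.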
For practical computation of these quantities, we have to write down all
measured spike trains. One method is to divide the time axis into small time
bins of size $\Delta\tau$. Whenever a spike occurs in a particular time bin,
the value 1 is assigned to this bin. Otherwise the bin is labeled with the
value 0 (see figure 4).
This characterization of the entropy depends not only on the time resolution
$\Delta\tau$ but also on the length $T$ of the spike train being considered.
The fundamental problem in finding the mutual information is the large amount
of data needed to specify the relevant probability distributions. Recently,
Strong et al (1998) explained how to achieve a good approximation of entropy
and mutual information (see below).
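The binning step can be sketched as follows, assuming NumPy. The function
name `binarize` and the example spike times are hypothetical; a bin that
contains more than one spike is still labeled 1, in line with the binary
code described above.

```python
import numpy as np

def binarize(spike_times, t_max, dtau):
    """Return a 0/1 array with a 1 in every bin that contains a spike."""
    n_bins = int(round(t_max / dtau))
    word = np.zeros(n_bins, dtype=int)
    idx = np.floor(np.asarray(spike_times) / dtau).astype(int)
    # ignore spikes outside the recording interval [0, t_max)
    idx = idx[(idx >= 0) & (idx < n_bins)]
    word[idx] = 1
    return word

# Spike times in seconds, 1 ms bins (illustrative values)
binary = binarize([0.0012, 0.0031, 0.0034], t_max=0.006, dtau=0.001)
```

Choosing `dtau` below the refractory period ensures that two spikes rarely
share a bin, so little is lost by the binary labeling.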
4.2 Applying the direct method
Three general approaches can help to estimate the information in spike trains,
giving us a lower bound, an upper bound and a direct estimate of the infor-
mation.
The first approach was described by Bialek et al (1991) and is extensively
discussed in Rieke et al (1997). They derived a lower bound of the
information rate $R_{info}$ which is based on the signal to noise ratio (SNR)
of the stimulus estimate:

$$R_{info} = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} \log_2[1 + SNR(\omega)]$$
This procedure relies on the stimulus reconstruction method described above,
which gives us an optimal stimulus estimate. As the stimulus estimate is
derived from the response, the stimulus estimate cannot contain more
information about the stimulus than the response. This is the data processing
inequality. Hence by calculating the information in the stimulus estimate we
derive a lower bound for the information in the response. For the signal to
noise ratio, we take $s_{est}$ as the signal and $n = s - s_{est}$ as the
noise.
Second, when assuming that neuronal response and neuronal noise are
independent and both Gaussian distributed, we can obtain an upper bound of
the information. Here, the mean neuronal response is taken as the signal and
the deviations of each individual response from the mean as the noise. With
this, we can use the formula above again. The information obtained is an
upper bound because a Gaussian distribution has the maximal possible entropy
for a given variance.
In particular, the ratio between lower and upper bound quantifies the quality
of the model we used to derive the optimal estimate. Moreover, we can
identify those stimulus parameters (frequencies) which are encoded
preferentially (using the Fourier transform of $s_{est}$, A.2). For
derivations of the formulas see Rieke et al (1997) and Borst and Theunissen
(1999).
Nonetheless, here we will concentrate on the direct method as specified by
Strong et al (1998), which measures the information directly. This approach
is simpler as we don't need to identify relevant stimulus parameters and can
use spike train statistics only. The direct method is more satisfying than
deriving only bounds, as it estimates the information itself. On the other
hand, it can be difficult to accumulate a sufficient amount of data.
Figure 5:
Entropy rate against the reciprocal word length. At high word length,
the entropy rate estimation breaks down due to insufficient data. The true entropy
rate is obtained by extrapolating the more reliable data. Source: Reinagel and Reid,
2000
The estimate of the information crucially depends on the construction of
spike train words, where the binary states of the time bins correspond to the
single letters. $T$ is still the entire duration of the spike train and is
assumed to be very large. In addition, we define $Q(t)$, which is a sequence
of duration $L$ consisting of $L/\Delta\tau$ zeros and ones. $Q(t)$ is called
a word. Here, $t$ denotes the time of the first bin of the word $Q$. The
probability that a word $Q$ occurs at any time during the entire spike train
is $P(Q)$. We would like to measure the entropy independent of the length of
the sequence and hence introduce the entropy rate $\bar{H} = H/L$. Then

$$\bar{H}(L, \Delta\tau) = -\frac{1}{L} \sum_Q P(Q) \log_2 P(Q)$$

which we call the word length dependent entropy estimate.
However, if there is any correlation between successive intervals, e.g.
between $Q(t)$ and $Q(t + L)$, then part of the information of $Q(t + L)$ can
be predicted from $Q(t)$ and vice versa. Hence, our measure includes
redundant information and our calculation of $\bar{H}$ gives an upper bound
of the entropy rate. Admittedly, the upper bound depends on $L$, and with
$L \to \infty$ the redundancy gets less and less important. Basically, we can
forget about `neighbouring' effects because the boundary of a $d$ dimensional
object grows with the power $d - 1$ of its size, while a word behaves like a
string, which is a one dimensional object. On the other hand, with larger $L$
we need more data in order to specify the probability distribution $P(Q)$
properly.
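The word length dependent entropy estimate can be sketched as below, assuming
NumPy and a spike train that has already been converted to a 0/1 sequence.
Collecting words with a window that slides by one bin is an assumption of
this sketch; the function name and the toy sequence are hypothetical.

```python
import numpy as np
from collections import Counter

def word_entropy_rate(binary, n_letters, dtau):
    """Entropy rate -(1/L) sum_Q P(Q) log2 P(Q), with L = n_letters * dtau."""
    # assumption: words are read off with a window sliding by one bin
    words = [tuple(binary[i:i + n_letters])
             for i in range(len(binary) - n_letters + 1)]
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()                 # empirical P(Q)
    entropy_bits = -np.sum(p * np.log2(p))
    return entropy_bits / (n_letters * dtau)  # bits per second

# Strictly alternating toy sequence with 1 ms bins: only two words occur
alternating = [0, 1] * 50
rate = word_entropy_rate(alternating, n_letters=2, dtau=0.001)
```

For long words the number of possible `words` grows exponentially, which is
why the empirical probabilities become unreliable when data are limited.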
The following explicit calculations are based on data from the LGN neurons as
described in 2.3. In figure 5, the entropy rate
$\bar{H}(\Delta\tau = 0.6\,\mathrm{ms}, L)$ has been calculated for different
word lengths and plotted against $1/L$. We can observe a linear dependence of
the entropy rate on $1/L$ for small $L$. When $L$ grows larger than 12 ms
(which corresponds to 20 bins 0.6 ms wide) the dependence changes due to the
sampling problem: there is not enough data available. But we can extrapolate
$S(\Delta\tau, L)/L$ to $L \to \infty$ by

$$\frac{S(\Delta\tau, L)}{L} = S(\Delta\tau) + \frac{C(\Delta\tau)}{L} + \ldots$$

where $C$ is a constant. By inserting the entropy estimates with sufficient
data (i.e. where $1/L$ is sufficiently large), we can extrapolate the entropy
estimate to infinitely large word length and thus find the true entropy rate
$S(\Delta\tau)$.
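The extrapolation is a straight line fit of the entropy rate against $1/L$,
whose intercept at $1/L = 0$ is the extrapolated rate. A minimal sketch,
assuming NumPy; the function name is hypothetical and the input values are
synthetic numbers constructed to lie exactly on a line, not data from the
LGN recordings.

```python
import numpy as np

def extrapolate_entropy_rate(word_lengths, entropy_rates):
    """Fit rate = S_inf + C / L and return (S_inf, C)."""
    inv_len = 1.0 / np.asarray(word_lengths, dtype=float)
    slope, intercept = np.polyfit(inv_len, np.asarray(entropy_rates), 1)
    return intercept, slope

# Synthetic values following 200 + 0.3 / L exactly (L in seconds)
lengths = np.array([0.003, 0.006, 0.009, 0.012])
s_inf, c = extrapolate_entropy_rate(lengths, 200.0 + 0.3 / lengths)
```

Only word lengths that are short enough to be well sampled should enter the
fit, since the undersampled points at large $L$ would bias the intercept.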
For $H_{noise}$, one can apply the same method using data from repeated
trials with the identical white noise stimulus. The difference between the
extrapolated entropy and the extrapolated noise entropy gives the mutual
information, which for the LGN spike trains is 102 bits/s. This is an
extraordinarily high rate of information compared with other spike train
analyses. It means that about 10 ms of spike train carry roughly one bit,
enough for the cat to distinguish between two alternative signals.
4.3 Quantifying time resolution
We were particularly interested in a quantitative measurement of
`conventional' time coding. With information theory as a tool, we can give a
precise answer now. For different time resolutions, we compute several
estimates as a function of word length $L$ and then extrapolate to infinite
word length (figure 6). As expected, we get more information with increased
time resolution. For data sampling reasons, the smallest bin size was 0.6 ms,
and we cannot state at what time resolution the information rate plateaus.
But even this 0.6 ms time resolution implies that timing is more precise than
the smallest interspike interval (here, the absolute refractory period is
estimated to be 2.7 ms).

Figure 6: Time resolution. We estimate the information rate for different
time resolutions as explained above. The mean rate over time windows of 64 ms
carries small but significant information. The information rate does not
level off down to a bin size of 0.6 ms, which shows the significance of high
time resolution. Source: Reinagel and Reid, 2000
4.4 Temporal patterns
As we observe refractory periods and bursts, there is certainly temporal struc-
ture in spike trains. But what is the quantitative relevance of these patterns
in coding?
In the entropy estimation above, we considered long words of length $L$ which
include temporal structure. In contrast, if we estimate the information with
$L = 1$, we assume independence between bins. In fact, this information
($L = 1$) corresponds to all the information which is contained in the
peristimulus time histogram (PSTH). But as the entropy rate changes with word
length, the independence assumption is certainly wrong. Fortunately, we can
quantify the dependence easily (Reinagel and Reid, 2000) by introducing a new
quantity $Z$, which evaluates the information in temporal patterns. $Z$ is
the difference between the total information and the information under the
independence assumption.

$$Z(\Delta\tau) = \lim_{L \to \infty} I(\Delta\tau, L) - I(\Delta\tau, L = 1)$$

Figure 7: The mutual information rate is plotted against the reciprocal word
length. The true data reveal that mutual information increases with word
length, which means that temporal patterns are significant. The temporal
patterns are also encoded locally: scrambling the time bins makes the mutual
information nearly independent of the word length. Because of higher time
precision, the scrambled model carries more information than the Poisson
model. Source: Reinagel and Reid, 2000
The data from the LGN recordings give $Z(\Delta\tau = 0.6\,\mathrm{ms}) = 25$
bits/s. Of the 11 cells, 9 had a positive $Z$ value. This could be explained
by an external noise source which has a long enough time scale to affect more
than one spike. Alternatively, electrophysiological effects of one spike,
like the refractory period, could affect the timing of the following spike.
On the other hand, one cell had a significant negative $Z$ value. This
coincides with a prevalent occurrence of bursts. Bursts are very stereotyped
structures and hence contribute redundancy ($Z < 0$).
With a total mutual information of 102 bits/s, $Z(\delta\tau = 0.6\,\mathrm{ms}) = 25$ bits/s
implies that one quarter of the information is in temporal patterns, whereas
three quarters can be estimated from the PSTH alone. One can also show that
the information in temporal patterns is local. For this, the words were
composed not of $L/\delta\tau$ succeeding bins but of bins which were separated
in time. As shown in figure 7 (scrambled), the estimate of the information
rate depended only very weakly on the word length in this case.
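The definition of Z can be illustrated with a minimal numerical sketch. It
assumes spike trains binned into 0/1 arrays (one row per repeat of the same
stimulus) and uses plain plug-in entropies at a finite word length; the
extrapolations to infinite data size and to L → ∞ are deliberately left out.
The function names and the variable dt (standing for the bin width in
seconds) are hypothetical and chosen only for this illustration.

```python
from collections import Counter

import numpy as np


def word_entropy(words):
    """Plug-in entropy (in bits) of a list of hashable words."""
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_rate(trials, n_bins_per_word, dt):
    """Finite-L information rate in bits/s (total minus noise entropy).

    trials: (n_trials, n_bins) array of 0/1 responses to a repeated stimulus.
    dt:     bin width in seconds (assumed name for the time resolution).
    """
    trials = np.asarray(trials)
    n_trials, n_bins = trials.shape
    L = n_bins_per_word
    starts = range(n_bins - L + 1)
    # Words observed at each time across all repeats.
    words_at = [[tuple(trials[r, t:t + L]) for r in range(n_trials)]
                for t in starts]
    # Total entropy: words pooled over all times and repeats.
    h_total = word_entropy([w for ws in words_at for w in ws])
    # Noise entropy: variability across repeats at fixed time, time-averaged.
    h_noise = float(np.mean([word_entropy(ws) for ws in words_at]))
    return (h_total - h_noise) / (L * dt)


def pattern_information(trials, n_bins_per_word, dt):
    """Finite-L stand-in for Z: I(dt, L) - I(dt, L=1)."""
    return (information_rate(trials, n_bins_per_word, dt)
            - information_rate(trials, 1, dt))


# A perfectly stereotyped alternating pattern, repeated identically 5 times.
dt = 0.001
regular = np.tile([0, 1, 0, 1, 0, 1, 0, 1], (5, 1))
z_regular = pattern_information(regular, 2, dt)  # negative: redundancy
```

In this toy case the L=1 estimate treats every bin as carrying one full bit,
while two-bin words reveal that the second bin is predictable from the first,
so the pattern term comes out negative, in line with the remark on bursts
above.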
Taking the real data from the PSTH of the neuron, we can generate spike
trains according to a time-dependent Poisson process which gives the same
temporal precision. Here the estimate of information is independent of L, so
Z = 0 (figure 7). But as can be seen from the figure, even if temporal
patterns are not taken into account (L=1, scrambled), the information rate of
the real data is higher than predicted by the Poisson model. In fact, only
part of the discrepancy between Poisson model and real data is due to
temporal patterns. The other part can be explained by the spike timing, which
is more precise than expected from a Poisson model. More precisely, the ratio
between variance and mean of the spike count is considerably lower than 1,
the value characteristic of a Poisson process.
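The Poisson comparison can be sketched numerically as follows. This is only
an illustration under stated assumptions: the time-dependent rate is an
invented sinusoid rather than a measured PSTH, each bin is drawn
independently with spike probability rate × bin width (a Bernoulli
approximation to the Poisson process, valid for small bins), and the function
names are hypothetical.

```python
import numpy as np


def poisson_trials_from_psth(rate_hz, dt, n_trials, rng):
    """Draw 0/1 spike trains with spike probability rate*dt in each bin."""
    p = np.clip(np.asarray(rate_hz, dtype=float) * dt, 0.0, 1.0)
    return (rng.random((n_trials, p.size)) < p).astype(int)


def fano_factor(trials):
    """Variance of the spike count across trials divided by its mean."""
    counts = np.asarray(trials).sum(axis=1)
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(0)
dt = 0.001                                   # 1 ms bins (assumed)
t = np.arange(1000) * dt                     # 1 s of simulated time
rate = 20.0 * (1.0 + np.sin(2 * np.pi * 4 * t))   # invented PSTH in Hz

poisson_trials = poisson_trials_from_psth(rate, dt, 2000, rng)
poisson_fano = fano_factor(poisson_trials)   # close to 1

# Identical repeats: the count never varies, so the ratio is 0.
regular_trials = np.tile(poisson_trials[0], (50, 1))
regular_fano = fano_factor(regular_trials)
```

A neuron whose spike count varies less from trial to trial than the simulated
Poisson trains would fall between these two extremes, with a ratio below 1.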
5 Discussion
In our quest to understand the exact meaning of coding schemes, we finally
arrive at a stage where we can distinguish between different classes of coding
schemes:
1. rate encoding
2. temporal encoding
3. temporal pattern encoding
Referring to Theunissen and Miller (1995), we defined rate encoding as a
scheme where a given frequency component of a stimulus is completely
described by the same or lower frequency components in the spike train
spectrum. In contrast, in the temporal encoding scheme higher frequency
components of the spike train spectrum are needed to characterize a given
frequency component of the stimulus. For these two definitions, the stimulus
reconstruction method allowed a proper distinction between signal and
response spectra.
The language of information theory revealed a third coding scheme.
Introducing words describing parts of spike trains allowed us to estimate
entropy and information rate. Moreover, we were able to distinguish between
the total information and the information obtained assuming independence
between neighbouring bins (words only one bin long). The difference, labeled
Z, is that part of the information which is in temporal patterns.
As discussed above, part of the data from the LGN recordings is temporally
encoded. However, temporal encoding can be described by a time-varying rate
of arbitrarily high time resolution. This time-varying rate is usually drawn
in the form of a peri-stimulus time histogram (PSTH). But as shown above,
temporal patterns cannot be characterized by the information contained in a
PSTH. As the temporal pattern evaluation goes beyond the conventional PSTH,
this method is presumably more powerful than the more technical distinction
between rate and temporal encoding. Rather, we see the purpose of that
distinction in clarifying the landscape of coding schemes.
Historically, the term time coding referred to a diversity of phenomena. In
particular, high time resolution and temporal pattern coding were both
described as temporal coding. We identified two `true' temporal coding schemes
with temporal encoding and temporal pattern coding. High time resolution
alone can be mere time-dependent rate coding. Nonetheless, we were also able
to quantify time resolution (see figure 7).
In a more extensive discussion, natural time scales like the mean interspike
interval, refractory periods and the integration window would deserve more
attention. Explaining the role of information theory in spike train analysis
would need a more thorough treatment of upper and lower bounds with respect
to all kinds of assumptions (Borst and Theunissen, 1999). Instead, we
concentrated on a few benchmarks, hoping that this will be sufficient for an
introduction to this topic.
5.1 Outlook
We investigated certain aspects of the behaviour of a single neuron's spike
train. Victor (2002), however, suggests an alternative method to estimate the
information of spike trains. Instead of binning, he preserves the topological
structure of spike trains, and the entropy estimate is based on the distance
to the closest neighbor spike. Numerical results indicate that this approach
is more robust and converges more rapidly than conventional binning.
The true challenge is the understanding of coding in ensembles of neurons.
There is a controversy over the contribution of temporal patterns in
ensembles of neurons to information representation. Some suggest that
correlation carries significant information. Riehle et al (1997) observe a
synchronization of individual spikes during stimulus expectancy and real
performance in the motor cortex of monkeys. However, Shadlen and Newsome
(1998) suggest that the high variability in spike trains of cortical neurons
allows only ensemble rate coding. Panzeri et al (1999) present an approach
towards quantifying the correlation between spike trains on short time
scales. For this, they expand the expression of the mutual information and
break down the second order term into three parts. One represents only a
firing rate term, another shows noise dependent correlations and the last
presents stimulus dependent correlations. Results show that most information
is carried by the firing rate alone. Other approaches beyond information
theory exist, but this area is still vastly unexplored. As methods and
applications have been developed only over the last few years, this area
promises further insights.
A Appendix
A.1 Calculation of the reconstruction filter
Our task is to find the optimal estimate $s_{\mathrm{est}}$ of the stimulus $s$. In
general, we can write the estimate as an expansion of functionals, called a
Volterra series:

$$s_{\mathrm{est}} = \int d\tau_1\, h_1(\tau_1)\, x(t - \tau_1) + \iint d\tau_1\, d\tau_2\, h_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2) + \dots$$

Here, the input $x$ is the response function of the neuron (the spike train),
$x(t - \tau) = \sum_i \delta(t - t_i - \tau)$. Of course, the formula above is
sufficiently general and we could write an output $y$ instead of
$s_{\mathrm{est}}$. We expand all filters in series of causal functions, e.g.

$$h_1(\tau_1) = \sum_k \alpha_k f_k(\tau_1)$$

$$h_2(\tau_1, \tau_2) = \sum_{k,l} \alpha_{kl}\, f_k(\tau_1)\, f_l(\tau_2)$$

and causality is preserved by $f_k(\tau) = 0$ for $\tau < 0$. Hence,

$$s_{\mathrm{est}} = \sum_k \alpha_k \int d\tau_1\, f_k(\tau_1)\, x(t - \tau_1) + \sum_{k,l} \alpha_{kl} \iint d\tau_1\, d\tau_2\, f_k(\tau_1)\, f_l(\tau_2)\, x(t - \tau_1)\, x(t - \tau_2) + \dots$$

We optimize our estimate by minimizing the mean square error between
stimulus and stimulus estimate,

$$\chi^2 = \int dt\, \left| s(t - \tau_{\mathrm{delay}}) - s_{\mathrm{est}} \right|^2$$

Here, $s$ is the observed stimulus. We introduced the delay time
$\tau_{\mathrm{delay}}$ to account for the finite time between stimulus and
response, and hence stimulus estimate. The description of the kernels is
complete once the coefficients $\alpha$ are determined. We start with the
linear coefficients only and generalize afterwards. The condition

$$\frac{\partial \chi^2}{\partial \alpha_p} = 0$$

gives

$$\int dt\, s(t - \tau_{\mathrm{delay}}) \int d\tau_1\, f_p(\tau_1)\, x(t - \tau_1) = \sum_k \alpha_k \int dt \int d\tau_1\, f_k(\tau_1)\, x(t - \tau_1) \int d\tau_2\, f_p(\tau_2)\, x(t - \tau_2).$$

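With the shorthand

$$s \equiv s(t - \tau_{\mathrm{delay}}), \qquad R_p = \int d\tau\, f_p(\tau)\, x(t - \tau),$$

one can write

$$\mathrm{Corr}(s, R)_p = \sum_k \alpha_k\, \mathrm{Corr}(R, R)_{kp},$$

with

$$\mathrm{Corr}(s, R)_p = \int dt\, s \cdot R_p, \qquad \mathrm{Corr}(R, R)_{kp} = \int dt\, R_k \cdot R_p.$$
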
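By inversion this gives the solution

$$\alpha_p = \sum_k \mathrm{Corr}(s, R)_k \cdot \left[\mathrm{Corr}(R, R)\right]^{-1}_{kp}.$$

We can easily generalize this to the higher-order coefficients,

$$\alpha = \mathrm{Corr}(s, R) \cdot \left[\mathrm{Corr}(R, R)\right]^{-1},$$

where the vector $\alpha$ is

$$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_{11}, \alpha_{12}, \dots, \alpha_{22}, \dots),$$

and the vector $R$ is

$$R = \begin{pmatrix}
\int d\tau\, f_1(\tau)\, x(t - \tau) \\
\int d\tau\, f_2(\tau)\, x(t - \tau) \\
\vdots \\
\iint d\tau_1\, d\tau_2\, f_1(\tau_1)\, f_1(\tau_2)\, x(t - \tau_1)\, x(t - \tau_2) \\
\iint d\tau_1\, d\tau_2\, f_1(\tau_1)\, f_2(\tau_2)\, x(t - \tau_1)\, x(t - \tau_2) \\
\vdots
\end{pmatrix}$$
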
Of course, for computational purposes the series expansion must be truncated
after a finite number of terms. The quality of the results can be checked
by comparing them with the acausal filter. The acausal filter is derived in a
similar way and is the optimal linear approximation. For further explanation
see A.8.1 in Rieke et al (1997).
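For the linear term, the solution above reduces to a small linear system, and
a discrete-time sketch may make this concrete. The following is an
illustration only, not the procedure used for any data in this essay:
integrals are replaced by sums over bins, the two exponential basis functions
and all function names are invented for the example, and the stimulus is
constructed so that the correct coefficients are known in advance.

```python
import numpy as np


def basis_responses(spikes, basis):
    """R_k(t) = sum_tau f_k(tau) x(t - tau) for each causal basis function.

    spikes: (n_bins,) array, the binned spike train x.
    basis:  (n_basis, n_lags) array, row k holds f_k(tau) for tau = 0, 1, ...
    """
    n = len(spikes)
    # Keeping the first n samples of the full convolution keeps only lags
    # tau >= 0, i.e. the causal part.
    return np.array([np.convolve(spikes, f)[:n] for f in basis])


def linear_coefficients(stimulus, spikes, basis, delay_bins=0):
    """Solve Corr(s, R)_p = sum_k alpha_k Corr(R, R)_kp for alpha."""
    stimulus = np.asarray(stimulus, dtype=float)
    n = stimulus.size
    # Pair s(t - tau_delay) with R(t): drop the first delay_bins responses
    # and the last delay_bins stimulus samples.
    R = basis_responses(np.asarray(spikes, dtype=float), basis)[:, delay_bins:]
    s = stimulus[:n - delay_bins]
    corr_sR = R @ s        # Corr(s, R)_p
    corr_RR = R @ R.T      # Corr(R, R)_kp (symmetric)
    return np.linalg.solve(corr_RR, corr_sR)


# Toy check with an invented spike train and known coefficients.
rng = np.random.default_rng(0)
spikes = (rng.random(2000) < 0.05).astype(float)
lags = np.arange(50)
basis = np.array([np.exp(-lags / 5.0), np.exp(-lags / 20.0)])
true_alpha = np.array([2.0, -1.0])
stimulus = true_alpha @ basis_responses(spikes, basis)

alpha = linear_coefficients(stimulus, spikes, basis)
```

Second-order coefficients could be handled in the same way by appending the
products of pairs of rows of R as additional rows, at the price of a rapidly
growing correlation matrix, which is one reason for truncating the expansion.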
A.2 Encoding distinction formula
We transform the stimulus estimate into its frequency spectrum. Afterwards,
we will be able to tell which part of the response spectrum takes part in the
construction of a specific component of the stimulus estimate. In this way,
we can characterize the nature of the encoding.
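We know

$$s_{\mathrm{est}}(t) = \int_{-\infty}^{+\infty} d\tau_1\, h_1(\tau_1)\, x(t - \tau_1) + \int_{-\infty}^{+\infty} d\tau_1\, d\tau_2\, h_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2) + \dots \qquad (1)$$

with

$$h_1(\tau_1) = \sum_k \alpha_k f_k(\tau_1)$$

$$h_2(\tau_1, \tau_2) = \sum_{k,l} \alpha_{kl}\, f_k(\tau_1)\, f_l(\tau_2)$$

Taking the Fourier transform of each kernel, e.g.

$$H_1(\omega_1) = \int_{-\infty}^{+\infty} d\tau_1\, h_1(\tau_1)\, e^{i(\omega_1 \tau_1)}$$

$$H_2(\omega_1, \omega_2) = \int_{-\infty}^{+\infty} d\tau_1\, d\tau_2\, h_2(\tau_1, \tau_2)\, e^{i(\omega_1 \tau_1 + \omega_2 \tau_2)}$$

and writing $x(\omega)$ for the Fourier transform of the response, we obtain

$$s_{\mathrm{est}}(t) = F^{-1}\left[ H_1(\omega_1)\, x(\omega_1) + H_2(\omega_1, \omega_2)\, x(\omega_1)\, x(\omega_2) + \dots \right] \qquad (2)$$

and executing the inverse Fourier transform $F^{-1}$ yields (1). However, we can
evaluate (2) at one specific frequency $\omega$ and get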
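
$$\begin{aligned}
s_{\mathrm{est}}(\omega) = {} & H_1(\omega)\, x(\omega) + \int_{\omega = \omega_1 + \omega_2} d\omega_1\, d\omega_2\, H_2(\omega_1, \omega_2)\, x(\omega_1)\, x(\omega_2) \\
& + \int_{\omega = \omega_1 + \omega_2 + \omega_3} d\omega_1\, d\omega_2\, d\omega_3\, H_3(\omega_1, \omega_2, \omega_3)\, x(\omega_1)\, x(\omega_2)\, x(\omega_3) + \dots
\end{aligned}$$
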
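Substituting $\omega_2 = \omega - \omega_1$, and similarly for higher terms, one gets

$$s_{\mathrm{est}}(\omega) = H_1(\omega)\, x(\omega) + \int_{-\infty}^{+\infty} d\omega_1\, H_2(\omega_1, \omega - \omega_1)\, x(\omega_1)\, x(\omega - \omega_1) + \dots \qquad (3)$$
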
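We can group together the right hand side of (3) into the lower frequency
component up to $\omega$ and the upper frequency component above $\omega$:

$$\begin{aligned}
s_{\mathrm{est}}(\omega) = {} & \Big[\, H_1(\omega)\, x(\omega) \\
& \quad + 2 \int_0^{\omega} d\omega_1\, H_2(\omega_1, \omega - \omega_1)\, x(\omega_1)\, x(\omega - \omega_1) \\
& \quad + 4 \int_0^{\omega} \int_0^{\omega - \omega_1} d\omega_1\, d\omega_2\, H_3(\omega_1, \omega_2, \omega - \omega_1 - \omega_2)\, x(\omega_1)\, x(\omega_2)\, x(\omega - \omega_1 - \omega_2) \\
& \quad + \dots \Big] \\
& + \Big[\, 2 \int_{\omega}^{\infty} d\omega_1\, H_2(\omega_1, \omega - \omega_1)\, x(\omega_1)\, x(\omega - \omega_1) \\
& \quad + 4 \int_0^{\infty} \int_{\omega - \omega_1}^{\infty} d\omega_1\, d\omega_2\, H_3(\omega_1, \omega_2, \omega - \omega_1 - \omega_2)\, x(\omega_1)\, x(\omega_2)\, x(\omega - \omega_1 - \omega_2) \\
& \quad + \dots \Big]
\end{aligned}$$
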
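where the first group (labeled e.g. $s^{\mathrm{rate}}_{\mathrm{est}}(\omega)$) is sufficient
for rate encoding, while the second group $s^{\mathrm{temp}+}_{\mathrm{est}}(\omega)$ is
needed in addition to characterize temporal encoding schemes. Therefore,

$$s_{\mathrm{est}}(\omega) = s^{\mathrm{rate}}_{\mathrm{est}}(\omega) + s^{\mathrm{temp}+}_{\mathrm{est}}(\omega)$$
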
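The role of the second-order term can be made tangible with a small numerical
check. It is a sketch under simple assumptions: the "response" is an invented
sum of two cosines at 40 Hz and 50 Hz rather than a spike train, and the
quadratic term is represented by squaring it (a second-order kernel that acts
only at zero lag). The helper amplitude_at and the sampling rate fs are
hypothetical names for this illustration.

```python
import numpy as np

fs = 1000.0                    # sampling rate in Hz (assumed)
t = np.arange(1000) / fs       # 1 s of samples, so frequency bins are 1 Hz
x = np.cos(2 * np.pi * 40 * t) + np.cos(2 * np.pi * 50 * t)


def amplitude_at(signal, freq_hz, fs):
    """Amplitude of the Fourier component closest to freq_hz."""
    spectrum = np.fft.rfft(signal) / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - freq_hz)))
    return 2.0 * np.abs(spectrum[k])


# The response itself has no component at 10 Hz ...
linear_10 = amplitude_at(x, 10.0, fs)
# ... but its square does, built from the pair (50 Hz, -40 Hz).
quadratic_10 = amplitude_at(x ** 2, 10.0, fs)
```

Here a 10 Hz component of the estimate is produced entirely by response
components above 10 Hz, which is the situation collected in the second group
of the decomposition above.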
B Bibliography
[1] Abeles, M., H. Bergman, E. Margalit and E. Vaadia (1993). Spatiotemporal
firing patterns in the frontal cortex of behaving monkeys, J. Neurophys.
70:1629-1638
[2] Bialek, W., F. Rieke, R.R. de Ruyter van Steveninck and D. Warland (1991).
Reading a neural code, Science 252:1854-1857
[3] Borst, A. and F.E. Theunissen (1999). Information theory and neural
coding, Nature Neurosci. 2(11):947-957
[4] Darwin, C. (1994). Perception: Ear and Auditory Nerve,
www.biols.susx.ac.uk/home/Chris Darwin
[5] Engel, A.K., P. König, A.K. Kreiter, T.B. Schillen and W. Singer
(1992). Temporal coding in the visual cortex: New vistas on integration
in the nervous system, TINS 15:218-226
[6] Houtsma, A.J.M. and J.L. Goldstein (1972). The central origin of the
pitch of complex tones: Evidence from musical interval recognition, J.
Acoustical Soc. of America 51:520-529
[7] Houtsma, A.J.M. and J. Smurzynski (1990). Pitch identification and
discrimination for complex tones with many harmonics, J. Acoustical
Soc. of America 87:304-310
[8] Lemon, W. and W. Getz (2000). Rate code input produces temporal
code output from cockroach antennal lobes, Biosystems 58:151-158
[9] Mainen, Z.F. and T.J. Sejnowski (1995). Reliability of spike timing in
neocortical neurons, Science 268:1503-1506
[10] Panzeri, S., S.R. Schultz, A. Treves and E.T. Rolls (1999). Correlations
and the encoding of information in the nervous system, Proceedings
of the Royal Society B
[11] Reinagel, P. and R.C. Reid (2000). Temporal coding of visual
information in the thalamus, J. Neurosci. 20(14):5392-5400
[12] Riehle, A., S. Gruen, M. Diesmann and A. Aertsen (1997). Spike
synchronization and rate modulation differentially involved in motor
cortical function, Science 278:1950-1953
[13] Rieke, F., D. Warland, R.R. de Ruyter van Steveninck and W. Bialek
(1997). Spikes: exploring the neural code. Cambridge, MA: MIT Press
[14] Shadlen, M.N. and W.T. Newsome (1998). The variable discharge of
cortical neurons: implications for connectivity, computation and
coding, J. Neurosci. 18(10):3870-3896
[15] Shannon, C.E. (1948). A mathematical theory of communication, Bell
Sys. Tech. J. 27:379-423, 623-656 (republished at cm.bell-labs.com)
[16] Strong, S.P., R. Koberle, R.R. de Ruyter van Steveninck and W. Bialek
(1998). Entropy and information in neural spike trains, Phys Rev Lett
80:197-200
[17] Theunissen, F. and J.P. Miller (1995). Temporal encoding in nervous
systems - a rigorous definition, J. Comput. Neurosci. 2:149-162
[18] Victor, J.D. (2002). Binless strategies for estimation of information
from neural data, Phys Rev E 66, 051903
Acknowledgment
I would like to express gratitude to my supervisor Dr. Stuart Baker whose
guidance was crucial for the successful completion of this project.
Comments
This paper was written as an essay for Part III in mathematics. The main
difficulty was defining the problem. After working through several books
(Rieke et al., Dayan and Abbott, Gerstner) and consulting Dr Baker I singled
out the question of time coding. The most important papers (which I found)
were those of Theunissen and Miller (1995), Strong et al. (1998), Borst and
Theunissen (1999) and Reinagel and Reid (2000). The derivation in A.1
includes some arguments of my own but is related to A.8.2 in Rieke et al
(1997). The derivation in A.2 is similar to Theunissen and Miller (1995).