
Information Encoding in Small Neural Systems

Other, 2003, 28 Pages
Author: Creutzig, Felix
Subject: Neurobiology

Details

Category: Other
Year: 2003
Pages: 28
Language: English
Archive No.: V107825
ISBN (E-book): 978-3-640-06048-1

File size: 400 KB
Notes:
27 pages.


Abstract

The nervous system must encode and process information using the action potentials of neurones. There is considerable controversy as to how this is achieved. Most neuroscientists assume that the rate of discharge of action potentials is all that carries information; however, more recent work suggests that the precise timing of the discharge events could also carry information. The essay will address the theoretical basis for these two contrasting views of neural encoding. It may also address how experimental data can be used to address this problem, since the analysis and interpretation of such data is far from straightforward.


Fulltext (computer-generated)

Information Encoding in Small Neural Systems

Felix Creutzig

April 8, 2003


Contents

1 Introduction
2 Experimental results
2.1 Phase locking
2.2 Reliability with fluctuating stimulus
2.3 Time precision in the LGN
3 Temporal encoding
3.1 Time encoding window
3.2 Stimulus reconstruction
3.3 Distinction between coding strategies
4 Information Theory
4.1 General Principles
4.2 Applying the direct method
4.3 Quantifying time resolution
4.4 Temporal patterns
5 Discussion
5.1 Outlook
A Appendix
A.1 Calculation of the reconstruction filter
A.2 Encoding distinction formula
B Bibliography


Abstract

The nervous system must encode and process information using the action

potentials of neurones. There is considerable controversy as to how this is

achieved. Most neuroscientists assume that the rate of discharge of action

potentials is all that carries information; however, more recent work suggests

that the precise timing of the discharge events could also carry information.

The essay will address the theoretical basis for these two contrasting views

of neural encoding. It may also address how experimental data can be used

to address this problem, since the analysis and interpretation of such data is

far from straightforward.


1 Introduction

One of the most fundamental questions of neuroscience is how the brain encodes information. This paper aims firstly at clarifying the notions of rate and time coding and secondly at explaining how information theory can quantify the contribution of different coding strategies. This quantification will show the relevance of time coding strategies. Furthermore, we will identify the significant questions with respect to time coding.

Perception, awareness and behavioural output are all represented in spike trains of neurons. Hence, an understanding of the neural code in spike trains is one of the key questions. Moreover, the quality of the neural code can reveal the degree to which biological systems are optimised. Here, we will concentrate on the more restricted question of the nature of the neural code. What kind of complex features in the spike train go beyond the mean firing rate? What is their role?

We will start with a short discussion of the concepts of rate and time coding, showing their equivalence according to the conventional definition. In the second part, we will review experiments indicating the pertinence of precise timing. However, using the stimulus reconstruction analysis technique, we will identify a time encoding which goes beyond precise timing. In the last part, we will use information theory to quantify the coding efficiency with respect to the time resolution. Additionally, we will see that a certain part of the information is carried by temporal patterns.

Conventionally, the rate of a spike train is defined by counting the spikes in a certain time window. Admittedly, the choice of the time window is arbitrary and hence, by choosing small time windows, the measure attains a high time resolution. In general, a stimulus s(t) causes a response, the spike train. The spike train is described completely by the arrival times of the spikes, t1, t2, . . . , tn, or in short {ti}. Using the language of probability, a complete characterization of the neural response is contained in the conditional probability distribution P [{ti}|s(t)], which measures the likelihood that spikes will arrive at the set of times {t1, t2, . . . , tn} given a certain stimulus s(t). Clearly, we can use this as a characterization of a time code in a reasonable sense. But this conditional distribution is closely related to time dependent rate coding. More precisely, we define the counting function

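$$n(t) = \sum_i f\!\left(\frac{t - t_i}{\Delta\tau}\right)$$

where

$$f(x) = \begin{cases} 1 & \text{if } -\tfrac{1}{2} \le x \le \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

so that $f$ counts whether a particular spike of the spike train $\{t_i\}$ has occurred in the time bin of width $\Delta\tau$ centered on time $t$. The time dependent rate is defined by dividing the counting function by the width of the time bin $\Delta\tau$ and taking the limit of vanishing bin width after averaging $n(t)$:

$$r[t; s(\tau)] \equiv \lim_{\Delta\tau \to 0} \frac{1}{\Delta\tau} \langle n(t) \rangle$$

In this limit, $f((t - t_i)/\Delta\tau)/\Delta\tau$ becomes a Dirac delta function. Hence,

$$r[t; s(\tau)] = \left\langle \sum_i \delta(t - t_i) \right\rangle = \sum_N \int_0^T dt_1 \int_0^T dt_2 \cdots \int_0^T dt_N \, P[\{t_i\}|s(\tau)] \sum_{i=1}^{N} \delta(t - t_i)$$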

For the last step, we observed that we have to account for the randomness of spike trains. This randomness is described by a probability distribution, which is the conditional probability distribution $P[\{t_i\}|s(\tau)]$ in this case. We have to integrate over all possible arrival times in order to average over arrival times. Additionally, we sum over all possible numbers of events $N$.
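The counting-function definition of the rate can be tried out on a handful of spike times. The following is a minimal sketch, assuming spike times in seconds and a finite bin width instead of the limit of vanishing width; the function names and the two example trials are hypothetical and only serve as illustration.

```python
def box(x):
    """Box function f(x): 1 inside [-1/2, 1/2], 0 otherwise."""
    return 1 if -0.5 <= x <= 0.5 else 0


def spike_count(t, spike_times, bin_width):
    """Counting function n(t): spikes falling in the bin centered on t."""
    return sum(box((t - t_i) / bin_width) for t_i in spike_times)


def firing_rate(t, trials, bin_width):
    """Trial-averaged count divided by the bin width (finite-bin rate estimate)."""
    mean_count = sum(spike_count(t, trial, bin_width) for trial in trials) / len(trials)
    return mean_count / bin_width


# Two made-up trials (spike times in seconds), purely illustrative.
trials = [[0.010, 0.050], [0.011, 0.090]]
rate_at_10ms = firing_rate(0.010, trials, bin_width=0.004)
print(rate_at_10ms)  # rate in spikes per second at t = 10 ms
```

Averaging over trials stands in for the average over the conditional distribution; with more trials the bin width can be made smaller without the estimate becoming too noisy.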

The rate is the mean of the conditional distribution which describes the arrival times. Therefore the rate is time dependent and hence can track changes in stimulus parameters with high time precision. However, higher moments of the distribution may carry additional information. Later, we will show how to quantify this extra amount of information. But we will begin with experimental evidence of 'time coding' and continue with a proper distinction between time encoding and time dependent rate coding.

2 Experimental results

In this chapter we will present evidence for the importance of precise timing.

2.1 Phase locking

Figure 1: Phase locking. The neuron fires with highest probability at high stimulus amplitude. Thus, interspike intervals are often integer multiples of the stimulus period as illustrated in the picture. At high frequencies phase locking becomes ambiguous. Source: Darwin, 1994

Phase locking is a typical example of a time dependent firing rate. The requirement is that the stimulus is periodic in time, e.g. a sinusoidal pressure wave at the eardrum (see figure 1), which is a pure tone. Each auditory neuron has a characteristic frequency, at which the least energy is needed to stimulate it. At the appropriate frequency the firing rate of the neuron changes with the phase of the stimulus. Hence the probability of firing peaks once per cycle, i.e. every $2\pi$ in phase. Naturally, this does not mean that the neuron fires in every period. In fact, the time dependent firing rate is

$$r(t) = r_0 + A \sin(\omega t + \phi)$$

where the case $A > r_0$ is subject to half wave rectification. Though the mean firing rate (over large time windows) is constant until a certain sound intensity threshold is reached, the firing is phase locked and the frequency can be decoded from the timing of the spikes. At low frequencies, the intensity of sound is the same at both ears and location can only be decoded from time differences in auditory neurones. At high frequencies, the time differences become ambiguous but shadowing effects give information about the sound location.
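The sinusoidal rate with half wave rectification can be sketched in a few lines. This is an illustration only: the parameter values (baseline 20 spikes/s, 100 Hz tone) are assumptions chosen for the example and are not taken from any recording, and the function names are hypothetical.

```python
import math


def phase_locked_rate(t, r0, amplitude, freq_hz, phase=0.0):
    """Rate r(t) = r0 + A*sin(omega*t + phi), clipped at zero (half wave rectification)."""
    rate = r0 + amplitude * math.sin(2.0 * math.pi * freq_hz * t + phase)
    return max(0.0, rate)


def mean_rate_over_period(r0, amplitude, freq_hz, samples=1000):
    """Average of the rate over one stimulus period, sampled on a regular grid."""
    period = 1.0 / freq_hz
    total = sum(
        phase_locked_rate(k * period / samples, r0, amplitude, freq_hz)
        for k in range(samples)
    )
    return total / samples


# Without rectification (A <= r0) the mean rate over a period equals the baseline r0.
print(mean_rate_over_period(r0=20.0, amplitude=10.0, freq_hz=100.0))
# With A > r0 the negative half waves are cut off, which raises the mean above r0.
print(mean_rate_over_period(r0=20.0, amplitude=50.0, freq_hz=100.0))
```

The first case illustrates why a mean rate measured over long windows can stay constant while the spike timing still follows the stimulus phase.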

In this context it is worth noting that the mammalian ear encodes frequency to a certain extent by locus: specific small frequency ranges activate specific neurons by tectorial membrane deflection. Pitch is encoded in higher cortical regions. Proof of this is that it is sufficient to present successive harmonics to different ears in order to reconstruct the fundamental (Houtsma and Goldstein, 1972). But if only high harmonics (of order > 12) are presented, coincident phases support reconstruction (Houtsma and Smurzynski, 1990). Hence the purpose of phase locking is not only to locate sounds by measuring phase differences; phase locking also contributes to the identification of pitch.

2.2 Reliability with fluctuating stimulus

Figure 2: Time precision of firing pattern with constant (A) and fluctuating (B) stimulus. The constant stimulus, with little information to be encoded, evokes an irregular firing behaviour. The repeated fluctuating stimulus evokes the same response with high reliability. Top: superimposed responses, 10 trials. Bottom: raster plot of spike trains, 25 trials. Source: Mainen and Sejnowski, 1995

Mainen and Sejnowski (1995) found strikingly different spiking behaviour in a cortical neuron depending on the stimulus. A constant current pulse evoked an irregular response with high variability: the same neuron was stimulated with the same pulse 25 times but showed significantly different spike train patterns (figure 2, A). However, when the neuron was stimulated repeatedly with a fluctuating current (Gaussian white noise, i.e. with a continuous and uniform frequency spectrum), with the same waveform applied in all trials, the spike patterns were reproduced with high reliability and high time precision (figure 2, B). This experiment demonstrated that the behaviour of cortical neurons is consistent with time dependent coding strategies. In particular, the fluctuating stimulus is a more natural and realistic condition than the constant pulse. Consequently, at least in this case, neurons work with higher reliability under natural conditions than under simple but unrealistic conditions.

2.3 Time precision in the LGN

Figure 3: Temporal coding in the LGN. Different unique stimuli yield a flat average spike count per time bin, which is represented in the PSTH. In contrast, the repeated stimulus displays prominent peaks. Source: Reinagel and Reid, 2000

The lateral geniculate nucleus (LGN) of the thalamus is part of the visual pathway and forwards visual information to the cortical level (primary visual cortex, V1). For accurate perception, reliable transmission is expected. Reinagel and Reid (2000) analysed the statistical features of LGN spike trains.

Spatially uniform visual stimuli with random time varying luminance were presented, while spike trains from 11 well isolated individual neurons in the LGN of anaesthetized cats were recorded. One white noise stimulus was repeated 128 times and the recordings were compared with recordings from unique stimuli. For this comparison, the average number of spikes in each time bin is counted and plotted as the peristimulus time histogram (PSTH). Figure 3 shows the PSTH of the unique stimuli at the top and, below it, the PSTH of the repeated stimulus, with the stimulus waveform drawn underneath. Some peaks of the lower PSTH have a width of 1 ms, hence a given stimulus is encoded with high time precision.

We will return to the quantitative analysis of this experiment in chapter 4.

3 Temporal encoding

Let us now introduce an alternative and mathematically rigorous concept of a time code, called time encoding, which was developed by Theunissen and Miller (1995). First of all, we introduce the idea of the time encoding window. Based on this, we can distinguish between rate and time encoding by comparing the frequency spectra of signal and response (spike train). This distinction reveals intrinsic time scales of the neural system as well as alternative encoding and decoding mechanisms.

3.1 Time encoding window

The time encoding window is defined as the duration of a neuron's spike train which corresponds to a single symbol in the neural code. A signal where the time scale of changes in the stimulus parameter is much longer than the time scale of the behavioral response can be regarded as nearly stationary. In order to increase acuity, the time encoding window should be as large as possible. In this scenario, the time scale of the behavioral response is the limiting factor. One example is the coding of the shape of a stationary object by the visual system.

In contrast, for a constantly moving visual stimulus the time scale of variation of the stimulus parameters can be shorter than the behavioral response or decision time. In this case, the size of the encoding time window is limited by the rate at which the relevant stimulus parameters are changing. In fact, the intrinsic time scale of neural computation is the limiting factor and the dynamic variation in the stimuli can only be encoded up to a certain rate. As an example, suppose a neuron is to encode a 100 Hz component of a signal. One period is then 10 ms and a time encoding window of 5 ms is needed. The sum of the counts in two neighbouring windows gives an estimate of the amplitude, and the ratio of the counts in these two windows gives the phase. This requirement of at least two samples per period is the content of the Nyquist sampling theorem.
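The arithmetic behind the 100 Hz example can be written out as a short worked equation. The symbols $T_s$ for the signal period and $T_w$ for the encoding time window are introduced here only for illustration and do not appear in the original text:

```latex
T_s = \frac{1}{f} = \frac{1}{100\,\mathrm{Hz}} = 10\,\mathrm{ms},
\qquad
T_w = \frac{T_s}{2} = \frac{1}{2f} = 5\,\mathrm{ms}
```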

For each spike, we can determine the preceding stimulus waveform. By averaging over many preceding stimulus waveforms, we construct by definition the reverse correlation function. The duration of the reverse correlation function corresponds to the integration window, which is defined as the time preceding a certain point in the response pattern in which a change in the signal significantly affects this response pattern. Physiological integration processes are responsible for this integration window. Taking the 100 Hz signal from above, the reverse correlation function spans several periods of the signal, but the encoding time window is only half a period. We see that the encoding time window can be much shorter than the integration window.

In fact, the integration time gives an upper bound on the time span over which variations of the stimulus are correlated with the response function. The integration time can be regarded as an electrophysiological reaction time, and hence the encoding time window is always smaller than the integration time window.
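The reverse correlation function described above is an average of stimulus segments preceding each spike. A minimal sketch follows, assuming a stimulus sampled on a regular grid and spikes given as sample indices; the function name, the toy stimulus and the convention that the window ends just before the spike sample are assumptions made for this example.

```python
def spike_triggered_average(stimulus, spike_indices, window):
    """Average the `window` stimulus samples that precede each spike.

    Spikes that occur too early to have a full preceding window are skipped.
    """
    segments = [stimulus[i - window:i] for i in spike_indices if i >= window]
    if not segments:
        raise ValueError("no spike has a full preceding stimulus window")
    n = len(segments)
    return [sum(segment[k] for segment in segments) / n for k in range(window)]


# Toy stimulus (a ramp) with spikes at sample indices 3 and 6.
stimulus = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
sta = spike_triggered_average(stimulus, spike_indices=[3, 6], window=2)
print(sta)  # average of the segments [1, 2] and [4, 5]
```

The number of samples over which this average differs noticeably from the mean stimulus gives a practical estimate of the integration window.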

For further analysis of the meaning of the encoding time window, we need

the method of stimulus reconstruction which is a recently developed tool of

stochastic system analysis.

3.2 Stimulus reconstruction

In general, the methods from stochastic systems analysis give a relationship

between stimulus and response.

Natural stimuli have many different frequency components and each of them might be encoded in a corresponding encoding time window. The analysis gets very complex as each spike could be part of all the different encoding time windows. The conventional approach of stochastic system analysis is to construct an operator which transforms the signal into the response. Usually, the response is given in the form of the probability distribution of the spike train given the stimulus, P [{ti}|s(t)]. Of course, for this construction, we have to average over many trials. But the organism has only one single spike train as a sample and has to decide on this basis. We would like to know more about the predictive power of this single spike train. Furthermore, which part of the frequency spectrum of the spike train is relevant for reconstructing the stimulus? If we do not know the encoding model, the reverse approach must be taken. This approach was developed by Bialek et al (1991). The aim is to transform the spike train into an optimal estimate of the stimulus. The idea is that we treat the neural system as a black box.


We write the estimate of the stimulus $s_{est}$ in the linear form

$$s_{est}(t) = \int d\tau \, h_1(\tau) \sum_{i=1}^{N} \delta(t - t_i - \tau) = \sum_{i=1}^{N} h_1(t - t_i)$$

where $h_1(\tau)$ is the linear response function. More generally,

$$s_{est}(t) = \sum_i h_1(t - t_i) + \sum_{ij} h_2(t - t_i, t - t_j) + \dots$$

and $h_n$ associates the output (response) to the input (stimulus) in nth order. Optimising the estimate is done by calculating the minimum of the error function

$$E[s(t), s_{est}(t)] = \left\langle |s(t) - s_{est}(t)|^2 \right\rangle \equiv \chi^2$$

The solution of this problem is sketched in Appendix A.1.
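The first-order estimate is simply a sum of copies of the filter placed at each spike time. The sketch below illustrates this with a Gaussian kernel, which is a hypothetical stand-in chosen for the example and not the optimal filter derived in Appendix A.1; the error function is approximated by a mean squared error over sampled values.

```python
import math


def gaussian_kernel(tau, width=0.005):
    """Hypothetical filter h1(tau); a placeholder, not the optimal filter."""
    return math.exp(-0.5 * (tau / width) ** 2)


def linear_estimate(t, spike_times, h1):
    """First-order stimulus estimate: sum of h1(t - t_i) over all spikes."""
    return sum(h1(t - t_i) for t_i in spike_times)


def mean_squared_error(stimulus, estimate):
    """Sampled version of the error function <|s - s_est|^2>."""
    return sum((s - e) ** 2 for s, e in zip(stimulus, estimate)) / len(stimulus)


# Estimate at t = 10 ms from two spikes at 10 ms and 15 ms (times in seconds).
value = linear_estimate(0.010, [0.010, 0.015], gaussian_kernel)
print(value)
```

In an actual reconstruction the kernel would be chosen to minimise the error, rather than fixed in advance as it is here.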

In particular, with this method we can analyze the encoding of each frequency component of the stimulus independently of all other frequency components and hence solve the problem of the multiple overlapping encoding time windows. To see this, we transform the optimal estimate found in A.1 into the frequency domain (see A.2). In this description, we can quantify the contribution of each part of the spike train's frequency spectrum to each frequency component of the signal.

3.3 Distinction between coding strategies

The distinction between rate and temporal encoding is based on the comparison between the frequency spectra of signal and response. From the discussion above, we know that each stimulus frequency needs a proper encoding time window. The size of this encoding time window is half of the corresponding period, e.g. 5 ms for a 100 Hz component.

If a certain frequency component of the signal is related only to the mean spike count of the corresponding encoding time window, we call this rate encoding. Using the stimulus reconstruction, rate encoding is defined as follows: each frequency component of the signal can only be composed of terms that involve the same or lower frequency components of the spike train response pattern. From the point of view of the encoding distinction formula (A.2) this means that the term containing the higher frequency components of the response pattern gives no significant contribution to the frequency component of the optimal stimulus estimate. Evidently, the number of spikes in each encoding time window does not have to depend linearly on the stimulus parameter, and hence in general we have a nonlinear rate encoding scheme.

In contrast, a temporal encoding scheme is one with additional correlation between a frequency component of the stimulus and patterns in the response function on a time scale shorter than the encoding time window. In this case, the second part of the distinction formula of the optimal estimate in A.2 gives a significant contribution to the description of the stimulus parameter. In other words, the time dimension of the spike train is really used for information encoding and does not only represent the time scale of stimulus changes. In general, both information which is not temporal in nature and information which is temporal in nature can be encoded using the temporal dimension. The second possibility would imply a rescaling of time scales.

An interesting example of these encoding schemes is described by Lemon and Getz (2000). In the cockroach Periplaneta americana, olfactory information is represented by a short time scale rate code in olfactory sensory neurons. A relatively small number of projection neurons carry this information to higher brain areas. Surprisingly, spike train analysis suggests that these projection neurons carry a higher rate of information by using a temporal encoding scheme.

We emphasize that high temporal precision is not the same as temporal encoding. High temporal precision in the spike train can instead mean encoding of high frequency components of the stimulus. The phase locking example (2.1) is clearly a rate code, as temporal patterns on time scales shorter than the period corresponding to the frequency of the stimulus do not occur. In contrast, the example with data from LGN neurons (2.3) has a high time resolution of up to 1 ms, whereas the stimulus was updated only every 7.8 ms. Consequently, we have now discovered a time encoding scheme. Nevertheless, one should also take into account that sharp edges in luminance occur at the frame boundaries. These edges correspond to high frequency components in the frequency domain. In any case, we will see below that even temporal encoding cannot account for all information in this data.

Of course, following the stimulus changes with high time resolution reduces the size of the encoding time window. When taking into account the intrinsic duration of an action potential (about 1 ms) and the absolute refractory period (about 1 ms), variations of temporal patterns inside such a small encoding time window are no longer noticeable.

In ensembles of cells, spike train patterns are not limited by refractory periods and in fact the number of possible patterns is much greater. Experiments conducted on behaving monkeys (Abeles, 1993) and on anesthetized cats (Engel, 1992) suggest temporal encoding in spike train patterns. Leaving this most interesting topic aside, we continue our review with the analysis of the single neuron's behaviour.

4 Information Theory

How can the vague notion of information be described in mathematical terms? Also, how can the information content of a specific spike train be evaluated? First of all, I will give a rough answer to those questions. Subsequently, I will introduce the concept of mutual information. We will use the direct method to estimate the information of the spike trains measured in the LGN (see 2.3). We will see how the coding efficiency depends on the time resolution. This will give us a quantitative estimate of the significance of precise timing. Moreover, we will also quantify the information coded in temporal patterns.

4.1 General Principles

The foundations of information theory were developed by Shannon in 1948. He derived a measure of uncertainty or entropy. This corresponds to information as information is, roughly speaking, a decrease in uncertainty.

Each spike train {ti} has a probability P [{ti}] of being observed. A neuron has a certain set of possible spike trains. When we measure a spike train, we gain information proportional to the 'surprise' of observing this particular spike train out of the set of all possible spike trains. Hence we expect entropy and information to be a decreasing function of P [{ti}]. Additionally, when recording two spike trains $\{t_i^1\}$ and $\{t_i^2\}$ from two independent neurons, we expect the gain of information to be additive. The probability of observing these two spike trains is $P[\{t_i^1\}]P[\{t_i^2\}]$ and therefore the additivity condition gives for the information I

$$I(P[\{t_i^1\}]P[\{t_i^2\}]) = I(P[\{t_i^1\}]) + I(P[\{t_i^2\}])$$

The logarithm is characterized by monotonicity and additivity; hence these two conditions determine the information measure completely, up to an arbitrary constant factor, which is equivalent to the choice of the base. Conventionally, information is defined in units of bits

$$I(P[\{t_i\}]) = -\log_2 P[\{t_i\}]$$


Figure 4: Binning. A binary string code is attributed to a spike train. This is one

approach to make the neural response measurable. Source: Strong et al, 1998

and the entropy according to Shannon is this measure, averaged over all possible responses

$$H = -\sum_{\{t_i\}} P[\{t_i\}] \log_2 P[\{t_i\}]$$
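Both properties required of the information measure, decrease with probability and additivity for independent events, can be checked numerically. The following is a small sketch; the function names and the example probabilities are made up for illustration.

```python
import math


def surprise_bits(p):
    """Information gained on observing an event of probability p, in bits."""
    return -math.log2(p)


def entropy_bits(probabilities):
    """Shannon entropy: the surprise averaged over all possible responses."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Additivity: two independent events with probabilities 1/2 and 1/4.
joint = surprise_bits(0.5 * 0.25)
separate = surprise_bits(0.5) + surprise_bits(0.25)
print(joint, separate)

# Entropy of a toy response set with three possible spike trains.
print(entropy_bits([0.5, 0.25, 0.25]))
```

Terms with zero probability are skipped in the entropy sum, following the usual convention that they contribute nothing.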

In theory, the entropy, which measures the range of all possible spike trains, would give us the information capacity of a neuron. But we have already observed in experiments that a repeated stimulus does not lead to exactly the same response every time; there is a certain variability in the spike trains. This variability is called noise and limits the information capacity of a neuron. Therefore, we can find the true information capacity $I_m$ by subtracting the entropy of the noise $H_{noise}$ from the full response entropy $H$

$$I_m = H - H_{noise}$$

The quantity $I_m$ is also called mutual information. The noise entropy is calculated by averaging over all possible responses at a given stimulus $s$

$$H_{noise} = -\sum_{\{t_i\}} P[\{t_i\}|s] \log_2 P[\{t_i\}|s]$$

and the mutual information is

$$I_m = H - H_{noise} = -\sum_{\{t_i\}} P[\{t_i\}] \log_2 P[\{t_i\}] + \sum_{\{t_i\}} P[\{t_i\}|s] \log_2 P[\{t_i\}|s]$$
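The subtraction of the noise entropy from the response entropy can be illustrated with a toy neuron that has only two possible responses. The probabilities below are invented for the example, and the sketch follows the formula above in using the response distribution for a single given stimulus; the function names are hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def mutual_information_bits(p_response, p_response_given_stimulus):
    """I_m = H - H_noise for one response distribution and one noise distribution."""
    return entropy_bits(p_response) - entropy_bits(p_response_given_stimulus)


# Overall, both responses are equally likely (H = 1 bit). Given the stimulus,
# one response occurs with probability 0.9, so some uncertainty remains.
info = mutual_information_bits([0.5, 0.5], [0.9, 0.1])
print(info)  # less than 1 bit because of the noise
```

If the response to the stimulus were perfectly reproducible, the noise entropy would vanish and the mutual information would equal the full response entropy.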


For practical computation of these quantities, we have to write down all measured spike trains. One method is to divide the time axis into small time bins of size $\Delta\tau$. Whenever a spike occurs in a particular time bin, the value 1 is assigned to this bin. Otherwise the bin is labeled with the value 0 (see figure 4).

This characterization of the entropy depends not only on the time resolution $\Delta\tau$ but also on the length T of the spike train being considered.
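The binning procedure can be sketched as follows. The function name, the millisecond units and the example spike times are assumptions made for illustration; a bin is set to 1 if at least one spike falls into it.

```python
def binarize(spike_times, duration, bin_size):
    """Turn spike times into a list of 0/1 letters, one per time bin.

    All arguments share the same time unit (milliseconds in the example below).
    """
    n_bins = int(round(duration / bin_size))
    bins = [0] * n_bins
    for t in spike_times:
        index = int(t // bin_size)
        if 0 <= index < n_bins:
            bins[index] = 1
    return bins


# Four spikes in a 10 ms stretch, binned at 2 ms resolution.
letters = binarize([1.0, 2.5, 2.7, 7.9], duration=10, bin_size=2)
print(letters)
```

Two spikes that fall into the same bin are indistinguishable after binning, which is one way in which the chosen time resolution limits the measured entropy.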

The fundamental problem of finding the mutual information is the large amount of data needed to specify the relevant probability distributions. Recently, Strong et al (1998) explained how to achieve a good approximation of entropy and mutual information (see below).

4.2 Applying the direct method

Three general approaches can help to estimate the information in spike trains, giving us a lower bound, an upper bound and a direct estimate of the information.

The first approach was described by Bialek et al (1991) and is extensively discussed in Rieke et al (1997). They derived a lower bound of the information rate $R_{info}$ which is based on the signal to noise ratio (SNR) of the stimulus estimate:

$$R_{info} = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} \log_2 [1 + SNR(\omega)]$$

This procedure relies on the stimulus reconstruction method described above, which gives us an optimal stimulus estimate. As the stimulus estimate is derived from the response, the stimulus estimate cannot contain more information about the stimulus than the response. This is called the data processing inequality. Hence by calculating the information in the stimulus estimate we derive a lower bound for the information in the response. For the signal to noise ratio, we take $s_{est}$ as the signal and $n = s - s_{est}$ as the noise.

Second, when assuming that neuronal response and neuronal noise are independent and both Gaussian distributed, we can obtain an upper bound of the information. Here, the mean neuronal response is taken as the signal and the deviations of each individual response from the mean as the noise. With this, we can use the formula above again. The information obtained is an upper bound because a Gaussian distribution has the maximal possible entropy for a given variance.

In particular, the ratio between lower and upper bound quantifies the quality of the model we used to derive the optimal estimate. Moreover, we can identify those stimulus parameters (frequencies) which are encoded preferentially (using the Fourier transform of sest, A.2). For derivations of the formulas see Rieke et al (1997) and Borst and Theunissen (1999).

Nonetheless, here we will concentrate on the direct method as specified by Strong et al (1998), which measures the information directly. This approach is simpler as we do not need to identify the relevant stimulus parameters and can use spike train statistics only. The direct method is more satisfying than deriving bounds only, as it estimates the information itself. On the other hand, it can be difficult to accumulate a sufficient amount of data.

Figure 5: Entropy rate against the reciprocal word length. At high word length,

the entropy rate estimation breaks down due to insufficient data. The true entropy

rate is obtained by extrapolating the more reliable data. Source: Reinagel and Reid,

2000

The estimate of the information depends crucially on the construction of spike train words, in which the binary states of the time bins correspond to the single letters. T is still the entire duration of the spike train and is assumed to be very large. In addition, we define Q(t), a sequence of duration L consisting of $L/\Delta\tau$ zeros and ones. Q(t) is called a word. Here, t denotes the time of the first bin of the word Q. The probability that a word Q occurs at any time during the entire spike train is P(Q). We would like to measure the information independently of the length of the sequence and hence introduce the information rate $\dot{H} = H/L$. Then

$$\dot{H}(L, \Delta\tau) = -\frac{1}{L} \sum_Q P(Q) \log_2 P(Q)$$

which we call the word length dependent entropy estimate.
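The word length dependent entropy estimate can be computed directly from a binned spike train. The following is a minimal sketch, assuming that words are read off at every possible start bin (a sliding window) and that word probabilities are estimated by relative frequencies; the function name and the toy sequences are hypothetical.

```python
from collections import Counter
import math


def word_entropy_rate(bins, letters_per_word, bin_size):
    """Entropy of words of `letters_per_word` bins, divided by the word duration."""
    n = letters_per_word
    # One word per possible start position in the binned spike train.
    words = [tuple(bins[i:i + n]) for i in range(len(bins) - n + 1)]
    counts = Counter(words)
    total = len(words)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / (n * bin_size)


# Alternating letters at 1 ms resolution: single-letter words carry 1 bit each.
rate_single = word_entropy_rate([0, 1, 0, 1], letters_per_word=1, bin_size=0.001)
# A silent spike train has only one possible word and therefore zero entropy.
rate_silent = word_entropy_rate([0] * 10, letters_per_word=3, bin_size=0.001)
print(rate_single, rate_silent)
```

With real data the number of distinct words grows quickly with the word length, so the relative frequencies become unreliable for long words unless the recording is very long.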

However, if there is any correlation between successive intervals, e.g. between Q(t) and Q(t + L), then part of the information of Q(t + L) can be predicted by Q(t) and vice versa. Hence, our measure includes redundant information and our calculation of the rate $\dot{H} = H/L$ gives an upper bound of the information rate. Admittedly, the upper bound depends on L, and with $L \to \infty$ the redundancy becomes less and less important. Basically, we can neglect such 'neighbouring' effects because the boundary of a d-dimensional object scales with its size to the power d-1, and a word behaves like a string, which is a one-dimensional object, so that boundary effects become negligible relative to the word length. On the other hand, with larger L we need more data in order to specify the probability distribution P(Q) properly.

The following explicit calculations are based on data from the LGN neurons as described in 2.3. In figure 5, the entropy rate $\dot{H}(\Delta\tau = 0.6\,\mathrm{ms}, L)$ has been calculated for different word lengths and plotted against 1/L. We can observe a linear dependence of the entropy rate on 1/L for small L. When L grows larger than 12 ms (which corresponds to 20 bins of 0.6 ms width) the dependence changes due to the sampling problem: there is not enough data available. But we can extrapolate to $L \to \infty$ using

$$\frac{S(\Delta\tau, L)}{L} = S(\Delta\tau) + \frac{C(\Delta\tau)}{L} + \dots$$

where $S(\Delta\tau, L)$ is the total entropy of words of length L, so that the left-hand side is the entropy rate estimate, and $C(\Delta\tau)$ is a constant independent of L. By inserting the entropy estimates obtained with sufficient data (i.e. when 1/L is sufficiently large), we can extrapolate the entropy estimate to infinitely large word length and thus find the true entropy rate $S(\Delta\tau)$.
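The extrapolation to infinite word length amounts to fitting a straight line to the entropy rate as a function of 1/L and reading off the intercept at 1/L = 0. A minimal sketch using an ordinary least-squares fit is given below; the fit method and the synthetic numbers are assumptions for illustration, not values from the LGN data.

```python
def extrapolate_entropy_rate(word_lengths, entropy_rates):
    """Fit rate = intercept + slope / L and return (intercept, slope).

    The intercept is the extrapolated entropy rate for infinite word length.
    """
    xs = [1.0 / length for length in word_lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(entropy_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, entropy_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic entropy rates that follow 100 + 30 / L exactly (made-up numbers).
lengths = [2.0, 4.0, 8.0]
rates = [100.0 + 30.0 / length for length in lengths]
intercept, slope = extrapolate_entropy_rate(lengths, rates)
print(intercept, slope)
```

Only word lengths in the well-sampled regime should enter the fit, since the estimates for long words are biased by insufficient data.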

For $H_{noise}$, one can apply the same method using data from repeated trials with the identical white noise stimulus. The difference between the extrapolated entropy and the extrapolated noise entropy gives the mutual information, which for the LGN spike trains is 102 bits/s. This is an extraordinarily high rate of information compared with other spike train analyses. It means that the cat could, in principle, distinguish between two alternative signals after about 10 ms of spike train.
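The figure of roughly 10 ms follows from the time needed to accumulate one bit, the amount of information required for a binary decision, at the measured rate. The symbol $t_{1\,\mathrm{bit}}$ is introduced here only for this worked step:

```latex
t_{1\,\mathrm{bit}} = \frac{1\,\mathrm{bit}}{102\,\mathrm{bits/s}} \approx 9.8\,\mathrm{ms}
```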

4.3 Quantifying time resolution

We were particularly interested in a quantitative measurement of 'conventional' time coding. With information theory as a tool, we can give a precise answer now. For different time resolutions, we compute several estimates as a function of word length L and then extrapolate to infinite word length (figure 6). As expected, we get more information with increased time resolution. For data sampling reasons, the smallest bin size was 0.6 ms, and we cannot state at what time resolution the information rate plateaus. But even this 0.6 ms time resolution implies that timing is more precise than the smallest interspike interval (here, the absolute refractory period is estimated to be 2.7 ms).

Figure 6: Time resolution. We estimate the information rate for different time resolutions as explained above. The mean rate over time windows of 64 ms carries small but significant information. Observing that the information rate does not level off until a bin size of at least 0.6 ms shows the significance of high time resolution. Source: Reinagel and Reid, 2000

4.4 Temporal patterns

As we observe refractory periods and bursts, there is certainly temporal structure in spike trains. But what is the quantitative relevance of these patterns in coding?

In the entropy estimation above, we considered long words of length L which

include temporal structure. In contrast, if we estimate the information with

L=1, we assume independence between bins. In fact, this information (L=1)

corresponds to all information which is contained in the peri stimulus time

histogram (PSTH). But as the entropy rate changes with word length, the in-

15


Figure 7: The mutual information rate is drawn against the reciprocal word length.

The true data reveal that mutual information increases with word length which

means, that temporal patterns are significant. The temporal patterns are also en-

coded locally: scrambling the time bins makes the mutual information nearly in-

dependent from the word length. Because of higher time precision, the scrambled

model carries more information than the Poisson model. Source: Reinagel and

Reid, 2000

dependence assumption is certainly wrong. Fortunately, we can quantify the

dependence easily (Reid and Reinagel, 2000) by introducing a new quantity

Z, which evaluates the information in temporal patterns. Z is the difference

between the total entropy and the entropy under the independence assump-

tion.

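$$Z(\Delta\tau) = \lim_{L \to \infty} I(\Delta\tau, L) - I(\Delta\tau, L = 1)$$

where $\Delta\tau$ denotes the time resolution (bin size).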

The data from the LGN recordings give $Z(\Delta\tau = 0.6\,\mathrm{ms}) = 25$ bits/s. Of 11 cells, 9 had a positive Z value. This could be explained by an external noise source with a time scale long enough to affect more than one spike. Alternatively, electrophysiological effects of one spike, such as the refractory period, could affect the timing of the following spike.

On the other hand, one cell had a significantly negative Z value. This coincides with a prevalent occurrence of bursts: bursts are very stereotyped structures and hence introduce redundancy (Z < 0).

With a total mutual information of 102 bits/s, $Z(\Delta\tau = 0.6\,\mathrm{ms}) = 25$ bits/s implies that one quarter of the information is in temporal patterns, whereas three quarters can be obtained by evaluating the PSTH alone. One can also show that the information in temporal patterns is local. For this, the words were composed not of $L/\Delta\tau$ succeeding bins but of bins which were separated in time. As shown in figure 7 (scrambled), the estimate of the information rate depended only very weakly on the word length in this case.

Taking the real data from the PSTH of the neuron, we can generate spike trains according to a time-dependent Poisson process which gives the same temporal precision. Here the estimate of information is independent of L, so Z = 0 (figure 7). But as can be seen from the figure, even if the cell does not encode temporal patterns (L=1, scrambled), the information rate is higher than predicted by the Poisson model. In fact, only part of the discrepancy between the Poisson model and the real data is due to temporal patterns. The other part can be explained by the spiking itself, which is more precise than expected from a Poisson model. More precisely, the ratio between the variance and the mean of the spike count is considerably lower than 1, the value characteristic of a Poisson process.
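The ratio of variance to mean of the spike count (often called the Fano factor) is easy to compute. The sketch below contrasts Poisson counts with a hypothetical 'precise' neuron; the model for the precise neuron (five reliable spikes per window, each dropped independently with probability 0.1) is an assumption chosen for illustration and is not taken from the LGN data.

```python
import numpy as np

def fano_factor(spike_counts):
    """Variance of the spike count divided by its mean."""
    counts = np.asarray(spike_counts, dtype=float)
    return counts.var() / counts.mean()

rng = np.random.default_rng(0)

# Poisson spike counts: variance equals mean, so the ratio is close to 1.
poisson_counts = rng.poisson(lam=5.0, size=10000)

# Hypothetical precise neuron: 5 stimulus-locked spikes, each present with p = 0.9.
precise_counts = rng.binomial(n=5, p=0.9, size=10000)

print("Poisson:", fano_factor(poisson_counts))
print("precise:", fano_factor(precise_counts))
```

For the binomial toy model the expected ratio is 1 - p = 0.1, well below the Poisson value of 1.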

5 Discussion

In our quest to understand the exact meaning of coding schemes, we finally arrive at a stage where we can distinguish between different classes of coding schemes:

1. rate encoding

2. temporal encoding

3. temporal pattern encoding

Referring to Theunissen and Miller (1995), we defined rate encoding as a scheme where a given frequency component of a stimulus is completely described by the same or lower frequency components in the spike train spectrum. In contrast, in the temporal encoding scheme higher frequency components of the spike train spectrum are needed to characterize a given frequency component of the stimulus. For these two definitions, the stimulus reconstruction method allowed a proper distinction between signal and response spectra.

The language of information theory revealed a third coding scheme. Introducing words that describe parts of spike trains allowed us to estimate entropy and information rate. Moreover, we were able to distinguish between the total information and the information obtained when assuming independence between neighbouring bins (the word is only one bin long). The difference, labeled Z, is the part of the information which is in temporal patterns.

As discussed above, part of the data from the LGN recordings is temporally encoded. However, temporal encoding can be described by a time-varying rate of arbitrarily high time resolution. This time-varying rate is usually drawn in the form of a peri-stimulus time histogram (PSTH). But as shown above, temporal patterns cannot be characterized by the information contained in a PSTH. As the temporal pattern evaluation goes beyond the conventional PSTH, this method should be more powerful than the more technical temporal encoding distinction. Rather, we see the purpose of the temporal encoding procedure in clarifying the landscape of coding schemes.

Historically, the term time coding referred to a diversity of phenomena. In particular, high time resolution and temporal pattern coding were both described as temporal coding. We identified two 'true' temporal coding schemes: temporal encoding and temporal pattern coding. High time resolution alone can be merely time-dependent rate coding. Nonetheless, we were also able to quantify time resolution (see figure 6).

In a more extensive discussion, natural time scales like the mean interspike interval, refractory periods and the integration window would deserve more attention. Explaining the role of information theory in spike train analysis would need a more thorough treatment of upper and lower bounds with respect to all kinds of assumptions (Borst and Theunissen, 1999). Instead, we concentrated on a few benchmarks, hoping that this will be sufficient for an introduction to this topic.

5.1 Outlook

We investigated certain aspects of the behaviour of the single neuron's spike train. Victor (2002) suggests an alternative method to estimate the information of spike trains. Instead of binning, he preserves the topological structure of spike trains, and the entropy estimate is based on the distance to the closest neighbouring spike. Numerical results indicate that this approach is more robust and converges more rapidly than conventional binning.

The true challenge is the understanding of coding in ensembles of neurons. There is a controversy over the contribution of temporal patterns in ensembles of neurons to information representation. Some suggest that correlation carries significant information. Riehle et al. (1997) observe a synchronization of individual spikes during stimulus expectancy and actual performance in the motor cortex of monkeys. However, Shadlen and Newsome (1998) suggest that the high variability in spike trains of cortical neurons allows only ensemble rate coding. Panzeri et al. (1999) present an approach towards quantifying the correlation between spike trains on short time scales. For this, they expand the expression for the mutual information and break down the second-order term into three parts. One represents only a firing rate term, another shows noise-dependent correlations and the last represents stimulus-dependent correlations. Results show that most information is carried by the firing rate alone. Other approaches beyond information theory exist, but this area is still vastly unexplored. As methods and applications have been developed only over the last few years, this area promises further insights.

A Appendix

A.1 Calculation of the reconstruction filter

Our task is to find the optimal estimate $s_{est}$ of the stimulus $s$. In general, we can write the estimate as an expansion of functionals, called a Volterra series,

$$s_{est}(t) = \int d\tau_1\, h_1(\tau_1)\, x(t-\tau_1) + \int d\tau_1 d\tau_2\, h_2(\tau_1,\tau_2)\, x(t-\tau_1)\, x(t-\tau_2) + \dots$$

Here, the input $x$ is the response function of the neuron (the spike train), $x(t-\tau) = \sum_i \delta(t - t_i - \tau)$. Of course, the formula above is sufficiently general and we could write output $y$ instead of $s_{est}$. We expand all filters in power series of causal functions, e.g.

$$h_1(\tau_1) = \sum_k \alpha_k f_k(\tau_1)$$

$$h_2(\tau_1,\tau_2) = \sum_{k,l} \alpha_{kl}\, f_k(\tau_1) f_l(\tau_2)$$

and causality is preserved by $f_k(\tau) = 0$ for $\tau < 0$. Hence,

$$s_{est}(t) = \sum_k \alpha_k \int d\tau_1\, f_k(\tau_1)\, x(t-\tau_1) + \sum_{k,l} \alpha_{kl} \int d\tau_1 d\tau_2\, f_k(\tau_1) f_l(\tau_2)\, x(t-\tau_1)\, x(t-\tau_2) + \dots$$

We optimize our estimate by minimizing the mean square error of the difference between stimulus and stimulus estimate,

$$\chi^2 = \int dt\, |s(t - \tau_{delay}) - s_{est}(t)|^2$$

Here, $s$ is the observed stimulus. We introduced the delay time $\tau_{delay}$ to account for the finite time between stimulus and response, and hence stimulus estimate. The description of the kernels is complete once the coefficients $\alpha$ are characterized. We start with the linear coefficients only and generalize afterwards.

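$$\frac{\partial \chi^2}{\partial \alpha_p} = 0$$

$$\int dt\, s(t-\tau_{delay}) \int d\tau_1\, f_p(\tau_1)\, x(t-\tau_1) = \sum_k \alpha_k \int dt \int d\tau_1\, f_k(\tau_1)\, x(t-\tau_1) \int d\tau_2\, f_p(\tau_2)\, x(t-\tau_2).$$

With

$$s = s(t - \tau_{delay}), \qquad R_p = \int d\tau\, f_p(\tau)\, x(t - \tau),$$

one can write

$$Corr(s, R)_p = \sum_k \alpha_k\, Corr(R, R)_{kp},$$

with

$$Corr(s, R)_p = \int dt\, s \cdot R_p$$

$$Corr(R, R)_{kp} = \int dt\, R_k \cdot R_p .$$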

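By inversion this gives the solution

$$\alpha_p = \sum_k Corr(s, R)_k \cdot [Corr(R, R)]^{-1}_{kp} .$$

However, we can easily generalize to

$$\alpha = Corr(s, R) \cdot [Corr(R, R)]^{-1},$$

where the vector $\alpha$ is

$$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_{11}, \alpha_{12}, \dots, \alpha_{22}, \dots),$$

and the vector $R$ is

$$R = \begin{pmatrix}
\int d\tau\, f_1(\tau)\, x(t-\tau) \\
\int d\tau\, f_2(\tau)\, x(t-\tau) \\
\vdots \\
\int d\tau_1 d\tau_2\, f_1(\tau_1) f_1(\tau_2)\, x(t-\tau_1)\, x(t-\tau_2) \\
\int d\tau_1 d\tau_2\, f_1(\tau_1) f_2(\tau_2)\, x(t-\tau_1)\, x(t-\tau_2) \\
\vdots
\end{pmatrix}$$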

Of course, for computational purposes the series expansion must be truncated after a finite number of terms. The quality of the results can be checked by comparing them with the acausal filter. The acausal filter is derived in a similar way and is the optimal linear approximation. For further explanation see A.8.1 in Rieke et al. (1997).
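For the linear term, the solution for the coefficients amounts to solving a small linear system. The following Python sketch illustrates this in discrete time under several assumptions that are not part of the derivation above: the spike train is a synthetic random 0/1 sequence, the causal basis functions are three exponentials with arbitrarily chosen time constants, and the 'stimulus' is constructed from known coefficients so that the fit can be checked. The names `basis_responses` and `fit_linear_coefficients` are hypothetical.

```python
import numpy as np

def basis_responses(x, basis):
    """R_k(t) = sum over tau >= 0 of f_k(tau) x(t - tau), one row per basis function."""
    n = len(x)
    # The first n samples of the full convolution give the causal filter output.
    return np.array([np.convolve(x, f)[:n] for f in basis])

def fit_linear_coefficients(s, R):
    """Solve Corr(s, R)_p = sum_k alpha_k Corr(R, R)_kp for the vector alpha."""
    corr_sR = R @ s        # Corr(s, R)_p = sum_t s(t) R_p(t)
    corr_RR = R @ R.T      # Corr(R, R)_kp = sum_t R_k(t) R_p(t)
    return np.linalg.solve(corr_RR, corr_sR)

rng = np.random.default_rng(1)
x = (rng.random(2000) < 0.1).astype(float)      # synthetic spike train (assumption)

tau = np.arange(100)
basis = [np.exp(-tau / 2.0), np.exp(-tau / 10.0), np.exp(-tau / 40.0)]
R = basis_responses(x, basis)

# Toy delayed stimulus built from known coefficients, so the answer is known.
true_alpha = np.array([1.0, -0.5, 0.2])
s = true_alpha @ R

alpha = fit_linear_coefficients(s, R)
s_est = alpha @ R
print(alpha)
```

Because Corr(R, R) is symmetric, solving the linear system is equivalent to multiplying Corr(s, R) by the inverse matrix as in the formula above, while avoiding an explicit inversion. Higher-order terms would add rows to R built from products of filtered responses.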



A.2 Encoding distinction formula

We transform the stimulus estimate into its frequency spectrum. Afterwards, we will be able to tell which part of the response spectrum takes part in the construction of a specific component of the stimulus estimate. In this way, we can characterize the nature of the encoding.

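We know

$$s_{est}(t) = \int_{-\infty}^{+\infty} d\tau_1\, h_1(\tau_1)\, x(t-\tau_1) + \int_{-\infty}^{+\infty} d\tau_1 d\tau_2\, h_2(\tau_1,\tau_2)\, x(t-\tau_1)\, x(t-\tau_2) + \dots \qquad (1)$$

with

$$h_1(\tau_1) = \sum_k \alpha_k f_k(\tau_1)$$

$$h_2(\tau_1,\tau_2) = \sum_{k,l} \alpha_{kl}\, f_k(\tau_1) f_l(\tau_2)$$

Deriving the Fourier transform of each kernel, e.g.

$$H_1(\omega_1) = \int_{-\infty}^{+\infty} d\tau_1\, h_1(\tau_1)\, e^{i\omega_1\tau_1}$$

$$H_2(\omega_1,\omega_2) = \int_{-\infty}^{+\infty} d\tau_1 d\tau_2\, h_2(\tau_1,\tau_2)\, e^{i(\omega_1\tau_1 + \omega_2\tau_2)}$$

we can write

$$s_{est}(t) = F^{-1}[H_1(\omega_1)\, x(\omega_1) + H_2(\omega_1,\omega_2)\, x(\omega_1)\, x(\omega_2) + \dots] \qquad (2)$$

and executing the inverse Fourier transform $F^{-1}$ yields (1). However, we can evaluate (2) at one specific frequency $\omega$ and get

$$s_{est}(\omega) = H_1(\omega)\, x(\omega) + \int_{\omega = \omega_1 + \omega_2} d\omega_1 d\omega_2\, H_2(\omega_1,\omega_2)\, x(\omega_1)\, x(\omega_2) + \int_{\omega = \omega_1 + \omega_2 + \omega_3} d\omega_1 d\omega_2 d\omega_3\, H_3(\omega_1,\omega_2,\omega_3)\, x(\omega_1)\, x(\omega_2)\, x(\omega_3) + \dots$$

Substituting $\omega_2 = \omega - \omega_1$ and similarly for higher terms, one gets

$$s_{est}(\omega) = H_1(\omega)\, x(\omega) + \int_{-\infty}^{+\infty} d\omega_1\, H_2(\omega_1, \omega - \omega_1)\, x(\omega_1)\, x(\omega - \omega_1) + \dots \qquad (3)$$

We can group together the right hand side of (3) into the lower frequency component up to $\omega$ and the upper frequency component above $\omega$.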

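$$\begin{aligned}
s_{est}(\omega) = {} & \Big[ H_1(\omega)\, x(\omega) \\
& + 2 \int_0^{\omega} d\omega_1\, H_2(\omega_1, \omega - \omega_1)\, x(\omega_1)\, x(\omega - \omega_1) \\
& + 4 \int_0^{\omega} \int_0^{\omega - \omega_1} d\omega_1 d\omega_2\, H_3(\omega_1, \omega_2, \omega - \omega_1 - \omega_2)\, x(\omega_1)\, x(\omega_2)\, x(\omega - \omega_1 - \omega_2) \\
& + \dots \Big] \\
& + \Big[ 2 \int_{\omega}^{\infty} d\omega_1\, H_2(\omega_1, \omega - \omega_1)\, x(\omega_1)\, x(\omega - \omega_1) \\
& + 4 \int_{\omega}^{\infty} \int_{\omega - \omega_1}^{\infty} d\omega_1 d\omega_2\, H_3(\omega_1, \omega_2, \omega - \omega_1 - \omega_2)\, x(\omega_1)\, x(\omega_2)\, x(\omega - \omega_1 - \omega_2) \\
& + \dots \Big]
\end{aligned}$$

where the first group (e.g. labeled $s^{rate}_{est}(\omega)$) is sufficient for rate encoding, while the second group $s^{temp+}_{est}(\omega)$ is needed in addition to characterize temporal encoding schemes. Therefore,

$$s_{est}(\omega) = s^{rate}_{est}(\omega) + s^{temp+}_{est}(\omega)$$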
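The essential point of the second-order term is that it couples pairs of response frequencies whose sum equals the frequency of the estimate. A minimal numerical illustration in Python, assuming a toy 'response' made of two cosines and the simplest possible quadratic term, the square of the response (a hypothetical choice corresponding to a second-order kernel concentrated at zero delay):

```python
import numpy as np

n = 1024
t = np.arange(n)
f1, f2 = 100, 130   # response frequencies in cycles per n samples (hypothetical)

# Toy response with power only at f1 and f2.
x = np.cos(2 * np.pi * f1 * t / n) + np.cos(2 * np.pi * f2 * t / n)

# Simplest second-order term: the response multiplied by itself.
quadratic = x * x

# Normalized amplitude spectrum of the quadratic term.
spectrum = np.abs(np.fft.rfft(quadratic)) / n
peaks = np.flatnonzero(spectrum > 0.1)
print(peaks.tolist())
```

The quadratic term has components at 0, f2 - f1, 2 f1, f1 + f2 and 2 f2, none of which are present in the response itself. The component at f2 - f1 = 30 is built entirely from response frequencies above 30, which is the situation the second group of terms describes, whereas the component at f1 + f2 = 230 is built from response frequencies below 230.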

B Bibliography

[1] Abeles, M., H. Bergman, E. Margalit and E. Vaadia (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys, J. Neurophys. 70:1629-1638

[2] Bialek, W., F. Rieke, R.R. Van Steveninck and D. Warland (1991). Reading a neural code, Science 252, 1854-1857

[3] Borst, A. and F.E. Theunissen (1999). Information theory and neural coding, Nature Neurosci. 2(11), 947-957

[4] Darwin, C. (1994). Perception: Ear and Auditory Nerve, www.biols.susx.ac.uk/home/Chris Darwin

[5] Engel, A.K., P. König, A.K. Kreiter, T.B. Schillen and W. Singer (1992). Temporal coding in the visual cortex: New vistas on integration in the nervous system. TINS 155:218-226

[6] Houtsma, A.J.M. and J.L. Goldstein (1972). The central origin of the pitch of complex tones: Evidence from musical interval recognition, J. Acoustical Soc. of America, 51, 520-529



[7] Houtsma, A.J.M. and J. Smurzynski (1990). Pitch identification and discrimination for complex tones with many harmonics, J. Acoustical Soc. of America, 87, 304-310

[8] Lemon, W. and W. Getz (2000). Rate code input produces temporal code output from cockroach antennal lobes, Biosystems 58, 151-158

[9] Mainen, Z.F. and T.J. Sejnowski (1995). Reliability of spike timing in neocortical neurons, Science 268, 1503-1506

[10] Panzeri, S., S.R. Schultz, A. Treves and E.T. Rolls (1999). Correlations and the encoding of information in the nervous system, Proceedings of the Royal Society B

[11] Reinagel, P. and R.C. Reid (2000). Temporal coding of visual information in the thalamus, J. Neurosci. 20(14) 5392-5400

[12] Riehle, A., S. Gruen, M. Diesmann and A. Aertsen (1997). Spike synchronization and rate modulation differentially involved in motor cortical function, Science 278: 1950-1953

[13] Rieke, F., D. Warland, R.R. de Ruyter van Steveninck and W. Bialek (1997). Spikes: exploring the neural code. Cambridge, MA: MIT

[14] Shadlen, M.N. and W.T. Newsome (1998). The variable discharge of cortical neurons: implications for connectivity, computation and coding, J. Neurosci. 18(10):3870-3896

[15] Shannon, C.E. (1948). A mathematical theory of communication, Bell Sys. Tech. J. 27, 379-423, 623-656 (republished at cm.bell-labs.com)

[16] Strong, S.P., R. Koberle, R.R. de Ruyter van Steveninck and W. Bialek (1998). Entropy and information in neural spike trains, Phys Rev Lett 80:197-200

[17] Theunissen, F. and J.P. Miller (1995). Temporal encoding in nervous systems - a rigorous definition, J. Comput. Neurosci. 2:149-162

[18] Victor, J.D. (2002). Binless strategies for estimation of information from neural data, Phys Rev E 66, 051903

Acknowledgment

I would like to express gratitude to my supervisor Dr. Stuart Baker whose

guidance was crucial for the successful completion of this project.



Comments

This paper is written as an essay for Part III in mathematics. The main difficulty was defining the problem. After working through several books (Rieke et al., Dayan and Abbott, Gerstner) and consulting Dr Baker, I singled out the question of time coding. The most important papers (which I found) were those of Theunissen and Miller (1995), Strong et al. (1998), Borst and Theunissen (1999) and Reinagel and Reid (2000). The derivation in A.1 includes some arguments of my own but is related to A.8.2 in Rieke et al. (1997). The derivation in A.2 is similar to Theunissen and Miller (1995).





This text can be quoted and accessed from this url:

http://www.grin.com/e-book/107825/information-encoding-in-small-neural-systems