DATA ANALYSIS
This chapter provides guidelines for selecting certain key data acquisition parameters and discusses current methods used in the analysis of biological data, including fitting models to data and analyzing single ion-channel recordings. The scope of this discussion is limited to brief, practical introductions, with references to more detailed discussions in this Guide and in the literature. The manuals for the various software programs provide more specific step-by-step instructions, and the text by Dempster (1993) is a useful collection of standard techniques in this field.
Sampling Rate
The sampling rate should be selected with the particular experiment in mind. An excessively high sampling rate wastes disk space and increases the time required for analysis. Furthermore, a higher sampling rate is usually associated with a higher filtering frequency, which in turn allows more noise to contaminate the signal. Subsequent
analysis of the data may therefore require noise reduction using analysis-time filtering, which can
be time-consuming. Guidelines for choosing the correct sampling rate are discussed in the following paragraphs (Colquhoun and Sigworth, 1983; Ogden, 1987).
Biological signals are most commonly analyzed in the time domain. This means that the time
dependence of the signals is examined, e.g. to characterize the membrane response to a voltage-
clamp pulse. The usual rule for time-domain signals is that each channel should be sampled at a
frequency between 5 and 10 times its data bandwidth. The data bandwidth must be known in order to set the filter cut-off frequency during acquisition and analysis.
For a sinusoidal waveform, the data bandwidth is the frequency of the sine itself. For most
biological signals, the data bandwidth is the highest frequency of biological information of
interest present in the recorded signal. This can be determined directly by examining a power
spectrum of rapidly sampled unfiltered data, though this is rarely done. Alternatively, one can
estimate the number of points per time interval required to give a data record whose points can
be easily "connected" by eye and calculate the sampling rate directly. The data bandwidth and
filter frequency can then be calculated from the sampling rate. For example, if a fast action
potential (1 ms to peak) is to be recorded, 25 samples on the rising phase would yield a
reasonably good 40 µs resolution, requiring a sampling rate of 25 kHz and an approximate data
bandwidth of 5 kHz.
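As a quick check of this arithmetic, the sketch below (Python) reproduces the action-potential example; the numbers are the ones used above, and dividing the sampling rate by 5 reflects the lower end of the 5-10 times rule.

    # Rough sampling-rate arithmetic for the fast action-potential example above.
    rise_time = 1e-3            # time to peak of the fastest phase (s)
    points_on_rise = 25         # desired number of samples on that phase

    sample_interval = rise_time / points_on_rise    # 40 microseconds
    sampling_rate = 1.0 / sample_interval           # 25 kHz
    data_bandwidth = sampling_rate / 5.0            # ~5 kHz (5-10x rule)

    print(f"sample interval = {sample_interval * 1e6:.0f} us")
    print(f"sampling rate   = {sampling_rate / 1e3:.0f} kHz")
    print(f"data bandwidth  ~ {data_bandwidth / 1e3:.0f} kHz")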
The rules are more straightforward in some special cases. Single-channel recording is discussed
below. For signals with exponential relaxation phases, the sampling rate needed to estimate a
time constant depends on the amount of noise present; for moderately noise-free data, at least 15
points should be taken per time constant over a period of 4 to 5 time constants. Many fitting
routines will fail if sampling is performed over only 3 time constants, since the waveform does
not relax sufficiently far towards the baseline. For a sum of multiple exponentials, the sampling
rate is determined in this way from the fastest phase; sampling must extend to 4 time constants of
the slowest phase. If this would result in too many samples, a split clock (as in the CLAMPEX program of Axon Instruments' pCLAMP suite) or some other method of slowing the acquisition rate partway through the sweep can be employed, as long as at least 15 points are taken over each time constant.
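The same rules can be turned into a quick estimate of the number of samples a multi-exponential record requires. The sketch below assumes a single fixed sampling rate (no split clock), and the two time constants are hypothetical illustration values.

    # Sampling requirements for a sum of exponentials (hypothetical time constants).
    tau_fast = 2e-3              # fastest time constant (s)
    tau_slow = 50e-3             # slowest time constant (s)
    points_per_tau = 15          # minimum points per time constant
    n_tau_slow = 5               # record length, in slowest time constants

    sample_interval = tau_fast / points_per_tau     # set by the fastest phase
    record_length = n_tau_slow * tau_slow           # set by the slowest phase
    n_samples = int(round(record_length / sample_interval))

    print(f"sampling rate  = {1.0 / sample_interval:.0f} Hz")
    print(f"record length  = {record_length * 1e3:.0f} ms")
    print(f"samples needed = {n_samples}")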
When a set of several channels is recorded (e.g., channels 0 through 3), most data acquisition
systems sample the channels sequentially rather than simultaneously. This is because the system
usually has only one analog-to-digital converter circuit that must be shared among the channels
in the set. For example, if four channels are sampled at 10 kHz per channel, one might expect
that they would be sampled simultaneously at 0 µs, 100 µs, 200 µs, etc. Instead, channel 0 is
sampled at 0 µs, channel 1 at 25 µs, channel 2 at 50 µs, channel 3 at 75 µs, channel 0 again at
100 µs, channel 1 again at 125 µs, etc. There is therefore a small time skew between the
channels; if this causes difficulties in analysis or interpretation, a higher sampling rate can be
used to minimize the time skew (but this may cause problems associated with high sampling
rates, as mentioned above).
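The interleaving described above, and the resulting skew, can be illustrated with a short sketch; the timing below simply assumes four channels multiplexed through one converter at 10 kHz per channel.

    # Sample times for sequentially multiplexed channels (illustration only).
    n_channels = 4
    per_channel_rate = 10e3                                       # 10 kHz per channel
    aggregate_interval = 1.0 / (per_channel_rate * n_channels)    # 25 us

    for sweep in range(3):                     # first three passes through the set
        for ch in range(n_channels):
            t = (sweep * n_channels + ch) * aggregate_interval
            print(f"t = {t * 1e6:6.1f} us  ->  channel {ch}")

    # The skew between adjacent channels is one aggregate interval (25 us here).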
An additional consideration arises from the fact that on many data acquisition systems, including
the Digidata 1200 from Axon Instruments, the digital-to-analog converter (DAC) is updated
whenever the ADC is read, even if there is no change in the DAC output. This means that the
DAC is updated only at the sample rate over all channels. For example, if a stimulus is a 0 to
150 mV ramp and 50 samples are acquired from one channel at a sampling interval of 25 µs, the DAC output will appear as a staircase of 25 µs-long steps, each followed by an upward jump of 150 mV/50 = 3 mV, which may be too large for some electrophysiological applications. Therefore, if a rapidly changing continuous waveform is applied while acquiring slowly, the output waveform should be checked with an oscilloscope and, if necessary, the sampling rate should be increased. The computer preview of a waveform cannot be relied upon for this purpose because
it does not account for the effect of sampling. Note, however, that since most users acquire
significantly more samples per sweep than 50, this problem will not occur except in very unusual
situations.
Filtering
The signal should be filtered using an analog filter device before it arrives at the ADC. As
discussed in Chapter 6 and in Colquhoun and Sigworth (1983) and Ogden (1987), this is done to
prevent aliasing (folding) of high-frequency signal and noise components to the lower
frequencies of biological relevance.
Acquisition-time filtering of time-domain signals is usually performed using a Bessel filter with
the cut-off frequency (-3 dB point; see Chapter 6) set to the desired value of the data bandwidth.
A 4-pole filter is usually sufficient unless excessive higher frequency noise requires the 6- or 8-
pole version. The Bessel filter minimizes both overshoot (ringing) and the dependence of the response lag on frequency. Both effects are prominent in Chebyshev and Butterworth filters (see Chapters 6 and 12, or Ogden, 1987), which makes them less appropriate for time-domain analysis.
Analysis-time filtering is performed using filters implemented in software. The Gaussian
filter is most commonly used for this purpose because of its execution speed, though the Bessel
filter is employed as well. If one wants to write a computer program for a filter, the Gaussian is
easier to implement than the Bessel filter (see program example in Colquhoun and Sigworth,
1983). Note that all filters alter the waveform; for example, a Gaussian-filtered step function
deviates from the baseline before the time of transition of the unfiltered signal. The user can
examine data records filtered at different frequencies to make sure that there is no significant
distortion of the quantities of interest, such as time of transition or time to peak.
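A minimal sketch of such a software Gaussian filter is shown below (Python with SciPy). The relation sigma ≈ 0.1325 / fc between the standard deviation of the Gaussian impulse response and the -3 dB cut-off frequency follows the treatment in Colquhoun and Sigworth (1983); the cut-off frequency, sampling interval and test signal are arbitrary illustration values.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def gaussian_lowpass(data, fc, dt):
        """Filter 'data' (sampled at interval dt, in s) with a digital Gaussian
        filter whose -3 dB cut-off frequency is fc (in Hz)."""
        sigma_samples = 0.1325 / (fc * dt)   # standard deviation in sample intervals
        return gaussian_filter1d(data, sigma_samples)

    # Example: a noisy step, sampled at 20 kHz and filtered at 1 kHz.
    dt = 50e-6
    t = np.arange(0.0, 20e-3, dt)
    signal = (t > 10e-3).astype(float) + 0.2 * np.random.randn(t.size)
    filtered = gaussian_lowpass(signal, fc=1e3, dt=dt)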
Another common software filter is smoothing, the replacement of a data point by a simply
weighted average of neighboring points, used to improve the smoothness of the data in a record
or graph. In contrast to the smoothing filter, the Bessel and Gaussian types have well-known
filtering (transfer) functions, so that (i) it is easy to specify an effective cut-off frequency, and (ii)
the effect of the filter may be compensated for by a mathematical procedure analogous to a high-
frequency boost circuit in a voltage-clamp amplifier (Sachs, 1983). These advantages are
important if the frequency properties must be known throughout the analysis. If not, the
smoothing filter is much faster to execute, especially on long data records, and easier to
implement if one writes one's own software.
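As an example of the simplest kind of smoothing filter, the sketch below replaces each point by a (1, 2, 1)-weighted average of itself and its two neighbors; the weights are a common choice used purely for illustration.

    import numpy as np

    def smooth_121(data):
        """Replace each point by a (1, 2, 1)/4 weighted average of its neighbors."""
        kernel = np.array([1.0, 2.0, 1.0]) / 4.0
        # mode="same" keeps the record length; the end points are computed
        # as if the record were padded with zeros.
        return np.convolve(data, kernel, mode="same")

    noisy = np.random.randn(1000)
    smoothed = smooth_121(noisy)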
The derivative function is used to determine the rates of change of a digitized signal. This can be
used to help in peak location (where the derivative will be near zero), transition detection (large
positive or negative derivative), sudden departure from baseline, etc. The derivative will,
however, amplify noise, making it difficult to determine trends. The signal should therefore be
filtered beforehand or fit to a function which can then be used to obtain a noise-free form of the
derivative. However, the worse the fit, the greater the error in the derivative.
Single-Channel Analysis
Articles that present approaches and methods used in single-channel current analysis include
Colquhoun and Hawkes, 1983; Colquhoun and Sigworth, 1983; McManus, Blatz and Magleby,
1987; Sigworth and Sine, 1987; and French et al., 1990.
The current from a single ion channel is idealized as a rectangular waveform, with a baseline
current of zero (closed-channel), an open-channel current dependent on the conductance and
membrane potential, and rapid transitions between these current levels. In practice, this idealized
current is distorted by the limited bandwidth of the apparatus and contaminated by noise. Shifts
in the baseline may occur during the recordings, and capacitative current spikes are present in
sweeps in which voltage changes are applied across the membrane. The current signal is further
altered by filtering and by periodic sampling. These effects can make it difficult to draw confident inferences about channel behavior from the data.
    fc = (100 · fc*) / (dmax · FTC*)                                    (1)
Here FTC* is the observed rate of false threshold crossings measured using recordings made
with an arbitrary cut-off frequency fc*. FTC* can be measured from idealized single-channel
records generated using a threshold set on the side of the baseline opposite to where transitions
are observed. This analysis can be achieved using the FETCHAN program in the pCLAMP
suite.
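The opposite-side measurement described above amounts to counting crossings of a threshold mirrored about the baseline. The sketch below illustrates the idea on pure noise; the record, baseline, threshold and sampling interval are all hypothetical, and the counting itself would normally be done by the analysis program.

    import numpy as np

    def false_crossing_rate(current, baseline, threshold, dt):
        """Estimate FTC*, the rate of false threshold crossings, by counting
        crossings of a threshold mirrored to the side of the baseline opposite
        to the channel openings.

        current   : 1-D array of current samples
        baseline  : closed-channel current level
        threshold : detection threshold for openings (here taken as above baseline)
        dt        : sampling interval (s)
        """
        mirrored = baseline - (threshold - baseline)
        below = current < mirrored
        # A crossing is a transition from not-below to below the mirrored threshold.
        n_crossings = int(np.sum(~below[:-1] & below[1:]))
        return n_crossings / (current.size * dt)

    # Illustration with noise only (no channel openings present).
    dt = 100e-6
    noise = 0.2 * np.random.randn(100_000)                 # pA, hypothetical
    ftc_star = false_crossing_rate(noise, baseline=0.0, threshold=0.5, dt=dt)
    print(f"false threshold crossings per second: {ftc_star:.1f}")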
Analysis-Time Filtering
The digital Gaussian filter can be used for additional analysis-time filtering of single-channel
records. This introduces symmetrical time delays at both opening and closing and can therefore
be used for the unbiased estimation of latencies using the 50% criterion (see below).
Baseline Definition
The accuracy of the threshold method for transition detection depends on the stability of the
baseline (i.e., closed-channel) current or, if the baseline is unstable, on the ability of an
automated procedure to correctly identify the baseline as it changes. A number of ways have
been devised to track a moving baseline, including (1) averaging the baseline current level to get
the new baseline; (2) defining the new baseline as that level which maximizes the number of
times that the current signal crosses it during the closed state ("zero crossings" method; Sachs,
1983); and (3) defining the new baseline at the peak of the histogram of the baseline current
amplitude (e.g., G. Yellen, quoted in Sachs, 1983). FETCHAN uses a hybrid approach in which
the mean of the most recent closed-channel current level is combined with the old baseline
level, weighted by a selectable factor. Regardless of the method used, the user must carefully
monitor the baseline to ensure that any automatic procedure does not lose track of the baseline
value.
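A minimal sketch of the weighted-average (hybrid) baseline update described above; the weighting factor and the drifting closed-channel levels are arbitrary illustration values, not settings recommended by this Guide.

    def update_baseline(old_baseline, recent_closed_mean, weight=0.2):
        """Blend the most recent closed-channel level into the running baseline.

        weight : fraction (0..1) given to the new measurement; larger values
                 track drift faster but are more easily fooled by noise.
        """
        return (1.0 - weight) * old_baseline + weight * recent_closed_mean

    # Example: the baseline estimate follows a slow drift in the closed level.
    baseline = 0.0
    for closed_level in [0.02, 0.05, 0.04, 0.08, 0.10]:    # hypothetical drift (pA)
        baseline = update_baseline(baseline, closed_level, weight=0.2)
        print(f"baseline estimate: {baseline:.3f} pA")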
Missed Events
Events will be missed if their true durations are shorter than the dead time of the system, which is
determined by the filter cut-off frequencies used during acquisition and analysis. Even events longer than this dead time will be missed if their suprathreshold portions happen to fall between the times at which the signal is sampled. The resulting error is minimal if
the fastest time constant in the system is much longer than the sampling interval because few
events will be shorter than the sampling interval. If this is not the case, the sampling rate must be
increased with respect to the filter cut-off frequency (see the relevant sections above).
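For Gaussian (or Bessel) filtering and 50% threshold detection, the dead time is often approximated as Td ≈ 0.179 / fc (Colquhoun and Sigworth, 1983). The sketch below uses that approximation to compare the dead time, the sampling interval and the fastest expected time constant; the numerical values are hypothetical.

    # Dead time and sampling check for 50% threshold detection (hypothetical values).
    fc = 2e3                   # effective cut-off frequency after all filtering (Hz)
    dt = 50e-6                 # sampling interval (s)
    tau_fast = 0.5e-3          # fastest expected time constant (s)

    dead_time = 0.179 / fc     # approximate shortest detectable event duration (s)

    print(f"dead time         ~ {dead_time * 1e6:.0f} us")
    print(f"sampling interval = {dt * 1e6:.0f} us")
    if tau_fast < 5 * dt:
        print("warning: many events will be comparable to the sampling interval")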
False Events
The probability of detecting false events depends on the amount and spectrum of noise in the
system, the filter characteristics and the rate of sampling (French et al., 1990).
Multiple Channels
The presence of multiple channels complicates the determination of the kinetic behavior of a
channel. If a record shows a transition from two open channels to one, it cannot be determined whether the transition was due to the closing of the first or the second channel. A number
of methods have been proposed to deal with this ambiguity (French et al., 1990). As a
precaution, the amplitude histogram of the raw data can be inspected to determine if multiple
channels are present.
Three kinds of histograms are commonly constructed from the events list: (1) the dwell time histogram, in which the duration of events is binned and which can be related
to state models; (2) the first latency histogram, in which the period of time from stimulus onset
to first opening is binned and which is used to extract kinetic information; and (3) the amplitude
histogram, in which the amplitudes of the levels or of all points in the records are binned and
yield information about the conductance, the system noise and the incidence of multiple
channels.
Histograms
The simplest histogram is one in which a series of bins are defined, each of which has an
associated upper and lower limit for the quantity of interest, e.g., dwell time. Each dwell time
will then fall into one of the bins, and the count in the appropriate bin is incremented by one. In
the cumulative histogram, a bin contains the number of observations whose values are less than
or equal to the upper limit for the bin.
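A sketch of both kinds of histogram using NumPy; the dwell times and the 1 ms bin width are arbitrary illustration values.

    import numpy as np

    # Hypothetical dwell times (ms) from an events list.
    dwell_times = np.random.exponential(scale=4.0, size=2000)

    # Ordinary histogram: counts of dwell times falling between successive bin edges.
    bin_edges = np.arange(0.0, 41.0, 1.0)            # 1 ms bins from 0 to 40 ms
    counts, _ = np.histogram(dwell_times, bins=bin_edges)

    # Cumulative histogram: number of observations up to the upper limit of each bin.
    cumulative = np.cumsum(counts)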
A second problem is called sampling promotion error. Sampling promotion error (Sine and
Steinbach, 1986) occurs because data are sampled periodically. Suppose data were acquired at a sampling interval of 1 ms and the dwell times were binned into a histogram with bin width equal to the sampling interval (1 ms), with bins centered at 3 ms, 4 ms, 5 ms, etc. The 4 ms bin would therefore contain events whose true dwell times lie between 3 and 5 ms (Figure 2). If the dwell times fall exponentially with increasing time, the 4 ms bin would contain more events from 3 to 4 ms than from 4 to 5 ms. The subsequent fit would treat all these events as if they had occurred at 4 ms (the
midpoint), thereby resulting in an error. In a single exponential fit, this affects only the intercept
and not the time constant; but there may be more subtle effects in a multi-exponential fit. The
error is small when the bin width is much smaller (perhaps by a factor of 5) than the fastest time
constant present.
[Figure 2: sample times at 1 ms intervals, shown with a 3 ms event and a 5 ms event.]
A third problem is termed binning promotion error. In an exponentially falling dwell time
distribution, a bin is likely to contain more events whose true dwell times are at the left side of
the bin than are at the right side. The average dwell time of the events in that bin is therefore less
than the midpoint time t. The error occurs when a fit procedure assumes that all the bin events
are concentrated at a single point located at the midpoint t, instead of at the average, which is less
than t. Binning promotion error can occur in addition to sampling promotion error because
binning takes place in a different step. Both of these errors are due to the asymmetric
distribution of true dwell times about the bin midpoint. A correction procedure has been
proposed by McManus et al. (1987).
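The size of the binning promotion error can be seen by comparing a bin's midpoint with the true mean dwell time of the events that fall inside it. The sketch below does this numerically for an exponential distribution; the time constant and bin boundaries are arbitrary.

    import numpy as np

    tau = 3.0                  # hypothetical time constant (ms)
    lo, hi = 3.5, 4.5          # bin boundaries (ms); the midpoint is 4.0 ms

    # Draw many exponentially distributed dwell times and keep those in the bin.
    dwell = np.random.exponential(tau, size=1_000_000)
    in_bin = dwell[(dwell >= lo) & (dwell < hi)]

    midpoint = 0.5 * (lo + hi)
    print(f"bin midpoint           : {midpoint:.3f} ms")
    print(f"mean dwell time in bin : {in_bin.mean():.3f} ms (less than the midpoint)")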
These bin-width-related errors may also be reduced, independently of the corrective procedure mentioned above, if the fit procedure explicitly uses the fact that a bin actually
represents an area instead of an amplitude. Some errors may be eliminated if the individual data
points are used without ever using a histogram, as in the maximum likelihood fitting method.
Lastly, as discussed in the section on missed events, short events may not be detected by the 50%
threshold criterion. This can give rise to a number of errors in the extracted fit parameters,
which relate specifically to state models. For further details, consult the references cited in
French et al., 1990.
Amplitude Histogram
Amplitude histograms can be used to define the conductances of states of single channels. Two
kinds of amplitude histograms are common: point histograms and level histograms. The former
use all the acquired data points; they are useful mainly for examining how "well-behaved" a data set is. Abnormally wide distributions may result from high noise (e.g., if the signal was insufficiently filtered) or from significant baseline drift. Similarly, the number of peaks indicates the presence of multiple channels or subconductance states. The all-points histogram will probably not be useful for determining the conductances unless baseline drift is small. Level histograms
use only the mean baseline-corrected amplitudes associated with each event in the events list.
Such histograms can be fitted to the sum of one or more Gaussian functions in order to estimate
conductances.
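A minimal sketch of fitting a level histogram with the sum of two Gaussians using SciPy; the simulated event amplitudes, the number of bins and the initial parameter guesses are all hypothetical.

    import numpy as np
    from scipy.optimize import curve_fit

    def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
        """Sum of two Gaussian components."""
        return (a1 * np.exp(-(x - mu1) ** 2 / (2 * s1 ** 2))
                + a2 * np.exp(-(x - mu2) ** 2 / (2 * s2 ** 2)))

    # Hypothetical baseline-corrected event amplitudes (pA): two conductance levels.
    amplitudes = np.concatenate([np.random.normal(1.0, 0.15, 400),
                                 np.random.normal(2.0, 0.15, 200)])
    counts, edges = np.histogram(amplitudes, bins=60)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Initial guesses: (peak height, mean, width) for each component.
    p0 = [counts.max(), 1.0, 0.2, counts.max() / 2.0, 2.0, 0.2]
    params, cov = curve_fit(two_gaussians, centers, counts, p0=p0)
    print("fitted level amplitudes (pA):", params[1], params[4])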
Fitting to Histograms
Amplitude and dwell time histograms can be fitted by appropriate functions, usually sums of
several Gaussian functions for the former and sums of exponentials for the latter (see the section
on Fitting below). The time constants and amplitudes can be related to the parameters of several
single-channel models, but this will not be described here.
Histogram bins containing zero counts should usually be excluded from a fit because the chi-
square function (see below) is not defined when a bin i contains Ni = 0 counts and therefore has
σi = 0. Alternatively, adjacent bins can be combined to yield a nonzero content.
Fitting
There are several reasons for fitting a function to data:
(1) A function could be fitted to a data set in order to describe its shape or behavior, without
ascribing any "biophysical" meaning to the function or its parameters. This is done when a
smooth curve is useful to guide the eye through the data or if a function is required to find
the behavior of some data in the presence of noise.
(2) A theoretical function may be known to describe the data, such as a probability density
function consisting of an exponential, and the fit is made only to extract the parameters (e.g., a time constant). Estimates of the confidence limits on the derived time constant may be
needed in order to compare data sets.
(3) One or more hypothetical functions might be tested against the data, e.g., to decide how well the best-fit function follows the data.
The fitting procedure begins by choosing a suitable function to describe the data. This function
has a number of free parameters whose values are chosen in order to optimize the fit between the
function and the data points. The set of parameters that gives the best fit is said to describe the
data, as long as the final fit function adequately describes the behavior of the data. Fitting is best
performed by software programs; the software follows an iterative procedure to successively
refine the parameter estimates until no further improvement is found and the procedure is
terminated. Feedback about the quality of the fit allows the model or initial parameter estimates
to be adjusted manually before restarting the iterative procedure. Fitting by pure manual
adjustment of the parameters (the so-called "chi by eye") may be effective in simple cases but is
usually difficult and untrustworthy in more complex situations.
Two topics are briefly discussed below: statistics, i.e., how good the fit is and how confidently the parameters are known, and optimization, i.e., how to find the best-fit
parameters. The statistical aspects are well discussed in Eadie et al. (1971); Colquhoun and
Sigworth (1983) provide examples relevant to the electrophysiologist. A number of aspects of
optimization are presented in Press et al. (1988).
For a sufficiently large number of events, the likelihood function is approximately Gaussian. The value τ* at the peak of the function is called the maximum likelihood value of τ. The root-mean-square spread of the function about τ* is known as the standard deviation of τ*, though this is not the same as the standard deviation of a set of numbers in the direct probability case.
[Figure: the logarithm of the likelihood function, ln L, plotted against τ, with its peak at τ*.]
It turns out that τ* reliably converges to the true time constant if the number of events N is
sufficiently large. For this reason it is important to either collect a large enough number of
data points or repeat the experiment several times so as to reduce the variation of the
parameters obtained over the data sets to an acceptable level. If the noise is large or there
are sources of significant variability in the signal, the data may be useless except in a
qualitative way because of large variations of the best fit parameters between the runs. If
one long run is taken instead of several smaller ones, the run can be broken up into
segments and the analysis results of the segments compared with each other to assure that
convergence is near.
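For a single-exponential dwell-time distribution, the maximum likelihood estimate can be obtained directly from the individual dwell times, without binning, by minimizing the negative log likelihood. The sketch below uses SciPy and simulated dwell times; the "true" time constant and the search bounds are arbitrary.

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Simulated dwell times (ms) drawn from an exponential distribution.
    true_tau = 5.0
    dwell = np.random.exponential(true_tau, size=500)

    def neg_log_likelihood(tau):
        """-ln L for an exponential probability density with time constant tau."""
        return dwell.size * np.log(tau) + dwell.sum() / tau

    result = minimize_scalar(neg_log_likelihood, bounds=(0.1, 100.0), method="bounded")
    tau_star = result.x
    # For this simple case the analytical answer is just the sample mean.
    print(f"tau* = {tau_star:.3f} ms   (sample mean = {dwell.mean():.3f} ms)")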
Although the maximum likelihood method is the most reliable, the time required for the calculations may be prohibitively long. The chi-square method, described below, is an
alternative that requires less time.
    χ² = Σ(i=1 to p) (yi − yi*)² / σi²                                  (2)
where yi* is the fit value corresponding to yi. Minimizing chi-square is also called the least-
squares method. If the fit is made to a data sweep, each yi is the value measured at the time
xi, and each σi is the standard deviation or uncertainty of that value. In the typical case
when all the σi's are equal (i.e., the uncertainty in the data does not depend on the time), the
σi's can be ignored while performing the search for the best fit parameters, but must be
specified if the goodness of fit is to be calculated. If the fit is made to a histogram, each yi is
the number of events Ni in bin i, and each σi is √Ni.
It is much easier to minimize the chi-square function than to maximize the likelihood function, whether for data or for many mathematical functions used as models. Since the
use of the chi-square function is equivalent to the use of the likelihood function only if the
uncertainties in the data (e.g., noise) are distributed as Gaussians, the correctness of a least-
squares fit can depend on the characteristics of this uncertainty.
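A sketch of a weighted least-squares (chi-square) fit of a single exponential to a dwell-time histogram, using σi = √Ni and excluding zero-count bins as discussed earlier; the simulated data, bin width and starting values are hypothetical.

    import numpy as np
    from scipy.optimize import curve_fit

    def exp_decay(t, amplitude, tau):
        return amplitude * np.exp(-t / tau)

    # Hypothetical dwell times (ms) binned into a 1 ms histogram.
    dwell = np.random.exponential(4.0, size=3000)
    counts, edges = np.histogram(dwell, bins=np.arange(0.0, 30.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Exclude zero-count bins, for which sigma_i = sqrt(N_i) would be zero.
    keep = counts > 0
    sigma = np.sqrt(counts[keep])

    params, cov = curve_fit(exp_decay, centers[keep], counts[keep],
                            p0=[counts.max(), 3.0], sigma=sigma, absolute_sigma=True)
    errors = np.sqrt(np.diag(cov))     # approximate standard errors of the parameters
    print(f"tau = {params[1]:.2f} +/- {errors[1]:.2f} ms")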
Often several models are fitted to a single set of data so as to define the best-fit model.
Horn (1987) discussed choosing between certain types of models using their respective chi-
square values (with the F test) or the logarithm of the ratio of the likelihoods, which follows
the chi-square distribution. These tests can help decide whether the model contains
irrelevant parameters, e.g., if a three-exponential function was fitted to a set of data
containing only two exponentials.
Suppose there are two parameters a and b to be fit. One can make a two-dimensional
contour plot of the likelihood or chi-square function, with each axis corresponding to one of
the parameters, showing the probability of the limits including the true parameter values
(Figure 4). The resultant ellipse is usually inclined at an angle to the parameter axes due to
the correlation of the two parameters. These correlations tend to increase the confidence limits, which are indicated by the dotted lines projecting onto the respective axes. The probability that the confidence limits for the best-fit a include the true a is 0.683, but this
does not specify anything about b, as indicated by the shaded region. If, in addition, limits
±σb are specified for the best fit b, the joint probability that both sets of confidence limits
include the true a and b has an upper limit of (0.683)², or 0.393, i.e., the probability content of the area inside the ellipse. If one has a five-exponential fit with an offset (11 free parameters), the analogous cumulative joint probability will be (0.683)¹¹, or about 0.015, which is quite small.
[Figure 4: contour plot of the confidence region for two parameters a and b; the dotted confidence limits a* ± σa and b* ± σb project from the ellipse onto the respective axes.]
Methods of Optimization
Optimization methods are concerned with finding the minimum of a function (e.g., the chi-
square) by adjusting the parameters. A global minimum, i.e., the absolute minimum, is clearly
preferred. Since it is difficult to know whether one has the absolute minimum, most methods
settle for a local minimum, i.e., the minimum within a neighborhood of parameter values. A
number of algorithms have been developed to find such a minimum. For example, to find time
constants and coefficients in an exponential fit, the pCLAMP program pSTAT allows the user to choose among three search methods.
Of the three methods, the Simplex method is fast and relatively insensitive to shallow local
minima. Though it will reliably find the region of the global minimum or maximum, it may not
find the precise location of the minimum or maximum if the function is rather flat in that vicinity.
The Levenberg-Marquardt method is slower and more easily trapped in local minima of the
function, but it can provide better fits than the Simplex because it uses the mathematical
characteristics of the function being minimized to find the precise location of the minimum or
maximum, within the numerical resolution of the computer. This method also provides statistical
information sufficient to find the confidence limits.
These methods are iterative, i.e., they continue refining parameter values until the function stops
changing within a certain convergence criterion. They also require reasonable starting estimates
for the parameters, so that the function to be minimized or maximized is not too far away from its
optimum value; a poor starting set can lead some fit programs to a dead end in a shallow local
minimum.
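The sketch below contrasts the two approaches on the same exponential fit using SciPy: a Nelder-Mead (Simplex) minimization of the sum of squared residuals, and a Levenberg-Marquardt fit via curve_fit, which also returns the covariance matrix from which approximate confidence limits can be derived. The data, noise level and starting estimates are hypothetical.

    import numpy as np
    from scipy.optimize import minimize, curve_fit

    def model(t, amplitude, tau):
        return amplitude * np.exp(-t / tau)

    # Hypothetical noisy exponential decay.
    t = np.linspace(0.0, 50.0, 200)                       # ms
    data = model(t, 10.0, 8.0) + 0.3 * np.random.randn(t.size)

    def sum_of_squares(params):
        amplitude, tau = params
        return np.sum((data - model(t, amplitude, tau)) ** 2)

    start = [5.0, 5.0]                                    # rough starting estimates

    # Simplex (Nelder-Mead): no derivatives needed, robust to shallow local minima.
    simplex = minimize(sum_of_squares, start, method="Nelder-Mead")

    # Levenberg-Marquardt: uses derivatives of the model and returns a covariance
    # matrix for estimating confidence limits on the parameters.
    lm_params, lm_cov = curve_fit(model, t, data, p0=start, method="lm")

    print("Simplex  :", simplex.x)
    print("Marquardt:", lm_params, "+/-", np.sqrt(np.diag(lm_cov)))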
Axon Instruments' analysis programs CLAMPFIT for the IBM-PC and AxoGraph for the Apple
Macintosh provide a non-iterative method, in which the data points and function to be fit are
transformed using a set of orthogonal Chebyshev polynomials, and the fit function coefficients
are quickly calculated using these transformed numbers in a linear regression. This method is
very fast and requires no initial guesses, though the parameters may differ slightly from
those found by the methods listed above because the underlying algorithm minimizes a quantity
other than the sum of squared differences between fit and data.
References
Colquhoun, D. and Sigworth, F.J. Fitting and Statistical Analysis of Single-Channel Records. In Single-Channel Recording. Sakmann, B. and Neher, E., Eds. Plenum Press, New York, 1983.
Colquhoun, D. and Hawkes, A.G. The Principles of the Stochastic Interpretation of Ion-Channel Mechanisms. In Single-Channel Recording. Sakmann, B. and Neher, E., Eds. Plenum Press, New York, 1983.
Eadie, W.T., Drijard, D., James, F.E., Roos, M. and Sadoulet, B. Statistical Methods in Experimental Physics. North-Holland Publishing Co., Amsterdam, 1971.
Horn, R. Statistical methods for model discrimination. Biophysical Journal. 51:255-263, 1987.
McManus, O.B., Blatz, A.L. and Magleby, K.L. Sampling, log binning, fitting, and plotting distributions of open and shut intervals from single channels and the effects of noise. Pflügers Archiv, 410:530-553, 1987.
Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. Numerical Recipes in C.
Cambridge University Press, Cambridge, 1988.
Sigworth, F.J. and Sine, S.M. Data transformations for improved display and fitting of single-
channel dwell time histograms. Biophysical Journal, 52:1047-1054, 1987.
Sine, S.M. and Steinbach, J.H. Activation of acetylcholine receptors on clonal mammalian
BC3H-1 cells by low concentrations of agonist. Journal of Physiology (London), 373:129-162,
1986.
Wonderlin, W.F., French, R.J. and Arispe, N.J. Recording and analysis of currents from single ion channels. In Neurophysiological Techniques: Basic Methods and Concepts. Boulton, A.A., Baker, G.B. and Vanderwolf, C.H., Eds. Humana Press, Clifton, N.J., 1990.