
Computers & Geosciences 73 (2014) 88–98

Contents lists available at ScienceDirect

Computers & Geosciences


journal homepage: www.elsevier.com/locate/cageo

Multi-instrument turbulence toolbox (MITT): Open-source MATLAB
algorithms for the analysis of high-frequency flow velocity time
series datasets

Bruce MacVicar a,*, Scott Dilling a, Jay Lacey b

a Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
b Department of Civil Engineering, Université de Sherbrooke, Sherbrooke, Quebec, Canada J1K 2R1

article info

abstract

Article history:
Received 28 April 2014
Received in revised form
8 September 2014
Accepted 10 September 2014
Available online 20 September 2014

The measurement of flow velocity at high frequencies (20–200 Hz) has been made easier over the last
couple of decades by the development and commercialization of a variety of instruments, many of which
are capable of measuring multiple sampling volumes simultaneously. A variety of methods has been
proposed to remove errors in velocity time series and classify the quality of data. However, most
methods are applied using custom algorithms written to treat custom data formats and remain hidden
from a wider audience. The objective of this paper is to introduce and document a new set of open-source algorithms, written in Matlab, that comprise the Multi-Instrument Turbulence Toolbox (MITT).
The algorithms are designed to: (i) organize the data output from multiple instruments into a common
format; (ii) present the data in a variety of interactive figures for visualization and assessment; (iii) clean
the data by removing data spikes and noise; and (iv) classify data quality. We hope that these algorithms
will form the nucleus of an evolving toolbox that will help to accelerate the training of hydraulic
researchers and practitioners, ensure a consistent application of methods for turbulence analysis,
remove the bias of poor quality data from scientific literature, and ease collaboration through the sharing
of data and methods.
© 2014 Elsevier Ltd. All rights reserved.

Keywords:
Hydraulics
Turbulence
Acoustic Doppler velocimeter (ADV)
Spike replacement
Data quality analysis
Visualization
Matlab

1. Introduction

Measurements of flow velocity are frequently obtained in
open channels and other environments to calculate parameters
such as discharge and bed shear stress, and to characterize flow
turbulence. As instrumentation has improved, it has become easier
to record multipoint and simultaneous velocity time series at high
frequencies (20–200 Hz). These technological advances have
greatly reduced the time and cost required to obtain velocity time
series, but all instruments are subject to measurement error and
noise, and accurately assessing data quality and eliminating or
replacing poor quality data remains a prerequisite to obtaining
representative results. Given that the number and complexity of
error correction/analysis techniques have increased along with
dataset size, the task of data quality analysis has become even
more onerous. A set of open-source algorithms to organize and
analyze turbulence data would help to ensure methods are applied
consistently, accelerate training and the exchange of data, and ease

* Corresponding author. Tel.: +1 519 888 4567x38897.
E-mail address: bmacvicar@uwaterloo.ca (B. MacVicar).

http://dx.doi.org/10.1016/j.cageo.2014.09.002
0098-3004/© 2014 Elsevier Ltd. All rights reserved.

the comparison of results from different studies and different instruments.
Algorithms to treat turbulence data can be roughly classified
into three types: (i) closed-source algorithms developed by instrument manufacturers; (ii) closed-source algorithms developed by
independent users; and (iii) open-source algorithms developed by
independent users. Algorithms developed by instrument manufacturers are typically provided to license holders upon purchase of
the instrument and allow for data acquisition, visualization and
post-processing (e.g. the Met-Flow software for the Ultrasonic
Doppler Velocity Profilers (UDVP) (Met-Flow SA, 2002)). Although in some
rare cases it may be possible to integrate these programs into
custom interfaces and analysis tools, for example through a library
of ActiveX functions in the Met-Flow software (Met-Flow SA, 2002), the
software packages developed by instrument manufacturers are
generally limited because they work with only one type of
instrument, do not allow the integration of new methods/algorithms, and use custom file formats. Some algorithms developed
by independent users are distributed as closed-source software.
In turbulence analysis, for example, the WinADV program is
widely used to visualize and post-process data recorded using
Sontek Acoustic Doppler Velocimeters (ADV) and Nortek Vectrinos (Wahl, 2000). Despite the utility of this program, its closed-source structure restricts its flexibility for testing new analysis
procedures and instruments. Other algorithms developed by independent users are open source. Some of these algorithms are
published (Le Roux and Brodalka, 2004; Martini et al., 2005;
Stapleton and Huntley, 1995), but many others remain inaccessible
on private computers and servers. The ad hoc development and
typically poor documentation of these programs generally prevent
their widespread use. Following such examples as the Velocity
Mapping Toolbox (Parsons et al., 2013), created for mapping
acoustic Doppler current profiler (ADCP) datasets, there is a need
to bridge the gap between closed software packages and algorithms developed by independent users by developing an open-source library of turbulence analysis programs in a widely used
programming language.
The objective of this paper is to introduce and demonstrate a
new set of algorithms that comprise the Multi-Instrument Turbulence Toolbox (MITT). The algorithms are designed to: (i) organize
the data output from multiple instruments into a common format;
(ii) present the data in a variety of interactive figures for visualization; (iii) clean the data by removing data spikes and noise; and
(iv) classify data quality. The scope of the toolbox is currently
limited to high frequency (≥20 Hz) instruments that are commonly used in field and lab experiments of open-channel flow. It is
hoped that users of the toolbox will take advantage of the open
architecture to add their own functionality over time by including
other instruments and advanced analyses of velocity time series.
The algorithms and example files are available for free download
on the Matlab File Exchange server (http://www.mathworks.com/matlabcentral/fileexchange).

2. Program description

The MITT algorithms are written as Matlab functions due to the
simplicity and popularity of this high-level programming language. A significant advantage of high-level languages is that
many other algorithms and toolboxes are available to perform
routine tasks. An example of an available algorithm is pwelch,
included in the Signal Processing Toolbox, which computes periodograms of subsets of the full time series and then averages the
results to obtain a smoothed periodogram for which it is easier to
identify anomalies in frequency space (Welch, 1967). Please note
that courier font is used to indicate function names and
italicized courier font is used to indicate variable names.
Other examples include algorithms for box plotting, curve fitting,
and creating programmable objects such as push buttons. The
Statistics, Signal Processing, and Curve Fitting Toolboxes are
required as part of the Matlab license to perform the full range
of calculations in the MITT algorithms. The code was written using
Matlab v2012b and is not backwards compatible due to changes in
Matlab function syntax and availability. MITT users will benefit
from a working knowledge of standard and object-oriented
programming in Matlab.
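As a minimal illustration of how such a library function is typically called (the velocity record here is synthetic stand-in data, not MITT output):

    % Welch-averaged periodogram of a synthetic 50 Hz velocity record
    fs = 50;                           % sampling frequency (Hz)
    u  = 0.3 + 0.05*randn(fs*60, 1);   % 60 s of mean flow plus white noise
    [Puu, f] = pwelch(u, hamming(512), 256, 512, fs);  % averaged periodogram
    loglog(f, Puu), xlabel('f (Hz)'), ylabel('P_{uu} (m^2 s^{-2} Hz^{-1})')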
MITT is launched from the command line within the Matlab
environment. A graphical user interface (GUI) is shown that allows
the user to select files and options for the analysis and visualization of the data (Fig. 1). The three major computational blocks,
referred to herein in bold courier font as Organize, Clean,
and Classify, must initially be run in that order to ensure proper
program function, but any of the blocks can be rerun. The blocks
are actioned by ticking "Organize raw data into Data and Config
array", "Clean raw time series", and "Classify quality of time series"
on panel (b) of Fig. 1, respectively. Organize translates the raw
data from different types of instruments into a common format,
Clean de-spikes and filters velocity time series, and Classify assesses the quality of recorded time series in an array. In
each block, options are selected by the user to apply the desired
methods. Classify can be run as an automated procedure or
through a second interactive GUI (Fig. 2). The following sections
describe the common data format including the supported instrument output formats, the program architecture, and the data
quality assessment and visualization tools.
2.1. Organization of data

A common data format, modeled after the output format of
the Nortek Vectrino II (VII) velocity profiling instrument, is
used to save the output from MITT. Two structure arrays called
Data and Config store the recorded data and configuration
parameters, respectively. The advantage of using structure
arrays is that variable names can be given sufficient description
so that the content is clearly identifiable. For instance, the
transformation matrix used to convert between measured beam
velocities and orthogonal velocity components for acoustic-type
instruments is stored in Config.transformationMatrix.
Each velocity component is stored in a subfield of the Data
structure array (e.g. Data.Vel.u for the streamwise velocity). The naming convention used in the VII data is followed so
that the orthogonal components u, v, and w represent velocity
in the streamwise (x), lateral (y) and vertical (z) directions,
respectively. The ability to attach descriptive names to the data
while storing it in a format that can be readily passed
between programs allows for the development of a flexible
platform for data analysis that conceivably will have many
different users. Visualization and analysis programs are written
to accept the Data and Config structure arrays from any
instrument.
MITT is capable of handling instruments that sample either a
single volume (called a cell herein) or multiple cells in parallel
(i.e. simultaneously or quasi-simultaneously) or in series (i.e. one
after another). Single-cell instruments for which Organize algorithms have been created include the Sontek Acoustic Doppler
Velocimeter (ADV), the Nortek Vectrino, and the Marsh-McBirney Electromagnetic Current Meter (ECM). These instruments
can be linked to record multiple cells in parallel. Multiple-cell
instruments for which Organize algorithms have been created
include the VII and the Met-Flow Ultrasonic Doppler Velocity
Profiler (UDVP). Multiple UDVP probes can be multiplexed but
the software records data from different probes in series. Within
the appropriate subfields, single-cell time series are stored in
single columns while multiple cells in parallel are stored in a
matrix with the rows and columns representing the time interval
and the cell number, respectively. Multiple cells recorded in series
are stored in separate files with their own Data and Config
structure arrays. Neither low frequency acoustic instruments such
as the Sontek Acoustic Doppler Current Profiler nor high
frequency instruments such as hot-film probes, hot-wire probes,
Laser Doppler Velocimeters, and Particle Image Velocimetry (PIV)
techniques are currently supported in MITT, although there is no
conceptual reason why they could not be included in the future.
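A sketch of how such structure arrays could be assembled is shown below; apart from Data.Vel.u/v/w and Config.transformationMatrix, the field names and values are hypothetical examples rather than MITT's actual contents:

    % Hypothetical assembly of the Data and Config structure arrays
    Config.transformationMatrix = eye(4);   % beam-to-XYZ calibration (placeholder)
    nT = 6000; nCells = 30;                 % samples and parallel cells (examples)
    Data.Vel.u = randn(nT, nCells);         % rows = time interval, columns = cell
    Data.Vel.v = randn(nT, nCells);         % lateral component
    Data.Vel.w = randn(nT, nCells);         % vertical component
    save('Example1.mat', 'Data', 'Config'); % one storage file per record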
A comma-delimited text file (*.csv format), called the CSVcontrol file, is a required input to MITT. In the CSVcontrol file, the data
files to be analyzed are listed in rows, all of which should be stored
in the same folder as the control file. Information (or variables)
about the recorded data series such as instrument location and
water depth (i.e. anything not recorded by the instrument itself) is
listed in columns. Each column must have two header rows that
list the name of the parameter (e.g. waterDepth) and a format
string (Table 1). Format strings must be in Matlab format (e.g. %s
for a string, %d for an integer, and %f for a floating-point real
number). Required field names and descriptions are listed in
Table 2. Parameters are stored as subfields in Config.
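One way to read such a control file, using the two header rows to drive textscan, is sketched below; the file name is hypothetical and this is not MITT's own parser:

    % Read a CSVcontrol file: row 1 = parameter names, row 2 = format strings
    fid   = fopen('CSVcontrol.csv');
    names = regexp(fgetl(fid), ',', 'split');      % e.g. {'instrument','filename',...}
    fmt   = regexprep(fgetl(fid), ',', ' ');       % e.g. '%s %s %f %f %f %f %f %f'
    cols  = textscan(fid, fmt, 'Delimiter', ',');  % one cell entry per column
    fclose(fid);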


Fig. 1. MITT launch window with all panels active and open. The CSVcontrol file is selected on panel a. Messages regarding program operation are shown in the blue-green rectangle. Computational blocks are activated on panel b. Options for Organize, Clean and Classify are selected on panels c, d, and e, respectively. The pushbutton f launches the analysis.

Sampling cell positions can be specified in one of two ways. For
single-cell instruments such as the ADV, the easiest method is to
enter the coordinates (xpos, ypos, and zpos for the streamwise,
lateral and vertical position relative to the bed) directly in the
CSVcontrol file. In some cases, however, particularly for multi-cell
instruments, it is not possible to enter all cell positions. In such a
case the option exists to create a custom subprogram that will
calculate the positions of all cells from some reference location.
Reference location parameters should be entered as columns in
the CSVcontrol file so that they will be saved in Config where they
can be accessed by the custom subprogram.
An optional component of Organize is the definition of the
test section water and bed surface elevations. The test section can
be defined either as a uniform or a non-uniform channel using
options listed on the launch window (Fig. 1). A trapezoidal cross-section is defined for the uniform channel option, which requires
the specification of the bed slope (S), the bottom channel width
(B), the water depth (Z), the length of the test section (L), and the
side slope (m, specified as a ratio of the horizontal to the vertical
distance) (Fig. 3). Triangular and rectangular channels can be
specified by setting B = 0 and m = 0, respectively. The origin is
located at the channel bed centerline at the upstream limit of the
test section. Non-uniform topography can be specified by selecting
a *.csv file or a custom subprogram to perform the calculations.
The *.csv file must have four columns that list the channel bed
coordinates (xchannel, ychannel, and zchannel) and the water
surface elevation (wsurf) following the same format shown in
Table 1. This format is suitable for a field setting where points are
collected with a total station or GPS. In all cases, the definition of
the test section produces gridded interpolants of the bed and the
water surface (bedElevation and waterElevation, respectively) that are stored as subfields in one- and two-dimensional
structure arrays (oneD and twoD), which are then stored in
Config.
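A sketch of how the uniform-channel interpolants could be built from these parameters follows; the construction is an assumption for illustration, not MITT's exact code:

    % Gridded interpolants for a uniform trapezoidal test section
    S = 0.001; B = 0.5; Z = 0.3; L = 10; m = 1;     % example geometry
    x = linspace(0, L, 101);                        % streamwise coordinates
    y = linspace(-(B/2 + m*Z), B/2 + m*Z, 51);      % lateral coordinates
    [X, Y] = ndgrid(x, y);
    bank = max(0, abs(Y) - B/2) .* (1/max(m, eps)); % bank rise above channel bottom
    bedElevation   = griddedInterpolant(X, Y, -S*X + bank);
    waterElevation = griddedInterpolant(X, Y, -S*X + Z);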
To assist users with launching an analysis of their own data, a
set of data for each instrument has been provided along with
CSVcontrol files and is available on the Matlab File Exchange
server. While it is beyond the scope of this article to fully describe
their experimental apparatus and methods, the data are largely
from published studies, and the examples of good and bad quality
can help to accelerate the learning process. The VII, UDVP, and
ECM data require custom sampling location algorithms to function properly, and these programs have been included in MITT.
CalcXYZUWFlume calculates cell positions for a flume experiment
using a VII based on the position of the probe head given in the
control file (xpos1, ypos1, and zpos1) and the instrument's internal
parameters (see MacVicar et al., 2014 for a description of this
work). CalcXYZUIllinoisFlume performs a similar calculation
but with more parameters so that the position of an array of UDVP
probes can be calculated from recorded positions of the instrument cart and holder relative to a guide rail (e.g., tdist is the
length of a custom-built holder for the probes). CalcIllinoisFlumePos needs to be used to specify the non-uniform channel
bed configuration for the experiment in this case as it is necessary
to calculate the elevation of the guide rail to determine sampling
cell locations (see MacVicar and Best, 2013 for a description of this
work). CalcXYZMoras calculates cell positions relative to the bed
and water surface for an array of ECMs from positions measured
on a system of light bridges during flood events at a field site. A set
of scattered topographical measurements can be used to specify
the non-uniform channel in this case (Moras topo 151004.csv)
and the surveyed positions of the bridges (bridge020604.mat) are
also required for correct positioning of the ECM data (see MacVicar
and Roy, 2007 for a description of this work).

Fig. 2. Interactive quality control plot. Panel a is active when the figure is created. The user selects arrays on panel a. When the "Done" button is pushed, panels b and c are enabled, which allows Config subfields to be set as the y-axis variable. Config.zZ is set as the y-axis of the three subplots and its range is from 0 to 1. Drop-down lists on panels c, d, and e allow the user to set selected Data subfields and statistics to be calculated for those subfields as the x-axis variables. The mean, standard deviation, and kurtosis for Data.Filtered.Beam1 from all selected arrays are shown in subplots I, II, and III, respectively. Axes limits can be changed or reset. Options activated on panel f for a single active array include: (1) plot sampling volume positions (Fig. 6), (2) plot one series (Fig. 4; the series to plot is selected using the mouse), (3) plot an array image (Fig. 5), or (4) classify data quality. Classify options are shown on panel g. For time series that are classified as poor quality, a horizontal shaded grey line is drawn in subplots I, II, and III. Users can change the classification of cells by pushing the "Manually Adjust" button and selecting cells with the mouse.

Table 1
CSVcontrol file format example. Probe elevations should be entered as true elevations (i.e. not relative to the bed). Columns may be in any order and additional columns may be used to save additional parameters as subfields in Config. For example, the discharge Q would be saved as Config.Q.

instrument | filename | zpos | xpos | ypos | waterDepth | bedElevation | Q
%s | %s | %f | %f | %f | %f | %f | %f
Vectrino | Example1 | 0.2 | 0.5 | 0.4 | 1 | 0 | 0.054
Vectrino | Example2 | 0.4 | 0.5 | 0.4 | 1 | 0 | 0.054

Table 2
Variables required for analysis and visualization algorithms. These variables can be input directly using the CSVcontrol file (Table 1) or can be calculated using custom subprograms.

Parameter | Description
instrument | Instrument type. Current options are ECM, ADV, Vectrino, VectrinoII, or UDVP
filename | File where raw data is stored. File should be located in the same directory as the control file.
waterDepth | Depth of water at measurement location (m)
bedElevation | Bed elevation at measurement location (m)
xpos | Streamwise position of sampling volume (m)
ypos | Lateral position of sampling volume (m)
zpos | Vertical position of sampling volume (m)

Fig. 3. Definition of optional uniform channel and orientation. Flow is from top to bottom so that the origin is defined at the centerline of the upstream limit of the measurement section.


2.2. Program architecture

A consistent program architecture was used to ensure that algorithm function is transparent and readily modified by other users.
All algorithms are listed in Table 3 with the inputs they receive and
return and a short description of program function. Five levels of
algorithms were defined:

Table 3
Full list of MITT algorithms with descriptions.

Algorithm name | Level | Called from | Subprograms | Description
MITT | 1 | Command line | OrganizeInput, CleanSeries, ClassifyArrayGUI, ClassifyArrayAuto, makefaQCbuttons, DefaultfaQC, subGetValues, subSetValues, subFieldnames | Opens the launch window
OrganizeInput | 2 | MITT | OrganizeInstrumentData, CalcChannelMesh | Organizes instrument output into Data and Config arrays
CleanSeries | 2 | MITT | CleanSpike, CleanFilter, PlotTimeSeries | Controls the de-spiking and filtering of time series and optional time series plot
ClassifyArrayGUI | 2 | MITT | CalcGoodCells, CalcArrayStats, Plot1Series, PlotTimeSpace, makefaQCbuttons, getAnames, subGetValues, subSetValues, subFiltArray | Plots array statistics and interactively identifies bad cells from data array
ClassifyArrayAuto | 2 | MITT | CalcGoodCells, getAnames | Automatically identifies bad cells from data array
CalcChannelMesh | 3 | OrganizeInput | InterpUniformChan, InterpNonUniformChan, custom subprograms | Calculates 1D and 2D topography and water surface
OrganizeInstrumentData | 3 | OrganizeInput | ConvCSV2Struct | Gets raw data from instrument output files and saves it in Config and Data
CleanSpike | 3 | CleanSeries | SpikeStdev, SpikeSkewness, SpikeGoringNikora, SpikeVelCorr | Controls detection and removal of spikes from time series
CleanFilter | 3 | CleanSeries | — | Controls frequency filtering of time series
PlotTimeSeries | 3 | CleanSeries, ClassifyArrayGUI | Plot1Series | Controls automatic time series plot
CalcGoodCells | 3 | ClassifyArrayGUI, ClassifyArrayAuto | ClassifyCor, Classifyxcorr, ClassifyNoiseRatio, ClassifySpike, ClassifyNoiseFloor, ClassifyPoly, DefaultfaQC | Assesses quality of sampled cells based on various flexible criteria
InterpUniformChan | 4 | CalcChannelMesh | — | Calculates 1D and 2D topography for a uniform channel (geometry is specified in the launch window)
InterpNonUniformChan | 4 | CalcChannelMesh | — | Calculates 1D and 2D topography for a non-uniform channel (data is loaded from a *.csv file)
CalcArrayStats | 4 | ClassifyArrayGUI | — | Calculates a statistical parameter from a data array
Plot1Series | 4 | ClassifyArrayGUI, PlotTimeSeries | — | Creates time series plot
PlotTimeSpace | 4 | ClassifyArrayGUI | — | Creates array image plot
PlotQCTable | 4 | ClassifyArrayGUI, ClassifyArrayAuto | — | Creates table that shows results from cell quality classification
SpikeStdev | 4 | CleanSpike | SpikeReplace | Detects spikes based on fixed velocity thresholds
SpikeSkewness | 4 | CleanSpike | SpikeReplace | Detects spikes based on skewness
SpikeGoringNikora | 4 | CleanSpike | SpikeReplace | Detects spikes using the Goring and Nikora (2002) algorithm
SpikeVelCorr | 4 | CleanSpike | SpikeReplace | Detects spikes using the velocity-correlation method (Cea et al., 2007)
SpikeReplace | 4 | SpikeStdev, SpikeSkewness, SpikeGoringNikora, SpikeVelCorr | — | Replaces detected spikes in time series
ClassifyCor | 4 | CalcGoodCells | — | Classifies data quality using measurement correlation
Classifyxcorr | 4 | CalcGoodCells | — | Classifies data quality based on cross-correlation between adjacent cells
ClassifyNoiseRatio | 4 | CalcGoodCells | — | Classifies data quality using noise from cross-spectra evaluations (Hurther and Lemmin, 2001)
ClassifySpike | 4 | CalcGoodCells | — | Classifies data quality based on percentile of detected spikes
ClassifyNoiseFloor | 4 | CalcGoodCells | — | Classifies data quality based on slope of inertial subrange in frequency space
ClassifyPoly | 4 | CalcGoodCells | — | Classifies data quality based on 3rd-order polynomial fits to mean, standard deviation, and skewness profiles
ConvCSV2Struct | 5 | OrganizeInput, OrganizeECMData | — | Reads a *.csv file and saves each column as a field in a structure array
ConvMulti2Struct | 5 | PlotTimeSeries, CleanSpike, CleanFilter, ClassifyArrayGUI | — | Converts a multidimensional array into structure fields
ConvStruct2Multi | 5 | PlotTimeSeries, CleanSpike, CleanFilter | — | Converts structure fields into a single multidimensional array
ConvXYZ2Beam | 5 | CleanSpike | — | Converts xyz to beam coordinates using calibration matrix (acoustic instruments only)
makefaQCbuttons | 5 | MITT, ClassifyArrayGUI | — | Creates buttons on panel for classification options
DefaultfaQC | 5 | MITT, CalcGoodCells | — | Contains default values for classification options
getAnames | 5 | ClassifyArrayGUI, ClassifyArrayAuto | — | Gets names of all analyses that have been performed
subGetValues | 5 | MITT, ClassifyArrayGUI | — | Gets values from buttons, editable fields and other GUI input fields
subSetValues | 5 | MITT, ClassifyArrayGUI | — | Sets values for buttons, editable fields and other GUI input fields
subFieldnames | 5 | ClassifyArrayGUI | — | Extracts field names, including all substructures within a structure

1. Master program (MITT) receives user input data through the
GUI launch window (Fig. 1). Input data is stored in a structure
array called GUIControl and passed to the main computational blocks for execution. MITT is written in an object-oriented code structure;
2. Main computational blocks receive GUIControl from MITT.
Organize (OrganizeInput) must be run prior to the Clean
and Classify blocks (CleanSeries and ClassifyArrayAuto/ClassifyArrayGUI, respectively). OrganizeInput controls a set of algorithms that open the CSVcontrol file,
define the test section configuration and sampling locations,
load the raw data from instrument-specific custom output
formats, and save the information in the MITT standard Data
and Config structure arrays. One Matlab storage file (*.mat) is
created for each input file listed in the CSVcontrol file and saved
in a subdirectory MITT that is created in the directory where
the CSVcontrol file is located. CleanSeries, ClassifyArrayAuto, and ClassifyArrayGUI receive GUIControl, load
Data and Config from all *.mat files stored in the MITT
directory, and effectuate analyses to clean the time series and
classify the quality of data, respectively;
3. Task management subprograms for each block receive Data,
Config and the options stored in GUIControl and return the
results by adding subfields to Data and Config. An example of a
level three algorithm is CleanSpike, which creates empty
matrices to store results, sends velocity time series one at a
time to the de-spiking algorithms, and adds the de-spiked data
as new subfields to Data;
4. Computational subprograms receive only the data necessary for
a particular computation. For example, the SpikeGoringNikora algorithm receives one velocity time series, the sampling
frequency, a coefficient to control the spike detection threshold,
and the desired spike replacement technique. Following the
spike detection and replacement, the de-spiked data is returned
to CleanSpike. Analytical subprograms are designed to function with or without the superstructure provided by the upper
three algorithm levels of MITT (see the illustrative call after this
list). It is anticipated that other researchers will have their own
algorithms, working approximately at this level, that may be
added to the MITT program in the future;
5. Utility functions can be called from anywhere and are commonly used to convert data types. An example of a level
5 algorithm is ConvCSV2Struct, which converts data stored
as columns in a comma-delimited text file to subfields in a
structure array.
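For example, a level four routine could be called directly from the command line along the lines of the sketch below; the argument order is inferred from the description above and should be checked against the function header in the toolbox:

    % Hypothetical standalone call to a level 4 de-spiking routine
    u  = Data.Vel.u(:, 1);   % one velocity time series
    fs = 100;                % sampling frequency (Hz)
    C3 = 1.0;                % coefficient on the spike detection threshold
    uClean = SpikeGoringNikora(u, fs, C3, 'linear');  % 'linear' = replacement method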
2.3. Cleaning velocity time series

MITT is designed to make it easy to clean velocity time series by
providing algorithms for common methods and visualization tools
for qualitative assessments. Erroneous data spikes are known to
occur due to a variety of factors including phase wrapping, high
shear and entrained air bubbles within the sampling volume,
insufficient seeding, and reflections off the bed (Cea et al., 2007;
Doroudian et al., 2007; Goring and Nikora, 2002; Lane et al., 1998;
McLelland et al., 1999; Snyder and Castro, 1999; Voulgaris and
Trowbridge, 1998). Erroneous data spikes need to be removed to
avoid biasing mean and turbulent velocity statistics. In MITT, data
spikes can be detected using one or more of four methods:
1. Velocity threshold (SpikeStdev): fixed minimum and maximum velocity thresholds can be specified using the following
algorithm:
$u_T = U \pm C_1 \lambda \sigma_u$
where U is the mean velocity (of any component or beam
direction), $\sigma_u$ is the standard deviation, $C_1$ is a user-specified
constant, and $\lambda$ is the universal threshold. For random white
noise, $\lambda$ can be calculated from probability distribution theory
as $\lambda = \sqrt{2 \ln n}$, where n is the number of independent samples
(i.e. the length of the time series) (Goring and Nikora, 2002).
This method is identical to the maximum/minimum threshold
filter applied by Cea et al. (2007), with the exception that the
coefficient $C_1$ (default value of 1.0) is provided here for user
flexibility. A minimal sketch of this test is given after this list.
2. Skewed velocity threshold (SpikeSkewness): a fixed minimum or maximum velocity threshold can be specified using
the following algorithm:
$u_T = u_{\mathrm{median}} \pm C_2 \lambda \sigma_u^*$
where $u_{\mathrm{median}}$ is the median velocity, $\sigma_u^*$ is the one-sided
standard deviation calculated from the data on the side of the
median that has lower variability, and $C_2$ is a user coefficient
(default value 1.0) again provided for user flexibility. This
method was found to be effective for data measured with the
UDVP instrument where seeding was insufficient (MacVicar
and Best, 2013). In such a case a second mode of velocity
samples was found clustered around a null velocity so that
they produced a highly skewed distribution. A high value of $C_2$
(= 3.0) was found to reliably detect this type of outlier without
resulting in false positive spike detections.
3. Phase-space threshold (SpikeGoringNikora): a variable
threshold can be applied that detects spikes by determining
whether they are enclosed by three-dimensional ellipsoids
calculated for dimensions of the velocity and its derivatives
following Goring and Nikora (2002). The median is used
instead of the mean to locate the central tendency of the data
because it is more reliable for data contaminated with spikes
(Goring and Nikora, 2003; Wahl, 2003). A user-defined coefficient ($C_3$) is again applied as a multiplier of $\lambda$ for flexibility. An
additional option is to "freeze" data points that are relatively
close to the median to prevent the spreading of spikes following the method of Parsheh et al. (2010).
4. Velocity-correlation threshold (SpikeVelCorr): a variable
threshold can be applied that detects spikes by determining
whether data are enclosed by three-dimensional ellipsoids
calculated for dimensions of the three velocity components
following the method developed and described by Cea et al.
(2007). A user-defined coefficient ($C_4$) is again applied as a
multiplier of $\lambda$ for flexibility. Low frequency variability is
removed prior to spike detection.
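A minimal sketch of the velocity threshold test (method 1) follows, assuming the universal threshold of Goring and Nikora (2002); MITT's SpikeStdev may differ in detail:

    % Method 1: flag samples outside U +/- C1*lambda*sigma_u
    % (save as velThresholdSpikes.m)
    function isSpike = velThresholdSpikes(u, C1)
        n      = numel(u);               % number of independent samples
        lambda = sqrt(2*log(n));         % universal threshold for white noise
        uT     = C1 * lambda * std(u);   % half-width of the accepted band
        isSpike = abs(u - mean(u)) > uT; % logical index of detected spikes
    end

Detected samples would then be handed to a replacement routine such as the linear interpolation performed by SpikeReplace.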
To improve spike detections, particularly in time series with
periodic variations in the mean due to waves or where flow
conditions are changing over time, low frequency variability is
removed using a moving average filter (default window of 5 s) prior
to spike detection. Detected data spikes can be replaced with a
variety of algorithms (Goring and Nikora, 2002). At the moment,
the only option is linear interpolation (SpikeReplace). In the future,
more options can be included to allow users more flexibility.

Doppler or white noise is an inherent characteristic of Doppler-based backscatter measurement systems. While Doppler noise is
present over all frequencies, it has been suggested that failure to
remove the noise at high frequencies may result in biases in the
estimation of turbulence statistics (especially for higher order
moments) due to aliasing (i.e., the folding back of frequencies
above the Nyquist frequency) (Lane et al., 1998). Low-pass filters
are recommended to avoid aliasing errors (Lane et al., 1998); yet
low-pass filters may affect other turbulent statistics such as the
integral time scale (see Roy et al., 1997). As a user option, white
noise above the Nyquist frequency (one half of the sampling
frequency) can be removed from a velocity time series in MITT by
applying a low-pass third-order Butterworth filter (a sketch is given
below). The method and parameters are as recommended by Roy
et al. (1997). In real terms this filter serves to smooth the high-frequency variability of the time series.
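A sketch of such a filter using standard Signal Processing Toolbox calls is shown below; the cutoff frequency here is a user choice, not a documented MITT default:

    % Low-pass third-order Butterworth filter, applied with zero phase shift
    fs = 100; fc = 25;                     % sampling and cutoff frequencies (Hz)
    [b, a] = butter(3, fc/(fs/2), 'low');  % normalized cutoff in (0,1)
    uFilt  = filtfilt(b, a, u);            % filter the velocity series u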
2.4. Classification of data quality

For studies focusing on the characterization of flow structure
through mean and turbulence statistics, the importance of accurately assessing time series data quality cannot be overstated. For
example, studies involving ADV measurements tend to rely on the
signal correlation value, which as specified by Sontek/YSI (2001)
should be above 70%. However, signal correlations below 80% have
been shown to significantly affect mean velocities, turbulent
statistics, and higher statistical moments (Lacey and Rennie,
2011; Lane et al., 1998). Other techniques based on advanced
statistics such as the noise ratio have been developed (Hurther and
Lemmin, 2001). As the average user of velocimeters may not be
familiar with all the available techniques, the algorithms available
in MITT should help to avoid publishing biased results. A wide
range of options for classifying the quality of data series has been
included:
1. Cell range: the upper and lower bounds of cells that are
known to have feasible data can be entered to restrict the range
of cells that will be considered. Using the UDVP, for example, it
was found that turbulence statistics were poor in cells that
were less than 0.01 m from the probe head (about 7 cells in
MacVicar and Best, 2013). If the sampling locations and the
channel topography are accurately input to the program, the
classification of cells below the bed and above the water
surface is performed automatically. This option is available for
multi-cell data arrays.

2. Signal correlation (ClassifyCor): acoustic instruments
measure velocity based on the phase lag between the return
signals of two acoustic pulses. Under optimal conditions the
return signals are highly correlated and this parameter can be
used as an indicator of signal quality. A percentage threshold
(default time-averaged correlation of 70%) can be used to reject
poor quality data series as recommended by instrument
manufacturers (Sontek/YSI, 2001). This option is only available
for the ADV, Vectrino, and VII.
3. Cross-correlation between adjacent cells (Classifyxcorr):
assuming that turbulence is coherent in time and space,
recorded signals should have a relatively high correlation when
velocity is recorded simultaneously in closely spaced cells, and a
cross-correlation threshold can be used to reject poor quality
data series. This option is available for multi-cell data arrays. No
default is specified as it will depend on cell volume, the
distance between cells, and other flow parameters such as
turbulence intensity.
4. Noise evaluation using redundant vertical velocity measurements (ClassifyNoiseRatio): this method can only be
used with instruments with redundant velocity measurements
(i.e., Vectrino and VII). Hurther and Lemmin (2001) show that
the noise within the variance of the measured time series can
be estimated by subtracting the covariance of the two redundant vertical velocity time series from the vertical time series
variance (i.e., $\sigma_n^2 = \sigma_{w_1}^2 - \overline{w_1' w_2'}$). The estimated noise can be
divided by the true variance to give a noise ratio (see Lacey and
Rennie, 2011; a sketch of this calculation is given after this list).
This time series quality assessment parameter is
invaluable as it gives a quantitative measure of the level of
noise in the time series. For example, a noise ratio threshold of
20% would mean that at least 80% of the recorded variance is
related to turbulent velocity fluctuations. The noise ratio
threshold can be adjusted to control which data series are
classified as poor quality (default of 5%).
5. Spike threshold (ClassifySpike): time series with a high
number of spikes are of lower quality because frequent spikes
indicate the instrument was unable to obtain a reliable signal,
while spike replacement necessarily means that interpolated
data are being included in analyses. A percentage-based threshold can be used to reject recorded signals where the number of
detected spikes is high (default of 5%). This option is available
for all instruments for which the velocity signals have been
despiked.
6. Flattening of power spectra (ClassifyNoiseFloor): when
plotted in frequency space, turbulent velocity time series are
expected to follow the Kolmogorov turbulence scaling law (−5/3
slope) within the inertial subrange (Frisch, 1995;
Kolmogorov, 1941; Nikora and Goring, 1998). A comparison
of the actual and ideal slopes in this frequency range can be
used to classify data quality. The upper and lower frequency
limits of the inertial subrange are determined using recommendations of Stapleton and Huntley (1995). Based on the
tendency of the spectrum to flatten out due to noise, the
criterion specifies the maximum permissible slope for the
inertial subrange.
7. 3rd-order polynomial fit (ClassifyPoly): time series for
which the calculated statistics do not fit within larger scale
trends of a multi-cell array may be of poor quality. Third-order polynomial fits to the mean, standard deviation and
skewness of recorded multi-cell velocity signals can be used
to check for outliers. A standard score (acting as a multiplier
of the standard deviation) is used in combination with the fit
algorithm from Matlab's Curve Fitting Toolbox to vary the
threshold at which time series are rejected (default value of
2.58, or 99%). This option is available for multi-cell instruments that have at least five cells.
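A sketch of the noise-ratio estimate of item 4, written out for two redundant vertical velocity series w1 and w2 (assumed to be column vectors in the workspace), could look as follows:

    % Noise ratio from redundant vertical velocities (Hurther and Lemmin, 2001)
    w1p = w1 - mean(w1);  w2p = w2 - mean(w2);   % velocity fluctuations
    noiseVar   = var(w1) - mean(w1p .* w2p);     % sigma_n^2 = sigma_w1^2 - <w1'w2'>
    trueVar    = var(w1) - noiseVar;             % noise-free variance estimate
    noiseRatio = noiseVar / trueVar;             % reject series above the threshold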


The above analyses can be applied to all files listed in the
CSVcontrol file automatically (ClassifyArrayAuto) or with the
aid of an interactive assessment GUI described below (ClassifyArrayGUI; Fig. 2). Classification options are saved in Config.faQC. Default options can be controlled in DefaultfaQC. The
results of the classification can be output to a table in a popup
window by selecting the option on the MITT launch window
(Fig. 1). Cell quality classification results are stored in a logical
array (Config.goodCells; 0 indicates poor quality and 1 indicates good quality), where they are available to control further analysis
and guide interpretation of results.
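Downstream analyses can then use this array as a mask; the sketch below shows one assumed usage pattern rather than a prescribed MITT workflow:

    % Exclude poor quality cells from a subsequent statistic
    good  = logical(Config.goodCells);     % 1 = good quality, 0 = poor quality
    uMean = mean(Data.Vel.u(:, good), 1);  % mean velocity of good cells only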
2.5. Visualization and interactive assessment of good quality data

Three figures have been designed to visualize recorded data
and aid in the assessment of data quality. The interactive quality
control window (Fig. 2; ClassifyArrayGUI) allows users to
plot statistics from stored velocity arrays and to visualize the
results from the quality classification tools. Statistics such as the
mean, standard deviation and skewness can be calculated for all
variables stored in the Data structure array. Such statistics can be
plotted on one of three subplots against any variable stored in
Config. For the example shown in Fig. 2, four UDVP files are
displayed with Config.zZ (the relative depth) as the y-axis and
the mean, standard deviation, and kurtosis on subplots I, II, and III,
respectively. The active array is shown as the file name 64p5p075ds0001na3.mat, which corresponds with the green symbols.
Data quality for the active array has been classified using a
combination of techniques including the range, the cross-correlation between adjacent cells, the spike threshold, and a
third-order polynomial. Greyed-out positions at z/Z = 0.75 and 0.81
have higher standard deviations than surrounding cells and have
been identified as poor quality. Additional plots can be generated
for the active array which are designed to further assist the user to
assess data quality, including the time series plot (Fig. 4), the time-space plot (Fig. 5), and the sampling cell locations plot (Fig. 6). The
interactive quality control window can thus function as a hub for
the visualization of data quality and results.
The time series plot (Fig. 4; Plot1Series) is a static (non-interactive) plot that shows recorded and cleaned velocity data
plotted versus time, signal correlation data (where available), box
plots and power spectra of the velocity data. Axes ranges are
determined automatically from the recorded data. Recorded components are shown in their orthogonal form in the plot, each on a
separate axis. Raw, de-spiked, and filtered data for each component (where available) are plotted on the same axes. This figure
allows a rapid assessment of data quality and the effect of the
various spike replacement algorithms on the time series. Grid lines
following the Kolmogorov −5/3 law are included in the frequency
space subplot to allow a visual assessment of the inertial subrange
and to determine if flattening of the spectra is present due to noise
(Frisch, 1995; Kolmogorov, 1941; Nikora and Goring, 1998).

Fig. 4a and b present examples of good and poor quality data,
respectively, obtained using a Sontek ADV and a Nortek Vectrino.
The good data (Fig. 4a) show relatively few spike replacements
and high mean signal correlations in all beams. The w velocity
spectrum indicates a generally good fit with the −5/3 law at higher
frequencies while the slopes of the u and v velocity spectra flatten.
Higher noise in the horizontal velocity components is expected
due to the ADV transducer configuration (Voulgaris and Trowbridge,
1998). In the inertial subrange, theory predicts that the velocity
spectra should collapse following $\frac{4}{3}P_{uu}(f) = P_{vv}(f) = P_{ww}(f)$
(Tennekes and Lumley, 1972). It can be observed from Fig. 4a that
all velocity components do have similar power over a small range
from f = 1 to 5 Hz. In contrast, the velocity time series presented
in Fig. 4b has many characteristics indicating it is of poor quality.

Firstly, low signal correlations (<48%) are observed for all velocity
components with the exception of w1. The v time series contains a
number of spikes (3.6%) and the associated v spectrum is relatively
flat, even after spike removal, which indicates a high level of noise.
The slope of the w1 spectrum is clearly greater than that of the w2
spectrum, indicating noise contamination of w2, as the two slopes
should be equal. Due in part to the high noise in the signal (the
noise ratio for this series is 41%), none of the spectra collapse to
similar values in the inertial subrange.
The time-space velocity matrix plot (Fig. 5; PlotTimeSpace)
is a static color image of the data in a multi-cell array of velocity
measurements. High and low velocities are shown as hot and cold
colors for each cell location (y-axis) versus time (x-axis). The mean
velocity value is removed from each cell in the array so that it is
easier to assess the coherence of the turbulence and the typical
variability of the cells. Since the data are recorded simultaneously,
the array image plot helps to visualize coherent turbulent flow
structures (identified as zones of contrasting high and low velocity
in the plot). Following Buffin-Bélanger et al. (2000), time is
reversed on the x-axis to give the appearance of an unrolled film
with the first measurements to the right and the most recent
measurements to the left. Two examples show relatively good
UDVP data (Fig. 5a) and relatively poor data from a VII (Fig. 5b).

3. Getting started

To assist the user with launching their analysis, this section
summarizes the steps required to run MITT on their own data.
Before the master program (MITT) is started, the user must create a
CSVcontrol file using the format specified in Table 1 and containing, at a minimum, the parameters specified in Table 2. Care should
be taken to ensure that the file names included in the CSVcontrol
file match the instrument output file names. The CSVcontrol file
and instrument output files must be located in the same directory/folder.
3.1. To begin using MITT

1. Open MATLAB and open the directory where the MITT programs are located.
2. From the command line run MITT. The MITT launch window opens (Fig. 1).
3. Click the "Select File" box. This opens a browser window to search for the CSVcontrol file.

To upload and organize the raw data into MATLAB format:

1. Click the "Organize raw data into Data and Config array" tick box. A set of
tick boxes appears allowing the user to define the channel
geometry and custom sampling locations if desired.
2. Click the "Run Analysis" button. The structure arrays Data and
Config are created in a *.mat file named after each raw data
time series.
To clean the time series:

1. Click the "Clean raw time series" tick box. The Clean block
options tick boxes are displayed. The "reset despiked and/or
filtered time series" box should be ticked if the cleaning algorithms
are being rerun.
2. Clicking "Plot all time series" will generate a time series plot for
each series (e.g., Fig. 4). This is recommended as a first quality
control measure. Alternatively, the time series plots can be
created from the interactive quality control window (Fig. 2).
3. Click the "Despike" tick box to clean the data. Four de-spiking
methods appear and the user has the option to tick one or more
to clean the data.
4. Click the "Frequency filter" tick box to filter the data series. The
only filter option is a low-pass 3rd-order Butterworth filter.
5. Click the "Run Analysis" button. The cleaned and filtered data
series are stored in the Data array.


Fig. 4. Time series plot showing the analysis results from one cell of (a) an ADV dataset, and (b) a Vectrino dataset. A set of subplots is shown for each component (see part a, component u for subplot labels). Subplots include signal correlation (I), velocity time series (II) with raw, despiked, and filtered results (where available, see legend on lower left subplot), box plots (III), and spectral density (IV). Grid lines are plotted with a −5/3 slope on spectral density subplots to facilitate comparison with the Kolmogorov scaling law. The file location, file name, and the cell number are indicated in the title.


To classify the quality of the time series:

1. Click the "Classify quality of time series" tick box. The Classify
block options tick boxes are displayed along with the classification parameters. The "reset classifications with listed parameters" box should be ticked if the classification is being rerun.
2. Click the "Interactive quality control GUI" tick box to obtain the
interactive plotting figure (Fig. 2). Otherwise the classification
is done automatically for all files listed in CSVcontrol.
3. Click the "Plot classification results in tables" tick box to obtain a
popup window output of classification results for each time
series.
4. Click the "Run Analysis" button. The classification parameter
results are stored in Config.Qdat and the results of the
analysis (1 = good data, 0 = poor data) are stored in Config.goodCells.
Fig. 5. Time-space plots for (a) Data.Filtered.Beam1 for a UDVP array and (b) Data.Filtered.u for a VII array. In (a) the overall good quality of the data can be inferred from the spatial and temporal organization, with areas of high and low velocities indicative of coherent turbulent structures. Higher variability cells can be distinguished at Config.zZ = 0.75 and 0.80 that correspond with classified poor quality cells in Fig. 2. Cells with relatively low temporal variability close to the probe (Config.zZ = 0.95) were also classified as poor quality. In (b) the lack of spatial and temporal organization in the flow is indicative of poor data quality. Low data variability at Config.zZ < 0.32 indicates that there may have been a solid object within the sampling volume in this area.

Fig. 6. Example of sampling volume positions output for ECM data over a 2D mesh of interpolated bed and water surfaces. Sampling cell locations are indicated by asterisk symbols and the color of the symbols corresponds with those in Fig. 3.

4. Summary

The MITT algorithms are intended to be used by hydraulic
researchers and practitioners that work with high frequency
(≥20 Hz) instruments in field and lab experiments of open-channel flow. The algorithms are designed as a flexible analysis
and visualization toolbox in Matlab that can be adjusted to work
with different types of data sets in a wide range of hydraulic
conditions. The visualization tools are built on a series of figures
that allow the user to quickly assess the quality of the measured
data, to correct the data where possible by removing noise and
spikes, and if necessary to reject the data series in an objective and
defensible manner. A number of widely applied data quality
algorithms from the literature have been included.

While the toolbox is fully usable in its current state, it is
intended to be a work in progress. Not all available data treatment
and analysis algorithms have been included, and algorithms to
analyse the data, for example for boundary shear stress and turbulent
Reynolds stresses, are also not part of the current toolbox. The
standardization of the output format is meant to allow additional
modules to be programmed that will work with data from a
variety of instruments. The development of a freely shared set of
algorithms in this commonly used programming language will
help to accelerate the training of researchers and practitioners,
ensure a consistent application of methods, remove the bias
associated with the use of poor quality data from future scientific
literature, and ease collaboration through the sharing of data and
methods.

Acknowledgments

This research has been funded through the NSERC Discovery
Grant program and a Canadian Foundation for Innovation
Leaders of the Future/Ontario Research Fund Research Infrastructure grant (#27851). We would also like to thank two
anonymous reviewers for their comments, which significantly
improved the clarity of this submission and the usability of the
algorithms.

Appendix A. Supporting information


Supplementary data associated with this article can be found in
the online version at http://dx.doi.org/10.1016/j.cageo.2014.09.002.


References

Buffin-Bélanger, T., Roy, A.G., Kirkbride, A.D., 2000. On large-scale flow structures in a gravel-bed river. Geomorphology 32, 417–435.
Cea, L., Puertas, J., Pena, L., 2007. Velocity measurements on highly turbulent free surface flow using ADV. Exp. Fluids 42, 333–348.
Doroudian, B., Hurther, D., Lemmin, U., 2007. Discussion of "Turbulence measurements with acoustic Doppler velocimeters" by Carlos M. García, Mariano I. Cantero, Yarko Niño, and Marcelo H. García. J. Hydraul. Eng. 133, 1286–1289.
Frisch, U., 1995. Turbulence: The Legacy of A.N. Kolmogorov. Cambridge University Press, Cambridge, U.K.
Goring, D.G., Nikora, V.I., 2002. Despiking Acoustic Doppler Velocimeter data. J. Hydraul. Eng. 128, 117–126.
Goring, D.G., Nikora, V.I., 2003. Closure to "Despiking Acoustic Doppler Velocimeter Data" by Derek G. Goring and Vladimir I. Nikora. J. Hydraul. Eng. 129, 487–488.
Hurther, D., Lemmin, U., 2001. A correction method for turbulence measurements with a 3D acoustic Doppler velocity profiler. J. Atmos. Oceanic Technol. 18, 446–458.
Kolmogorov, A.N., 1941. Dissipation of energy in locally isotropic turbulence in an incompressible viscous liquid. Dokl. Akad. Nauk SSSR 30, 299–303.
Lacey, R., Rennie, C., 2011. Laboratory investigation of turbulent flow structure around a bed-mounted cube at multiple flow stages. J. Hydraul. Eng. 138, 71–84.
Lane, S.N., Biron, P.M., Bradbrook, K.F., Butler, J.B., Chandler, J.H., Crowell, M.D., McLelland, S.J., Richards, K.S., Roy, A.G., 1998. Three-dimensional measurement of river channel flow processes using Acoustic Doppler Velocimetry. Earth Surf. Processes Landforms 23, 1247–1267.
Le Roux, J.P., Brodalka, M., 2004. An Excel-VBA programme for the analysis of current velocity profiles. Comput. Geosci. 30, 867–879.
MacVicar, B.J., Roy, A.G., 2007. Hydrodynamics of a forced riffle pool in a gravel bed river: 1. Mean velocity and turbulence intensity. Water Resour. Res. 43, W12401.
MacVicar, B.J., Best, J., 2013. A flume experiment on the effect of channel width on the perturbation and recovery of flow in straight pools and riffles with smooth boundaries. J. Geophys. Res. Earth Surf. 118, 1850–1863.
MacVicar, B.J., Dilling, S., Lacey, R.W.J., Hipel, K., 2014. A quality analysis of the Vectrino II instrument using a new open-source MATLAB toolbox and 2D ARMA models to detect and replace spikes. In: Schleiss, A., De Cesare, G., Franca, M.J., Pfister, M. (Eds.), River Flow 2014. CRC Press, Lausanne, SW, pp. 1951–1959.
Martini, M., Lightsom, F.L., Sherwood, C.R., Xu, J., Lacy, J.R., Ramsey, A., Horwitz, R., 2005. Hydratools, a MATLAB-based data processing package for Sontek Hydra data. In: Proceedings of the IEEE/OES Eighth Working Conference on Current Measurement Technology, Southampton, UK, pp. 147–151.
McLelland, S.J., Ashworth, P.J., Best, J.L., Livesey, J.R., 1999. Turbulence and secondary flow over sediment stripes in weakly bimodal bed material. J. Hydraul. Eng. 125, 463–473.
Nikora, V.I., Goring, D.G., 1998. ADV measurements of turbulence: can we improve their interpretation? J. Hydraul. Eng. 124, 630–634.
Parsheh, M., Sotiropoulos, F., Porté-Agel, F., 2010. Estimation of power spectra of acoustic-Doppler velocimetry data contaminated with intermittent spikes. J. Hydraul. Eng. 136 (6), 368–378.
Parsons, D.R., Jackson, P.R., Czuba, J.A., Engel, F.L., Rhoads, B.L., Oberg, K.A., Best, J.L., Mueller, D.S., Johnson, K.K., Riley, J.D., 2013. Velocity Mapping Toolbox (VMT): a processing and visualization suite for moving-vessel ADCP measurements. Earth Surf. Processes Landforms 38, 1244–1260.
Roy, A.G., Biron, P.M., Lapointe, M.F., 1997. Implications of low-pass filtering on power spectra and autocorrelation functions of turbulent velocity signals. Math. Geol. 29, 653–668.
Snyder, W., Castro, I., 1999. Acoustic Doppler velocimeter evaluation in stratified towing tank. J. Hydraul. Eng. 125, 595–603.
Sontek/YSI, 2001. SonTek ADVField Acoustic Doppler Velocimeter: Technical Documentation. San Diego, CA.
Stapleton, K.R., Huntley, D.A., 1995. Seabed stress determinations using the inertial dissipation method and the turbulent kinetic energy method. Earth Surf. Processes Landforms 20, 807–815.
Tennekes, H., Lumley, J.L., 1972. A First Course in Turbulence. MIT Press, Cambridge, Mass.
Voulgaris, G., Trowbridge, J.H., 1998. Evaluation of the Acoustic Doppler Velocimeter (ADV) for turbulence measurements. J. Atmos. Oceanic Technol. 15, 272–289.
Wahl, T., 2000. Analyzing ADV data using WinADV. In: Joint Conference on Water Resource Engineering and Water Resources Planning and Management. ASCE, Minneapolis, MN, pp. 1–10.
Wahl, T.L., 2003. Discussion of "Despiking Acoustic Doppler Velocimeter Data" by Derek G. Goring and Vladimir I. Nikora. J. Hydraul. Eng. 129, 484–487.
Welch, P.D., 1967. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short modified periodograms. IEEE Trans. Audio Electroacoust. AU-15, 70–73.
