MITT: Open-Source MATLAB Algorithms for the Analysis of High-Frequency Flow Velocity Time Series Datasets - MacVicar 2014
Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
Department of Civil Engineering, Université de Sherbrooke, Sherbrooke, Quebec, Canada J1K 2R1
Article info
Article history:
Received 28 April 2014
Received in revised form 8 September 2014
Accepted 10 September 2014
Available online 20 September 2014
Keywords:
Hydraulics
Turbulence
Acoustic Doppler velocimeter (ADV)
Spike replacement
Data
Quality analysis
Visualization
Matlab
Abstract
The measurement of flow velocity at high frequencies (20–200 Hz) has been made easier over the last couple of decades by the development and commercialization of a variety of instruments, many of which are capable of measuring multiple sampling volumes simultaneously. A variety of methods have been proposed to remove errors in velocity time series and classify the quality of data. However, most methods are applied using custom algorithms written to treat custom data formats and remain hidden from a wider audience. The objective of this paper is to introduce and document a new set of open-source algorithms, written in Matlab, that comprise the Multi-Instrument Turbulence Toolbox (MITT). The algorithms are designed to: (i) organize the data output from multiple instruments into a common format; (ii) present the data in a variety of interactive figures for visualization and assessment; (iii) clean the data by removing data spikes and noise; and (iv) classify data quality. We hope that these algorithms will form the nucleus of an evolving toolbox that will help to accelerate the training of hydraulic researchers and practitioners, ensure a consistent application of methods for turbulence analysis, remove the bias of poor quality data from scientific literature, and ease collaboration through the sharing of data and methods.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Measurements of flow velocity are frequently obtained in open channels and other environments to calculate parameters such as discharge and bed shear stress, and to characterize flow turbulence. As instrumentation has improved, it has become easier to record multipoint and simultaneous velocity time series at high frequencies (20–200 Hz). These technological advances have greatly reduced the time and cost required to obtain velocity time series, but all instruments are subject to measurement error and noise, and accurately assessing data quality and eliminating or replacing poor quality data remains a prerequisite to obtaining representative results. Given that the number and complexity of error correction/analysis techniques have increased along with dataset size, the task of data quality analysis has become even more onerous. A set of open-source algorithms to organize and analyze turbulence data would help to ensure methods are applied consistently, accelerate training and the exchange of data, and ease
http://dx.doi.org/10.1016/j.cageo.2014.09.002
0098-3004/© 2014 Elsevier Ltd. All rights reserved.
2. Program description
The MITT algorithms are written as Matlab functions due to the simplicity and popularity of this high-level programming language. A significant advantage of high-level languages is that many other algorithms and toolboxes are available to perform routine tasks. An example of an available algorithm is pwelch, included in the Signal Processing Toolbox, which computes periodograms of subsets of the full time series and then averages the results to obtain a smoothed periodogram in which it is easier to identify anomalies in frequency space (Welch, 1967). Please note that courier font is used to indicate function names and italicized courier font is used to indicate variable names. Other examples include algorithms for box plotting, curve fitting, and creating programmable objects such as push buttons. The Statistics, Signal Processing, and Curve Fitting Toolboxes are required as part of the Matlab license to perform the full range of calculations in the MITT algorithms. The code was written using Matlab v2012b and is not backwards compatible due to changes in Matlab function syntax and availability. MITT users will benefit from a working knowledge of standard and object-oriented programming in Matlab.
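
As a minimal sketch of the pwelch call described above (not taken from the MITT source), the following estimates a smoothed spectrum for a synthetic velocity series. The sampling rate, segment length, and the signal itself are illustrative assumptions.

```matlab
% Synthetic 60 s velocity record sampled at 25 Hz (illustrative values only).
fs = 25;                          % sampling frequency (Hz), assumed
t  = (0:fs*60-1)' / fs;           % time vector (s)
u  = 0.30 + 0.05*sin(2*pi*2*t) + 0.01*sin(2*pi*7*t);   % mean + 2 Hz + 7 Hz waves

% Welch estimate: 256-sample Hamming segments, default 50% overlap,
% 256-point FFT. The mean is removed so the zero-frequency bin does not dominate.
nSeg = 256;
[pxx, f] = pwelch(u - mean(u), nSeg, [], nSeg, fs);

% Frequency of the largest spectral peak (expected near the 2 Hz component).
[~, iPeak] = max(pxx);
fPeak = f(iPeak);
```

Shorter segments give a smoother but coarser spectrum; here the frequency resolution is fs/nSeg, or roughly 0.1 Hz.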
MITT is launched from the command line within the Matlab environment. A graphical user interface (GUI) is shown that allows the user to select files and options for the analysis and visualization of the data (Fig. 1). The three major computational blocks, referred to herein in bold courier font as Organize, Clean, and Classify, must initially be run in that order to ensure proper program function, but any of the blocks can be rerun. The blocks are activated by ticking "Organize raw data into Data and Config array", "Clean raw time series", and "Classify quality of time series" on panel (b) of Fig. 1, respectively. Organize translates the raw data from different types of instruments into a common format, Clean de-spikes and filters velocity time series, and Classify assesses the quality of recorded time series in an array. In each block, options are selected by the user to apply the desired methods. Classify can be run as an automated procedure or through a second interactive GUI (Fig. 2). The following sections describe the common data format including the supported instrument output formats, the program architecture, and the data quality assessment and visualization tools.
2.1. Organization of data
A common data format, modeled after the output format of the Nortek Vectrino II (VII) velocity profiling instrument, is used to save the output from MITT. Two structure arrays called Data and Config store the recorded data and configuration parameters, respectively. The advantage of using structure arrays is that variable names can be given sufficient description so that the content is clearly identifiable. For instance, the transformation matrix used to convert between measured beam velocities and orthogonal velocity components for acoustic type instruments is stored in Config.transformationMatrix. Each velocity component is stored in a subfield of the Data structure array (e.g. Data.Vel.u for the streamwise velocity). The naming convention used in the VII data is followed so that the orthogonal components u, v, and w represent velocity in the streamwise (x), lateral (y) and vertical (z) directions, respectively. The ability to attach descriptive names to the data while storing it in a format that can be readily passed around between programs allows for the development of a flexible platform for data analysis that conceivably will have many different users. Visualization and analysis programs are written to accept the Data and Config structure arrays from any instrument.
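
The naming convention can be illustrated with a small hand-built example. This is only a sketch: the numerical values are made up, the identity matrix is a placeholder for a real transformation matrix, and the actual structures written by MITT contain many more fields.

```matlab
% Three samples of the three velocity components for one cell (made-up values).
Data = struct();
Data.Vel.u = [0.31; 0.29; 0.33];      % streamwise (x) velocity
Data.Vel.v = [0.01; -0.02; 0.00];     % lateral (y) velocity
Data.Vel.w = [0.00; 0.01; -0.01];     % vertical (z) velocity

Config = struct();
Config.transformationMatrix = eye(3); % placeholder, not a real instrument matrix
Config.waterDepth = 0.4;              % example parameter from the control file

% Because fields are named, a generic routine can discover them at run time.
components = fieldnames(Data.Vel);
meanU = mean(Data.Vel.u);
```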
MITT is capable of handling instruments that sample either a single volume (called a cell herein) or multiple cells in parallel (i.e. simultaneously or quasi-simultaneously) or in series (i.e. one after another). Single cell instruments for which Organize algorithms have been created include the Sontek Acoustic Doppler Velocimeter (ADV), the Nortek Vectrino, and the Marsh-McBirney Electromagnetic Current Meter (ECM). These instruments can be linked to record multiple cells in parallel. Multiple cell instruments for which Organize algorithms have been created include the VII and the Metflow Ultrasonic Doppler Velocity Profiler (UDVP). Multiple UDVP probes can be multiplexed but the software records data from different probes in series. Within the appropriate subfields, single cell time series are stored in single columns while multiple cells in parallel are stored in a matrix with the rows and columns representing the time interval and the cell number, respectively. Multiple cells recorded in series are stored in separate files with their own Data and Config structure arrays. Neither low frequency acoustic instruments such as the Sontek Acoustic Doppler Current Profiler nor high frequency instruments such as hot film probes, hot wire probes, Laser Doppler Velocimeters, and Particle Image Velocimetry (PIV) techniques are currently supported in MITT, although there is no conceptual reason why they could not be included in the future.
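
A short sketch of the row/column convention for cells recorded in parallel follows. The array contents are hypothetical; only the layout (time along rows, cell number along columns) comes from the text.

```matlab
% Hypothetical array: 4 time steps (rows) by 3 cells (columns).
Data = struct();
Data.Vel.u = [0.30 0.25 0.20;
              0.32 0.27 0.18;
              0.28 0.23 0.22;
              0.30 0.25 0.20];

[nTime, nCells] = size(Data.Vel.u);

% A single cell's time series is one column of the matrix.
cell2Series = Data.Vel.u(:, 2);

% Statistics per cell are taken along dimension 1 (time).
cellMeans = mean(Data.Vel.u, 1);      % 1 x nCells
cellStd   = std(Data.Vel.u, 0, 1);    % 1 x nCells
```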
A comma-delimited text file (*.csv format), called the CSVcontrol file, is a required input to MITT. In the CSVcontrol file, the data files to be analyzed are listed in rows, all of which should be stored in the same folder as the control file. Information (or variables) about the recorded data series such as instrument location and water depth (i.e. anything not recorded by the instrument itself) is listed in columns. Each column must have two header rows that list the name of the parameter (e.g. waterDepth) and a format string (Table 1). Format strings must be in Matlab format (e.g. %s for a string, %d for an integer, and %f for a floating point real number). Required field names and descriptions are listed in Table 2. Parameters are stored as subfields in Config.
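
To make the two-header-row layout concrete, the sketch below parses a small control table held in memory and stores each column as a field of a Config structure. It is an illustration only and does not reproduce the reader used inside MITT; the column set and values are assumed, and strsplit requires a Matlab release newer than the one the toolbox was written in.

```matlab
% In-memory stand-in for a CSVcontrol file: a row of names, a row of format
% strings, then one row per data file (all values are hypothetical).
csvLines = {'instrument,filename,waterDepth', ...
            '%s,%s,%f', ...
            'Vectrino,Example1,0.4', ...
            'Vectrino,Example2,0.4'};

names   = strsplit(csvLines{1}, ',');
formats = strsplit(csvLines{2}, ',');
nFiles  = numel(csvLines) - 2;
configs = cell(1, nFiles);

for iFile = 1:nFiles
    values = strsplit(csvLines{iFile + 2}, ',');
    Config = struct();
    for iCol = 1:numel(names)
        if strcmp(formats{iCol}, '%s')
            Config.(names{iCol}) = values{iCol};                         % keep text as-is
        else
            Config.(names{iCol}) = sscanf(values{iCol}, formats{iCol});  % numeric field
        end
    end
    configs{iFile} = Config;
end
```

Each entry of configs then holds the parameters for one data file, for example configs{2}.filename.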
Fig. 1. MITT launch window with all panels active and open. The CSVcontrol file is selected on panel a. Messages regarding program operation are shown in the blue-green rectangle. Computational blocks are activated on panel b. Options for Organize, Clean and Classify are selected on panels c, d, and e, respectively. The pushbutton f launches the analysis.
water surface (bedElevation and waterElevation, respectively) that are stored as subfields in one- and two-dimensional structure arrays (oneD and twoD), which are then stored in Config.
To assist users with launching an analysis of their own data, a set of data for each instrument has been provided along with CSVcontrol files and is available on the Matlab File Exchange server. While it is beyond the scope of this article to fully describe their experimental apparatus and methods, the data are largely from published studies, and the examples of good and bad quality can help to accelerate the learning process. The VII, UDVP, and ECM data require custom sampling-location algorithms to function properly, and these programs have been included in MITT. CalcXYZUWFlume calculates cell positions for a flume experiment using a VII based on the position of the probe head given in the control file (xpos1, ypos1, and zpos1) and the instrument's internal parameters (see MacVicar et al., 2014 for a description of this work). CalcXYZUIllinoisFlume performs a similar calculation but with more parameters so that the position of an array of UDVP probes can be calculated from recorded positions of the instrument cart and holder relative to a guide rail (e.g., tdist is the length of a custom-built holder for the probes). CalcIllinoisFlumePos needs to be used to specify the non-uniform channel bed configuration for the experiment in this case, as it is necessary to calculate the elevation of the guide rail to determine sampling cell locations (see MacVicar and Best, 2013 for a description of this work). CalcXYZMoras calculates cell positions relative to the bed and water surface for an array of ECMs from positions measured on a system of light bridges during flood events at a field site. A set of scattered topographical measurements can be used to specify
Fig. 2. Interactive Quality Control Plot. Panel a is active when the figure is created. The user selects arrays on panel a. When the "Done" button is pushed, panels b and c are enabled, which allows Config subfields to be set as the y-axis variable. Config.zZ is set as the y-axis of the three subplots and its range is from 0 to 1. Drop-down lists on panels c, d, and e allow the user to set selected Data subfields and statistics to be calculated for those subfields as the x-axis variables. The mean, standard deviation, and kurtosis for Data.Filtered.Beam1 from all selected arrays are shown in subplots I, II, and III, respectively. Axes limits can be changed or reset. Options activated on panel f for a single active array include: (1) plot sampling volume positions (Fig. 6), (2) plot one series (Fig. 4; the series to plot is selected using the mouse), (3) plot an array image (Fig. 5), or (4) classify data quality. Classify options are shown on panel g. For time series that are classified as poor quality, a horizontal shaded grey line is drawn in the subplots A, B, and C. Users can change the classification of cells by pushing the "Manually Adjust" button and selecting cells with the mouse.
Table 1
CSVcontrol file format example. Probe elevations should be entered as true elevations (i.e. not relative to the bed). Columns may be in any order and additional columns may be used to save additional parameters as subfields in Config. For example, the discharge Q would be saved as Config.Q.
instrument  filename  (names and format strings of the remaining six columns not recovered)
%s          %s
Vectrino    Example1  0.2  0.5  0.4  1  0  0.054
Vectrino    Example2  0.4  0.5  0.4  1  0  0.054
Table 2
Variables required for analysis and visualization algorithms. These variables can be input directly using the CSVcontrol file (Table 1) or can be calculated using custom subprograms.
Parameter   Description
instrument
Fig. 3. Definition of optional uniform channel and orientation. Flow is from top to bottom so that the origin is defined at the centerline of the upstream limit of the measurement section.
Table 3
Full list of MITT algorithms with descriptions (table layout not recovered; algorithm names and surviving descriptions are listed).
Algorithm names: MITT, OrganizeInput, OrganizeInstrumentData, OrganizeECMData, CalcChannelMesh, InterpUniformChan, InterpNonUniformChan, custom subprograms, ConvCSV2Struct, ConvMulti2Struct, ConvStruct2Multi, ConvXYZ2Beam, CleanSeries, CleanSpike, CleanFilter, SpikeStdev, SpikeSkewness, SpikeGoringNikora, SpikeVelCorr, SpikeReplace, PlotTimeSeries, Plot1Series, PlotTimeSpace, PlotQCTable, ClassifyArrayGUI, ClassifyArrayAuto, CalcGoodCells, CalcArrayStats, ClassifyCor, Classifyxcorr, ClassifyNoiseRatio, ClassifySpike, ClassifyNoiseFloor, ClassifyPoly, makefaQCbuttons, DefaultfaQC, getAnames, subGetValues, subSetValues, subFieldnames, subFiltArray.
Algorithm Name          Description
OrganizeInstrumentData  Get raw data from instrument output files and save in Config, Data.
CleanSpike              Controls detection and removal of spikes from time series.
subGetValues            Get values from buttons, editable fields and other GUI input fields.
subSetValues            Set values for buttons, editable fields and other GUI input fields.
subFieldnames           Extracts field names, including all substructures within a structure.
Firstly, low signal correlations (<48%) are observed for all velocity components with the exception of w1. The v time series contains a number of spikes (3.6%) and the associated v spectrum is relatively flat, even after spike removal, which indicates a high level of noise. The slope of the w1 spectrum is clearly greater than that of the w2 spectrum, indicating noise contamination of w2, as the two slopes should be equal. Due in part to the high noise in the signal (the noise ratio for this series is 41%), none of the spectra collapse to similar values in the inertial subrange.
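
One way to put a number on the spectral slopes discussed above is a straight-line fit in log-log space, which can then be compared with the Kolmogorov value of -5/3. The sketch below is not part of MITT; it uses an idealized spectrum and an assumed frequency band of 1 to 10 Hz in place of a measured inertial subrange.

```matlab
% Idealized spectrum that follows a -5/3 power law between 1 and 10 Hz.
f   = logspace(0, 1, 20)';        % frequencies (Hz), assumed band
pxx = 1e-4 * f.^(-5/3);           % spectral density with arbitrary scaling

% Least-squares line in log-log space; the first coefficient is the slope.
p = polyfit(log10(f), log10(pxx), 1);
slope = p(1);

% Departure from the Kolmogorov slope. For measured data, a noticeably
% flatter slope would be consistent with noise contamination.
slopeError = abs(slope - (-5/3));
```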
The time-space velocity matrix plot (Fig. 5; PlotTimeSpace) is a static color image of the data in a multi-cell array of velocity measurements. High and low velocities are shown as hot and cold colors for each cell location (y-axis) versus time (x-axis). The mean velocity value is removed from each cell in the array so that it is easier to assess the coherence of the turbulence and typical variability of the cells. Since the data are recorded simultaneously, the array image plot helps to visualize coherent turbulent flow structures (identified as zones of contrasting high and low velocity in the plot). Following Buffin-Bélanger et al. (2000), time is reversed on the x-axis to give the appearance of an unrolled film with the first measurements to the right and the most recent measurements to the left. Two examples show relatively good UDVP data (Fig. 5a) and relatively poor data from a VII (Fig. 5b).
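
The data preparation behind such a plot can be sketched in a few lines. This is an assumed reconstruction of the steps described in the text (mean removal per cell, reversed time axis), not the PlotTimeSpace code itself, and the array values are made up.

```matlab
% Hypothetical array: rows are time steps, columns are cells.
U = [0.30 0.20;
     0.34 0.18;
     0.26 0.22];

% Subtract each cell's mean so that colors show fluctuations only.
Ufluct = bsxfun(@minus, U, mean(U, 1));

% Reverse time so that the first measurement ends up on the right.
UfluctReversed = flipud(Ufluct);

% Transpose so that cells run along the y-axis and time along the x-axis.
imageMatrix = UfluctReversed';
```

The resulting matrix could then be passed to a function such as imagesc for display.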
3. Getting started
To assist the user with launching their analysis, this section summarizes the steps required to run MITT for their own data. Before the master program (MITT) is started, the user must create a CSVcontrol file using the format specified in Table 1 and containing, at a minimum, the parameters specified in Table 2. Care should be taken to ensure that the file names included in the CSVcontrol file match the instrument output file names. The CSVcontrol file and instrument output files must be located in the same directory/folder.
3.1. To begin using MITT
1. Open MATLAB and open the directory where the MITT programs are located.
2. From the command line, run MITT. The MITT launch window opens (Fig. 1).
3. Click the "Select File" box. This opens a browser window to search for the CSVcontrol file.
To upload and organize the raw data into MATLAB format:
1. Click the "Organize raw Data and Config array" tick box. A set of tick boxes appears, allowing the user to define the channel geometry and custom sampling locations if desired.
2. Click the "Run Analysis" button. The structure arrays Data and Config are created in a *.MAT file named after each raw data time series.
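
Once a *.MAT file exists, the two structure arrays can be read back into the workspace with the standard load function. The sketch below first writes a small stand-in file to a temporary folder so that it is self-contained; the file name Example1.mat and the field values are assumptions for illustration.

```matlab
% Create a stand-in for an organized MITT output file (made-up content).
Data = struct();
Data.Vel.u = [0.31; 0.29; 0.33];
Config = struct();
Config.waterDepth = 0.4;
matFile = fullfile(tempdir, 'Example1.mat');
save(matFile, 'Data', 'Config');
clear Data Config

% Load the file into a structure so workspace variables are not overwritten.
S = load(matFile);
uMean = mean(S.Data.Vel.u);
depth = S.Config.waterDepth;

delete(matFile);                  % remove the temporary file
```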
To clean the time series:
1. Click the "Clean raw time series" tick box. The Clean block option tick boxes are displayed. The "reset despiked and/or filtered time series" box should be ticked if the cleaning algorithms are being rerun.
2. Clicking "Plot all time series" will generate a time series plot for each series (e.g., Fig. 4). This is recommended as a first quality control measure. Alternatively, the time series plots can be created from the Interactive Quality Control window (Fig. 2).
Fig. 4. Time series plot showing the analysis results from one cell of (a) an ADV dataset, and (b) a Vectrino dataset. A set of subplots is shown for each component (see part a, component u for subplot labels). Subplots include signal correlation (I), velocity time series (II) with raw, despiked, and filtered results (where available, see legend on lower left subplot), box plots (III), and spectral density (IV). Grid lines are plotted with a -5/3 slope on spectral density subplots to facilitate comparison with the Kolmogorov scaling law. The file location, file name, and the cell number are indicated in the title.
Fig. 5. Time space plots for (a) Data.Filtered.Beam1 for a UDVP array and (b) Data.Filtered.u for a VII array. In (a) the overall good quality data can be inferred from the spatial and temporal organization, with areas of high and low velocities indicative of coherent turbulent structures. Higher variability cells can be distinguished at Config.zZ 0.75 and 0.80 that correspond with classified poor quality cells in Fig. 2. Cells with relatively low temporal variability close to the probe (Config.zZ 0.95) were also classified as poor quality. In (b) the lack of spatial and temporal organization in the flow is indicative of poor data quality. Low data variability at Config.zZ < 0.32 indicates that there may have been a solid object within the sampling volume in this area.
Acknowledgments
Fig. 6. Example of sampling volume positions output for ECM data over a 2D mesh
of interpolated bed and water surfaces. Sampling cell locations are indicated by
asterisk symbols and the color of the symbols will correspond with those in Fig. 3.
3. Click the "Despike" tick box to clean the data. Four de-spiking methods appear and the user has the option to tick one or more to clean the data.
4. Click the "Frequency filter" tick box to filter the data series. The only filter option is a low-pass 3rd order Butterworth filter.
5. Click the "Run Analysis" button. The cleaned and filtered data series are stored in the Data array.
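
The cleaning steps above can be pictured with a generic despike-then-filter sketch. It is not the MITT implementation: the spike test is a plain three-standard-deviation threshold with linear interpolation, and the 5 Hz cutoff and the use of zero-phase filtering (filtfilt) are assumptions. Only the filter type, a low-pass 3rd order Butterworth, comes from the text. The butter and filtfilt functions are part of the Signal Processing Toolbox.

```matlab
% Smooth synthetic series with one artificial spike (illustrative values).
fs = 25;                                  % sampling frequency (Hz), assumed
t  = (0:499)' / fs;
u  = 0.30 + 0.02*sin(2*pi*0.5*t);
u(100) = 1.5;                             % injected spike

% Flag samples more than 3 standard deviations from the mean.
isSpike = abs(u - mean(u)) > 3*std(u);

% Replace flagged samples by linear interpolation between good neighbors.
idx = (1:numel(u))';
uDespiked = u;
uDespiked(isSpike) = interp1(idx(~isSpike), u(~isSpike), idx(isSpike), 'linear');

% Low-pass 3rd order Butterworth filter; the cutoff is normalized by the
% Nyquist frequency (fs/2). The 5 Hz cutoff is an assumed value.
fc = 5;
[b, a] = butter(3, fc/(fs/2), 'low');
uFiltered = filtfilt(b, a, uDespiked);    % zero-phase filtering (assumption)
```

A global threshold of this kind is easily biased by the spikes themselves, which is one reason several alternative de-spiking methods are offered.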
References
Buffin-Bélanger, T., Roy, A.G., Kirkbride, A.D., 2000. On large-scale flow structures in a gravel-bed river. Geomorphology 32, 417–435.
Cea, L., Puertas, J., Pena, L., 2007. Velocity measurements on highly turbulent free surface flow using ADV. Exp. Fluids 42, 333–348.
Doroudian, B., Hurther, D., Lemmin, U., 2007. Discussion of "Turbulence measurements with acoustic Doppler velocimeters" by Carlos M. García, Mariano I. Cantero, Yarko Niño, and Marcelo H. García. J. Hydraul. Eng. 133, 1286–1289.
Frisch, U., 1995. Turbulence: The Legacy of A.N. Kolmogorov. Cambridge University Press, Cambridge, U.K.
Goring, D.G., Nikora, V.I., 2002. Despiking Acoustic Doppler Velocimeter data. J. Hydraul. Eng. 128, 117–126.
Goring, D.G., Nikora, V.I., 2003. Closure to "Despiking Acoustic Doppler Velocimeter Data" by Derek G. Goring and Vladimir I. Nikora. J. Hydraul. Eng. 129, 487–488.
Hurther, D., Lemmin, U., 2001. A correction method for turbulence measurements with a 3D acoustic Doppler velocity profiler. J. Atmos. Oceanic Technol. 18, 446–458.
Kolmogorov, A.N., 1941. Dissipation of energy in locally isotropic turbulence in an incompressible viscous liquid. Dokl. Akad. Nauk SSSR 30, 299–303.
Lacey, R., Rennie, C., 2011. Laboratory investigation of turbulent flow structure around a bed-mounted cube at multiple flow stages. J. Hydraul. Eng. 138, 71–84.
Lane, S.N., Biron, P.M., Bradbrook, K.F., Butler, J.B., Chandler, J.H., Crowell, M.D., McLelland, S.J., Richards, K.S., Roy, A.G., 1998. Three-dimensional measurement of river channel flow processes using Acoustic Doppler Velocimetry. Earth Surf. Processes Landforms 23, 1247–1267.
Le Roux, J.P., Brodalka, M., 2004. An Excel-VBA programme for the analysis of current velocity profiles. Comput. Geosci. 30, 867–879.
MacVicar, B.J., Roy, A.G., 2007. Hydrodynamics of a forced riffle pool in a gravel bed river: 1. Mean velocity and turbulence intensity. Water Resour. Res. 43, W12401.
MacVicar, B.J., Best, J., 2013. A flume experiment on the effect of channel width on the perturbation and recovery of flow in straight pools and riffles with smooth boundaries. J. Geophys. Res. Earth Surf. 118, 1850–1863.
MacVicar, B.J., Dilling, S., Lacey, R.W.J., Hipel, K., 2014. A quality analysis of the Vectrino II instrument using a new open-source MATLAB toolbox and 2D ARMA models to detect and replace spikes. In: Schleiss, A., De Cesare, G., Franca, M.J., Pfister, M. (Eds.), River Flow 2014. CRC Press, Lausanne, SW, pp. 1951–1959.
Martini, M., Lightsom, F.L., Sherwood, C.R., Xu, J., Lacy, J.R., Ramsey, A., Horwitz, R., 2005. Hydratools, a MATLAB® based data processing package for Sontek Hydra data. In: Proceedings of the IEEE/OES Eighth Working Conference on Current Measurement Technology, Southampton, UK, pp. 147–151.
McLelland, S.J., Ashworth, P.J., Best, J.L., Livesey, J.R., 1999. Turbulence and secondary flow over sediment stripes in weakly bimodal bed material. J. Hydraul. Eng. 125, 463–473.
Nikora, V.I., Goring, D.G., 1998. ADV measurements of turbulence: can we improve their interpretation? J. Hydraul. Eng. 124, 630–634.
Parsheh, M., Sotiropoulos, F., Porté-Agel, F., 2010. Estimation of power spectra of Acoustic-Doppler Velocimetry data contaminated with intermittent spikes. J. Hydraul. Eng. 136 (6), 368–378.
Parsons, D.R., Jackson, P.R., Czuba, J.A., Engel, F.L., Rhoads, B.L., Oberg, K.A., Best, J.L., Mueller, D.S., Johnson, K.K., Riley, J.D., 2013. Velocity Mapping Toolbox (VMT): a processing and visualization suite for moving-vessel ADCP measurements. Earth Surf. Processes Landforms 38, 1244–1260.
Roy, A.G., Biron, P.M., Lapointe, M.F., 1997. Implications of low-pass filtering on power spectra and autocorrelation functions of turbulent velocity signals. Math. Geol. 29, 653–668.
Snyder, W., Castro, I., 1999. Acoustic Doppler velocimeter evaluation in stratified towing tank. J. Hydraul. Eng. 125, 595–603.
Sontek/YSI, 2001. SonTek ADVField Acoustic Doppler Velocimeter: Technical Documentation. San Diego, CA.
Stapleton, K.R., Huntley, D.A., 1995. Seabed stress determinations using the inertial dissipation method and the turbulent kinetic energy method. Earth Surf. Processes Landforms 20, 807–815.
Tennekes, H., Lumley, J.L., 1972. A First Course in Turbulence. MIT Press, Cambridge, Mass.
Voulgaris, G., Trowbridge, J.H., 1998. Evaluation of the Acoustic Doppler Velocimeter (ADV) for turbulence measurements. J. Atmos. Oceanic Technol. 15, 272–289.
Wahl, T., 2000. Analyzing ADV data using WinADV. Joint Conference on Water Resource Engineering and Water Resources Planning and Management. ASCE, Minneapolis, MN, pp. 1–10.
Wahl, T.L., 2003. Discussion of "Despiking Acoustic Doppler Velocimeter Data" by Derek G. Goring and Vladimir I. Nikora. J. Hydraul. Eng. 129, 484–487.
Welch, P.D., 1967. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short modified periodograms. IEEE Trans. Audio Electroacoust. AU-15, 70–73.