Machine Learning Applications for Building Structural Design and Performance Assessment: State-of-the-Art Review

Han Sun (a*); Henry V. Burton, M.ASCE (b); Honglan Huang (c)

(a) Research Engineer, Yahoo Research. Email: hansun2014@ucla.edu
(b) Associate Professor, Department of Civil and Environmental Engineering, University of California Los Angeles. Email: hvburton@seas.ucla.edu
(c) PhD Candidate, Department of Civil and Environmental Engineering, University of California Los Angeles. Email: honglanhuang@ucla.edu
(*) Corresponding author
ABSTRACT: Machine learning provides a powerful tool for predicting and assessing structural performance, identifying structural condition and informing preemptive and recovery decisions by extracting patterns from data collected via various sources and media. This paper presents a review of the historical development and recent advances in the application of machine learning to the area of building structural design and performance assessment. To this end, an overview of machine learning theory and the most relevant algorithms is provided with the goal of identifying problems suitable for machine learning and the appropriate models to use. The machine learning applications in building structural design and performance assessment are then reviewed in four main categories: (1) predicting structural response and performance, (2) interpreting experimental data and formulating models to predict component-level structural properties, (3) information retrieval using images and written text and (4) recognizing patterns in structural health monitoring data. The challenges of bringing machine learning into building engineering practice are identified, and future research opportunities are discussed.
KEY WORDS: machine learning; artificial intelligence; building structural design and performance assessment
1. Introduction
Machine learning (ML) refers to a set of methodologies that are capable of automatically detecting patterns in data, which can then be used to develop forecasting models and support decision making under uncertain conditions (Murphy, 2012) [1]. There are three main types of learning: supervised, unsupervised and reinforcement. Supervised learning is used to develop predictive models where the goal is to map a set of inputs (also known as features, attributes or covariates) to one or more outputs (also known as the response variables). Supervised learning problems are described as classification or pattern recognition when the response variables are categorical and regression when the outputs are numerical variables. Unsupervised or descriptive learning is associated with much less well-defined problems, where the goal is to discover underlying relationships in the data. Both supervised and unsupervised learning can be achieved using parametric and non-parametric models. Whereas the former utilizes a fixed number of parameters, the size of the training dataset determines the number of parameters in the latter. Parametric models are often easier to construct and implement but are constrained by the assumptions that they make about the data distribution. Non-parametric models are much more flexible but their complexity increases with the size of the dataset. Reinforcement learning, the least popular of the three categories, is used to acquire knowledge on how to act or behave (i.e. make decisions) under uncertainty (Hastie et al. 2009 [2]; Murphy, 2012 [1]). Note that semi-supervised learning, which combines both labeled and unlabeled data during training, is not included as a primary category for the purposes of this paper.
ML methods are not foreign to building structural design and performance assessment (SDPA), as applications in this area can be traced back to as early as the late 1980s, when Adeli and Yeh (1989) [3] developed and applied an ML-based methodology to a beam design problem. This pioneering work was followed by several studies during the 1990s that applied artificial neural networks (ANNs) (Hopfield, 1982) [4] to building SDPA problems. One of the first in this series of studies was conducted by Vanluchene and Sun (1990) [5], who applied back-propagation neural networks (Rumelhart et al. 1986 [6]) to three distinct building SDPA problems related to locating the load on a beam, designing a reinforced concrete beam and analyzing a simply supported plate. This study was closely followed by several others (Hajela and Berke, 1991; Ghaboussi et al. 1991; Wu et al. 1992; Masri et al. 1993; Kang and Yoo, 1994; Messner et al. 1994; Elkordy et al. 1994) [7–13], most of which utilized the back-propagation network. Recognizing the growing popularity of ANNs in building SDPA, Gunaratnam and Gero (1994) [14] conducted a detailed examination of the factors that influence their performance, some of which were domain-specific, while others were domain-independent. The authors highlighted the importance of reduced dimensionality (i.e. the number of features or predictors) and the embedment of domain-specific knowledge in achieving effective learning. To address specific challenges associated with the back-propagation methodology, such as the slow rate of learning, Adeli and Park (1995) [15] explored the use of counter-propagation algorithms for building SDPA problems. Whereas back-propagation networks utilized only supervised learning, the counter-propagation algorithm combined both supervised and unsupervised learning. In the Adeli and Park study, the two algorithms were applied to four building SDPA problems, including the concrete beam design and simply supported plate analysis defined by Vanluchene and Sun and two others involving the analysis of a steel beam.
In the late 1990s, Reich (1997) [16] conducted a review of the literature on the application of ML to civil engineering problems. In addition to building SDPA, the review included other civil engineering domains such as transportation, construction management, water resources, environmental and materials. In fact, only sixteen of the ninety-seven citations were specific to building SDPA. In addition to reviewing the literature, the author highlighted several issues to be addressed towards the practical application of ML in civil engineering. They include (1) having a deep understanding of the learning problems, (2) knowing which ML technique is most suitable for the problem at hand, (3) the ease of implementation or availability of various ML techniques, (4) proper evaluation of trained models and (5) the availability of efficient computational tools.

The early applications of ML to building SDPA (including the ones described in the previous two paragraphs) were limited to a few relatively simple problems involving small datasets. In contrast, the increase in computational resources and the resurgence of artificial intelligence over the past two decades have led to the development of more sophisticated tools and techniques that can harness new data streams and solve highly nonlinear learning problems. Within building SDPA, the revitalization of ML has been fueled by the complexity of modern systems, which requires the generation and/or manipulation of large datasets to rigorously assess their performance under various loading conditions. These datasets can be produced from (1) reconnaissance and remote sensing from past extreme events, (2) measurement data from large-scale (or multiple small-scale) physical experiments, (3) the response of instrumented systems under normal operating loading conditions, (4) large-scale computational simulations and (5) relevant audio-visual media (e.g. images, videos and written text).
The abundance of studies on ML methods applied to building SDPA problems since the Reich paper warrants a more current state-of-the-art review. The goal of this paper is to synthesize past research on this topic towards a common understanding of the types of problems that are suited to ML applications, the characteristics of ML methods, the challenges associated with applying ML to building SDPA (ML-SDPA) and opportunities for the future. The review begins with a brief introduction to ML that includes a general problem formulation and a discussion of relevant sub-topics (feature engineering and model training and performance evaluation). Next, the mathematical details of some ML algorithms that are increasingly being applied to building SDPA problems are presented. This is followed by a review of the existing ML-SDPA literature categorized in terms of the following four application areas: (1) predicting structural response and performance, (2) interpreting experimental data and formulating models to predict component-level structural properties, (3) information retrieval using images and written text and (4) recognizing patterns in structural health monitoring data. Subsequently, a discussion of specific challenges and future research opportunities related to the availability and collection of useful data, the explainability and interpretability (or lack thereof) of some ML models and challenges with overfitted models is presented.
2. Overview of Machine Learning

This section presents an overview of ML, beginning with a generalized formulation of supervised (classification and regression) and unsupervised learning problems. A brief discussion of feature engineering and model training and performance evaluation is also included. The material presented in this section is obtained from several statistics and ML textbooks.

2.1 Generalized Problem Formulation
For supervised learning, the dataset of feature variables can be described by a matrix $X$ with dimension $N \times D$, where $N$ is the total number of observations (data points) and $D$ is the number of features (or independent variables). The response variable is described by an $N \times 1$ vector $y$ containing the label for each observation. For a classification problem, $y$ is a categorical variable and for regression, $y$ is a numerical variable. Unsupervised learning problems include the feature matrix but not the response variable. The objective of supervised learning is to solve the generalized optimization problem of minimizing the empirical loss function defined by Equation 1 (Murphy, 2012) [1]:

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} L\big(y_i, f(x_i; \theta)\big) + \lambda\,\Omega(\theta) \qquad (1)$$

Where $y_i$ is the response variable for observation $i$, $f(x_i; \theta)$ (also commonly denoted as $\hat{y}_i$, including later in this paper) is the predicted response from the ML model based on the feature vector $x_i$, and $\theta$ represents the set of model parameters. $L(y_i, f(x_i;\theta))$ is a loss measure between the true ($y_i$) and predicted ($\hat{y}_i$) values of the response variable. $\Omega(\theta)$ is a regularization term that penalizes the model based on its complexity by restricting the parameter set $\theta$ through some regularizing function $\Omega(\cdot)$. $\lambda$ is a model parameter that is determined as part of the optimization process. The objective is to find the set of model parameters $\theta^*$ that minimizes the empirical loss over the training data with the regularization penalty considered. Note that $\Omega(\theta)$ is only used in parametric models, which have a finite number of parameters. For example, linear regression models always have $D + 1$ model parameters. On the other hand, rather than using a finite number of parameters to define the data distribution, non-parametric models utilize a flexible parameter set whose size, in theory, can be infinite, and is often treated as a function. The support vector machine with a radial basis function kernel is an example of a non-parametric model whose parameter set depends on the training data. Equation 1 is a convenient generalized formulation that is adopted by many supervised learning methods including ordinary least squares, ridge, least absolute shrinkage and selection operator (LASSO) and logistic and kernel regression. Depending on the ML method, the minimization problem can be solved using a closed-form solution, gradient-based optimization or convex relaxation.
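To make Equation 1 concrete, the following minimal sketch (not from any of the reviewed studies) minimizes the empirical loss for ridge regression, where $L$ is the squared error and $\Omega(\theta)$ is the squared $\ell_2$ norm of the coefficients, using plain gradient descent; the data and parameter values are illustrative placeholders.

```python
# Empirical loss minimization (Equation 1) for ridge regression:
# L = squared error, Omega(theta) = ||theta||^2, solved by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # N x D feature matrix
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

lam, lr = 0.1, 0.01                                # regularization weight, step size
theta = np.zeros(X.shape[1])
for _ in range(2000):
    resid = X @ theta - y                          # f(x_i; theta) - y_i
    grad = 2 * X.T @ resid / len(y) + 2 * lam * theta  # gradient of loss + lam*Omega
    theta -= lr * grad

print(theta)  # approaches the closed-form ridge solution
```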
The goal of unsupervised learning is to infer the underlying structure and parameters of the model that generated the data, which can then be used for grouping the data into clusters, generating new instances and drawing inferences. The objective function for unsupervised learning is shown in Equation 2, where $\phi$ is the set of model parameters that characterizes a learned structure for the given dataset. The objective function can take the variant forms of the negative log-likelihood and the Kullback-Leibler divergence. In clustering analysis, $L(x_i; \phi)$ quantifies the cost of assigning a data point to a particular cluster. Examples of ML methods that follow this generalized formulation include the Gaussian mixture model and K-means clustering.

$$\hat{\phi} = \arg\min_{\phi} \sum_{i=1}^{N} L(x_i; \phi) \qquad (2)$$

In ML theory, the objective function expressed in Equations 1 and 2 is defined as the empirical loss over the training dataset, denoted as $L_{emp}(f)$ for a given model $f$. The theoretical solution to the ML problem, which is shown in Equation 3, is the set of parameters that minimizes the loss function over the entire data space, $L_{exp}(f)$. However, real problems are almost always limited by the amount of data sampled from the entire space. Therefore, the ideal solution is often approximated by minimizing the empirical loss over the training data instead.

$$L_{exp}(f) = \int_{\mathcal{X}} L\big(y, f(x)\big)\, p(x, y)\, dx\, dy \qquad (3)$$

Where $p(x, y)$ is the theoretical joint distribution of the feature and response variables over the entire data space, $\mathcal{X}$.
2.2 Feature Engineering

Prior to training an ML model, the features that influence model performance, improve training efficiency and increase flexibility must be selected and extracted. Most ML methods deploy standard feature selection and extraction algorithms. However, some also have the ability to adjust features to achieve the best possible prediction performance.

Feature selection can be categorized into three methods: filter, wrapper and embedded. The filter method ranks the original features according to an importance measure, such as the scores from a chi-square test or the correlation coefficients between individual features and the response variable, and selects a subset to be used for model training (a minimal sketch is shown below). The wrapper method recursively includes or excludes features from an initial pool and selects the best performing feature set based on feedback from the ML model. Embedded methods are used by those algorithms that incorporate automatic feature selection (e.g., LASSO regression). Both filter and wrapper methods are good at avoiding overfitting by reducing model complexity and at improving training efficiency by reducing highly correlated features.
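As an illustration of the filter method, the sketch below ranks features by their absolute Pearson correlation with the response and keeps the top k; the function name and data are hypothetical rather than taken from the reviewed studies.

```python
# Filter-method feature selection sketch: rank features by absolute
# correlation with the response and keep the k highest-scoring columns.
import numpy as np

def filter_select(X, y, k):
    # correlation of each feature column with the response variable
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:k]   # indices of the k best features
    return X[:, top], top

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=100)
X_sel, idx = filter_select(X, y, k=2)
print(idx)   # expected to recover columns 0 and 3
```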
Feature extraction consists of two major tasks that increase the effectiveness of ML models. The first is dimension reduction, which is achieved by applying methods such as principal component analysis (PCA), which performs a linear mapping from the original data space to a lower dimensional space such that the data variance over each resulting orthogonal component is maximized. The second involves transforming the data into a higher dimensional space such that the patterns become sparse and separable, as in kernel-based ML algorithms (Huang et al. 2006) [18].
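A minimal PCA sketch, assuming scikit-learn is available; the data and the number of retained components are placeholders.

```python
# PCA dimension reduction: keep the components that capture the most variance.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(2).normal(size=(500, 10))   # placeholder N x D data
pca = PCA(n_components=3)          # keep the 3 highest-variance components
Z = pca.fit_transform(X)           # N x 3 reduced feature matrix
print(pca.explained_variance_ratio_)
```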
Besides the general feature engineering techniques described earlier, specific feature designs have proven to be very successful for domain-specific problems. For instance, the use of Haar-like features achieved human-level face-recognition accuracy with far less computational effort (Viola and Jones, 2001; Lienhart and Maydt, 2002) [19,20], SIFT features (Lowe, 2004) [21] are very effective for object detection within images and HOG features (Dalal and Triggs, 2005) [22] are particularly good for human detection. However, these domain-specific feature engineering techniques require considerable trial-and-error testing and are designed to work only for very specific problems and data structures. Neural networks and the associated deep learning approaches are extremely popular because they automate feature engineering to achieve state-of-the-art performance in many pattern recognition and data mining domains. This approach has gained widespread popularity in recent years because of the increase in computational power, which has made training complex neural networks feasible.
2.3 Model Training and Performance Evaluation

There are many well-established procedures for training ML models that attempt to achieve stable and effective prediction performance for new data given a training dataset. One common strategy is k-fold cross validation (also discussed in Reich 1997 [16]), which randomly splits a dataset into k different subsets and trains the model k times, each time using one subset as testing data and the remaining k − 1 subsets for model training. The best performing of the k models over the testing dataset is selected. This procedure is intended to reduce overfitting on the training dataset. Another popular technique used to avoid overfitting is bootstrapping, which randomly samples a subset of the data with replacement and trains the model B times. The final model is selected as an average over the predicted results (regression) or based on a majority vote (classification) (Bunke and Droge, 1984 [23]) from the B models. Both bootstrapping and k-fold cross validation effectively reduce model variance and bias and are the primary training techniques used to develop data-driven models. These training procedures are evaluated using various performance metrics for model selection. For example, performance metrics for binary classification models include accuracy, precision and recall. Precision refers to the number of correct "positive" (e.g. building is red-tagged) predictions normalized by the total number of positive predictions. Recall is the number of correct positive predictions normalized by the number of actual positive classes. Accuracy is the number of correct predictions normalized by the total number of predictions (Powers, 2011) [24]. For multi-class classification problems, in addition to the aforementioned three metrics, a confusion matrix and top-k class accuracy are also used (Krizhevsky et al. 2012) [25]. Regression models are typically evaluated using the mean squared error (MSE), root mean squared error (RMSE), adjusted root mean square error, coefficient of multiple determination, the median absolute error and the median absolute relative deviation (MARD) (Mack et al. 2007; Burton et al. 2017) [26,27].
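The following sketch illustrates k-fold cross validation and the binary classification metrics defined above using scikit-learn; the dataset and classifier are placeholders rather than models from the reviewed studies.

```python
# 5-fold cross validation plus accuracy/precision/recall on a held-out split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # k = 5 folds
print("CV accuracy per fold:", scores)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
y_hat = model.fit(X_tr, y_tr).predict(X_te)
print(accuracy_score(y_te, y_hat), precision_score(y_te, y_hat),
      recall_score(y_te, y_hat))
```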
3. Machine Learning Models Commonly Used for Building SDPA Problems

Some ML algorithms that are commonly employed for building SDPA problems are introduced in this section. Only supervised learning (classification and regression) models are included because very few ML-SDPA studies involving unsupervised learning can be found in the literature. The next section summarizes the recent (mostly within the last decade) ML-SDPA research. Most of the methods included in the literature review are covered in this section. However, some of the more advanced algorithms, such as recurrent or convolutional neural networks, while included in the ML-SDPA review, are not discussed in this section. The relevant references are provided for readers who would like to become more familiar with these algorithms.
3.1 Linear Regression: Ordinary Least Squares, LASSO, Ridge and Polynomial Basis Functions

Ordinary least squares (OLS) is one of the simpler regression methods and is very commonly used in building SDPA. It is included in this section because it provides a basis for understanding some of the less common regression techniques (e.g. LASSO and ridge), which are described later in this section. In the OLS formulation, the response variable is expressed as a linear function of the features (Equation 4):

$$y = X\beta + \varepsilon \qquad (4)$$

Where $y$ is the vector of observed response variables, $X\beta$ represents the predicted response variables, $X$ is the feature matrix described earlier and $\varepsilon \in \mathbb{R}^N$ is an $N \times 1$ vector of residuals, which is taken as the difference between the observed and predicted values of the response variables. $\beta$ (same as the model parameters $\theta$ in Equation 1) is a $D \times 1$ vector of predictor coefficients, which is derived by minimizing the residual sum of squares, $RSS = (y - X\beta)^T (y - X\beta)$. Note that for OLS regression, $RSS$ represents the loss function $L(\cdot)$ described earlier and no regularization term is included. The OLS-minimizing predictors are computed using the closed-form solution in Equation 5a.

LASSO (Tibshirani, 1996) [28] and ridge (Hoerl and Kennard, 1970) [29] are two linear regression methods that, as noted earlier, incorporate feature engineering as part of the overall formulation. The LASSO method integrates both feature selection (by setting some of the predictor coefficients to zero) and shrinking of the OLS coefficients by including a penalty on the $RSS$ loss function. As shown in Equation 5b, the regularizing function ($\Omega(\theta)$ in Equation 1) is taken as the sum of the absolute values (the $\ell_1$ norm) of the predictor coefficients. Like LASSO, ridge adds a penalty to the $RSS$ loss function. However, the sum of the squares of the predictor coefficients (the squared $\ell_2$ norm) is used as the regularization function. Also, ridge regression does not incorporate feature selection (i.e. none of the predictor coefficients are shrunk to zero).

$$\hat{\beta}_{OLS} = \arg\min_{\beta}\; RSS = (X^T X)^{-1} X^T y \qquad (5a)$$

$$\hat{\beta}_{LASSO} = \arg\min_{\beta}\; RSS + \lambda \lVert \beta \rVert_{\ell_1} \qquad (5b)$$

$$\hat{\beta}_{ridge} = \arg\min_{\beta}\; RSS + \lambda \lVert \beta \rVert_{\ell_2}^2 = (X^T X + \lambda I)^{-1} X^T y \qquad (5c)$$
The value of the regularization parameter $\lambda$ can be determined using k-fold cross validation and/or by minimizing some information criterion such as the Akaike Information Criterion (AIC) (Akaike, 1974) [30] and the Bayesian Information Criterion (BIC).

By replacing $X$ in Equation 4 with a nonlinear transformation of itself (also referred to as a basis function, e.g., $\phi(X)$), linear regression can be used to create models that capture a nonlinear relationship between the response and feature variables. It is important to note that the regression model itself is still linear because the parameters $\beta$ enter linearly (Murphy, 2012) [1]. In the literature review section presented later in the paper, several studies have utilized linear regression with higher-order polynomial basis functions, i.e. $x$ in Equation 4 is replaced with $[1, x, x^2, \ldots, x^d]$. The complexity of the model can be increased by using higher values of $d$ or by utilizing multiple piecewise polynomial basis functions as in multiple adaptive regression splines (MARS) (Friedman, 1991) [32].
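A short sketch contrasting OLS, LASSO and ridge (Equations 5a-5c) on a polynomial basis expansion, assuming scikit-learn; here `alpha` plays the role of the regularization parameter λ and the data are synthetic placeholders.

```python
# OLS vs. LASSO vs. ridge on a degree-3 polynomial basis of one feature.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=(100, 1))
y = 1.0 + 0.5 * x[:, 0] - 0.8 * x[:, 0] ** 2 + 0.1 * rng.normal(size=100)

X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
for model in (LinearRegression(), Lasso(alpha=0.1), Ridge(alpha=0.1)):
    model.fit(X_poly, y)
    print(type(model).__name__, model.coef_)   # LASSO zeros out weak terms
```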
3.2 Kernel Regression

As noted earlier, linear regression models can be adapted to capture complex nonlinear relationships between features and response variables by employing nonlinear basis functions. One strategy that has been adopted in several of the ML-SDPA studies discussed later in the paper is the use of kernel basis functions. The word "kernel" has several interpretations; however, it is often described as a measure of the similarity between two observations $x_i$ and $x_j$. This similarity (or lack thereof) is quantified using a kernel function, $k(x_i, x_j)$ (note that $x_i$ and $x_j$ are feature vectors of individual observations). Examples of commonly used kernel functions include linear, polynomial, sigmoid and Gaussian or squared exponential. Equation 6 describes the Gaussian kernel, which was commonly used in the reviewed ML-SDPA studies:

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \qquad (6)$$
Where the bandwidth, $\sigma$, is a model parameter that is determined during the training process (e.g. using k-fold cross validation or based on AIC). In the context of kernel ridge regression (i.e. ridge regression performed using a kernel basis function), the feature matrix is replaced by a kernel matrix $K$ (Equation 7) and the minimization problem takes on the form shown in Equation 8:

$$K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_N) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ k(x_N, x_1) & k(x_N, x_2) & \cdots & k(x_N, x_N) \end{bmatrix} \qquad (7)$$

$$\hat{\alpha}_{ridge} = \arg\min_{\alpha}\; \lVert y - K\alpha \rVert^2 + \lambda \alpha^T K \alpha = (K + \lambda I_N)^{-1} y \qquad (8)$$

Where $I_N$ is an $N \times N$ identity matrix and $\alpha$ is the vector of regression coefficients in kernel space. Equation 9 is used to compute the response function $\hat{y}_{test}$ for a test data point $x_{test}$:

$$\hat{y}_{test} = \sum_{i=1}^{N} \hat{\alpha}_i\, k(x_i, x_{test}) \qquad (9)$$
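Equations 6-9 can be implemented directly in a few lines of NumPy, as in the following sketch; the synthetic data, bandwidth and regularization values are illustrative only.

```python
# Kernel ridge regression: Gaussian kernel (Eq. 6), kernel matrix (Eq. 7),
# closed-form coefficients (Eq. 8) and test-point prediction (Eq. 9).
import numpy as np

def gauss_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

sigma, lam = 0.5, 1e-2
K = gauss_kernel(X, X, sigma)                         # Equation 7
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)  # Equation 8

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
y_test = gauss_kernel(X_test, X, sigma) @ alpha       # Equation 9
print(y_test)   # should roughly track sin(x) at the test points
```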
3.3 Tree-Based Algorithms: Decision Trees, Random Forests and Adaptively Boosted Trees

Tree-based algorithms can be used for both classification and regression. The models that belong to this category recursively divide the training dataset while exploring and learning its structure towards creating subspaces that are mutually exclusive or have high levels of purity (the ratio between the class with the most samples and the size of the data subset). The rules used to grow and prune each tree (e.g. node splitting and stopping criteria), the number of considered trees and the approach to aggregating information from multiple trees are what distinguish the different types of tree-based algorithms.

The structure of a tree can be described using nodes, each of which represents a data subset of predictors and response variables. The lowest level node, which is the one that comprises the entire dataset, is called the root node. Additional nodes are created when a parent node is divided based on some criteria (described later) to create child nodes. The highest level nodes, which are also referred to as the leaf nodes, represent the data subsets created at the very end of the data-division process. In other words, no further splitting of the data occurs beyond the leaf nodes, whose associated subspaces provide the response variable prediction. All other nodes (excluding the root and leaf nodes) are referred to as interior nodes. For a classification problem, the prediction is taken as the dominant class (or categorical variable) within the leaf node that meets the data-division criteria for all nodes leading to it. For a regression problem, the predicted value of the response variable is taken as the mean value at the corresponding leaf node.
The decision tree (DT), which is the simplest of the tree-based algorithms, considers all features when splitting each data subset and chooses the one that minimizes the impurity measure. The Gini index $G_m$ (Equation 10) is a commonly used impurity measure:

$$G_m = \sum_{k=1}^{K} \hat{p}_{mk}\left(1 - \hat{p}_{mk}\right) \qquad (10)$$

Where $\hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k)$ represents the fraction of observations belonging to the $k$-th class within the region (or data subset) $R_m$, and $I(\cdot)$ is an indicator function. Several alternative criteria, such as the minimum number of samples needed at a given node for additional splitting and the maximum depth of the tree, are used to terminate the growth (or data-division) process (Breiman et al. 1984; Hastie et al. 2009) [33,34].
Adaptive boosting seeks to improve the performance of DTs by iteratively creating new models that correct the errors of the previous one. This is achieved by applying weights to the training datapoints based on some set of criteria. The first model is created using uniform weights $w_i = 1/N$, $i = 1, 2, \ldots, N$ for the training data $\{(x_i, y_i)\,|\,i = 1, 2, \ldots, N\}$, $y_i \in \{1, 2, \ldots, K\}$. In each subsequent iteration $m$, the datapoint weights are updated according to Equations 11a and 11b:

$$\alpha^{(m)} = \log\left(\frac{1 - err^{(m)}}{err^{(m)}}\right) \qquad (11a)$$

$$w_i^{new} = w_i \cdot \exp\left[\alpha^{(m)} \cdot I\left(y_i \neq G^{(m)}(x_i)\right)\right] \qquad (11b)$$

Where $\alpha^{(m)}$ is the weight factor assigned to the base model $G^{(m)}$, $err^{(m)}$ is the weighted misclassification error of that model, and $I(\cdot)$ is the indicator function. Based on Equations 11a and 11b, the data points misclassified in iteration $m$ are assigned a higher weight in iteration $m + 1$. A linear aggregation of the weighted base models is used to give the final prediction (Freund and Schapire, 1997):

$$\hat{y}_{test} = \arg\max_{k} \sum_{m=1}^{M} \alpha^{(m)} I\left(G^{(m)}(x_{test}) = k\right) \qquad (12)$$
The random forests (RF) model uses an aggregated set of decision trees, which are constructed by applying bootstrap sampling to the training dataset. For regression problems, the model prediction is taken as the average of the considered DTs, and the class that is predicted by the majority of trees is used for classification. During the data-division process at each node, a randomly selected subset of features is considered, thus reducing the correlation across the different trees.
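The sketch below contrasts a single decision tree, adaptive boosting and a random forest on the same synthetic classification data, using scikit-learn implementations of the three algorithms described above; the hyperparameter values are illustrative only.

```python
# Single tree vs. AdaBoost vs. random forest, compared via 5-fold CV accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
models = {
    "DT": DecisionTreeClassifier(max_depth=4),         # single tree, Gini splits
    "AdaBoost": AdaBoostClassifier(n_estimators=100),  # reweighted weak learners
    "RF": RandomForestClassifier(n_estimators=100),    # bagged, decorrelated trees
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```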
3.4 Logistic Regression

Logistic regression is often used as a classification algorithm because it is fairly easy to implement and interpret the final results. Given the feature vector $x$ and assuming a binary response for each observation, $y \in \{0, 1\}$, the probability of the $y = 1$ class is computed using the sigmoid transformation of a linear function of $x$ (Equation 13):

$$P(y = 1 \,|\, X = x) = \frac{\exp\left(\beta^T x + \beta_0\right)}{1 + \exp\left(\beta^T x + \beta_0\right)} \qquad (13)$$

Where $\beta = [\beta_1, \beta_2, \ldots, \beta_D]^T$ and $\beta_0$ are the predictor coefficients vector and the bias term, respectively. Multinomial logistic regression is used for problems with more than two classes, i.e. $y \in \{1, 2, \ldots, K\}$, where $K$ represents the number of classes. The probability that $y$ belongs to the $k$-th class is computed using Equation 14:

$$P(y = k \,|\, X = x) = \frac{\exp\left(\beta_k^T x + \beta_{k0}\right)}{\sum_{j=1}^{K} \exp\left(\beta_j^T x + \beta_{j0}\right)} \qquad (14)$$

Where $\beta_k$ and $\beta_{k0}$ are the predictor coefficients vector and the bias term associated with computing the probability of the $k$-th class. The predicted class is taken as the one with the highest probability (Bishop, 2006) [39]. The training process used to retrieve $\beta$ follows the generalized formulation in Equation 1; however, unlike the OLS method discussed above, no closed-form solution exists and the coefficients are obtained through iterative (e.g. gradient-based) optimization.
3.5 Support Vector Machines

Originally developed as a binary classifier, support vector machines (SVMs) seek to determine the hyperplane that separates a dataset into two classes with the widest possible gap between them. If the training data is linearly separable, a hard-margin version of SVM is applied. Otherwise, the hinge loss function (Rosasco et al., 2004) [40] is introduced to maintain a soft margin for the decision boundary, which begins by defining the $\ell_2$-regularized objective function shown in Equation 15 (reconstructed here in its standard soft-margin form):

$$\min_{\beta, \beta_0}\; \frac{1}{N} \sum_{i=1}^{N} \max\left(0,\, 1 - y_i\, \hat{y}(x_i)\right) + \lambda \lVert \beta \rVert_{\ell_2}^2 \qquad (15)$$

where the response approximation function in the original feature space is $\hat{y}(x) = \beta^T x + \beta_0$. By adopting the $\varepsilon$-insensitive loss function together with slack variables (because the objective function is not differentiable) (Cortes and Vapnik, 1995) [41] and an appropriate kernel function $k(x_i, x)$, the response approximation becomes $\hat{y}(x) = \sum_{i=1}^{N} \alpha_i k(x_i, x) + \beta_0$, where $\alpha_i$ are coefficients determined during training.

For a binary class problem with the training dataset defined by $\{(x_i, y_i)\,|\,i = 1, 2, \ldots, N\}$, $y_i \in \{-1, 1\}$, and feature vector observation $x$, the classification is based on $\mathrm{sign}(\hat{y}(x))$. For a multi-class problem, a one-versus-all or one-versus-one approach can be adopted. For a set of $K$ classes, i.e. $y \in \{1, 2, \ldots, K\}$, the data from class $k$ is treated as positive and the data from the other classes is treated as negative in the one-versus-all approach. In the one-versus-one approach, $K(K-1)/2$ classifiers are trained and a prediction is established for each pair. The class with the highest number of votes is then used as the prediction (Bishop, 2006; Murphy, 2012) [1,39].
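A minimal soft-margin SVM sketch with a Gaussian (RBF) kernel using scikit-learn, whose `SVC` applies a one-versus-one scheme for multi-class problems; the data and the value of the margin parameter `C` are placeholders.

```python
# Kernelized soft-margin SVM on synthetic 3-class data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, n_classes=3,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C controls the soft margin
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                   # held-out accuracy
```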
3.6 K-Nearest Neighbors

K-nearest neighbors (KNN) is a non-parametric algorithm that is used for both classification and regression. First, the $q$ observations in the training data that are nearest to the observation $x_0$ are identified based on some pre-defined distance metric. An empirical function is then created based on the number of each class in that subset of datapoints (defined as $N_0$). For $y \in \{1, 2, \ldots, K\}$, the probability that observation $x_0$ belongs to the $k$-th class is computed using Equation 16:

$$P(y = k \,|\, X = x_0, q) = \frac{1}{q} \sum_{i \in N_0} I(y_i = k) \qquad (16)$$

Where $I(s)$ is an indicator function that is equal to 1 if $s$ is true and 0 if $s$ is false, and $s$ represents whether the observation belongs to class $k$. The Euclidean distance is often used as the distance metric in KNN and the value of $q$ can be chosen using k-fold cross validation. The observation is assigned to the class with the highest empirical probability. KNN can also be used for regression, where the value of the response variable is taken as the average (or median) of the response values of the $q$ nearest neighbors.
3.7 Discriminant Analysis

Discriminant analysis is a technique that is used to address binary or multi-class classification problems. The methodology assumes that the feature variables within a particular class follow a multivariate normal (MVN) distribution. More specifically, the distribution of the feature vector $x$ conditioned on class $k$ is defined by $N(x \,|\, \mu_k, \Sigma_k)$, where $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix computed using the data subset associated with class $k$. The probability that observation $x$ belongs to class $k$ is obtained by applying Bayes' theorem to the class-conditioned multivariate normal distribution (Equation 17):

$$P(y = k \,|\, X = x) = \frac{\pi_k f_k(x)}{\sum_{j=1}^{K} \pi_j f_j(x)} \qquad (17)$$

Where $f_k(x)$ is the class-conditioned MVN probability density function (pdf) for the feature vector $x$ and $\pi_k$ is the prior probability of being in class $k$ (estimated as the fraction of class $k$ observations in the training dataset). The classification (or discriminant) function is obtained by substituting the MVN pdf into Equation 17. In linear discriminant analysis (LDA), the same covariance matrix $\Sigma$ is used across all classes (i.e. computed using the feature vectors for the entire training dataset), which produces a linear decision boundary between each pair of classes. The LDA classifier is shown in Equation 18 (reconstructed here in its standard form):

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k \qquad (18)$$

The predicted class is taken as the one with the highest $\delta_k(x)$ value. Quadratic discriminant analysis (QDA) uses class-conditioned covariance matrices $\Sigma_k$ (i.e. computed using only the feature vectors from the class $k$ data subset), which produces a quadratic decision boundary between each pair of classes. The QDA classifier is shown in Equation 19 (again in its standard form):

$$\delta_k(x) = -\frac{1}{2} \log \lvert \Sigma_k \rvert - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k \qquad (19)$$
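The shared versus class-conditioned covariance assumption (Equations 18 and 19) can be compared with a few lines of scikit-learn, as in this sketch on synthetic placeholder data.

```python
# LDA (shared covariance, linear boundary) vs. QDA (per-class covariance,
# quadratic boundary), compared via 5-fold CV accuracy.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           random_state=0)
for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())
```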
3.8 Artificial Neural Networks

ANNs refer to a category of pattern recognition algorithms that are inspired by the biological nervous system. The network takes a set of features as inputs and applies complex feature fusion operations through a series of layers. The output of layer $l$, $a_l$, is computed from the output of the previous layer according to Equation 20:

$$a_l = \sigma_l\left(W_l\, a_{l-1} + b_l\right) \qquad (20)$$

Where $W_l$ and $b_l$ are the weight matrix and bias vector of layer $l$ and $\sigma_l$ is a nonlinear activation function. The final layer, which could be a linear layer for regression problems or a softmax layer for classification problems (Bishop, 2006) [39], outputs the predicted response $\hat{y}$. Loss function choices include $RSS$ for regression and cross-entropy loss for classification. The ANN model is trained through backpropagation, which is a gradient-based algorithm that calculates error gradients over each model parameter based on the chain rule (Rumelhart et al., 1986) [6]. Numerous variants of ANNs have been developed to achieve faster convergence, better prediction performance and lower memory usage. ANN variants can differ based on the activation function (e.g., leaky rectified linear unit), the type of layer connections (e.g., dropout, max-pooling) or the connection mechanism (e.g., recurrent neural networks (RNNs)). The term deep learning (DL) is used to describe ANNs with many layers. The success of DL started with Krizhevsky et al. (2012) [25], who formulated a deep convolutional neural network (CNN) ImageNet classification model that achieved superb performance. Because of its pattern recognition capability, DL has since been successfully used in many domains, including computer vision, speech recognition and natural language processing.
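A minimal NumPy sketch of the forward pass in Equation 20 for a small fully connected network; the weights are random placeholders rather than trained values, and the architecture is illustrative only.

```python
# Forward pass of a small fully connected network: a_l = sigma(W_l a_{l-1} + b_l).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(5)
sizes = [8, 16, 16, 1]                 # input, two hidden layers, output
W = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

a = rng.normal(size=8)                 # a_0: the input feature vector
for l in range(len(W) - 1):
    a = relu(W[l] @ a + b[l])          # hidden layers (Equation 20)
y_hat = W[-1] @ a + b[-1]              # linear output layer for regression
print(y_hat)
```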
4. Review of ML Applications in Building SDPA

A broad range of relatively recent (mostly within the last decade) ML-SDPA publications are summarized based on the four categories identified earlier, which are also schematically illustrated in Figure 1: (1) predicting structural response and performance, (2) models developed using data from physical experiments, (3) information retrieval using images and written text and (4) models developed using field reconnaissance and structural health monitoring data. It is recognized that some of the reviewed studies belong to more than one category. For instance, models developed with the intent of using measured structural responses (from instrumented buildings) and/or observed damage (during reconnaissance) to estimate building performance states belong to categories (1) and (4) (e.g. Ghiasi et al. 2016 [43]; Zhang and Burton, 2019 [44]). Another example is text-feature-based building damage prediction models trained using field reconnaissance data, which belong to categories (3) and (4) (Mangalathu and Burton, 2019) [45]. The purpose of the categories is to help researchers and practitioners make decisions about which types of ML methods are suited to specific structural engineering problems.

Figure 1 Schematic illustration of the four ML-SDPA categories
4.1 Predicting Structural Response and Performance

Nonlinear structural response simulation is recognized as the ideal approach to assessing the performance of built systems under extreme loading. Prior studies have used ML to complement or expand the predictive capabilities of "mechanistic" or "physics-based" structural response simulations. These so-called surrogate (or meta-) models serve as compact statistical representations of the relationship between a set of input variables (e.g. structural properties, loading characteristics) and the response or performance quantities of interest. They are useful for reducing the number of mechanistic simulations needed for computationally intensive applications such as uncertainty quantification and probabilistic performance assessment.
378 metrics (e.g. collapse fragility). For each study, the structural system type, category of response and predictor variables,
379 adopted ML algorithm(s) and model performance evaluation methods are shown. The listed studies focused on building
380 seismic systems including steel (Seo et al. 2012; Khojastehfar et al. 2014; Jough and Sensoy, 2016; Kiani et al. 2019) [46–
381 49] and concrete moment frames (with and without masonry infill) (Mitropoulou and Papadrakakis, 2011; Burton et al.
382 2017; Morfidis et al. 2017; Zhang et al. 2018) [27,50–52], steel braced frames (Moradi and Burton, 2018; Moradi et al.
383 2018) [53,54] and reinforced concrete shear walls (Sun et al. 2019; Zhang and Burton, 2019) [44,55]. Most of the reviewed
384 studies developed regression models with engineering demand parameters (e.g. story drift ratio, peak floor acceleration) as
385 the response variables and in many of those cases, this prediction was as an intermediate step towards developing limit
386 state fragility functions (e.g. Seo et al. 2012) [46]. Utilizing a slightly different approach, a few studies directly incorporated
387 the limit state fragility parameters (e.g. median and dispersion of the collapse intensity) as the response variable
388 (Khojastehfar et al. 2014; Jough and Sensoy, 2016; Burton et al. 2017) [27,47,48]. Whereas most of the studies sought to
389 predict mainshock demands and limit state parameters, three (Burton et al. 2017; Zhang et al. 2018; Zhang and Burton,
390 2019) [27,44,52] focused on aftershock performance. Binary classification models were used in only two of the fourteen
391 studies (Zhang et al. 2018; Kiani et al. 2019) [49,52]. While some studies used only ground motion intensity measures as
392 the model features (e.g. Mitropoulou and Papadrakakis, 2011; Kiani et al. 2019) [49,50], others also included structural
393 configuration (e.g. number of stories in building), modeling (e.g. damping ratio) and material properties (e.g. yield strength
394 of steel).
Several different algorithms were used to develop the ML-based structural response and limit state parameter prediction models. For regression, ANN and linear models with polynomial basis functions were most widely used. Other adopted regression algorithms include ridge (conventional and kernel), LASSO, elastic net, PCA and SVM. Some authors used a single algorithm (e.g. Mitropoulou and Papadrakakis, 2011) [50] while others compared the performance of multiple algorithms (e.g. Burton et al. 2017) [27]. Similarly, of the two studies that developed classification models, one used RF (Zhang et al. 2018) [52] while the other compared the performance of several algorithms (Kiani et al. 2019) [49]. Most studies used training-testing splits to evaluate the performance of the developed ML model. These splits ranged from 33%-67% (i.e. 33% training and 67% testing) on one extreme to 80%-20% on the other. None of the studies evaluated the effect of the training-testing partition point on model performance. The ratio of the predicted to actual value of the response variable was the most widely used performance metric in the regression studies. Others included the coefficient of determination $R^2$, MSE, MARD and mean absolute error (MAE). For the classification studies, accuracy, precision, recall and F-measure were used.
Table 1 Summary of ML models developed in prior studies to predict building structural response and/or performance

| Study | Structure Type | Response Variable(s) | Predictor Variables | ML Algorithm(s) | Training/Testing Split | Performance Metric |
|---|---|---|---|---|---|---|
| Khojastehfar et al. 2014 | Steel moment frames | Collapse fragility parameters | Ground motion parameters | ANN | 65-30 | Mean absolute error and MSE |
| Jough and Sensoy, 2016 | Steel moment frames | Collapse fragility parameters | Frame strength and ductility parameters | Linear regression with polynomial basis function | NA | R², MSE and mean absolute error |
| Burton et al. 2017 | RC infilled frames | Aftershock collapse fragility parameters | Mainshock response demands (e.g. SDR) and component damage levels | OLS, PC, LASSO, ridge (conventional and kernel) regression | 80-20 | MARD |
| Moradi and Burton, 2018 | Controlled rocking steel braced frames | Structural response parameters | Rocking frame design parameters (e.g. post-tensioning force and frame aspect ratio) | Linear regression with polynomial basis function | 35-65 | R², ratio of predicted to actual response |
| Moradi et al. 2018 | Controlled rocking steel braced frames | Performance limit states (e.g. life safety) | Rocking frame design parameters (e.g. post-tensioning force and frame aspect ratio) | Linear regression with polynomial basis function | 80-20 | R², ratio of predicted to actual response |
| Zhang et al. 2018 | RC frames | Post-earthquake safety state | Mainshock response demands (e.g. SDR) and component damage levels | Random forests | 75-25 | Confusion matrix |
| Kiani et al. 2019 | Steel moment frames | Damage fragility parameters | Ground motion parameters | Logistic, LASSO, SVM, naïve Bayes, DT, RF, KNN, DA and ANN | 70-30 | Recall, precision and F-measure |
| Zhang and Burton, 2019 | Tall RC building with moment frame and core walls | Aftershock damage fragility parameters | Mainshock response demands | Support vector machines | 75-25 | MSE |
4.2 Models Developed Using Data from Physical Experiments

There is a long history of using empirical data from physical experiments to develop statistical models for predicting structural parameters (e.g. component stiffness, strength and/or deformation capacity). Many of the earlier models, which were developed using very small datasets (on the order of tens of datapoints), adopted relatively simple analytical expressions with one or two input parameters (Hobbs, 1972; Bažant & Zebich, 1983; Bažant & Chern, 1984; Bažant et al., 1991; Bažant & Kim, 1991; Carpinteri et al., 1995) [56–61]. As the size of the datasets increased (to the order of hundreds), more complex multi-variate analytical expressions were adopted (e.g. Haselton et al. 2016; Lignos & Krawinkler, 2010) [62,63]. Some of the most recent studies, which are summarized in the next paragraph, have implemented advanced ML models.

Table 2 summarizes the studies that have developed ML models using experimental data. The categories are the same as Table 1, with one exception: the models developed to predict structural response demands and limit state parameters are based on the results from nonlinear analyses, which, for the most part, meant that the authors had the ability to control the size of the dataset. However, for the ML models developed using experimental data, the size of the dataset is controlled by the number of experiments conducted in prior studies. For this reason, the number of sample points is also documented in Table 2. Only reinforced concrete components were used in the reviewed studies, including columns (Naeej et al. 2013; Luo and Paal, 2018; Luo and Paal, 2019; Mangalathu and Jeon, 2018) [64–67], beam-column joints (Jeon et al. 2014; Luo and Paal, 2018; Luo and Paal, 2019; Mangalathu and Jeon, 2018) [65–68], slabs (Vu and Hoang, 2016) [69], infilled frames (Huang and Burton, 2019) [70] and shear walls (Mangalathu et al., 2020) [71]. The number of specimens ranged from 65 (Naeej et al. 2013) [64] to 536 (Mangalathu and Jeon, 2018) [67]. Both classification and regression models have been developed using experimental data. The former has been used to predict the failure mode of specific components (Huang and Burton, 2019; Mangalathu and Jeon, 2019; Mangalathu et al., 2020) [67,70,71] while the latter has been employed to predict component-level structural parameters including column confinement coefficients, beam-column shear strengths, punching shear strength in slabs, beam-column drift capacity and backbone parameters (e.g. strength, stiffness and deformation capacity). The Luo and Paal (2018) [65] model is the only one that was developed to predict multiple output parameters. The input parameters always comprise the cross section, reinforcement and material properties of the component.

Linear (ordinary and piecewise linear) models, DT, MARS, symbolic methods, least squares SVM and ANNs have been used for regression. The methods adopted for the classification studies include adaptive boosting, logistic regression, ANNs, RF, SVM and DT. Again, most studies utilized training-testing splits to evaluate model performance; however, because of the generally small size of the datasets, the partition point was mostly skewed towards a much larger proportion for the training subset (e.g. 90%-10% training-testing). MSE, RMSE and $R^2$ were most commonly used to evaluate the performance of regression models. Bias, scatter index, correlation coefficient, agreement index, coefficient of variance, mean average percentage error and absolute error were also used for this purpose. Similar to the structural response and limit state assessment studies, accuracy, recall and precision were used to evaluate the classification models.

Table 2 Summary of ML models developed using data from physical experiments of building components
4.3 Information Retrieval Using Images and Written Text

Techniques for automatically extracting information from images, video and written text have been broadly applied to several fields, including engineering, medicine and the physical, natural and social sciences. In building SDPA, large numbers of images, written text and (to a lesser extent) videos are often generated during laboratory experiments, field reconnaissance and routine inspections. With systematic collection, curation and organization, ML models can be developed to extract useful information from these three types of media.

Computer vision (CV) is a sub-category of artificial intelligence that seeks to empower computers to extract meaningful information from images and videos (Szeliski, 2010) [72]. It is worth noting that while ML methods can be incorporated, some CV tasks can be performed using non-ML algorithms. In fact, most of the existing CV applications related to the built environment did not utilize ML algorithms. These studies focused on (i) visual identification and retrieval of concrete (Zhu et al. 2011; German et al. 2012; German et al. 2013; Koch et al. 2014; Koch et al. 2015) [73–77] and steel (Kong and Li, 2018) [78] crack properties (e.g. crack width, length and orientation) and spalling (concrete only) from images and videos, (ii) automatically developing as-built models using images (Brilakis et al. 2011; Koch et al. 2014; Koch et al. 2015) [76,77,79] and (iii) structural component-level damage classification (German et al. 2013; Paal et al. 2015) [75,80].

Unlike the previously mentioned studies, which explicitly make use of designated features for visual content detection/classification, the more modern CV applications, such as the ones summarized in Table 3, utilize ML methods to automatically extract visual features. ML-based computer vision has been used to detect RC cracks and spalling (Cha et al. 2018; Kucuksubasi and Sorgucb, 2018; Hoang et al. 2019b) [81–83], detect loosened and corroded steel bolts (Cha et al. 2016; Cha et al. 2018) [81,84] and identify and classify structure and component types and the presence and severity of damage (Gao and Mosalam 2018; Gonzalez et al., 2020; Naito et al., 2020) [85–87]. While CNNs are the most widely used method among these studies, SVM and logistic regression have also been implemented. The training-testing splits ranged from as high as (in terms of the relative size of the training set) 90%-10% to as low as 2%-98%. It is worth noting that the latter involved a study that utilized transfer learning, which typically does not require large amounts of training data. The confusion matrix, accuracy, precision, recall and the F1 score were used as performance metrics.

The Mangalathu and Burton (2019) [45] study, which is also summarized in Table 3, is the only one that utilized text-based media. The authors trained a long short-term memory (LSTM) deep learning model (Graves and Schmidhuber, 2005; Hochreiter and Schmidhuber, 1997) [88,89] to classify building damage based on the ATC-20 categories (red, yellow and green) (ATC, 1995) using natural language damage descriptions as the features. The dataset included 3,423 buildings affected by the 2014 South Napa earthquake, with written documentation of the damage and the assigned ATC-20 tags. A 75%-25% training-testing split was used and the model performance was also assessed using the confusion matrix.
Table 3 Summary of ML models developed using images, videos and written text

| Study | Structure and/or Component Type | Task | Media Type | ML Algorithm(s) | Training/Testing Split | Performance Metric |
|---|---|---|---|---|---|---|
| Cha et al. 2016 | Steel bolts | Detecting loosened bolts | Images | SVM | ~90-10 | Accuracy |
| Cha et al. 2018 | Steel and concrete components | Detecting concrete cracks, steel corrosion, bolt corrosion and steel delamination | Images | Faster R-CNN | ~80-20 | Precision |
4.4 Models Developed Using Structural Health Monitoring and Field Reconnaissance Data

Structural health monitoring (SHM) and post-event field reconnaissance have been central to the advancement of building SDPA. The data generated from both of these activities provide insights into the performance of different types of structures, especially under extreme loading conditions, and are well-suited for ML applications.

SHM is generally concerned with using various types of sensors to detect the type, location and extent of damage to a structure. Some of the more traditional techniques that have been used to detect damage from SHM data include auto-regressive model fitting (e.g. Sohn and Farrar, 2001) [90], Fast Fourier Transform (e.g. Lynch, 2002 [91]) and wavelet transformation (e.g. Noh et al. 2011; Hwang and Lignos, 2018) [92,93]. Some notable ML-SDPA-SHM studies are summarized in Table 4. A broad range of structure types were considered, including an aluminum frame test specimen (Figueiredo et al. 2011) [94], a steel frame and truss (Ghiasi et al. 2016) [43] and tall RC building structures (Rafiei and Adeli, 2017; Sun et al. 2019) [55,95]. All but one of the studies involved damage detection, localization and/or classification. The Sun et al. study focused on reconstructing seismic structural responses in tall buildings and was also the only study that utilized regression (i.e. all others incorporated classification). In most cases, the predictor variables (e.g. auto-regressive and frequency-domain parameters, wavelet features) were extracted from accelerometer recordings. However, while the Sun et al. model was developed with the intention of being applied to accelerometer measurements, it was demonstrated using data generated from nonlinear response history analyses. The adopted ML methods include SVM, ANN and kernel ridge regression. Also, the Rafiei and Adeli (2017) [95] study implemented a neural dynamic algorithm that was previously developed by the second author (Adeli and Park, 1995) [15]. Most studies utilized training-testing splits and two of them (Rafiei and Adeli, 2017; Sun et al. 2019) [55,95] evaluated the effect of the partition point on model performance. The Figueiredo et al. study was the only one to utilize the receiver operating characteristic (ROC) curve to evaluate model performance. The other studies utilized some of the metrics from prior sections (e.g. accuracy, MARD, mean absolute error).

The only study to develop ML models using post-event field reconnaissance data (besides the ones mentioned in the earlier sections) is by Mangalathu et al. (2020) [96]. Using a similar dataset from the 2014 South Napa earthquake (discussed in the previous section), the authors developed a second damage classification model based on ATC-20 tags. However, instead of written damage descriptions, this model utilized features related to the building (e.g. age, number of stories), site (closest distance to the surface projection of the fault rupture) and shaking intensity.
Table 4 Summary of ML models developed using structural health monitoring and field reconnaissance data

| Study | Structure Type | Task | Predictor Variables | ML Algorithm(s) | Training/Testing Split | Performance Metric |
|---|---|---|---|---|---|---|
| Figueiredo et al. 2011 | Aluminum frame test specimen | Damage detection and classification | Auto-regressive parameters extracted from accelerometer recordings | ANN | NA | ROC curve |
| Rafiei and Adeli 2017 | RC tall building | Damage classification | Frequency-domain parameters from acceleration recordings | Neural dynamic algorithm | Multiple | Accuracy |
5. Discussion

As evidenced by the previous section, the application of ML to building SDPA problems has regained significant momentum within the past decade, following its dormancy from the late 1990s to the late 2000s. Most (if not all) of the reviewed studies have been exploratory and there is no evidence that any of the applications have made their way into practice. For ML-SDPA to advance from conception and research into practice, there are several challenges that must be overcome. A synthesis of those challenges as well as opportunities for future work is presented in this section.
5.1 Data Availability and Quality

One big contributor to the success of ML in other fields is access to adequate data. Although the amount of data required to achieve reasonable performance for ML models depends on the problem and goal, it is essential to have sufficient high-quality data such that the sample represents the true distribution. This enables the adopted ML algorithm(s) to discover underlying patterns and produce predictive models that are truly generalizable within the problem scope. One of the major challenges in ML-SDPA applications is that the datasets are often limited in quantity and diversity. In the studies that sought to predict structural response and performance using ML models, the data was generated from nonlinear response history analyses by the researchers performing the study (e.g. Seo et al. 2012; Moradi and Burton, 2018) [46,53]. However, to the authors' knowledge, none of these datasets have been made publicly available. To have a truly representative dataset of structural response demands, an open-access repository should be instituted with rigorous quality control measures. The recently established DesignSafe (Rathje et al., 2017) [97] platform makes the creation of such a repository more feasible. The studies related to automatic information retrieval from visual media and models developed using field reconnaissance and SHM data face similar challenges with lack of diversity in the adopted datasets. Resources such as Structural ImageNet (Gao and Mosalam, 2018) [85], the Natural Hazards Engineering Research Infrastructure (NHERI) RAPID facility (https://rapid.designsafe-ci.org/) and the DataCenterHub (http://datacenterhub.org) will, over time, help alleviate this challenge. Despite being relatively small (on the order of hundreds of datapoints), there is more diversity in the datasets that have been generated from physical experiments. In other words, the prior studies in this area have utilized data generated from a broad range of experiments conducted by many researchers (e.g. Jeon et al. 2014; Huang and Burton, 2019).
One partial solution to the shortage of data from physical experiments is to incorporate domain knowledge within the ML algorithm, which reduces the complexity of the model space and consequently the amount of data needed to achieve good performance. Transfer learning is another technique that can be used to address the data-shortage issue. The basic idea behind transfer learning is that the knowledge acquired from training a model for one problem or domain can be "transferred" to another (a minimal sketch is given below). Additionally, there are procedures such as Monte Carlo simulation, and generative models such as Generative Adversarial Networks (GANs) (Goodfellow et al. 2014) [99] and Variational Autoencoders (VAEs) (Kingma and Welling 2014) [100], which can augment existing datasets through the generation of synthetic data. Future efforts should focus on all four of these options: (i) collecting and curating more diverse datasets, (ii) generating synthetic data, (iii) utilizing transfer learning and (iv) incorporating domain knowledge in the design of the ML model.
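As an illustration of option (iii), the sketch below fine-tunes an ImageNet-pretrained CNN for a hypothetical structural damage classification task. The network choice (ResNet-18 via PyTorch/torchvision), the number of damage classes and the dummy batch are assumptions made for illustration and are not drawn from any of the reviewed studies.

```python
# A minimal transfer-learning sketch (PyTorch): reuse ImageNet features for a
# hypothetical structural-damage image classifier. All task-specific details
# (class count, batch) are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_DAMAGE_CLASSES = 3  # assumption: e.g. none / moderate / severe

# Load a CNN pretrained on ImageNet and freeze its convolutional layers so a
# small structural dataset only has to train the new classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one sized for the new task.
model.fc = nn.Linear(model.fc.in_features, NUM_DAMAGE_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_DAMAGE_CLASSES, (8,))
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```

Freezing the pretrained layers means only the small classification head must be learned from the limited structural dataset, which is the practical payoff of transfer learning in data-scarce settings.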
Another important concern is data quality, a common challenge for ML models. Currently, there are no formal methods for collecting and synthesizing datasets generated by the building SDPA community. This lack of systematic curation procedures can lead to issues such as outliers in the data, which can have an adverse effect on the performance of ML models. This is especially true for ML algorithms such as logistic regression, which are less capable of dealing with noise. Anomaly detection procedures such as DBSCAN (Ester et al. 1996) [101], K-Means clustering (Lloyd, 1982) [102] and the z-score (Rousseeuw and Hubert, 2011) [103] can be used to address outliers (see the sketch below). Building SDPA domain-specific procedures should also be implemented. In other words, universal ML data-filtering procedures should be carefully integrated with building SDPA domain knowledge. Ultimately, many of the challenges related to the quality of building SDPA datasets can be addressed if precise collection and processing protocols are established and adopted.
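To illustrate two of the cited anomaly detection techniques, the following sketch screens a synthetic two-feature dataset with DBSCAN and a simple z-score rule. The feature names, the injected outliers and the scikit-learn library choice are illustrative assumptions, not an SDPA dataset from the literature.

```python
# A minimal outlier-screening sketch using DBSCAN clustering and z-scores.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical feature matrix: [spectral acceleration, peak story drift ratio]
X = rng.normal(loc=[0.5, 0.01], scale=[0.1, 0.002], size=(200, 2))
X[:3] = [[2.0, 0.08], [1.8, 0.07], [0.5, 0.09]]  # inject three gross outliers

X_std = StandardScaler().fit_transform(X)

# DBSCAN labels low-density points as -1 (noise); eps/min_samples need tuning.
db_labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X_std)
dbscan_outliers = np.flatnonzero(db_labels == -1)

# Simple univariate z-score screen: flag any record with |z| > 3.
z_outliers = np.flatnonzero((np.abs(X_std) > 3).any(axis=1))

print("DBSCAN-flagged records:", dbscan_outliers)
print("z-score-flagged records:", z_outliers)
```

In practice, flagged records should be reviewed against SDPA domain knowledge (e.g. physically implausible drift ratios) rather than removed automatically.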
Once standardized benchmark datasets such as those suggested above have been created, a unified set of performance measures, along with context-specific thresholds for determining when models are deemed adequate, should be developed (a minimal sketch follows). An analogous combination of a benchmark dataset (ImageNet [104]) and standardized evaluation was a key factor in the success of ML in the computer vision field.
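To make the idea of a unified evaluation concrete, the sketch below reports two of the performance measures recurring in the reviewed studies (accuracy and ROC AUC) for a hypothetical binary damage/no-damage classifier. The synthetic dataset, the scikit-learn library choice and the model are illustrative assumptions, not a proposed standard.

```python
# A minimal sketch of a unified evaluation report for a hypothetical
# binary damage classifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("ROC AUC :", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```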
A wide range of algorithms was used in the reviewed ML-SDPA studies. Unfortunately, no consensus or general takeaway about ML method-selection could be inferred from the review. In some studies, the author(s) chose to focus on a single method (e.g. Morfidis and Kostinakis, 2017; Luo and Paal, 2018) [51,65], yet no clear, compelling reason is provided for the selected method. Other studies focused on comparing the performance of ML models developed using different methods. However, the findings from these comparative assessments are difficult to generalize because they are strongly conditioned on the adopted dataset and on the model training (e.g. whether or not k-fold cross-validation is adopted) and testing (e.g. the partition point for the training-testing split, the performance metric) choices. Future efforts should place a greater focus on analyzing the domain-specific characteristics of the adopted datasets and applying knowledge-informed strategies when selecting ML algorithms, instead of using a purely performance-driven search. For example, multi-output models are especially useful for predicting backbone curves because of their capability to predict multiple response variables. Addressing some of the aforementioned challenges with creating systematic and well-curated datasets would also help with the method-selection issue: a standard benchmark dataset encourages focused attention on integrating domain knowledge with the associated data patterns. Nevertheless, performance-driven model selection is often the practical choice when there is no clear sense of how domain knowledge can be incorporated.
An immediate strategy that can be used to guide method-selection is to begin by training and evaluating the performance of a linear (basis function) ML model (OLS, LASSO, ridge) for regression problems and logistic regression for classification problems, as sketched below. With the exception of very specific problems (e.g. computer vision or natural language processing), linear models have been shown to perform reasonably well while being easy to implement (e.g. Burton et al. 2017; Mangalathu and Jeon, 2018) [27,67] and, more importantly, offer high model transparency and interpretability. In the event that the initial linear models do not perform well, the cause should be investigated before moving on to more complex models. For example, the poor performance could stem from the simplicity of the linear ML model or from noisy data. The former situation calls for exploring advanced (e.g. non-parametric) models that can capture the complexity of the data.
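The following sketch shows the suggested "linear models first" strategy in practice: cross-validated OLS, LASSO and ridge baselines on a synthetic regression dataset. The data, regularization strengths and scikit-learn library choice are illustrative assumptions.

```python
# A minimal sketch of linear-model baselines evaluated before reaching for
# more complex learners.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=1)

baselines = {
    "OLS": LinearRegression(),
    "LASSO": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
}
for name, model in baselines.items():
    # 5-fold cross-validated R^2; a weak score here motivates either data
    # cleaning or a more flexible (e.g. non-parametric) model.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```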
One of the most significant challenges associated with ML-SDPA models is explaining the feature effects and interpreting the physical meaning of the model parameters. A commonly held view is that ML models, especially the more advanced ones, are black boxes; in other words, it is difficult to extract mechanistic relationships between the input (features) and output (response variables) of data-driven models. One approach to increasing model explainability is to perform feature importance tests to understand each feature's marginal effect on the response variable, which can then be benchmarked against the fundamental principles that are known to govern the phenomena. Statistical methods such as the F-test (e.g. Sun et al. 2019) [55] and analysis of variance (ANOVA) (e.g. Moradi and Burton, 2018) [53] can be used to evaluate the relative strengths of association between features and response variables. In addition, the partial dependence (PD) plot and its variant, individual conditional expectation (ICE) curves, are also widely used [105,106]. Besides these general measures of feature importance, model-specific techniques have been developed, such as class activation mapping (CAM), which visualizes the image regions that a CNN focuses on [107,108]. On the other hand, some recent efforts on the interpretability of ML have demonstrated the benefit of introducing domain knowledge into ML algorithms by incorporating a physics-based loss function. A specific example is to embed hard constraints into the loss function via a Lagrange multiplier (e.g. Karpatne et al. 2017a; Muralidhar et al. 2018) [109,110]. This approach provides a means to partially explain the ML model by adding a physics-based law to the objective function (see the sketch below). In Karpatne et al. (2017b) [111], a spectrum of approaches is discussed whereby the wealth of domain knowledge is leveraged to improve the performance of data-driven models. One recent article in the SHM domain (Zhang and Sun, 2020) [112] combines labeled field data with unlabeled simulation data using a physics-guided neural network whose loss function contains additional terms that reflect the discrepancy between observed and simulated output. Combining ML and physics-based models remains a challenging problem, especially for the structural engineering (SE) community, and will continue to be explored in future research.
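The sketch below illustrates the general shape of a physics-guided loss function in the spirit of the works cited above. The specific "physics" term used here, a non-negativity constraint on a predicted capacity enforced by a fixed penalty weight standing in for a Lagrange multiplier, is a hypothetical stand-in for a real mechanics-based constraint, and the network and data are placeholders.

```python
# A minimal physics-guided loss sketch (PyTorch): data misfit plus a weighted
# penalty on violations of a hypothetical physical constraint.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))

def physics_guided_loss(pred, target, lam=0.1):
    # Standard data-misfit term.
    data_loss = nn.functional.mse_loss(pred, target)
    # Hypothetical physics term: the predicted capacity must be non-negative;
    # ReLU(-pred) measures the violation, weighted by the penalty factor lam.
    physics_penalty = torch.relu(-pred).mean()
    return data_loss + lam * physics_penalty

# One illustrative gradient step on dummy feature/response pairs.
x = torch.randn(64, 4)
y = torch.rand(64, 1)  # non-negative "capacities"
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.zero_grad()
loss = physics_guided_loss(model(x), y)
loss.backward()
optimizer.step()
```

Because the penalty term is expressed in the units and logic of the governing physics, the trained model's violations of that term are directly interpretable, which is the explanatory benefit the cited studies emphasize.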
Overfitted models, which result in inadequate performance outside of the data used for training and/or testing, are a domain-agnostic challenge faced by the broader ML community. In Figure 2, assuming f* represents the "true" model in the model space Θ, overfitted models have high variance and do not generalize well, while underfitted models have high bias and inferior predictive performance. In most cases there is no so-called "true" model, and the goal of ML is to find a balanced model between the two extremes. Standard ML procedures seek to address the overfitting issue by utilizing a training/testing split, k-fold cross-validation, bagging and bootstrapping, as well as other algorithm-specific approaches (a minimal sketch is given below). For instance, the stochastic procedure used by RF to generate trees was intentionally developed to avoid the overfitting challenge associated with DTs. It should be noted that overfitting is associated not only with model training but also with model selection: a sophisticated nonlinear model trained on a dataset with low-dimensional (a small number of) features can also be overfitted. For the SE community, the application of domain knowledge can also help avoid overfitting. The combination of a data-driven procedure and domain knowledge, similar to the approach used to deal with data sparsity, may prove powerful. Although overfitting has been extensively studied in the broader ML community, it could be more critical in SE applications given the complexity of some of the mechanistic relationships that data-driven models attempt to replicate. Consequently, ML-SDPA models often require large amounts of data and better noise filtering.
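The following sketch demonstrates two of the standard safeguards named above, a held-out test split and k-fold cross-validation, using a random forest regressor on synthetic data; a large gap between training and cross-validated scores is the practical symptom of overfitting. The dataset and hyperparameters are illustrative assumptions.

```python
# A minimal sketch of overfitting diagnostics: held-out test split plus
# k-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_regression(n_samples=400, n_features=15, noise=15.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)

model = RandomForestRegressor(n_estimators=200, random_state=2).fit(X_tr, y_tr)
print("train R^2:", model.score(X_tr, y_tr))  # optimistic by construction
print("test  R^2:", model.score(X_te, y_te))  # generalization estimate

cv = KFold(n_splits=5, shuffle=True, random_state=2)
cv_scores = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="r2")
print("5-fold CV R^2:", cv_scores.mean())
```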
Figure 2 Illustration of the tradeoff between bias and variance in machine learning models
6. Conclusion
This paper provides a review of machine learning (ML) applications in building structural design and performance assessment (SDPA). The vulnerability of aging structures under natural hazards and the complexity of modern building systems call for efficient and reliable frameworks for performance assessment, condition monitoring and risk-informed decision making. The increase in computational power in recent years has enhanced the capability of ML in complex applications involving large-scale, high-dimensional nonlinear data. With its advantages in pattern recognition and function approximation, ML is a natural choice to help address the aforementioned challenges in building SDPA.
To provide a good understanding of the building SDPA problems that are suitable for ML applications and of the models available for solving specific problems, an overview of the ML methodology is given, followed by a review of the supervised learning algorithms most frequently utilized in the building SDPA literature: Linear Regression, Kernel Regression, Tree-Based Algorithms, Logistic Regression, Support Vector Machines, K-Nearest Neighbors and Neural Networks. Next, the ML applications in previous building SDPA studies are placed into the following four categories and reviewed: (1) predicting structural response and performance, (2) interpreting experimental data and formulating models to predict component-level structural properties, (3) information retrieval using images and written text and (4) recognizing patterns in structural health monitoring data. These successful applications have demonstrated the capability of ML to efficiently extract information from multimedia building SDPA data and assess structural performance.
To bring ML into building SDPA practice, several key challenges need to be addressed. First, the adequate high-quality data sources essential for ML model development are currently unavailable within the building SDPA community. Therefore, a unified effort is needed to generate, collect and curate diverse datasets in an open-access repository that can be populated by researchers and practitioners. This effort should also include the creation of benchmark datasets for specific SDPA sub-domains to align and focus research resources. Data augmentation, transfer learning and the design of ML algorithms informed by domain knowledge can also help address data sparsity. Second, previous studies did not establish general guidelines for the selection of ML models. Future studies should incorporate more knowledge-informed selection strategies; as a rule of thumb, initial exploration should focus on simple linear models, which are usually easy to interpret and explain. The complexity of the data space can also inform model selection. Third, the results from ML models are often difficult to interpret. This can be addressed by using importance testing to better understand the individual effects of features on the response variable. The introduction of physics-based loss functions can offer insight into ML model training and interpretation and can potentially improve robustness. Lastly, overfitting is a significant issue for ML models, especially when attempting to capture complex mechanistic relationships in building SDPA problems. This issue can be further studied by examining the SDPA data space and proposing physics-based validation and evaluation techniques. Future research should also focus on finding ways to combine data-driven procedures with building SDPA domain knowledge, which will serve to boost performance and provide model insights.
Acknowledgements
The research presented in this paper is supported by two National Science Foundation CMMI research grants: No.
References
657 1. Murphy KP. Machine learning: a probabilistic perspective. MIT press; 2012.
658 2. Friedman J, Hastie T, Tibshirani R. The elements of statistical learning. vol. 1. Springer series in statistics New York;
659 2001.
660 3. Adeli H, Yeh C. Perceptron learning in engineering design. Computer-Aided Civil and Infrastructure Engineering 1989;
661 4(4): 247–256.
662 4. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the
663 National Academy of Sciences 1982; 79(8): 2554–2558.
664 5. Vanluchene R, Sun R. Neural networks in structural engineering. Computer-Aided Civil and Infrastructure Engineering
665 1990; 5(3): 207–215.
666 6. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986; 323(6088):
667 533–536.
668 7. Hajela P, Berke L. Neurobiological computational models in structural analysis and design. Computers & Structures
669 1991; 41(4): 657–667.
670 8. Ghaboussi J, Garrett Jr J, Wu X. Knowledge-based modeling of material behavior with neural networks. Journal of
671 Engineering Mechanics 1991; 117(1): 132–153.
672 9. Wu X, Ghaboussi J, Garrett Jr J. Use of neural networks in detection of structural damage. Computers & Structures
673 1992; 42(4): 649–659.
674 10. Masri S, Chassiakos A, Caughey T. Identification of nonlinear dynamic systems using neural networks 1993.
675 11. Kang HT, Yoon CJ. Neural network approaches to aid simple truss design problems. Computer-Aided Civil and
676 Infrastructure Engineering 1994; 9(3): 211–218.
677 12. Messner JI, Sanvido VE, Kumara SR. StructNet: A neural network for structural system selection. Computer-Aided
678 Civil and Infrastructure Engineering 1994; 9(2): 109–118.
679 13. Elkordy M, Chang K, Lee G. A structural damage neural network monitoring system. Computer-Aided Civil and
680 Infrastructure Engineering 1994; 9(2): 83–96.
681 14. Gunaratnam D, Gero J. Effect of representation on the performance of neural networks in structural engineering
682 applications. Computer-Aided Civil and Infrastructure Engineering 1994; 9(2): 97–108.
683 15. Adeli H, Park HS. A neural dynamics model for structural optimization—theory. Computers & Structures 1995; 57(3):
684 383–390.
685 16. Reich Y. Machine learning techniques for civil engineering problems. Computer-Aided Civil and Infrastructure
686 Engineering 1997; 12(4): 295–310.
687 17. Buhmann JM, Held M. Unsupervised learning without overfitting: Empirical risk approximation as an induction
688 principle for reliable clustering. International Conference on Advances in Pattern Recognition, Springer; 1999.
689 18. Huang TM, Kecman V, Kopriva I. Kernel based algorithms for mining huge data sets. vol. 1. Springer; 2006.
690 19. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE
691 computer society conference on computer vision and pattern recognition. CVPR 2001, vol. 1, IEEE; 2001.
692 20. Lienhart R, Maydt J. An extended set of haar-like features for rapid object detection. Image Processing. 2002.
693 Proceedings. 2002 International Conference on, vol. 1, IEEE; 2002.
694 21. Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 2004;
695 60(2): 91–110.
696 22. Dalal N, Triggs B. Histograms of oriented gradients for human detection. Computer Vision and Pattern Recognition,
697 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, IEEE; 2005.
698 23. Bunke O, Droge B. Bootstrap and cross-validation estimates of the prediction error for linear regression models. The
699 Annals of Statistics 1984: 1400–1424.
700 24. Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation 2011.
701 25. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Advances in
702 neural information processing systems, 2012.
703 26. Mack Y, Goel T, Shyy W, Haftka R. Surrogate model-based optimization framework: a case study in aerospace design.
704 Evolutionary computation in dynamic and uncertain environments, Springer; 2007.
705 27. Burton HV, Sreekumar S, Sharma M, Sun H. Estimating aftershock collapse vulnerability using mainshock intensity,
706 structural response and physical damage indicators. Structural Safety 2017; 68: 85–96.
707 28. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B
708 (Methodological) 1996; 58(1): 267–288.
709 29. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970; 12(1):
710 55–67.
711 30. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974; 19(6): 716–
712 723.
713 31. Schwarz G, others. Estimating the dimension of a model. The Annals of Statistics 1978; 6(2): 461–464.
714 32. Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics 1991: 1–67.
715 33. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC press; 1984.
716 34. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Springer
717 Science & Business Media; 2009.
718 35. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. European
719 conference on computational learning theory, Springer; 1995.
720 36. Hastie T, Rosset S, Zhu J, Zou H. Multi-class adaboost. Statistics and Its Interface 2009; 2(3): 349–360.
721 37. Abu-Mostafa YS, Magdon-Ismail M, Lin HT. Learning from data. vol. 4. AMLBook New York, NY, USA:; 2012.
722 38. Breiman L. Random forests. Machine Learning 2001; 45(1): 5–32.
723 39. Bishop CM. Pattern recognition and machine learning. Springer; 2006.
724 40. Rosasco L, Vito ED, Caponnetto A, Piana M, Verri A. Are loss functions all the same? Neural Computation 2004; 16(5):
725 1063–1076.
726 41. Cortes C, Vapnik V. Support-vector networks. Machine Learning 1995; 20(3): 273–297.
727 42. Wu YN, Gao R, Han T, Zhu SC. A tale of three probabilistic families: Discriminative, descriptive, and generative models.
728 Quarterly of Applied Mathematics 2019; 77(2): 423–465.
729 43. Ghiasi R, Torkzadeh P, Noori M. A machine-learning approach for structural damage detection using least square
730 support vector machine based on a new combinational kernel function. Structural Health Monitoring 2016; 15(3): 302–
731 316.
732 44. Zhang Y, Burton HV. Pattern recognition approach to assess the residual structural capacity of damaged tall buildings.
733 Structural Safety 2019; 78: 12–22.
734 45. Mangalathu S, Burton HV. Deep learning-based classification of earthquake-impacted buildings using textual damage
735 descriptions. International Journal of Disaster Risk Reduction 2019; 36: 101111.
736 46. Seo J, Dueñas-Osorio L, Craig JI, Goodno BJ. Metamodel-based regional vulnerability estimate of irregular steel
737 moment-frame structures subjected to earthquake events. Engineering Structures 2012; 45: 585–597.
738 47. Khojastehfar E, Beheshti-Aval SB, Zolfaghari MR, Nasrollahzade K. Collapse fragility curve development using Monte
739 Carlo simulation and artificial neural network. Proceedings of the Institution of Mechanical Engineers, Part O: Journal
740 of Risk and Reliability 2014; 228(3): 301–312.
741 48. Jough FKG, Şensoy S. Prediction of seismic collapse risk of steel moment frame mid-rise structures by meta-heuristic
742 algorithms. Earthquake Engineering and Engineering Vibration 2016; 15(4): 743–757.
743 49. Kiani J, Camp C, Pezeshk S. On the application of machine learning techniques to derive seismic fragility curves.
744 Computers & Structures 2019; 218: 108–122.
745 50. Mitropoulou CC, Papadrakakis M. Developing fragility curves based on neural network IDA predictions. Engineering
746 Structures 2011; 33(12): 3409–3421.
747 51. Morfidis K, Kostinakis K. Seismic parameters’ combinations for the optimum prediction of the damage state of R/C
748 buildings using neural networks. Advances in Engineering Software 2017; 106: 1–16.
749 52. Zhang Y, Burton HV, Sun H, Shokrabadi M. A machine learning framework for assessing post-earthquake structural
750 safety. Structural Safety 2018; 72: 1–16.
751 53. Moradi S, Burton HV. Response surface analysis and optimization of controlled rocking steel braced frames. Bulletin
752 of Earthquake Engineering 2018; 16(10): 4861–4892.
753 54. Moradi S, Burton HV, Kumar I. Parameterized fragility functions for controlled rocking steel braced frames.
754 Engineering Structures 2018; 176: 254–264.
755 55. Sun H, Burton H, Wallace J. Reconstructing seismic response demands across multiple tall buildings using kernel-based
756 machine learning methods. Structural Control and Health Monitoring 2019: e2359.
757 56. Hobbs D. The compressive strength of concrete: a statistical approach to failure. Magazine of Concrete Research 1972;
758 24(80): 127–138.
759 57. Bažant ZP, Zebich S. Statistical linear regression analysis of prediction models for creep and shrinkage. Cement and
760 Concrete Research 1983; 13(6): 869–876.
761 58. Bazant ZP, Chern JC. Bayesian statistical prediction of concrete creep and shrinkage. ACI Journal, Proceedings 1984;
762 81(4): 319–330.
763 59. Bažant ZP, Kim JK, Panula L. Improved prediction model for time-dependent deformations of concrete: Part 1-
764 Shrinkage. Materials and Structures 1991; 24(5): 327–345.
765 60. Bažant ZP, Kim JK. Improved prediction model for time-dependent deformations of concrete: Part 2—Basic creep.
766 Materials and Structures 1991; 24(6): 409.
767 61. Carpinteri A, Ferro G, Invernizzi S. A truncated statistical model for analyzing the size-effect on tensile strength of
768 concrete structures. Proceedings of the 2nd International Conference on Fracture Mechanics of Concrete Structures,
769 ed. by F. H. Wittmann, Zürich, Aedificatio, 1995.
770 62. Haselton CB, Liel AB, Taylor-Lange SC, Deierlein GG. Calibration of model to simulate response of reinforced
771 concrete beam-columns to collapse. ACI Structural Journal 2016; 113(6).
772 63. Lignos DG, Krawinkler H. Deterioration modeling of steel components in support of collapse prediction of steel
773 moment frames under earthquake loading. Journal of Structural Engineering 2011; 137(11): 1291–1302.
774 64. Naeej M, Bali M, Naeej MR, Amiri JV. Prediction of lateral confinement coefficient in reinforced concrete columns
775 using M5′ machine learning method. KSCE Journal of Civil Engineering 2013; 17(7): 1714–1719.
776 65. Luo H, Paal SG. Machine learning–based backbone curve model of reinforced concrete columns subjected to cyclic
777 loading reversals. Journal of Computing in Civil Engineering 2018; 32(5): 04018042.
778 66. Luo H, Paal SG. A locally weighted machine learning model for generalized prediction of drift capacity in seismic
779 vulnerability assessments. Computer-Aided Civil and Infrastructure Engineering 2019; 34(11): 935–950.
780 67. Mangalathu S, Jeon JS. Classification of failure mode and prediction of shear strength for reinforced concrete beam-
781 column joints using machine learning techniques. Engineering Structures 2018; 160: 85–94.
782 68. Jeon JS, Shafieezadeh A, DesRoches R. Statistical models for shear strength of RC beam-column joints using machine-
783 learning techniques. Earthquake Engineering & Structural Dynamics 2014; 43(14): 2075–2095.
784 69. Vu DT, Hoang ND. Punching shear capacity estimation of FRP-reinforced concrete slabs using a hybrid machine
785 learning approach. Structure and Infrastructure Engineering 2016; 12(9): 1153–1161.
786 70. Huang H, Burton HV. Classification of in-plane failure modes for reinforced concrete frames with infills using machine
787 learning. Journal of Building Engineering 2019; 25: 100767.
788 71. Mangalathu S, Jang H, Hwang SH, Jeon JS. Data-driven machine-learning-based seismic failure mode identification of
789 reinforced concrete shear walls. Engineering Structures 2020; 208: 110331.
790 72. Szeliski R. Computer vision: algorithms and applications. Springer Science & Business Media; 2010.
791 73. Zhu Z, German S, Brilakis I. Visual retrieval of concrete crack properties for automated post-earthquake structural
792 safety evaluation. Automation in Construction 2011; 20(7): 874–883.
793 74. German S, Brilakis I, DesRoches R. Rapid entropy-based detection and properties measurement of concrete spalling
794 with machine vision for post-earthquake safety assessments. Advanced Engineering Informatics 2012; 26(4): 846–858.
795 75. German S, Jeon JS, Zhu Z, Bearman C, Brilakis I, DesRoches R, et al. Machine vision-enhanced postearthquake
796 inspection. Journal of Computing in Civil Engineering 2013; 27(6): 622–634.
797 76. Koch C, Paal SG, Rashidi A, Zhu Z, König M, Brilakis I. Achievements and challenges in machine vision-based
798 inspection of large concrete structures. Advances in Structural Engineering 2014; 17(3): 303–318.
799 77. Koch C, Georgieva K, Kasireddy V, Akinci B, Fieguth P. A review on computer vision based defect detection and
800 condition assessment of concrete and asphalt civil infrastructure. Advanced Engineering Informatics 2015; 29(2): 196–
801 210.
802 78. Kong X, Li J. Vision-based fatigue crack detection of steel structures using video feature tracking. Computer-Aided
803 Civil and Infrastructure Engineering 2018; 33(9): 783–799.
804 79. Brilakis I, Fathi H, Rashidi A. Progressive 3D reconstruction of infrastructure with videogrammetry. Automation in
805 Construction 2011; 20(7): 884–895.
806 80. Paal SG, Jeon JS, Brilakis I, DesRoches R. Automated damage index estimation of reinforced concrete columns for
807 post-earthquake evaluations. Journal of Structural Engineering 2015; 141(9): 04014228.
808 81. Cha YJ, Choi W, Suh G, Mahmoudkhani S, Büyüköztürk O. Autonomous structural visual inspection using region-
809 based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering 2018;
810 33(9): 731–747.
811 82. Kucuksubasi F, Sorguc A. Transfer Learning-Based Crack Detection by Autonomous UAVs. ArXiv Preprint
812 ArXiv:180711785 2018.
813 83. Hoang ND, Nguyen QL, Tran XL. Automatic detection of concrete spalling using piecewise linear stochastic gradient
814 descent logistic regression and image texture analysis. Complexity 2019; 2019.
815 84. Cha YJ, You K, Choi W. Vision-based detection of loosened bolts using the Hough transform and support vector
816 machines. Automation in Construction 2016; 71: 181–188.
817 85. Gao Y, Mosalam KM. Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and
818 Infrastructure Engineering 2018; 33(9): 748–768.
819 86. Gonzalez D, Rueda-Plata D, Acevedo AB, Duque JC, Ramos-Pollán R, Betancourt A, et al. Automatic detection of
820 building typology using deep learning methods on street level images. Building and Environment 2020: 106805.
821 87. Naito S, Tomozawa H, Mori Y, Nagata T, Monma N, Nakamura H, et al. Building-damage detection method based on
822 machine learning utilizing aerial photographs of the Kumamoto earthquake. Earthquake Spectra 2020:
823 8755293019901309.
824 88. Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network
825 architectures. Neural Networks 2005; 18(5–6): 602–610.
826 89. Hochreiter S, Schmidhuber J. LSTM can solve hard long time lag problems. Advances in neural information processing
827 systems, 1997.
828 90. Sohn H, Farrar CR. Damage diagnosis using time series analysis of vibration signals. Smart Materials and Structures
829 2001; 10(3): 446.
830 91. Lynch JP. Decentralization of wireless monitoring and control technologies for smart civil structures. PhD Thesis.
831 Stanford University Stanford, CA, 2002.
832 92. Young Noh H, Krishnan Nair K, Lignos DG, Kiremidjian AS. Use of wavelet-based damage-sensitive features for
833 structural damage diagnosis using strong motion data. Journal of Structural Engineering 2011; 137(10): 1215–1228.
834 93. Hwang SH, Lignos DG. Assessment of structural damage detection methods for steel structures using full-scale
835 experimental data and nonlinear analysis. Bulletin of Earthquake Engineering 2018; 16(7): 2971–2999.
836 94. Figueiredo E, Park G, Farrar CR, Worden K, Figueiras J. Machine learning algorithms for damage detection under
837 operational and environmental variability. Structural Health Monitoring 2011; 10(6): 559–572.
838 95. Rafiei MH, Adeli H. A novel machine learning-based algorithm to detect damage in high-rise building structures. The
839 Structural Design of Tall and Special Buildings 2017; 26(18): e1400.
840 96. Mangalathu S, Sun H, Nweke CC, Yi Z, Burton HV. Classifying earthquake damage to buildings using machine learning.
841 Earthquake Spectra 2020; 36(1): 183–208.
842 97. Rathje EM, Dawson C, Padgett JE, Pinelli JP, Stanzione D, Adair A, et al. DesignSafe: new cyberinfrastructure for
843 natural hazards engineering. Natural Hazards Review 2017; 18(3): 06017001.
844 98. Hoang ND, Tran XL, Nguyen H. Predicting ultimate bond strength of corroded reinforcement and surrounding concrete
845 using a metaheuristic optimized least squares support vector regression model. Neural Computing and Applications
846 2019: 1–21.
847 99. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Advances
848 in neural information processing systems, 2014.
849 100. Kingma DP, Welling M. Auto-encoding variational bayes. ArXiv Preprint ArXiv:13126114 2013.
850 101. Ester M, Kriegel HP, Sander J, Xu X, others. A density-based algorithm for discovering clusters in large spatial
851 databases with noise. Kdd, vol. 96, 1996.
852 102. Lloyd S. Least squares quantization in PCM. IEEE Transactions on Information Theory 1982; 28(2): 129–137.
853 103. Rousseeuw PJ, Hubert M. Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and
854 Knowledge Discovery 2011; 1(1): 73–79.
855 104. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. Imagenet: A large-scale hierarchical image database. Computer
856 Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE; 2009.
857 105. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of Statistics 2001: 1189–1232.
858 106. Goldstein A, Kapelner A, Bleich J, Pitkin E. Peeking inside the black box: Visualizing statistical learning with plots
859 of individual conditional expectation. Journal of Computational and Graphical Statistics 2015; 24(1): 44–65.
860 107. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization.
861 Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.
862 108. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Visual explanations from deep
863 networks via gradient-based localization. Proceedings of the IEEE international conference on computer vision, 2017.
864 109. Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, et al. Theory-guided data science: A new
865 paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering 2017; 29(10):
866 2318–2331.
867 110. Muralidhar N, Islam MR, Marwah M, Karpatne A, Ramakrishnan N. Incorporating Prior Domain Knowledge into
868 Deep Neural Networks. 2018 IEEE International Conference on Big Data (Big Data), IEEE; 2018.
869 111. Karpatne A, Watkins W, Read J, Kumar V. Physics-guided neural networks (pgnn): An application in lake temperature
870 modeling. ArXiv Preprint ArXiv:171011431 2017.
871 112. Zhang Z, Sun C. Structural damage identification via physics-guided machine learning: a methodology integrating
872 pattern recognition with finite element model updating. Structural Health Monitoring 2020: 1475921720927488.