Machine Learning With MATLAB Quick Reference
Getting Started
Review - Machine Learning Onramp
Summary: Machine Learning Onramp
allStats = readtable("bballStats.txt");
playerInfo = readtable("bballPlayers.txt");

The readtable function creates a table in MATLAB from a data file.

positions = ["G","G-F","F-G","F","F-C","C-F","C"];
playerInfo.pos = categorical(playerInfo.pos,positions);

The categorical function creates a categorical array from data.

allStats(:,19:end) = [];
playerInfo = rmmissing(playerInfo);

Assigning the empty array removes rows or columns. The rmmissing function removes any row with missing or undefined elements.

playerStats = groupsummary(allStats,"playerID","sum");

The groupsummary function calculates statistics grouped according to a grouping variable.

data = innerjoin(playerInfo,playerStats);

The innerjoin function merges two tables, retaining only the common key variable observations.

boxplot(data.height,data.pos)

The boxplot function can create separate box plots based on a grouping variable.
data{:,8:22} = data{:,8:22}./data.minutes;
data.minutes = data.minutes./data.GP;

You can use indexing and element-wise division to scale variables in a table.

gscatter(data.rebounds,data.points,data.pos)

The gscatter function creates a scatter plot grouped by a grouping variable.

knnmodel = fitcknn(data,"pos");

The fitcknn function fits a k-nearest neighbors classification model.

knnmodel = fitcknn(data,"pos","NumNeighbors",5,...
    "Standardize",true);

You can use property name-value pairs to modify model options.

mdlLoss = loss(knnmodel,dataTest)

mdlLoss =
    0.4085

You can calculate the misclassification rate for a data set using the loss function.

predPos = predict(knnmodel,dataTest);

The predict function uses a classification model to predict classes for observations.

confusionchart(data.pos,predPos);

You can use the confusionchart function to visually compare true classes and predicted classes.
You can use the function pdist to calculate the pairwise distance between the observations. Note
that the input should be a numeric matrix.
>> D = pdist(data,"distance")
Outputs:
D: A distance or dissimilarity vector containing the distance between each pair of observations. D is of length m(m-1)/2.

Inputs:
data: An m-by-n numeric matrix containing the data. Each of the m rows is considered an observation.
"distance": The distance metric, specified as text such as "euclidean" (the default) or "cosine".
You can now use the dissimilarity vector as an input to the function cmdscale .
>> [x,e] = cmdscale(D)

Outputs:
x: m-by-q matrix of the reconstructed coordinates in q-dimensional space.
e: Eigenvalues associated with the reconstructed coordinates.

Inputs:
D: A distance or dissimilarity vector.
You can use the eigenvalues e to determine if a low-dimensional approximation to the points in x
provides a reasonable representation of the data. If the first p eigenvalues are significantly larger than
the rest, the points are well approximated by the first p dimensions (that is, the first p columns of x ).
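For example, a minimal sketch of this workflow (assuming X is a hypothetical m-by-n numeric matrix of observations):

% Pairwise distances, a vector of length m*(m-1)/2
D = pdist(X,"euclidean");

% Reconstructed coordinates and the associated eigenvalues
[x,e] = cmdscale(D);

% If the first two eigenvalues dominate, keep a 2-D approximation
bar(e)
x2 = x(:,1:2);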
Another commonly used method for dimensionality reduction is principal component analysis (PCA).
Use the function pca to perform principal component analysis.
>> [pcs,scrs] = pca(data)

Outputs:
pcs: An n-by-n matrix of principal components.
scrs: An m-by-n matrix containing the data transformed using the linear coordinate transformation matrix pcs (first output).

Inputs:
data: An m-by-n numeric matrix. The n columns correspond to n observed variables. Each of the m rows corresponds to an observation.
Suppose that the input matrix data has two columns which contain values of the observed variables x1 and x2.
[P,scrs,~,~,pexp] = pca(data)
P =
0.5687 0.8225
0.8225 -0.5687
After performing PCA, the first column of the output matrix P contains the coefficients of the first principal component, and the second column contains the coefficients of the second principal component.
The second output scrs is a matrix containing the observations in data expressed in the coordinate space of the principal components. For example, a single data point and its coordinates in the transformed space are shown here.

scrs(42,:)

ans =
    1.0726    0.4163
pexp
ans =
95.6706 4.3294
The last output pexp is a vector containing the percent variance explained by each principal component. Here, most of the
variance is explained by the first principal component.
scrs(:,1)
We can use only the first column of the transformed data, reducing the dimension of the data from 2 to 1.
gm = fitgmdist(X,2);

You can use the function fitgmdist to fit several multidimensional Gaussian (normal) distributions.
g = cluster(gm,X);

Now, the data can be clustered probabilistically, by calculating each observation’s posterior probability for each component.

[g,~,p] = cluster(gm,X);

You can also return the individual probabilities used to determine the clusters.
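For instance, a sketch (again assuming a hypothetical numeric matrix X) that uses the posterior probabilities to flag observations whose cluster assignment is ambiguous:

gm = fitgmdist(X,2);
[g,~,p] = cluster(gm,X);

% Flag observations whose largest posterior probability is below 0.8
uncertain = max(p,[],2) < 0.8;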
With high-dimensional data, it is difficult to visualize the groups as points in space. How can you
interpret the groups given by a clustering method?
Parallel Coordinates
Consider a data set in which each observation has 4 variables (measurements) x1, x2, x3, and x4.
Suppose that you have created two clusters.
You can visualize the first observation by plotting its variable values on the y-axis and the variable numbers on the x-axis. Similarly, you can visualize the second observation. If the second observation is in a different cluster, visualize it using a different color.
After visualizing several observations, you can see that cluster 1 has higher values of x1 and x3
whereas cluster 2 has higher values of x2 and x4.
Instead of visualizing each observation one-by-one, use the function parallelcoords, which creates the graph shown above.

>> parallelcoords(X,"Group",g)

Inputs:
X: Data, specified as a numeric matrix.
"Group",g: Grouping variable, such as a vector of cluster indices, used to color each group differently.
When using clustering techniques such as k-means and Gaussian mixture models, you have to
specify the number of clusters. However, for high-dimensional data, it is difficult to determine the
optimum number of clusters.
You can use the silhouette values to judge the quality of the clusters. An observation’s silhouette
value is a normalized measure (between -1 and +1) of how close that observation is to other
observations in the same cluster, compared to the observations in different clusters.
Silhouette Plots
A silhouette plot shows the silhouette value of each observation, grouped by cluster. Clustering schemes in which most of the observations have high silhouette values are desirable.
Below are two silhouette plots for the same data set. On the left, the observations are divided into two
clusters, and on the right they are divided into three clusters.
In this case, dividing the data into 2 clusters instead of 3 produces better quality clusters. The
silhouette plot of two clusters shows fewer negative silhouette values, and those negative values are
of smaller magnitude than the negative values in the silhouette plot of three clusters.
Instead of manually experimenting with different numbers of clusters, you can automate the process
with the evalclusters function.
The following function call creates 2, 3, 4, and 5 clusters using k-means clustering, and calculates the
silhouette value for each clustering scheme.
clustev = evalclusters(X,"kmeans","silhouette","KList",2:5)
The output variable, clustev , contains detailed information about the evaluation including the
optimum number of clusters.
kbest = clustev.OptimalK
In place of "silhouette" , you can use other evaluation criteria such as "CalinskiHarabasz" ,
"DaviesBouldin" , and "gap" . Refer to the documentation for further details.
Hierarchical Clustering
Finding the hierarchical structure involves calculating the distance between each pair of points and
then using these distances to link together pairs of “neighboring” points.
Z = linkage(X,"ward","cosine");
Use the linkage function to create the
hierarchical tree.
dendrogram(Z)
You can use the cluster function to assign observations into groups, according to the linkage distances Z.

Z = linkage(X,"average","cosine");
dendrogram(Z)
grp = cluster(Z,"maxclust",3)
Classification Methods

Nearest Neighbor Classification

k-Nearest Neighbor Overview

Function: fitcknn

Classification Trees

Decision Trees Overview

Function: fitctree

Special Notes: Trees are a good choice when there is a significant amount of missing data.

Naive Bayes

Function: fitcnb
Special Notes: Naive Bayes is a good choice when there is a significant amount of missing data.
Discriminant Analysis
Fitting Discriminant Analysis Models
daModel = fitcdiscr(dataTrain,"ResponseVarName")
daModel = fitcdiscr(dataTrain,"ResponseVarName","DiscrimType","quadratic")
Function: fitcdiscr

Special Notes: Linear discriminant analysis works well for “wide” data (more predictors than observations).
Support Vector Machines

Function: fitcsvm
Special Notes: SVMs use a distance-based algorithm. For data that is not normalized, use the "Standardize" option.

Linear SVMs work well for “wide” data (more predictors than observations). Gaussian SVMs often work better on “tall” data (more observations than predictors).
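For example, a sketch of a standardized Gaussian-kernel SVM (dataTrain and the response name "Y" are hypothetical):

mdl = fitcsvm(dataTrain,"Y","KernelFunction","gaussian","Standardize",true);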
The underlying calculations for classification with SVMs are binary by nature. You can perform
multiclass SVM classification by creating an error-correcting output codes (ECOC) classifier.
By default, the ECOC model reduces the model to multiple, binary classifiers using the one-vs-one
design.
1. Create a template for a binary classifier – Create a template for a binary SVM using the function templateSVM .
>> template = templateSVM("PropertyName",PropertyValue)

Outputs:
template: An SVM learner template.

Inputs:
"PropertyName": Optional property name, e.g., "KernelFunction".
2. Create multiclass SVM classifier – Use the function fitcecoc to create a multiclass SVM classifier.
Outputs:
ecocModel: ECOC classifier.

Inputs:
dataTrain: Training data.
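Putting the two steps together, a sketch (dataTrain and the response name "Y" are hypothetical):

% Step 1: template for the binary SVM learners
t = templateSVM("KernelFunction","gaussian","Standardize",true);

% Step 2: multiclass ECOC classifier built from the template
ecocModel = fitcecoc(dataTrain,"Y","Learners",t);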
Neural Networks

Function: fitcnet

Special Notes: Neural networks require data to be normalized. For normalizing the data, use the "Standardize" option.

To train a neural network with more than one hidden layer, specify the number of neurons per hidden layer as a vector. For example, "LayerSizes",[20,15] creates a network with two hidden layers of 20 and 15 neurons, respectively.

Neural networks work well for "tall" data (more observations than predictors).
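For example, a sketch of a two-hidden-layer network (dataTrain and the response name "Y" are hypothetical):

mdl = fitcnet(dataTrain,"Y","LayerSizes",[20,15],"Standardize",true);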
Cross Validation
Using Cross Validation
To create a model with cross validation, provide one of the following options in the model-creation
function.
mdl = fitcknn(data,"Response","PropertyName",PropertyValue)
"Holdout" scalar from 0 to 1 Holdout with the given fraction reserved for validation.
If you already have a partition created using the cvpartition function, you can provide that to the
fitting function instead.
cvpt = cvpartition(data.Response,"KFold",k)
mdl = fitcknn(data,"Response","CVPartition",cvpt)
mdlLoss = kfoldLoss(mdl)
Principal component analysis (PCA) transforms an n-dimensional feature space into a new n-
dimensional space of orthogonal components. The components are ordered by the variation
explained in the data.
Outputs:
pcs: Principal component coefficients. The contribution of each predictor to the n principal components. (n-by-n matrix)

Inputs:
X: Data matrix with m observations and n predictors. (m-by-n matrix)
The transformed variables contain the same amount of information as the original data. However,
assuming that the data contains some amount of noise, the components that contain the last few
percent of explained variance are likely to represent noise more than information.
PCA can be used for dimensionality reduction by discarding the principal components beyond a
chosen threshold of explained variance. The pareto function can be used to visualize the variance
explained by the principal components.
In the following example, the input X has 11 columns but the first 9 principal components explain
more than 95% of variance.
[pcs,scrs,~,~,pctExp] = pca(X);
pareto(pctExp)
Xreduced = scrs(:,1:9);
The principal components by themselves have no physical meaning. However, the coefficients of the
linear transformation indicate the contribution of each variable in the principal component.
For example, if the coefficients of the first principal component are 0.8, 0.05, and 0.3, the first variable
has the largest contribution followed by the third and the second variable.
Biplot
You can visualize any two principal components using the function biplot . It's commonly used to
visualize the first two principal components, which explain the greatest amount of variance in the
data.
In the following biplot, you can see that the predictive variables Age , InducedSTDep , METS , and
ExerciseDuration contribute heavily to the first principal component, but not to the second principal
component.
biplot(pcs(:,1:2),"VarLabels",varnames)
Heat Map
You can also visualize the contributions of each variable to the principal components as a heat map.
For example, the following heat map visualizes the first three principal components. You can see that
METS , MaxHeartRate , and ExerciseDuration contribute heavily to the first principal component.
heatmap(abs(pcs(:,1:3)),...
"YDisplayLabels",varnames);
xlabel("Principal Component")
PCA can be performed independent of the response variable. However, when the data has a
response variable that has multiple categories (for example, true and false), a parallel coordinates
plot of the principal component scores can be useful.
In the following plot, notice that the observations from one group (false) have high values of the first
principal component and the observations from the second group (true) have low values.
parallelcoords(scrs,"Group",y,"Quantile",0.25)
Feature ranking algorithms assign scores to features based on how relevant they are to the model
according to a given metric. Some common algorithms include Chi-Square, Minimum Redundancy
Maximum Relevance (MRMR), and Neighborhood Component Analysis.
Use the fs* family of functions (for example, fscchi2, fscmrmr, and fscnca) to apply feature ranking algorithms to data. Most feature ranking algorithms return the positions and scores of all the features, sorted from highest to lowest score.
Outputs:
idx: Indices of predictors, ordered by predictor importance.
scores: Predictor scores.

Inputs:
tblData: Table data with predictor variables and a response variable.
ResponseVarName: Name of the response variable.
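For example, a sketch using the MRMR ranking function fscmrmr, one member of this family (tblData and "ResponseVarName" are placeholders):

[idx,scores] = fscmrmr(tblData,"ResponseVarName");

% Visualize the scores in ranked order
bar(scores(idx))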
You can then use linear indexing to select a subset of features and use them to train the model.
toKeep = idx(1:nFeatures);
selected = [tblData(:,toKeep),tblData(:,"ResponseVarName")];
mdl = fitcknn(selected,"ResponseVarName");
Outputs:
toKeep: Logical vector indicating which predictors are included in the final model.

Inputs:
fun: Function handle for a function that fits a model and calculates the loss.
X: Numeric matrix with m observations and n predictors.
y: Response values for the m observations.
toKeep = sequentialfs(@errorFun,X,y)

You can use the optional property "cv" to specify the cross validation method. For example, you can specify 7-fold cross validation.

toKeep = sequentialfs(@errorFun,X,y,"cv",7)
Sequential feature selection requires an error function that builds a model and calculates the
prediction error. The error function must have the following structure.
Four numeric inputs:
two for training the model (predictor matrix and response vector)
two for evaluating the model (predictor matrix and response vector)
One scalar output representing the prediction error.
errorFun.m

function error = errorFun(Xtrain,ytrain,Xtest,ytest)

    % Create the model with the learning method of your choice
    mdl = fitcsvm(Xtrain,ytrain);

    % Calculate the number of test observations misclassified
    ypred = predict(mdl,Xtest);
    error = nnz(ypred ~= ytest);

end
Note You do not need to create the training and test data sets. The sequentialfs function will
internally partition the data before calling the error function.
Some algorithms and functionality (e.g., sequentialfs) require predictors in the form of a numeric matrix. If your data contains categorical predictors, how can you include these predictors in the model?
One option is to assign a number to each category. However, this may impose a false numerical
structure on the observations.
For example, say you assign the numbers 1 through 4 to four categories in a predictor. This implies
that the distance between categories 1 and 4 is three times larger than the distance between
categories 3 and 4. In reality, the categories are probably equidistant from each other.
You can create a matrix of dummy variables using the function dummyvar .
d = dummyvar(c)
This matrix can now be used in a machine learning model in place of the categorical vector c , with
each column being treated as a separate predictor variable that indicates the presence ( 1 ) or
absence ( 0 ) of that category of c .
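A minimal sketch of what the dummy matrix looks like (the categorical vector c here is hypothetical):

c = categorical(["red";"green";"blue";"green"]);
d = dummyvar(c)

% d =
%      0     0     1
%      0     1     0
%      1     0     0
%      0     1     0
% Columns follow the order of categories(c): blue, green, red.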
Hyperparameter Optimization
Performing Hyperparameter Optimization
You can use the "OptimizeHyperparameters" property name-value pair to choose which model properties to optimize.
Most model-creation functions accept the "OptimizeHyperparameters" option.
Outputs:
mdl: Model fit using optimized property values.

Inputs:
data, "ResponseName": Table of predictors and response values, and the name of the response variable.
During the optimization, iterative updates are displayed, along with a plot of the best objective function value against the iteration number.
mdl = fitcknn(data,"y","OptimizeHyperparameters","auto")
Setting the "OptimizeHyperparameters" property value to "auto" will optimize a typical set of hyperparameters. The
properties optimized differ depending on the model type. For example, for Nearest Neighbor classification, the optimized
properties are "Distance" and "NumNeighbors" .
Optimization Options
By default, hyperparameter optimization uses Bayesian optimization and tries to minimize the 5-fold cross-validation loss. You
can change these settings with the "HyperparameterOptimizationOptions" property name-value pair.
Specify the optimization options using a structure. To use 10-fold cross validation, create a cross-validation partition and then
create a structure containing option name-value pairs.
part = cvpartition(y,"KFold",10);
opt = struct("CVPartition",part);
mdl = fitcknn(data,"y","OptimizeHyperparameters","auto","HyperparameterOptimizationOptions",opt);
You can set many optimization options in the structure. For example, you can hide the plots and set the maximum number of
objective function evaluations.
opt = struct("ShowPlots",false,"MaxObjectiveEvaluations",50);
To see the available options for a particular model-creation function, view the function's documentation.
Ensemble Learning
Fitting Ensemble Models
The fitcensemble function creates a classification ensemble of weak learners. Similarly, the fitrensemble function
creates a regression ensemble. Both functions have identical syntax.
Outputs:
mdl: Ensemble model variable.

Inputs:
data: Table containing the predictors and response values.
"Method" - Bagging (bootstrap aggregation) and boosting are the two most common approaches used in ensemble
modeling. The fitcensemble function provides several bagging and boosting methods. For example, use the "Bag"
method to create a random forest.
mdl = fitcensemble(data,"Y","Method","Bag")
The default method depends on whether the problem is binary or multiclass classification, as well as on the type of learners in the ensemble.
"Learners" - You can specify the type of weak-learner to use in the ensemble: "tree" , "discriminant" , or "knn" .
The default learner type depends on the method specified: the method "Subspace" has default learner "knn" , and all
other methods have default learner "tree" .
mdl = fitcensemble(data,"Y","Learners","knn")
The fitcensemble function uses the default settings for each learner type. To customize learner properties, use a weak-
learner template.
mdl = fitcensemble(data,"Y","Learners",templateKNN("NumNeighbors",3))
You can use a cell vector of learners to create an ensemble composed of more than one type of learner. For example, an
ensemble could consist of two types of kNN learners.
lnrs = {templateKNN("NumNeighbors",3),templateKNN("NumNeighbors",5)}
mdl = fitcensemble(data,"Y","Learners",lnrs)
"NumLearningCycles" - At every learning cycle, one weak learner is trained for each learner specified in "Learners" .
The default number of learning cycles is 100. If "Learners" contains only one learner (as is usually the case), then by
default 100 learners are trained. If "Learners" contains two learners, then by default 200 learners are trained (two
learners per learning cycle).
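For example, a sketch combining these options (data and the response name "Y" are hypothetical) to train a random forest of 200 trees:

mdl = fitcensemble(data,"Y","Method","Bag","NumLearningCycles",200);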
Regression Methods
Linear Models
Fitting Linear Regression Models
>> mdl = fitlm(data)

Outputs:
mdl: A regression model variable containing the coefficients and other information about the model.

Inputs:
data: A table containing the data used to fit the regression model. See below for details.
In a regression model, the relationship between the predictors and the response is described by a model formula. The first input to fitlm is a table containing the predictors and the response. By default, fitlm uses the last column as the response and all other columns as predictors.
When modeling a linear regression, you can apply different functions to the predictive variables. As the second input to
fitlm , you can use one of the predefined models or you can specify a model by providing a formula in Wilkinson–Rogers
notation.
"linear" Intercept and linear terms + Include "y ~ x1+x2" includes the intercept
for each predictor. this term. term, x1, and x2:
y = c0 + c1 x1 + c2 x2
"interactions" Intercept, linear terms, and
all products of pairs of - Exclude "y ~ x1+x2-1" excludes the intercept
distinct predictors (no this term. term:
squared terms). y = c1 x1 + c2 x2
"quadratic" Intercept, linear terms,
* Include "y ~ x1*x2" includes the intercept
interactions, and squared
product term, x1, x2, and x1*x2:
terms.
and all y = c0 + c1 x1 + c2 x2 + c3 x1 x2
lower-
order
terms.
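For example, a sketch assuming a hypothetical table data with predictor variables x1 and x2 and response y:

mdl = fitlm(data,"quadratic");   % predefined model specification
mdl = fitlm(data,"y ~ x1*x2");   % formula: intercept, x1, x2, and x1*x2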
If your data is stored in numeric arrays instead of a single table, you can provide the fitlm function
the predictor and response values as numeric arrays.
>> mdl = fitlm(X,y)

Outputs:
mdl: Linear regression model variable.

Inputs:
X: Predictor values, specified as a numeric matrix.
y: Response values, specified as a numeric vector.
Each column of the predictor matrix X is treated as one predictor variable. By default, fitlm will fit a
model with an intercept and a linear term for each predictor (column).
To fit a different regression formula, you have two options. You can store the predictors and the
response in a table and provide the model specification separately. Alternatively, you can create a
matrix with a column for each term in the regression formula. This matrix is called a design matrix.
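A sketch of the design-matrix approach (x1, x2, and y are hypothetical numeric column vectors):

% Columns of the design matrix are the terms x1, x2, and x1*x2
Xd = [x1, x2, x1.*x2];
mdl = fitlm(Xd,y);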
Stepwise Fitting
Fitting Stepwise Linear Regression
>> stepwiseMdl = stepwiselm(data)

Outputs:
stepwiseMdl: A linear model variable containing the coefficients and other information about the model.

Inputs:
data: A table containing the data used to fit the regression model. See below for details.
As with fitlm , stepwiselm uses the last column of data as the response and all other columns as predictors.
stepwiselm chooses the model for you. However, you can provide the following inputs to control the
model selection process.
"modelspec" - The second input to the function specifies the starting model. stepwiselm starts
with this model and adds or removes terms based on certain criteria.
"Lower" and "Upper" - If you want to limit the complexity of the model, use these properties.
For example, the following model will definitely contain the intercept and the linear terms but will
not contain any terms with a degree of three or more.
mdl = stepwiselm(data,"Lower","linear","Upper","quadratic")
By default, stepwiselm considers models as simple as a constant term only and as complex as
an interaction model.
stepwiselm iteratively adds and subtracts terms from the starting model, if the modified model is
better than the previous iteration.
“Better” is judged according to the value of the "Criterion" property. The default value is "sse" —
an F-test on the sum of squared error. You can change this measure by setting the "Criterion"
property.
mdl = stepwiselm(data,"Criterion","rsquared")
In linear regression, the coefficients are chosen by minimizing the mean squared error (MSE). The mean squared error is the mean of the squared differences between the observed and predicted response values.

Regularized regression methods such as ridge regression and lasso minimize the MSE plus a penalty term. This penalty term is composed of the fit coefficient values and a tuning parameter λ. The larger the value of λ, the greater the penalty and, therefore, the more the coefficients are “shrunk” towards zero.

The difference between the two methods is how the penalty term is calculated. Ridge regression uses an L2 norm of the coefficients (λ times the sum of the squared coefficients). Lasso uses an L1 norm (λ times the sum of the absolute coefficients).
Lasso can be used as a form of feature selection; however, feature selection may not be appropriate for cases with similar, highly correlated variables. It may result in loss of information, which could impact accuracy and the interpretation of results. Ridge regression maintains all features, but the model may still be very complex if there is a large number of predictors.
Elastic Net
You can also use a penalty term that uses a weighted average of both. This is elastic net regression,
which introduces another parameter – the weighting between ridge (L2 norm) and lasso (L1 norm).
You can use the function ridge to fit a ridge regression model.
>> b = ridge(y,X,lambda,scaled)
Outputs:
b: Ridge regression coefficients.

Inputs:
y: Response values, specified as a vector.
X: Predictor values, specified as a numeric matrix.
lambda: Regularization parameter.
scaled: Scaling flag (see the notes below).
Notes
The matrix X is a numeric design matrix with columns representing the terms in the regression
formula. If the original data contains two predictive variables x1 and x2, but the desired regression
model formula contains the terms x1, x2, and x1*x2, the matrix X should have 3 columns: x1, x2, and
x1*x2.
The ridge parameter lambda is a non-negative number. In a later section, you will try to estimate the
optimum value of lambda .
The ridge function normalizes the predictors before fitting the model. Therefore, by default, the
regression coefficients correspond to the normalized data. Set the scaled flag to 0 to restore the
coefficients to the scale of the original data.
When using a ridge regression model for predictions, you will want the regression coefficients in the
scale of the original data. In this case (the scaled flag set to 0 ), the coefficient vector b will contain
n+1 coefficients for a model with n predictors. The first element of b corresponds with the intercept
term.
You can predict the response by multiplying the matrix containing the predictors and the last n
elements of the coefficient vector. Add the first element of the coefficient vector to incorporate the
intercept in the calculation.
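As a sketch (X, y, lambda, and the new predictor matrix Xnew are hypothetical):

b = ridge(y,X,lambda,0);        % coefficients on the original data scale
yPred = Xnew*b(2:end) + b(1);   % b(1) is the intercept term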
When modeling data using ridge regression, how can you determine the regression parameter,
lambda ?
If lambda is a vector of values, ridge returns a matrix of coefficients, with one column for each value of lambda.

b = ridge(y,X,lambda,scaled);

You can now use each column of the matrix b as regression coefficients and predict the response. The response yPred is then a matrix where each column is the predicted response for the corresponding value of lambda. You can use yPred to calculate the mean squared error (MSE) and choose the coefficients that minimize the MSE.
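A sketch of this workflow (the training and test arrays are hypothetical):

lambda = 0:0.5:10;                   % candidate regularization values
b = ridge(yTrain,XTrain,lambda,0);   % one column of coefficients per lambda
yPred = XTest*b(2:end,:) + b(1,:);   % one column of predictions per lambda
mse = mean((yPred - yTest).^2);      % MSE for each lambda
[~,idxBest] = min(mse);
bBest = b(:,idxBest);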
Lasso (Least Absolute Shrinkage and Selection Operator) regression models are fit using the lasso
function.
>> [b,fitInfo] = lasso(X,y,"Lambda",lambda)

Outputs:
b: Lasso coefficients.
fitInfo: A structure containing information about the model.

Inputs:
X: Predictor values, specified as a numeric matrix.
y: Response values, specified as a vector.
lambda: Regularization parameter value.
Notes
Like in ridge, the matrix X is a design matrix with columns representing the terms in the regression
formula.
The "Lambda" property is optional. If not specified, lasso uses a geometric sequence of
λ values based on the data.
ridge and lasso implement their penalty terms slightly differently, and as a result, use different
scalings for lambda . To use λ values in lasso which have the same interpretation as for ridge ,
scale lambda in lasso by the number of observations.
Use the optional property "Alpha" with a value between 0 and 1 to create an elastic net. Recall that
elastic net regression uses a penalty term which is a weighted average of the ridge (L2) and lasso
(L1) penalty terms. "Alpha" values near 1 are closer to lasso, and "Alpha" values near 0 are closer
to ridge.
You can predict the response by multiplying the matrix containing the predictors by the coefficient
vector.
Note that the intercept term is not included in the output coefficients. Instead, it is a field in the output
structure fitInfo .
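A sketch of fitting an elastic net and predicting with the coefficients for one value of λ (X, y, and Xnew are hypothetical):

[b,fitInfo] = lasso(X,y,"Alpha",0.6);   % weighted L1/L2 penalty (elastic net)

% Predict using, say, the tenth lambda in the sequence
j = 10;
yPred = Xnew*b(:,j) + fitInfo.Intercept(j);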
Linear regression techniques like fitlm are parametric, which means predictions are
based on a finite set of parameters that are estimated from the training data.
You may not need a model with a specific interpretable formula if its primary purpose
is simply to predict the response for unknown observations. In this case, you can use
a nonparametric model.
Support vector machines (SVMs), decision trees, and neural networks are some of the nonparametric
techniques you can use for regression.
These techniques are covered in detail in the Classification Methods chapter. You can find information
specific to regression in the documentation.
Gaussian Process Regression

>> mdl = fitrgp(data,"ResponseVarName")

Outputs:
mdl: A GPR model variable.

Inputs:
data: A table containing the predictor and response values.
"ResponseVarName": Name of the response variable.
Use the "KernelFunction" property to change the kernel to one of the predefined options.
mdl = fitrgp(data,"ResponseVarName","KernelFunction","exponential")
In addition to the predicted response values, the predict function for GPR models can also return
the standard deviation and prediction intervals for the predicted values.
Outputs:
yPred: Predicted response value(s).
yStd: Standard deviation for each predicted value.
yInt: Prediction intervals for the predicted values.

Inputs:
mdl: A GPR model variable.
dataNew: Predictor values for one or more new observations.
You can change the significance level of the prediction intervals by setting the "Alpha" property to a
value between 0 and 1. The default value is 0.05.
[yPred,yStd,yInt] = predict(mdl,dataNew,"Alpha",0.01)