
1. Why do we use cross validation set in our models?

While building a model we initially train it on known (past) data, and once it is trained we
need to know how well it is working. If we evaluate the model on the same training data it
will appear to perform well, but in reality it will encounter data that differs from the
training data and may perform poorly on it. To avoid this, and to build the model more
reliably, we use the cross validation technique, which tests the model on data it has never
seen during training. Cross validation also helps avoid the problem of over-fitting, and by
rotating which fold is held out we are able to use all of our dataset for both training and
testing.
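As a minimal sketch of this idea (assuming scikit-learn and its bundled iris dataset),
5-fold cross validation evaluates the model on five different held-out folds instead of on
the data it was trained on:

    # Score a model with 5-fold cross validation instead of a single split.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Each of the 5 folds is held out once for testing while the model
    # is trained on the remaining 4 folds.
    scores = cross_val_score(model, X, y, cv=5)
    print("fold accuracies:", scores)
    print("mean accuracy:", scores.mean())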

2. For k fold cross validation, which of the following is/are true?


a) The set of points in each fold are mutually exclusive.
b) Cross validation prevents the model from overfitting.
c) We use stratified k-fold cross validation on skewed datasets to preserve the percentage
of samples for each class.
d) The default number of folds in sklearn.model_selection.KFold is 3.

Answer - a, b, c (d is false: since scikit-learn 0.22, the default number of splits in KFold is 5, not 3)
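A small illustrative sketch (assuming scikit-learn; the skewed label array is made up) of
points (a) and (c): the folds partition the indices, and StratifiedKFold preserves the class
percentages in every test fold, while plain KFold does not:

    # Contrast KFold and StratifiedKFold on an imbalanced toy label array.
    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    X = np.arange(20).reshape(-1, 1)
    y = np.array([0] * 15 + [1] * 5)        # skewed: 75% class 0, 25% class 1

    for name, cv in [("KFold", KFold(n_splits=5)),
                     ("StratifiedKFold", StratifiedKFold(n_splits=5))]:
        print(name)
        for train_idx, test_idx in cv.split(X, y):
            # StratifiedKFold keeps ~25% of class 1 in every test fold.
            print("  test fold class counts:",
                  np.bincount(y[test_idx], minlength=2))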

3. What is data leakage?


We split our data into train and test sets, and to evaluate the model honestly we need to
make sure that no data is shared between the two. If data is shared between the training
and testing sets (for example overlapping rows, or preprocessing statistics computed on
the full dataset before splitting), it is called data leakage. Leakage inflates evaluation
scores and results in unreliable, poor prediction outcomes after model deployment.
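A common and subtle form of leakage happens during preprocessing. The sketch below
(assuming scikit-learn; the dataset is synthetic) contrasts the leaky pattern of fitting a
scaler on the full dataset with the safe pattern of splitting first and fitting the scaler
inside a Pipeline on the training rows only:

    # Data leakage via preprocessing, and how a Pipeline avoids it.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, random_state=0)

    # Leaky: StandardScaler().fit_transform(X) before splitting would let
    # test-set statistics (mean/std) influence the training features.

    # Safe: split first, then fit the scaler on training data only.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)              # scaler fitted on X_train alone
    print("test accuracy:", model.score(X_test, y_test))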

4. Which train-test based splitting should be used for time series data and why?

In a time series the observations are not independent, so we cannot split the data
randomly: a random split would let the model train on future observations and be tested
on past ones. When dealing with time-related data we therefore use time-based splitting,
such as sklearn.model_selection.TimeSeriesSplit, which produces successive train-test
splits in which each training set contains only observations that occur before the
corresponding test set.
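A minimal sketch of this behaviour (assuming scikit-learn; the 12-step series is made up):
each TimeSeriesSplit training window contains only observations that come before the
corresponding test window:

    # Time-ordered splits: the training window always precedes the test window.
    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(12).reshape(-1, 1)        # 12 ordered time steps

    for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
        print("train:", train_idx, "-> test:", test_idx)
    # train: [0 1 2]             -> test: [3 4 5]
    # train: [0 1 2 3 4 5]       -> test: [6 7 8]
    # train: [0 1 2 3 4 5 6 7 8] -> test: [ 9 10 11]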
5. What are different cross validation techniques used for regression problems?
Leave-p-out cross-validation
Leave-one-out cross-validation - LOOCV (sketched below)
K-fold cross-validation
Stratified k-fold cross-validation (for regression this requires first binning the continuous
target into discrete classes)
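For example, a minimal LOOCV sketch on a synthetic regression problem (assuming
scikit-learn): with n samples the model is fit n times, each time leaving exactly one
observation out for testing:

    # Leave-one-out cross validation for a regression model.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = make_regression(n_samples=30, n_features=3, noise=10, random_state=0)

    # Each of the 30 folds holds out a single sample; the score per fold is
    # the (negated) squared error on that one point.
    scores = cross_val_score(LinearRegression(), X, y,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    print("LOOCV mean squared error:", -scores.mean())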

6. How do training and CV scores help you to find an optimum hyperparameter for your
model?

A simple hyperparameter optimization procedure follows these steps:

Split the data into training and validation (CV) subsets.

Repeat the optimization loop a fixed number of times or until a condition is met: select a
new set of model hyperparameters, train the model on the training subset with those
hyperparameters, then apply the model to the validation data, generate the corresponding
predictions, and evaluate them with a score metric.

Compare all the score metrics and choose the hyperparameters that yield the best score.

Comparing the training score with the CV score for each candidate also shows whether the
model is overfitting (high training score but much lower CV score) or underfitting (both
scores low), which tells us in which direction to adjust the hyperparameters.
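A sketch of this loop using scikit-learn's validation_curve helper (the dataset and the
max_depth grid below are illustrative assumptions, not from the original text): it returns
the training and CV scores for every candidate value, and we pick the value with the best
CV score:

    # Compare train vs CV scores across a hyperparameter grid (illustrative).
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import validation_curve
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    depths = [1, 2, 3, 5, 8, 12]            # candidate max_depth values (assumed grid)

    # validation_curve fits the model once per (candidate, fold) pair and
    # returns score arrays of shape (n_candidates, n_folds).
    train_scores, cv_scores = validation_curve(
        DecisionTreeClassifier(random_state=0), X, y,
        param_name="max_depth", param_range=depths, cv=5)

    for d, tr, cv in zip(depths, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
        print(f"max_depth={d}: train={tr:.3f}, cv={cv:.3f}")

    # A large train-CV gap signals overfitting; both scores low signals underfitting.
    best = depths[int(np.argmax(cv_scores.mean(axis=1)))]
    print("best max_depth by CV score:", best)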
