Capstone Notes-Model

Uploaded by

ANIL

Cricket Win Prediction –

Capstone Project Notes-2

This document helps in building a strategy based on the models built on the data set.

Windows User
5/15/2022
Table of Contents
List of Figures
1. Model building and interpretation
   a) Build various models (You can choose to build models for either or all of descriptive, predictive or prescriptive purposes)
      a) Logistic Regression Model using Sklearn
   b) Test your predictive model against the test set using various appropriate performance metrics
   c) Interpretation of the model(s)
2). Model Tuning and business implication
   a) and b) Ensemble modeling, wherever applicable
      i) Decision Tree Model
      ii) Random Forest Model
      iii) ANN Model
   c) Interpretation of the most optimum model and its implication on the business
Strategy for the upcoming matches
   • 1 Test match with England in England. All matches are day matches. In England, it will be the rainy season at the time of the match.
      ▪ Output of the Model
   • 2 T20 matches with Australia in India. All matches are day-and-night matches. In India, it will be the winter season at the time of the matches.
      ▪ Output of the Model
   • 2 ODI matches with Sri Lanka in India. All matches are day-and-night matches. In India, it will be the winter season at the time of the matches.
      ▪ Output of the Model

List of Figures:

Figure 1 – Logistic Model – “GridSearchCV” Method
Figure 2 – Logistic Model – Best model Parameters
Figure 3 – Logistic Model – Confusion Matrix of Train Data
Figure 4 – Logistic Model – Classification Report of Train Data
Figure 5 – Logistic Model – ROC Curve of Train Data
Figure 6 – Logistic Model – Confusion Matrix of Test Data
Figure 7 – Logistic Model – Classification Report of Test Data
Figure 8 – Logistic Model – ROC Curve of Test Data
Figure 9 – Decision Tree Model – “GridSearchCV” Method
Figure 10 – Decision Tree Model – Best model Parameters
Figure 11 – Decision Tree Model – Confusion Matrix of Train Data
Figure 12 – Decision Tree Model – Classification Report of Train Data
Figure 13 – Decision Tree Model – ROC Curve of Train Data
Figure 14 – Decision Tree Model – Confusion Matrix of Test Data
Figure 15 – Decision Tree Model – Classification Report of Test Data
Figure 16 – Decision Tree Model – ROC Curve of Test Data
Figure 17 – Random Forest Model – “GridSearchCV” Method
Figure 18 – Random Forest Model – Confusion Matrix of Train Data
Figure 19 – Random Forest Model – Classification Report of Train Data
Figure 20 – Random Forest Model – ROC Curve of Train Data
Figure 21 – Random Forest Model – Confusion Matrix of Test Data
Figure 22 – Random Forest Model – Classification Report of Test Data
Figure 23 – Random Forest Model – ROC Curve of Test Data
Figure 24 – ANN Model – “GridSearchCV” Method
Figure 25 – ANN Model – Confusion Matrix of Train Data
Figure 26 – ANN Model – Classification Report of Train Data
Figure 27 – ANN Model – ROC Curve of Train Data
Figure 28 – ANN Model – Confusion Matrix of Test Data
Figure 29 – ANN Model – Classification Report of Test Data
Figure 30 – ANN Model – ROC Curve of Test Data
Figure 31 – Model Metrics Comparison between Models
Figure 32 – Dependent Variables used in Model building
Figure 33 – Actual Test Data to predict the Winning Strategy against England
Figure 34 – Winning Strategy against England for Test Match
Figure 35 – Actual Test Data to predict the Winning Strategy against Australia
Figure 36 – Winning Strategy against Australia for T20 Match
Figure 37 – Actual Test Data to predict the Winning Strategy against Srilanka
Figure 38 – Winning Strategy against Srilanka for ODI Match

1. Model building and interpretation
a) Build various models (You can choose to build models for either or all of descriptive,
predictive or prescriptive purposes):
PRECAP:

a) In continuation of Notes-1, these notes build a model that predicts the performance of Team India against its
opponents.
b) Based on the EDA, the unwanted variables “Game Number”, “Wicket_Keeper” and “Audience_Number” are
removed. (The boxplots and the EDA showed that audience number has no considerable impact on the result.)
c) In this section four models are built: Decision Tree, Random Forest, ANN and Logistic Regression (both
sklearn and stats). The best model is chosen based on the model metrics.
d) All the ‘object’ variables are encoded using one-hot encoding, and the target variable is encoded using the
label encoder.
e) For model building, the data is split into train and test sets in the ratio 70:30.
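The preprocessing steps listed above can be sketched as follows. This is a minimal illustration on a made-up ten-row frame, not the project data: the raw column names (`Opponent`, `Season`, `Result`), the `Win`/`Loss` labels, `drop_first=True` and `random_state` are all assumptions. The dummy-column names that appear later in these notes (for example `Offshore_Yes` without an `Offshore_No`) suggest that the first level of each category was dropped, but the notes do not state this.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; names and values are illustrative only.
df = pd.DataFrame({
    "Opponent": ["England", "Srilanka", "Kenya", "England", "Pakistan",
                 "Srilanka", "Kenya", "England", "Pakistan", "Srilanka"],
    "Season": ["Rainy", "Winter", "Summer", "Winter", "Rainy",
               "Summer", "Winter", "Rainy", "Summer", "Winter"],
    "Avg_team_Age": [30, 29, 31, 30, 28, 30, 29, 31, 30, 30],
    "Audience_Number": [9000, 12000, 7000, 15000, 8000,
                        11000, 6000, 14000, 10000, 13000],
    "Result": ["Win", "Loss", "Win", "Win", "Loss",
               "Win", "Win", "Loss", "Win", "Win"],
})

# Drop a variable judged unhelpful in the EDA.
df = df.drop(columns=["Audience_Number"])

# One-hot encode the object-typed predictors; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns=["Result"]), drop_first=True)

# Label-encode the target; classes are sorted, so Loss -> 0 and Win -> 1.
y = LabelEncoder().fit_transform(df["Result"])

# 70:30 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)
print(list(X.columns))
```

With `drop_first=True` the alphabetically first level of each category (here `England` and `Rainy`) becomes the baseline and gets no column of its own.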

Imp Note: I have built the model on the full data, without splitting the data set by match format. One of
the problem statements asks for the winning strategy of Team India against Australia in T20, but the source
data set has no records of India playing Australia in that format, so splitting the data set by format would
not give accurate results. The model is therefore built without splitting the data.

a) Logistic Regression Model using Sklearn:


A Logistic Regression model is built on the train data. The following parameter grid is used:

'penalty': ['elasticnet','l2','none'], 'solver': ['newton-cg', 'saga'], 'tol': [0.001,0.00001]

GridSearchCV is applied to find the best model.

Figure 1 – Logistic Model – “GridSearchCV” Method

The best parameters found by the grid search are shown below: L2 penalty, saga solver and a tolerance of 1e-05.
Predictions are made using this model.

Figure 2 – Logistic Model – Best model Parameters
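A grid search of this kind could look roughly like the sketch below. It runs on synthetic data, and `cv=3`, `scoring`, `max_iter` and `l1_ratio=0.5` are assumptions that the notes do not state. Only penalty/solver pairs that scikit-learn accepts are listed: `elasticnet` is supported by the `saga` solver only and needs an `l1_ratio`. The "no penalty" option is left out because its spelling differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded match data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Two sub-grids so that every combination is a valid one.
param_grid = [
    {"penalty": ["l2"], "solver": ["newton-cg", "saga"], "tol": [1e-3, 1e-5]},
    {"penalty": ["elasticnet"], "solver": ["saga"], "l1_ratio": [0.5],
     "tol": [1e-3, 1e-5]},
]

grid = GridSearchCV(
    LogisticRegression(max_iter=5000), param_grid, cv=3, scoring="accuracy"
)
grid.fit(X_train, y_train)

# The refitted best estimator is used for prediction.
best_model = grid.best_estimator_
print(grid.best_params_)
print("test accuracy:", best_model.score(X_test, y_test))
```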

Performance of the Logistic Model on the Train Data:


Confusion Matrix:

Figure 3 – Logistic Model – Confusion Matrix of Train Data

Classification Report:

The model has an accuracy of 87% on the train data, with precision = 0.88, recall = 0.98 and F1 = 0.93.

Figure 4 – Logistic Model – Classification Report of Train Data
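For reference, the figures in a classification report follow directly from the four cells of the confusion matrix. The sketch below uses hypothetical counts, since the actual confusion-matrix values are only available in the figures. It also checks that the reported precision of 0.88 and recall of 0.98 imply an F1 score of about 0.93.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive class (1 = win)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # share of predicted wins that were real wins
    recall = tp / (tp + fn)      # share of real wins that were predicted
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not the values from Figure 3.
example = classification_metrics(tn=20, fp=10, fn=5, tp=65)
print({name: round(value, 2) for name, value in example.items()})

# F1 implied by the reported train precision and recall.
print(round(f1_from(0.88, 0.98), 2))
```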

AUC and ROC Curve:

The AUC of the model on the train data is 84.36%. The ROC curve of the train data is shown below.

Figure 5 – Logistic Model – ROC Curve of Train Data
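As a reminder of what the AUC measures: it is the probability that a randomly chosen win receives a higher predicted score than a randomly chosen loss. The sketch below computes it by comparing every win/loss pair. The labels and scores are made up for illustration, and a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (win, loss) pairs ranked correctly; ties count half."""
    wins = [s for label, s in zip(y_true, scores) if label == 1]
    losses = [s for label, s in zip(y_true, scores) if label == 0]
    correct = 0.0
    for w in wins:
        for l in losses:
            if w > l:
                correct += 1.0
            elif w == l:
                correct += 0.5
    return correct / (len(wins) * len(losses))


# Hypothetical labels and predicted win probabilities.
y_true = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.55, 0.3]
print(roc_auc(y_true, scores))
```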

b) Test your predictive model against the test set using various appropriate
performance metrics

Performance of the Logistic Model on the Test Data:


Confusion Matrix:

Figure 6 – Logistic Model – Confusion Matrix of Test Data

Classification Report:

The model has an accuracy of 87% on the test data, with precision = 0.89, recall = 0.97 and F1 = 0.93.

Figure 7 – Logistic Model – Classification Report of Test Data

AUC and ROC Curve:

The AUC of the model on the test data is 84.32%. The ROC curve of the test data is shown below.

Figure 8 – Logistic Model – ROC Curve of Test Data

c) Interpretation of the model(s):
Based on the metrics, the model is stable: accuracy is 87% on both the train and the test data, so the model does
not appear to overfit or underfit. Precision is also high, at about 88%. Hence the model seems good, but the
metrics can be cross-checked by building some more models.

2). Model Tuning and business implication


a) and b) Ensemble modeling, wherever applicable:
To validate the logistic model, three more models are built and their metrics are compared. The three
models are:

(a) Decision Tree Model


(b) Random Forest Model
(c) Artificial Neural Network (ANN) Model

i) Decision Tree Model:


A Decision Tree model is built on the train data. The following parameter grid is used:

'criterion': ['gini'], 'max_depth': [10,20,30,50], 'min_samples_leaf': [50,100,150], 'min_samples_split': [150,300,450]

GridSearchCV is applied to find the best model.

Figure 9 – Decision Tree Model – “GridSearchCV” Method

After multiple iterations, the best parameters are identified and used to build the model. They are shown below.

Figure 10 – Decision Tree Model – Best model Parameters
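The same search pattern applied to the decision tree grid could be sketched as below. The data is synthetic, and `cv=3` and `random_state` are assumptions. A sample of 2000 rows is used so that the large `min_samples_leaf` and `min_samples_split` values in the grid can actually take effect.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded train data (assumption).
X_train, y_train = make_classification(
    n_samples=2000, n_features=10, random_state=1
)

# Parameter grid as listed in the notes.
param_grid = {
    "criterion": ["gini"],
    "max_depth": [10, 20, 30, 50],
    "min_samples_leaf": [50, 100, 150],
    "min_samples_split": [150, 300, 450],
}

tree_grid = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid, cv=3)
tree_grid.fit(X_train, y_train)
print(tree_grid.best_params_)
```

The leaf and split thresholds act as pruning controls: the larger they are, the shallower the fitted tree, whatever `max_depth` allows.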

Performance of the Decision Tree Model on the Train Data:


Confusion Matrix:

Figure 11 – Decision Tree Model – Confusion Matrix of Train Data


Classification Report:

The model has an accuracy of 85% on the train data. Correspondingly, precision= 0.87, recall =0.97, f1= 0.91

Figure 12 – Decision Tree Model – Classification Report of Train Data

AUC and ROC Curve:

The AUC of the model on the train data is 78.70%. The ROC curve of the train data is shown below.

Figure 13 – Decision Tree Model – ROC Curve of Train Data

Performance of the Decision Tree Model on the Test Data:


Confusion Matrix:

Figure 14 – Decision Tree Model – Confusion Matrix of Test Data

Classification Report:

The model has an accuracy of 84% on the test data, with precision = 0.86, recall = 0.97 and F1 = 0.91.

Figure 15 – Decision Tree Model – Classification Report of Test Data

AUC and ROC Curve:

The AUC of the model on the test data is 75.40%. The ROC curve of the test data is shown below.

Figure 16 – Decision Tree Model – ROC Curve of Test Data

ii) Random Forest Model:


A Random Forest model is built on the train data. The following parameter grid is used:

'max_depth': [4,5], 'max_features': [2,3], 'min_samples_leaf': [8,9], 'min_samples_split': [46,50], 'n_estimators': [290]

GridSearchCV is applied to find the best model.

Figure 17 – Random Forest Model – “GridSearchCV” Method

Performance of the Random Forest Model on the Train Data:
Confusion Matrix:

Figure 18 – Random Forest Model – Confusion Matrix of Train Data

Classification Report:

The model has an accuracy of 84% on the train data. Correspondingly, precision= 0.84, recall =1, f1= 0.91

Figure 19 – Random Forest Model – Classification Report of Train Data

AUC and ROC Curve:

The AUC of the model on the train data is 84.98%. The ROC curve of the train data is shown below.

Figure 20 – Random Forest Model – ROC Curve of Train Data

Performance of the Random Forest on the Test Data:


Confusion Matrix:

Figure 21 – Random Forest Model – Confusion Matrix of Test Data

Classification Report:

The model has an accuracy of 83% on the test data, with precision = 0.83, recall = 1 and F1 = 0.91.

Figure 22 – Random Forest Model – Classification Report of Test Data

AUC and ROC Curve:

The AUC of the model on the test data is 83.11%. The ROC curve of the test data is shown below.

Figure 23 – Random Forest Model – ROC Curve of Test Data

iii) ANN Model:


An ANN model is built on the train data. The following parameter grid is used:

'hidden_layer_sizes': [50,100,200], 'max_iter': [2500,3000,4000], 'solver': ['adam'], 'tol': [0.01]

GridSearchCV is applied to find the best model.

Figure 24 – ANN Model – “GridSearchCV” Method
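For the ANN, the parameter names in the grid match scikit-learn's `MLPClassifier`, so the search might be set up as in the sketch below. Using `MLPClassifier` is an inference from those names rather than something the notes state. The data is synthetic, `cv=3` and `random_state` are assumptions, and the hidden layer sizes are written as one-element tuples, meaning a single hidden layer of that width.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the encoded train data (assumption).
X_train, y_train = make_classification(
    n_samples=300, n_features=8, random_state=2
)

param_grid = {
    "hidden_layer_sizes": [(50,), (100,), (200,)],  # one hidden layer each
    "max_iter": [2500, 3000, 4000],
    "solver": ["adam"],
    "tol": [0.01],
}

ann_grid = GridSearchCV(MLPClassifier(random_state=2), param_grid, cv=3)
ann_grid.fit(X_train, y_train)
print(ann_grid.best_params_)
```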

Performance of the ANN Model on the Train Data:


Confusion Matrix:
Figure 25 – ANN Model – Confusion Matrix of Train Data

Classification Report:

The model has an accuracy of 87% on the train data. Correspondingly, precision= 0.88, recall =0.98, f1= 0.93

Figure 26 – ANN Model – Classification Report of Train Data

AUC and ROC Curve:

The AUC of the model on the train data is 84.30%. The ROC curve of the train data is shown below.

Figure 27 – ANN Model – ROC Curve of Train Data

Performance of the ANN Model on the Test Data:
Confusion Matrix:

Figure 28 – ANN Model – Confusion Matrix of Test Data

Classification Report:

The model has an accuracy of 84% on the test data, with precision = 0.89, recall = 0.97 and F1 = 0.93.

Figure 29 – ANN Model – Classification Report of Test Data

AUC and ROC Curve:

The AUC of the model on the test data is 84.07%. The ROC curve of the test data is shown below.

Figure 30 – ANN Model – ROC Curve of Test Data

c) Interpretation of the most optimum model and its implication on the business
As described above, all the models are built on the train data and evaluated on both the train and the test data.
The models are compared on accuracy, precision and recall. The figure below compares the metrics of the
models.

Figure 31 – Model Metrics Comparison between Models

On comparison, accuracy is highest for the ANN and Logistic Regression models. Precision is also higher for
these two models than for the Decision Tree and Random Forest models. Since the target variable is binary,
I have opted for Logistic Regression to build the strategy.
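The comparison can also be tabulated from the accuracy and AUC values quoted in the text of the sections above. The sketch below copies those numbers as written (including the 84% test accuracy stated for the ANN) and ranks the models by test accuracy, then test AUC. That ranking rule and the helper name `train_test_gap` are illustrative assumptions, not the procedure used in the notes.

```python
# (accuracy, AUC) pairs as quoted in the text for each model.
reported = {
    "Logistic Regression": {"train": (0.87, 0.8436), "test": (0.87, 0.8432)},
    "Decision Tree":       {"train": (0.85, 0.7870), "test": (0.84, 0.7540)},
    "Random Forest":       {"train": (0.84, 0.8498), "test": (0.83, 0.8311)},
    "ANN":                 {"train": (0.87, 0.8430), "test": (0.84, 0.8407)},
}


def train_test_gap(name):
    """Train accuracy minus test accuracy; a large gap hints at overfitting."""
    train_acc, _ = reported[name]["train"]
    test_acc, _ = reported[name]["test"]
    return round(train_acc - test_acc, 2)


# Rank by test accuracy first and test AUC second (tuples compare in order).
ranking = sorted(reported, key=lambda name: reported[name]["test"], reverse=True)

for name in ranking:
    acc, auc = reported[name]["test"]
    print(f"{name:20s} test acc={acc:.2f}  test AUC={auc:.4f}  "
          f"gap={train_test_gap(name):+.2f}")
```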

So finally, the Logistic model has:

Metric          Train Data   Test Data
a. Accuracy     0.87         0.87
b. AUC          0.84         0.84
c. Recall       0.98         0.97
d. Precision    0.88         0.89
e. F1 Score     0.93         0.93

Strategy for the upcoming matches

• 1 Test match against England in England. All matches are day matches. It will be the rainy season in
England at the time of the match.
To build the strategy against England, an Excel sheet is prepared as the actual test data.

Because the object variables are one-hot encoded, the parameters fixed by the problem statement are marked
as ‘1’ in the Excel sheet. The remaining variables are set to 0 or 1 according to the strategy plan.

So the following variables are fixed:

Variable               Value     Setting
Opponent_England       England   1
Match_Format_Test      Test      1
Match_light_Type_Day   Day       1
Offshore_Yes           England   1
Season_Rainy           Rainy     1

Table 1 - Strategy Variables against England
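Fixing the problem variables for a scenario amounts to switching on the matching dummy columns and leaving the others at 0. The sketch below uses the dummy-column names that appear in the headers of the test-data figures. The helper `fixed_scenario` is hypothetical. It also assumes that a category with no dummy column of its own is simply represented by all zeros in its group.

```python
# Dummy columns for the fixed match conditions, as named in the figure headers.
DUMMY_COLUMNS = [
    "Match_light_type_Day and Night", "Match_light_type_Night",
    "Match_format_T20", "Match_format_Test",
    "Opponent_Bangladesh", "Opponent_England", "Opponent_Kenya",
    "Opponent_Pakistan", "Opponent_South Africa", "Opponent_Srilanka",
    "Opponent_West Indies", "Opponent_Zimbabwe",
    "Season_Summer", "Season_Winter", "Offshore_Yes",
]


def fixed_scenario(opponent, match_format, light_type, season, offshore):
    """Return the fixed part of a scenario row as {column: 0 or 1}."""
    row = dict.fromkeys(DUMMY_COLUMNS, 0)
    wanted = (f"Opponent_{opponent}", f"Match_format_{match_format}",
              f"Match_light_type_{light_type}", f"Season_{season}")
    for name in wanted:
        if name in row:  # a category without a column stays all-zero
            row[name] = 1
    row["Offshore_Yes"] = int(offshore)
    return row


england = fixed_scenario("England", "Test", "Day", "Rainy", offshore=True)
srilanka = fixed_scenario("Srilanka", "ODI", "Day and Night", "Winter",
                          offshore=False)
print([name for name, value in england.items() if value == 1])
print([name for name, value in srilanka.items() if value == 1])
```

The remaining team-composition and performance columns would then be varied row by row to generate candidate strategies.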

The variables used for model building are shown in the figure below.

Figure 32 – Dependent Variables used in Model building

With the problem variables fixed, the remaining variables are varied to generate enough candidate strategies, and
a CSV test file is built so that the output can be predicted with the Logistic Regression model.
Columns (left to right): Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
30 11 24 3 2 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0
50 18 22 2 3 12 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0
50 20 27 2 2 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0
50 15 10 2 2 10 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0
50 19 10 6 4 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 0
50 13 8 1 3 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0
50 20 6 0 3 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0
50 15 10 3 2 10 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
50 20 17 2 3 17 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0
50 20 27 2 2 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0

Figure 33 – Actual Test Data to predict the Winning Strategy against England

▪ Output of the Model:
The model returns an array of predictions for the scenario rows. The value ‘1’ means Team India wins and ‘0’
means a loss.
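Reading the strategy off the predictions is a matter of separating the rows predicted as wins from those predicted as losses. The sketch below does this for the ten England scenarios, using the prediction and three of the numeric columns transcribed from the Figure 34 table. Storing the rows as tuples is an illustrative choice, not how the notes handle them.

```python
# (Results_Pred, Extra_bowls_bowled, Avg_team_Age, Max_run_scored_1over)
# transcribed from the ten scenario rows of the Figure 34 table.
rows = [
    (1, 24, 30, 11), (1, 22, 50, 18), (1, 27, 50, 20), (1, 10, 50, 15),
    (1, 10, 50, 19), (1, 8, 50, 13), (1, 6, 50, 20), (1, 10, 50, 15),
    (0, 17, 50, 20), (1, 27, 50, 20),
]

# 1 = predicted win for Team India, 0 = predicted loss.
winning = [row for row in rows if row[0] == 1]
losing = [row for row in rows if row[0] == 0]

print(f"{len(winning)} winning scenarios, {len(losing)} losing")
print("losing scenario:", losing)
```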

Columns (left to right): Results_Pred, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Extra_bowls_bowled, Avg_team_Age, Max_run_scored_1over, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 24 30 11 3 2 6 15 10 0 0 1 0 0 1 0 0 1 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 22 50 18 2 3 12 15 10 0 0 1 0 1 0 0 0 1 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 27 50 20 2 2 6 15 10 0 0 1 1 0 0 0 0 1 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 10 50 15 2 2 10 15 10 0 1 0 1 0 0 1 0 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 10 50 19 6 4 6 15 10 0 0 1 1 0 0 0 1 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 8 50 13 1 3 6 15 10 0 0 0 1 0 0 1 0 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 6 50 20 0 3 6 15 10 0 1 0 0 0 1 1 0 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 10 50 15 3 2 10 15 10 0 0 1 0 1 0 1 0 0 0
0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 17 50 20 2 3 17 15 10 0 0 0 1 0 0 0 1 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 27 50 20 2 2 6 15 10 0 0 1 1 0 0 0 0 1 0

Figure 34 – Winning Strategy against England for Test Match

So the team dynamics should be as follows:

• Bowlers_in_team_3.0: 1 – There should be at least three bowlers in the team
• All_rounder_in_team_4.0: 1 – The team should have 4 all-rounders
• First_selection_Bowling: 1 – The team should opt to bowl first
• Extra_bowls_bowled: 24 – The team should limit the extra balls bowled to 24
• Avg_team_Age: 30 – The team's average age should be 30

• 2 T20 matches against Australia in India. All matches are day-and-night matches. It will be the winter
season in India at the time of the matches.
To build the strategy against Australia, an Excel sheet is prepared as the actual test data.

Because the object variables are one-hot encoded, the parameters fixed by the problem statement are marked
as ‘1’ in the Excel sheet. The remaining variables are set to 0 or 1 according to the strategy plan.

So the following variables are fixed:

Variable                     Value           Setting
Opponent_Australia           Australia       1
Match_Format_T20             T20             1
Match_light_Type_Day_Night   Day and Night   1
Offshore_Yes                 India           0
Season_Winter                Winter          1

Table 2 - Strategy Variables against Australia

With the problem variables fixed, the remaining variables are varied to generate enough candidate strategies, and
a CSV test file is built so that the output can be predicted with the Logistic Regression model.

Columns (left to right): Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter
30 24 31 0 2 29 10 83 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 12 6 3 2 6 4 48 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1
30 17 20 6 3 6 0 60 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
30 16 5 1 3 6 3 62 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
30 13 6 2 3 6 2 93 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
30 22 21 3 3 6 16 55 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
30 13 10 0 1 6 3 80 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 12 6 2 3 6 3 42 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1
30 12 12 2 3 6 0 66 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
30 11 1 5 3 6 0 32 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 14 9 2 2 7 7 87 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
30 12 35 2 2 9 8 39 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 23 28 0 3 26 15 69 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
30 11 5 3 2 6 2 95 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 12 4 2 2 6 2 83 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1

Figure 35 – Actual Test Data to predict the Winning Strategy against Australia

▪ Output of the Model:

The model returns an array of predictions for the scenario rows. The value ‘1’ means Team India wins and ‘0’
means a loss. The predictions contain a combination of 1’s and 0’s, as highlighted in the figure.

Columns (left to right): Result, Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
1 30 24 31 0 2 29 10 83 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1
1 30 12 6 3 2 6 4 48 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
1 30 17 20 6 3 6 0 60 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0
1 30 16 5 1 3 6 3 62 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0
1 30 13 6 2 3 6 2 93 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
1 30 22 21 3 3 6 16 55 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1
1 30 13 10 0 1 6 3 80 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0
1 30 12 6 2 3 6 3 42 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 12 12 2 3 6 0 66 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0
1 30 11 1 5 3 6 0 32 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0
1 30 14 9 2 2 7 7 87 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0
1 30 12 35 2 2 9 8 39 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0
1 30 23 28 0 3 26 15 69 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0
1 30 11 5 3 2 6 2 95 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0

Figure 36 – Winning Strategy against Australia for T20 Match

So the team dynamics should be as follows:

Strategy for T20 match 1

▪ Bowlers_in_team_3.0: 1 – There should be at least three bowlers in the team
▪ All_rounder_in_team_2.0: 1 – The team should have 2 all-rounders
▪ First_selection_Bowling: 1 – The team should opt to bowl first
▪ Extra_bowls_bowled: 6 – The team should limit the extra balls bowled to 6
▪ Avg_team_Age: 30 – The team's average age should be 30
▪ player_highest_run: 48 – The highest individual score should be above 40
▪ Players_scored_zero_2: 1 – At most 2 players should be out for a duck

Strategy for T20 match 2

▪ Bowlers_in_team_4.0: 1 – There should be at least four bowlers in the team
▪ All_rounder_in_team_3.0: 1 – The team should have 3 all-rounders
▪ First_selection_Bowling: 0 – The team should opt to bat first
▪ Extra_bowls_bowled: 9 – The team should limit the extra balls bowled to 9
▪ Avg_team_Age: 30 – The team's average age should be 30
▪ player_highest_run: 33 – The highest individual score should be above 30
▪ Players_scored_zero_3: 1 – At most 3 players should be out for a duck

• 2 ODI matches against Sri Lanka in India. All matches are day-and-night matches. It will be the winter
season in India at the time of the matches.
To build the strategy against Sri Lanka, an Excel sheet is prepared as the actual test data.

Because the object variables are one-hot encoded, the parameters fixed by the problem statement are marked
as ‘1’ in the Excel sheet. The remaining variables are set to 0 or 1 according to the strategy plan.

So the following variables are fixed:

Variable                     Value           Setting
Opponent_Srilanka            Srilanka        1
Match_Format_ODI             ODI             1
Match_light_Type_Day_Night   Day and Night   1
Offshore_Yes                 India           0
Season_Winter                Winter          1

Table 3 - Strategy Variables against Srilanka

With the problem variables fixed, the remaining variables are varied to generate enough candidate strategies, and
a CSV test file is built so that the output can be predicted with the Logistic Regression model.
Columns (left to right): Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
30 11 9 3 3 9 7 82 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0
30 20 3 4 2 6 2 45 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
30 12 1 3 3 6 0 75 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
30 15 11 3 3 10 8 76 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
30 14 8 5 3 6 3 62 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
30 14 2 3 3 6 1 57 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
30 14 5 2 3 6 2 71 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
30 15 1 2 3 6 0 45 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
30 12 1 3 3 6 0 75 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
30 12 1 2 2 6 1 59 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
30 14 7 1 3 6 2 96 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
30 12 10 2 3 10 0 79 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0
30 12 4 2 3 6 0 61 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
30 20 5 2 3 6 3 94 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
30 11 10 2 3 10 8 89 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0
30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0

Figure 37 – Actual Test Data to predict the Winning Strategy against Srilanka

▪ Output of the Model:
The model returns an array of predictions for the scenario rows. The value ‘1’ means Team India wins and ‘0’
means a loss. The predictions contain a combination of 1’s and 0’s, as highlighted in the figure.

Columns (left to right): Results_Pred, Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
1 30 11 9 3 3 9 7 82 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0
1 30 20 3 4 2 6 2 45 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
1 30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
1 30 12 1 3 3 6 0 75 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 15 11 3 3 10 8 76 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
1 30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
1 30 14 8 5 3 6 3 62 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
1 30 14 2 3 3 6 1 57 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
1 30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
0 30 14 5 2 3 6 2 71 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 30 15 1 2 3 6 0 45 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
1 30 12 1 3 3 6 0 75 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 12 1 2 2 6 1 59 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
1 30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
1 30 14 7 1 3 6 2 96 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 12 10 2 3 10 0 79 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0
1 30 12 4 2 3 6 0 61 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
1 30 20 5 2 3 6 3 94 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
1 30 11 10 2 3 10 8 89 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0
1 30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
1 30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0

Figure 38 – Winning Strategy against Srilanka for ODI Match

So the team dynamics should be as follows:

Strategy for ODI match 1

▪ Bowlers_in_team_3.0: 1 – There should be at least three bowlers in the team
▪ All_rounder_in_team_2.0: 1 – The team should have 2 all-rounders
▪ First_selection_Bowling: 1 – The team should opt to bowl first
▪ Extra_bowls_bowled: 6 – The team should limit the extra balls bowled to 6
▪ Avg_team_Age: 30 – The team's average age should be 30
▪ player_highest_run: 44 – The highest individual score should be above 40
▪ Players_scored_zero_3: 1 – At most 3 players should be out for a duck

Strategy for ODI match 2

▪ Bowlers_in_team_4.0: 1 – There should be at least four bowlers in the team
▪ All_rounder_in_team_3.0: 1 – The team should have 3 all-rounders
▪ First_selection_Bowling: 0 – The team should opt to bat first
▪ Extra_bowls_bowled: 9 – The team should limit the extra balls bowled to 9
▪ Avg_team_Age: 30 – The team's average age should be 30
▪ player_highest_run: 82 – The highest individual score should be above 80
▪ Players_scored_zero_2: 1 – At most 2 players should be out for a duck
