ch. 3. Logistic Regression Classifier using gradient descent #147

Closed
nishamuktewar opened this issue Feb 4, 2021 · 2 comments


@nishamuktewar

Thank you for writing this wonderful book. It has allowed me to revisit everything I learned from "An Introduction to Statistical Learning" and pick up some practical insights along the way.

That said, one of the examples has caused some confusion regarding how the cost or error (or loss) function is computed and then applied to update the weights. My understanding is that cost, loss, and error functions are essentially one and the same, but in this particular case they seem to be used differently. Looking at the implementation of the logistic regression algorithm raises a few questions:

  1. In the "fit()" method, the "errors" and "cost_" are calculated differently. The "cost_" calculation seems correct.
  2. Further, the code uses "errors" to update the weights "w_". Why is that?
  3. Now, if we were to regularize by adding an L2 regularization term (as suggested on page 74, second edition), it would affect "cost_" but would not shrink the weights under the given implementation.

[Screenshot: the fit() method of the chapter 3 logistic regression implementation]
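For reference, here is a minimal sketch of the fit() method being discussed — the names eta, w_, cost_, and errors are taken from the code quoted below, while the constructor arguments and helper methods are reconstructed and may differ from the printed listing:

```python
import numpy as np

class LogisticRegressionGD:
    """Sketch of a logistic regression classifier trained with
    full-batch gradient descent."""

    def __init__(self, eta=0.05, n_iter=100, random_state=1):
        self.eta = eta            # learning rate
        self.n_iter = n_iter      # number of passes over the training set
        self.random_state = random_state

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.cost_ = []
        for _ in range(self.n_iter):
            output = self.activation(self.net_input(X))
            errors = y - output                        # label minus prediction
            self.w_[1:] += self.eta * X.T.dot(errors)  # gradient step for weights
            self.w_[0] += self.eta * errors.sum()      # gradient step for bias unit
            # logistic (cross-entropy) cost, logged for monitoring only
            cost = -y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -250, 250)))
```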

Can you share your thoughts?

@rasbt (Owner) commented Feb 5, 2021

> cost or error (or loss) function

Nowadays, they are indeed mostly used synonymously.

> In the "fit()" method, the "errors" and "cost_" are calculated differently. The "cost_" calculation seems correct.

I was using the traditional terminology, where the "error" is the difference between the label and the prediction.

> Further, the code uses "errors" to update the weights "w_". Why is that?

Good question. It's not very obvious from looking at the code, but that's because the error term is what shows up in the derivative of the loss (or cost) function with respect to the weights. I have summarized it on the slide below:

[Slide: derivation of the gradient of the logistic cost function]
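In equations (condensing the slide): with the sigmoid activation $\phi(z) = 1/(1 + e^{-z})$ and net input $z^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)}$, the logistic cost is

$$
J(\mathbf{w}) = \sum_i \Big[ -y^{(i)} \log \phi\big(z^{(i)}\big) - \big(1 - y^{(i)}\big) \log\big(1 - \phi(z^{(i)})\big) \Big],
$$

and using $\phi'(z) = \phi(z)\big(1 - \phi(z)\big)$, the partial derivative with respect to weight $w_j$ works out to

$$
\frac{\partial J}{\partial w_j} = -\sum_i \big( y^{(i)} - \phi(z^{(i)}) \big)\, x_j^{(i)}.
$$

So the gradient-descent step $w_j := w_j - \eta \,\partial J / \partial w_j = w_j + \eta \sum_i \big(y^{(i)} - \phi(z^{(i)})\big) x_j^{(i)}$ is exactly what `self.eta * X.T.dot(errors)` computes.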

> Now, if we were to regularize by adding an L2 regularization term (as suggested on page 74, second edition), it would affect "cost_" but would not shrink the weights under the given implementation.

Yeah, good point. You will have to modify the weight update as well. So, instead of

self.w_[1:] += self.eta * (X.T.dot(errors))

it can be changed to

self.w_[1:] += self.eta * (X.T.dot(errors) - self.l2_lambda * self.w_[1:])
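To keep the logged cost consistent with that update, the cost computation inside fit() would get the matching penalty term. A sketch of the training-loop body, assuming l2_lambda is added as a constructor argument (the bias self.w_[0] is conventionally left unregularized):

```python
# inside fit(), assuming self.l2_lambda was set in __init__
errors = y - output
self.w_[1:] += self.eta * (X.T.dot(errors) - self.l2_lambda * self.w_[1:])
self.w_[0] += self.eta * errors.sum()  # bias unit is typically not penalized

# cross-entropy cost plus the L2 penalty on the (non-bias) weights
cost = (-y.dot(np.log(output))
        - (1 - y).dot(np.log(1 - output))
        + (self.l2_lambda / 2.0) * np.sum(self.w_[1:] ** 2))
self.cost_.append(cost)
```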

@nishamuktewar (Author)

Fantastic, that makes sense! Thank you for clarifying.
