ch. 3. Logistic Regression Classifier using gradient descent #147

Closed
nishamuktewar opened this issue Feb 4, 2021 · 2 comments


@nishamuktewar

Thank you for writing this wonderful book. It has allowed me to revisit everything I learned from "An Introduction to Statistical Learning" and pick up some practical insights along the way.

That said, one of the examples has caused some confusion regarding how the cost or error (or loss) function is computed and then applied to update the weights. My understanding is that cost, loss, and error functions are essentially one and the same, but in this particular case they seem to be used differently. Looking at the implementation of the logistic regression algorithm raises a few questions:

  1. In the "fit()" method, the "errors" and "cost_" are calculated differently. The "cost_" calculation seems correct.
  2. Further, the code uses "errors" to update the weights "w_". Why is that?
  3. Now, if we were to regularize by adding an L2 regularization term (as suggested on page 74, second edition), it would affect "cost_" but would not shrink the weights under the given implementation.

[Screenshot: the fit() method of the chapter 3 logistic regression implementation]
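For reference, here is a minimal sketch of the fit() method being discussed — the names eta, w_, cost_, and errors are taken from the code quoted below, while the constructor arguments and helper methods are reconstructed and may differ from the printed listing:

```python
import numpy as np

class LogisticRegressionGD:
    """Sketch of a logistic regression classifier trained with
    full-batch gradient descent."""

    def __init__(self, eta=0.05, n_iter=100, random_state=1):
        self.eta = eta            # learning rate
        self.n_iter = n_iter      # number of passes over the training set
        self.random_state = random_state

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.cost_ = []
        for _ in range(self.n_iter):
            output = self.activation(self.net_input(X))
            errors = y - output                        # label minus prediction
            self.w_[1:] += self.eta * X.T.dot(errors)  # gradient step for weights
            self.w_[0] += self.eta * errors.sum()      # gradient step for bias unit
            # logistic (cross-entropy) cost, logged for monitoring only
            cost = -y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -250, 250)))
```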

Can you share your thoughts?

@rasbt (Owner) commented Feb 5, 2021

> cost or error (or loss) function

Nowadays, they are indeed mostly used synonymously.

> In the "fit()" method, the "errors" and "cost_" are calculated differently. The "cost_" calculation seems correct.

I was using the traditional terminology, where the "error" is the difference between the label and the prediction.

> Further, the code uses "errors" to update the weights "w_". Why is that?

Good question. It's not very obvious from looking at the code, but that's because the error term is what shows up in the derivative of the loss (or cost) function with respect to the weights. I have summarized it on the slide below:

[Slide: derivation of the gradient of the logistic cost function]
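In equations (condensing the slide): with the sigmoid activation $\phi(z) = 1/(1 + e^{-z})$ and net input $z^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)}$, the logistic cost is

$$
J(\mathbf{w}) = \sum_i \Big[ -y^{(i)} \log \phi\big(z^{(i)}\big) - \big(1 - y^{(i)}\big) \log\big(1 - \phi(z^{(i)})\big) \Big],
$$

and using $\phi'(z) = \phi(z)\big(1 - \phi(z)\big)$, the partial derivative with respect to weight $w_j$ works out to

$$
\frac{\partial J}{\partial w_j} = -\sum_i \big( y^{(i)} - \phi(z^{(i)}) \big)\, x_j^{(i)}.
$$

So the gradient-descent step $w_j := w_j - \eta \,\partial J / \partial w_j = w_j + \eta \sum_i \big(y^{(i)} - \phi(z^{(i)})\big) x_j^{(i)}$ is exactly what `self.eta * X.T.dot(errors)` computes.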

> Now, if we were to regularize by adding an L2 regularization term (as suggested on page 74, second edition), it would affect "cost_" but would not shrink the weights under the given implementation.

Yeah, good point. You will have to modify the weight update as well. So, instead of

self.w_[1:] += self.eta * (X.T.dot(errors))

it can be changed to

self.w_[1:] += self.eta * (X.T.dot(errors) - self.l2_lambda * self.w_[1:])
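To keep the logged cost consistent with that update, the cost computation inside fit() would get the matching penalty term. A sketch of the training-loop body, assuming l2_lambda is added as a constructor argument (the bias self.w_[0] is conventionally left unregularized):

```python
# inside fit(), assuming self.l2_lambda was set in __init__
errors = y - output
self.w_[1:] += self.eta * (X.T.dot(errors) - self.l2_lambda * self.w_[1:])
self.w_[0] += self.eta * errors.sum()  # bias unit is typically not penalized

# cross-entropy cost plus the L2 penalty on the (non-bias) weights
cost = (-y.dot(np.log(output))
        - (1 - y).dot(np.log(1 - output))
        + (self.l2_lambda / 2.0) * np.sum(self.w_[1:] ** 2))
self.cost_.append(cost)
```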

@nishamuktewar (Author)

Fantastic, that makes sense! Thank you for clarifying.
