add double ml notebook and fix a minor typo in dml doc #24

Merged
merged 8 commits into from
Apr 9, 2019

Conversation

heimengqi
Contributor

Please take a look at my notebook, especially the last part (multi-treatment and multi-output). Let me know if I am doing anything wrong or if there is anything I need to add.

@vsyrgkanis
Collaborator

vsyrgkanis commented Mar 29, 2019

I would change how we plot the cross-price elasticities in the final cells and add more comments on how you process the data to fit the cross-price model. For the depiction, can we do something like this: a 3x3 grid of subplots, where each subplot shows one cross-price elasticity as a function of income? That way we can visualize it as a matrix in which each entry is a plot. I would also like to have an example that uses the FistaRegressor from here:
http://contrib.scikit-learn.org/lightning/generated/lightning.regression.FistaRegressor.html
and here
https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/fista.py
as our final model, with 'trace' as the penalty, similar to what we had in the demo presentation. This would show how to perform nuclear norm penalization for multiple treatments and multiple outcomes, which is important for latent factor models (i.e. the products interact in some low-rank latent space).
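A minimal sketch of the 3x3 layout suggested above. It assumes a hypothetical array `elasticities` of shape `(n_income, 3, 3)`; the function name, product labels, and synthetic values are placeholders for illustration, not the notebook's code or results.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_elasticity_grid(income, elasticities, products):
    """Plot elasticities[:, i, j] against income in a k x k grid of subplots.

    Row i is the product whose quantity responds; column j is the product
    whose price changes.
    """
    k = len(products)
    fig, axes = plt.subplots(k, k, figsize=(3 * k, 3 * k), sharex=True, squeeze=False)
    for i in range(k):
        for j in range(k):
            ax = axes[i, j]
            ax.plot(income, elasticities[:, i, j])
            ax.set_title(f"quantity {products[i]} / price {products[j]}")
            if i == k - 1:
                ax.set_xlabel("income")
    fig.tight_layout()
    return fig, axes


# Placeholder data only: elasticities that vary linearly with income.
income = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
intercepts = rng.normal(size=(3, 3))
slopes = rng.normal(size=(3, 3))
elasticities = intercepts[None, :, :] + income[:, None, None] * slopes[None, :, :]

fig, axes = plot_elasticity_grid(income, elasticities, ["A", "B", "C"])
```

In a notebook the figure renders when the cell returns; the diagonal panels would hold own-price elasticities and the off-diagonal panels the cross-price ones.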

@kbattocchi
Collaborator

@vasilismsr From an economist's point of view, is the nuclear norm the right regularization? Low rank is nice, but it seems like they'd be more likely to try to apply a mixed effects model of some sort (e.g. shrink all individual own-price elasticities towards one value and all cross-price elasticities towards another).

@vsyrgkanis
Collaborator

Yep. Low rank is the right approach for high-dimensional product spaces. Such latent factor models have been studied in pricing (see Athey and Blei, or the paper we wrote with Taddy).

@kbattocchi
Collaborator

@vasilismsr But here we aren't in a high-dimensional regime; we're regressing the three log quantities on the three prices interacted with income (plus an intercept), so there are only 18 coefficients total.

Maybe we should see if we can find (or generate) a high-dimensional example as well, so that we can demonstrate using the trace norm in a more appropriate setting.
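The coefficient count can be checked with a small sketch. It assumes one reading of the featurization: each log price enters once on its own (the intercept term) and once multiplied by income, giving 6 features for each of the 3 log quantities. The helper name and the random data are hypothetical.

```python
import numpy as np


def interact_prices_with_income(log_prices, income):
    """Return [log p_j, log p_j * income] for every price j, shape (n, 2 * n_prices)."""
    income = np.asarray(income).reshape(-1, 1)
    return np.hstack([log_prices, log_prices * income])


rng = np.random.default_rng(0)
n, n_products = 200, 3
log_prices = rng.normal(size=(n, n_products))  # placeholder data
income = rng.uniform(size=n)                   # placeholder data

features = interact_prices_with_income(log_prices, income)

# One coefficient per (feature, outcome) pair, with one outcome per log quantity.
n_coefficients = features.shape[1] * n_products
print(features.shape, n_coefficients)
```

With `n_products = 3` the coefficient matrix is 6 x 3, which is where the 18 comes from; the count grows quadratically in the number of products, so the regime changes quickly as products are added.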

@vsyrgkanis
Collaborator

I know. But the notebook is more for expository purposes, i.e. to show that you could even do this. And we can say in the cell above that this would be more appropriate when you have many products and you believe they interact in a latent space. I think it's an important special case, and we could show that we can handle it. We could in principle have a simulated data example too, where we have many products.
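A rough sketch of what a simulated many-product example could look like. This is not lightning's FistaRegressor: it is a plain (unaccelerated) proximal gradient loop written in NumPy as a stand-in, with the nuclear norm handled by soft-thresholding singular values. The dimensions, noise level, and `alpha` are arbitrary assumptions.

```python
import numpy as np


def svd_soft_threshold(W, tau):
    """Proximal operator of tau * nuclear norm: shrink each singular value by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def trace_norm_regression(X, Y, alpha, n_iter=500):
    """Proximal gradient descent on 0.5/n * ||Y - XW||_F^2 + alpha * ||W||_*."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) / n
        W = svd_soft_threshold(W - step * grad, step * alpha)
    return W


# Simulated data: 20 treatments, 20 outcomes, true effect matrix of rank 2.
rng = np.random.default_rng(0)
n, k, rank = 500, 20, 2
W_true = rng.normal(size=(k, rank)) @ rng.normal(size=(rank, k))
X = rng.normal(size=(n, k))
Y = X @ W_true + 0.5 * rng.normal(size=(n, k))

W_hat = trace_norm_regression(X, Y, alpha=0.5)
estimated_rank = np.linalg.matrix_rank(W_hat, tol=1e-6)
relative_error = np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)
print(estimated_rank, relative_error)
```

The penalty pushes small singular values of the estimate to exactly zero, so the fitted effect matrix ends up low rank when the products really do interact through a few latent factors.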

Contributor

@vasilismsr vasilismsr left a comment


looks good

Collaborator

@kbattocchi kbattocchi left a comment


As Vasilis says, this looks good, but I've made a few minor suggestions.

@kbattocchi
Collaborator

As we discussed over lunch, see what happens if you add shuffle=True to the KFold constructor in the DML fit method and then use many fewer bootstrap samples - does that solve the confidence interval issue?
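For reference, a small sketch of what `shuffle=True` changes in scikit-learn's `KFold`. It only illustrates the splitter's behavior on toy indices; how the DML fit method constructs its folds, and whether shuffling resolves the confidence interval issue, is not shown here.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 10
X = np.arange(n_samples).reshape(-1, 1)  # rows in their original (possibly sorted) order

# Default (shuffle=False): each test fold is a contiguous block of rows.
contiguous = [test for _, test in KFold(n_splits=5).split(X)]

# shuffle=True: rows are permuted before being split into folds.
# random_state is fixed here only to make the sketch reproducible.
shuffled = [test for _, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X)]

for block, mixed in zip(contiguous, shuffled):
    print(block, mixed)
```

If the data are ordered (for example by income), contiguous folds train the first-stage models on a systematically different range than they predict on, which is one reason shuffling may matter.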

@heimengqi heimengqi requested a review from moprescu April 9, 2019 19:09
@heimengqi heimengqi merged commit d2118d5 into master Apr 9, 2019
@heimengqi heimengqi deleted the mehei/dmlnb branch April 11, 2019 14:10

5 participants