change reference_data.py to use tf.gfile #3921

Merged
merged 3 commits into from
Apr 10, 2018

Conversation

robieta
Contributor

@robieta robieta commented Apr 10, 2018

Reference tests are failing on TAP because they use open(). Due to some quirks with gfile (in py2 it defaults to bytes and in py3 it defaults to string, but it is illegal to pass "rt" or "rb") it is necessary to only interact with files as bytes. This leads to some awkward encode/decode calls with json, but those calls keep the code identical between py2 and py3.
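For readers following along, the bytes-only round trip described above could look roughly like the sketch below. It is an illustration, not the code from this PR: the helper names `dump_json_bytes` and `load_json_bytes` are hypothetical, and the built-in `open` stands in for the gfile opener so the snippet runs without TensorFlow. The `"wb"`/`"rb"` mode strings are what the built-in `open` needs and say nothing about which modes gfile accepts.

```python
import json
import os
import tempfile


def dump_json_bytes(data, path, opener=open):
    """Serialize to a JSON string, encode it as UTF-8, and write raw bytes.

    `opener` is a hypothetical hook: any callable with an open()-like
    signature could be passed in place of the built-in.
    """
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    with opener(path, "wb") as f:
        f.write(payload)


def load_json_bytes(path, opener=open):
    """Read raw bytes, decode them as UTF-8, and parse the JSON."""
    with opener(path, "rb") as f:
        raw = f.read()
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.json")
    dump_json_bytes({"op": "batch_norm", "shape": [32, 32]}, path)
    print(load_json_bytes(path))
```

Because the explicit encode and decode steps sit at the file boundary, the rest of the code only ever sees parsed Python objects, which is the property the description above is after.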

@robieta robieta requested review from karmel and qlzh727 April 10, 2018 00:45
@robieta robieta requested a review from a team as a code owner April 10, 2018 00:45
Member

@qlzh727 qlzh727 left a comment

I am curious why we read and write the JSON file in binary mode. What's the content of the JSON file? The TF version should just be text.

@robieta
Contributor Author

robieta commented Apr 10, 2018

My contention would be that we should adopt one absolutely safe way to dump JSON and use it everywhere, which unfortunately means serializing first and then writing the result to the file. I understand the position that sometimes it's overkill, but that way the code is robust and you never have to think about whether it will only ever see ASCII.

Contributor

@karmel karmel left a comment

Please add lots of comments explaining why this is necessary, to keep the next helpful engineer from thinking, gee, this could be simpler if we...

@qlzh727
Member

qlzh727 commented Apr 10, 2018

As we discussed offline, I think for now it's better to skip the UTF-8 encode/decode, since most of the data you dump to JSON is just text. Using "wb" and "rb" confuses people.

@robieta robieta force-pushed the reference_testing_gfile branch from f3c519b to 3f29187 Compare April 10, 2018 19:44
@robieta
Contributor Author

robieta commented Apr 10, 2018

Alright, everything should be good now. The tests were also throwing warnings due to a small change in batch_norm, so I updated them.

@robieta robieta force-pushed the reference_testing_gfile branch from b3a2443 to 7f6373d Compare April 10, 2018 22:51
@karmel
Contributor

karmel commented Apr 10, 2018

"Superficial change in batch_norm" in theory shouldn't require an update here, right? I thought we were going to be robust to that? I worry that we're about to introduce flaky tests.

@robieta
Contributor Author

robieta commented Apr 10, 2018

We are robust to that. If you don't change the reference files it still passes, you just get warnings that the op graph is different than what is expected. So once we confirm that the change is innocuous we can make the change at our convenience.
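To make the warn-versus-fail distinction concrete, here is a minimal sketch of that policy. It is not the logic in reference_data.py: the function name, its arguments, the flat list-of-floats representation, and the tolerance are all assumptions made for illustration.

```python
import warnings


def check_against_reference(graph_ops, values, ref_graph_ops, ref_values,
                            tol=1e-6):
    """Warn when the op graph drifts; fail only when the numbers differ.

    All names here are hypothetical. `graph_ops` is a list of op names and
    `values` is a flat list of floats produced by the code under test.
    """
    if list(graph_ops) != list(ref_graph_ops):
        # A structural difference is reported but does not fail the test.
        warnings.warn("Op graph differs from the stored reference.")

    if len(values) != len(ref_values):
        raise AssertionError("Number of output values changed.")

    for i, (got, want) in enumerate(zip(values, ref_values)):
        if abs(got - want) > tol:
            raise AssertionError(
                "Value %d differs: got %r, expected %r" % (i, got, want))


if __name__ == "__main__":
    # Same numbers, different graph: emits a warning and returns normally.
    check_against_reference(["conv", "batch_norm_v2"], [0.5, 1.5],
                            ["conv", "batch_norm"], [0.5, 1.5])
```

Under this policy, stale reference files produce noise rather than failures, so they can be regenerated once the graph change has been confirmed harmless.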

@robieta robieta merged commit 2661eb9 into master Apr 10, 2018
@robieta robieta deleted the reference_testing_gfile branch April 11, 2018 16:44
omegafragger pushed a commit to omegafragger/models that referenced this pull request May 15, 2018
* change reference_data.py to use tf.gfile

* simplify json treatment

* Update reference files to account for a superficial change in batch_norm

4 participants