[deeplab] UnicodeDecodeError #3885

Closed
hhwxxx opened this issue Apr 5, 2018 · 12 comments

@hhwxxx

hhwxxx commented Apr 5, 2018

System information

  • What is the top-level directory of the model you are using: /deeplab
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): ('v1.6.0-0-gd2e24b6039', '1.6.0')
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: 9.0/7.0
  • GPU model and memory: 4 * 1080 Ti / 11GB
  • Exact command to reproduce: bash local_test.sh

Describe the problem

Hello. I am trying to run the script /deeplab/local_test.sh, but when it starts running build_voc2012_data.py I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
I have no idea how to solve this problem.
Can anyone give me some advice? Thanks for your help.

Source code / logs

Converting PASCAL VOC 2012 dataset...
2018-04-05 23:16:40.948383: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-05 23:16:41.469332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.71GiB
2018-04-05 23:16:41.849896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-04-05 23:16:42.222259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:82:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-04-05 23:16:42.613493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-04-05 23:16:42.615426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] Device peer to peer matrix
2018-04-05 23:16:42.615527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] DMA: 0 1 2 3
2018-04-05 23:16:42.615540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 0: Y Y N N
2018-04-05 23:16:42.615548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 1: Y Y N N
2018-04-05 23:16:42.615555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 2: N N Y Y
2018-04-05 23:16:42.615563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 3: N N Y Y
2018-04-05 23:16:42.615581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1, 2, 3
2018-04-05 23:16:43.988067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10365 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-04-05 23:16:45.055691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10407 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-04-05 23:16:45.914958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10407 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2018-04-05 23:16:47.003813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10407 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
2018-04-05 23:16:48.271267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1, 2, 3
2018-04-05 23:16:48.271744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 320 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-04-05 23:16:48.272464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 322 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-04-05 23:16:48.280169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 322 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2018-04-05 23:16:48.280717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 322 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)

Converting image 1/1449 shard 0
Traceback (most recent call last):
  File "./build_voc2012_data.py", line 142, in <module>
    tf.app.run()
  File "/data0/hhw/paper/KITTI/py2.7_KITTI/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "./build_voc2012_data.py", line 138, in main
    _convert_dataset(dataset_split)
  File "./build_voc2012_data.py", line 129, in _convert_dataset
    image_data, filenames[i], height, width, seg_data)
  File "/data0/hhw/paper/KITTI/deeplab/datasets/build_data.py", line 148, in image_seg_to_tfexample
    'image/encoded': _bytes_list_feature(image_data),
  File "/data0/hhw/paper/KITTI/deeplab/datasets/build_data.py", line 131, in _bytes_list_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))
  File "/data0/hhw/paper/KITTI/deeplab/datasets/build_data.py", line 129, in norm2bytes
    return value.encode() if isinstance(value, str) else value
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

@yugitw

yugitw commented Apr 5, 2018

Pulling the latest version should fix the problem.

@georgosgeorgos

Try running build_voc2012_data.py with Python 2.

@hhwxxx
Author

hhwxxx commented Apr 6, 2018

@georgosgeorgos
Thank you.
But I ran build_voc2012_data.py with Python 2 and got the error above.

@hhwxxx
Author

hhwxxx commented Apr 6, 2018

@onlythr3e
Thank you.
I pulled the latest version, but the error still exists.

@sk-g

sk-g commented Apr 6, 2018

I think this might be caused by the way Python 2 and Python 3 handle default encodings. Try changing the default encoding to UTF-8: in build_voc2012_data.py, either set a default encoding or check manually which encoding should be used. This Stack Overflow topic might help.
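A minimal sketch of why the message names the `ascii` codec. In Python 2, calling `.encode()` on a byte string (`str`) first decodes it implicitly with the default codec, which is ASCII, so the call inside `norm2bytes()` behaves like the explicit decode below. JPEG data begins with the byte `0xff`, which lies outside ASCII's range of 0 to 127. The snippet is written for Python 3, where the decode has to be spelled out; the sample header bytes and the helper name `ascii_decode_error` are assumptions for illustration, not data or code from this issue.

```python
# Assumed sample: the first bytes of a typical JPEG/JFIF file.
jpeg_header = b'\xff\xd8\xff\xe0\x00\x10JFIF'


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised when `data` is decoded as ASCII, or None."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as err:
        return err
    return None


err = ascii_decode_error(jpeg_header)
print(err)
# 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
```

The printed message is the same one reported in the traceback, which points at the implicit ASCII decode rather than at the data files themselves.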

@hhwxxx
Author

hhwxxx commented Apr 6, 2018

@sk-g
Thank you for your advice.

Finally, I found that the problem is caused by a change between the previous version and the latest version of the file "/deeplab/datasets/build_data.py".
The latest version adds a new helper function, norm2bytes(), inside _bytes_list_feature(), and this new function causes the error. I tried to use a proper encoding, but I don't know which one to choose.

The latest version of the function is below.

def _bytes_list_feature(values):
    def norm2bytes(value):
        return value.encode() if isinstance(value, str) else value
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))

And the previous one is below.

def _bytes_list_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values]))

In the end, I replaced _bytes_list_feature() with the previous version, and then I could run local_test.sh.
I don't know whether this has any side effects. Can anyone suggest a better solution?
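Reverting presumably works on Python 2 because the image data read from disk is already a byte string (`str`) and goes into `tf.train.BytesList` unchanged. The likely side effect is on Python 3: any value that reaches the helper as `str` (a filename, for example) would then be rejected, because `BytesList` expects bytes. One alternative, sketched here as an assumption and not as the upstream fix, is to test for `bytes` instead of `str`. In Python 2 `bytes` is an alias of `str`, so raw file contents pass through untouched and only `unicode` text is encoded; in Python 3 only `str` values are encoded. The sample values are illustrative.

```python
def norm2bytes(value):
    """Return `value` as bytes on both Python 2 and Python 3.

    Byte strings (py2 `str`, py3 `bytes`) are returned unchanged, so binary
    image data never passes through a text codec. Text (py2 `unicode`,
    py3 `str`) is encoded as UTF-8.
    """
    if isinstance(value, bytes):
        return value
    return value.encode('utf-8')


jpeg_bytes = b'\xff\xd8\xff\xe0'   # raw file contents: passed through as-is
filename = u'2007_000032'          # illustrative text value: encoded to bytes
print(norm2bytes(jpeg_bytes))
print(norm2bytes(filename))
```

The design choice is to branch on the binary type rather than on the Python version, so the same helper behaves correctly on both interpreters without an explicit version check.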

@qlzh727
Member

qlzh727 commented Apr 6, 2018

As @sk-g pointed out, this is caused by the difference in default encoding between Python 2.7 (ASCII) and 3.x (UTF-8). To test this, change the code as follows:

"return value.encode()" to "return value.encode('utf-8')"

and see whether that fixes the problem.
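A short sketch of why naming `'utf-8'` is unlikely to help on Python 2: there, `str.encode('utf-8')` still begins with an implicit decode using the default ASCII codec, so the failing step is unchanged whatever target encoding is passed. Beyond that, binary JPEG data is generally not valid UTF-8 either, so no text decode is appropriate for it. The Python 3 snippet below checks the second point; the sample header and the helper name `utf8_error` are assumptions for illustration.

```python
# Assumed sample: the first bytes of a typical JPEG/JFIF file.
jpeg_header = b'\xff\xd8\xff\xe0\x00\x10JFIF'


def utf8_error(data):
    """Return the UnicodeDecodeError message for `data`, or None if it is valid UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as err:
        return str(err)
    return None


print(utf8_error(jpeg_header))
# 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```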

@qlzh727 qlzh727 added the stat:awaiting response Waiting on input from the contributor label Apr 6, 2018
@hhwxxx
Author

hhwxxx commented Apr 7, 2018

@qlzh727 Thank you.
I tried, but the error still exists.

@mattgara

mattgara commented Apr 8, 2018

I have the same issue as OP.

I have been able to get it to partially work by explicitly replacing

    def norm2bytes(value):
        return value.encode() if isinstance(value, str) else value  

with

    def norm2bytes(value):
        return unicode(value,'utf8','ignore') if isinstance(value, str) else value  

However, I then get the error:

TypeError: u'\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\x00C\x00\x02\x01\x01\x01\x01\x01\x02\x01\x01\ has type unicode, but expected one of: bytes
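This `TypeError` is consistent with what `unicode(value, 'utf8', 'ignore')` does: it silently drops every byte that is not valid UTF-8 (for a JPEG, the leading marker bytes) and returns text rather than bytes, which `tf.train.BytesList` rejects. The Python 3 counterpart is `bytes.decode('utf-8', 'ignore')`. The sample header below is an assumption, but its decoded form begins with the same `\x00\x10JFIF` shown in the error. Even if the type check passed, the dropped bytes would corrupt the stored image.

```python
# Assumed sample: the first bytes of a typical JPEG/JFIF file.
jpeg_header = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'

# Python 3 counterpart of Python 2's unicode(value, 'utf8', 'ignore').
decoded = jpeg_header.decode('utf-8', 'ignore')

print(repr(decoded))                    # '\x00\x10JFIF\x00' (marker bytes dropped)
print(type(decoded))                    # <class 'str'> (text, not bytes)
print(len(jpeg_header) - len(decoded))  # 4 bytes lost
```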

@qlzh727
Member

qlzh727 commented Apr 8, 2018

I think the cause here is py2/3 compatibility. https://github.com/tensorflow/models/pull/3852/files tried to make the code py3 compatible, but broke py2 at the same time.

@qlzh727 qlzh727 removed the stat:awaiting response Waiting on input from the contributor label Apr 8, 2018
@ruifmarques

I get the same error using py3.

qlzh727 added a commit to qlzh727/models that referenced this issue Apr 10, 2018
The current code does not work in py2 since the input string could
contain a unicode string, and the default encoding is ASCII in py2. Change
the method to only call encode(), which converts a string to a byte array,
when running python3.
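The commit message above describes encoding only when running under Python 3. A minimal sketch of that idea follows; it is written from the message, not copied from the merged change, so the exact check used upstream may differ.

```python
import sys


def norm2bytes(value):
    """Encode text to bytes only on Python 3.

    On Python 2, every value (including raw image bytes read as `str`) is
    returned unchanged, which restores the behavior from before the
    norm2bytes() helper was added.
    """
    if sys.version_info[0] >= 3 and isinstance(value, str):
        return value.encode()
    return value
```

Under Python 3, `norm2bytes('jpeg')` returns `b'jpeg'` and byte strings pass through unchanged; under Python 2 the function is a no-op, so `.encode()` is never called on binary data.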
YknZhu added a commit that referenced this issue Apr 10, 2018
Fix string to bytes conversion in deeplab data input. #3885
@qlzh727
Copy link
Member

qlzh727 commented Apr 10, 2018

Should be fixed in #3941

@qlzh727 qlzh727 closed this as completed Apr 10, 2018
omegafragger pushed a commit to omegafragger/models that referenced this issue May 15, 2018

7 participants