[deeplab] UnicodeDecodeError #3885

hhwxxx · 2018-04-05T15:39:27Z

System information

What is the top-level directory of the model you are using: /deeplab
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): ('v1.6.0-0-gd2e24b6039', '1.6.0')
Bazel version (if compiling from source):
CUDA/cuDNN version: 9.0/7.0
GPU model and memory: 4 * 1080 Ti / 11GB
Exact command to reproduce: bash local_test.sh

Describe the problem

Hello. I am trying to run the script /deeplab/local_test.sh, but I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
when it begins to run build_voc2012_data.py.
I have no idea how to solve this problem.
Can anyone give me some advice? Thanks for your help.

Source code / logs

Converting PASCAL VOC 2012 dataset...
2018-04-05 23:16:40.948383: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-05 23:16:41.469332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.71GiB
2018-04-05 23:16:41.849896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-04-05 23:16:42.222259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:82:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-04-05 23:16:42.613493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-04-05 23:16:42.615426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] Device peer to peer matrix
2018-04-05 23:16:42.615527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] DMA: 0 1 2 3
2018-04-05 23:16:42.615540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 0: Y Y N N
2018-04-05 23:16:42.615548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 1: Y Y N N
2018-04-05 23:16:42.615555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 2: N N Y Y
2018-04-05 23:16:42.615563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1243] 3: N N Y Y
2018-04-05 23:16:42.615581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1, 2, 3
2018-04-05 23:16:43.988067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10365 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-04-05 23:16:45.055691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10407 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-04-05 23:16:45.914958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10407 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2018-04-05 23:16:47.003813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10407 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
2018-04-05 23:16:48.271267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0, 1, 2, 3
2018-04-05 23:16:48.271744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 320 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-04-05 23:16:48.272464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 322 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2018-04-05 23:16:48.280169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 322 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2018-04-05 23:16:48.280717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 322 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)

Converting image 1/1449 shard 0
Traceback (most recent call last):
File "./build_voc2012_data.py", line 142, in
tf.app.run()
File "/data0/hhw/paper/KITTI/py2.7_KITTI/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "./build_voc2012_data.py", line 138, in main
_convert_dataset(dataset_split)
File "./build_voc2012_data.py", line 129, in _convert_dataset
image_data, filenames[i], height, width, seg_data)
File "/data0/hhw/paper/KITTI/deeplab/datasets/build_data.py", line 148, in image_seg_to_tfexample
'image/encoded': _bytes_list_feature(image_data),
File "/data0/hhw/paper/KITTI/deeplab/datasets/build_data.py", line 131, in _bytes_list_feature
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))
File "/data0/hhw/paper/KITTI/deeplab/datasets/build_data.py", line 129, in norm2bytes
return value.encode() if isinstance(value, str) else value
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
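The byte named in the message hints at what is being decoded. Assuming the image is a standard JPEG (the VOC images are .jpg files), its data begins with the start-of-image marker 0xFF 0xD8, and ASCII only defines byte values 0 to 127, so the very first byte already fails. The sketch below uses made-up header bytes rather than a real VOC file, and `ascii_decode_error` is a hypothetical helper; it reproduces the same message in Python 3 with an explicit ASCII decode:

```python
# Hypothetical first bytes of a JPEG file: SOI marker (FF D8) + APP0 marker (FF E0).
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"


def ascii_decode_error(data):
    """Return the UnicodeDecodeError message for an ASCII decode of data, or None."""
    try:
        # Python 2 performs this ASCII decode implicitly inside str.encode().
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)
    return None


print(hex(jpeg_header[0]), jpeg_header[0] < 128)  # 0xff False
print(ascii_decode_error(jpeg_header))
```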

yugitw · 2018-04-05T19:19:01Z

Pulling the latest version should fix the problem.

georgosgeorgos · 2018-04-05T21:28:20Z

Try running build_voc2012_data.py with Python 2.

hhwxxx · 2018-04-06T00:49:29Z

@georgosgeorgos
Thank you.
But I did run build_voc2012_data.py with Python 2 and got the error above.

hhwxxx · 2018-04-06T01:00:08Z

@onlythr3e
Thank you.
I pulled the latest version, but the error still exists.

sk-g · 2018-04-06T01:07:36Z

I think this might be caused by the way Python 2 and Python 3 handle default encodings. Try changing the default encoding to UTF-8: in build_voc2012_data.py, try setting a default encoding, or manually check which encoding should be used. This Stack Overflow topic might help.
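One caveat on changing the default encoding: the value being converted is raw image data, not text, so no text encoding fits it. For example, the byte 0xFF never appears in valid UTF-8, so a UTF-8 default would only change the error message. A quick Python 3 check, using hypothetical JPEG header bytes and a hypothetical helper `utf8_decode_error`:

```python
import sys

jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"  # hypothetical JPEG header bytes

# Python 3 always reports 'utf-8' here; Python 2 reports 'ascii'.
print(sys.getdefaultencoding())


def utf8_decode_error(data):
    """Return the decoder's reason if data is not valid UTF-8, else None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.reason
    return None


print(utf8_decode_error(jpeg_header))  # invalid start byte
```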

hhwxxx · 2018-04-06T03:16:49Z

@sk-g
Thank you for your advice.

In the end, I found that the problem comes from a difference between the previous and the latest versions of "/deeplab/datasets/build_data.py".
In the latest version, _bytes_list_feature() contains a new nested function, norm2bytes(), and the error is raised inside it. I tried specifying an encoding, but I don't know which one to choose.

The latest function is below.

def _bytes_list_feature(values):
    def norm2bytes(value):
        return value.encode() if isinstance(value, str) else value
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))

And the previous one is below.

def _bytes_list_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values]))

In the end, I replaced _bytes_list_feature() with the previous version, and local_test.sh then ran.
I don't know whether this has side effects. Can anyone suggest a better solution?

qlzh727 · 2018-04-06T21:10:07Z

As @sk-g pointed out, this is caused by the difference in default encoding between Python 2.7 (ASCII) and 3.x (UTF-8). To test this, you can change the code from

"return value.encode()" to "return value.encode('utf-8')"

and see whether that resolves the problem.
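A likely reason the explicit 'utf-8' argument does not help on Python 2: for a byte `str`, `str.encode(encoding)` first decodes with the default codec (ASCII) and only then encodes to the requested codec, so the failure happens before the argument is ever used. Python 3 `bytes` has no `encode()` method, so the sketch below uses a hypothetical helper, `py2_str_encode`, that mimics those two steps on hypothetical JPEG header bytes:

```python
def py2_str_encode(data, encoding="ascii", default_encoding="ascii"):
    """Mimic Python 2 str.encode(): implicit decode with the default codec, then encode.

    Hypothetical helper for illustration only.
    """
    text = data.decode(default_encoding)  # the implicit step that raises on 0xff
    return text.encode(encoding)


jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"  # hypothetical JPEG header bytes

for target in ("ascii", "utf-8"):
    try:
        py2_str_encode(jpeg_header, target)
    except UnicodeDecodeError as err:
        # Same failure for both targets: the ASCII decode of byte 0 fails.
        print(target, "->", err.encoding, err.start)
```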

hhwxxx · 2018-04-07T14:09:55Z

@qlzh727 Thank you.
I tried, but the error still exists.

mattgara · 2018-04-08T01:25:10Z

I have the same issue as OP.

I have been able to get it to partially work by explicitly replacing

    def norm2bytes(value):
        return value.encode() if isinstance(value, str) else value

with

    def norm2bytes(value):
        return unicode(value,'utf8','ignore') if isinstance(value, str) else value

however, I then get the error:

TypeError: u'\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\x00C\x00\x02\x01\x01\x01\x01\x01\x02\x01\x01\ has type unicode, but expected one of: bytes
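Apart from the TypeError, the `unicode(value, 'utf8', 'ignore')` workaround would also corrupt the image: 'ignore' silently discards every byte that is not valid UTF-8. This is visible in the error above, where the value starts at `\x00\x10JFIF` and the leading JPEG marker bytes (assumed here to be FF D8 FF E0) are gone. A Python 3 equivalent of that call on hypothetical header bytes:

```python
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"  # hypothetical JPEG header bytes

# Python 3 counterpart of Python 2's unicode(value, 'utf8', 'ignore').
decoded = jpeg_header.decode("utf8", "ignore")

print(repr(decoded))                   # '\x00\x10JFIF': the FF D8 FF E0 bytes were dropped
print(len(jpeg_header), len(decoded))  # 10 6
```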

qlzh727 · 2018-04-08T02:42:51Z

I think the cause here is py2/3 compatibility. https://github.com/tensorflow/models/pull/3852/files tried to make the code py3 compatible, but broke py2 at the same time.

ruifmarques · 2018-04-10T08:16:10Z

I get the same error using py3


qlzh727 · 2018-04-10T19:45:51Z

Should be fixed in #3941

The current code does not work in py2 because the input string can contain non-ASCII bytes, and the default encoding in py2 is ASCII. This change makes the method call encode(), which converts a string to a byte array, only when running under python3.
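A minimal sketch of the approach described above, calling `encode()` only on Python 3. It checks `sys.version_info` rather than assuming an extra dependency, and it leaves out the `tf.train.Feature` wrapper so it runs without TensorFlow; the code actually merged in #3941 may be written differently.

```python
import sys

PY3 = sys.version_info[0] >= 3


def norm2bytes(value):
    """Return value as bytes suitable for a BytesList feature.

    On Python 3, text (str) such as a filename is encoded to UTF-8 bytes.
    On Python 2, str is already a byte string, so it is returned unchanged
    instead of going through the implicit ASCII decode in str.encode().
    """
    if PY3 and isinstance(value, str):
        return value.encode("utf-8")
    return value


# In build_data.py this would sit inside _bytes_list_feature(), roughly:
# tf.train.Feature(bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))
print(norm2bytes("example.jpg"))  # b'example.jpg' on Python 3
```

Raw image bytes pass through untouched on both versions, which is the same behavior as the previous `_bytes_list_feature()` for image data, while text values such as filenames still become bytes on Python 3.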

tensorflowbutler assigned qlzh727 Apr 6, 2018

qlzh727 added the stat:awaiting response label Apr 6, 2018

qlzh727 mentioned this issue Apr 8, 2018

Fix string to bytes conversion in deeplab data input. #3910

Closed

qlzh727 removed the stat:awaiting response label Apr 8, 2018

qlzh727 mentioned this issue Apr 10, 2018

Fix string to bytes conversion in deeplab data input. #3924

Closed

YknZhu added a commit that referenced this issue Apr 10, 2018

Merge pull request #3941 from qlzh727/3885-fix

748ed3b

Fix string to bytes conversion in deeplab data input. #3885

qlzh727 closed this as completed Apr 10, 2018
