A bug in function bbox_iou() #2376
Comments
@chongkuiqi thanks for the bug report! This is a very interesting find. I used this order of operations specifically for speed: 1 and eps are both scalars while iou and v are both large tensors, so the scalar + scalar operation results in a faster iou computation, but I didn't realize NaN outputs were possible with this. I'll take a look at your proposed solution. The eps value may also need some adjustment, as you found; good discovery, I will look at this too! test.py generally runs at FP16 and training is mixed precision, so it's possible that real-world usage is passing FP16 boxes in here.
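For context, a minimal sketch of the two groupings being discussed (the alpha lines are quoted from the thread; the random stand-in tensors are illustrative, not from bbox_iou()):

```python
import torch

eps = 1e-9
iou = torch.rand(8192)  # stand-in for the IoU tensor
v = torch.rand(8192)    # stand-in for the aspect-ratio consistency term

# Scalar-first grouping: (1 + eps) is folded into a single Python float before
# any tensor work, so only two elementwise tensor ops touch iou and v.
alpha_fast = v / ((1 + eps) - iou + v)

# Tensor-first grouping: eps is added to a full tensor, costing one extra
# elementwise pass over the data.
alpha_safe = v / ((1 - iou + v) + eps)
```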
@chongkuiqi could you submit a PR with your proposed fix please?
I get these profile times for the candidate implementations:

```python
%timeit alpha = v / (-iou + v + (1 + eps))
# 11.2 µs ± 59.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit alpha = v / (-iou + v + 1 + eps)
# 16.3 µs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit alpha = v / ((1 + eps) - iou + v)
# 16.8 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit alpha = v / ((1 - iou + v) + eps)
# 22.7 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

When I try to test FP16 I get a not-implemented error, so I don't think this function is used in FP16 mode. One solution I found that works is updating the default eps to 1e-7 to keep the fast alpha computation.

EDIT: the fastest implementation on my computer seems to be the one below. Perhaps coupling it with eps=1e-7 would work best.

```python
%timeit alpha = v / (v - iou + (1 + eps))
# 9.28 µs ± 87.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
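This also suggests why eps=1e-7 works while the default 1e-9 does not (my reasoning, not stated explicitly above): FP32 machine epsilon is about 1.19e-7, so adding 1e-9 to 1.0 rounds back to exactly 1.0, and the fast denominator becomes 0/0 for identical boxes:

```python
import torch

one = torch.tensor(1.0)   # FP32
print(one + 1e-9 == one)  # tensor(True): 1e-9 is lost below FP32 machine epsilon (~1.19e-7)
print(one + 1e-7 == one)  # tensor(False): 1e-7 survives rounding, so the denominator stays nonzero
```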
Thanks! That's exciting! I didn't take speed into consideration before. I used `alpha = v / (v - iou + (1 + eps))` and tested again: it still outputs NaN, but with eps=1e-7 it works well.
Yes, this is true; I did not check FP16, and in general we want eps to be as small as possible so it doesn't impact the iou results. Let me see what the datatypes look like during training... ok, in my Colab tests I added a simple line to check the datatypes, and during GPU training and testing they show as FP32. So I'll apply two changes in a PR: 1) adjust eps to 1e-7, and 2) modify alpha to the faster version, which should speed it up by almost 2X.
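The check itself isn't pasted in the thread; a hypothetical version of such a debug line (my assumption, not the exact code used above) might look like:

```python
import torch

def log_bbox_iou_dtypes(box1, box2):
    # Hypothetical debug helper: print input dtypes before the IoU math,
    # e.g. called at the top of bbox_iou() in utils/general.py.
    print(f'bbox_iou inputs: box1={box1.dtype}, box2={box2.dtype}')

log_bbox_iou_dtypes(torch.rand(4, 1), torch.rand(1, 4))
# bbox_iou inputs: box1=torch.float32, box2=torch.float32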
@chongkuiqi ok, I've made a PR! I think this is probably the best compromise. Another factor is that the function actually doesn't support FP16 on CPU; when I try it I get an error, so all of the real usage is FP32 on GPU/CPU.
🐛 Bug
In the function bbox_iou() in general.py, if box1 is equal to box2, the function should output 1.0, but we get NaN. The default eps is also too small for half precision (FP16).
To Reproduce (REQUIRED)
Input:
(1)
```python
import torch
from utils.general import bbox_iou  # bbox_iou() lives in general.py

a = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], dtype=torch.float32)
b = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], dtype=torch.float32)
giou = bbox_iou(a.T, b, x1y1x2y2=False, CIoU=True)  # giou(prediction, target)
print(giou)
```
This outputs NaN. If we change the line in bbox_iou() from `alpha = v / ((1 + eps) - iou + v)` to `alpha = v / ((1 - iou + v) + eps)`, we get the correct output, 1.0.
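To make the failure mode concrete (my own worked example): for identical boxes, iou == 1 and v == 0, so everything hinges on whether eps survives the addition:

```python
import torch

iou = torch.tensor(1.0)  # identical boxes -> IoU is exactly 1
v = torch.tensor(0.0)    # identical aspect ratios -> v is exactly 0
eps = 1e-9

print(v / ((1 + eps) - iou + v))  # tensor(nan): (1 + 1e-9) rounds to 1.0 in FP32, giving 0/0
print(v / ((1 - iou + v) + eps))  # tensor(0.): 0 + 1e-9 is representable, giving 0/1e-9 = 0
```

With alpha = 0 the CIoU aspect-ratio penalty drops out, and since the center-distance penalty is also zero for identical boxes, the function returns 1.0 as expected.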
(2)
```python
import torch
from utils.general import bbox_iou

a = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], device='cuda:0', dtype=torch.float16)
b = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], device='cuda:0', dtype=torch.float16)
giou = bbox_iou(a.T, b, x1y1x2y2=False, CIoU=True)  # giou(prediction, target)
print(giou)
```
This also outputs NaN; if we use 3e-8 as the default eps rather than 1e-9, we get the correct output, 1.0.
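The 3e-8 threshold checks out numerically (my verification, not from the report): the smallest positive FP16 subnormal is about 5.96e-8, so 1e-9 underflows to exactly zero while 3e-8 rounds up to a nonzero value:

```python
import torch

print(torch.tensor(1e-9, dtype=torch.float16))  # tensor(0., dtype=torch.float16): eps vanishes -> 0/0 -> nan
print(torch.tensor(3e-8, dtype=torch.float16))  # tensor(5.9605e-08, dtype=torch.float16): smallest FP16 subnormal, keeps the denominator nonzero
```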