A bug in function bbox_iou() #2376
Comments
@chongkuiqi thanks for the bug report! This is a very interesting find. I used this order of operations specifically for speed: 1 and eps are both scalars while iou and v are both large tensors, so the scalar + scalar operation results in a faster iou computation, but I didn't realize NaN outputs were possible with this. I'll take a look at your proposed solution. The eps value may also need some adjustment, as you found; good discovery, I will look at this too! test.py generally runs at FP16 and training is mixed precision, so it's possible that real-world usage is passing FP16 boxes in here.
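For context, a minimal sketch of the two groupings being discussed (the alpha lines are quoted from the thread; the random stand-in tensors are illustrative, not from bbox_iou()):

```python
import torch

eps = 1e-9
iou = torch.rand(8192)  # stand-in for the IoU tensor
v = torch.rand(8192)    # stand-in for the aspect-ratio consistency term

# Scalar-first grouping: (1 + eps) is folded into a single Python float before
# any tensor work, so only two elementwise tensor ops touch iou and v.
alpha_fast = v / ((1 + eps) - iou + v)

# Tensor-first grouping: eps is added to a full tensor, costing one extra
# elementwise pass over the data.
alpha_safe = v / ((1 - iou + v) + eps)
```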
@chongkuiqi could you submit a PR with your proposed fix please?
I get these profile times for the candidate implementations:

```python
%timeit alpha = v / (-iou + v + (1 + eps))
# 11.2 µs ± 59.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit alpha = v / (-iou + v + 1 + eps)
# 16.3 µs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit alpha = v / ((1 + eps) - iou + v)
# 16.8 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit alpha = v / ((1 - iou + v) + eps)
# 22.7 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

When I try to test FP16 I get a not-implemented error, so I don't think this function is used in FP16 mode. One solution I found that works is updating the default eps to 1e-7 to keep the fast alpha computation.

EDIT: the fastest implementation on my computer seems to be the one below. Perhaps coupling it with eps=1e-7 would work best.

```python
%timeit alpha = v / (v - iou + (1 + eps))
# 9.28 µs ± 87.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
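This also suggests why eps=1e-7 works while the default 1e-9 does not (my reasoning, not stated explicitly above): FP32 machine epsilon is about 1.19e-7, so adding 1e-9 to 1.0 rounds back to exactly 1.0, and the fast denominator becomes 0/0 for identical boxes:

```python
import torch

one = torch.tensor(1.0)   # FP32
print(one + 1e-9 == one)  # tensor(True): 1e-9 is lost below FP32 machine epsilon (~1.19e-7)
print(one + 1e-7 == one)  # tensor(False): 1e-7 survives rounding, so the denominator stays nonzero
```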
Thanks! That's exciting! I didn't take speed into consideration before. I used `alpha = v / (v - iou + (1 + eps))` and tested again: it still outputs NaN, but with eps=1e-7 it works well.
Yes, this is true; I did not check FP16, and in general we want eps to be as small as possible so it doesn't impact the iou results. Let me see what the datatypes look like during training... ok, in my Colab tests I added a simple line to check the datatypes, and during GPU training and testing they show as FP32. So I'll apply two changes in a PR: 1) adjust eps to 1e-7, and 2) modify alpha to the faster version, which should speed it up by almost 2X.
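The check itself isn't pasted in the thread; a hypothetical version of such a debug line (my assumption, not the exact code used above) might look like:

```python
import torch

def log_bbox_iou_dtypes(box1, box2):
    # Hypothetical debug helper: print input dtypes before the IoU math,
    # e.g. called at the top of bbox_iou() in utils/general.py.
    print(f'bbox_iou inputs: box1={box1.dtype}, box2={box2.dtype}')

log_bbox_iou_dtypes(torch.rand(4, 1), torch.rand(1, 4))
# bbox_iou inputs: box1=torch.float32, box2=torch.float32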
@chongkuiqi ok, I've made a PR! I think this is probably the best compromise. Another factor is that the function actually doesn't support FP16 on CPU; when I try it I get an error, so all of the real usage is FP32 on GPU/CPU.
🐛 Bug
In the function bbox_iou() in general.py, if box1 is equal to box2, the function should output 1.0, but we get NaN. The default eps is also too small for half precision (FP16).
To Reproduce (REQUIRED)
Input:
(1)
```python
import torch
from utils.general import bbox_iou  # bbox_iou() lives in general.py

a = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], dtype=torch.float32)
b = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], dtype=torch.float32)
giou = bbox_iou(a.T, b, x1y1x2y2=False, CIoU=True)  # giou(prediction, target)
print(giou)
```
This outputs NaN. If we change the line in bbox_iou() from `alpha = v / ((1 + eps) - iou + v)` to `alpha = v / ((1 - iou + v) + eps)`, we get the correct output, 1.0.
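To make the failure mode concrete (my own worked example): for identical boxes, iou == 1 and v == 0, so everything hinges on whether eps survives the addition:

```python
import torch

iou = torch.tensor(1.0)  # identical boxes -> IoU is exactly 1
v = torch.tensor(0.0)    # identical aspect ratios -> v is exactly 0
eps = 1e-9

print(v / ((1 + eps) - iou + v))  # tensor(nan): (1 + 1e-9) rounds to 1.0 in FP32, giving 0/0
print(v / ((1 - iou + v) + eps))  # tensor(0.): 0 + 1e-9 is representable, giving 0/1e-9 = 0
```

With alpha = 0 the CIoU aspect-ratio penalty drops out, and since the center-distance penalty is also zero for identical boxes, the function returns 1.0 as expected.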
(2)
```python
import torch
from utils.general import bbox_iou

a = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], device='cuda:0', dtype=torch.float16)
b = torch.tensor([[1.42969, 0.25635, 3.78125, 7.37109]], device='cuda:0', dtype=torch.float16)
giou = bbox_iou(a.T, b, x1y1x2y2=False, CIoU=True)  # giou(prediction, target)
print(giou)
```
This also outputs NaN; if we use 3e-8 as the default eps rather than 1e-9, we get the correct output, 1.0.
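The 3e-8 threshold checks out numerically (my verification, not from the report): the smallest positive FP16 subnormal is about 5.96e-8, so 1e-9 underflows to exactly zero while 3e-8 rounds up to a nonzero value:

```python
import torch

print(torch.tensor(1e-9, dtype=torch.float16))  # tensor(0., dtype=torch.float16): eps vanishes -> 0/0 -> nan
print(torch.tensor(3e-8, dtype=torch.float16))  # tensor(5.9605e-08, dtype=torch.float16): smallest FP16 subnormal, keeps the denominator nonzero
```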