DXGI_ERROR_DEVICE_REMOVED Error #95
Looks like the Radeon Vega 8 is an integrated GPU, and from the logs you shared it looks like it's having trouble allocating memory. How much system memory (RAM) do you have? If you can provide a dxdiag.txt, it would be helpful in understanding the capabilities of your system.

One thing you can try is lowering the default DML heap allocator's allocation size from 4GB to something smaller. For example, you can add these lines to the top of your first script (or set the environment variable elsewhere before the Python process launches):

```python
import os
os.environ["TF_DIRECTML_MAX_ALLOC_SIZE"] = "536870912"  # 512MB
```

You can also enable verbose logging, which will print even more details that might help here. Example:

```python
import os
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "3"
```
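Combining the two suggestions into a single preamble might look like the sketch below. The 512MB cap is only a starting point (you can lower it further if the error persists), and the variable name `max_alloc_bytes` is just for illustration:

```python
import os

# 512 MB expressed in bytes: 512 * 1024 * 1024 = 536870912.
max_alloc_bytes = 512 * 1024 * 1024

# Both variables must be set BEFORE `import tensorflow`,
# otherwise TensorFlow-DirectML will not pick them up.
os.environ["TF_DIRECTML_MAX_ALLOC_SIZE"] = str(max_alloc_bytes)
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "3"

print(os.environ["TF_DIRECTML_MAX_ALLOC_SIZE"])  # 536870912
```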
@jstoecker, thank you for your help. Here is the DxDiag.txt file. My PC has 6GB of RAM and 2GB of GPU memory. I tested the parameter here to limit memory allocation and it removed the error. Thanks a lot for the help.
Good to hear, and thanks for the dxdiag! I'll open a bug internally to see if we can improve this experience so it's not necessary to set an environment variable.
One more thing to add: if you're still seeing the error with the yolov3 sample, don't forget to run setup.py first before trying detect_video.py, because it looks like it's having trouble finding the checkpoint file. :)
@jstoecker and @adtsai, with the memory allocation change it really did work. One thing I noticed now is that detect_video.py is using shared memory and not dedicated memory. Do you know if DirectML supports access to dedicated memory? I ask because object detection is very slow.
In short: yes, DirectML supports access to dedicated memory!

DirectML itself doesn't allocate memory for GPU resources: that's up to the application/framework using it, such as TensorFlow-DirectML (TFDML) in this case. TFDML has a number of allocators for different purposes, but the bulk of the memory (to store the tensors used in GPU calculations) will be backed by subregions of a so-called default heap. Default heaps reflect different memory pools based on the GPU architecture (UMA or NUMA/discrete).

Your Radeon Vega 8 is an integrated GPU, so the 2GB of dedicated memory you see isn't physical VRAM but rather reserved system memory. In other words, your system actually has 8GB of RAM, but the integrated GPU is claiming 2GB of it for exclusive access. This blog explains some of the differences between dedicated and shared memory, how they are reported in Task Manager, and some differences between discrete and integrated GPUs in this respect.

Integrated GPUs are, unfortunately, not going to be particularly fast in machine learning. It's worth pointing out that we haven't really optimized TFDML for integrated GPUs (e.g. we could avoid some memory copies since default-heap resources will always live in the "L0" memory pool); however, it's unlikely that you'll see huge performance gains over the CPU without using a more powerful discrete GPU.
Hello, I have a problem. I don't know if anyone else has had this issue. I have a Vega 8 and the drivers are all installed correctly, but it gives the error DXGI_ERROR_DEVICE_REMOVED when I try to run the following script.

I've already followed the instructions at https://aka.ms/tfdmltimeout but it doesn't work.

I think this is the problem when I try to run it.