
[suggestion] WSL2 vGPU performance naturally lacking; DirectML on native Linux? #166

Open
Sazoji opened this issue Oct 23, 2021 · 1 comment
Labels
pytorch-directml Issues in PyTorch when using its DirectML backend

Comments


Sazoji commented Oct 23, 2021

tensorflow-directml, and now pytorch-directml, open the door to heterogeneous multi-GPU training, especially for Linux workflows under WSL2. But because WSL2 Docker, WSL2 Ubuntu, and anything else that requires Linux for ML training has to run against a virtualized GPU instance, native Windows or native Linux will always be faster. On a single 3090, running under WSL2 costs over 20% relative to running directly on Windows (which often isn't an option for training), and even more relative to native Linux: going from CUDA on native Linux to DirectML under WSL2 (stacking CUDA -> DirectML, Linux -> Windows, and Windows virtualizing WSL2) loses nearly half the single-GPU performance, before accounting for the overhead of heterogeneous GPUs and device-specific acceleration.
Is there any possibility of a Linux (or minimal-overhead Windows) environment where Docker runs at near-native speed AND DirectML still works? An operating system that can handle btrfs drives with 500 GB+ datasets without locking up from poor driver performance, use Linux-specific Python libraries, and drive multiple AMD GPUs under DirectML for training PyTorch models and deploying them to ONNX, without being emulated inside a whole other operating system?

Pytorch-DirectML under WSL2 finally lets many mainstream GPUs without ROCm support be used for deep learning, but it will always be slower in WSL2 than on native Linux; the natural solution is DirectML outside Windows, or at least a pipeline that trains with DirectML and exports to ONNX for WinML inference on Windows (see the sketch below).
It is currently impractical to make the ML training space more democratic if the only ways for users to train new models locally are a walled-off program limited to Windows Insiders at severely reduced speed, or buying into CUDA.
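
Purely to illustrate that train-then-export hand-off, here is a minimal sketch. The "dml" device string is an assumption about how the pytorch-directml preview addresses the accelerator, and the model, shapes, and file name are placeholders, not anything from this repo:

```python
import torch
import torch.nn as nn

# Assumption: the pytorch-directml preview exposes the accelerator as a
# "dml" device string. Fall back to CPU if the backend isn't available.
try:
    device = torch.device("dml")
    torch.zeros(1).to(device)  # probe that the backend actually works
except Exception:
    device = torch.device("cpu")

# Toy model standing in for whatever is trained under DirectML.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):  # placeholder training loop on random data
    x = torch.randn(32, 128).to(device)
    y = torch.randint(0, 10, (32,)).to(device)
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

# Export from CPU so the ONNX graph carries no backend-specific placement;
# the resulting file can then be handed to WinML / ONNX Runtime on Windows.
model_cpu = model.to("cpu").eval()
torch.onnx.export(
    model_cpu,
    torch.randn(1, 128),
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=12,
)
```

The exported model.onnx is the artifact that would feed the WinML inference side of the pipeline described above.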

@ryanlai2 ryanlai2 added the pytorch-directml Issues in PyTorch when using its DirectML backend label Oct 26, 2021
ryanlai2 (Contributor) commented

Hi @mjc619, we have implemented operator caching in our latest release of the package. Can you try it out to see if it meets your performance requirements?

Thank you.
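
A rough way to check whether the new release helps is to time steady-state training steps before and after upgrading. This is only a sketch: the "dml" device string is an assumption from the pytorch-directml preview, and the model and tensor sizes are arbitrary placeholders:

```python
import time
import torch
import torch.nn as nn

# Assumed device string; substitute "cpu" or "cuda" to collect a baseline.
device = torch.device("dml")

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x = torch.randn(64, 1024).to(device)
y = torch.randn(64, 1024).to(device)

# Warm-up so one-time costs (e.g. operator compilation/caching) don't
# dominate the measured steady-state step time.
for _ in range(10):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

steps = 100
start = time.perf_counter()
for _ in range(steps):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
# .item() copies the scalar back to the host, which forces any queued
# device work to finish before the clock stops.
_ = loss.item()
elapsed = time.perf_counter() - start
print(f"{1000 * elapsed / steps:.2f} ms per training step on {device}")
```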
