tensorflow-directml, and now pytorch-directml, open up the possibility of heterogeneous multi-GPU training, especially for Linux workflows running in WSL2. But because WSL2 Docker, WSL2 Ubuntu, and anything that needs Linux for ML training has to run against a virtualized GPU, native Windows or native Linux will always be faster. On a single 3090 under WSL2, the slowdown is over 20% compared with running directly on Windows (which often isn't an option for training), and even larger compared with native Linux: nearly half the single-GPU performance is lost once you stack CUDA -> DirectML, Linux -> Windows, and Windows virtualizing WSL2, before accounting for the overhead of heterogeneous GPUs and device-specific acceleration.
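For context, a minimal per-step timing sketch of the kind behind these comparisons is below. The model, batch size, and the torch_directml fallback are illustrative assumptions, not the exact benchmark used for the numbers above; it assumes the torch_directml plugin package (the older pytorch-directml preview exposed a "dml" device string instead).

```python
import time
import torch
import torch.nn as nn

# Pick whichever backend is present: CUDA on native Linux/Windows,
# DirectML (via the torch_directml plugin) otherwise, CPU as a last resort.
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    try:
        import torch_directml
        device = torch_directml.device()
    except ImportError:
        device = torch.device("cpu")

# Toy model and batch; sizes are placeholders, not the real workload.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(256, 1024, device=device)
y = torch.randint(0, 10, (256,), device=device)

def step():
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss

# Warm up so one-time operator compilation/caching isn't measured.
for _ in range(10):
    step()

steps = 100
start = time.perf_counter()
for _ in range(steps):
    loss = step()
loss.item()  # force the device to finish before stopping the clock
elapsed = time.perf_counter() - start
print(f"{device}: {steps / elapsed:.1f} steps/sec")
```

Running the same script on native Linux (CUDA), native Windows, and inside WSL2 is enough to reproduce the relative gaps described above.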
Is there the slightest possibility of a Linux or minimal-overhead Windows instance where Docker can run at near-native speed AND still have DirectML functionality? An operating system that can support btrfs drives with 500 GB+ datasets without locking up due to poor driver performance, use Linux-specific Python libraries, and drive multiple AMD GPUs under DirectML for training PyTorch models and deploying them to ONNX, without being emulated inside a whole other operating system?
PyTorch-DirectML under WSL2 finally lets many mainstream GPUs without ROCm support be used for deep learning, but it will always be slower in WSL2 than on native Linux; the natural solution is either DirectML outside Windows, or a pipeline that exports PyTorch models to ONNX for WinML inference on Windows, along the lines of the export sketch below.
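For the ONNX/WinML half of that pipeline, the export step might look roughly like this; the model definition, checkpoint path, and opset version are placeholders that would depend on the actual training setup.

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for whatever was trained with
# pytorch-directml; "model.pt" is a placeholder checkpoint path.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# Export on CPU so graph capture doesn't depend on the DML backend,
# then hand the .onnx file to WinML / ONNX Runtime for inference.
dummy_input = torch.randn(1, 1024)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
    opset_version=13,
)
```

The resulting model.onnx can then be served on Windows through WinML or ONNX Runtime, which also offers a DirectML execution provider for GPU inference.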
It is currently impractical to make the ML training space more democratic if the only ways for users to train new models locally are to work inside a walled-off program limited to Windows Insiders at severely reduced speed, or to buy into CUDA.
Hi @mjc619, we have implemented operator caching in our latest release of the package. Can you try it out to see if it meets your performance requirements?
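A quick way to sanity-check whether operator caching is kicking in is to time the same op a few times; the first call includes operator compilation, while repeats should hit the cache. This sketch assumes the torch_directml plugin package (e.g. installed with `pip install torch-directml`); the exact package name and API depend on the release being referred to.

```python
import time
import torch
import torch_directml  # assumes the plugin-style package; older previews used torch.device("dml")

dml = torch_directml.device()
a = torch.randn(2048, 2048, device=dml)
b = torch.randn(2048, 2048, device=dml)

for i in range(3):
    t0 = time.perf_counter()
    (a @ b).sum().item()  # .item() pulls the result back, so the op has finished
    print(f"matmul {i}: {time.perf_counter() - t0:.4f}s")
# Iteration 0 includes operator compilation; later iterations should be
# noticeably faster if the operator cache is working.
```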