[AARCH64] Fall back to GEMM if mkldnn_matmul fails #115936
Conversation
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115936
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure) As of commit 4a3390c with merge base 4ea7430 (FLAKY: the following job failed but was likely due to flakiness present on trunk).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lezcano left a comment:
Do we test this in CI? If so, can we add a regression test?
@lezcano we can only test it in nightlies, but I will add it later (and do some manual testing right now)

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@jgong5 FYI, is it safe to assume that
- Add a call to `at::globalContext().userEnabledMkldnn()` to `apply_mkldnn_matmul_heur`
- Surround calls to `mkldnn_matmul` with `try {} catch {}`
- Print a warning and fall back to BLAS (by calling `at::globalContext().setUserEnabledMkldnn(false)`) if `mkldnn_matmul()` fails
Test plan: On Linux arm run:
```shell
$ sudo chmod 400 /sys; python -c "import torch;m=torch.nn.Linear(1, 32);print(torch.__version__);print(m(torch.rand(32, 1)))"
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
2.3.0.dev20231215
bad err=11 in Xbyak::Error
bad err=11 in Xbyak::Error
/home/ubuntu/miniconda3/envs/py311/lib/python3.11/site-packages/torch/nn/modules/linear.py:116: UserWarning: mkldnn_matmul failed, switching to BLAS gemm:internal error (Triggered internally at /pytorch/aten/src/ATen/native/LinearAlgebra.cpp:1509.)
return F.linear(input, self.weight, self.bias)
tensor([[-0.5183, 0.2279, -0.4035, ..., -0.3446, 0.0938, -0.2113],
[-0.5111, 0.2362, -0.3821, ..., -0.3536, 0.1011, -0.2159],
[-0.6387, 0.0894, -0.7619, ..., -0.1939, -0.0282, -0.1344],
...,
[-0.6352, 0.0934, -0.7516, ..., -0.1983, -0.0247, -0.1366],
[-0.4790, 0.2733, -0.2862, ..., -0.3939, 0.1338, -0.2365],
[-0.5702, 0.1682, -0.5580, ..., -0.2796, 0.0412, -0.1782]],
grad_fn=<AddmmBackward0>)
```
Fixes pytorch#114750
Pull Request resolved: pytorch#115936
Approved by: https://github.com/lezcano
Co-authored-by: Nikita Shulga <nshulga@meta.com>