Flash Attention for Neuron #939
base: main
Conversation
Maybe wait until this PR is checked in. From what I can tell, your PR also does not fix the remat bug. #942 (review)
def _mha_forward(query, key, value, bias, causal, softmax_scale, dropout_rate):
    # Get the batch size, sequence lengths, number of heads, and hidden dimension
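A hedged sketch of what that first step might look like, assuming a `[batch, seq_len, num_heads, per_head_dim]` input layout (the layout and the dummy shapes below are assumptions for illustration, not taken from the diff):

```python
import numpy as np

# Dummy arrays standing in for the real query/key tensors (shapes are illustrative).
query = np.zeros((2, 128, 8, 64))  # [batch, q_seq_len, num_heads, per_head_dim]
key = np.zeros((2, 256, 8, 64))    # [batch, kv_seq_len, num_heads, per_head_dim]

# Unpack the batch size, sequence lengths, number of heads, and hidden dimension.
batch_size, q_seq_len, num_heads, per_head_dim = query.shape
_, kv_seq_len, _, _ = key.shape
```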
Nit: end comments with a period (here and everywhere).
Force-pushed from 8a92182 to 73a2808
key: Tensor,
value: Tensor,
bias: Tensor,
causal: bool = False,
Can we support segment IDs? Or, even better, a more general masking fn (with optimized handling).
If not, I am fine with leaving a TODO here, but it is a hard blocker for enabling it for our internal training.
Can we do segment IDs in a separate PR? That involves non-trivial work and needs some time.
Sure. In that case I may ask for more: let's do a general mask then, since we want things beyond causal.
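For reference, the segment-ID semantics being asked for can be sketched in plain NumPy (a minimal illustration of the masking rule, not the NKI implementation; the shapes and broadcast layout are assumptions):

```python
import numpy as np

def segment_id_mask(q_segment_ids, kv_segment_ids):
    """Allow attention only between tokens in the same segment.

    q_segment_ids: [batch, q_len]; kv_segment_ids: [batch, kv_len].
    Returns a boolean mask of shape [batch, 1, q_len, kv_len]
    (the singleton axis broadcasts over heads).
    """
    mask = q_segment_ids[:, :, None] == kv_segment_ids[:, None, :]
    return mask[:, None, :, :]

# Two packed sequences in one row: tokens 0-1 are segment 1, tokens 2-3 are segment 2.
seg = np.array([[1, 1, 2, 2]])
mask = segment_id_mask(seg, seg)
```

A general masking fn would subsume this: instead of segment IDs, the kernel would take any `[q_len, kv_len]`-broadcastable boolean predicate and fold it into the logits.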
Thanks for all the reviews @ruomingp @kelvin-zou. I resolved all the comments; please let me know if any more changes are needed.
Force-pushed from 73a2808 to c226d03
I rebased the PR to avoid merge conflicts; can I please get a new approval? Thank you!
@apoorvtintin I see quite a few unit tests failed; can you take a look?
Force-pushed from 42720ad to f7f06fd
Can we disable the pytype check in the file with an annotation? The CI reports:
FAILED: /home/runner/work/axlearn/axlearn/.pytype/pyi/axlearn/common/flash_attention/neuron_attention.pyi
For more details, see https://google.github.io/pytype/errors.html#import-error
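pytype supports in-file disable comments for a named error class; a sketch of how the failing import might be annotated (the `neuronxcc` module name is illustrative, and the try/except only keeps the sketch runnable where the package is absent):

```python
# pytype: disable=import-error
# The directive above tells pytype to skip import-error checks in this file;
# `import-error` is the error class named in the CI log above.

try:
    import neuronxcc  # Neuron compiler package; unresolvable in the CI sandbox.
except ImportError:
    neuronxcc = None  # fall back so the module still imports without the package
```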
Force-pushed from 925e0fe to 2d38cb5
@Ruixuan I updated the PR with
Force-pushed from cefc3c0 to b510e96
Force-pushed from b510e96 to 2c9a285
Resolved all comments and fixed CI failures. @Ruixuan @kelvin-zou @ruomingp, can we re-trigger the CI and merge this?
This PR adds support for a flash attention kernel for Neuron, implemented through the Neuron Kernel Interface (NKI).
The flash attention kernel works with TRN1 and TRN2.
This PR is a newer version of #883 from a different fork. All comments from the previous PR are addressed in this one, and it adds dropout support.
Segment ID support in the flash attention kernel is in progress and will be available at a later date.
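As a reference for what the kernel computes, here is a plain NumPy sketch of multi-head attention with the same knobs seen in the diff (bias, causal, softmax_scale); the `[batch, heads, seq, dim]` layout and the dropout-free path are assumptions for illustration, not the NKI implementation:

```python
import numpy as np

def reference_mha(query, key, value, bias=None, causal=False, softmax_scale=1.0):
    # query/key/value: [batch, heads, seq, dim] (assumed layout).
    logits = np.einsum("bhqd,bhkd->bhqk", query, key) * softmax_scale
    if bias is not None:
        logits = logits + bias
    if causal:
        q_len, k_len = logits.shape[-2:]
        # Mask out keys strictly after the query position.
        future = np.triu(np.ones((q_len, k_len), dtype=bool), k=1)
        logits = np.where(future, -1e30, logits)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bhkd->bhqd", probs, value)
```

The flash attention kernel produces the same result without ever materializing the full `[q_len, kv_len]` logits matrix, which is what makes it memory-efficient on TRN1/TRN2.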