Pull requests: apple/axlearn
Support skipping warmup in cosine schedule.
#697
by xianzhidu
was merged Sep 11, 2024
updated Sep 11, 2024
Generalized top-k gating for MoE.
#679
by xianzhidu
was merged Aug 29, 2024
updated Aug 29, 2024
A few updates to MoE and test_utils
#654
by xianzhidu
was merged Aug 16, 2024
updated Aug 16, 2024
Add l2 norms for the VQ-VAE quantizer.
#2
by xianzhidu
was merged Jul 18, 2023
updated Mar 15, 2024
Enable building StackedTransformerLayer with a sequence of layer cfgs
#27
by xianzhidu
was merged Aug 4, 2023
updated Mar 15, 2024
Support naive mesh in multi-slice env.
#234
by xianzhidu
was merged Dec 11, 2023
updated Dec 11, 2023
Support aux loss in causal_lm.Model
#198
by xianzhidu
was merged Nov 26, 2023
updated Nov 26, 2023
Keep constant LR in cosine schedule
#192
by xianzhidu
was merged Nov 22, 2023
updated Nov 22, 2023
Update set_double_shard_weights_config
#146
by xianzhidu
was merged Oct 27, 2023
updated Oct 27, 2023
Make repeat in RepeatedTransformerLayer configurable.
#109
by xianzhidu
was merged Oct 11, 2023
updated Oct 11, 2023
Make logits partition specs configurable in decoder.
#41
by xianzhidu
was merged Aug 23, 2023
updated Aug 23, 2023
Add comments for the StackedTransformerLayer layer config.
#29
by xianzhidu
was merged Aug 6, 2023
updated Aug 6, 2023