Pull requests: apple/axlearn

Pull requests list

Support skipping warmup in cosine schedule.
#697 by xianzhidu was merged Sep 11, 2024; updated Sep 11, 2024
Generalized top-k gating for MoE.
#679 by xianzhidu was merged Aug 29, 2024; updated Aug 29, 2024
A few updates to MoE and test_utils
#654 by xianzhidu was merged Aug 16, 2024; updated Aug 16, 2024
Quick fix
#3 by xianzhidu was merged Jul 19, 2023; updated Mar 15, 2024
Add l2 norms for the VQ-VAE quantizer.
#2 by xianzhidu was merged Jul 18, 2023; updated Mar 15, 2024
Fix some permlinks
#5 by xianzhidu was merged Jul 22, 2023; updated Mar 15, 2024
Enable building StackedTransformerLayer with a sequence of layer cfgs
#27 by xianzhidu was merged Aug 4, 2023; updated Mar 15, 2024
Support naive mesh in multi-slice env.
#234 by xianzhidu was merged Dec 11, 2023; updated Dec 11, 2023
Support aux loss in causal_lm.Model
#198 by xianzhidu was merged Nov 26, 2023; updated Nov 26, 2023
Keep constant LR in cosine schedule
#192 by xianzhidu was merged Nov 22, 2023; updated Nov 22, 2023
Update set_double_shard_weights_config
#146 by xianzhidu was merged Oct 27, 2023; updated Oct 27, 2023
Make repeat in RepeatedTransformerLayer configurable.
#109 by xianzhidu was merged Oct 11, 2023; updated Oct 11, 2023
Make logits partition specs configurable in decoder.
#41 by xianzhidu was merged Aug 23, 2023; updated Aug 23, 2023
Add comments for the StackedTransformerLayer layer config.
#29 by xianzhidu was merged Aug 6, 2023; updated Aug 6, 2023