Adds `convert_dense_to_moe_parameters`. #584

ruomingp · 2024-07-14T02:53:38Z

... to support upcycling of dense to MoE models.


xianzhidu

Thanks, Ruoming!

* Adds `convert_dense_to_sparse_parameters`.
* Adds `convert_dense_to_moe_parameters`.
* Removes `target_layer` from the args of `convert_dense_to_moe_parameters`.
* Makes `convert_dense_to_moe_parameters` use pjit to get the right target sharding.
* Adds comments to `convert_dense_to_moe_parameters`.
* Simplifies the conversion logic.
* Use `einsum` instead of `tile` to replicate the dense weights to MoE ones.
* Adds testing that the dense weights are replicated to each expert.
* Adds explicit sharding constraint on the dispatch tensor.
* Addresses Xianzhi's review.
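The `einsum`-instead-of-`tile` item above can be illustrated with a minimal sketch. This is not the code from the PR: the helper name `replicate_dense_to_experts`, the `[model_dim, hidden_dim]` weight layout, and the expert-major output layout are all assumptions made for illustration, and NumPy is used as a stand-in because `jax.numpy` exposes `einsum` and `tile` with the same call shapes. It shows only that an outer product with a vector of ones yields the same per-expert copies as tiling.

```python
import numpy as np


def replicate_dense_to_experts(dense_weight, num_experts):
    """Copies one dense weight to every expert (hypothetical helper).

    Args:
        dense_weight: array of shape [model_dim, hidden_dim] (assumed layout).
        num_experts: number of experts to replicate to.

    Returns:
        Array of shape [num_experts, model_dim, hidden_dim] in which every
        expert slice equals dense_weight.
    """
    # The outer product with a vector of ones adds a leading expert axis.
    ones = np.ones([num_experts], dtype=dense_weight.dtype)
    return np.einsum("e,mh->emh", ones, dense_weight)


dense = np.arange(6, dtype=np.float32).reshape(2, 3)
moe = replicate_dense_to_experts(dense, num_experts=4)

# The tile-based equivalent, for comparison.
tiled = np.tile(dense[None, :, :], (4, 1, 1))
```

Both forms produce identical values. A check along the lines of the "replicated to each expert" test item would compare each `moe[e]` slice against `dense`.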

ruomingp force-pushed the rpang_moe branch 2 times, most recently from 4da0333 to 337e90e July 16, 2024 14:26

ruomingp added 10 commits July 17, 2024 21:56

Adds convert_dense_to_sparse_parameters.

29f8054

Adds convert_dense_to_moe_parameters.

b8c6f26

Removes target_layer from the args of `convert_dense_to_moe_parameters`.

7e135e3

Makes convert_dense_to_moe_parameters use pjit to get the right target sharding.

d15ef0c

Adds comments to convert_dense_to_moe_parameters.

1d5c549

Simplifies the conversion logic.

458fb3c

Use einsum instead of tile to replicate the dense weights to MoE ones.

bbdf3e6

Adds testing that the dense weights are replicated to each expert.

9e65046

Adds explicit sharding constraint on the dispatch tensor.

3dc7c3f

Addresses Xianzhi's review.

deab2b3

ruomingp force-pushed the rpang_moe branch from 337e90e to deab2b3 July 18, 2024 01:56

ruomingp marked this pull request as ready for review July 19, 2024 15:06

ruomingp requested a review from markblee as a code owner July 19, 2024 15:06

ruomingp requested review from xianzhidu and dunan July 19, 2024 15:06

ruomingp changed the title ~~Adds convert_dense_to_sparse_parameters.~~ Adds convert_dense_to_moe_parameters. Jul 19, 2024

ruomingp enabled auto-merge July 19, 2024 15:08

xianzhidu approved these changes Jul 19, 2024


ruomingp disabled auto-merge July 20, 2024 11:19

ruomingp enabled auto-merge July 20, 2024 11:19

ruomingp disabled auto-merge July 21, 2024 01:11

ruomingp merged commit 23e587f into apple:main Jul 21, 2024
4 checks passed

ruomingp deleted the rpang_moe branch July 21, 2024 01:11
