Commit
add support different layer order in conformer (apple#568)
* Open-source MoE (#547)
* MoE
* update
* update
* update
* update
* update
* update

---------

Co-authored-by: Xianzhi Du <[email protected]>

* Supports return_aux in PipelinedTransformerLayer. (apple#557)
* link fix (#561)

Co-authored-by: Xianzhi Du <[email protected]>

* GKE GPU A3 with TCPX support (apple#517)
* add GKE GPU support to axlearn
* add volumes and initContainer
* finish up the pod spec
* add GKE runner for GPU
* extend != append duhh
* fix volume mount change
* add local queue for internal cluster
* move kueue annotation to jobset
* introduce gpu container image
* add ld_library_path
* add env variables for a3
* ensure replicas of jobset is 1
* automatically set distributed coordinator as env variables
* change NCCL_DEBUG to warn
* install gpu jax using pyproject.toml
* address comments from Mark
* fix missing sidecar
* remove accidental double quote
* add default XLA flags
* hardcode max step to 1000
* fix sidecar command
* fix sidecar termination
* allow passing queue name through flag
* port over remaining xla flags
* only use bare minimum xla flags
* address Marks' nit on using env_vars.update
* remove flags that make perf worse
* remove tpu np provionser from gpu runner
* add mesh rule
* Revert "hardcode max step to 1000"
  This reverts commit 8cda4f91414c00deb28c7f15d54d183076101d8b.
* add doc for queue flag
* fix punctuation and add link to tcpx docs
* more puntuation
* document NCCL env vars
* throw error if GCS mount is set
* throw error when pre provisioner is enabled
* add testing coverage for GPUGKEJob
* add basic gke_runner tests
* add more gke_runner tests
* address pr comments
* add space
* fix missing .
* add missing space
* fix pytype error in job_test.py
* update golden configs
* [inference] remove unncessary mocks and use absolute imports (apple#563)

Co-authored-by: guoli-yin <[email protected]>

* Minor style changes. (apple#564)
* add support different layer order in conformer
* update
* address review feedback
* fix formatting
* fix black

---------

Co-authored-by: xianzhi <[email protected]>
Co-authored-by: Xianzhi Du <[email protected]>
Co-authored-by: Mark Lee <[email protected]>
Co-authored-by: Sam Stoelinga <[email protected]>
Co-authored-by: Guoli Yin <[email protected]>
Co-authored-by: guoli-yin <[email protected]>
Co-authored-by: Yongqiang Wang <[email protected]>