Update cutlass submodule to stable 3.1 from RC (pytorch#104638)
CUTLASS was previously pinned to a release candidate; this updates the submodule to the stable 3.1 release, which includes a few additional fixes.
Pull Request resolved: pytorch#104638
Approved by: https://github.com/ezyang
Skylion007 authored and pytorchmergebot committed Jul 6, 2023
1 parent 2252096 commit c3f29ed
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion third_party/cutlass
Submodule cutlass updated 71 files
+7 −1 CHANGELOG.md
+2 −0 CUDA.cmake
+1 −1 PUBLICATIONS.md
+8 −1 README.md
+9 −1 examples/13_two_tensor_op_fusion/threadblock/b2b_mma_multistage.h
+9 −1 examples/13_two_tensor_op_fusion/threadblock/b2b_mma_multistage_smem_accumulator.h
+4 −7 examples/45_dual_gemm/threadblock/dual_mma_multistage.h
+4 −0 examples/47_ampere_gemm_universal_streamk/CMakeLists.txt
+3 −3 examples/47_ampere_gemm_universal_streamk/ampere_gemm_universal_streamk.cu
+653 −0 examples/47_ampere_gemm_universal_streamk/ampere_gemm_universal_streamk_broadcast.cu
+1 −1 include/cute/algorithm/tuple_algorithms.hpp
+19 −15 include/cute/arch/util.hpp
+3 −3 include/cute/atom/mma_atom.hpp
+1 −1 include/cute/config.hpp
+15 −3 include/cute/container/cuda_types.hpp
+37 −50 include/cutlass/epilogue/collective/builders/sm90_builder.inl
+3 −0 include/cutlass/epilogue/collective/default_epilogue.hpp
+60 −36 include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp
+214 −93 include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp
+3 −3 include/cutlass/epilogue/thread/activation.h
+1 −1 include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h
+183 −0 include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_row_broadcast.h
+62 −0 include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h
+443 −0 include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h
+178 −0 include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h
+519 −0 include/cutlass/epilogue/threadblock/predicated_tile_iterator_row_broadcast.h
+514 −0 include/cutlass/gemm/device/gemm_sparse_row_broadcast.h
+21 −21 include/cutlass/gemm/device/gemm_universal_adapter.h
+386 −0 include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h
+22 −9 include/cutlass/gemm/device/gemv.h
+0 −167 include/cutlass/gemm/device/gemv_strided_batched.h
+19 −7 include/cutlass/gemm/gemm.h
+191 −0 include/cutlass/gemm/kernel/default_gemm_sparse_row_broadcast.h
+146 −0 include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h
+2,405 −0 include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h
+375 −26 include/cutlass/gemm/kernel/gemv.h
+0 −368 include/cutlass/gemm/kernel/gemv_strided_batched.h
+4 −8 include/cutlass/gemm/kernel/sm70_gemm.hpp
+8 −11 include/cutlass/gemm/kernel/sm90_gemm_tma.hpp
+7 −9 include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp
+7 −9 include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
+12 −14 include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
+3 −3 include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp
+400 −0 include/cutlass/gemm/kernel/sparse_gemm_row_broadcast.h
+6 −0 include/cutlass/gemm/threadblock/ell_mma_multistage.h
+5 −0 include/cutlass/gemm/threadblock/mma_blas3_multistage.h
+4 −6 include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h
+4 −6 include/cutlass/gemm/threadblock/mma_multistage.h
+6 −0 include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h
+5 −0 include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h
+6 −0 include/cutlass/gemm/threadblock/mma_sparse_multistage.h
+4 −6 include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h
+99 −83 include/cutlass/numeric_conversion.h
+2 −2 include/cutlass/transform/threadblock/predicated_tile_access_iterator.h
+2 −2 media/docs/cutlass_3x_backwards_compatibility.md
+1 −1 media/docs/gemm_api.md
+1 −1 media/docs/layout.md
+16 −4 media/docs/profiler.md
+0 −11 test/unit/gemm/device/CMakeLists.txt
+19 −0 test/unit/gemm/device/gemm_f16n_f16n_f16t_tensor_op_f32_sparse_sm80.cu
+187 −44 test/unit/gemm/device/gemv.cu
+0 −490 test/unit/gemm/device/gemv_strided_batched.cu
+76 −2 test/unit/gemm/device/sm90_gemm_f16_f16_f16_tensor_op_f32_cluster_warpspecialized_cooperative.cu
+44 −0 ...nit/gemm/device/sm90_gemm_f16_f16_f16_tensor_op_f32_cluster_warpspecialized_cooperative_bias_elementwise.cu
+102 −16 test/unit/gemm/device/sm90_gemm_f16_f16_f16_tensor_op_f32_cluster_warpspecialized_pingpong.cu
+44 −0 test/unit/gemm/device/sm90_gemm_f16_f16_f16_tensor_op_f32_cluster_warpspecialized_pingpong_bias_elementwise.cu
+144 −0 test/unit/gemm/device/sm90_gemm_s8_s8_s8_tensor_op_s32.cu
+21 −7 test/unit/gemm/device/testbed_sparse.h
+56 −20 tools/library/scripts/generator.py
+0 −352 tools/library/src/reference/gemm.cu
+15 −2 tools/profiler/src/cublas_helpers.cu
