Pulse · openxla/xla · GitHub

November 13, 2024 – December 13, 2024

Overview

1,094 Active pull requests

23 Active issues

764 Pull requests merged by 2 people

Automated Code Change
#20461 merged Dec 14, 2024
Configure HloPjRtTestBase with new option structs.
#20452 merged Dec 14, 2024
Extend use_parameter_layout_on_device option to ExecuteReplicated.
#20441 merged Dec 14, 2024
Integrate LLVM at llvm/llvm-project@a21f9bfe29c2
#20538 merged Dec 14, 2024
Remove a bunch of #if CUDA macro use in xla_compile_lib.
#20331 merged Dec 14, 2024
Improve compilation time by not fusing large constants into LLVM modules for XLA::CPU.
#20257 merged Dec 14, 2024
Swap output format of "MatchTrivialLoopRange" to better align with Range usage.
#20454 merged Dec 13, 2024
timespan, xplane_visitor: Support operator<=.
#20405 merged Dec 13, 2024
[XLA] Generalize type handling within InProcessCollectives
#20498 merged Dec 13, 2024
Migrate some TSL code over to ABSL equivalents
#20457 merged Dec 13, 2024
[xla:cpu] Add parallel loop runner
#20529 merged Dec 13, 2024
Stop using redzone_allocator in autotuner_util.
#20520 merged Dec 13, 2024
Add new ml-build-rbe container to the configurations of RBE used with remote config.
#20459 merged Dec 13, 2024
Integrate LLVM at llvm/llvm-project@bc29fc937c6c
#20511 merged Dec 13, 2024
Move AutotunerUtil::CreateBuffer from a static method to a method on RedzoneAllocator.
#20517 merged Dec 13, 2024
[XLA] Add some traceme annotations around XLA:CPU compilation and CPU compiler stack trace logging.
#20312 merged Dec 13, 2024
[XLA:MSA] Allow cross-program prefetch for buffers that are already pinned to alternate memory.
#20316 merged Dec 13, 2024
Replace TSL's BlockingCounter with absl's.
#20522 merged Dec 13, 2024
Add TF package import tests in CPU presubmit, continuous and nightly jobs.
#20493 merged Dec 13, 2024
Add has_megacore and has_merged_vmem in XPlane stats.
#20491 merged Dec 13, 2024
[StableHLO] Add shape refinement callback to specify additional patterns.
#20321 merged Dec 13, 2024
[XLA] Return the number of overlapping chunks instead of chunks themselves for tracking outstanding prefetches/evictions
#20438 merged Dec 13, 2024
[IFRT] Add option to compile IFRT IR atom programs using Sdy
#20521 merged Dec 13, 2024
[XLA:Collective] Add utility functions.
#20038 merged Dec 13, 2024
[XLA:Python] Use &PyArray_Type rather than looking up numpy.ndarray via Python attrs.
#20513 merged Dec 13, 2024
PR #20109: Create a workflow to run CPU benchmarks
#20475 merged Dec 13, 2024
Reenable buildifier for all files under xla/, fix warnings
#19866 merged Dec 13, 2024
Remove unneeded #ifdef'ed dependency.
#20328 merged Dec 13, 2024
[xla:python] Add support for stateful FFI calls registered via Python.
#16789 merged Dec 13, 2024
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20507 merged Dec 13, 2024
[XLA:CPU] Add shape method python binding to Literal
#20466 merged Dec 13, 2024
Replace xla proto dependency to make TF happy
#20506 merged Dec 13, 2024
[XLA:GPU] Readability and performance nits.
#20473 merged Dec 13, 2024
Automated Code Change
#20504 merged Dec 13, 2024
[XLA:GPU] Clean up TF_RET_CHECKs in TritonFusionAnalysis::ExecuteForDotFusion.
#20505 merged Dec 13, 2024
Automated Code Change
#20416 merged Dec 13, 2024
Use absl::Mutex::Await() instead of absl::CondVar::Wait() in XLA.
#19957 merged Dec 13, 2024
Automated Code Change
#20418 merged Dec 13, 2024
Automated Code Change
#20468 merged Dec 13, 2024
Fix range analysis bug.
#20479 merged Dec 13, 2024
[XLA] Guarantee ordering of infeeds/outfeeds across called computations
#19827 merged Dec 13, 2024
Add dynamic_arg_layouts to C++ cache and add a test in JAX which checks for a cache miss if the layouts of input arguments to the same jitted function are different.
#20484 merged Dec 13, 2024
[XLA:Python] Mark from_python and from_cpp methods of nanobind typecasters as noexcept.
#20482 merged Dec 13, 2024
[XLA:GPU] add execution tests for NCCL group with partially pipelined send/recv instructions
#20178 merged Dec 13, 2024
Fix an issue in PartitionGatherTrivialSlicedOperandDimensions when handling out-of-bound indices.
#20401 merged Dec 13, 2024
Make ReshapeOp return MHLO_AnyTensor instead of MHLO_StaticShapeTensor.
#18964 merged Dec 13, 2024
Adds a "SHARDING" ProfileType to HloModuleProto.
#20283 merged Dec 13, 2024
Implement flatten one level with keys in C++ and use it for the prefix/equality error printing.
#20044 merged Dec 13, 2024
Switch multihost runner to public XLA:GPU target
#20486 merged Dec 13, 2024
Add a tuple sharding when creating get-tuple-element(tuple(single_result)).
#20126 merged Dec 13, 2024
Update py_import macros for the ability to unpack additional wheels in the same folder as the main wheel.
#20089 merged Dec 13, 2024
Update the Windows Docker CI image to include the C++ ATL library.
#20487 merged Dec 12, 2024
Make CompiledMemoryStats::ToProto() const.
#20458 merged Dec 12, 2024
#sdy add option to avoid escaping attribute when adding to frontend attrs.
#20472 merged Dec 12, 2024
Integrate LLVM at llvm/llvm-project@0876c11ceeb0
#20465 merged Dec 12, 2024
[XLA:GPU][Emitters] Move gpu/fusions/ir to backends/gpu/codegen/ir
#20471 merged Dec 12, 2024
[XLA:GPU] Propagate all profiling failures to gemm_fusion_autotuner.cc.
#20383 merged Dec 12, 2024
PR #20463: Updated multiple typos
#20470 merged Dec 12, 2024
Automated Code Change
#20411 merged Dec 12, 2024
[XLA:GPU][NFC] Modularize a little bit gpu_hlo_schedule.cc.
#20430 merged Dec 12, 2024
PR #19099: [XLA:CPU][oneDNN] Add post-ops for oneDNN Convolutions
#20445 merged Dec 12, 2024
[XLA:GPU] Remove restriction on bitcasts being a no-op with regards to tiling
#20433 merged Dec 12, 2024
[XLA:GPU] Deprecate diamond chains in SoftmaxRewriterTriton.
#20431 merged Dec 12, 2024
[XLA:GPU] Rollback introduction of EmitterLocOpBuilder.
#20464 merged Dec 12, 2024
[XLA:CPU] Implement ElementalKernelEmitter
#20378 merged Dec 12, 2024
#sdy Add unique module name in Shardy dumps
#20436 merged Dec 12, 2024
Fix build failure in nvjitlink_impl.cc
#20460 merged Dec 12, 2024
Integrate LLVM at llvm/llvm-project@19bc282320ba
#20450 merged Dec 12, 2024
Automated Code Change
#20285 merged Dec 12, 2024
[XLA:GPU] Add a nested builder arg to the EmitXlaLoop builder.
#20443 merged Dec 12, 2024
Automated Code Change
#20362 merged Dec 12, 2024
Automated Code Change
#20361 merged Dec 12, 2024
Skip inserting into frontend attr if the (key, value) pair already exists and override if key exists
#20255 merged Dec 12, 2024
[Mosaic] Pad trailing transposes chunks with zeros.
#20390 merged Dec 12, 2024
Split ROCm-specific backend calls into their own targets.
#20337 merged Dec 12, 2024
[xla-auto-sharding] Fix potential dangling pointer (reference) bug.
#20446 merged Dec 12, 2024
[xla:cpu] Add missing files from openxla/xla#16438
#20447 merged Dec 11, 2024
[XLA:GPU] Schedule send/recv early if pipeline parallelism ops enabled
#20403 merged Dec 11, 2024
Internal CI/CD change
#20212 merged Dec 11, 2024
[XLA] Avoid redundant lookup in ConsumeResource
#20434 merged Dec 11, 2024
Remove the test_hlo_pjrt_runner tag.
#20392 merged Dec 11, 2024
Add B100 to default Nvidia gpu backends
#20407 merged Dec 11, 2024
Migrate broadcast_test to always use PjRt for its test backend.
#20345 merged Dec 11, 2024
Add a default error spec field to HloRunnerAgnosticTestBase.
#20397 merged Dec 11, 2024
[XLA:CPU] Benchmark for grouped strided convolutions
#20425 merged Dec 11, 2024
[XLA GPU] Add additional unit tests for IsPtxRegisterAllocationError.
#20424 merged Dec 11, 2024
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20420 merged Dec 11, 2024
[XLA:GPU] Implement NcclRaggedAllToAllThunk.
#20265 merged Dec 11, 2024
Fix infinite loop in TopKSplitter
#20422 merged Dec 11, 2024
PR #20214: Evaluate simple offset values, if possible
#20303 merged Dec 11, 2024
PR #20313: Fix async wrapper to walk child computations
#20421 merged Dec 11, 2024
#sdy Swap XLA Shardy passes to use StableHLO instead of MHLO as much as possible.
#19939 merged Dec 11, 2024
Internal: add missing dependency on numpy
#20298 merged Dec 11, 2024
[XLA:GPU] Use absl::Status payload to more precisely identify register allocation errors.
#20396 merged Dec 11, 2024
[XLA:CPU] Use KernelApiIrBuilder in IrEmitter2
#20380 merged Dec 11, 2024
[XLA:GPU] Introduce EmitterLocOpBuilder that could annotate the mlir with the file:line annotations that are visible in the triton dump
#19472 merged Dec 11, 2024
[XLA:CPU] Add new KernelApiIrBuilder
#20379 merged Dec 11, 2024
PR #20334: [nfc] clang-format is failing on unrelated PRs because of this
#20419 merged Dec 11, 2024
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20347 merged Dec 11, 2024
[XLA:GPU] Decrease VLOG levels to start logging at level 2 in softmax_rewriter_triton.cc.
#20415 merged Dec 11, 2024
fix audit wheel compliance issues for pywrap rules
#20356 merged Dec 11, 2024
[XLA:LatencyHidingScheduler] Fix crash with non-standard async ops whose done op does not consume the respective start op and that they might have a reverse data dependency (e.g., done -> ops -> start).
#20408 merged Dec 11, 2024
Add LayoutModeToXlaShape util to header so that users can get xla::Shape with layout without an XlaComputation.
#20409 merged Dec 11, 2024
Cleanup inconsistent names/comments
#20406 merged Dec 11, 2024
Remove unused ErrorSpec.
#20400 merged Dec 11, 2024
[XLA:GPU] Guard send/recv schedule manipulation behind xla_gpu_enable_pipelined_p2p flag
#20382 merged Dec 10, 2024
Migrate all_reduce_test to always use PjRt for its test backend.
#20344 merged Dec 10, 2024
Replace std::string_view with absl::string_view
#20349 merged Dec 10, 2024
Respect DeviceAssignment in HloRunnerPjRt.
#20389 merged Dec 10, 2024
Migrate gather_operation_test to always use PjRt for its test backend.
#20339 merged Dec 10, 2024
[HLO->MHLO] Consolidate non-pipelined async ops into MHLO ops.
#20309 merged Dec 10, 2024
Add implicit device step tracking.
#20261 merged Dec 10, 2024
Migrate copy_test to always use PjRt for its test backend.
#20343 merged Dec 10, 2024
[XLA:GPU] Remove --xla_gpu_experimental_enable_triton_softmax_priority_fusion.
#20384 merged Dec 10, 2024
Set implicitTrunc on APInt creation
#20387 merged Dec 10, 2024
Integrate LLVM at llvm/llvm-project@0f7b3a9407d2
#20377 merged Dec 10, 2024
Replace std::string_view with absl::string_view
#20322 merged Dec 10, 2024
Add MHLO mhlo.custom_call @ragged_all_to_all -> HLO RaggedAllToAll pass
#20354 merged Dec 10, 2024
[XLA:CPU] Update ShapeToIrType & PrimitiveTypeToIrType to take a LLVMContext
#20381 merged Dec 10, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#19734 merged Dec 10, 2024
[xla:gpu] Extracted CreateTritonPipeline into a separate target
#20367 merged Dec 10, 2024
[xla:cpu] Add an object pool for efficient xnnpack object pooling
#20307 merged Dec 10, 2024
Add test_migrated_to_hlo_runner_pjrt tag to xla_test.
#20338 merged Dec 10, 2024
Simplify GetGatherScatterOperandPassthroughDims since offset_dims or inserted_window_dims are sorted in gather/scatter operations.
#20335 merged Dec 10, 2024
[xla-auto-sharding] Add BRKGA heuristic as an XLA auto-sharding option.
#20330 merged Dec 10, 2024
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20365 merged Dec 10, 2024
Automated Code Change
#20364 merged Dec 10, 2024
PR #16438: aarch64: implement onednn matmul operator with explicit reorders
#20369 merged Dec 10, 2024
[XLA:GPU] Disable cutlass dynamic-update-slice rewrite on V100.
#20366 merged Dec 10, 2024
PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20213 merged Dec 10, 2024
[XLA:GPU] Use absl instead of tensorflow functions/types.
#20311 merged Dec 10, 2024
Add debug option for failing the PTX compilation on register spilling
#20358 merged Dec 10, 2024
Automated Code Change
#20290 merged Dec 10, 2024
Replace std::string_view with absl::string_view
#20353 merged Dec 10, 2024
[XLA] Don't bail when encountering complex loop pipelining patterns
#20258 merged Dec 10, 2024
Add an interpreter PjRt client registry for testing.
#20336 merged Dec 10, 2024
Add a new test base class for a default PjRt test runner w/ SE interpreter.
#20320 merged Dec 10, 2024
Add RegisterMlirToHloDependentDialects to register required dependent dialects
#20318 merged Dec 10, 2024
Register WhileLoopAllReduceCodeMotion pass to the opt tool
#20342 merged Dec 10, 2024
Create OpStatsToRooflineModel, in preparation of Roofline Model creation
#20280 merged Dec 10, 2024
IFRT proxy: Add profiler spans to all entrypoints at the client.
#20325 merged Dec 9, 2024
[XLA] Fix latency hiding scheduler when faced with annotated no-op instructions.
#20324 merged Dec 9, 2024
Modify XlaOp Exp to accept result accuracy as an argument. We want to be able to select implementation of exp depending on this config.
#19820 merged Dec 9, 2024
Stop using DISABLED_ON_GPU_ROCM for GPU tests, and instead just use GTEST_SKIP with a runtime check for ROCm.
#20323 merged Dec 9, 2024
Add CPU specific passes for hlo-opt tool.
#20154 merged Dec 9, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#20012 merged Dec 9, 2024
Split up cusolver_context into CUDA-specific and ROCM-specific parts.
#20036 merged Dec 9, 2024
[XLA:GPU] Remove unused xla_experimental_exec_time_optimization_effort flag.
#20297 merged Dec 9, 2024
Add support for CUDA 12.6.3 and CUDNN 9.5.1/9.6.0.
#20310 merged Dec 9, 2024
[xla:gpu] Removed redundant parameter from CompileTritonToLLVM
#20301 merged Dec 9, 2024
[XLA:CPU] Add a Python extension for KernelRunner.
#20196 merged Dec 9, 2024
[XLA:Python] Use nanobind::isinstance from upstream nanobind, delete xla::nb_isinstance.
#20250 merged Dec 9, 2024
Reland of PR #19571. Fix test FunctionalHloRunnerTest.ShardedAutotuningWorks
#20197 merged Dec 9, 2024
[XLA:GPU] Use absl::Microseconds instead of doing duration arithmetic.
#20305 merged Dec 9, 2024
Automated Code Change
#20194 merged Dec 9, 2024
PR #18989: [AllGatherCSE] Add a pass that CSEs all-gathers on parameters.
#20300 merged Dec 9, 2024
PR #20241: Updated typos in multiple documents
#20291 merged Dec 9, 2024
[xla] Update warnings.bazelrc
#20299 merged Dec 9, 2024
Reverts 26df9b97719020df83e882b31bbe4a7f2cbbdff5
#20293 merged Dec 9, 2024
Automated Code Change
#20289 merged Dec 9, 2024
Automated Code Change
#20286 merged Dec 8, 2024
Fix the heuristic for extending events in derived timeline to have a 2x(previous event's duration) threshold for the gap for TPU sessions.
#20163 merged Dec 7, 2024
Delete HloUnaryInstruction used in CreateUnary. Instead make result_accuracy a Rare field in HloInstruction and optionally set the field in CreateUnary.
#20165 merged Dec 7, 2024
Add a public UpdateEntryComputationLayout method
#20262 merged Dec 7, 2024
[xla:cpu] Add xnnpack dependency to xla:cpu runtime
#20264 merged Dec 7, 2024
Add ability to disable TargetConfig metadata for se_gpu_pjrt_client.
#20275 merged Dec 6, 2024
Allow platform-specific relaxation of fusion restrictions on in-place update ops.
#20273 merged Dec 6, 2024
Add method for HloRunnerAgnosticTestBase implementations to preprocess modules.
#20229 merged Dec 6, 2024
[XLA:GPU:ROCm] Restore threads per warp behavior
#20270 merged Dec 6, 2024
[Cleanup] Do not std::move on return
#19735 merged Dec 6, 2024
hlo_original_value: Don't blow up when printing empty values.
#20266 merged Dec 6, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#19730 merged Dec 6, 2024
Remove the use of TENSORFLOW_USE_ROCM from convolution_thunk.cc.
#20259 merged Dec 6, 2024
[tsl] Deprecate tsl::mutex and tsl::condition_variable, make tsl::Condition an alias of absl::Condition
#20239 merged Dec 6, 2024
Integrate StableHLO at openxla/stablehlo@b3d3cacd
#20263 merged Dec 6, 2024
[xla-auto-sharding] Add SolveGreedy() heuristic to make local greedy choices.
#20260 merged Dec 6, 2024
[XLA] Move the scheduling annotation from fused instruction to the caller if the caller is not annotated. Return error if fused instructions contain ops with different annotation ids.
#20034 merged Dec 6, 2024
[Cleanup] Use HloPredicateIs(Not)Op to unify opcode checking across XLA
#19731 merged Dec 6, 2024
#sdy Fix MHLO<->HLO translation bug with multi result host offload functions.
#20130 merged Dec 6, 2024
Allow programmatic override of the default values for the gcs file system cache
#20254 merged Dec 6, 2024
Re-enable StableHLO current version attribute in PJRT
#20234 merged Dec 6, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#19729 merged Dec 6, 2024
[StableHLO] Refactor XlaCallModule to use more upstream StableHLO machinery.
#19132 merged Dec 6, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#19732 merged Dec 6, 2024
[XLA:GPU] Drop unnecessary bitcast from the chain convert(s4)->bitcast->dot operation
#20117 merged Dec 6, 2024
[xla-auto-sharding] Generalize multilevel flag to heuristic solver option.
#20225 merged Dec 6, 2024
Fix type stub for register_node to note that to_iterable_with_keys is optional.
#20251 merged Dec 6, 2024
Exclude more broken tilings from Triton exhaustive autotuning
#20222 merged Dec 6, 2024
Integrate LLVM at llvm/llvm-project@2ccf7ed277df
#20246 merged Dec 6, 2024
[XLA:GPU] Add a test for RaggedAllToAll that runs on 8 GPUs.
#20139 merged Dec 6, 2024
[xla:ffi] Add num_threads() API to external FFI thread pool
#20247 merged Dec 6, 2024
Reverts 120fbf9e88b159bed64addb4da6de08b3c6ea5bc
#20244 merged Dec 6, 2024
Integrate Triton up to [a69ebfaa](https://github.com/openai/triton/commits/a69ebfaa0832281eabca71ee73f643b28dc034ec)
#20184 merged Dec 6, 2024
[XLA:LatencyHidingScheduler] Do not allow target-defined resources to have a concurrency limit of 0.
#20235 merged Dec 6, 2024
[XLA] Support splitting ragged all-to-all into async start and done.
#20030 merged Dec 6, 2024
Automated Code Change
#20192 merged Dec 6, 2024
[xla:gpu] NFC: Replace all users of NcclApi with GpuCollectives
#20221 merged Dec 6, 2024
Added interface "MatchTrivialLoopRange" to while_loop_analysis to get the range for the induction variable of a loop.
#20157 merged Dec 6, 2024
Demote log from ERROR to VLOG(2).
#20237 merged Dec 6, 2024
Correct the order of arguments in comment for RaggedAllToAll.
#20233 merged Dec 6, 2024
[Cleanup] Use push_back instead of emplace_back where appropriate
#20011 merged Dec 6, 2024
[xla] Add detailed tracing to RendezvousSingle
#20238 merged Dec 6, 2024
[Cleanup] Use absl::StrCat
#20013 merged Dec 6, 2024
Integrate LLVM at llvm/llvm-project@698d83218565
#20230 merged Dec 6, 2024
Remove nsync from TensorFlow
#19931 merged Dec 6, 2024
[xla:collectives] Make NcclApi an alias for GpuCollectives
#20219 merged Dec 6, 2024
Add wrapper for PM Sampling metrics to be added from std::vector<PmSamples> to XLines
#19368 merged Dec 6, 2024
Add minimal test case to collective pipeliner
#20102 merged Dec 6, 2024
Remove obsolete TODOs and the ones associated with closed bug in third_party/tensorflow/compiler/*
#20168 merged Dec 5, 2024
Remove #if TENSORFLOW_USE_ROCM from gpu_executable.cc.
#20166 merged Dec 5, 2024
Use default HloParserOptions in HLO runner.
#20041 merged Dec 5, 2024
Reverts ee92ce1bc50fb94024dc945e08c8c3d7aa0837bc
#20149 merged Dec 5, 2024
Remove #if GOOGLE_CUDA from matmul_utils.cc.
#20161 merged Dec 5, 2024
[hlo-opt] Register gpu passes and add "--list-passes" option
#20104 merged Dec 5, 2024
Add flatten_conditional and conditional_value to HloControlFlowFlattening to allow flattening conditionals
#20045 merged Dec 5, 2024
Integrate LLVM at llvm/llvm-project@fdb90cef75ca
#20215 merged Dec 5, 2024
IFRT Proxy: Make Executable::Delete() and most ::Execute async.
#20173 merged Dec 5, 2024
MHLO defns for a ragged dot that permits ragged batch and contraction.
#19991 merged Dec 5, 2024
[JAX] Add end-to-end execution support in colocated Python API
#19905 merged Dec 5, 2024
Add array interoperability python bindings to xla::Literal
#19915 merged Dec 5, 2024
[xla:gpu] Update custom call config WARN to VLOG
#20211 merged Dec 5, 2024
[XLA:CPU] Add method to benchmark compile times
#20078 merged Dec 5, 2024
Remove if_oss from B100 references
#20208 merged Dec 5, 2024
Enable explicit batch dims of gather/scatter operations in GSPMD. There are two components.
#19810 merged Dec 5, 2024
[xla-auto-sharding] Add SolveRandom() baseline algorithm for a random sharding.
#20204 merged Dec 5, 2024
Reverts 1ac4eddd9ce0db29b37efe996e4309cd7e23ee9c
#20205 merged Dec 5, 2024
Integrate LLVM at llvm/llvm-project@dd7a3d4d798e
#20201 merged Dec 5, 2024
[XLA:GPU] Extend atomic_rmw to support vector updates.
#20199 merged Dec 5, 2024
Add support for Pad operation in host_offload_utils::GetPredecessors().
#20151 merged Dec 5, 2024
[XLA:GPU][Emitters] Create xla_ops dialect for the platform-independent ops.
#20188 merged Dec 5, 2024
Adding dumping functionality for HloUnoptimizedSnapshot.
#20185 merged Dec 5, 2024
Automated Code Change
#20190 merged Dec 5, 2024
[XLA:GPU] Make VLOG explanation in SortRewriter more helpful.
#20198 merged Dec 5, 2024
PR #18616: [XLA:CPU][oneDNN] Refactor code that fuses Add operation with oneDNN primitives
#20200 merged Dec 5, 2024
Automated Code Change
#20191 merged Dec 5, 2024
Integrate LLVM at llvm/llvm-project@71ac1eb50955
#20186 merged Dec 5, 2024
[XLA:CPU] Implement 2D custom algorithm for strided transposed convolutions.
#19779 merged Dec 5, 2024
PR #19913: [ROCm] Do not use fast approximation for exp and log
#20187 merged Dec 5, 2024
Add support for bitcasts that add a degenerate majormost dimension when HostOffloadLegalize moves copies out of host-memory-only offloading.
#20152 merged Dec 5, 2024
Reverts 1000ed534b23a0eba634046a2b4e1cfa3d1d8bc2
#20189 merged Dec 5, 2024
Automated Code Change
#20056 merged Dec 5, 2024
[xla:collectives] NFC: Move all NCCL collectives to Collectives API
#20158 merged Dec 5, 2024
Introduce xla_gpu flag for Dumping HloUnoptimizedSnapshots
#20122 merged Dec 5, 2024
Cleanup. Use the unified GatherScatterDims for operand pass-through dims in gather/scatter instructions.
#20167 merged Dec 5, 2024
[xla:collectives] NFC: Move AllReduce into Collectives API
#20180 merged Dec 5, 2024
Automated Code Change
#20179 merged Dec 5, 2024
[xla:collectives] Move NCCL buffer registration to base Communicator Api
#20147 merged Dec 5, 2024
[xla:collectives] NFC: Move communicator splitting to Collectives API and switch gpu locking to GpuCollectives
#20143 merged Dec 5, 2024
Integrate LLVM at llvm/llvm-project@ce0f11325e0c
#20162 merged Dec 5, 2024
More thorough propagation of host linear layout. Currently linear layout on host
#19267 merged Dec 5, 2024
Fix wheel creation logic when pywrap rules are used
#20176 merged Dec 5, 2024
Replace usage of absl::InlinedVector<int64_t, 1> with DimensionVector for gather/scatter dimensions in hlo_sharding_util.
#20160 merged Dec 5, 2024
[xla:collectives] NFC: Move CommInitRanks to Collectives API
#20140 merged Dec 5, 2024
[XLA:GPU] Pipeline send/recv ops but not *-done ops in experimental PP opts
#20046 merged Dec 5, 2024
[xla][gpu] Order send/recv chains across decomposed collective permutes
#20040 merged Dec 5, 2024
[xla:collectives] NFC: Move Group calls to GpuCollectives
#20177 merged Dec 5, 2024
[xla:collectives] NFC: Move stubs from NcclApiStub to GpuCollectivesStub
#20138 merged Dec 5, 2024
[XLA:LatencyHidingScheduler] Fix issues with scheduling_group_id annotations. Added support for:
#19892 merged Dec 5, 2024
[xla:collectives] NFC: Move NCCL clique locking to collectives component
#20110 merged Dec 5, 2024
Fix some *_main deps to only appear on xla_test targets.
#20148 merged Dec 4, 2024
[XLA:Collective] Add common utility functions into
#20024 merged Dec 4, 2024
Reverts be13efa3fe6e34918cf43d8b5bb5ce51da8849e8
#20146 merged Dec 4, 2024
[XLA] Fix a struct/class tag mismatch in cutlass code.
#20142 merged Dec 4, 2024
[xla-auto-sharding] Add call path for non-MIP heuristic solvers.
#20145 merged Dec 4, 2024
[StableHLO][API] Add API to get StableHLO version from PortableArtifact
#20105 merged Dec 4, 2024
[XLA:GPU] Fix ASAN test failure in layout_assignment_test.
#20144 merged Dec 4, 2024
[XLA:GPU] Introduce xla_gpu_enable_experimental_pipeline_parallelism_opt to guard experimental prototype
#19792 merged Dec 4, 2024
[XLA:GPU] Symbolic tiling: support bitcasts that reduce rank by more than one.
#19438 merged Dec 4, 2024
Remove more unused hlo passes.
#20136 merged Dec 4, 2024
Update clone with new operands to handle ragged all-to-all.
#19459 merged Dec 4, 2024
Remove restriction to disable tensor cores for 8-bit x F32 dots.
#19338 merged Dec 4, 2024
Reshard indices when partitioning gather/scatter along index pass-through dims if the gather output / scatter update has better sharding.
#20107 merged Dec 4, 2024
Add more debug log when Shardy dumping is disabled.
#20073 merged Dec 4, 2024
[XLA:GPU] Consolidate logic to enable expensive optimisation passes.
#20082 merged Dec 4, 2024
Fix CheckAndCanonicalizeMemoryKind when used with non-addressable device list.
#20135 merged Dec 4, 2024
Reverts 0681280cf5f6b71eb09017a51a079553d98e49f2
#20134 merged Dec 4, 2024
Deduplicate getValuesFromDotOperandLayoutStruct function
#20127 merged Dec 4, 2024
Integrate CompilationProvider framework into NVPTXCompiler
#20112 merged Dec 4, 2024
[XLA:GPU][Emitters] Use DeviceDescription in lower_to_llvm.cc.
#20129 merged Dec 4, 2024
Update partitioning method for gather/scatter along implicit batch dimensions.
#20097 merged Dec 4, 2024
[XLA:GPU] Add RaggedAllToAllDecomposer to the GPU compilation pipeline.
#20068 merged Dec 4, 2024
Integrate LLVM at llvm/llvm-project@9c9d4b9e73c1
#20124 merged Dec 4, 2024
Create snapshot for HloUnoptimized module.
#20123 merged Dec 4, 2024
Automated Code Change
#20118 merged Dec 4, 2024
Automated Code Change
#20116 merged Dec 4, 2024
Add TraceMe around HLO passes and pipelines.
#20071 merged Dec 4, 2024
PR #20084: Adds a TraceMe in the blocking while loop.
#20120 merged Dec 4, 2024
Integrate LLVM at llvm/llvm-project@109e4a147faa
#20113 merged Dec 4, 2024
[XLA:GPU] Swap dot operands in certain cases.
#19922 merged Dec 4, 2024
Automated Code Change
#19979 merged Dec 4, 2024
PR #20028: [GPU][NFC] Cleanup horizontal loop fusion.
#20114 merged Dec 4, 2024
Remove ifdefs from ir_emitter_unnested
#20016 merged Dec 4, 2024
Automated Code Change
#20057 merged Dec 4, 2024
Apply layout permutation also to dynamic dimensions.
#20063 merged Dec 4, 2024
[xla:collectives] NFC: Move GetUniqueId to Collectives API
#20095 merged Dec 4, 2024
[xla:collectives] NFC: Move lockable GpuClique to collectives component
#20052 merged Dec 4, 2024
[xla:collectives] NFC: Move NcclApi::CommCount to Communicator API
#20048 merged Dec 4, 2024
Integrate LLVM at llvm/llvm-project@a201ba1b57aa
#20108 merged Dec 4, 2024
[xla:collectives] NFC: Remove unused NcclApi CommFinalize function
#20049 merged Dec 4, 2024
[IFRT] Add custom_options in ifrt::ExecuteOptions
#20103 merged Dec 4, 2024
[xla:collectives] NFC: Move CommAbort to Communicator API
#20050 merged Dec 4, 2024
This CL adds a filter mask to BFCAllocator::AddTraceMe so we can collect only memory TraceMe events.
#19863 merged Dec 4, 2024
Support int4 in most ops on CPUs/GPUs.
#20043 merged Dec 4, 2024
[xla:collectives] NFC: Move communicator error checking to Communicator API
#20051 merged Dec 4, 2024
Move tsl/platform:{subprocess,subprocess_test} to XLA
#20032 merged Dec 4, 2024
[hlo-opt] Register HWI passes and move test files to dedicated test directory
#20100 merged Dec 4, 2024
Fix documentation for CommandBufferCmdEmitter ConvertToCommands
#20098 merged Dec 4, 2024
Add _raw_platform to work around extra platform normalization logic and enable
#20091 merged Dec 4, 2024
Integrate LLVM at llvm/llvm-project@fed3a9b8f81f
#20090 merged Dec 4, 2024
[xla:collectives] NFC: Extract NcclCommunicators into GpuClique and NcclCliqueImpl
#20053 merged Dec 4, 2024
Rename NewHloTestBase to HloRunnerAgnosticTestBase.
#20018 merged Dec 3, 2024
Add inference stats sampler and grouping.
#19855 merged Dec 3, 2024
[IFRT] Add layout_mode attribute to IFRT Array type.
#19933 merged Dec 3, 2024
Integrate StableHLO at openxla/stablehlo@d1db6dfe
#20094 merged Dec 3, 2024
[xla:collectives] NFC: Extract shared Clique from NcclCommunicators
#20054 merged Dec 3, 2024
Factor out test config for better readability
#19802 merged Dec 3, 2024
Remove stale hlo_instruction comment in thunk.h
#20093 merged Dec 3, 2024
Remove obsolete TODO for fixed bug in pjrt_c_api.h.
#19857 merged Dec 3, 2024
Check whether profile info is empty to determine if the module is using profiles.
#20060 merged Dec 3, 2024
[ifrt] Fix tag mismatch for xla::ifrt::CompileOptions.
#20087 merged Dec 3, 2024
Integrate Triton up to [9732c047](https://github.com/openai/triton/commits/9732c04701bd856daca89bde38bafa4636ca56a8)
#19687 merged Dec 3, 2024
[xla:collectives] NFC: Remove NcclCliqueKey alias
#19967 merged Dec 3, 2024
Internal CI/CD change
#20021 merged Dec 3, 2024
[XLA:GPU] Log the error if parsing of Triton IR from custom call fails.
#20076 merged Dec 3, 2024
[XLA:GPU][Emitters] Remove the complex.expm1 approximation.
#19519 merged Dec 3, 2024
[XLA:GPU] Don't allow fusing DUS with shared operands.
#20072 merged Dec 3, 2024
[XLA:GPU] Do not compute suggested combiner threshold if there are no pipelined collectives in IR.
#20007 merged Dec 3, 2024
[xla:collectives] NFC: Move NcclCliqueKey to GpuCliqueKey
#19968 merged Dec 3, 2024
Automated Code Change
#19976 merged Dec 3, 2024
[xla:collectives] NFC: Move CliqueIsCallback alias to gpu_executable_run_options
#19964 merged Dec 3, 2024
[XLA:GPU] Use Cub RadixSort for bf16 sorts in Numpy order (NaNs go last).
#19983 merged Dec 3, 2024
Integrate LLVM at llvm/llvm-project@bd92e4620433
#20061 merged Dec 3, 2024
[xla:collectives] NFC: Introduce GpuCollectives interface
#19965 merged Dec 3, 2024
PR #19571: PJRT: assign process index and count for compilation using device assignment.
#19946 merged Dec 3, 2024
[XLA][IndexAnalysis] Move indexing_analysis to hlo/analysis.
#20017 merged Dec 3, 2024
Add AssembleCompilationProvider routine
#19973 merged Dec 3, 2024
[xla:collectives] NFC: Rename NcclCommHandleWrapper to CommunicatorHandle
#19966 merged Dec 3, 2024
[xla:collectives] NFC: Remove NcclCliqueId alias
#19959 merged Dec 3, 2024
[xla:collectives] Add a base CliqueKey class to collectives component
#19960 merged Dec 3, 2024
[xla:collectives] NFC: Migrate XLA:GPU to strongly typed RankId
#19956 merged Dec 3, 2024
[xla:collectives] NFC: Add strongly typed collective RankId
#19955 merged Dec 3, 2024
Add a test to verify that the PJRT API struct is laid out as expected.
#19460 merged Dec 3, 2024
[XLA:CPU] Use ArrayTypeSwitch to avoid having to write out the switch
#20019 merged Dec 3, 2024
Remove overly restricted checking in RemapPlan::Validate()
#20027 merged Dec 3, 2024
[xla:collectives] NFC: Move NcclCliqueId to collectives/clique_id
#19954 merged Dec 3, 2024
Rollback of "Require packed dot operands to be packed along contracting dimension."
#20037 merged Dec 3, 2024
[XLA:LatencyHidingScheduler] Code Refactor (NFC).
#19452 merged Dec 3, 2024
Mark tsl/default:criticality as nobuilder
#20033 merged Dec 3, 2024
[xla:collectives] NFC: Add xla::Collectives API and prepare for NCCL implementation
#19895 merged Dec 2, 2024
Add scope range id for gpu event's annotation.
#18877 merged Dec 2, 2024
Add explicit GPU platforms to exhaustive tests
#18628 merged Dec 2, 2024
Integrate LLVM at llvm/llvm-project@1b03747ed85c
#20022 merged Dec 2, 2024
[XLA] Update documentation with CompositeCall operation.
#20020 merged Dec 2, 2024
[XLA:GPU] Minor reformat in collective_permute_cycle_decomposer.cc
#20015 merged Dec 2, 2024
Move test to public XLA:CPU test
#19997 merged Dec 2, 2024
Un-deprecate HloTestBase.
#20010 merged Dec 2, 2024
Integrate LLVM at llvm/llvm-project@21d27b3aabf3
#20008 merged Dec 2, 2024
[XLA:GPU] Support cross-replica cps in collective-permute decomposer
#19676 merged Dec 2, 2024
Add fine grained visibility rules to XLA
#20009 merged Dec 2, 2024
Add Megascale topology stat type.
#19814 merged Dec 2, 2024
Create stub filegroups in xla/tsl/platform corresponding to filegroups in tsl/platform
#19864 merged Dec 2, 2024
Move Internal XLA:CPU to public XLA:CPU API
#19999 merged Dec 2, 2024
Move Python XLA extension to public XLA:CPU API
#19995 merged Dec 2, 2024
Move test files to XLA:CPU public API
#19990 merged Dec 2, 2024
Move XLA Tests to XLA:CPU public PJRT API
#19993 merged Dec 2, 2024
Prepare custom call tests for API_VERSION_ORIGINAL removal
#19986 merged Dec 2, 2024
Support cross-replica send/recv on GPU
#19167 merged Dec 2, 2024
[xla:cpu] Replace Thunk::FunctionRegistry with FunctionLibrary
#19953 merged Dec 2, 2024
Integrate LLVM at llvm/llvm-project@2474cf7ad123
#20001 merged Dec 2, 2024
[XLA] Move hlo_traversal.h to hlo/utils.
#19992 merged Dec 2, 2024
Provide more CUDA diagnostic information
#19985 merged Dec 2, 2024
[XLA:CPU] Disable a subset of failing test cases in complex_unary_op_test on ARM.
#19988 merged Dec 2, 2024
[XLA:GPU] Fix typo in custom_kernel_fusion.h
#19987 merged Dec 2, 2024
PR #19655: [ROCm] Make MLIR Math dialect lowering more deterministic
#19984 merged Dec 2, 2024
Integrate LLVM at llvm/llvm-project@fe1c4f0106fe
#19980 merged Dec 2, 2024
Automated Code Change
#19978 merged Dec 2, 2024
[XLA:GPU] Clean up: reuse the LoadHloModuleAndArguments directly in multihost_hlo_runner utils.
#19920 merged Dec 2, 2024
PR #19927: [cuda] Warn about ptxas versions before CUDA 12.6.3
#19944 merged Dec 2, 2024
PR #19869: [XLA] Wraps the FFI backend config opaque string with CustomCallBackendConfig
#19901 merged Dec 2, 2024
Integrate LLVM at llvm/llvm-project@b22cc5a650de
#19975 merged Dec 2, 2024
Replace gpu_asm_extra_flags string option by individual flags
#19836 merged Dec 2, 2024
Automated Code Change
#19962 merged Dec 1, 2024
C++ tree with path API
#19366 merged Dec 1, 2024
[XLA:CPU] Use absl::call_once to lazily initialize kernel and comparator functions
#19949 merged Nov 30, 2024
[xla:cpu] NFC: Move FunctionLibrary from codegen to runtime
#19951 merged Nov 30, 2024
[xla:cpu] NFC: Consistently use llvm::CodeGenOptLevel to configure opt_level
#19950 merged Nov 30, 2024
[xla:cpu] NFC: Delete SimpleOrcJit
#19909 merged Nov 30, 2024
[XLA:GPU] Make the buffer comparison code more robust
#19948 merged Nov 30, 2024
Propagate profile generation strategy all the way to AutoGrappler
#19862 merged Nov 29, 2024
[xla:cpu] Migrate CpuCompiler from SimpleOrcJit to JitCompiler
#19945 merged Nov 29, 2024
[XLA:CPU] Mark vectorized_reduce_with_no_vector_registers_test as requiring x86_64
#19893 merged Nov 29, 2024
[XLA:GPU] Fix Triton support for dot(inf, 1.0) with TF32_TF32_F32_X3 algorithm
#19937 merged Nov 29, 2024
[XLA:GPU] Fail gracefully if call function was not found in the Triton module.
#19941 merged Nov 29, 2024
Integrate LLVM at llvm/llvm-project@59f57be94f38
#19943 merged Nov 29, 2024
[XLA:GPU] Set the SortSizeThreshold in tests to zero.
#19940 merged Nov 29, 2024
[XLA] Clarify the usage of IsAsyncCollective{Start,Done}Op.
#19936 merged Nov 29, 2024
PR #19925: [ROCm] Fix build break with gcc due to 53417984
#19938 merged Nov 29, 2024
[XLA:GPU] Add a check to emitters that dynamic-(update-)slice indexes are canonicalized.
#19929 merged Nov 29, 2024
[XLA:GPU] Clean up: remove redundant code. Express one Create method in terms of the other.
#19930 merged Nov 29, 2024
[XLA:GPU] Require packed dot operands to be packed along contracting dimension.
#19190 merged Nov 29, 2024
Remove no longer used hlo passes.
#19924 merged Nov 29, 2024
Reverts 6949b216cd2a413e4a1445104cd42ebbdca29d04
#19935 merged Nov 29, 2024
[xla:cpu] Migrate CpuCompiler from SimpleOrcJit to JitCompiler
#19907 merged Nov 29, 2024
Clarify index parallel dims in gather/scatter instructions.
#19826 merged Nov 29, 2024
Review PJRT integration document
#19202 merged Nov 28, 2024
[xla:cpu] Add support for linking with external symbols to JitCompiler
#19908 merged Nov 28, 2024
[xla:cpu] NFC: Construct JitCompiler in xla::cpu::CpuCompiler
#19906 merged Nov 28, 2024
[XLA:GPU] Add documentation for XLA:GPU emitters to ToC.
#19928 merged Nov 28, 2024
[xla:cpu] Enable parallel compilation using ORC TaskDispatcher
#19867 merged Nov 28, 2024
#sdy cleanup: remove unused pipeline creation function and add Sdy namespace to ops in ops.td.
#19891 merged Nov 28, 2024
[XLA:GPU][Emitters] Add documentation for the emitters.
#19923 merged Nov 28, 2024
[XLA:GPU] Add gpu_client_mem_fraction flag to the HLO runner
#19918 merged Nov 28, 2024
[XLA:GPU] Enable F32_F32_F32 algorithm for triton gemm fusion
#19914 merged Nov 28, 2024
[Triton] Fix bug where chain-dots with num_warps = 8 would lead to assertion failure in Linear Layouts.
#19917 merged Nov 28, 2024
Integrate LLVM at llvm/llvm-project@32ef417603e1
#19919 merged Nov 28, 2024
Reverts cd11046e5007508498f140df13ba40dc08b13822
#19916 merged Nov 28, 2024
Automated Code Change
#19911 merged Nov 28, 2024
[XLA:GPU] Match custom calls to Cub Sort before running expanders for totalorder and stability. Simplify SortRewriter.
#19766 merged Nov 28, 2024
PR #19890: Use /opt/rocm/ for ROCM_INSTALL_DIR environment variable
#19912 merged Nov 28, 2024
Integrate StableHLO at openxla/stablehlo@f21104d0
#19675 merged Nov 28, 2024
Automated Code Change
#19874 merged Nov 28, 2024
Put P2PSchedulePreparation behind enable_pipelined_p2p flag
#19246 merged Nov 28, 2024
[xla:cpu] NFC: Extract RuntimeSymbolGenerator into a separate library
#19846 merged Nov 28, 2024
[xla:cpu] Add a little bit of type safety for FunctionLibrary
#19849 merged Nov 28, 2024
Make channel_id optional for send/recv ops.
#19239 merged Nov 28, 2024
Add more flexible custom hermetic Python setup
#19689 merged Nov 28, 2024
[xla:cpu] Implement JitCompiler on top of LLVM ORC stack
#19848 merged Nov 28, 2024
[xla:cpu] NFC: Update IrCompiler::TargetMachineBuilder signature to return StatusOr and move default target machine builder to JitCompiler
#19850 merged Nov 28, 2024
Temporarily remove absl dll patch to fix jaxlib windows build.
#19870 merged Nov 28, 2024
[XLA:GPU] Fix tune_ctas in GemmFusionAutotunerImpl::GetExhaustiveTritonConfigs.
#19904 merged Nov 27, 2024
[XLA] Remove unused functions from llvm_util.
#19903 merged Nov 27, 2024
[xla:cpu] NFC: Move InferTargetMachine to JitCompiler
#19852 merged Nov 27, 2024
[IFRT] Modify IfrtArrayType builders to only accept types and attributes.
#19894 merged Nov 27, 2024
[XLA:GPU] Use DeviceDescription instead of hard-coding warp size as 32
#19794 merged Nov 27, 2024
[xla:cpu] NFC: Extract cpu features filtering and detection into a separate library
#19851 merged Nov 27, 2024
[XLA:CPU] Fix test failure for the cpu case on an internal test.
#19887 merged Nov 27, 2024
This CL updates HostTracer to use Traceme Filter mask from profiling request. Also add utility functions to prepare filter mask.
#19367 merged Nov 27, 2024
Reverts 00f3b02608e672e121f5cd31dba6a8027d64fe60
#19889 merged Nov 27, 2024
Switch out the base container image for XLA.
#19713 merged Nov 27, 2024
[XLA:GPU] Remove legacy reduction emitter code
#19720 merged Nov 27, 2024
Integrate LLVM at llvm/llvm-project@f67ba5855278
#19885 merged Nov 27, 2024
IFRT proxy: Additional debug logging and xprof tracemes.
#19888 merged Nov 27, 2024
[xla:cpu] NFC: Move CompilerFunctor to backends/cpu/codegen IrCompiler
#19853 merged Nov 27, 2024
PR #19026: [NVIDIA GPU] LHS enhancement for multiple collective resources
#19835 merged Nov 27, 2024
Integrate LLVM at llvm/llvm-project@b214ca82daee
#19883 merged Nov 27, 2024
PR #19754: [ROCm] Enable gemm fusion autotuner.
#19877 merged Nov 27, 2024
[XLA:GPU] Fix a stack use after scope in horizontal fusion
#19882 merged Nov 27, 2024
PR #18988: [WhileLoopAllReduceCodeMotion] Support convert and transpose ops in setup passes.
#19876 merged Nov 27, 2024
[XLA:CPU] Update RunHloBenchmark to enable running with HLO with inferred arguments
#19698 merged Nov 27, 2024
[XLA:GPU] Allow passing extra GPU client options to PjRt environment init
#19783 merged Nov 27, 2024
[XLA:GPU] Disabling auto layout in HLO for now.
#19879 merged Nov 27, 2024
Automated Code Change
#19753 merged Nov 27, 2024
[XLA:GPU] Move SortRewriter to an earlier point in the pass pipeline.
#19657 merged Nov 27, 2024
PR #18840: [NVIDIA] Support larger head dim for cudnn fmha
#19838 merged Nov 27, 2024
[IFRT] Modify IFRT <-> VIFRT legalization to support escaped SymbolRefAttr.
#19865 merged Nov 27, 2024
[IFRT] Add pass to remove attributes that are not from IFRT or Builtin dialects.
#19861 merged Nov 27, 2024
[XLA:GPU] Fix an ASAN error
#19816 merged Nov 27, 2024
[XLA:GPU] Remove unused function RowReductionGetRowsPerWarp.
#19719 merged Nov 27, 2024
[Cleanup] Use push_back instead of emplace_back where appropriate
#19800 merged Nov 27, 2024
[XLA:GPU] Fix a bug in horizontal input fusion sorting.
#19715 merged Nov 26, 2024
Move tsl/platform/{build_config,build_config_root,rules_cc}.bzl to xla/tsl/platform
#19718 merged Nov 26, 2024
Allowing reshape on int4's in XLA:GPU
#19858 merged Nov 26, 2024
[PjRt-IFRT] Add optional global device mapping support to PjRt-IFRT
#19638 merged Nov 26, 2024
[JAX] Add Python binding for building a colocated Python program
#19811 merged Nov 26, 2024
[hlo-opt] Add a placeholder method to register passes from the CPU/GPU providers.
#19805 merged Nov 26, 2024
When computing the set of instructions to shard in the presence of SPMDShardToFullShape and SPMDFullToShardShape custom calls, handle the case where parameters of a called computation may not flow to the roots of the computation.
#19807 merged Nov 26, 2024
[XLA] Alias ragged all-to-all output with operand 1.
#19812 merged Nov 26, 2024
[XLA:GPU] Mark collectives in formatting ops as pipelined.
#19647 merged Nov 26, 2024
[XLA:GPU] Add intra-warp reduce of reduce test.
#19840 merged Nov 26, 2024
[Triton] Restricting failing configurations on certain microbenchmarks. The failures are all CUDA_ERROR_ILLEGAL_ADDRESS which seem to occur with block_m=16 / block_n=16 with num_warps=16.
#19755 merged Nov 26, 2024
Integrate LLVM at llvm/llvm-project@c0192a008c4a
#19839 merged Nov 26, 2024
Add DeferRelocatableCompilationCompilationProvider
#19831 merged Nov 26, 2024
[XLA:GPU] Remove xla_gpu_enable_heuristic_pass_configuration flag.
#19149 merged Nov 26, 2024
Reverts 27352402f3c65c41e7c897138d6ad3e015a04014
#19837 merged Nov 26, 2024
Fix PropagateShardingAlongDimsAndReplicateOthers and expose it as a public util function.
#19825 merged Nov 26, 2024
[XLA:CPU] Create kernel API for cpu runtime
#19785 merged Nov 26, 2024
[xla:cpu] Move TargetMachineFeatures to xla/backends/codegen
#19821 merged Nov 26, 2024
Add PTX CompilationProvider for compiling PTX via the CUDA driver
#19777 merged Nov 26, 2024
Limit the number of stragglers we log to avoid RESOURCE_EXHAUSTED errors in the RPC layer from sending overly verbose errors.
#19823 merged Nov 26, 2024
Automated Code Change
#19760 merged Nov 26, 2024
Update target_config to be a text proto and populate it on the
#19818 merged Nov 26, 2024
Automated Code Change
#19742 merged Nov 26, 2024
[IFRT] Add IFRT IR program SerDeRoundTrip helper method for tests.
#19815 merged Nov 26, 2024
Adds a sharding config to XLA's HloModuleConfig (as part of AutoFDO integration).
#19417 merged Nov 26, 2024
Create op_metircs_to_record to deal with Roofline Analysis
#19817 merged Nov 26, 2024
[XLA:SPMD] Fix a bug in PartitionGatherTrivialSlicedOperandDimensions.
#19806 merged Nov 26, 2024
[Cleanup] Use push_back instead of emplace_back where appropriate
#19801 merged Nov 26, 2024
Relocates all ShardingConfig <--> ShardingConfigProto conversion from platforms/ to third_party/.
#19808 merged Nov 26, 2024
[Cleanup] Use push_back instead of emplace_back where appropriate
#19799 merged Nov 26, 2024
Reverts 783d6c98e36b7d7cdabeb11b34b6c3d88e716e74
#19804 merged Nov 25, 2024
[IFRT] Add VIFRT serialization python bindings.
#19791 merged Nov 25, 2024
Reverts 046f3dc59a0c67a0ce144cc01a5af5aeac58977c
#19803 merged Nov 25, 2024
[xla:cpu] Move TargetMachineFeatures to xla/backends/codegen
#19756 merged Nov 25, 2024
Relocates ShardingConfig's nested proto definition from platforms/ to third_party/.
#19379 merged Nov 25, 2024
[Cleanup] Use push_back instead of emplace_back where appropriate
#19797 merged Nov 25, 2024
[xla:cpu] NFC: Extract XLA:CPU alignment requirements into a separate library
#19798 merged Nov 25, 2024
Integrate LLVM at llvm/llvm-project@c9e606b9cf50
#19795 merged Nov 25, 2024
[xla:cpu] NFC: Move VectorSupportLibrary to VectorIrBuilder in backends/cpu/codegen
#19793 merged Nov 25, 2024
Create a simplified cost interface that can be used in various components.
#19377 merged Nov 25, 2024
[xla:cpu] NFC: Move polynomial approximations for common math functions to xla cpu codegen folder
#19736 merged Nov 25, 2024
Clean up dependencies for matmul_utils.h.
#19789 merged Nov 25, 2024
[Triton] Cherry-picking github.com/triton-lang/triton/commit/35f1827581071a5ac3a385f8776ab1a3a784811a to fix correctness issues. This should be ported in eventually in the next Triton integration.
#19784 merged Nov 25, 2024
Add a test in hlo_evaluator_test to demonstrate how to obtain diagonal from a matrix.
#19637 merged Nov 25, 2024
[xla:cpu] NFC: Extract ContiguousSectionMemoryManager into a separate library
#19787 merged Nov 25, 2024
[xla:cpu] Initial commit for LlvmOrcJitCompiler
#19786 merged Nov 25, 2024
Delete unused xla/status.h.
#19708 merged Nov 25, 2024
Bump C64 Log tolerance in exhaustive_unary_complex_test
#19782 merged Nov 25, 2024
Add CachingCompilationProvider
#19771 merged Nov 25, 2024
Integrate LLVM at llvm/llvm-project@f81f47e3ff29
#19781 merged Nov 25, 2024
[XLA:GPU] Make SortRewriter VLOG level 2 less chatty.
#19778 merged Nov 25, 2024
Add CompositeCompilationProvider
#19769 merged Nov 25, 2024
PR #19775: [PJRT:GPU] Fix device numbering in topology creation
#19776 merged Nov 25, 2024
Add compilation provider for libnvptxcompiler
#19763 merged Nov 25, 2024
[XLA:GPU] Add support for BF16_BF16_F32[_X3,X6] dot precision algorithm in algebraic simplifier.
#19768 merged Nov 25, 2024
Reverts 885378495882835e7ddfbea137d0677924604fee
#19772 merged Nov 25, 2024
[XLA:GPU] Skip cub sort on failing types on H100
#19773 merged Nov 25, 2024
Add CompilationProvider for libnvjitlink
#19762 merged Nov 25, 2024
Enable sort_rewriter_test in OSS
#19765 merged Nov 25, 2024
Automated Code Change
#19752 merged Nov 25, 2024
[Upkeep][XLA-Code-Health] Resolve 4 instances of the following issue: Todo (resolved)
#19518 merged Nov 24, 2024
Integrate LLVM at llvm/llvm-project@2fe947b47798
#19721 merged Nov 23, 2024
Delete unused xla/statusor.h.
#19707 merged Nov 23, 2024
Move BatchedGatherScatterNormalizer from pre-SPMD to post-SPMD.
#19415 merged Nov 23, 2024
internal visibility change
#19723 merged Nov 23, 2024
MoveUserInstructionsIn cannot handle conditional operations with array output and multiple users. It may trigger a compilation error, as in the added test target.
#19727 merged Nov 23, 2024
Add target_config as an optional field of
#19726 merged Nov 23, 2024
[xla:collectives] NFC: Remove communicator aliases from NcclApi
#19724 merged Nov 23, 2024
Remove absl::Nonnull from AbslStringify
#19722 merged Nov 23, 2024
Update target_config to be a text proto and populate it on the
#19710 merged Nov 23, 2024
Use absl::Nonnull to indicate that sharding in xla::ifrt::ArraySpec cannot be null
#19717 merged Nov 22, 2024
[Code-Health] Resolve the following technical debt issue: Todo(resolved)
#19483 merged Nov 22, 2024
[xla:collectives] Remove unused CommDestroy
#19714 merged Nov 22, 2024
[xla:collectives] Use NcclCommunicator in NcclApi implementation
#19712 merged Nov 22, 2024
[IFRT] Add VIFRT pass for converting between VIFRT versions.
#19711 merged Nov 22, 2024
Integrate LLVM at llvm/llvm-project@556ea5265a25
#19701 merged Nov 22, 2024
[XLA:GPU] remove channel ID checks in hlo_instructions.cc
#19408 merged Nov 22, 2024
[xla:cpu] Add JitCompiler and FunctionLibrary APIs for XLA:CPU codegen
#19702 merged Nov 22, 2024
[xla:cpu] Add a KernelRunner API to codegen testlib and sketch a test for XLA:CPU
#19703 merged Nov 22, 2024
Further lower threshold for F64 in //xla/service/gpu/model:hlo_op_profiler_test
#19709 merged Nov 22, 2024
[xla:collectives] Initial xla/core/collectives component commit
#19680 merged Nov 22, 2024
PR #19660: [ROCm] switch rocm build to clang
#19705 merged Nov 22, 2024
Stop using xla/statusor.h in favor of absl/status/statusor.h directly.
#19704 merged Nov 22, 2024
Fix two issues in PartitionScatterIndexPassthroughDimensions.
#19688 merged Nov 22, 2024
[XLA] Go back to using a glob for including dialects in the mlir_interpreter.
#19697 merged Nov 22, 2024
Integrate LLVM at llvm/llvm-project@a12e79a85fc1
#19692 merged Nov 22, 2024
#sdy Refactor xla-sdy-mhlo-round-trip-shard-map-export from a ConversionPattern to a walk.
#19658 merged Nov 22, 2024
[XLA:GPU] Fusion tests don't seem to require A100, so replace tag.
#19640 merged Nov 22, 2024
[xla:cpu] Add a benchmark for creating zero-copy PjRt buffer
#19670 merged Nov 22, 2024
[xla:cpu] NFC: Remove ExecuteState alias from Thunk
#19659 merged Nov 22, 2024
PR #19577: Cleanup handling of 2 fields of ExecutableBuildOptions.
#19651 merged Nov 22, 2024
PR #19346: Bumped rules_python version to 0.39.0
#19500 merged Nov 22, 2024
PR #19679: [XLA:CPU][oneDNN] Relocate Addend Shape Validation to the Contraction Rewriter
#19686 merged Nov 22, 2024
[XLA] Propagate original_value when instructions are replaced in X64Rewriter
#19639 merged Nov 22, 2024
PR #19656: Fix implicit index handling in ScatterDeterminismExpander
#19683 merged Nov 22, 2024
Change parameter type in LinkUsingNvlink
#19682 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in command_buffer_scheduling.cc
#19599 merged Nov 22, 2024
Add a simple test for the symbol_finder
#19665 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in collective_send_recv_combiner.cc
#19609 merged Nov 22, 2024
Move LinkGpuAsm into separate file
#19642 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in collective_select_folder.cc
#19592 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in collective_permute_valid_iteration_annotator.cc
#19584 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in collective_permute_cycle_decomposer.cc
#19596 merged Nov 22, 2024
Move ptxas/nvlink compilation into separate compilation unit
#19641 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in async_wrapper_test.cc
#19590 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in async_wrapper.cc
#19622 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in all_reduce_blueconnect.cc
#19597 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in all_gather_optimizer.cc
#19583 merged Nov 22, 2024
Remove obsolete PjRtClient::AsyncSendPlaceholder API.
#19678 merged Nov 22, 2024
Move tsl/platform/profile_utils to xla/tsl/platform/profile_utils
#19674 merged Nov 22, 2024
Set implicitTrunc on APInt creation
#19677 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in all_gather_dynamic_slice_simplifier.cc
#19629 merged Nov 22, 2024
[IFRT] Implement BytecodeDialectInterface for VIFRT.
#19672 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in alias_passthrough_params.cc
#19585 merged Nov 22, 2024
Add batch tests to RemapArrays, and with different shapes.
#19442 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_transpose_fusion.cc
#19575 merged Nov 22, 2024
Stop using AsGpuStreamValue in gpu_cudamallocasync_allocator_test.
#19667 merged Nov 22, 2024
PR #16901: [XLA:GPU] Fix default device mesh for auto sharding
#19294 merged Nov 22, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter_test.cc
#19574 merged Nov 21, 2024
Eliminate static_casts in GpuCommandBuffer.
#19666 merged Nov 21, 2024
Move jax visibility inside internal_visibility call
#19566 merged Nov 21, 2024
[XLA:CollectivePipeliner-Sinking] Stop pipelining iterations if a large sunk collective is encountered.
#19526 merged Nov 21, 2024
Refactor GetGatherScatterBatchParallelDims. No behavior change.
#19581 merged Nov 21, 2024
[xla:codegen] Add a testonly KernelEmitter for testing XLA:CPU kernels
#19668 merged Nov 21, 2024
Revert: [XLA:GPU] Enable Triton normalization fusions by default.
#19662 merged Nov 21, 2024
[XLA:GPU] Dump the failing HLO fusion to a file when Triton numerics verification fails.
#19664 merged Nov 21, 2024
Cleanup. Merge GatherScatterParallelDims into GatherScatterDims.
#19572 merged Nov 21, 2024
Make the implementation of GetXlaPjrtTpuClient more similar to how Jax uses PJRT.
#19568 merged Nov 21, 2024
Reverts 06e8ef8ec494877e459ff357f9479613a472c67b
#19563 merged Nov 21, 2024
Add backend kwargs to xla tests.
#19561 merged Nov 21, 2024
Clarify the dimensions in gather/scatter dimensions. The following dimensions do NOT overlap. These dims are processed separately in spmd partitioner.
#19465 merged Nov 21, 2024
Set implicitTrunc on APInt creation
#19565 merged Nov 21, 2024
Remove unused GpuAsmOpts parameter from RedzoneAllocator
#19533 merged Nov 21, 2024
[tsl:concurrency] Fix asan error in CountDownAsyncValueRef
#19661 merged Nov 21, 2024
Remove static_casts in implementations of SetNodeExecutionEnabled.
#19573 merged Nov 21, 2024
[xla:cpu] Add a test for nanort executable with temp storage
#19580 merged Nov 21, 2024
[XLA:GPU] Change ConstraintExpression to use operator||/&& which return a new instance.
#19548 merged Nov 21, 2024
PR #19578: [doc] Fix a link to a page in the table of contents.
#19645 merged Nov 21, 2024
[xla:cpu] Optimize buffer allocations construction from se::DeviceMemoryBase
#19487 merged Nov 21, 2024
Remove remnants of GpuDriver
#19544 merged Nov 21, 2024
Refactor PjRt environment initialization to have clearer data flow
#19553 merged Nov 21, 2024
PR #18407: Fix xla-mlir failures on Windows
#19650 merged Nov 21, 2024
Integrate LLVM at llvm/llvm-project@33fcd6acc755
#19542 merged Nov 21, 2024
[XLA:GPU] Remove KernelFusionEmitterBase.
#19118 merged Nov 21, 2024
[XLA:GPU] Enable Triton normalization fusions by default.
#19569 merged Nov 21, 2024
Remove CUDA 12.1 workaround from reduction logic
#19536 merged Nov 21, 2024
Remove :cuda_runtime and :rocm_runtime targets
#19541 merged Nov 21, 2024
Use fast version of log if type is F16 or BF16.
#19535 merged Nov 21, 2024
[XLA:GPU] Use DeviceDescription instead of GetDriverVersion in NVPTXCompiler
#19539 merged Nov 21, 2024
[XLA:MSA] Allow more flexible filtering when picking instruction to schedule after/before for prefetch time override.
#19490 merged Nov 21, 2024
[XLA:TPU:MSA] Remove redundant checks for cross_program_prefetches in memory_space_assignment tests.
#18130 merged Nov 21, 2024
[xla:cpu] Use CountDownAsyncValueRef in HostKernel state
#19579 merged Nov 21, 2024
[Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19488 merged Nov 21, 2024
[xla:cpu] Resolve constant buffers
#19508 merged Nov 21, 2024
[xla:cpu] Resolve arguments/results/temp mapping from buffer assignment
#19504 merged Nov 21, 2024
Move tsl/platform/{cloud,default,windows} to xla/tsl/platform
#19323 merged Nov 21, 2024
[XLA:TPU:MSA]
#19491 merged Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter.cc
#19576 merged Nov 21, 2024
[XLA:GPU] Remove RewriteReductionsPass
#19570 merged Nov 21, 2024
Adding step to constant_value and add support for multiplication while recursively calculating the range of an expression.
#19558 merged Nov 21, 2024
[XLA:MSA] Fixes a bug in GetInefficientAllocationSites(allocation_values). The function was previously assuming allocation_values can never be empty.
#19365 merged Nov 20, 2024
Remove unused functions from ir_emission_utils.cc
#19513 merged Nov 20, 2024
[IFRT] Add pass to legalize VIFRT into IFRT.
#19523 merged Nov 20, 2024
[NFC] hlo_op_profiler_test: Internal testing change.
#19562 merged Nov 20, 2024
[IFRT] Fix signature of CreateIfrtVerifyDonationPass
#19564 merged Nov 20, 2024
IFRT proxy optimization: Make more IFRT operations asynchronous.
#19440 merged Nov 20, 2024
Add backend_kwargs to XLA tests config.
#19560 merged Nov 20, 2024
[XLA:TPU:MSA] Refactor some utility functions from algorithm and buffer_interval_comparator into msa/utils.
#18131 merged Nov 20, 2024
[XLA:GPU] Use ShuffleOp to reverse the order of elements in a vector.
#19486 merged Nov 20, 2024
Remove unused gpu_types.h include from nccl_collective_thunk.cc
#19545 merged Nov 20, 2024
[XLA:GPU] Use HloPredicateIsOp in collective_select_folder
#19525 merged Nov 20, 2024
PR #19237: [GPU] Fix passing of key-value store handle from client to compiler.
#19254 merged Nov 20, 2024
[tsl:concurrency] Keep AsyncValueRef a part of CountDownAsyncValueRef State
#19522 merged Nov 20, 2024
Fix comments in convolution_test_1d.cc
#19529 merged Nov 20, 2024
[XLA:CPU] Add benchmarks for 2D strided convolutions
#19434 merged Nov 20, 2024
Refactor exhaustive_test_main into a separate library target
#19510 merged Nov 20, 2024
[Code-Health] Resolve the following technical debt issue:
#19485 merged Nov 20, 2024
[Code-Health] Resolve the following technical debt issue:
#19511 merged Nov 20, 2024
[XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19509 merged Nov 20, 2024
[Code-Health] Resolve the following technical debt issue: Todo(resolved) in CUDA BUILD file.
#19489 merged Nov 20, 2024
[tsl] CountDownAsyncValueRef: enforce memory ordering around fetch_sub
#19551 merged Nov 20, 2024
[xla:cpu] Add initial implementation of NanoRt backends for XLA:CPU
#19405 merged Nov 20, 2024
Do not allow overlap between explicit and implicit batching dims in gather/scatter instructions. Implicit batching dims are also known as index parallel dims.
#19464 merged Nov 20, 2024
[XLA:GPU] Adjust GetNumWarps heuristic in Tiled Cost Model.
#19540 merged Nov 20, 2024
Merge sparsity_layout.patch into sparse_dot.patch
#19547 merged Nov 20, 2024
Remove unused and add used headers in hlo_runner_main and create_client
#19546 merged Nov 20, 2024
[XLA:ALGEBRAIC_SIMPLIFIER] Turn constant all-gather into broadcast
#19502 merged Nov 20, 2024
[Triton] Restrict block_m to be > 16 in the GEMM autotuner to resolve CUDA_ERROR_ILLEGAL_ADDRESS in (micro)benchmarks with FP8 Triton kernels during exhaustive autotuning.
#19437 merged Nov 20, 2024
Use OpTrait::DotLike to identify dot-like operations
#19142 merged Nov 20, 2024
[TritonGPU] Add DotLike trait to SparseDotOp
#19000 merged Nov 20, 2024
PR #19393: [GPU] Horizontal loop fusion: pass bitcasts when looking for fusion candidates.
#19527 merged Nov 20, 2024
PR #19363: Loop Counter Increment in Collective Pipeliner
#19532 merged Nov 20, 2024
Move SparseDotMetaEncodingAttr inside xla
#18436 merged Nov 20, 2024
[XLA:GPU] Combine pipelined instructions as much as possible by default.
#19148 merged Nov 20, 2024
delete hlo-legalize-to-memref-unranked.mlir
#19534 merged Nov 20, 2024
Remove unused GpuAsmOpts parameter from Cholesky and TriangularSolveThunks
#19531 merged Nov 20, 2024
Make g_trace_filter_bitmap atomic to avoid race across threads.
#19507 merged Nov 20, 2024
Remove custom compilation call from DynamicSharedMemoryTest
#19530 merged Nov 20, 2024
Account for optional channel ID in send/recv error message
#19325 merged Nov 20, 2024
Add a pattern matcher for ragged dot HLO.
#18675 merged Nov 20, 2024
[XLA:SPMD] Add HLO annotation to disable collective matmul in SPMD.
#19382 merged Nov 20, 2024
Cleanup. Refactor GetGatherScatterBatchParallelDims. No behavior change.
#19520 merged Nov 20, 2024
[Upkeep][XLA-Code-Health] Resolve the following technical debt issue: Todo(resolved)
#19517 merged Nov 20, 2024
Stop using gpu_types.h where it's not needed.
#19512 merged Nov 20, 2024
Remove dead ShapeContainsToken in HLO verifier
#19326 merged Nov 20, 2024
Reverts c6f26d4efa1ac071a1b39c9a81dae6214da22be3
#19515 merged Nov 20, 2024
Remove unneeded use of gpu_types.h in topk_kernel_test.cc.
#19493 merged Nov 20, 2024
[IFRT] Legalize IFRT dialect into VIFRT dialect.
#19506 merged Nov 19, 2024
Move some tests to public XLA:CPU API
#19501 merged Nov 19, 2024
Add backend kwargs to xla tests.
#19503 merged Nov 19, 2024
[Upkeep][XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19495 merged Nov 19, 2024
Remove unneeded xla:statusor dependency.
#19496 merged Nov 19, 2024
Move stable hlo compile test to XLA:CPU public API
#19498 merged Nov 19, 2024
Replace MockGpuExecutor with MockStreamExecutor in the only use.
#19449 merged Nov 19, 2024
Refactor TF wheel build rule, common python rules and flag names.
#19409 merged Nov 19, 2024
Integrate LLVM at llvm/llvm-project@b03a747fc0fc
#19481 merged Nov 19, 2024
Add terminology page
#19270 merged Nov 19, 2024
[XLA:GPU] Allow auto layout in multihost HLO runner.
#19345 merged Nov 19, 2024
Reverts e0c3ce3f243cac08aeb22a1589587a47ba767501
#19480 merged Nov 19, 2024
PR #19426: [ROCm] Disable gpu_too_many_blocks_test for rocm
#19471 merged Nov 19, 2024
Integrate LLVM at llvm/llvm-project@6e1acdcdc1b3
#19470 merged Nov 19, 2024
[XLA:GPU] Consolidate HS flags for exec_time_optimization_effort >= 0.2.
#19147 merged Nov 19, 2024
PR #19432: [GPU] Sharded autotuning: exchange only results of the latest compilation.
#19433 merged Nov 19, 2024
PR #17593: [ROCm] Include clang-19 and clang-20 headers
#19291 merged Nov 19, 2024
[XLA:GPU] Eagerly free temporary constants created during constant folding to reduce peak heap memory.
#19369 merged Nov 19, 2024
[XLA:GPU] Plumb through AppendPipelinedInstruction.
#19106 merged Nov 19, 2024
PR #19372: [GPU] Consider small kInput fusions with concatenations in the horizontal loop fusion pass.
#19423 merged Nov 19, 2024
Better encapsulation of HloModuleConfig's fields through setters and returning references instead of pointers.
#19151 merged Nov 19, 2024
Cleanup. Refactor gather_scatter_handler. Remove unused code. Replace sort with stable_sort.
#19454 merged Nov 19, 2024
[xla:codegen] First version of KernelEmitter and KernelSpec APIs for shared codegen
#19458 merged Nov 19, 2024
Update XLA's warnings.bazelrc
#19453 merged Nov 18, 2024
Push xla test 'main' deps to leaf test targets
#19443 merged Nov 18, 2024
[tsl:concurrency] Add CountDownAsyncValueRef to concurrency library
#19403 merged Nov 18, 2024
Update gpu_test_kernels_fatbin to use runfiles
#19439 merged Nov 18, 2024
Reverts 82da46162c824df74e78e0ed20a19ff33542b283
#19447 merged Nov 18, 2024
Change launch_id in HLO runner to be 1-based.
#19445 merged Nov 18, 2024
Support explicit batch dimensions for gather/scatter in HLO evaluator.
#19400 merged Nov 18, 2024
[StableHLO] Add VhloDialect to the dependent dialects of vhlo-to-version
#19441 merged Nov 18, 2024
PR #19313: [nfc] Cleaning up dynamic slice fusion
#19402 merged Nov 18, 2024
Style improvement: use a common method to create ArraySpec
#19418 merged Nov 18, 2024
[XLA:GPU] Add xla_experimental_exec_time_optimization_effort flag.
#19111 merged Nov 18, 2024
[XLA:GPU] Delete some unnecessary asterisks from SoftmaxRewriterTriton's IsSupportedBroadcastOfParameter.
#19318 merged Nov 18, 2024
Integrate LLVM at llvm/llvm-project@64c455077abe
#19422 merged Nov 18, 2024
IFRT proxy: asynchronous and faster MakeArrayFromHostBuffer
#19407 merged Nov 17, 2024
Create HloUnaryInstruction to support result_accuracy for certain unary functions.
#19353 merged Nov 16, 2024
Move new field to end of struct PJRT_Api and update struct size accordingly.
#19401 merged Nov 16, 2024
Disable scatter_determinism_expander as it causes failure on internal test.
#19412 merged Nov 16, 2024
Bring down_cast function to ::tsl namespace
#19404 merged Nov 16, 2024
Move GetStartIndicesDimsToOutputDims to gather_scatter_utils.h
#19399 merged Nov 16, 2024
[StableHLO] Don't require inlining for shape refinement.
#18957 merged Nov 16, 2024
[XLA:MSA] Adding two debugging functions for memory space assignment to facilitate reproduction of production bugs in small tests through steering decisions at two key points in the MSA pass flow, before and after AllocateSegment() call:
#19364 merged Nov 15, 2024
PR #19116: [XLA:CPU] [oneDNN] Refactoring oneDNN Memory Util for Custom Call oneDNN Thunk Runtime Support
#19121 merged Nov 15, 2024
[XLA:MSA] Refactoring: moves AllocationValue, AllocationRequest, Result, and AliasedOffset classes under allocation_value.h. Result is renamed to AllocationResult.
#19371 merged Nov 15, 2024
Add new workaround target due to https://github.com/bazelbuild/bazel/issues/21519
#19398 merged Nov 15, 2024
Temporarily exclude xla/tsl from buildifier checks
#19397 merged Nov 15, 2024
Integrate LLVM at llvm/llvm-project@b3134fa23383
#19385 merged Nov 15, 2024
kAsyncStart is missing from CalculatePostOrderScheduleHelper() which will
#19358 merged Nov 15, 2024
PR #18825: [GPU] GEMM fusions: let fusing effective parameters and their broadcasts in the epilogues.
#19387 merged Nov 15, 2024
[IFRT] Ensure that VIFRT td file structure matches that of IFRT dialect.
#19396 merged Nov 15, 2024
PR #19275: [NVIDIA] Add fixes for supporting determinism expander for high-dimensional scatter operation and a flag to disable it
#19384 merged Nov 15, 2024
Handle ragged dot in precision config methods.
#19376 merged Nov 15, 2024
Both the xla_internal_test_main library and the gunit_main library define the symbol main.
#19320 merged Nov 15, 2024
#sdy unskip test and change to correct result now that we resolve real conflicts.
#19394 merged Nov 15, 2024
Move IFRT lib to XLA:CPU Public API
#19392 merged Nov 15, 2024
[Recoverable] Allow certain tasks to gracefully reconnect without crashing the entire cluster during transient errors (i.e. preemption).
#19238 merged Nov 15, 2024
Add IndexCastUIOp to AxisInfoAnalysis.
#19349 merged Nov 15, 2024
Move IFRT integration tests to public XLA:CPU API
#19391 merged Nov 15, 2024
[XLA] Remove redundant transposes early on
#19383 merged Nov 15, 2024
Reverts 06e520756a5d9288d8b689fe387acab810bc3de8
#19350 merged Nov 15, 2024
[XLA:GPU] Use namespace aliases for mlir::mhlo and mlir::arith in triton_fusion_emitter_legacy_matmul.cc
#19389 merged Nov 15, 2024
[XLA] Fix a race in SlowOperationAlarm and add a test
#19386 merged Nov 15, 2024
Convert any compute on host memory into host compute, including dynamic-slice.
#18624 merged Nov 15, 2024
[XLA:CPU] Add CPU scatter benchmarks
#19307 merged Nov 15, 2024
Modernize API for cuda_asm_compiler functions
#19303 merged Nov 15, 2024
[XLA:GPU] Fix bug related to usage of DynamicPadder pass.
#19295 merged Nov 15, 2024
[XLA:GPU] Support auto layouts in StreamExecutor PjRT path when using HLO as input
#19340 merged Nov 15, 2024
PR #16775: Add test for EmitReducePrecisionIR
#19381 merged Nov 15, 2024
[XLA] Add an option to disable verifier in HloExtractor
#19378 merged Nov 15, 2024
Fix AlgebraicSimplifier so that it does not eliminate host offloading copies.
#19317 merged Nov 15, 2024
PR #19112: [GPU] GEMM fusion: support more broadcasts.
#19247 merged Nov 15, 2024
PR #19342: [ROCm] Skip unsupported tests in dot_algorithms_test
#19343 merged Nov 15, 2024
Simplify MSA's BaseCosts api by removing the BytesPerSecond() method, and instead relying on CostAnalysisOptions that specify the bandwidth.
#19241 merged Nov 15, 2024
Move profiler plugin functions to a separate pybind11 module
#19234 merged Nov 15, 2024
Move layout changing and dtype changing copies out of host memory space
#19268 merged Nov 15, 2024
[XLA] Ensure infeed/outfeed ordering
#19274 merged Nov 15, 2024
Moves our C++ ShardingConfig to third_party/ so that it can be used by hlo_module_config.h, etc.
#19362 merged Nov 15, 2024
Add PJRT_Buffer_CopyRawToHost to PJRT C API.
#19266 merged Nov 14, 2024
Integrate LLVM at llvm/llvm-project@03730cdd3d10
#19357 merged Nov 14, 2024
PR #79955: Update the curl dependency: 8.6.0 -> 8.11.0.
#19352 merged Nov 14, 2024
In a previous CL to cache the results of GetMeshDimPermutationOrderInShardingSpec, the hash and equality functions used for the cache key were incompatible. This CL fixes that.
#19361 merged Nov 14, 2024
Fix formatting in developer_guide.md
#19348 merged Nov 14, 2024
Add filter functionality to TraceMeRecorder to filter events based on filter parameter.
#19359 merged Nov 14, 2024
gemm_rewriter_test: Split and optimize to allow passing in coverage mode.
#19324 merged Nov 14, 2024
Relax the requirement of reusing remap so that it can take multiple inputs as long as each output is mapped from only one input.
#19309 merged Nov 14, 2024
Disable libnvjitlink by default in OSS.
#19355 merged Nov 14, 2024
Remove unneeded caching of parent Executor in RocmCommandBuffer class.
#19356 merged Nov 14, 2024
Remove TODOs associated with fixed bugs.
#19347 merged Nov 14, 2024
[IFRT] Add ifrt-translate mlir tool for verifying dialect conversions.
#19329 merged Nov 14, 2024
Make gpu_executor.h only used by RocmExecutor and CudaExecutor.
#19280 merged Nov 14, 2024
Integrate LLVM at llvm/llvm-project@97298853b4de
#19333 merged Nov 14, 2024
Reverts 2a7890387f812c17fb5f17eec961ee52ac3e059d
#19337 merged Nov 14, 2024
PR #18248: cuda 12.6.2
#19293 merged Nov 14, 2024
Remove unused variable (NFC).
#19332 merged Nov 14, 2024
[XLA:CPU] Add benchmarks for 1D strided convolutions
#19261 merged Nov 14, 2024
Enable libnvjitlink by default in OSS
#19257 merged Nov 14, 2024

330 Pull requests opened by 17 people

PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#19335 opened Nov 14, 2024
Integrate LLVM at llvm/llvm-project@627b8f87e2c4
#19341 opened Nov 14, 2024
Add S4/U4 support for Reshape.
#19351 opened Nov 14, 2024
Reverts 27b4c50d002c85311e5ca9ba2b6089df70c359b3
#19354 opened Nov 14, 2024
PR #16775: Add test for EmitReducePrecisionIR
#19370 opened Nov 14, 2024
HostOffloader: use HloInstructionSet/HloInstructionMap instead of absl::flat_hash_set/absl::flat_hash_map.
#19373 opened Nov 15, 2024
Integrate LLVM at llvm/llvm-project@b3134fa23383
#19374 opened Nov 15, 2024
[XLA] Move ScopedLoggingTimerAndTraceMe into OSS level
#19375 opened Nov 15, 2024
[PJRT C API] Move PJRT_Buffer_CopyRawToHost to end of PjRT API struct. Update struct size accordingly.
#19395 opened Nov 15, 2024
Reverts 590b36f89d8cb038e9e3929aeaea6e60451ef3fc
#19406 opened Nov 15, 2024
Add GPU topology proto python target.
#19410 opened Nov 16, 2024
Integrate LLVM at llvm/llvm-project@64c455077abe
#19411 opened Nov 16, 2024
Fix issues on explicit batch dimensions in xla sharding propagation and spmd partitioner.
#19414 opened Nov 16, 2024
to be removed
#19421 opened Nov 18, 2024
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#19424 opened Nov 18, 2024
Mess-up the API.
#19446 opened Nov 18, 2024
In progress. Not ready for review.
#19448 opened Nov 18, 2024
Integrate LLVM at llvm/llvm-project@6e1acdcdc1b3
#19450 opened Nov 18, 2024
PR #19451: Setting xla_gpu_multi_streamed_windowed_einsum to true by default
#19455 opened Nov 18, 2024
In progress. Not ready for review. Approach 2.
#19456 opened Nov 19, 2024
Remove unused `tf_additional_core_deps`
#19457 opened Nov 19, 2024
Add ExplicitStreamAnnotationAsyncWrapper pass
#19462 opened Nov 19, 2024
PR #19429: Fix deterministic scatter expander pass and re-enable it by default
#19466 opened Nov 19, 2024
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#19473 opened Nov 19, 2024
Reverts 06e8ef8ec494877e459ff357f9479613a472c67b
#19474 opened Nov 19, 2024
Enable some tests for float8 types in elemental_ir_emitter_test.cc
#19477 opened Nov 19, 2024
[XLA:GPU] Support bf16 for sort
#19479 opened Nov 19, 2024
Switch JAX to PJRT C API, for GPU.
#19492 opened Nov 19, 2024
[Upkeep][XLA-Code-Health] Resolve 3 instances of the following issue: Todo (resolved)
#19494 opened Nov 19, 2024
Split out MemorySpaceAssignmentTest class for re-use.
#19497 opened Nov 19, 2024
Integrate LLVM at llvm/llvm-project@68b7ab127f58
#19505 opened Nov 19, 2024
[Upkeep][XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19514 opened Nov 19, 2024
Remove `py_import.bzl` and `verify_manylinux_compliance` from export after https://github.com/openxla/xla/commit/c6f26d4efa1ac071a1b39c9a81dae6214da22be3#diff-58867c8fba0bb0782bc1a4ef7e3f3c2727706be99debc019c256640ff720b3d7
#19516 opened Nov 20, 2024
Assert same channel ID if present
#19521 opened Nov 20, 2024
PR #19451: Remove xla_gpu_multi_streamed_windowed_einsum
#19537 opened Nov 20, 2024
[JAX] Ignore process index when calculating module hash.
#19550 opened Nov 20, 2024
[tsl:concurrency] NFC: Align CountDownAsyncValueRef atomics to cache line boundary
#19554 opened Nov 20, 2024
Integrate LLVM at llvm/llvm-project@68b7ab127f58
#19556 opened Nov 20, 2024
Enable a test
#19557 opened Nov 20, 2024
Noop in OSS
#19559 opened Nov 20, 2024
Bump rules_python version to 0.39.0
#19567 opened Nov 20, 2024
Revert `edf18ce` and fix launch dimension triplet
#19582 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in dot_operand_converter.cc
#19586 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in conv_rewriter.cc
#19587 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in softmax_rewriter_triton_test.cc
#19588 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in layout_assignment.cc
#19589 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in dot_dimension_sorter.cc
#19591 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_norm_rewriter.cc
#19593 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in fusion_block_level_rewriter_test.cc
#19594 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in horizontal_loop_fusion.cc
#19595 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in triton_fusion_numerics_verifier.cc
#19598 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in rename_fusions.cc
#19600 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_vectorize_convolutions.cc
#19601 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in scheduling_instruction_annotator.cc
#19602 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in pipelined_p2p_rewriter.cc
#19603 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in gpusolver_rewriter.cc
#19604 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in multi_output_fusion.cc
#19605 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in variadic_op_splitter.cc
#19606 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fusion_compiler.cc
#19607 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in schedule_postprocessing.cc
#19608 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in gemm_rewriter.cc
#19610 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in double_buffer_loop_unrolling_test.cc
#19611 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in stream_attribute_annotator.cc
#19612 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in horizontal_loop_fusion_test.cc
#19613 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in dynamic_slice_fusion_rewriter.cc
#19614 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in priority_fusion.cc
#19615 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in nest_gemm_fusion.cc
#19616 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in windowed_einsum_handler_test.cc
#19617 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in dot_normalizer.cc
#19618 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in triangular_solve_rewriter.cc
#19619 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in sanitize_constant_names.cc
#19620 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in scatter_expander.cc
#19621 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in softmax_rewriter_triton.cc
#19623 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in move_copy_to_users.cc
#19624 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in dot_algorithm_rewriter.cc
#19625 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in copy_fusion.cc
#19626 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in reduce_scatter_creator.cc
#19627 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_conv_rewriter.cc
#19628 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in windowed_einsum_handler.cc
#19630 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in command_buffer_scheduling_test.cc
#19631 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in scatter_slice_simplifier.cc
#19632 opened Nov 21, 2024
[Cleanup] Use HloPredicateIs(Not)Op in sort_rewriter.cc
#19633 opened Nov 21, 2024
Testing after force
#19636 opened Nov 21, 2024
Integrate LLVM at llvm/llvm-project@7b5b01980c3b
#19646 opened Nov 21, 2024
[ROCm] Implement hermetic rocm dependency
#19649 opened Nov 21, 2024
Reverts 93f9dda11dff8eb32aa0e287ed1350ba334ddd6d
#19653 opened Nov 21, 2024
Legalize more dialects in shardy
#19663 opened Nov 21, 2024
Integrate LLVM at llvm/llvm-project@a12e79a85fc1
#19673 opened Nov 22, 2024
[xla] Add S4/U4 support to reshape
#19681 opened Nov 22, 2024
Remove custom logging implementation from TSL
#19694 opened Nov 22, 2024
[XLA:GPU] Move dot_algorithm_rewriter from xla/service/gpu/transforms to xla/hlo/transforms
#19695 opened Nov 22, 2024
Upgrade Abseil to latest LTS branch (lts_2024_07_22). Also update or-tools to v9.11.
#19696 opened Nov 22, 2024
Explicit stream annotation: Set ExecutionStreamId based on frontend attribute
#19699 opened Nov 22, 2024
[XLA:GPU] Consolidate sort optimizations in a dedicated compiler pass.
#19725 opened Nov 23, 2024
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#19728 opened Nov 23, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#19733 opened Nov 23, 2024
Automated Code Change
#19737 opened Nov 24, 2024
Automated Code Change
#19738 opened Nov 24, 2024
Automated Code Change
#19739 opened Nov 24, 2024
Automated Code Change
#19740 opened Nov 24, 2024
Automated Code Change
#19741 opened Nov 24, 2024
Automated Code Change
#19743 opened Nov 24, 2024
Automated Code Change
#19744 opened Nov 24, 2024
Automated Code Change
#19745 opened Nov 24, 2024
Automated Code Change
#19746 opened Nov 24, 2024
Automated Code Change
#19747 opened Nov 24, 2024
Automated Code Change
#19748 opened Nov 24, 2024
Automated Code Change
#19750 opened Nov 24, 2024
Automated Code Change
#19751 opened Nov 24, 2024
Automated Code Change
#19757 opened Nov 25, 2024
Automated Code Change
#19758 opened Nov 25, 2024
Automated Code Change
#19759 opened Nov 25, 2024
Automated Code Change
#19761 opened Nov 25, 2024
Reverts changelist 633145137
#19790 opened Nov 25, 2024
Support multiple CPU tasks in a TfrtCpuClient.
#19796 opened Nov 25, 2024
Change the default partitioning method for gather and scatter to kExplicitBatch.
#19809 opened Nov 25, 2024
Add lax.composite primitive
#19813 opened Nov 26, 2024
[XLA:Collective] Support normalizing all-reduce
#19819 opened Nov 26, 2024
Automated Code Change
#19822 opened Nov 26, 2024
PR #18840: [NVIDIA] Support larger head dim for cudnn fmha
#19828 opened Nov 26, 2024
PR #19026: [NVIDIA GPU] LHS enhancement for multiple collective resources
#19833 opened Nov 26, 2024
Add support for async dynamic slice fusion
#19834 opened Nov 26, 2024
Reverts eab45d5da2fab59de8c02678c2b7b9ae69d9fef8
#19842 opened Nov 26, 2024
Integrate LLVM at llvm/llvm-project@b214ca82daee
#19844 opened Nov 26, 2024
Test new approach.
#19854 opened Nov 26, 2024
Add GC support for PjitFunctionCache
#19856 opened Nov 26, 2024
Add function to CombineInferenceStatsResults
#19859 opened Nov 26, 2024
Reenable XLA Windows build
#19860 opened Nov 26, 2024
Automated Code Change
#19872 opened Nov 27, 2024
PR #19754: [ROCm] Enable gemm fusion autotuner.
#19873 opened Nov 27, 2024
Automated Code Change
#19875 opened Nov 27, 2024
Integrate LLVM at llvm/llvm-project@b214ca82daee
#19881 opened Nov 27, 2024
PR #19669: Replace custom free-threading flag by rules_python is_py_freethreaded in Nanobind
#19886 opened Nov 27, 2024
[ROCm] fix fp8 in buffercomparator
#19896 opened Nov 27, 2024
Integrate LLVM at llvm/llvm-project@92a15dd7482f
#19897 opened Nov 27, 2024
Add function to convert Multi XSpace to InferenceStats.
#19899 opened Nov 27, 2024
Replace dependency on pre-built wheels with `py_import` dependency.
#19900 opened Nov 27, 2024
[XLA:CPU] Extend the custom algorithm for transposed convolutions
#19921 opened Nov 28, 2024
Make Tensorflow runtime handle errors returned by CudaExecutor::CreateDeviceDescription
#19926 opened Nov 28, 2024
Automated Code Change
#19932 opened Nov 29, 2024
PR #19927: [cuda] Warn about ptxas versions before CUDA 12.6.3
#19942 opened Nov 29, 2024
Automated Code Change
#19952 opened Nov 30, 2024
[XLA:GPU] Clean-up DeviceDescription
#19958 opened Nov 30, 2024
Automated Code Change
#19969 opened Dec 2, 2024
Automated Code Change
#19970 opened Dec 2, 2024
Automated Code Change
#19971 opened Dec 2, 2024
Automated Code Change
#19972 opened Dec 2, 2024
Integrate LLVM at llvm/llvm-project@010317e1731d
#19974 opened Dec 2, 2024
Automated Code Change
#19977 opened Dec 2, 2024
Add link to openxla.org homepage to README
#19981 opened Dec 2, 2024
Integrate LLVM at llvm/llvm-project@c1dcf75a7cf8
#19989 opened Dec 2, 2024
[XLA:GPU] Fix typo in custom_kernel_fusion.h
#19994 opened Dec 2, 2024
CHLO defns for a ragged dot that permits ragged batch and contraction.
#19996 opened Dec 2, 2024
Internal CI/CD change
#20000 opened Dec 2, 2024
[AllGatherCodeMotion] Add a pass that can code motion all-gathers in while loops.
#20023 opened Dec 2, 2024
Update clone with new operands to handle ragged all-to-all.
#20029 opened Dec 2, 2024
Fix resource number calculation in the latency hiding scheduler.
#20035 opened Dec 2, 2024
Integrate LLVM at llvm/llvm-project@1250a1db1a37
#20039 opened Dec 3, 2024
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20042 opened Dec 3, 2024
<REMOVE THIS TAG ONCE DIFFBASE IS SUBMITTED>
#20047 opened Dec 3, 2024
Automated Code Change
#20055 opened Dec 3, 2024
Automated Code Change
#20058 opened Dec 3, 2024
Automated Code Change
#20059 opened Dec 3, 2024
Automated Code Change
#20062 opened Dec 3, 2024
Automated Code Change
#20065 opened Dec 3, 2024
Integrate LLVM at llvm/llvm-project@2af2634c64b1
#20066 opened Dec 3, 2024
Automated Code Change
#20067 opened Dec 3, 2024
Integrate LLVM at llvm/llvm-project@1d6ab189be03
#20069 opened Dec 3, 2024
Automated Code Change
#20074 opened Dec 3, 2024
Automated Code Change
#20075 opened Dec 3, 2024
Automated Code Change
#20077 opened Dec 3, 2024
Integrate LLVM at llvm/llvm-project@f71ea4bc1b01
#20079 opened Dec 3, 2024
PR #19096: Add F4E2M1FN and F8E8M0FNU types
#20080 opened Dec 3, 2024
#sdy support StableHLO from refining Shardy ops with polymorphic shapes
#20081 opened Dec 3, 2024
[NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20086 opened Dec 3, 2024
[XLA] Add -Werror=mismatched-tags to --config=linux.
#20088 opened Dec 3, 2024
incorporate Range into while_loop_analysis
#20099 opened Dec 3, 2024
Change RBE container
#20101 opened Dec 3, 2024
Set the XlaCallModule's StableHLO payload version during deserialization so that it can be used in re-serialization.
#20106 opened Dec 4, 2024
Automated Code Change
#20111 opened Dec 4, 2024
Automated Code Change
#20115 opened Dec 4, 2024
[XLA:GPU] Enable native BF16 arithmetic ops on Ampere GPUs.
#20119 opened Dec 4, 2024
PR #20006: [XLA:GPU] Only allow horizontal loop fusion for default memory space
#20121 opened Dec 4, 2024
Make Tensorflow runtime handle errors returned by CudaExecutor::CreateDeviceDescription
#20131 opened Dec 4, 2024
Integrate LLVM at llvm/llvm-project@026fbe519e16
#20132 opened Dec 4, 2024
Enable XLA:TPU client to take in parameters
#20133 opened Dec 4, 2024
Integrate LLVM at llvm/llvm-project@026fbe519e16
#20141 opened Dec 4, 2024
Fix wrong index when inserting a copy from host to a call's parameter
#20150 opened Dec 4, 2024
Add support for emitting a kCall from an async instruction
#20155 opened Dec 4, 2024
[ROCm] Fix tests breaking due to change in threads_per_warp
#20164 opened Dec 4, 2024
PR #19699: Explicit stream annotation: Set ExecutionStreamId based on frontend attribute
#20169 opened Dec 5, 2024
[XLA:GPU:Rocm] Fix wavefront size on gfx10+
#20170 opened Dec 5, 2024
Remove obsolete TODOs and the ones associated with closed bug in platforms/xla/*
#20172 opened Dec 5, 2024
Add an interface to MSA to allow post allocation transformation on hlo module.
#20175 opened Dec 5, 2024
Map split dimensions for bitcast positions.
#20181 opened Dec 5, 2024
Introduce shape splitting into MSA.
#20182 opened Dec 5, 2024
Create shim targets for most commonly used TSL headers in preparation for updating users
#20183 opened Dec 5, 2024
Update references to JAX's GitHub repo
#20202 opened Dec 5, 2024
Run sparse tests in presubmit
#20203 opened Dec 5, 2024
PR #19669: Replace custom free-threading flag by rules_python is_py_freethreaded in Nanobind
#20207 opened Dec 5, 2024
Update TF RBE container hashes.
#20217 opened Dec 5, 2024
Added `GetAliveTasks` RPC to coordination service.
#20218 opened Dec 5, 2024
Added `jax.experimental.multihost_utils.alive_devices` API.
#20220 opened Dec 5, 2024
Integrate LLVM at llvm/llvm-project@c54616ea481a
#20224 opened Dec 5, 2024
remove use_host_argument_layout and use executable layouts.
#20227 opened Dec 5, 2024
Support evaluation in the absence of layouts when possible
#20232 opened Dec 5, 2024
Reverts 3d31c48c719d331d432132b3e0c2c5ce52650675
#20240 opened Dec 6, 2024
[hlo-opt] move the tool to hlo/tools/ directory
#20242 opened Dec 6, 2024
Automated Code Change
#20243 opened Dec 6, 2024
Integrate LLVM at llvm/llvm-project@e6cf5d2863b7
#20245 opened Dec 6, 2024
[XLA:GPU] Replace usages of `xla_experimental_exec_time_optimization_effort` with `jax_exec_time_optimization_effort`.
#20249 opened Dec 6, 2024
Integrate LLVM at llvm/llvm-project@9d2351ab9aff
#20252 opened Dec 6, 2024
#sdy remove `mhlo.tan` `CustomCallOp` from the registry as a StableHLO equivalent now exists.
#20253 opened Dec 6, 2024
[XLA:CPU] Expose better parallelism control
#20256 opened Dec 6, 2024
Integrate LLVM at llvm/llvm-project@9d2351ab9aff
#20267 opened Dec 6, 2024
In progress experimentation for supporting JAX Arrays with variable-width strings (i.e., with dtype = StringDType).
#20269 opened Dec 6, 2024
Integrate LLVM at llvm/llvm-project@12bdeba76eef
#20271 opened Dec 6, 2024
[ROCm] Emit allocas on function entry in lower_tensors.cc
#20274 opened Dec 6, 2024
Remove barely used function is_static_dimension.
#20276 opened Dec 6, 2024
Remove barely used and unnecessary XlaShape::SetProto
#20277 opened Dec 6, 2024
Add Shape::FromProto static factory method and replace usage.
#20278 opened Dec 7, 2024
change usage from constructor to factory method
#20279 opened Dec 7, 2024
Automated Code Change
#20281 opened Dec 7, 2024
Automated Code Change
#20282 opened Dec 7, 2024
Automated Code Change
#20287 opened Dec 8, 2024
cuda_root_path: Find cuda libraries when installed with conda packages
#20288 opened Dec 8, 2024
Allow multiple invocations of exchange topologies, as long as the topology is consistent across restarts.
#20292 opened Dec 9, 2024
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#20302 opened Dec 9, 2024
[XLA:CPU] Fix crash due to OOM in XLA's custom convolution algorithm.
#20304 opened Dec 9, 2024
Support dumping unoptimised hlo snapshots with arguments in pjrt.
#20306 opened Dec 9, 2024
[XLA:GPU] Simplify `AllocatorRetry::AllocateRaw` control flow.
#20308 opened Dec 9, 2024
Added thread annotations to variable that claimed to not need them.
#20314 opened Dec 9, 2024
Removed unused `GetCoordinationServiceInstance` code.
#20315 opened Dec 9, 2024
Fix a breaking test
#20317 opened Dec 9, 2024
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20326 opened Dec 9, 2024
[XLA] Handle empty leaf nodes in an original value
#20327 opened Dec 9, 2024
Migrate gather_operation_test to always use PjRt for its test backend.
#20329 opened Dec 9, 2024
Add runtime support for host calculation of offsets in ds fusion
#20332 opened Dec 9, 2024
Copy result_accuracy when deriving new instruction.
#20333 opened Dec 9, 2024
Fix missing template value
#20340 opened Dec 9, 2024
Add Shape::FromProto static factory method to replace constructor.
#20341 opened Dec 10, 2024
Automated Code Change
#20348 opened Dec 10, 2024
Automated Code Change
#20350 opened Dec 10, 2024
Replace std::string_view with absl::string_view
#20351 opened Dec 10, 2024
Automated Code Change
#20352 opened Dec 10, 2024
Automated Code Change
#20355 opened Dec 10, 2024
Automated Code Change
#20357 opened Dec 10, 2024
Automated Code Change
#20360 opened Dec 10, 2024
Automated Code Change
#20363 opened Dec 10, 2024
[XLA:GPU] Move calls to `allocation_attr.freed_by_func` into `BFCAllocator::AllocateRawInternal`.
#20368 opened Dec 10, 2024
Automated Code Change
#20370 opened Dec 10, 2024
Automated Code Change
#20371 opened Dec 10, 2024
[XLA:GPU] Fix logging bug.
#20372 opened Dec 10, 2024
[XLA:GPU] Re-run the host-offload-legalize pass after CSE
#20374 opened Dec 10, 2024
Improve speed and collision/aliasing resistance of Absl::HashOf() on HloModule/HloComputation:
#20375 opened Dec 10, 2024
Remove `//tensorflow/tsl/platform/strcat.h`.
#20386 opened Dec 10, 2024
Add result accuracy attribute to ExpOp in StableHlo.
#20388 opened Dec 10, 2024
PR #20313: Fix async wrapper to walk child computations
#20391 opened Dec 10, 2024
Revert PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20393 opened Dec 10, 2024
[XLA:GPU] Add NVSHMEM library and initialization test
#20395 opened Dec 10, 2024
Respect DeviceAssignment in HloRunnerPjRt.
#20398 opened Dec 10, 2024
Elementwise Ops in Collective Pipeliner
#20399 opened Dec 10, 2024
Add a HLOPrintOption to control printing of the parameter number for parameters.
#20402 opened Dec 10, 2024
Add has_megacore and has_merged_vmem in XPlane stats.
#20404 opened Dec 10, 2024
Automated Code Change
#20410 opened Dec 11, 2024
Automated Code Change
#20412 opened Dec 11, 2024
Automated Code Change
#20413 opened Dec 11, 2024
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20414 opened Dec 11, 2024
Automated Code Change
#20423 opened Dec 11, 2024
Layout assignment: Reset memory space in result layout
#20426 opened Dec 11, 2024
[XLA:FFI] Fix C API
#20428 opened Dec 11, 2024
[PJRT] Expose `ExecutionContext` when executing a `LoadedExecutable`
#20429 opened Dec 11, 2024
Make ErrorSpec constructor `constexpr`.
#20435 opened Dec 11, 2024
PR #19649: [ROCm] Implement hermetic rocm dependency
#20437 opened Dec 11, 2024
public message needed only for shardy change, the change will be separately submitted in cl/703502551 and this message removes
#20439 opened Dec 11, 2024
Use int64_t for thread ids instead of int32_t
#20442 opened Dec 11, 2024
Add action which automatically runs CI for public OpenXLA GitHub org members
#20444 opened Dec 11, 2024
Add nozapfhahn for opt provider files, to ignore expected mutant testing warnings during submissions
#20451 opened Dec 11, 2024
Reverts 555ba9967696cdea8c768ac5b44e59a3582a9820
#20453 opened Dec 12, 2024
PR #77927: [oneDNN] upgrading oneDNN version to 3.6
#20456 opened Dec 12, 2024
Automated Code Change
#20462 opened Dec 12, 2024
[XLA:CPU] Move kernel prototype properties inside of KernelApiIrBuilder
#20467 opened Dec 12, 2024
[XLA:GPU] Rely on LLVM parser rather than objcopy to load fatbin in tests
#20474 opened Dec 12, 2024
[XLA::CPU] Update `ElementalKernelEmitter` to take HLO instruction instead of shapes.
#20476 opened Dec 12, 2024
[XLA:GPU] Force cuDNN convolutions to be assigned a `NHWC` layout from Hopper.
#20477 opened Dec 12, 2024
[XLA:CPU] Improve F8E4M3 accuracy
#20478 opened Dec 12, 2024
Integrate LLVM at llvm/llvm-project@03cbe42627c7
#20480 opened Dec 12, 2024
Migrate StableHLO Python extension to nanobind.
#20481 opened Dec 12, 2024
Remove CHECK macros from tsl/platform/default/logging.h
#20483 opened Dec 12, 2024
Add missing default for
#20485 opened Dec 12, 2024
Use CUPTI activity markers instead of nvtx driver callbacks for NVTX tracking.
#20488 opened Dec 12, 2024
Add more safety checks to BlockingCounter
#20489 opened Dec 12, 2024
PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20490 opened Dec 12, 2024
Support host->device dynamic-update-slice in HostOffloader.
#20492 opened Dec 13, 2024
Update slop_factor flag desc in debug_options_flags.cc
#20494 opened Dec 13, 2024
Make max value in Range optional to allow for Unbounded Range calculations.
#20495 opened Dec 13, 2024
Make max value in Range optional to allow for Unbounded Range calculations.
#20496 opened Dec 13, 2024
Extend the visibility of //third_party/tensorflow/compiler/xla/stream_executor/gpu:gpu_init_impl to the friends group.
#20499 opened Dec 13, 2024
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20500 opened Dec 13, 2024
Replace some usage of tsl::BlockingCounter with absl::BlockingCounter.
#20501 opened Dec 13, 2024
Update the forward propagation for slice instructions in xla::ShardingPropagation.
#20502 opened Dec 13, 2024
fixed the typos in bfloat16_propagation_test.cc
#20503 opened Dec 13, 2024
[XLA:CPU] Test elemental comparison ops
#20509 opened Dec 13, 2024
[XLA:CPU] Export codegen testlib functionality via __init__.py
#20510 opened Dec 13, 2024
PR #19649: [ROCm] Implement hermetic rocm dependency
#20512 opened Dec 13, 2024
[XLA:CPU] Move CPU ElementalIrEmitter implementation into a separate file
#20514 opened Dec 13, 2024
[XLA:CPU] Add ability to dump the LLvmIrSource to a string in the testlib
#20515 opened Dec 13, 2024
Simplify `WaitAndLogIfStuck()` to use `absl::Barrier` instead of `tsl::BlockingCounter`.
#20516 opened Dec 13, 2024
#sdy make mhlo_export_pipeline test use no rank maximal sharding.
#20518 opened Dec 13, 2024
#sdy add constraints to maximal shardings during verification.
#20519 opened Dec 13, 2024
[XLA:GPU] Synchronize compute and communication streams in NCCL group implementation
#20523 opened Dec 13, 2024
Move some ShapeUtil validation to cpp. Create generic ShapeError.
#20524 opened Dec 13, 2024
Remove CHECKs for inline vectors. Returning 0 is functionally correct.
#20525 opened Dec 13, 2024
Temporary check to avoid cases where wrong result sharding causes a RET_CHECK to fail in hlo_sharding.cc.
#20526 opened Dec 13, 2024
[XLA:GPU] update collective pipeline parallelism execution test to include nccl groups in a while loop
#20527 opened Dec 13, 2024
[XLA:GPU] update microbenchmarks to include nccl groups in a while loop
#20528 opened Dec 13, 2024
Use absl::string_view instead of std::string_view as some environments (e.g. Android) don't provide std::string_view.
#20530 opened Dec 13, 2024
Integrate LLVM at llvm/llvm-project@5f72f2c8fd6c
#20531 opened Dec 13, 2024
Make collective select folder convert-aware
#20532 opened Dec 13, 2024
[XLA] Add S1 and U1 as data types
#20533 opened Dec 13, 2024
Remove CUDA dependencies from jaxlib wheel.
#20534 opened Dec 13, 2024
Reverts a8a0a7ec72ed3960b33a061c12e43d7e5beb9087
#20535 opened Dec 13, 2024
[Shardy] HLO ⇄ MHLO to HLO ⇄ StableHLO
#20536 opened Dec 13, 2024
Remove legacy trivial main programs
#20537 opened Dec 13, 2024
[xla:gpu] Add an option to use persistent collective cliques
#20539 opened Dec 14, 2024
Optimize XLA SPMD Slice partitioner, containing the following 3 steps.
#20540 opened Dec 14, 2024
[xla:cpu] Implement 2D and 3D loop parallelization
#20541 opened Dec 14, 2024
[xla:cpu] Add an xnn_threadpool for wrapping ParallelLoopRunner as pthreadpool API
#20542 opened Dec 14, 2024
Integrate LLVM at llvm/llvm-project@af20aff35ec3
#20543 opened Dec 14, 2024
Reverts 51658ce1da33166ef422286cbc2c194f7a464f65
#20544 opened Dec 14, 2024

11 Issues closed by 11 people

[BUG] Returning a matrix crashes the HLO runner
#9307 closed Dec 11, 2024
Does Tensorflow XLA use MLIR?
#20125 closed Dec 10, 2024
Profiling crashes when run using JAX
#4431 closed Dec 9, 2024
CPU vs GPU inconsistency with at[].set ops under jit
#19716 closed Dec 4, 2024
Nondeterminism in triton-autotuner and others leads to hangs in multi-process gpu execution
#7716 closed Dec 3, 2024
The tiles of some instructions in Triton fusion are not powers of 2. How did this issue arise?
#19478 closed Dec 3, 2024
Pallas/Triton gives incorrect result on Nvidia H100 GPU
#19780 closed Dec 2, 2024
//xla/service/cpu:vectorized_reduce_with_no_vector_registers_test fails on Apple Silicon
#19830 closed Nov 29, 2024
//xla/stream_executor/cuda:compilation_provider_test should be skipped on CPU builds
#19829 closed Nov 26, 2024
MatmulTest.TestF32ConstantWeights failure on arm64 linux
#19416 closed Nov 21, 2024
[Support request] Concatenate fusion failed to lower to LLVM IR
#19499 closed Nov 21, 2024

12 Issues opened by 10 people

Make `cache_all_gather` configurable
#20508 opened Dec 13, 2024
Copies supersede OptimizationBarrier
#20440 opened Dec 11, 2024
Activation offloading with scan fails in XLA with same shape pinned-host/device scan outputs
#20373 opened Dec 10, 2024
Darwin CPU "LLVM ERROR: inconsistency in registered CommandLine options"
#20284 opened Dec 8, 2024
Merge XSpace profiles from different hosts.
#19902 opened Nov 27, 2024
PjRtFutures & Async Computations issue
#19898 opened Nov 27, 2024
Explicit interface for AcquireExternalReference and WaitExternalReference
#19845 opened Nov 26, 2024
//xla/tests:complex_unary_op_test_cpu fails on macOS Apple Silicon
#19824 opened Nov 26, 2024
[ROCm] failed to legalize operation 'math.exp' for exponential op with bf16 dtype
#19700 opened Nov 22, 2024
[GPU bug] Memcpy local p2p leads to numerical issues (mem access problems).
#19555 opened Nov 20, 2024
`stablehlo.cbrt` crashes with complex numbers when StableHLO spec supports it
#19482 opened Nov 19, 2024
Does setting access to the previous GPUs in cudamallocasync_allocator.cc make allocateRaw slower?
#19431 opened Nov 18, 2024

38 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add F4E2M1FN and F8E8M0FNU types
#19096 commented on Dec 12, 2024 • 37 new comments
[RematerializeLargeAllGather] Add pass to rematerialize large tensor parallel all-gathers. Allow configurable bytes.
#19163 commented on Dec 9, 2024 • 10 new comments
[NVIDIA GPU] Support multi-operand collective-permute
#18838 commented on Dec 6, 2024 • 3 new comments
[XLA:CPU][oneDNN] Custom Call oneDNN Thunk Support
#18494 commented on Dec 4, 2024 • 3 new comments
[XLA:CPU][oneDNN] Alias result to addend when feasible
#17637 commented on Nov 22, 2024 • 3 new comments
[XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#19067 commented on Nov 18, 2024 • 1 new comment
XLA documentation for Windows
#12028 commented on Dec 6, 2024 • 0 new comments
Use OpTrait::DotLike instead of op name to identify dot operations
#19145 commented on Dec 3, 2024 • 0 new comments
Bugfix sharding related logging
#19164 commented on Nov 20, 2024 • 0 new comments
[SPMD] Introduce an option to limit communication around gradient accumulation in Jax scan ops.
#19166 commented on Dec 1, 2024 • 0 new comments
[XLA:MSA] Add dynamic-slice to async conversion in msa
#19180 commented on Dec 5, 2024 • 0 new comments
[MPMD-GPU] Add `is_subslice_topology` to the IFRT Topology.
#19203 commented on Nov 18, 2024 • 0 new comments
[MPMD-GPU] Make Pathways IFRT client get GPU topology as well.
#19204 commented on Nov 18, 2024 • 0 new comments
Integrate Triton up to [9732c047](https://github.com/openai/triton/commits/9732c04701bd856daca89bde38bafa4636ca56a8)
#19259 commented on Nov 15, 2024 • 0 new comments
Refactor JAX build wheel rule and add wheel_library targets.
#19277 commented on Nov 21, 2024 • 0 new comments
StableHLO Reference Interpreter PJRT Plugin
#19287 commented on Nov 14, 2024 • 0 new comments
PR #19026: [NVIDIA GPU] LHS enhancement for multiple collective resources
#19289 commented on Nov 18, 2024 • 0 new comments
PR #19275: [NVIDIA] Add fixes for supporting determinism expander for high-dimensional scatter operation and a flag to disable it
#19296 commented on Nov 15, 2024 • 0 new comments
Create repository rule to generate a file with python wheel version.
#19308 commented on Dec 10, 2024 • 0 new comments
PR #77927: [oneDNN] upgrading oneDNN version to 3.6
#19319 commented on Nov 21, 2024 • 0 new comments
[XLA:Python] Modify DLPack behavior with unit dimensions.
#19327 commented on Nov 25, 2024 • 0 new comments
PR #19208: [gpu] Make collective permute decomposer accept no channel ids
#19328 commented on Nov 19, 2024 • 0 new comments
Error occurred while building TensorFlow
#14129 commented on Nov 27, 2024 • 0 new comments
clang: error: linker command failed with exit code 1
#14321 commented on Nov 27, 2024 • 0 new comments
JAX PJRT Client, GPU Build Fails. Linking error
#14475 commented on Nov 22, 2024 • 0 new comments
Error in compilation on GPU
#15744 commented on Nov 20, 2024 • 0 new comments
XLA flags: No speed ups on GPUs and segmentation fault
#17103 commented on Nov 14, 2024 • 0 new comments
Inadequate memory consumption when using HSDP without gradient accumulation
#18090 commented on Dec 5, 2024 • 0 new comments
How to run onednn convolution test on CPU
#18586 commented on Dec 2, 2024 • 0 new comments
XLA:CPU performance regression with the min alignment changed from 16 to 64
#18611 commented on Dec 5, 2024 • 0 new comments
pjrt_c_api_cpu_plugin - MacOS amd and arm64 compilation
#19152 commented on Nov 17, 2024 • 0 new comments
Move down_cast from the tensorflow to the tsl namespace
#5951 commented on Nov 20, 2024 • 0 new comments
[XLA:CPU][oneDNN] Absorb Transpose into matmul whenever possible
#18410 commented on Nov 18, 2024 • 0 new comments
[XLA:MSA] Re-enables synchronous copy and slice conversion to async by default.
#19035 commented on Nov 14, 2024 • 0 new comments
[XLA:CPU][oneDNN] Handle oneDNN scalar
#19066 commented on Nov 18, 2024 • 0 new comments
[StableHLO] Remove XlaCallModule's MHLO dependency
#19079 commented on Dec 13, 2024 • 0 new comments
PR #18331: [XLA:CPU] upgrading onednn version to 3.6
#19102 commented on Dec 12, 2024 • 0 new comments
[HLO->MHLO] Consolidate non-pipelined async ops into MHLO ops.
#19115 commented on Dec 5, 2024 • 0 new comments