Insights: openxla/xla
November 13, 2024 – December 13, 2024
Overview
764 Pull requests merged by 2 people
-
Automated Code Change
#20461 merged
Dec 14, 2024 -
Configure HloPjRtTestBase with new option structs.
#20452 merged
Dec 14, 2024 -
Extend use_parameter_layout_on_device option to ExecuteReplicated.
#20441 merged
Dec 14, 2024 -
Integrate LLVM at llvm/llvm-project@a21f9bfe29c2
#20538 merged
Dec 14, 2024 -
Remove a bunch of #if CUDA macro use in xla_compile_lib.
#20331 merged
Dec 14, 2024 -
Improve compilation time by not fusing large constants into LLVM modules for XLA::CPU.
#20257 merged
Dec 14, 2024 -
Swap output format of "MatchTrivialLoopRange" to better align with Range usage.
#20454 merged
Dec 13, 2024 -
timespan, xplane_visitor: Support operator<=.
#20405 merged
Dec 13, 2024 -
[XLA] Generalize type handling within InProcessCollectives
#20498 merged
Dec 13, 2024 -
Migrate some TSL code over to ABSL equivalents
#20457 merged
Dec 13, 2024 -
[xla:cpu] Add parallel loop runner
#20529 merged
Dec 13, 2024 -
Stop using redzone_allocator in autotuner_util.
#20520 merged
Dec 13, 2024 -
Add new ml-build-rbe container to the configurations of RBE used with remote config.
#20459 merged
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@bc29fc937c6c
#20511 merged
Dec 13, 2024 -
Move AutotunerUtil::CreateBuffer from a static method to a method on RedzoneAllocator.
#20517 merged
Dec 13, 2024 -
[XLA] Add some traceme annotations around XLA:CPU compilation and CPU compiler stack trace logging.
#20312 merged
Dec 13, 2024 -
[XLA:MSA] Allow cross-program prefetch for buffers that are already pinned to alternate memory.
#20316 merged
Dec 13, 2024 -
Replace TSL's BlockingCounter with absl's.
#20522 merged
Dec 13, 2024 -
Add TF package import tests in CPU presubmit, continuous and nightly jobs.
#20493 merged
Dec 13, 2024 -
Add has_megacore and has_merged_vmem in XPlane stats.
#20491 merged
Dec 13, 2024 -
[StableHLO] Add shape refinement callback to specify additional patterns.
#20321 merged
Dec 13, 2024 -
[IFRT] Add option to compile IFRT IR atom programs using Sdy
#20521 merged
Dec 13, 2024 -
[XLA:Collective] Add utility functions.
#20038 merged
Dec 13, 2024 -
[XLA:Python] Use &PyArray_Type rather than looking up numpy.ndarray via Python attrs.
#20513 merged
Dec 13, 2024 -
PR #20109: Create a workflow to run CPU benchmarks
#20475 merged
Dec 13, 2024 -
Reenable buildifier for all files under xla/, fix warnings
#19866 merged
Dec 13, 2024 -
Remove unneeded #ifdef'ed dependency.
#20328 merged
Dec 13, 2024 -
[xla:python] Add support for stateful FFI calls registered via Python.
#16789 merged
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20507 merged
Dec 13, 2024 -
[XLA:CPU] Add shape method python binding to Literal
#20466 merged
Dec 13, 2024 -
Replace xla proto dependency to make TF happy
#20506 merged
Dec 13, 2024 -
[XLA:GPU] Readability and performance nits.
#20473 merged
Dec 13, 2024 -
Automated Code Change
#20504 merged
Dec 13, 2024 -
[XLA:GPU] Clean up TF_RET_CHECKs in TritonFusionAnalysis::ExecuteForDotFusion.
#20505 merged
Dec 13, 2024 -
Automated Code Change
#20416 merged
Dec 13, 2024 -
Use absl::Mutex::Await() instead of absl::CondVar::Wait() in XLA.
#19957 merged
Dec 13, 2024 -
Automated Code Change
#20418 merged
Dec 13, 2024 -
Automated Code Change
#20468 merged
Dec 13, 2024 -
Fix range analysis bug.
#20479 merged
Dec 13, 2024 -
[XLA] Guarantee ordering of infeeds/outfeeds across called computations
#19827 merged
Dec 13, 2024 -
[XLA:Python] Mark from_python and from_cpp methods of nanobind typecasters as noexcept.
#20482 merged
Dec 13, 2024 -
[XLA:GPU] add execution tests for NCCL group with partially pipelined send/recv instructions
#20178 merged
Dec 13, 2024 -
Fix an issue in PartitionGatherTrivialSlicedOperandDimensions when handling out-of-bound indices.
#20401 merged
Dec 13, 2024 -
Make ReshapeOp return MHLO_AnyTensor instead of MHLO_StaticShapeTensor.
#18964 merged
Dec 13, 2024 -
Adds a "SHARDING" ProfileType to HloModuleProto.
#20283 merged
Dec 13, 2024 -
Implement flatten one level with keys in C++ and use it for the prefix/equality error printing.
#20044 merged
Dec 13, 2024 -
Switch multihost runner to public XLA:GPU target
#20486 merged
Dec 13, 2024 -
Add a tuple sharding when creating get-tuple-element(tuple(single_result)).
#20126 merged
Dec 13, 2024 -
Update py_import macros for the ability to unpack additional wheels in the same folder as the main wheel.
#20089 merged
Dec 13, 2024 -
Update the Windows Docker CI image to include the C++ ATL library.
#20487 merged
Dec 12, 2024 -
Make CompiledMemoryStats::ToProto() const.
#20458 merged
Dec 12, 2024 -
#sdy add option to avoid escaping attribute when adding to frontend attrs.
#20472 merged
Dec 12, 2024 -
Integrate LLVM at llvm/llvm-project@0876c11ceeb0
#20465 merged
Dec 12, 2024 -
[XLA:GPU][Emitters] Move gpu/fusions/ir to backends/gpu/codegen/ir
#20471 merged
Dec 12, 2024 -
[XLA:GPU] Propagate all profiling failures to gemm_fusion_autotuner.cc.
#20383 merged
Dec 12, 2024 -
PR #20463: Updated multiple typo's
#20470 merged
Dec 12, 2024 -
Automated Code Change
#20411 merged
Dec 12, 2024 -
[XLA:GPU][NFC] Modularize a little bit gpu_hlo_schedule.cc.
#20430 merged
Dec 12, 2024 -
PR #19099: [XLA:CPU][oneDNN] Add post-ops for oneDNN Convolutions
#20445 merged
Dec 12, 2024 -
[XLA:GPU] Remove restriction on bitcasts being a no-op with regards to tiling
#20433 merged
Dec 12, 2024 -
[XLA:GPU] Deprecate diamond chains in SoftmaxRewriterTriton.
#20431 merged
Dec 12, 2024 -
[XLA:GPU] Rollback introduction of EmitterLocOpBuilder.
#20464 merged
Dec 12, 2024 -
[XLA:CPU] Implement ElementalKernelEmitter
#20378 merged
Dec 12, 2024 -
#sdy Add unique module name in Shardy dumps
#20436 merged
Dec 12, 2024 -
Fix build failure in nvjitlink_impl.cc
#20460 merged
Dec 12, 2024 -
Integrate LLVM at llvm/llvm-project@19bc282320ba
#20450 merged
Dec 12, 2024 -
Automated Code Change
#20285 merged
Dec 12, 2024 -
[XLA:GPU] Add a nested builder arg to the EmitXlaLoop builder.
#20443 merged
Dec 12, 2024 -
Automated Code Change
#20362 merged
Dec 12, 2024 -
Automated Code Change
#20361 merged
Dec 12, 2024 -
Skip inserting into frontend attr if the (key, value) pair already exists and override if key exists
#20255 merged
Dec 12, 2024 -
[Mosaic] Pad trailing transposes chunks with zeros.
#20390 merged
Dec 12, 2024 -
Split ROCm-specific backend calls into their own targets.
#20337 merged
Dec 12, 2024 -
[xla-auto-sharding] Fix potential dangling pointer (reference) bug.
#20446 merged
Dec 12, 2024 -
[xla:cpu] Add missing files from openxla/xla#16438
#20447 merged
Dec 11, 2024 -
[XLA:GPU] Schedule send/recv early if pipeline parallelism ops enabled
#20403 merged
Dec 11, 2024 -
Internal CI/CD change
#20212 merged
Dec 11, 2024 -
[XLA] Avoid redundant lookup in ConsumeResource
#20434 merged
Dec 11, 2024 -
Remove the test_hlo_pjrt_runner tag.
#20392 merged
Dec 11, 2024 -
Add B100 to default Nvidia gpu backends
#20407 merged
Dec 11, 2024 -
Migrate broadcast_test to always use PjRt for its test backend.
#20345 merged
Dec 11, 2024 -
Add a default error spec field to HloRunnerAgnosticTestBase.
#20397 merged
Dec 11, 2024 -
[XLA:CPU] Benchmark for grouped strided convolutions
#20425 merged
Dec 11, 2024 -
[XLA GPU] Add additional unit tests for IsPtxRegisterAllocationError.
#20424 merged
Dec 11, 2024 -
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20420 merged
Dec 11, 2024 -
[XLA:GPU] Implement NcclRaggedAllToAllThunk.
#20265 merged
Dec 11, 2024 -
Fix infinite loop in TopKSplitter
#20422 merged
Dec 11, 2024 -
PR #20214: Evaluate simple offset values, if possible
#20303 merged
Dec 11, 2024 -
PR #20313: Fix async wrapper to walk child computations
#20421 merged
Dec 11, 2024 -
#sdy Swap XLA Shardy passes to use StableHLO instead of MHLO as much as possible.
#19939 merged
Dec 11, 2024 -
Internal: add missing dependency on numpy
#20298 merged
Dec 11, 2024 -
[XLA:GPU] Use absl::Status payload to more precisely identify register allocation errors.
#20396 merged
Dec 11, 2024 -
[XLA:CPU] Use KernelApiIrBuilder in IrEmitter2
#20380 merged
Dec 11, 2024 -
[XLA:CPU] Add new KernelApiIrBuilder
#20379 merged
Dec 11, 2024 -
PR #20334: [nfc] clang-format is failing on unrelated PRs because of this
#20419 merged
Dec 11, 2024 -
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20347 merged
Dec 11, 2024 -
[XLA:GPU] Decrease VLOG levels to start logging at level 2 in softmax_rewriter_triton.cc.
#20415 merged
Dec 11, 2024 -
fix audit wheel compliance issues for pywrap rules
#20356 merged
Dec 11, 2024 -
Cleanup inconsistent names/comments
#20406 merged
Dec 11, 2024 -
Remove unused ErrorSpec.
#20400 merged
Dec 11, 2024 -
[XLA:GPU] Guard send/recv schedule manipulation behind xla_gpu_enable_pipelined_p2p flag
#20382 merged
Dec 10, 2024 -
Migrate all_reduce_test to always use PjRt for its test backend.
#20344 merged
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20349 merged
Dec 10, 2024 -
Respect DeviceAssignment in HloRunnerPjRt.
#20389 merged
Dec 10, 2024 -
Migrate gather_operation_test to always use PjRt for its test backend.
#20339 merged
Dec 10, 2024 -
[HLO->MHLO] Consolidate non-pipelined async ops into MHLO ops.
#20309 merged
Dec 10, 2024 -
Add implicit device step tracking.
#20261 merged
Dec 10, 2024 -
Migrate copy_test to always use PjRt for its test backend.
#20343 merged
Dec 10, 2024 -
[XLA:GPU] Remove --xla_gpu_experimental_enable_triton_softmax_priority_fusion.
#20384 merged
Dec 10, 2024 -
Set implicitTrunc on APInt creation
#20387 merged
Dec 10, 2024 -
Integrate LLVM at llvm/llvm-project@0f7b3a9407d2
#20377 merged
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20322 merged
Dec 10, 2024 -
Add MHLO mhlo.custom_call @ragged_all_to_all -> HLO RaggedAllToAll pass
#20354 merged
Dec 10, 2024 -
[XLA:CPU] Update ShapeToIrType & PrimitiveTypeToIrType to take a LLVMContext
#20381 merged
Dec 10, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#19734 merged
Dec 10, 2024 -
[xla:gpu] Extracted CreateTritonPipeline into a separate target
#20367 merged
Dec 10, 2024 -
[xla:cpu] Add an object pool for efficient xnnpack object pooling
#20307 merged
Dec 10, 2024 -
Add test_migrated_to_hlo_runner_pjrt tag to xla_test.
#20338 merged
Dec 10, 2024 -
[xla-auto-sharding] Add BRKGA heuristic as an XLA auto-sharding option.
#20330 merged
Dec 10, 2024 -
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20365 merged
Dec 10, 2024 -
Automated Code Change
#20364 merged
Dec 10, 2024 -
PR #16438: aarch64: implement onednn matmul operator with explicit reorders
#20369 merged
Dec 10, 2024 -
[XLA:GPU] Disable cutlass dynamic-update-slice rewrite on V100.
#20366 merged
Dec 10, 2024 -
PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20213 merged
Dec 10, 2024 -
[XLA:GPU] Use absl instead of tensorflow functions/types.
#20311 merged
Dec 10, 2024 -
Add debug option for failing the PTX compilation on register spilling
#20358 merged
Dec 10, 2024 -
Automated Code Change
#20290 merged
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20353 merged
Dec 10, 2024 -
[XLA] Don't bail when encountering complex loop pipelining patterns
#20258 merged
Dec 10, 2024 -
Add an interpreter PjRt client registry for testing.
#20336 merged
Dec 10, 2024 -
Add a new test base class for a default PjRt test runner w/ SE interpreter.
#20320 merged
Dec 10, 2024 -
Add RegisterMlirToHloDependentDialects to register required dependent dialects
#20318 merged
Dec 10, 2024 -
Register WhileLoopAllReduceCodeMotion pass to the opt tool
#20342 merged
Dec 10, 2024 -
Create OpStatsToRooflineModel, in preparation of Roofline Model creation
#20280 merged
Dec 10, 2024 -
IFRT proxy: Add profiler spans to all entrypoints at the client.
#20325 merged
Dec 9, 2024 -
[XLA] Fix latency hiding scheduler when faced with annotated no-op instructions.
#20324 merged
Dec 9, 2024 -
Add CPU specific passes for hlo-opt tool.
#20154 merged
Dec 9, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#20012 merged
Dec 9, 2024 -
Split up cusolver_context into CUDA-specific and ROCM-specific parts.
#20036 merged
Dec 9, 2024 -
[XLA:GPU] Remove unused xla_experimental_exec_time_optimization_effort flag.
#20297 merged
Dec 9, 2024 -
Add support for CUDA 12.6.3 and CUDNN 9.5.1/9.6.0.
#20310 merged
Dec 9, 2024 -
[xla:gpu] Removed redundant parameter from CompileTritonToLLVM
#20301 merged
Dec 9, 2024 -
[XLA:CPU] Add a Python extension for KernelRunner.
#20196 merged
Dec 9, 2024 -
[XLA:Python] Use nanobind::isinstance from upstream nanobind, delete xla::nb_isinstance.
#20250 merged
Dec 9, 2024 -
Reland of PR #19571. Fix test FunctionalHloRunnerTest.ShardedAutotuningWorks
#20197 merged
Dec 9, 2024 -
[XLA:GPU] Use absl::Microseconds instead of doing duration arithmetic.
#20305 merged
Dec 9, 2024 -
Automated Code Change
#20194 merged
Dec 9, 2024 -
PR #18989: [AllGatherCSE] Add a pass that CSEs all-gathers on parameters.
#20300 merged
Dec 9, 2024 -
PR #20241: Updated Typo's in multiple documents
#20291 merged
Dec 9, 2024 -
[xla] Update warnings.bazelrc
#20299 merged
Dec 9, 2024 -
Reverts 26df9b97719020df83e882b31bbe4a7f2cbbdff5
#20293 merged
Dec 9, 2024 -
Automated Code Change
#20289 merged
Dec 9, 2024 -
Automated Code Change
#20286 merged
Dec 8, 2024 -
Add a public UpdateEntryComputationLayout method
#20262 merged
Dec 7, 2024 -
[xla:cpu] Add xnnpack dependency to xla:cpu runtime
#20264 merged
Dec 7, 2024 -
Add ability to disable TargetConfig metadata for se_gpu_pjrt_client.
#20275 merged
Dec 6, 2024 -
Allow platform-specific relaxation of fusion restrictions on in-place update ops.
#20273 merged
Dec 6, 2024 -
Add method for HloRunnerAgnosticTestBase implementations to preprocess modules.
#20229 merged
Dec 6, 2024 -
[XLA:GPU:ROCm] Restore threads per warp behavior
#20270 merged
Dec 6, 2024 -
[Cleanup] Do not std::move on return
#19735 merged
Dec 6, 2024 -
hlo_original_value: Don't blow up when printing empty values.
#20266 merged
Dec 6, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#19730 merged
Dec 6, 2024 -
Remove the use of TENSORFLOW_USE_ROCM from convolution_thunk.cc.
#20259 merged
Dec 6, 2024 -
[tsl] Deprecate tsl::mutex and tsl::condition_variable, make tsl::Condition an alias of absl::Condition
#20239 merged
Dec 6, 2024 -
Integrate StableHLO at openxla/stablehlo@b3d3cacd
#20263 merged
Dec 6, 2024 -
[xla-auto-sharding] Add SolveGreedy() heuristic to make local greedy choices.
#20260 merged
Dec 6, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op to unify opcode checking across XLA
#19731 merged
Dec 6, 2024 -
#sdy Fix MHLO<->HLO translation bug with multi result host offload functions.
#20130 merged
Dec 6, 2024 -
Allow programmatic override of the default values for the gcs file system cache
#20254 merged
Dec 6, 2024 -
Re-enable current StableHLO current version attribute in PJRT
#20234 merged
Dec 6, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#19729 merged
Dec 6, 2024 -
[StableHLO] Refactor XlaCallModule to use more upstream StableHLO machinery.
#19132 merged
Dec 6, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#19732 merged
Dec 6, 2024 -
[XLA:GPU] Drop unnecessary bitcast from the chain convert(s4)->bitcast->dot operation
#20117 merged
Dec 6, 2024 -
[xla-auto-sharding] Generalize multilevel flag to heuristic solver option.
#20225 merged
Dec 6, 2024 -
Fix type stub for register_node to note that to_iterable_with_keys is optional.
#20251 merged
Dec 6, 2024 -
Exclude more broken tilings from Triton exhaustive autotuning
#20222 merged
Dec 6, 2024 -
Integrate LLVM at llvm/llvm-project@2ccf7ed277df
#20246 merged
Dec 6, 2024 -
[XLA:GPU] Add a test for RaggedAllToAll that runs on 8 GPUs.
#20139 merged
Dec 6, 2024 -
[xla:ffi] Add num_threads() API to external FFI thread pool
#20247 merged
Dec 6, 2024 -
Reverts 120fbf9e88b159bed64addb4da6de08b3c6ea5bc
#20244 merged
Dec 6, 2024 -
[XLA:LatencyHidingScheduler] Do not allow target-defined resources to have a concurrency limit of 0.
#20235 merged
Dec 6, 2024 -
[XLA] Support splitting ragged all-to-all into async start and done.
#20030 merged
Dec 6, 2024 -
Automated Code Change
#20192 merged
Dec 6, 2024 -
[xla:gpu] NFC: Replace all users of NcclApi with GpuCollectives
#20221 merged
Dec 6, 2024 -
Demote log from ERROR to VLOG(2).
#20237 merged
Dec 6, 2024 -
Correct the order of arguments in comment for RaggedAllToAll.
#20233 merged
Dec 6, 2024 -
[Cleanup] Use push_back instead of emplace_back where appropriate
#20011 merged
Dec 6, 2024 -
[xla] Add detailed tracing to RendezvousSingle
#20238 merged
Dec 6, 2024 -
[Cleanup] Use absl::StrCat
#20013 merged
Dec 6, 2024 -
Integrate LLVM at llvm/llvm-project@698d83218565
#20230 merged
Dec 6, 2024 -
Remove nsync from TensorFlow
#19931 merged
Dec 6, 2024 -
[xla:collectives] Make NcclApi an alias for GpuCollectives
#20219 merged
Dec 6, 2024 -
Add wrapper for PM Sampling metrics to be added from std::vector<PmSamples> to XLines
#19368 merged
Dec 6, 2024 -
Add minimal test case to collective pipeliner
#20102 merged
Dec 6, 2024 -
Remove obsolete TODOs and the ones associated with closed bug in third_party/tensorflow/compiler/*
#20168 merged
Dec 5, 2024 -
Remove #if TENSORFLOW_USE_ROCM from gpu_executable.cc.
#20166 merged
Dec 5, 2024 -
Use default HloParserOptions in HLO runner.
#20041 merged
Dec 5, 2024 -
Reverts ee92ce1bc50fb94024dc945e08c8c3d7aa0837bc
#20149 merged
Dec 5, 2024 -
Remove #if GOOGLE_CUDA from matmul_utils.cc.
#20161 merged
Dec 5, 2024 -
[hlo-opt] Register gpu passes and add "--list-passes" option
#20104 merged
Dec 5, 2024 -
Integrate LLVM at llvm/llvm-project@fdb90cef75ca
#20215 merged
Dec 5, 2024 -
IFRT Proxy: Make Executable::Delete() and most ::Execute async.
#20173 merged
Dec 5, 2024 -
MHLO defns for a ragged dot that permits ragged batch and contraction.
#19991 merged
Dec 5, 2024 -
[JAX] Add end-to-end execution support in colocated Python API
#19905 merged
Dec 5, 2024 -
Add array interoperability python bindings to xla::Literal
#19915 merged
Dec 5, 2024 -
[xla:gpu] Update custom call config WARN to VLOG
#20211 merged
Dec 5, 2024 -
[XLA:CPU] Add method to benchmark compile times
#20078 merged
Dec 5, 2024 -
Remove if_oss from B100 references
#20208 merged
Dec 5, 2024 -
Enable explicit batch dims of gather/scatter operations in GSPMD. There are two components.
#19810 merged
Dec 5, 2024 -
[xla-auto-sharding] Add SolveRandom() baseline algorithm for a random sharding.
#20204 merged
Dec 5, 2024 -
Reverts 1ac4eddd9ce0db29b37efe996e4309cd7e23ee9c
#20205 merged
Dec 5, 2024 -
Integrate LLVM at llvm/llvm-project@dd7a3d4d798e
#20201 merged
Dec 5, 2024 -
[XLA:GPU] Extend atomic_rmw to support vector updates.
#20199 merged
Dec 5, 2024 -
Add support for Pad operation in host_offload_utils::GetPredecessors().
#20151 merged
Dec 5, 2024 -
[XLA:GPU][Emitters] Create xla_ops dialect for the platform-independent ops.
#20188 merged
Dec 5, 2024 -
Adding dumping functionality for HloUnoptimizedSnapshot.
#20185 merged
Dec 5, 2024 -
Automated Code Change
#20190 merged
Dec 5, 2024 -
[XLA:GPU] Make VLOG explanation in SortRewriter more helpful.
#20198 merged
Dec 5, 2024 -
PR #18616: [XLA:CPU][oneDNN] Refactor code that fuses Add operation with oneDNN primitives
#20200 merged
Dec 5, 2024 -
Automated Code Change
#20191 merged
Dec 5, 2024 -
Integrate LLVM at llvm/llvm-project@71ac1eb50955
#20186 merged
Dec 5, 2024 -
[XLA:CPU] Implement 2D custom algorithm for strided transposed convolutions.
#19779 merged
Dec 5, 2024 -
PR #19913: [ROCm] Do not use fast approximation for exp and log
#20187 merged
Dec 5, 2024 -
Reverts 1000ed534b23a0eba634046a2b4e1cfa3d1d8bc2
#20189 merged
Dec 5, 2024 -
Automated Code Change
#20056 merged
Dec 5, 2024 -
[xla:collectives] NFC: Move all NCCL collectives to Collectives API
#20158 merged
Dec 5, 2024 -
Introduce xla_gpu flag for Dumping HloUnoptimizedSnapshots
#20122 merged
Dec 5, 2024 -
Cleanup. Use the unified GatherScatterDims for operand pass-through dims in gather/scatter instructions.
#20167 merged
Dec 5, 2024 -
[xla:collectives] NFC: Move AllReduce into Collectives API
#20180 merged
Dec 5, 2024 -
Automated Code Change
#20179 merged
Dec 5, 2024 -
[xla:collectives] Move NCCL buffer registration to base Communicator Api
#20147 merged
Dec 5, 2024 -
Integrate LLVM at llvm/llvm-project@ce0f11325e0c
#20162 merged
Dec 5, 2024 -
More thorough propagation of host linear layout. Currently linear layout on host
#19267 merged
Dec 5, 2024 -
Fix wheel creation logic when pywrap rules are used
#20176 merged
Dec 5, 2024 -
[xla:collectives] NFC: Move CommInitRanks to Collectives API
#20140 merged
Dec 5, 2024 -
[XLA:GPU] Pipeline send/recv ops but not *-done ops in experimental PP opts
#20046 merged
Dec 5, 2024 -
[xla][gpu] Order send/recv chains across decomposed collective permutes
#20040 merged
Dec 5, 2024 -
[xla:collectives] NFC: Move Group calls to GpuCollectives
#20177 merged
Dec 5, 2024 -
[xla:collectives] NFC: Move stubs from NcclApiStub to GpuCollectivesStub
#20138 merged
Dec 5, 2024 -
[XLA:LatencyHidingScheduler] Fix issues with scheduling_group_id annotations. Added support for:
#19892 merged
Dec 5, 2024 -
[xla:collectives] NFC: Move NCCL clique locking to collectives component
#20110 merged
Dec 5, 2024 -
Fix some *_main deps to only appear on xla_test targets.
#20148 merged
Dec 4, 2024 -
[XLA:Collective] Add common utility functions into
#20024 merged
Dec 4, 2024 -
Reverts be13efa3fe6e34918cf43d8b5bb5ce51da8849e8
#20146 merged
Dec 4, 2024 -
[XLA] Fix a struct/class tag mismatch in cutlass code.
#20142 merged
Dec 4, 2024 -
[xla-auto-sharding] Add call path for non-MIP heuristic solvers.
#20145 merged
Dec 4, 2024 -
[StableHLO][API] Add API to get StableHLO version from PortableArtifact
#20105 merged
Dec 4, 2024 -
[XLA:GPU] Fix ASAN test failure in layout_assignment_test.
#20144 merged
Dec 4, 2024 -
[XLA:GPU] Introduce xla_gpu_enable_experimental_pipeline_parallelism_opt to guard experimental prototype
#19792 merged
Dec 4, 2024 -
[XLA:GPU] Symbolic tiling: support bitcasts that reduce rank by more than one.
#19438 merged
Dec 4, 2024 -
Remove more unused hlo passes.
#20136 merged
Dec 4, 2024 -
Update clone with new operands to handle ragged all-to-all.
#19459 merged
Dec 4, 2024 -
Remove restriction to disable tensor cores for 8-bit x F32 dots.
#19338 merged
Dec 4, 2024 -
Add more debug log when Shardy dumping is disabled.
#20073 merged
Dec 4, 2024 -
[XLA:GPU] Consolidate logic to enable expensive optimisation passes.
#20082 merged
Dec 4, 2024 -
Fix CheckAndCanonicalizeMemoryKind when used with non-addressable device list.
#20135 merged
Dec 4, 2024 -
Reverts 0681280cf5f6b71eb09017a51a079553d98e49f2
#20134 merged
Dec 4, 2024 -
Deduplicate getValuesFromDotOperandLayoutStruct function
#20127 merged
Dec 4, 2024 -
Integrate CompilationProvider framework into NVPTXCompiler
#20112 merged
Dec 4, 2024 -
[XLA:GPU][Emitters] Use DeviceDescription in lower_to_llvm.cc.
#20129 merged
Dec 4, 2024 -
Update partitioning method for gather/scatter along implicit batch dimensions.
#20097 merged
Dec 4, 2024 -
[XLA:GPU] Add RaggedAllToAllDecomposer to the GPU compilation pipeline.
#20068 merged
Dec 4, 2024 -
Integrate LLVM at llvm/llvm-project@9c9d4b9e73c1
#20124 merged
Dec 4, 2024 -
Create snapshot for HloUnoptimized module.
#20123 merged
Dec 4, 2024 -
Automated Code Change
#20118 merged
Dec 4, 2024 -
Automated Code Change
#20116 merged
Dec 4, 2024 -
Add TraceMe around HLO passes and pipelines.
#20071 merged
Dec 4, 2024 -
PR #20084: Adds a TraceMe in the blocking while loop.
#20120 merged
Dec 4, 2024 -
Integrate LLVM at llvm/llvm-project@109e4a147faa
#20113 merged
Dec 4, 2024 -
[XLA:GPU] Swap dot operands in certain cases.
#19922 merged
Dec 4, 2024 -
Automated Code Change
#19979 merged
Dec 4, 2024 -
PR #20028: [GPU][NFC] Cleanup horizontal loop fusion.
#20114 merged
Dec 4, 2024 -
Remove ifdefs from ir_emitter_unnested
#20016 merged
Dec 4, 2024 -
Automated Code Change
#20057 merged
Dec 4, 2024 -
Apply layout permutation also to dynamic dimensions.
#20063 merged
Dec 4, 2024 -
[xla:collectives] NFC: Move GetUniqueId to Collectives API
#20095 merged
Dec 4, 2024 -
[xla:collectives] NFC: Move lockable GpuClique to collectives component
#20052 merged
Dec 4, 2024 -
[xla:collectives] NFC: Move NcclApi::CommCount to Communicator API
#20048 merged
Dec 4, 2024 -
Integrate LLVM at llvm/llvm-project@a201ba1b57aa
#20108 merged
Dec 4, 2024 -
[xla:collectives] NFC: Remove unused NcclApi CommFinalize function
#20049 merged
Dec 4, 2024 -
[IFRT] Add custom_options in ifrt::ExecuteOptions
#20103 merged
Dec 4, 2024 -
[xla:collectives] NFC: Move CommAbort to Communicator API
#20050 merged
Dec 4, 2024 -
This CL adds a filter mask to BFCAllocator::AddTraceMe so we can collect only memory TraceMe events.
#19863 merged
Dec 4, 2024 -
Support int4 in most ops on CPUs/GPUs.
#20043 merged
Dec 4, 2024 -
[xla:collectives] NFC: Move communicator error checking to Communicator API
#20051 merged
Dec 4, 2024 -
Move tsl/platform:{subprocess,subprocess_test} to XLA
#20032 merged
Dec 4, 2024 -
[hlo-opt] Register HWI passes and move test files to dedicated test directory
#20100 merged
Dec 4, 2024 -
Fix documentation for CommandBufferCmdEmitter ConvertToCommands
#20098 merged
Dec 4, 2024 -
Add _raw_platform to work around extra platform normalization logic and enable
#20091 merged
Dec 4, 2024 -
Integrate LLVM at llvm/llvm-project@fed3a9b8f81f
#20090 merged
Dec 4, 2024 -
[xla:collectives] NFC: Extract NcclCommunicators into GpuClique and NcclCliqueImpl
#20053 merged
Dec 4, 2024 -
Rename NewHloTestBase to HloRunnerAgnosticTestBase.
#20018 merged
Dec 3, 2024 -
Add inference stats sampler and grouping.
#19855 merged
Dec 3, 2024 -
[IFRT] Add layout_mode attribute to IFRT Array type.
#19933 merged
Dec 3, 2024 -
Integrate StableHLO at openxla/stablehlo@d1db6dfe
#20094 merged
Dec 3, 2024 -
[xla:collectives] NFC: Extract shared Clique from NcclCommunicators
#20054 merged
Dec 3, 2024 -
Factor out test config for better readability
#19802 merged
Dec 3, 2024 -
Remove stale hlo_instruction comment in thunk.h
#20093 merged
Dec 3, 2024 -
Remove obsolete TODO for fixed bug in pjrt_c_api.h.
#19857 merged
Dec 3, 2024 -
Check whether profile info is empty to determine if the module is using profiles.
#20060 merged
Dec 3, 2024 -
[ifrt] Fix tag mismatch for xla::ifrt::CompileOptions.
#20087 merged
Dec 3, 2024 -
[xla:collectives] NFC: Remove NcclCliqueKey alias
#19967 merged
Dec 3, 2024 -
Internal CI/CD change
#20021 merged
Dec 3, 2024 -
[XLA:GPU] Log the error if parsing of Triton IR from custom call fails.
#20076 merged
Dec 3, 2024 -
[XLA:GPU][Emitters] Remove the complex.expm1 approximation.
#19519 merged
Dec 3, 2024 -
[XLA:GPU] Don't allow to fuse DUS with shared operands.
#20072 merged
Dec 3, 2024 -
[XLA:GPU] Do not compute suggested combiner threshold if there are no pipelined collectives in IR.
#20007 merged
Dec 3, 2024 -
[xla:collectives] NFC: Move NcclCliqueKey to GpuCliqueKey
#19968 merged
Dec 3, 2024 -
Automated Code Change
#19976 merged
Dec 3, 2024 -
[xla:collectives] NFC: Move CliqueIsCallback alias to gpu_executable_run_options
#19964 merged
Dec 3, 2024 -
[XLA:GPU] Use Cub RadixSort for bf16 sorts in Numpy order (NaNs go last).
#19983 merged
Dec 3, 2024 -
Integrate LLVM at llvm/llvm-project@bd92e4620433
#20061 merged
Dec 3, 2024 -
[xla:collectives] NFC: Introduce GpuCollectives interface
#19965 merged
Dec 3, 2024 -
PR #19571: PJRT: assign process index and count for compilation using device assignment.
#19946 merged
Dec 3, 2024 -
[XLA][IndexAnalysis] Move indexing_analysis to hlo/analysis.
#20017 merged
Dec 3, 2024 -
Add AssembleCompilationProvider routine
#19973 merged
Dec 3, 2024 -
[xla:collectives] NFC: Rename NcclCommHandleWrapper to CommunicatorHandle
#19966 merged
Dec 3, 2024 -
[xla:collectives] NFC: Remove NcclCliqueId alias
#19959 merged
Dec 3, 2024 -
[xla:collectives] Add a base CliqueKey class to collectives component
#19960 merged
Dec 3, 2024 -
[xla:collectives] NFC: Migrate XLA:GPU to strongly typed RankId
#19956 merged
Dec 3, 2024 -
[xla:collectives] NFC: Add strongly typed collective RankId
#19955 merged
Dec 3, 2024 -
Add a test to verify that the PJRT API struct is laid out as expected.
#19460 merged
Dec 3, 2024 -
[XLA:CPU] Use ArrayTypeSwitch to avoid having to write out the switch
#20019 merged
Dec 3, 2024 -
Remove overly restricted checking in RemapPlan::Validate()
#20027 merged
Dec 3, 2024 -
[xla:collectives] NFC: Move NcclCliqueId to collectives/clique_id
#19954 merged
Dec 3, 2024 -
Rollback of "Require packed dot operands to be packed along contracting dimension."
#20037 merged
Dec 3, 2024 -
[XLA:LatencyHidingScheduler] Code Refactor (NFC).
#19452 merged
Dec 3, 2024 -
Mark tsl/default:criticality as nobuilder
#20033 merged
Dec 3, 2024 -
[xla:collectives] NFC: Add xla::Collectives API and prepare for NCCL implementation
#19895 merged
Dec 2, 2024 -
Add scope range id for gpu event's annotation.
#18877 merged
Dec 2, 2024 -
Add explicit GPU platforms to exhaustive tests
#18628 merged
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@1b03747ed85c
#20022 merged
Dec 2, 2024 -
[XLA] Update documentation with CompositeCall operation.
#20020 merged
Dec 2, 2024 -
[XLA:GPU] Minor reformat in collective_permute_cycle_decomposer.cc
#20015 merged
Dec 2, 2024 -
Move test to public XLA:CPU test
#19997 merged
Dec 2, 2024 -
Un-deprecate HloTestBase.
#20010 merged
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@21d27b3aabf3
#20008 merged
Dec 2, 2024 -
[XLA:GPU] Support cross-replica cps in collective-permute decomposer
#19676 merged
Dec 2, 2024 -
Add fine grained visibility rules to XLA
#20009 merged
Dec 2, 2024 -
Add Megascale topology stat type.
#19814 merged
Dec 2, 2024 -
Create stub filegroups in xla/tsl/platform corresponding to filegroups in tsl/platform
#19864 merged
Dec 2, 2024 -
Move Internal XLA:CPU to public XLA:CPU API
#19999 merged
Dec 2, 2024 -
Move Python XLA extension to public XLA:CPU API
#19995 merged
Dec 2, 2024 -
Move test files to XLA:CPU public API
#19990 merged
Dec 2, 2024 -
Move XLA Tests to XLA:CPU public PJRT API
#19993 merged
Dec 2, 2024 -
Prepare custom call tests for API_VERSION_ORIGINAL removal
#19986 merged
Dec 2, 2024 -
Support cross-replica send/recv on GPU
#19167 merged
Dec 2, 2024 -
[xla:cpu] Replace Thunk::FunctionRegistry with FunctionLibrary
#19953 merged
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@2474cf7ad123
#20001 merged
Dec 2, 2024 -
[XLA] Move hlo_traversal.h to hlo/utils.
#19992 merged
Dec 2, 2024 -
Provide more CUDA diagnostic information
#19985 merged
Dec 2, 2024 -
[XLA:CPU] Disable a subset of failing test cases in complex_unary_op_test on ARM.
#19988 merged
Dec 2, 2024 -
[XLA:GPU] Fix typo in custom_kernel_fusion.h
#19987 merged
Dec 2, 2024 -
PR #19655: [ROCm] Make MLIR Math dialect lowering more deterministic
#19984 merged
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@fe1c4f0106fe
#19980 merged
Dec 2, 2024 -
Automated Code Change
#19978 merged
Dec 2, 2024 -
[XLA:GPU] Clean up: reuse the LoadHloModuleAndArguments directly in multihost_hlo_runner utils.
#19920 merged
Dec 2, 2024 -
PR #19927: [cuda] Warn about ptxas versions before CUDA 12.6.3
#19944 merged
Dec 2, 2024 -
PR #19869: [XLA] Wraps the FFI backend config opaque string with CustomCallBackendConfig
#19901 merged
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@b22cc5a650de
#19975 merged
Dec 2, 2024 -
Replace gpu_asm_extra_flags string option by individual flags
#19836 merged
Dec 2, 2024 -
Automated Code Change
#19962 merged
Dec 1, 2024 -
C++ tree with path API
#19366 merged
Dec 1, 2024 -
[XLA:CPU] Use absl::call_once to lazily initialize kernel and comparator functions
#19949 merged
Nov 30, 2024 -
[xla:cpu] NFC: Move FunctionLibrary from codegen to runtime
#19951 merged
Nov 30, 2024 -
[xla:cpu] NFC: Consistently use llvm::CodeGenOptLevel to configure opt_level
#19950 merged
Nov 30, 2024 -
[xla:cpu] NFC: Delete SimpleOrcJit
#19909 merged
Nov 30, 2024 -
[XLA:GPU] Make the buffer comparison code more robust
#19948 merged
Nov 30, 2024 -
Propagate profile generation strategy all the way to AutoGrappler
#19862 merged
Nov 29, 2024 -
[xla:cpu] Migrate CpuCompiler from SimpleOrcJit to JitCompiler
#19945 merged
Nov 29, 2024 -
[XLA:CPU] Mark vectorized_reduce_with_no_vector_registers_test as requiring x86_64
#19893 merged
Nov 29, 2024 -
[XLA:GPU] Fix Triton support for dot(inf, 1.0) with TF32_TF32_F32_X3 algorithm
#19937 merged
Nov 29, 2024 -
[XLA:GPU] Fail gracefully if call function was not found in the Triton module.
#19941 merged
Nov 29, 2024 -
Integrate LLVM at llvm/llvm-project@59f57be94f38
#19943 merged
Nov 29, 2024 -
[XLA:GPU] Set the `SortSizeThreshold` in tests to zero.
#19940 merged
Nov 29, 2024 -
[XLA] Clarify the usage of IsAsyncCollective{Start,Done}Op.
#19936 merged
Nov 29, 2024 -
PR #19925: [ROCm] Fix build break with gcc due to 53417984
#19938 merged
Nov 29, 2024 -
[XLA:GPU] Add a check to emitters that dynamic-(update-)slice indexes are canonicalized.
#19929 merged
Nov 29, 2024 -
[XLA:GPU] Clean up: remove redundant code. Express one `Create` method in terms of the other.
#19930 merged
Nov 29, 2024 -
[XLA:GPU] Require packed dot operands to be packed along contracting dimension.
#19190 merged
Nov 29, 2024 -
Remove no longer used hlo passes.
#19924 merged
Nov 29, 2024 -
Reverts 6949b216cd2a413e4a1445104cd42ebbdca29d04
#19935 merged
Nov 29, 2024 -
[xla:cpu] Migrate CpuCompiler from SimpleOrcJit to JitCompiler
#19907 merged
Nov 29, 2024 -
Clarify index parallel dims in gather/scatter instructions.
#19826 merged
Nov 29, 2024 -
Review PJRT integration document
#19202 merged
Nov 28, 2024 -
[xla:cpu] Add support for linking with external symbols to JitCompiler
#19908 merged
Nov 28, 2024 -
[xla:cpu] NFC: Construct JitCompiler in xla::cpu::CpuCompiler
#19906 merged
Nov 28, 2024 -
[XLA:GPU] Add documentation for XLA:GPU emitters to ToC.
#19928 merged
Nov 28, 2024 -
[xla:cpu] Enable parallel compilation using ORC TaskDispatcher
#19867 merged
Nov 28, 2024 -
#sdy cleanup: remove unused pipeline creation function and add `Sdy` namespace to ops in `ops.td`.
#19891 merged
Nov 28, 2024 -
[XLA:GPU][Emitters] Add documentation for the emitters.
#19923 merged
Nov 28, 2024 -
[XLA:GPU] Add gpu_client_mem_fraction flag to the HLO runner
#19918 merged
Nov 28, 2024 -
[XLA:GPU] Enable F32_F32_F32 algorithm for triton gemm fusion
#19914 merged
Nov 28, 2024 -
[Triton] Fix bug where chain-dots with num_warps = 8 would lead to assertion failure in Linear Layouts.
#19917 merged
Nov 28, 2024 -
Integrate LLVM at llvm/llvm-project@32ef417603e1
#19919 merged
Nov 28, 2024 -
Reverts cd11046e5007508498f140df13ba40dc08b13822
#19916 merged
Nov 28, 2024 -
Automated Code Change
#19911 merged
Nov 28, 2024 -
PR #19890: Use `/opt/rocm/` for ROCM_INSTALL_DIR environment variable
#19912 merged
Nov 28, 2024 -
Integrate StableHLO at openxla/stablehlo@f21104d0
#19675 merged
Nov 28, 2024 -
Automated Code Change
#19874 merged
Nov 28, 2024 -
Put P2PSchedulePreparation behind `enable_pipelined_p2p` flag
#19246 merged
Nov 28, 2024 -
[xla:cpu] NFC: Extract RuntimeSymbolGenerator into a separate library
#19846 merged
Nov 28, 2024 -
[xla:cpu] Add a little bit of type safety for FunctionLibrary
#19849 merged
Nov 28, 2024 -
Make channel_id optional for send/recv ops.
#19239 merged
Nov 28, 2024 -
Add more flexible custom hermetic Python setup
#19689 merged
Nov 28, 2024 -
[xla:cpu] Implement JitCompiler on top of LLVM ORC stack
#19848 merged
Nov 28, 2024 -
Temporarily remove absl dll patch to fix jaxlib windows build.
#19870 merged
Nov 28, 2024 -
[XLA:GPU] Fix `tune_ctas` in GemmFusionAutotunerImpl::GetExhaustiveTritonConfigs.
#19904 merged
Nov 27, 2024 -
[XLA] Remove unused functions from llvm_util.
#19903 merged
Nov 27, 2024 -
[xla:cpu] NFC: Move InferTargetMachine to JitCompiler
#19852 merged
Nov 27, 2024 -
[IFRT] Modify IfrtArrayType builders to only accept types and attributes.
#19894 merged
Nov 27, 2024 -
[XLA:GPU] Use DeviceDescription instead of hard-coding warp size as 32
#19794 merged
Nov 27, 2024 -
[xla:cpu] NFC: Extract cpu features filtering and detection into a separate library
#19851 merged
Nov 27, 2024 -
[XLA:CPU] Fix test failure for the cpu case on an internal test.
#19887 merged
Nov 27, 2024 -
Reverts 00f3b02608e672e121f5cd31dba6a8027d64fe60
#19889 merged
Nov 27, 2024 -
Switch out the base container image for XLA.
#19713 merged
Nov 27, 2024 -
[XLA:GPU] Remove legacy reduction emitter code
#19720 merged
Nov 27, 2024 -
Integrate LLVM at llvm/llvm-project@f67ba5855278
#19885 merged
Nov 27, 2024 -
IFRT proxy: Additional debug logging and xprof tracemes.
#19888 merged
Nov 27, 2024 -
[xla:cpu] NFC: Move CompilerFunctor to backends/cpu/codegen IrCompiler
#19853 merged
Nov 27, 2024 -
PR #19026: [NVIDIA GPU] LHS enhancement for multiple collective resources
#19835 merged
Nov 27, 2024 -
Integrate LLVM at llvm/llvm-project@b214ca82daee
#19883 merged
Nov 27, 2024 -
PR #19754: [ROCm] Enable gemm fusion autotuner.
#19877 merged
Nov 27, 2024 -
[XLA:GPU] Fix a stack use after scope in horizontal fusion
#19882 merged
Nov 27, 2024 -
PR #18988: [WhileLoopAllReduceCodeMotion] Support convert and transpose ops in setup passes.
#19876 merged
Nov 27, 2024 -
[XLA:CPU] Update RunHloBenchmark to enable running with HLO with inferred arguments
#19698 merged
Nov 27, 2024 -
[XLA:GPU] Allow passing extra GPU client options to PjRt environment init
#19783 merged
Nov 27, 2024 -
[XLA:GPU] Disabling auto layout in HLO for now.
#19879 merged
Nov 27, 2024 -
Automated Code Change
#19753 merged
Nov 27, 2024 -
[XLA:GPU] Move `SortRewriter` to an earlier point in the pass pipeline.
#19657 merged
Nov 27, 2024 -
PR #18840: [NVIDIA] Support larger head dim for cudnn fmha
#19838 merged
Nov 27, 2024 -
[IFRT] Modify IFRT <-> VIFRT legalization to support escaped SymbolRefAttr.
#19865 merged
Nov 27, 2024 -
[IFRT] Add pass to remove attributes that are not from IFRT or Builtin dialects.
#19861 merged
Nov 27, 2024 -
[XLA:GPU] Fix an ASAN error
#19816 merged
Nov 27, 2024 -
[XLA:GPU] Remove unused function RowReductionGetRowsPerWarp.
#19719 merged
Nov 27, 2024 -
[Cleanup] Use push_back instead of emplace_back where appropriate
#19800 merged
Nov 27, 2024 -
[XLA:GPU] Fix a bug in horizontal input fusion sorting.
#19715 merged
Nov 26, 2024 -
Move `tsl/platform/{build_config,build_config_root,rules_cc}.bzl` to `xla/tsl/platform`
#19718 merged
Nov 26, 2024 -
Allowing reshape on int4's in XLA:GPU
#19858 merged
Nov 26, 2024 -
[PjRt-IFRT] Add optional global device mapping support to PjRt-IFRT
#19638 merged
Nov 26, 2024 -
[JAX] Add Python binding for building a colocated Python program
#19811 merged
Nov 26, 2024 -
[hlo-opt] Add a placeholder method to register passes from the CPU/GPU providers.
#19805 merged
Nov 26, 2024 -
[XLA] Alias ragged all-to-all output with operand 1.
#19812 merged
Nov 26, 2024 -
[XLA:GPU] Mark collectives in formatting ops as pipelined.
#19647 merged
Nov 26, 2024 -
[XLA:GPU] Add intra-warp reduce of reduce test.
#19840 merged
Nov 26, 2024 -
Integrate LLVM at llvm/llvm-project@c0192a008c4a
#19839 merged
Nov 26, 2024 -
Add DeferRelocatableCompilationCompilationProvider
#19831 merged
Nov 26, 2024 -
[XLA:GPU] Remove `xla_gpu_enable_heuristic_pass_configuration` flag.
#19149 merged
Nov 26, 2024 -
Reverts 27352402f3c65c41e7c897138d6ad3e015a04014
#19837 merged
Nov 26, 2024 -
Fix `PropagateShardingAlongDimsAndReplicateOthers` and expose it as a public util function.
#19825 merged
Nov 26, 2024 -
[XLA:CPU] Create kernel API for cpu runtime
#19785 merged
Nov 26, 2024 -
[xla:cpu] Move TargetMachineFeatures to xla/backends/codegen
#19821 merged
Nov 26, 2024 -
Add PTX CompilationProvider for compiling PTX via the CUDA driver
#19777 merged
Nov 26, 2024 -
Automated Code Change
#19760 merged
Nov 26, 2024 -
Update target_config to be a text proto and populate it on the
#19818 merged
Nov 26, 2024 -
Automated Code Change
#19742 merged
Nov 26, 2024 -
[IFRT] Add IFRT IR program SerDeRoundTrip helper method for tests.
#19815 merged
Nov 26, 2024 -
Adds a sharding config to XLA's HloModuleConfig (as part of AutoFDO integration).
#19417 merged
Nov 26, 2024 -
Create op_metircs_to_record to deal with Roofline Analysis
#19817 merged
Nov 26, 2024 -
[XLA:SPMD] Fix a bug in `PartitionGatherTrivialSlicedOperandDimensions`.
#19806 merged
Nov 26, 2024 -
[Cleanup] Use push_back instead of emplace_back where appropriate
#19801 merged
Nov 26, 2024 -
Relocates all ShardingConfig <--> ShardingConfigProto conversion from platforms/ to third_party/.
#19808 merged
Nov 26, 2024 -
[Cleanup] Use push_back instead of emplace_back where appropriate
#19799 merged
Nov 26, 2024 -
Reverts 783d6c98e36b7d7cdabeb11b34b6c3d88e716e74
#19804 merged
Nov 25, 2024 -
[IFRT] Add VIFRT serialization python bindings.
#19791 merged
Nov 25, 2024 -
Reverts 046f3dc59a0c67a0ce144cc01a5af5aeac58977c
#19803 merged
Nov 25, 2024 -
[xla:cpu] Move TargetMachineFeatures to xla/backends/codegen
#19756 merged
Nov 25, 2024 -
Relocates ShardingConfig's nested proto definition from platforms/ to third_party/.
#19379 merged
Nov 25, 2024 -
[Cleanup] Use push_back instead of emplace_back where appropriate
#19797 merged
Nov 25, 2024 -
[xla:cpu] NFC: Extract XLA:CPU alignment requirements into a separate library
#19798 merged
Nov 25, 2024 -
Integrate LLVM at llvm/llvm-project@c9e606b9cf50
#19795 merged
Nov 25, 2024 -
[xla:cpu] NFC: Move VectorSupportLibrary to VectorIrBuilder in backends/cpu/codegen
#19793 merged
Nov 25, 2024 -
Create a simplified cost interface that can be used in various components.
#19377 merged
Nov 25, 2024 -
[xla:cpu] NFC: Move polynomial approximations for common math functions to xla cpu codegen folder
#19736 merged
Nov 25, 2024 -
Clean up dependencies for matmul_utils.h.
#19789 merged
Nov 25, 2024 -
Add a test in hlo_evaluator_test to demonstrate how to obtain diagonal from a matrix.
#19637 merged
Nov 25, 2024 -
[xla:cpu] NFC: Extract ContiguousSectionMemoryManager into a separate library
#19787 merged
Nov 25, 2024 -
[xla:cpu] Initial commit for LlvmOrcJitCompiler
#19786 merged
Nov 25, 2024 -
Delete unused xla/status.h.
#19708 merged
Nov 25, 2024 -
Bump C64 Log tolerance in exhaustive_unary_complex_test
#19782 merged
Nov 25, 2024 -
Add CachingCompilationProvider
#19771 merged
Nov 25, 2024 -
Integrate LLVM at llvm/llvm-project@f81f47e3ff29
#19781 merged
Nov 25, 2024 -
[XLA:GPU] Make SortRewriter VLOG level 2 less chatty.
#19778 merged
Nov 25, 2024 -
Add CompositeCompilationProvider
#19769 merged
Nov 25, 2024 -
PR #19775: [PJRT:GPU] Fix device numbering in topology creation
#19776 merged
Nov 25, 2024 -
Add compilation provider for libnvptxcompiler
#19763 merged
Nov 25, 2024 -
[XLA:GPU] Add support for BF16_BF16_F32[_X3,X6] dot precision algorithm in algebraic simplifier.
#19768 merged
Nov 25, 2024 -
Reverts 885378495882835e7ddfbea137d0677924604fee
#19772 merged
Nov 25, 2024 -
[XLA:GPU] Skip cub sort on failing types on H100
#19773 merged
Nov 25, 2024 -
Add CompilationProvider for libnvjitlink
#19762 merged
Nov 25, 2024 -
Enable sort_rewriter_test in OSS
#19765 merged
Nov 25, 2024 -
Automated Code Change
#19752 merged
Nov 25, 2024 -
[Upkeep][XLA-Code-Health] Resolve 4 instances of the following issue: Todo (resolved)
#19518 merged
Nov 24, 2024 -
Integrate LLVM at llvm/llvm-project@2fe947b47798
#19721 merged
Nov 23, 2024 -
Delete unused xla/statusor.h.
#19707 merged
Nov 23, 2024 -
Move BatchedGatherScatterNormalizer from pre-SPMD to post-SPMD.
#19415 merged
Nov 23, 2024 -
internal visibility change
#19723 merged
Nov 23, 2024 -
Add target_config as an optional field of
#19726 merged
Nov 23, 2024 -
[xla:collectives] NFC: Remove communicator aliases from NcclApi
#19724 merged
Nov 23, 2024 -
Remove absl::Nonnull from AbslStringify
#19722 merged
Nov 23, 2024 -
Update target_config to be a text proto and populate it on the
#19710 merged
Nov 23, 2024 -
Use absl::Nonnull to indicate that sharding in xla::ifrt::ArraySpec cannot be null
#19717 merged
Nov 22, 2024 -
[Code-Health] Resolve the following technical debt issue: Todo(resolved)
#19483 merged
Nov 22, 2024 -
[xla:collectives] Remove unused CommDestroy
#19714 merged
Nov 22, 2024 -
[xla:collectives] Use NcclCommunicator in NcclApi implementation
#19712 merged
Nov 22, 2024 -
[IFRT] Add VIFRT pass for converting between VIFRT versions.
#19711 merged
Nov 22, 2024 -
Integrate LLVM at llvm/llvm-project@556ea5265a25
#19701 merged
Nov 22, 2024 -
[XLA:GPU] remove channel ID checks in hlo_instructions.cc
#19408 merged
Nov 22, 2024 -
[xla:cpu] Add JitCompiler and FunctionLibrary APIs for XLA:CPU codegen
#19702 merged
Nov 22, 2024 -
[xla:cpu] Add a KernelRunner API to codegen testlib and sketch a test for XLA:CPU
#19703 merged
Nov 22, 2024 -
Further lower threshold for F64 in //xla/service/gpu/model:hlo_op_profiler_test
#19709 merged
Nov 22, 2024 -
[xla:collectives] Initial xla/core/collectives component commit
#19680 merged
Nov 22, 2024 -
PR #19660: [ROCm] switch rocm build to clang
#19705 merged
Nov 22, 2024 -
Stop using xla/statusor.h in favor of absl/status/statusor.h directly.
#19704 merged
Nov 22, 2024 -
Fix two issues in `PartitionScatterIndexPassthroughDimensions`.
#19688 merged
Nov 22, 2024 -
[XLA] Go back to using a glob for including dialects in the `mlir_interpreter`.
#19697 merged
Nov 22, 2024 -
Integrate LLVM at llvm/llvm-project@a12e79a85fc1
#19692 merged
Nov 22, 2024 -
#sdy Refactor `xla-sdy-mhlo-round-trip-shard-map-export` from a `ConversionPattern` to a walk.
#19658 merged
Nov 22, 2024 -
[XLA:GPU] Fusion tests don't seem to require A100, so replace tag.
#19640 merged
Nov 22, 2024 -
[xla:cpu] Add a benchmark for creating zero-copy PjRt buffer
#19670 merged
Nov 22, 2024 -
[xla:cpu] NFC: Remove ExecuteState alias from Thunk
#19659 merged
Nov 22, 2024 -
PR #19577: Cleanup handling of 2 fields of ExecutableBuildOptions.
#19651 merged
Nov 22, 2024 -
PR #19346: Bumped rules_python version to 0.39.0
#19500 merged
Nov 22, 2024 -
PR #19679: [XLA:CPU][oneDNN] Relocate Addend Shape Validation to the Contraction Rewriter
#19686 merged
Nov 22, 2024 -
[XLA] Propagate original_value when instructions are replaced in X64Rewriter
#19639 merged
Nov 22, 2024 -
PR #19656: Fix implicit index handling in ScatterDeterminismExpander
#19683 merged
Nov 22, 2024 -
Change parameter type in LinkUsingNvlink
#19682 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in command_buffer_scheduling.cc
#19599 merged
Nov 22, 2024 -
Add a simple test for the symbol_finder
#19665 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in collective_send_recv_combiner.cc
#19609 merged
Nov 22, 2024 -
Move LinkGpuAsm into separate file
#19642 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in collective_select_folder.cc
#19592 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in collective_permute_valid_iteration_annotator.cc
#19584 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in collective_permute_cycle_decomposer.cc
#19596 merged
Nov 22, 2024 -
Move ptxas/nvlink compilation into separate compilation unit
#19641 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in async_wrapper_test.cc
#19590 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in async_wrapper.cc
#19622 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in all_reduce_blueconnect.cc
#19597 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in all_gather_optimizer.cc
#19583 merged
Nov 22, 2024 -
Remove obsolete PjRtClient::AsyncSendPlaceholder API.
#19678 merged
Nov 22, 2024 -
Move `tsl/platform/profile_utils` to `xla/tsl/platform/profile_utils`
#19674 merged
Nov 22, 2024 -
Set implicitTrunc on APInt creation
#19677 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in all_gather_dynamic_slice_simplifier.cc
#19629 merged
Nov 22, 2024 -
[IFRT] Implement BytecodeDialectInterface for VIFRT.
#19672 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in alias_passthrough_params.cc
#19585 merged
Nov 22, 2024 -
Add batch tests to RemapArrays, and with different shapes.
#19442 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_transpose_fusion.cc
#19575 merged
Nov 22, 2024 -
Stop using AsGpuStreamValue in gpu_cudamallocasync_allocator_test.
#19667 merged
Nov 22, 2024 -
PR #16901: [XLA:GPU] Fix default device mesh for auto sharding
#19294 merged
Nov 22, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter_test.cc
#19574 merged
Nov 21, 2024 -
Eliminate static_casts in GpuCommandBuffer.
#19666 merged
Nov 21, 2024 -
Move `jax` visibility inside `internal_visibility` call
#19566 merged
Nov 21, 2024 -
[XLA:CollectivePipeliner-Sinking] Stop pipelining iterations if a large sunk collective is encountered.
#19526 merged
Nov 21, 2024 -
Refactor `GetGatherScatterBatchParallelDims`. No behavior change.
#19581 merged
Nov 21, 2024 -
[xla:codegen] Add a testonly KernelEmitter for testing XLA:CPU kernels
#19668 merged
Nov 21, 2024 -
Revert: [XLA:GPU] Enable Triton normalization fusions by default.
#19662 merged
Nov 21, 2024 -
[XLA:GPU] Dump the failing HLO fusion to a file when Triton numerics verification fails.
#19664 merged
Nov 21, 2024 -
Cleanup. Merge `GatherScatterParallelDims` into `GatherScatterDims`.
#19572 merged
Nov 21, 2024 -
Make the implementation of GetXlaPjrtTpuClient more similar to how Jax uses PJRT.
#19568 merged
Nov 21, 2024 -
Reverts 06e8ef8ec494877e459ff357f9479613a472c67b
#19563 merged
Nov 21, 2024 -
Add backend kwargs to xla tests.
#19561 merged
Nov 21, 2024 -
Set implicitTrunc on APInt creation
#19565 merged
Nov 21, 2024 -
Remove unused GpuAsmOpts parameter from RedzoneAllocator
#19533 merged
Nov 21, 2024 -
[tsl:concurrency] Fix asan error in CountDownAsyncValueRef
#19661 merged
Nov 21, 2024 -
Remove static_casts in implementations of SetNodeExecutionEnabled.
#19573 merged
Nov 21, 2024 -
[xla:cpu] Add a test for nanort executable with temp storage
#19580 merged
Nov 21, 2024 -
[XLA:GPU] Change `ConstraintExpression` to use operator||/&& which return a new instance.
#19548 merged
Nov 21, 2024 -
PR #19578: [doc] Fix a link to a page in the table of contents.
#19645 merged
Nov 21, 2024 -
[xla:cpu] Optimize buffer allocations construction from se::DeviceMemoryBase
#19487 merged
Nov 21, 2024 -
Remove remnants of GpuDriver
#19544 merged
Nov 21, 2024 -
Refactor PjRt environment initialization to have clearer data flow
#19553 merged
Nov 21, 2024 -
PR #18407: Fix xla-mlir failures on Windows
#19650 merged
Nov 21, 2024 -
Integrate LLVM at llvm/llvm-project@33fcd6acc755
#19542 merged
Nov 21, 2024 -
[XLA:GPU] Remove KernelFusionEmitterBase.
#19118 merged
Nov 21, 2024 -
[XLA:GPU] Enable Triton normalization fusions by default.
#19569 merged
Nov 21, 2024 -
Remove CUDA 12.1 workaround from reduction logic
#19536 merged
Nov 21, 2024 -
Remove :cuda_runtime and :rocm_runtime targets
#19541 merged
Nov 21, 2024 -
Use fast version of log if type is F16 or BF16.
#19535 merged
Nov 21, 2024 -
[XLA:GPU] Use DeviceDescription instead of GetDriverVersion in NVPTXCompiler
#19539 merged
Nov 21, 2024 -
[XLA:TPU:MSA] Remove redundant checks for cross_program_prefetches in memory_space_assignment tests.
#18130 merged
Nov 21, 2024 -
[xla:cpu] Use CountDownAsyncValueRef in HostKernel state
#19579 merged
Nov 21, 2024 -
[Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19488 merged
Nov 21, 2024 -
[xla:cpu] Resolve constant buffers
#19508 merged
Nov 21, 2024 -
[xla:cpu] Resolve arguments/results/temp mapping from buffer assignment
#19504 merged
Nov 21, 2024 -
Move `tsl/platform/{cloud,default,windows}` to `xla/tsl/platform`
#19323 merged
Nov 21, 2024 -
[XLA:TPU:MSA]
#19491 merged
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_mha_rewriter.cc
#19576 merged
Nov 21, 2024 -
[XLA:GPU] Remove RewriteReductionsPass
#19570 merged
Nov 21, 2024 -
Remove unused functions from ir_emission_utils.cc
#19513 merged
Nov 20, 2024 -
[IFRT] Add pass to legalize VIFRT into IFRT.
#19523 merged
Nov 20, 2024 -
[NFC] hlo_op_profiler_test: Internal testing change.
#19562 merged
Nov 20, 2024 -
[IFRT] Fix signature of CreateIfrtVerifyDonationPass
#19564 merged
Nov 20, 2024 -
IFRT proxy optimization: Make more IFRT operations asynchronous.
#19440 merged
Nov 20, 2024 -
Add backend_kwargs to XLA tests config.
#19560 merged
Nov 20, 2024 -
[XLA:TPU:MSA] Refactor some utility functions from algorithm and buffer_interval_comparator into msa/utils.
#18131 merged
Nov 20, 2024 -
[XLA:GPU] Use ShuffleOp to reverse the order of elements in a vector.
#19486 merged
Nov 20, 2024 -
Remove unused gpu_types.h include from nccl_collective_thunk.cc
#19545 merged
Nov 20, 2024 -
[XLA:GPU] Use HloPredicateIsOp in collective_select_folder
#19525 merged
Nov 20, 2024 -
PR #19237: [GPU] Fix passing of key-value store handle from client to compiler.
#19254 merged
Nov 20, 2024 -
[tsl:concurrency] Keep AsyncValueRef a part of CountDownAsyncValueRef State
#19522 merged
Nov 20, 2024 -
Fix comments in `convolution_test_1d.cc`
#19529 merged
Nov 20, 2024 -
[XLA:CPU] Add benchmarks for 2D strided convolutions
#19434 merged
Nov 20, 2024 -
Refactor exhaustive_test_main into a separate library target
#19510 merged
Nov 20, 2024 -
[Code-Health] Resolve the following technical debt issue:
#19485 merged
Nov 20, 2024 -
[Code-Health] Resolve the following technical debt issue:
#19511 merged
Nov 20, 2024 -
[XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19509 merged
Nov 20, 2024 -
[Code-Health] Resolve the following technical debt issue: Todo(resolved) in CUDA BUILD file.
#19489 merged
Nov 20, 2024 -
[tsl] CountDownAsyncValueRef: enforce memory ordering around fetch_sub
#19551 merged
Nov 20, 2024 -
[xla:cpu] Add initial implementation of NanoRt backends for XLA:CPU
#19405 merged
Nov 20, 2024 -
[XLA:GPU] Adjust GetNumWarps heuristic in Tiled Cost Model.
#19540 merged
Nov 20, 2024 -
Merge sparsity_layout.patch into sparse_dot.patch
#19547 merged
Nov 20, 2024 -
Remove unused and add used headers in hlo_runner_main and create_client
#19546 merged
Nov 20, 2024 -
[XLA:ALGEBRAIC_SIMPLIFIER] Turn constant all-gather into broadcast
#19502 merged
Nov 20, 2024 -
Use OpTrait::DotLike to identify dot-like operations
#19142 merged
Nov 20, 2024 -
[TritonGPU] Add DotLike trait to SparseDotOp
#19000 merged
Nov 20, 2024 -
PR #19393: [GPU] Horizontal loop fusion: pass bitcasts when looking for fusion candidates.
#19527 merged
Nov 20, 2024 -
PR #19363: Loop Counter Increment in Collective Pipeliner
#19532 merged
Nov 20, 2024 -
Move SparseDotMetaEncodingAttr inside xla
#18436 merged
Nov 20, 2024 -
[XLA:GPU] Combine pipelined instructions as much as possible by default.
#19148 merged
Nov 20, 2024 -
delete hlo-legalize-to-memref-unranked.mlir
#19534 merged
Nov 20, 2024 -
Remove unused GpuAsmOpts parameter from Cholesky and TriangularSolveThunks
#19531 merged
Nov 20, 2024 -
Make g_trace_filter_bitmap atomic to avoid race across threads.
#19507 merged
Nov 20, 2024 -
Remove custom compilation call from DynamicSharedMemoryTest
#19530 merged
Nov 20, 2024 -
Account for optional channel ID in send/recv error message
#19325 merged
Nov 20, 2024 -
Add a pattern matcher for ragged dot HLO.
#18675 merged
Nov 20, 2024 -
[XLA:SPMD] Add HLO annotation to disable collective matmul in SPMD.
#19382 merged
Nov 20, 2024 -
Cleanup. Refactor GetGatherScatterBatchParallelDims. No behavior change.
#19520 merged
Nov 20, 2024 -
[Upkeep][XLA-Code-Health] Resolve the following technical debt issue: Todo(resolved)
#19517 merged
Nov 20, 2024 -
Stop using gpu_types.h where it's not needed.
#19512 merged
Nov 20, 2024 -
Remove dead ShapeContainsToken in HLO verifier
#19326 merged
Nov 20, 2024 -
Reverts c6f26d4efa1ac071a1b39c9a81dae6214da22be3
#19515 merged
Nov 20, 2024 -
Remove unneeded use of gpu_types.h in topk_kernel_test.cc.
#19493 merged
Nov 20, 2024 -
[IFRT] Legalize IFRT dialect into VIFRT dialect.
#19506 merged
Nov 19, 2024 -
Move some tests to public XLA:CPU API
#19501 merged
Nov 19, 2024 -
Add backend kwargs to xla tests.
#19503 merged
Nov 19, 2024 -
[Upkeep][XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19495 merged
Nov 19, 2024 -
Remove unneeded xla:statusor dependency.
#19496 merged
Nov 19, 2024 -
Move stable hlo compile test to XLA:CPU public API
#19498 merged
Nov 19, 2024 -
Replace MockGpuExecutor with MockStreamExecutor in the only use.
#19449 merged
Nov 19, 2024 -
Refactor TF wheel build rule, common python rules and flag names.
#19409 merged
Nov 19, 2024 -
Integrate LLVM at llvm/llvm-project@b03a747fc0fc
#19481 merged
Nov 19, 2024 -
Add terminology page
#19270 merged
Nov 19, 2024 -
[XLA:GPU] Allow auto layout in multihost HLO runner.
#19345 merged
Nov 19, 2024 -
Reverts e0c3ce3f243cac08aeb22a1589587a47ba767501
#19480 merged
Nov 19, 2024 -
PR #19426: [ROCm] Disable gpu_too_many_blocks_test for rocm
#19471 merged
Nov 19, 2024 -
Integrate LLVM at llvm/llvm-project@6e1acdcdc1b3
#19470 merged
Nov 19, 2024 -
[XLA:GPU] Consolidate HS flags for `exec_time_optimization_effort` >= 0.2.
#19147 merged
Nov 19, 2024 -
PR #19432: [GPU] Sharded autotuning: exchange only results of the latest compilation.
#19433 merged
Nov 19, 2024 -
PR #17593: [ROCm] Include clang-19 and clang-20 headers
#19291 merged
Nov 19, 2024 -
[XLA:GPU] Eagerly free temporary constants created during constant folding to reduce peak heap memory.
#19369 merged
Nov 19, 2024 -
[XLA:GPU] Plumb through AppendPipelinedInstruction.
#19106 merged
Nov 19, 2024 -
PR #19372: [GPU] Consider small kInput fusions with concatenations in the horizontal loop fusion pass.
#19423 merged
Nov 19, 2024 -
Cleanup. Refactor gather_scatter_handler. Remove unused code. Replace sort with stable_sort.
#19454 merged
Nov 19, 2024 -
[xla:codegen] First version of KernelEmitter and KernelSpec APIs for shared codegen
#19458 merged
Nov 19, 2024 -
Update XLA's `warnings.bazelrc`
#19453 merged
Nov 18, 2024 -
Push xla test 'main' deps to leaf test targets
#19443 merged
Nov 18, 2024 -
[tsl:concurrency] Add CountDownAsyncValueRef to concurrency library
#19403 merged
Nov 18, 2024 -
Update gpu_test_kernels_fatbin to use runfiles
#19439 merged
Nov 18, 2024 -
Reverts 82da46162c824df74e78e0ed20a19ff33542b283
#19447 merged
Nov 18, 2024 -
Change launch_id in HLO runner to be 1-based.
#19445 merged
Nov 18, 2024 -
Support explicit batch dimensions for gather/scatter in HLO evaluator.
#19400 merged
Nov 18, 2024 -
[StableHLO] Add VhloDialect to the dependent dialects of vhlo-to-version
#19441 merged
Nov 18, 2024 -
PR #19313: [nfc] Cleaning up dynamic slice fusion
#19402 merged
Nov 18, 2024 -
Style improvement: use a common method to create ArraySpec
#19418 merged
Nov 18, 2024 -
[XLA:GPU] Add `xla_experimental_exec_time_optimization_effort` flag.
#19111 merged
Nov 18, 2024 -
[XLA:GPU] Delete some unnecessary asterisks from `SoftmaxRewriterTriton`'s `IsSupportedBroadcastOfParameter`.
#19318 merged
Nov 18, 2024 -
Integrate LLVM at llvm/llvm-project@64c455077abe
#19422 merged
Nov 18, 2024 -
IFRT proxy: asynchronous and faster `MakeArrayFromHostBuffer`
#19407 merged
Nov 17, 2024 -
Create HloUnaryInstruction to support result_accuracy for certain unary functions.
#19353 merged
Nov 16, 2024 -
Move new field to end of struct PJRT_Api and update struct size accordingly.
#19401 merged
Nov 16, 2024 -
Disable scatter_determinism_expander as it causes failure on internal test.
#19412 merged
Nov 16, 2024 -
Bring down_cast function to ::tsl namespace
#19404 merged
Nov 16, 2024 -
Move `GetStartIndicesDimsToOutputDims` to `gather_scatter_utils.h`
#19399 merged
Nov 16, 2024 -
[StableHLO] Don't require inlining for shape refinement.
#18957 merged
Nov 16, 2024 -
PR #19116: [XLA:CPU] [oneDNN] Refactoring oneDNN Memory Util for Custom Call oneDNN Thunk Runtime Support
#19121 merged
Nov 15, 2024 -
Temporarily exclude `xla/tsl` from buildifier checks
#19397 merged
Nov 15, 2024 -
Integrate LLVM at llvm/llvm-project@b3134fa23383
#19385 merged
Nov 15, 2024 -
kAsyncStart is missing from CalculatePostOrderScheduleHelper() which will
#19358 merged
Nov 15, 2024 -
PR #18825: [GPU] GEMM fusions: let fusing effective parameters and their broadcasts in the epilogues.
#19387 merged
Nov 15, 2024 -
[IFRT] Ensure that VIFRT td file structure matches that of IFRT dialect.
#19396 merged
Nov 15, 2024 -
Handle ragged dot in precision config methods.
#19376 merged
Nov 15, 2024 -
Both the xla_internal_test_main library and the gunit_main library define the symbol main.
#19320 merged
Nov 15, 2024 -
#sdy unskip test and change to correct result now that we resolve real conflicts.
#19394 merged
Nov 15, 2024 -
Move IFRT lib to XLA:CPU Public API
#19392 merged
Nov 15, 2024 -
Add IndexCastUIOp to AxisInfoAnalysis.
#19349 merged
Nov 15, 2024 -
Move IFRT integration tests to public XLA:CPU API
#19391 merged
Nov 15, 2024 -
[XLA] Remove redundant transposes early on
#19383 merged
Nov 15, 2024 -
Reverts 06e520756a5d9288d8b689fe387acab810bc3de8
#19350 merged
Nov 15, 2024 -
[XLA:GPU] Use namespace aliases for mlir::mhlo and mlir::arith in triton_fusion_emitter_legacy_matmul.cc
#19389 merged
Nov 15, 2024 -
[XLA] Fix a race in `SlowOperationAlarm` and add a test
#19386 merged
Nov 15, 2024 -
Convert any compute on host memory into host compute, including dynamic-slice.
#18624 merged
Nov 15, 2024 -
[XLA:CPU] Add CPU scatter benchmarks
#19307 merged
Nov 15, 2024 -
Modernize API for cuda_asm_compiler functions
#19303 merged
Nov 15, 2024 -
[XLA:GPU] Fix bug related to usage of DynamicPadder pass.
#19295 merged
Nov 15, 2024 -
[XLA:GPU] Support auto layouts in StreamExecutor PjRT path when using HLO as input
#19340 merged
Nov 15, 2024 -
PR #16775: Add test for EmitReducePrecisionIR
#19381 merged
Nov 15, 2024 -
[XLA] Add an option to disable verifier in HloExtractor
#19378 merged
Nov 15, 2024 -
Fix AlgebraicSimplifier so that it does not eliminate host offloading copies.
#19317 merged
Nov 15, 2024 -
PR #19112: [GPU] GEMM fusion: support more broadcasts.
#19247 merged
Nov 15, 2024 -
PR #19342: [ROCm] Skip unsupported tests in dot_algorithms_test
#19343 merged
Nov 15, 2024 -
Move profiler plugin functions to a separate pybind11 module
#19234 merged
Nov 15, 2024 -
Move layout changing and dtype changing copies out of host memory space
#19268 merged
Nov 15, 2024 -
[XLA] Ensure infeed/outfeed ordering
#19274 merged
Nov 15, 2024 -
Moves our C++ ShardingConfig to third_party/ so that it can be used by hlo_module_config.h, etc.
#19362 merged
Nov 15, 2024 -
Add PJRT_Buffer_CopyRawToHost to PJRT C API.
#19266 merged
Nov 14, 2024 -
Integrate LLVM at llvm/llvm-project@03730cdd3d10
#19357 merged
Nov 14, 2024 -
PR #79955: Update the curl dependency: 8.6.0 -> 8.11.0.
#19352 merged
Nov 14, 2024 -
Fix formatting in developer_guide.md
#19348 merged
Nov 14, 2024 -
Add filter functionality to TraceMeRecorder to filter events based on filter parameter.
#19359 merged
Nov 14, 2024 -
gemm_rewriter_test: Split and optimize to allow passing in coverage mode.
#19324 merged
Nov 14, 2024 -
Disable libnvjitlink by default in OSS.
#19355 merged
Nov 14, 2024 -
Remove unneeded caching of parent Executor in RocmCommandBuffer class.
#19356 merged
Nov 14, 2024 -
Remove TODOs associated with fixed bugs.
#19347 merged
Nov 14, 2024 -
[IFRT] Add ifrt-translate mlir tool for verifying dialect conversions.
#19329 merged
Nov 14, 2024 -
Make gpu_executor.h only used by RocmExecutor and CudaExecutor.
#19280 merged
Nov 14, 2024 -
Integrate LLVM at llvm/llvm-project@97298853b4de
#19333 merged
Nov 14, 2024 -
Reverts 2a7890387f812c17fb5f17eec961ee52ac3e059d
#19337 merged
Nov 14, 2024 -
PR #18248: cuda 12.6.2
#19293 merged
Nov 14, 2024 -
Remove unused variable (NFC).
#19332 merged
Nov 14, 2024 -
[XLA:CPU] Add benchmarks for 1D strided convolutions
#19261 merged
Nov 14, 2024 -
Enable libnvjitlink by default in OSS
#19257 merged
Nov 14, 2024
330 Pull requests opened by 17 people
-
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#19335 opened
Nov 14, 2024 -
Integrate LLVM at llvm/llvm-project@627b8f87e2c4
#19341 opened
Nov 14, 2024 -
Add S4/U4 support for Reshape.
#19351 opened
Nov 14, 2024 -
Reverts 27b4c50d002c85311e5ca9ba2b6089df70c359b3
#19354 opened
Nov 14, 2024 -
PR #16775: Add test for EmitReducePrecisionIR
#19370 opened
Nov 14, 2024 -
HostOffloader: use HloInstructionSet/HloInstructionMap instead of absl::flat_hash_set/absl::flat_hash_map.
#19373 opened
Nov 15, 2024 -
Integrate LLVM at llvm/llvm-project@b3134fa23383
#19374 opened
Nov 15, 2024 -
[XLA] Move ScopedLoggingTimerAndTraceMe into OSS level
#19375 opened
Nov 15, 2024 -
[PJRT C API] Move PJRT_Buffer_CopyRawToHost to end of PjRT API struct. Update struct size accordingly.
#19395 opened
Nov 15, 2024 -
Reverts 590b36f89d8cb038e9e3929aeaea6e60451ef3fc
#19406 opened
Nov 15, 2024 -
Add GPU topology proto python target.
#19410 opened
Nov 16, 2024 -
Integrate LLVM at llvm/llvm-project@64c455077abe
#19411 opened
Nov 16, 2024 -
Fix issues on explicit batch dimensions in xla sharding propagation and spmd partitioner.
#19414 opened
Nov 16, 2024 -
to be removed
#19421 opened
Nov 18, 2024 -
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#19424 opened
Nov 18, 2024 -
Mess-up the API.
#19446 opened
Nov 18, 2024 -
In progress. Not ready for review.
#19448 opened
Nov 18, 2024 -
Integrate LLVM at llvm/llvm-project@6e1acdcdc1b3
#19450 opened
Nov 18, 2024 -
PR #19451: Setting xla_gpu_multi_streamed_windowed_einsum to true by default
#19455 opened
Nov 18, 2024 -
In progress. Not ready for review. Approach 2.
#19456 opened
Nov 19, 2024 -
Remove unused `tf_additional_core_deps`
#19457 opened
Nov 19, 2024 -
Add ExplicitStreamAnnotationAsyncWrapper pass
#19462 opened
Nov 19, 2024 -
PR #19429: Fix deterministic scatter expander pass and re-enable it by default
#19466 opened
Nov 19, 2024 -
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#19473 opened
Nov 19, 2024 -
Reverts 06e8ef8ec494877e459ff357f9479613a472c67b
#19474 opened
Nov 19, 2024 -
Enable some tests for float8 types in elemental_ir_emitter_test.cc
#19477 opened
Nov 19, 2024 -
[XLA:GPU] Support bf16 for sort
#19479 opened
Nov 19, 2024 -
Switch JAX to PJRT C API, for GPU.
#19492 opened
Nov 19, 2024 -
[Upkeep][XLA-Code-Health] Resolve 3 instances of the following issue: Todo (resolved)
#19494 opened
Nov 19, 2024 -
Split out MemorySpaceAssignmentTest class for re-use.
#19497 opened
Nov 19, 2024 -
Integrate LLVM at llvm/llvm-project@68b7ab127f58
#19505 opened
Nov 19, 2024 -
[Upkeep][XLA-Code-Health] Resolve 2 instances of the following issue: Todo (resolved)
#19514 opened
Nov 19, 2024 -
Assert same channel ID if present
#19521 opened
Nov 20, 2024 -
PR #19451: Remove xla_gpu_multi_streamed_windowed_einsum
#19537 opened
Nov 20, 2024 -
[JAX] Ignore process index when calculating module hash.
#19550 opened
Nov 20, 2024 -
[tsl:concurrency] NFC: Align CountDownAsyncValueRef atomics to cache line boundary
#19554 opened
Nov 20, 2024 -
Integrate LLVM at llvm/llvm-project@68b7ab127f58
#19556 opened
Nov 20, 2024 -
Enable a test
#19557 opened
Nov 20, 2024 -
Noop in OSS
#19559 opened
Nov 20, 2024 -
Bump rules_python version to 0.39.0
#19567 opened
Nov 20, 2024 -
Revert `edf18ce` and fix launch dimension triplet
#19582 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in dot_operand_converter.cc
#19586 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in conv_rewriter.cc
#19587 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in softmax_rewriter_triton_test.cc
#19588 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in layout_assignment.cc
#19589 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in dot_dimension_sorter.cc
#19591 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_norm_rewriter.cc
#19593 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in fusion_block_level_rewriter_test.cc
#19594 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in horizontal_loop_fusion.cc
#19595 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in triton_fusion_numerics_verifier.cc
#19598 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in rename_fusions.cc
#19600 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_vectorize_convolutions.cc
#19601 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in scheduling_instruction_annotator.cc
#19602 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in pipelined_p2p_rewriter.cc
#19603 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in gpusolver_rewriter.cc
#19604 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in multi_output_fusion.cc
#19605 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in variadic_op_splitter.cc
#19606 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fusion_compiler.cc
#19607 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in schedule_postprocessing.cc
#19608 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in gemm_rewriter.cc
#19610 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in double_buffer_loop_unrolling_test.cc
#19611 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in stream_attribute_annotator.cc
#19612 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in horizontal_loop_fusion_test.cc
#19613 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in dynamic_slice_fusion_rewriter.cc
#19614 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in priority_fusion.cc
#19615 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in nest_gemm_fusion.cc
#19616 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in windowed_einsum_handler_test.cc
#19617 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in dot_normalizer.cc
#19618 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in triangular_solve_rewriter.cc
#19619 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in sanitize_constant_names.cc
#19620 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in scatter_expander.cc
#19621 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in softmax_rewriter_triton.cc
#19623 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in move_copy_to_users.cc
#19624 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in dot_algorithm_rewriter.cc
#19625 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in copy_fusion.cc
#19626 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in reduce_scatter_creator.cc
#19627 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in cudnn_fused_conv_rewriter.cc
#19628 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in windowed_einsum_handler.cc
#19630 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in command_buffer_scheduling_test.cc
#19631 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in scatter_slice_simplifier.cc
#19632 opened
Nov 21, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op in sort_rewriter.cc
#19633 opened
Nov 21, 2024 -
Testing after force
#19636 opened
Nov 21, 2024 -
Integrate LLVM at llvm/llvm-project@7b5b01980c3b
#19646 opened
Nov 21, 2024 -
[ROCm] Implement hermetic rocm dependency
#19649 opened
Nov 21, 2024 -
Reverts 93f9dda11dff8eb32aa0e287ed1350ba334ddd6d
#19653 opened
Nov 21, 2024 -
Legalize more dialects in shardy
#19663 opened
Nov 21, 2024 -
Integrate LLVM at llvm/llvm-project@a12e79a85fc1
#19673 opened
Nov 22, 2024 -
[xla] Add S4/U4 support to reshape
#19681 opened
Nov 22, 2024 -
Remove custom logging implementation from TSL
#19694 opened
Nov 22, 2024 -
[XLA:GPU] Move dot_algorithm_rewriter from xla/service/gpu/transforms to xla/hlo/transforms
#19695 opened
Nov 22, 2024 -
Upgrade Abseil to latest LTS branch (lts_2024_07_22). Also update or-tools to v9.11.
#19696 opened
Nov 22, 2024 -
Explicit stream annotation: Set ExecutionStreamId based on frontend attribute
#19699 opened
Nov 22, 2024 -
[XLA:GPU] Consolidate sort optimizations in a dedicated compiler pass.
#19725 opened
Nov 23, 2024 -
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#19728 opened
Nov 23, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#19733 opened
Nov 23, 2024 -
Automated Code Change
#19737 opened
Nov 24, 2024 -
Automated Code Change
#19738 opened
Nov 24, 2024 -
Automated Code Change
#19739 opened
Nov 24, 2024 -
Automated Code Change
#19740 opened
Nov 24, 2024 -
Automated Code Change
#19741 opened
Nov 24, 2024 -
Automated Code Change
#19743 opened
Nov 24, 2024 -
Automated Code Change
#19744 opened
Nov 24, 2024 -
Automated Code Change
#19745 opened
Nov 24, 2024 -
Automated Code Change
#19746 opened
Nov 24, 2024 -
Automated Code Change
#19747 opened
Nov 24, 2024 -
Automated Code Change
#19748 opened
Nov 24, 2024 -
Automated Code Change
#19750 opened
Nov 24, 2024 -
Automated Code Change
#19751 opened
Nov 24, 2024 -
Automated Code Change
#19757 opened
Nov 25, 2024 -
Automated Code Change
#19758 opened
Nov 25, 2024 -
Automated Code Change
#19759 opened
Nov 25, 2024 -
Automated Code Change
#19761 opened
Nov 25, 2024 -
Reverts changelist 633145137
#19790 opened
Nov 25, 2024 -
Support multiple CPU tasks in a TfrtCpuClient.
#19796 opened
Nov 25, 2024 -
Change the default partitioning method for gather and scatter to kExplicitBatch.
#19809 opened
Nov 25, 2024 -
Add lax.composite primitive
#19813 opened
Nov 26, 2024 -
[XLA:Collective] Support normalizing all-reduce
#19819 opened
Nov 26, 2024 -
Automated Code Change
#19822 opened
Nov 26, 2024 -
PR #18840: [NVIDIA] Support larger head dim for cudnn fmha
#19828 opened
Nov 26, 2024 -
PR #19026: [NVIDIA GPU] LHS enhancement for multiple collective resources
#19833 opened
Nov 26, 2024 -
Add support for async dynamic slice fusion
#19834 opened
Nov 26, 2024 -
Reverts eab45d5da2fab59de8c02678c2b7b9ae69d9fef8
#19842 opened
Nov 26, 2024 -
Integrate LLVM at llvm/llvm-project@b214ca82daee
#19844 opened
Nov 26, 2024 -
Test new approach.
#19854 opened
Nov 26, 2024 -
Add GC support for PjitFunctionCache
#19856 opened
Nov 26, 2024 -
Add function to CombineInferenceStatsResults
#19859 opened
Nov 26, 2024 -
Reenable XLA Windows build
#19860 opened
Nov 26, 2024 -
Automated Code Change
#19872 opened
Nov 27, 2024 -
PR #19754: [ROCm] Enable gemm fusion autotuner.
#19873 opened
Nov 27, 2024 -
Automated Code Change
#19875 opened
Nov 27, 2024 -
Integrate LLVM at llvm/llvm-project@b214ca82daee
#19881 opened
Nov 27, 2024 -
PR #19669: Replace custom free-threading flag by rules_python is_py_freethreaded in Nanobind
#19886 opened
Nov 27, 2024 -
[ROCm] fix fp8 in buffercomparator
#19896 opened
Nov 27, 2024 -
Integrate LLVM at llvm/llvm-project@92a15dd7482f
#19897 opened
Nov 27, 2024 -
Add function to convert Multi XSpace to InferenceStats.
#19899 opened
Nov 27, 2024 -
Replace dependency on pre-built wheels with `py_import` dependency.
#19900 opened
Nov 27, 2024 -
[XLA:CPU] Extend the custom algorithm for transposed convolutions
#19921 opened
Nov 28, 2024 -
Make Tensorflow runtime handle errors returned by CudaExecutor::CreateDeviceDescription
#19926 opened
Nov 28, 2024 -
Automated Code Change
#19932 opened
Nov 29, 2024 -
PR #19927: [cuda] Warn about ptxas versions before CUDA 12.6.3
#19942 opened
Nov 29, 2024 -
Automated Code Change
#19952 opened
Nov 30, 2024 -
[XLA:GPU] Clean-up DeviceDescription
#19958 opened
Nov 30, 2024 -
Automated Code Change
#19969 opened
Dec 2, 2024 -
Automated Code Change
#19970 opened
Dec 2, 2024 -
Automated Code Change
#19971 opened
Dec 2, 2024 -
Automated Code Change
#19972 opened
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@010317e1731d
#19974 opened
Dec 2, 2024 -
Automated Code Change
#19977 opened
Dec 2, 2024 -
Add link to openxla.org homepage to README
#19981 opened
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@c1dcf75a7cf8
#19989 opened
Dec 2, 2024 -
[XLA:GPU] Fix typo in custom_kernel_fusion.h
#19994 opened
Dec 2, 2024 -
CHLO defns for a ragged dot that permits ragged batch and contraction.
#19996 opened
Dec 2, 2024 -
Internal CI/CD change
#20000 opened
Dec 2, 2024 -
[AllGatherCodeMotion] Add a pass that can code motion all-gathers in while loops.
#20023 opened
Dec 2, 2024 -
Update clone with new operands to handle ragged all-to-all.
#20029 opened
Dec 2, 2024 -
Fix resource number calculation in the latency hiding scheduler.
#20035 opened
Dec 2, 2024 -
Integrate LLVM at llvm/llvm-project@1250a1db1a37
#20039 opened
Dec 3, 2024 -
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20042 opened
Dec 3, 2024 -
<REMOVE THIS TAG ONCE DIFFBASE IS SUBMITTED>
#20047 opened
Dec 3, 2024 -
Automated Code Change
#20055 opened
Dec 3, 2024 -
Automated Code Change
#20058 opened
Dec 3, 2024 -
Automated Code Change
#20059 opened
Dec 3, 2024 -
Automated Code Change
#20062 opened
Dec 3, 2024 -
Automated Code Change
#20065 opened
Dec 3, 2024 -
Integrate LLVM at llvm/llvm-project@2af2634c64b1
#20066 opened
Dec 3, 2024 -
Automated Code Change
#20067 opened
Dec 3, 2024 -
Integrate LLVM at llvm/llvm-project@1d6ab189be03
#20069 opened
Dec 3, 2024 -
Automated Code Change
#20074 opened
Dec 3, 2024 -
Automated Code Change
#20075 opened
Dec 3, 2024 -
Automated Code Change
#20077 opened
Dec 3, 2024 -
Integrate LLVM at llvm/llvm-project@f71ea4bc1b01
#20079 opened
Dec 3, 2024 -
PR #19096: Add F4E2M1FN and F8E8M0FNU types
#20080 opened
Dec 3, 2024 -
#sdy support StableHLO from refining Shardy ops with polymorphic shapes
#20081 opened
Dec 3, 2024 -
[NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20086 opened
Dec 3, 2024 -
[XLA] Add -Werror=mismatched-tags to --config=linux.
#20088 opened
Dec 3, 2024 -
incorporate Range into while_loop_analysis
#20099 opened
Dec 3, 2024 -
Change RBE container
#20101 opened
Dec 3, 2024 -
Automated Code Change
#20111 opened
Dec 4, 2024 -
Automated Code Change
#20115 opened
Dec 4, 2024 -
[XLA:GPU] Enable native BF16 arithmetic ops on Ampere GPUs.
#20119 opened
Dec 4, 2024 -
PR #20006: [XLA:GPU] Only allow horizontal loop fusion for default memory space
#20121 opened
Dec 4, 2024 -
Make Tensorflow runtime handle errors returned by CudaExecutor::CreateDeviceDescription
#20131 opened
Dec 4, 2024 -
Integrate LLVM at llvm/llvm-project@026fbe519e16
#20132 opened
Dec 4, 2024 -
Enable XLA:TPU client to take in parameters
#20133 opened
Dec 4, 2024 -
Integrate LLVM at llvm/llvm-project@026fbe519e16
#20141 opened
Dec 4, 2024 -
Fix wrong index when inserting a copy from host to a call's parameter
#20150 opened
Dec 4, 2024 -
Add support for emitting a kCall from an async instruction
#20155 opened
Dec 4, 2024 -
[ROCm] Fix tests breaking due to change in threads_per_warp
#20164 opened
Dec 4, 2024 -
PR #19699: Explicit stream annotation: Set ExecutionStreamId based on frontend attribute
#20169 opened
Dec 5, 2024 -
[XLA:GPU:Rocm] Fix wavefront size on gfx10+
#20170 opened
Dec 5, 2024 -
Remove obsolete TODOs and the ones associated with closed bug in platforms/xla/*
#20172 opened
Dec 5, 2024 -
Add an interface to MSA to allow post allocation transformation on hlo module.
#20175 opened
Dec 5, 2024 -
Map split dimensions for bitcast positions.
#20181 opened
Dec 5, 2024 -
Introduce shape splitting into MSA.
#20182 opened
Dec 5, 2024 -
Create shim targets for most commonly used TSL headers in preparation for updating users
#20183 opened
Dec 5, 2024 -
Update references to JAX's GitHub repo
#20202 opened
Dec 5, 2024 -
Run sparse tests in presubmit
#20203 opened
Dec 5, 2024 -
PR #19669: Replace custom free-threading flag by rules_python is_py_freethreaded in Nanobind
#20207 opened
Dec 5, 2024 -
Update TF RBE container hashes.
#20217 opened
Dec 5, 2024 -
Added `GetAliveTasks` RPC to coordination service.
#20218 opened
Dec 5, 2024 -
Added `jax.experimental.multihost_utils.alive_devices` API.
#20220 opened
Dec 5, 2024 -
Integrate LLVM at llvm/llvm-project@c54616ea481a
#20224 opened
Dec 5, 2024 -
remove use_host_argument_layout and use executable layouts.
#20227 opened
Dec 5, 2024 -
Support evaluation in the absence of layouts when possible
#20232 opened
Dec 5, 2024 -
Reverts 3d31c48c719d331d432132b3e0c2c5ce52650675
#20240 opened
Dec 6, 2024 -
[hlo-opt] move the tool to hlo/tools/ directory
#20242 opened
Dec 6, 2024 -
Automated Code Change
#20243 opened
Dec 6, 2024 -
Integrate LLVM at llvm/llvm-project@e6cf5d2863b7
#20245 opened
Dec 6, 2024 -
Integrate LLVM at llvm/llvm-project@9d2351ab9aff
#20252 opened
Dec 6, 2024 -
#sdy remove `mhlo.tan` `CustomCallOp` from the registry as a StableHLO equivalent now exists.
#20253 opened
Dec 6, 2024 -
[XLA:CPU] Expose better parallelism control
#20256 opened
Dec 6, 2024 -
Integrate LLVM at llvm/llvm-project@9d2351ab9aff
#20267 opened
Dec 6, 2024 -
Integrate LLVM at llvm/llvm-project@12bdeba76eef
#20271 opened
Dec 6, 2024 -
[ROCm] Emit allocas on function entry in lower_tensors.cc
#20274 opened
Dec 6, 2024 -
Remove barely used function is_static_dimension.
#20276 opened
Dec 6, 2024 -
Remove barely used and unnecessary XlaShape::SetProto
#20277 opened
Dec 6, 2024 -
Add Shape::FromProto static factory method and replace usage.
#20278 opened
Dec 7, 2024 -
change usage from constructor to factory method
#20279 opened
Dec 7, 2024 -
Automated Code Change
#20281 opened
Dec 7, 2024 -
Automated Code Change
#20282 opened
Dec 7, 2024 -
Automated Code Change
#20287 opened
Dec 8, 2024 -
cuda_root_path: Find cuda libraries when installed with conda packages
#20288 opened
Dec 8, 2024 -
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#20302 opened
Dec 9, 2024 -
[XLA:CPU] Fix crash due to OOM in XLA's custom convolution algorithm.
#20304 opened
Dec 9, 2024 -
Support dumping unoptimised hlo snapshots with arguments in pjrt.
#20306 opened
Dec 9, 2024 -
[XLA:GPU] Simplify `AllocatorRetry::AllocateRaw` control flow.
#20308 opened
Dec 9, 2024 -
Added thread annotations to variable that claimed to not need them.
#20314 opened
Dec 9, 2024 -
Removed unused `GetCoordinationServiceInstance` code.
#20315 opened
Dec 9, 2024 -
Fix a breaking test
#20317 opened
Dec 9, 2024 -
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20326 opened
Dec 9, 2024 -
[XLA] Handle empty leaf nodes in an original value
#20327 opened
Dec 9, 2024 -
Migrate gather_operation_test to always use PjRt for its test backend.
#20329 opened
Dec 9, 2024 -
Add runtime support for host calculation of offsets in ds fusion
#20332 opened
Dec 9, 2024 -
Copy result_accuracy when deriving new instruction.
#20333 opened
Dec 9, 2024 -
Fix missing template value
#20340 opened
Dec 9, 2024 -
Add Shape::FromProto static factory method to replace constructor.
#20341 opened
Dec 10, 2024 -
Automated Code Change
#20348 opened
Dec 10, 2024 -
Automated Code Change
#20350 opened
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20351 opened
Dec 10, 2024 -
Automated Code Change
#20352 opened
Dec 10, 2024 -
Automated Code Change
#20355 opened
Dec 10, 2024 -
Automated Code Change
#20357 opened
Dec 10, 2024 -
Automated Code Change
#20360 opened
Dec 10, 2024 -
Automated Code Change
#20363 opened
Dec 10, 2024 -
[XLA:GPU] Move calls to `allocation_attr.freed_by_func` into `BFCAllocator::AllocateRawInternal`.
#20368 opened
Dec 10, 2024 -
Automated Code Change
#20370 opened
Dec 10, 2024 -
Automated Code Change
#20371 opened
Dec 10, 2024 -
[XLA:GPU] Fix logging bug.
#20372 opened
Dec 10, 2024 -
[XLA:GPU] Re-run the host-offload-legalize pass after CSE
#20374 opened
Dec 10, 2024 -
Improve speed and collision/aliasing resistance of Absl::HashOf() on HloModule/HloComputation:
#20375 opened
Dec 10, 2024 -
Remove `//tensorflow/tsl/platform/strcat.h`.
#20386 opened
Dec 10, 2024 -
Add result accuracy attribute to ExpOp in StableHlo.
#20388 opened
Dec 10, 2024 -
PR #20313: Fix async wrapper to walk child computations
#20391 opened
Dec 10, 2024 -
Revert PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20393 opened
Dec 10, 2024 -
[XLA:GPU] Add NVSHMEM library and initialization test
#20395 opened
Dec 10, 2024 -
Respect DeviceAssignment in HloRunnerPjRt.
#20398 opened
Dec 10, 2024 -
Elementwise Ops in Collective Pipeliner
#20399 opened
Dec 10, 2024 -
Add a HLOPrintOption to control printing of the parameter number for parameters.
#20402 opened
Dec 10, 2024 -
Add has_megacore and has_merged_vmem in XPlane stats.
#20404 opened
Dec 10, 2024 -
Automated Code Change
#20410 opened
Dec 11, 2024 -
Automated Code Change
#20412 opened
Dec 11, 2024 -
Automated Code Change
#20413 opened
Dec 11, 2024 -
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20414 opened
Dec 11, 2024 -
Automated Code Change
#20423 opened
Dec 11, 2024 -
Layout assignment: Reset memory space in result layout
#20426 opened
Dec 11, 2024 -
[XLA:FFI] Fix C API
#20428 opened
Dec 11, 2024 -
[PJRT] Expose `ExecutionContext` when executing a `LoadedExecutable`
#20429 opened
Dec 11, 2024 -
Make ErrorSpec constructor `constexpr`.
#20435 opened
Dec 11, 2024 -
PR #19649: [ROCm] Implement hermetic rocm dependency
#20437 opened
Dec 11, 2024 -
Use int64_t for thread ids instead of int32_t
#20442 opened
Dec 11, 2024 -
Add action which automatically runs CI for public OpenXLA GitHub org members
#20444 opened
Dec 11, 2024 -
Add nozapfhahn for opt provider files, to ignore expected mutant testing warnings during submissions
#20451 opened
Dec 11, 2024 -
Reverts 555ba9967696cdea8c768ac5b44e59a3582a9820
#20453 opened
Dec 12, 2024 -
PR #77927: [oneDNN] upgrading oneDNN version to 3.6
#20456 opened
Dec 12, 2024 -
Automated Code Change
#20462 opened
Dec 12, 2024 -
[XLA:CPU] Move kernel prototype properties inside of KernelApiIrBuilder
#20467 opened
Dec 12, 2024 -
[XLA:GPU] Rely on LLVM parser rather than objcopy to load fatbin in tests
#20474 opened
Dec 12, 2024 -
[XLA::CPU] Update `ElementalKernelEmitter` to take HLO instruction instead of shapes.
#20476 opened
Dec 12, 2024 -
[XLA:GPU] Force cuDNN convolutions to be assigned a `NHWC` layout from Hopper.
#20477 opened
Dec 12, 2024 -
[XLA:CPU] Improve F8E4M3 accuracy
#20478 opened
Dec 12, 2024 -
Integrate LLVM at llvm/llvm-project@03cbe42627c7
#20480 opened
Dec 12, 2024 -
Migrate StableHLO Python extension to nanobind.
#20481 opened
Dec 12, 2024 -
Remove CHECK macros from tsl/platform/default/logging.h
#20483 opened
Dec 12, 2024 -
Add missing default for
#20485 opened
Dec 12, 2024 -
Use CUPTI activity markers instead of nvtx driver callbacks for NVTX tracking.
#20488 opened
Dec 12, 2024 -
Add more safety checks to BlockingCounter
#20489 opened
Dec 12, 2024 -
PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20490 opened
Dec 12, 2024 -
Support host->device dynamic-update-slice in HostOffloader.
#20492 opened
Dec 13, 2024 -
Update slop_factor flag desc in debug_options_flags.cc
#20494 opened
Dec 13, 2024 -
Make max value in Range optional to allow for Unbounded Range calculations.
#20495 opened
Dec 13, 2024 -
Make max value in Range optional to allow for Unbounded Range calculations.
#20496 opened
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20500 opened
Dec 13, 2024 -
Replace some usage of tsl::BlockingCounter with absl::BlockingCounter.
#20501 opened
Dec 13, 2024 -
Update the forward propagation for slice instructions in xla::ShardingPropagation.
#20502 opened
Dec 13, 2024 -
fixed the typos in bfloat16_propagation_test.cc
#20503 opened
Dec 13, 2024 -
[XLA:CPU] Test elemental comparison ops
#20509 opened
Dec 13, 2024 -
[XLA:CPU] Export codegen testlib functionality via __init__.py
#20510 opened
Dec 13, 2024 -
PR #19649: [ROCm] Implement hermetic rocm dependency
#20512 opened
Dec 13, 2024 -
[XLA:CPU] Move CPU ElementalIrEmitter implementation into a separate file
#20514 opened
Dec 13, 2024 -
[XLA:CPU] Add ability to dump the LLvmIrSource to a string in the testlib
#20515 opened
Dec 13, 2024 -
Simplify `WaitAndLogIfStuck()` to use `absl::Barrier` instead of `tsl::BlockingCounter`.
#20516 opened
Dec 13, 2024 -
#sdy make mhlo_export_pipeline test use no rank maximal sharding.
#20518 opened
Dec 13, 2024 -
#sdy add constraints to maximal shardings during verification.
#20519 opened
Dec 13, 2024 -
[XLA:GPU] Synchronize compute and communication streams in NCCL group implementation
#20523 opened
Dec 13, 2024 -
Move some ShapeUtil validation to cpp. Create generic ShapeError.
#20524 opened
Dec 13, 2024 -
Remove CHECKs for inline vectors. Returning 0 is functionally correct.
#20525 opened
Dec 13, 2024 -
Temporary check to avoid cases where wrong result sharding causes a RET_CHECK to fail in hlo_sharding.cc.
#20526 opened
Dec 13, 2024 -
[XLA:GPU] update collective pipeline parallelism execution test to include nccl groups in a while loop
#20527 opened
Dec 13, 2024 -
[XLA:GPU] update microbenchmarks to include nccl groups in a while loop
#20528 opened
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@5f72f2c8fd6c
#20531 opened
Dec 13, 2024 -
Make collective select folder convert-aware
#20532 opened
Dec 13, 2024 -
[XLA] Add S1 and U1 as data types
#20533 opened
Dec 13, 2024 -
Remove CUDA dependencies from jaxlib wheel.
#20534 opened
Dec 13, 2024 -
Reverts a8a0a7ec72ed3960b33a061c12e43d7e5beb9087
#20535 opened
Dec 13, 2024 -
[Shardy] HLO ⇄ MHLO to HLO ⇄ StableHLO
#20536 opened
Dec 13, 2024 -
Remove legacy trivial main programs
#20537 opened
Dec 13, 2024 -
[xla:gpu] Add an option to use persistent collective cliques
#20539 opened
Dec 14, 2024 -
Optimize XLA SPMD Slice partitioner, containing the following 3 steps.
#20540 opened
Dec 14, 2024 -
[xla:cpu] Implement 2D and 3D loop parallelization
#20541 opened
Dec 14, 2024 -
[xla:cpu] Add an xnn_threadpool for wrapping ParallelLoopRunner as pthreadpool API
#20542 opened
Dec 14, 2024 -
Integrate LLVM at llvm/llvm-project@af20aff35ec3
#20543 opened
Dec 14, 2024 -
Reverts 51658ce1da33166ef422286cbc2c194f7a464f65
#20544 opened
Dec 14, 2024
11 Issues closed by 11 people
-
[BUG] Returning a matrix crashes the HLO runner
#9307 closed
Dec 11, 2024 -
Does Tensorflow XLA use MLIR?
#20125 closed
Dec 10, 2024 -
Profiling crashes when run using JAX
#4431 closed
Dec 9, 2024 -
CPU vs GPU inconsistency with at[].set ops under jit
#19716 closed
Dec 4, 2024 -
Nondeterminism in triton-autotuner and others leads to hangs in multi-process gpu execution
#7716 closed
Dec 3, 2024 -
The tiles of some instructions in Triton fusion are not powers of 2. How did this issue arise?
#19478 closed
Dec 3, 2024 -
Pallas/Triton gives incorrect result on Nvidia H100 GPU
#19780 closed
Dec 2, 2024 -
//xla/service/cpu:vectorized_reduce_with_no_vector_registers_test fails on Apple Silicon
#19830 closed
Nov 29, 2024 -
//xla/stream_executor/cuda:compilation_provider_test should be skipped on CPU builds
#19829 closed
Nov 26, 2024 -
MatmulTest.TestF32ConstantWeights failure on arm64 linux
#19416 closed
Nov 21, 2024 -
[Support request] Concatenate fusion failed to lower to LLVM IR
#19499 closed
Nov 21, 2024
12 Issues opened by 10 people
-
Make `cache_all_gather` configurable
#20508 opened
Dec 13, 2024 -
Copies supersede OptimizationBarrier
#20440 opened
Dec 11, 2024 -
Activation offloading with scan fails in XLA with same shape pinned-host/device scan outputs
#20373 opened
Dec 10, 2024 -
Darwin CPU "LLVM ERROR: inconsistency in registered CommandLine options"
#20284 opened
Dec 8, 2024 -
Merge XSpace profiles from different hosts.
#19902 opened
Nov 27, 2024 -
PjRtFutures & Async Computations issue
#19898 opened
Nov 27, 2024 -
Explicit interface for AcquireExternalReference and WaitExternalReference
#19845 opened
Nov 26, 2024 -
//xla/tests:complex_unary_op_test_cpu fails on macOS Apple Silicon
#19824 opened
Nov 26, 2024 -
[ROCm] failed to legalize operation 'math.exp' for exponential op with bf16 dtype
#19700 opened
Nov 22, 2024 -
[GPU bug] Memcpy local p2p leads to numerical issues (mem access problems).
#19555 opened
Nov 20, 2024 -
`stablehlo.cbrt` crashes with complex numbers when StableHLO spec supports it
#19482 opened
Nov 19, 2024 -
is set access to the previous GPUs at cudamallocasync_allocator.cc cause allocateRaw slower?
#19431 opened
Nov 18, 2024
38 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add F4E2M1FN and F8E8M0FNU types
#19096 commented on
Dec 12, 2024 • 37 new comments -
[RematerializeLargeAllGather] Add pass to rematerialize large tensor parallel all-gathers. Allow configurable bytes.
#19163 commented on
Dec 9, 2024 • 10 new comments -
[NVIDIA GPU] Support multi-operand collective-permute
#18838 commented on
Dec 6, 2024 • 3 new comments -
[XLA:CPU][oneDNN] Custom Call oneDNN Thunk Support
#18494 commented on
Dec 4, 2024 • 3 new comments -
[XLA:CPU][oneDNN] Alias result to addend when feasible
#17637 commented on
Nov 22, 2024 • 3 new comments -
[XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#19067 commented on
Nov 18, 2024 • 1 new comment -
XLA documentation for Windows
#12028 commented on
Dec 6, 2024 • 0 new comments -
Use OpTrait::DotLike instead of op name to identify dot operations
#19145 commented on
Dec 3, 2024 • 0 new comments -
Bugfix sharding related logging
#19164 commented on
Nov 20, 2024 • 0 new comments -
[SPMD] Introduce an option to limit communication around gradient accumulation in Jax scan ops.
#19166 commented on
Dec 1, 2024 • 0 new comments -
[XLA:MSA] Add dynamic-slice to async conversion in msa
#19180 commented on
Dec 5, 2024 • 0 new comments -
[MPMD-GPU] Add `is_subslice_topology` to the IFRT Topology.
#19203 commented on
Nov 18, 2024 • 0 new comments -
[MPMD-GPU] Make Pathways IFRT client get GPU topology as well.
#19204 commented on
Nov 18, 2024 • 0 new comments -
Integrate Triton up to [9732c047](https://github.com/openai/triton/commits/9732c04701bd856daca89bde38bafa4636ca56a8)
#19259 commented on
Nov 15, 2024 • 0 new comments -
Refactor JAX build wheel rule and add wheel_library targets.
#19277 commented on
Nov 21, 2024 • 0 new comments -
StableHLO Reference Interpreter PJRT Plugin
#19287 commented on
Nov 14, 2024 • 0 new comments -
PR #19026: [NVIDIA GPU] LHS enhancement for multiple collective resources
#19289 commented on
Nov 18, 2024 • 0 new comments -
PR #19275: [NVIDIA] Add fixes for supporting determinism expander for high-dimensional scatter operation and a flag to disable it
#19296 commented on
Nov 15, 2024 • 0 new comments -
Create repository rule to generate a file with python wheel version.
#19308 commented on
Dec 10, 2024 • 0 new comments -
PR #77927: [oneDNN] upgrading oneDNN version to 3.6
#19319 commented on
Nov 21, 2024 • 0 new comments -
[XLA:Python] Modify DLPack behavior with unit dimensions.
#19327 commented on
Nov 25, 2024 • 0 new comments -
PR #19208: [gpu] Make collective permute decomposer accept no channel ids
#19328 commented on
Nov 19, 2024 • 0 new comments -
Error occurred while building TensorFlow
#14129 commented on
Nov 27, 2024 • 0 new comments -
clang: error: linker command failed with exit code 1
#14321 commented on
Nov 27, 2024 • 0 new comments -
JAX PJRT Client, GPU Build Fails. Linking error
#14475 commented on
Nov 22, 2024 • 0 new comments -
Error in compilation on GPU
#15744 commented on
Nov 20, 2024 • 0 new comments -
XLA flags: No speed ups on GPUs and segmentation fault
#17103 commented on
Nov 14, 2024 • 0 new comments -
Inadequate memory consumption when using HSDP without gradient accumulation
#18090 commented on
Dec 5, 2024 • 0 new comments -
How to run onednn convolution test on CPU
#18586 commented on
Dec 2, 2024 • 0 new comments -
XLA:CPU performance regression with the min alignment changed from 16 to 64
#18611 commented on
Dec 5, 2024 • 0 new comments -
pjrt_c_api_cpu_plugin - MacOS amd and arm64 compilation
#19152 commented on
Nov 17, 2024 • 0 new comments -
Move down_cast from the tensorflow to the tsl namespace
#5951 commented on
Nov 20, 2024 • 0 new comments -
[XLA:CPU][oneDNN] Absorb Transpose into matmul whenever possible
#18410 commented on
Nov 18, 2024 • 0 new comments -
[XLA:MSA] Re-enables synchronous copy and slice conversion to async by default.
#19035 commented on
Nov 14, 2024 • 0 new comments -
[XLA:CPU][oneDNN] Handle oneDNN scalar
#19066 commented on
Nov 18, 2024 • 0 new comments -
[StableHLO] Remove XlaCallModule's MHLO dependency
#19079 commented on
Dec 13, 2024 • 0 new comments -
PR #18331: [XLA:CPU] upgrading onednn version to 3.6
#19102 commented on
Dec 12, 2024 • 0 new comments -
[HLO->MHLO] Consolidate non-pipelined async ops into MHLO ops.
#19115 commented on
Dec 5, 2024 • 0 new comments