Pulse · openxla/xla · GitHub

December 6, 2024 – December 13, 2024

Overview

267 Active pull requests

7 Active issues

160 Pull requests merged by 1 person

Configure HloPjRtTestBase with new option structs.
#20452 merged Dec 14, 2024
Extend use_parameter_layout_on_device option to ExecuteReplicated.
#20441 merged Dec 14, 2024
Integrate LLVM at llvm/llvm-project@a21f9bfe29c2
#20538 merged Dec 14, 2024
Remove a bunch of #if CUDA macro use in xla_compile_lib.
#20331 merged Dec 14, 2024
Improve compilation time by not fusing large constants into LLVM modules for XLA::CPU.
#20257 merged Dec 14, 2024
Swap output format of "MatchTrivialLoopRange" to better align with Range usage.
#20454 merged Dec 13, 2024
timespan, xplane_visitor: Support operator<=.
#20405 merged Dec 13, 2024
[XLA] Generalize type handling within InProcessCollectives
#20498 merged Dec 13, 2024
Migrate some TSL code over to ABSL equivalents
#20457 merged Dec 13, 2024
[xla:cpu] Add parallel loop runner
#20529 merged Dec 13, 2024
Stop using redzone_allocator in autotuner_util.
#20520 merged Dec 13, 2024
Add new ml-build-rbe container to the configurations of RBE used with remote config.
#20459 merged Dec 13, 2024
Integrate LLVM at llvm/llvm-project@bc29fc937c6c
#20511 merged Dec 13, 2024
Move AutotunerUtil::CreateBuffer from a static method to a method on RedzoneAllocator.
#20517 merged Dec 13, 2024
[XLA] Add some traceme annotations around XLA:CPU compilation and CPU compiler stack trace logging.
#20312 merged Dec 13, 2024
[XLA:MSA] Allow cross-program prefetch for buffers that are already pinned to alternate memory.
#20316 merged Dec 13, 2024
Replace TSL's BlockingCounter with absl's.
#20522 merged Dec 13, 2024
Add TF package import tests in CPU presubmit, continuous and nightly jobs.
#20493 merged Dec 13, 2024
Add has_megacore and has_merged_vmem in XPlane stats.
#20491 merged Dec 13, 2024
[StableHLO] Add shape refinement callback to specify additional patterns.
#20321 merged Dec 13, 2024
[XLA] Return the number of overlapping chunks instead of chunks themselves for tracking outstanding prefetches/evictions
#20438 merged Dec 13, 2024
[IFRT] Add option to compile IFRT IR atom programs using Sdy
#20521 merged Dec 13, 2024
[XLA:Collective] Add utility functions.
#20038 merged Dec 13, 2024
[XLA:Python] Use &PyArray_Type rather than looking up numpy.ndarray via Python attrs.
#20513 merged Dec 13, 2024
PR #20109: Create a workflow to run CPU benchmarks
#20475 merged Dec 13, 2024
Reenable buildifier for all files under xla/, fix warnings
#19866 merged Dec 13, 2024
Remove unneeded #ifdef'ed dependency.
#20328 merged Dec 13, 2024
[xla:python] Add support for stateful FFI calls registered via Python.
#16789 merged Dec 13, 2024
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20507 merged Dec 13, 2024
[XLA:CPU] Add shape method python binding to Literal
#20466 merged Dec 13, 2024
Replace xla proto dependency to make TF happy
#20506 merged Dec 13, 2024
[XLA:GPU] Readability and performance nits.
#20473 merged Dec 13, 2024
Automated Code Change
#20504 merged Dec 13, 2024
[XLA:GPU] Clean up TF_RET_CHECKs in TritonFusionAnalysis::ExecuteForDotFusion.
#20505 merged Dec 13, 2024
Automated Code Change
#20416 merged Dec 13, 2024
Use absl::Mutex::Await() instead of absl::CondVar::Wait() in XLA.
#19957 merged Dec 13, 2024
Automated Code Change
#20418 merged Dec 13, 2024
Automated Code Change
#20468 merged Dec 13, 2024
Fix range analysis bug.
#20479 merged Dec 13, 2024
[XLA] Guarantee ordering of infeeds/outfeeds across called computations
#19827 merged Dec 13, 2024
Add dynamic_arg_layouts to C++ cache and add a test in JAX which checks for cache miss if layouts of inputs arguments are different to the same jitted function.
#20484 merged Dec 13, 2024
[XLA:Python] Mark from_python and from_cpp methods of nanobind typecasters as noexcept.
#20482 merged Dec 13, 2024
[XLA:GPU] add execution tests for NCCL group with partially pipelined send/recv instructions
#20178 merged Dec 13, 2024
Fix an issue in PartitionGatherTrivialSlicedOperandDimensions when handling out-of-bound indices.
#20401 merged Dec 13, 2024
Make ReshapeOp return MHLO_AnyTensor instead of MHLO_StaticShapeTensor.
#18964 merged Dec 13, 2024
Adds a "SHARDING" ProfileType to HloModuleProto.
#20283 merged Dec 13, 2024
Implement flatten one level with keys in C++ and use it for the prefix/equality error printing.
#20044 merged Dec 13, 2024
Switch multihost runner to public XLA:GPU target
#20486 merged Dec 13, 2024
Add a tuple sharding when creating get-tuple-element(tuple(single_result)).
#20126 merged Dec 13, 2024
Update py_import macros for the ability to unpack additional wheels in the same folder as the main wheel.
#20089 merged Dec 13, 2024
Update the Windows Docker CI image to include the C++ ATL library.
#20487 merged Dec 12, 2024
Make CompiledMemoryStats::ToProto() const.
#20458 merged Dec 12, 2024
#sdy add option to avoid escaping attribute when adding to frontend attrs.
#20472 merged Dec 12, 2024
Integrate LLVM at llvm/llvm-project@0876c11ceeb0
#20465 merged Dec 12, 2024
[XLA:GPU][Emitters] Move gpu/fusions/ir to backends/gpu/codegen/ir
#20471 merged Dec 12, 2024
[XLA:GPU] Propagate all profiling failures to gemm_fusion_autotuner.cc.
#20383 merged Dec 12, 2024
PR #20463: Updated multiple typo's
#20470 merged Dec 12, 2024
Automated Code Change
#20411 merged Dec 12, 2024
[XLA:GPU][NFC] Modularize a little bit gpu_hlo_schedule.cc.
#20430 merged Dec 12, 2024
PR #19099: [XLA:CPU][oneDNN] Add post-ops for oneDNN Convolutions
#20445 merged Dec 12, 2024
[XLA:GPU] Remove restriction on bitcasts being a no-op with regards to tiling
#20433 merged Dec 12, 2024
[XLA:GPU] Deprecate diamond chains in SoftmaxRewriterTriton.
#20431 merged Dec 12, 2024
[XLA:GPU] Rollback introduction of EmitterLocOpBuilder.
#20464 merged Dec 12, 2024
[XLA:CPU] Implement ElementalKernelEmitter
#20378 merged Dec 12, 2024
#sdy Add unique module name in Shardy dumps
#20436 merged Dec 12, 2024
Fix build failure in nvjitlink_impl.cc
#20460 merged Dec 12, 2024
Integrate LLVM at llvm/llvm-project@19bc282320ba
#20450 merged Dec 12, 2024
Automated Code Change
#20285 merged Dec 12, 2024
[XLA:GPU] Add a nested builder arg to the EmitXlaLoop builder.
#20443 merged Dec 12, 2024
Automated Code Change
#20362 merged Dec 12, 2024
Automated Code Change
#20361 merged Dec 12, 2024
Skip inserting into frontend attr if the (key, value) pair already exists and override if key exists
#20255 merged Dec 12, 2024
[Mosaic] Pad trailing transposes chunks with zeros.
#20390 merged Dec 12, 2024
Split ROCm-specific backend calls into their own targets.
#20337 merged Dec 12, 2024
[xla-auto-sharding] Fix potential dangling pointer (reference) bug.
#20446 merged Dec 12, 2024
[xla:cpu] Add missing files from openxla/xla#16438
#20447 merged Dec 11, 2024
[XLA:GPU] Schedule send/recv early if pipeline parallelism ops enabled
#20403 merged Dec 11, 2024
Internal CI/CD change
#20212 merged Dec 11, 2024
[XLA] Avoid redundant lookup in ConsumeResource
#20434 merged Dec 11, 2024
Remove the test_hlo_pjrt_runner tag.
#20392 merged Dec 11, 2024
Add B100 to default Nvidia gpu backends
#20407 merged Dec 11, 2024
Migrate broadcast_test to always use PjRt for its test backend.
#20345 merged Dec 11, 2024
Add a default error spec field to HloRunnerAgnosticTestBase.
#20397 merged Dec 11, 2024
[XLA:CPU] Benchmark for grouped strided convolutions
#20425 merged Dec 11, 2024
[XLA GPU] Add additional unit tests for IsPtxRegisterAllocationError.
#20424 merged Dec 11, 2024
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20420 merged Dec 11, 2024
[XLA:GPU] Implement NcclRaggedAllToAllThunk.
#20265 merged Dec 11, 2024
Fix infinite loop in TopKSplitter
#20422 merged Dec 11, 2024
PR #20214: Evaluate simple offset values, if possible
#20303 merged Dec 11, 2024
PR #20313: Fix async wrapper to walk child computations
#20421 merged Dec 11, 2024
#sdy Swap XLA Shardy passes to use StableHLO instead of MHLO as much as possible.
#19939 merged Dec 11, 2024
Internal: add missing dependency on numpy
#20298 merged Dec 11, 2024
[XLA:GPU] Use absl::Status payload to more precisely identify register allocation errors.
#20396 merged Dec 11, 2024
[XLA:CPU] Use KernelApiIrBuilder in IrEmitter2
#20380 merged Dec 11, 2024
[XLA:GPU] Introduce EmitterLocOpBuilder that could annotate the mlir with the file:line annotations that are visible in the triton dump
#19472 merged Dec 11, 2024
[XLA:CPU] Add new KernelApiIrBuilder
#20379 merged Dec 11, 2024
PR #20334: [nfc] clang-format is failing on unrelated PRs because of this
#20419 merged Dec 11, 2024
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20347 merged Dec 11, 2024
[XLA:GPU] Decrease VLOG levels to start logging at level 2 in softmax_rewriter_triton.cc.
#20415 merged Dec 11, 2024
fix audit wheel compliance issues for pywrap rules
#20356 merged Dec 11, 2024
[XLA:LatencyHidingScheduler] Fix crash with non-standard async ops whose done op does not consume the respective start op and that they might have a reverse data dependency (e.g., done -> ops -> start).
#20408 merged Dec 11, 2024
Add LayoutModeToXlaShape util to header so that users can get xla::Shape with layout without an XlaComputation.
#20409 merged Dec 11, 2024
Cleanup inconsistent names/comments
#20406 merged Dec 11, 2024
Remove unused ErrorSpec.
#20400 merged Dec 11, 2024
[XLA:GPU] Guard send/recv schedule manipulation behind xla_gpu_enable_pipelined_p2p flag
#20382 merged Dec 10, 2024
Migrate all_reduce_test to always use PjRt for its test backend.
#20344 merged Dec 10, 2024
Replace std::string_view with absl::string_view
#20349 merged Dec 10, 2024
Respect DeviceAssignment in HloRunnerPjRt.
#20389 merged Dec 10, 2024
Migrate gather_operation_test to always use PjRt for its test backend.
#20339 merged Dec 10, 2024
[HLO->MHLO] Consolidate non-pipelined async ops into MHLO ops.
#20309 merged Dec 10, 2024
Add implicit device step tracking.
#20261 merged Dec 10, 2024
Migrate copy_test to always use PjRt for its test backend.
#20343 merged Dec 10, 2024
[XLA:GPU] Remove --xla_gpu_experimental_enable_triton_softmax_priority_fusion.
#20384 merged Dec 10, 2024
Set implicitTrunc on APInt creation
#20387 merged Dec 10, 2024
Integrate LLVM at llvm/llvm-project@0f7b3a9407d2
#20377 merged Dec 10, 2024
Replace std::string_view with absl::string_view
#20322 merged Dec 10, 2024
Add MHLO mhlo.custom_call @ragged_all_to_all -> HLO RaggedAllToAll pass
#20354 merged Dec 10, 2024
[XLA:CPU] Update ShapeToIrType & PrimitiveTypeToIrType to take a LLVMContext
#20381 merged Dec 10, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#19734 merged Dec 10, 2024
[xla:gpu] Extracted CreateTritonPipeline into a separate target
#20367 merged Dec 10, 2024
[xla:cpu] Add an object pool for efficient xnnpack object pooling
#20307 merged Dec 10, 2024
Add test_migrated_to_hlo_runner_pjrt tag to xla_test.
#20338 merged Dec 10, 2024
Simplify GetGatherScatterOperandPassthroughDims since offset_dims or inserted_window_dims are sorted in gather/scatter operations.
#20335 merged Dec 10, 2024
[xla-auto-sharding] Add BRKGA heuristic as an XLA auto-sharding option.
#20330 merged Dec 10, 2024
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20365 merged Dec 10, 2024
Automated Code Change
#20364 merged Dec 10, 2024
PR #16438: aarch64: implement onednn matmul operator with explicit reorders
#20369 merged Dec 10, 2024
[XLA:GPU] Disable cutlass dynamic-update-slice rewrite on V100.
#20366 merged Dec 10, 2024
PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20213 merged Dec 10, 2024
[XLA:GPU] Use absl instead of tensorflow functions/types.
#20311 merged Dec 10, 2024
Add debug option for failing the PTX compilation on register spilling
#20358 merged Dec 10, 2024
Automated Code Change
#20290 merged Dec 10, 2024
Replace std::string_view with absl::string_view
#20353 merged Dec 10, 2024
[XLA] Don't bail when encountering complex loop pipelining patterns
#20258 merged Dec 10, 2024
Add an interpreter PjRt client registry for testing.
#20336 merged Dec 10, 2024
Add a new test base class for a default PjRt test runner w/ SE interpreter.
#20320 merged Dec 10, 2024
Add RegisterMlirToHloDependentDialects to register required dependent dialects
#20318 merged Dec 10, 2024
Register WhileLoopAllReduceCodeMotion pass to the opt tool
#20342 merged Dec 10, 2024
Create OpStatsToRooflineModel, in preparation of Roofline Model creation
#20280 merged Dec 10, 2024
IFRT proxy: Add profiler spans to all entrypoints at the client.
#20325 merged Dec 9, 2024
[XLA] Fix latency hiding scheduler when faced with annotated no-op instructions.
#20324 merged Dec 9, 2024
Modify XlaOp Exp to accept result accuracy as an argument. We want to be able to select implementation of exp depending on this config.
#19820 merged Dec 9, 2024
Stop using DISABLED_ON_GPU_ROCM for GPU tests, and instead just use GTEST_SKIP with a runtime check for ROCm.
#20323 merged Dec 9, 2024
Add CPU specific passes for hlo-opt tool.
#20154 merged Dec 9, 2024
[Cleanup] Use HloPredicateIs(Not)Op
#20012 merged Dec 9, 2024
Split up cusolver_context into CUDA-specific and ROCM-specific parts.
#20036 merged Dec 9, 2024
[XLA:GPU] Remove unused xla_experimental_exec_time_optimization_effort flag.
#20297 merged Dec 9, 2024
Add support for CUDA 12.6.3 and CUDNN 9.5.1/9.6.0.
#20310 merged Dec 9, 2024
[xla:gpu] Removed redundant parameter from CompileTritonToLLVM
#20301 merged Dec 9, 2024
[XLA:CPU] Add a Python extension for KernelRunner.
#20196 merged Dec 9, 2024
[XLA:Python] Use nanobind::isinstance from upstream nanobind, delete xla::nb_isinstance.
#20250 merged Dec 9, 2024
Reland of PR #19571. Fix test FunctionalHloRunnerTest.ShardedAutotuningWorks
#20197 merged Dec 9, 2024
[XLA:GPU] Use absl::Microseconds instead of doing duration arithmetic.
#20305 merged Dec 9, 2024
Automated Code Change
#20194 merged Dec 9, 2024
PR #18989: [AllGatherCSE] Add a pass that CSEs all-gathers on parameters.
#20300 merged Dec 9, 2024
PR #20241: Updated Typo's in multiple documents
#20291 merged Dec 9, 2024
[xla] Update warnings.bazelrc
#20299 merged Dec 9, 2024
Reverts 26df9b97719020df83e882b31bbe4a7f2cbbdff5
#20293 merged Dec 9, 2024
Automated Code Change
#20289 merged Dec 9, 2024
Automated Code Change
#20286 merged Dec 8, 2024

107 Pull requests opened by 10 people

Automated Code Change
#20281 opened Dec 7, 2024
Automated Code Change
#20282 opened Dec 7, 2024
Automated Code Change
#20287 opened Dec 8, 2024
cuda_root_path: Find cuda libraries when installed with conda packages
#20288 opened Dec 8, 2024
Allow multiple invocations of exchange topologies, as long as the topology is consistent across restarts.
#20292 opened Dec 9, 2024
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#20302 opened Dec 9, 2024
[XLA:CPU] Fix crash due to OOM in XLA's custom convolution algorithm.
#20304 opened Dec 9, 2024
Support dumping unoptimised hlo snapshots with arguments in pjrt.
#20306 opened Dec 9, 2024
[XLA:GPU] Simplify `AllocatorRetry::AllocateRaw` control flow.
#20308 opened Dec 9, 2024
Added thread annotations to variable that claimed to not need them.
#20314 opened Dec 9, 2024
Removed unused `GetCoordinationServiceInstance` code.
#20315 opened Dec 9, 2024
Fix a breaking test
#20317 opened Dec 9, 2024
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20326 opened Dec 9, 2024
[XLA] Handle empty leaf nodes in an original value
#20327 opened Dec 9, 2024
Migrate gather_operation_test to always use PjRt for its test backend.
#20329 opened Dec 9, 2024
Add runtime support for host calculation of offsets in ds fusion
#20332 opened Dec 9, 2024
Copy result_accuracy when deriving new instruction.
#20333 opened Dec 9, 2024
Fix missing template value
#20340 opened Dec 9, 2024
Add Shape::FromProto static factory method to replace constructor.
#20341 opened Dec 10, 2024
Automated Code Change
#20348 opened Dec 10, 2024
Automated Code Change
#20350 opened Dec 10, 2024
Replace std::string_view with absl::string_view
#20351 opened Dec 10, 2024
Automated Code Change
#20352 opened Dec 10, 2024
Automated Code Change
#20355 opened Dec 10, 2024
Automated Code Change
#20357 opened Dec 10, 2024
Automated Code Change
#20360 opened Dec 10, 2024
Automated Code Change
#20363 opened Dec 10, 2024
[XLA:GPU] Move calls to `allocation_attr.freed_by_func` into `BFCAllocator::AllocateRawInternal`.
#20368 opened Dec 10, 2024
Automated Code Change
#20370 opened Dec 10, 2024
Automated Code Change
#20371 opened Dec 10, 2024
[XLA:GPU] Fix logging bug.
#20372 opened Dec 10, 2024
[XLA:GPU] Re-run the host-offload-legalize pass after CSE
#20374 opened Dec 10, 2024
Improve speed and collision/aliasing resistance of Absl::HashOf() on HloModule/HloComputation:
#20375 opened Dec 10, 2024
Remove `//tensorflow/tsl/platform/strcat.h`.
#20386 opened Dec 10, 2024
Add result accuracy attribute to ExpOp in StableHlo.
#20388 opened Dec 10, 2024
PR #20313: Fix async wrapper to walk child computations
#20391 opened Dec 10, 2024
Revert PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20393 opened Dec 10, 2024
[XLA:GPU] Add NVSHMEM library and initialization test
#20395 opened Dec 10, 2024
Respect DeviceAssignment in HloRunnerPjRt.
#20398 opened Dec 10, 2024
Elementwise Ops in Collective Pipeliner
#20399 opened Dec 10, 2024
Add a HLOPrintOption to control printing of the parameter number for parameters.
#20402 opened Dec 10, 2024
Add has_megacore and has_merged_vmem in XPlane stats.
#20404 opened Dec 10, 2024
Automated Code Change
#20410 opened Dec 11, 2024
Automated Code Change
#20412 opened Dec 11, 2024
Automated Code Change
#20413 opened Dec 11, 2024
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20414 opened Dec 11, 2024
Automated Code Change
#20423 opened Dec 11, 2024
Layout assignment: Reset memory space in result layout
#20426 opened Dec 11, 2024
[XLA:FFI] Fix C API
#20428 opened Dec 11, 2024
[PJRT] Expose `ExecutionContext` when executing a `LoadedExecutable`
#20429 opened Dec 11, 2024
Make ErrorSpec constructor `constexpr`.
#20435 opened Dec 11, 2024
PR #19649: [ROCm] Implement hermetic rocm dependency
#20437 opened Dec 11, 2024
public message needed only for shardy change, the change will be separately submitted in cl/703502551 and this message removes
#20439 opened Dec 11, 2024
Use int64_t for thread ids instead of int32_t
#20442 opened Dec 11, 2024
Add action which automatically runs CI for public OpenXLA GitHub org members
#20444 opened Dec 11, 2024
Add nozapfhahn for opt provider files, to ignore expected mutant testing warnings during submissions
#20451 opened Dec 11, 2024
Reverts 555ba9967696cdea8c768ac5b44e59a3582a9820
#20453 opened Dec 12, 2024
PR #77927: [oneDNN] upgrading oneDNN version to 3.6
#20456 opened Dec 12, 2024
Automated Code Change
#20461 opened Dec 12, 2024
Automated Code Change
#20462 opened Dec 12, 2024
[XLA:CPU] Move kernel prototype properties inside of KernelApiIrBuilder
#20467 opened Dec 12, 2024
[XLA:GPU] Rely on LLVM parser rather than objcopy to load fatbin in tests
#20474 opened Dec 12, 2024
[XLA::CPU] Update `ElementalKernelEmitter` to take HLO instruction instead of shapes.
#20476 opened Dec 12, 2024
[XLA:GPU] Force cuDNN convolutions to be assigned a `NHWC` layout from Hopper.
#20477 opened Dec 12, 2024
[XLA:CPU] Improve F8E4M3 accuracy
#20478 opened Dec 12, 2024
Integrate LLVM at llvm/llvm-project@03cbe42627c7
#20480 opened Dec 12, 2024
Migrate StableHLO Python extension to nanobind.
#20481 opened Dec 12, 2024
Remove CHECK macros from tsl/platform/default/logging.h
#20483 opened Dec 12, 2024
Add missing default for
#20485 opened Dec 12, 2024
Use CUPTI activity markers instead of nvtx driver callbacks for NVTX tracking.
#20488 opened Dec 12, 2024
Add more safety checks to BlockingCounter
#20489 opened Dec 12, 2024
PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20490 opened Dec 12, 2024
Support host->device dynamic-update-slice in HostOffloader.
#20492 opened Dec 13, 2024
Update slop_factor flag desc in debug_options_flags.cc
#20494 opened Dec 13, 2024
Make max value in Range optional to allow for Unbounded Range calculations.
#20495 opened Dec 13, 2024
Make max value in Range optional to allow for Unbounded Range calculations.
#20496 opened Dec 13, 2024
Extend the visibility of //third_party/tensorflow/compiler/xla/stream_executor/gpu:gpu_init_impl to the friends group.
#20499 opened Dec 13, 2024
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20500 opened Dec 13, 2024
Replace some usage of tsl::BlockingCounter with absl::BlockingCounter.
#20501 opened Dec 13, 2024
Update the forward propagation for slice instructions in xla::ShardingPropagation.
#20502 opened Dec 13, 2024
fixed the typos in bfloat16_propagation_test.cc
#20503 opened Dec 13, 2024
[XLA:CPU] Test elemental comparison ops
#20509 opened Dec 13, 2024
[XLA:CPU] Export codegen testlib functionality via __init__.py
#20510 opened Dec 13, 2024
PR #19649: [ROCm] Implement hermetic rocm dependency
#20512 opened Dec 13, 2024
[XLA:CPU] Move CPU ElementalIrEmitter implementation into a separate file
#20514 opened Dec 13, 2024
[XLA:CPU] Add ability to dump the LLvmIrSource to a string in the testlib
#20515 opened Dec 13, 2024
Simplify `WaitAndLogIfStuck()` to use `absl::Barrier` instead of `tsl::BlockingCounter`.
#20516 opened Dec 13, 2024
#sdy make mhlo_export_pipeline test use no rank maximal sharding.
#20518 opened Dec 13, 2024
#sdy add constraints to maximal shardings during verification.
#20519 opened Dec 13, 2024
[XLA:GPU] Synchronize compute and communication streams in NCCL group implementation
#20523 opened Dec 13, 2024
Move some ShapeUtil validation to cpp. Create generic ShapeError.
#20524 opened Dec 13, 2024
Remove CHECKs for inline vectors. Returning 0 is functionally correct.
#20525 opened Dec 13, 2024
Temporary check to avoid cases where wrong result sharding causes a RET_CHECK to fail in hlo_sharding.cc.
#20526 opened Dec 13, 2024
[XLA:GPU] update collective pipeline parallelism execution test to include nccl groups in a while loop
#20527 opened Dec 13, 2024
[XLA:GPU] update microbenchmarks to include nccl groups in a while loop
#20528 opened Dec 13, 2024
Use absl::string_view instead of std::string_view as some environments (e.g. Android) don't provide std::string_view.
#20530 opened Dec 13, 2024
Integrate LLVM at llvm/llvm-project@5f72f2c8fd6c
#20531 opened Dec 13, 2024
Make collective select folder convert-aware
#20532 opened Dec 13, 2024
[XLA] Add S1 and U1 as data types
#20533 opened Dec 13, 2024
Remove CUDA dependencies from jaxlib wheel.
#20534 opened Dec 13, 2024
Reverts a8a0a7ec72ed3960b33a061c12e43d7e5beb9087
#20535 opened Dec 13, 2024
[Shardy] HLO ⇄ MHLO to HLO ⇄ StableHLO
#20536 opened Dec 13, 2024
Remove legacy trivial main programs
#20537 opened Dec 13, 2024
[xla:gpu] Add an option to use persistent collective cliques
#20539 opened Dec 14, 2024
Optimize XLA SPMD Slice partitioner, containing the following 3 steps.
#20540 opened Dec 14, 2024
[xla:cpu] Implement 2D and 3D loop parallelization
#20541 opened Dec 14, 2024
[xla:cpu] Add an xnn_threadpool for wrapping ParallelLoopRunner as pthreadpool API
#20542 opened Dec 14, 2024

3 Issues closed by 3 people

[BUG] Returning a matrix crashes the HLO runner
#9307 closed Dec 11, 2024
Does Tensorflow XLA use MLIR?
#20125 closed Dec 10, 2024
Profiling crashes when run using JAX
#4431 closed Dec 9, 2024

4 Issues opened by 4 people

Make `cache_all_gather` configurable
#20508 opened Dec 13, 2024
Copies supersede OptimizationBarrier
#20440 opened Dec 11, 2024
Activation offloading with scan fails in XLA with same shape pinned-host/device scan outputs
#20373 opened Dec 10, 2024
Darwin CPU "LLVM ERROR: inconsistency in registered CommandLine options"
#20284 opened Dec 8, 2024

33 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[ROCm] Implement hermetic rocm dependency
#19649 commented on Dec 11, 2024 • 9 new comments
Add ExplicitStreamAnnotationAsyncWrapper pass
#19462 commented on Dec 11, 2024 • 8 new comments
[ROCm] Fix tests breaking due to change in threads_per_warp
#20164 commented on Dec 12, 2024 • 1 new comment
Explicit stream annotation: Set ExecutionStreamId based on frontend attribute
#19699 commented on Dec 11, 2024 • 1 new comment
[NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20086 commented on Dec 12, 2024 • 0 new comments
PR #20006: [XLA:GPU] Only allow horizontal loop fusion for default memory space
#20121 commented on Dec 11, 2024 • 0 new comments
Make Tensorflow runtime handle errors returned by CudaExecutor::CreateDeviceDescription
#20131 commented on Dec 10, 2024 • 0 new comments
Add support for emitting a kCall from an async instruction
#20155 commented on Dec 12, 2024 • 0 new comments
[XLA:GPU:Rocm] Fix wavefront size on gfx10+
#20170 commented on Dec 7, 2024 • 0 new comments
Introduce shape splitting into MSA.
#20182 commented on Dec 11, 2024 • 0 new comments
Create shim targets for most commonly used TSL headers in preparation for updating users
#20183 commented on Dec 13, 2024 • 0 new comments
[hlo-opt] move the tool to hlo/tools/ directory
#20242 commented on Dec 11, 2024 • 0 new comments
Automated Code Change
#20243 commented on Dec 7, 2024 • 0 new comments
[XLA:CPU] Expose better parallelism control
#20256 commented on Dec 10, 2024 • 0 new comments
In progress experimentation for supporting JAX Arrays with variable-width strings (i.e., with dtype = StringDType).
#20269 commented on Dec 14, 2024 • 0 new comments
[ROCm] Emit allocas on function entry in lower_tensors.cc
#20274 commented on Dec 9, 2024 • 0 new comments
Add Shape::FromProto static factory method and replace usage.
#20278 commented on Dec 10, 2024 • 0 new comments
change usage from constructor to factory method
#20279 commented on Dec 10, 2024 • 0 new comments
#sdy support StableHLO from refining Shardy ops with polymorphic shapes
#20081 commented on Dec 9, 2024 • 0 new comments
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20042 commented on Dec 10, 2024 • 0 new comments
[AllGatherCodeMotion] Add a pass that can code motion all-gathers in while loops.
#20023 commented on Dec 10, 2024 • 0 new comments
CHLO defns for a ragged dot that permits ragged batch and contraction.
#19996 commented on Dec 13, 2024 • 0 new comments
[XLA:GPU] Clean-up DeviceDescription
#19958 commented on Dec 11, 2024 • 0 new comments
[XLA:CPU] Extend the custom algorithm for transposed convolutions
#19921 commented on Dec 12, 2024 • 0 new comments
Add support for async dynamic slice fusion
#19834 commented on Dec 11, 2024 • 0 new comments
[XLA:Collective] Support normalizing all-reduce
#19819 commented on Dec 13, 2024 • 0 new comments
Revert `edf18ce` and fix launch dimension triplet
#19582 commented on Dec 10, 2024 • 0 new comments
Split out MemorySpaceAssignmentTest class for re-use.
#19497 commented on Dec 14, 2024 • 0 new comments
Create repository rule to generate a file with python wheel version.
#19308 commented on Dec 10, 2024 • 0 new comments
[RematerializeLargeAllGather] Add pass to rematerialize large tensor parallel all-gathers. Allow configurable bytes.
#19163 commented on Dec 9, 2024 • 0 new comments
PR #18331: [XLA:CPU] upgrading onednn version to 3.6
#19102 commented on Dec 12, 2024 • 0 new comments
Add F4E2M1FN and F8E8M0FNU types
#19096 commented on Dec 12, 2024 • 0 new comments
[StableHLO] Remove XlaCallModule's MHLO dependency
#19079 commented on Dec 13, 2024 • 0 new comments