Insights: openxla/xla
Overview
160 Pull requests merged by 1 person
-
Configure HloPjRtTestBase with new option structs.
#20452 merged
Dec 14, 2024 -
Extend use_parameter_layout_on_device option to `ExecuteReplicated`.
#20441 merged
Dec 14, 2024 -
Integrate LLVM at llvm/llvm-project@a21f9bfe29c2
#20538 merged
Dec 14, 2024 -
Remove a bunch of #if CUDA macro use in xla_compile_lib.
#20331 merged
Dec 14, 2024 -
Improve compilation time by not fusing large constants into LLVM modules for XLA::CPU.
#20257 merged
Dec 14, 2024 -
Swap output format of "MatchTrivialLoopRange" to better align with Range usage.
#20454 merged
Dec 13, 2024 -
timespan, xplane_visitor: Support operator<=.
#20405 merged
Dec 13, 2024 -
[XLA] Generalize type handling within InProcessCollectives
#20498 merged
Dec 13, 2024 -
Migrate some TSL code over to ABSL equivalents
#20457 merged
Dec 13, 2024 -
[xla:cpu] Add parallel loop runner
#20529 merged
Dec 13, 2024 -
Stop using redzone_allocator in autotuner_util.
#20520 merged
Dec 13, 2024 -
Add new `ml-build-rbe` container to the configurations of RBE used with remote config.
#20459 merged
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@bc29fc937c6c
#20511 merged
Dec 13, 2024 -
Move AutotunerUtil::CreateBuffer from a static method to a method on RedzoneAllocator.
#20517 merged
Dec 13, 2024 -
[XLA] Add some traceme annotations around XLA:CPU compilation and CPU compiler stack trace logging.
#20312 merged
Dec 13, 2024 -
[XLA:MSA] Allow cross-program prefetch for buffers that are already pinned to alternate memory.
#20316 merged
Dec 13, 2024 -
Replace TSL's BlockingCounter with absl's.
#20522 merged
Dec 13, 2024 -
Add TF package import tests in CPU presubmit, continuous and nightly jobs.
#20493 merged
Dec 13, 2024 -
Add has_megacore and has_merged_vmem in XPlane stats.
#20491 merged
Dec 13, 2024 -
[StableHLO] Add shape refinement callback to specify additional patterns.
#20321 merged
Dec 13, 2024 -
[IFRT] Add option to compile IFRT IR atom programs using Sdy
#20521 merged
Dec 13, 2024 -
[XLA:Collective] Add utility functions.
#20038 merged
Dec 13, 2024 -
[XLA:Python] Use &PyArray_Type rather than looking up numpy.ndarray via Python attrs.
#20513 merged
Dec 13, 2024 -
PR #20109: Create a workflow to run CPU benchmarks
#20475 merged
Dec 13, 2024 -
Reenable `buildifier` for all files under `xla/`, fix warnings
#19866 merged
Dec 13, 2024 -
Remove unneeded #ifdef'ed dependency.
#20328 merged
Dec 13, 2024 -
[xla:python] Add support for stateful FFI calls registered via Python.
#16789 merged
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20507 merged
Dec 13, 2024 -
[XLA:CPU] Add shape method python binding to Literal
#20466 merged
Dec 13, 2024 -
Replace xla proto dependency to make TF happy
#20506 merged
Dec 13, 2024 -
[XLA:GPU] Readability and performance nits.
#20473 merged
Dec 13, 2024 -
Automated Code Change
#20504 merged
Dec 13, 2024 -
[XLA:GPU] Clean up `TF_RET_CHECK`s in `TritonFusionAnalysis::ExecuteForDotFusion`.
#20505 merged
Dec 13, 2024 -
Automated Code Change
#20416 merged
Dec 13, 2024 -
Use absl::Mutex::Await() instead of absl::CondVar::Wait() in XLA.
#19957 merged
Dec 13, 2024 -
Automated Code Change
#20418 merged
Dec 13, 2024 -
Automated Code Change
#20468 merged
Dec 13, 2024 -
Fix range analysis bug.
#20479 merged
Dec 13, 2024 -
[XLA] Guarantee ordering of infeeds/outfeeds across called computations
#19827 merged
Dec 13, 2024 -
[XLA:Python] Mark from_python and from_cpp methods of nanobind typecasters as noexcept.
#20482 merged
Dec 13, 2024 -
[XLA:GPU] add execution tests for NCCL group with partially pipelined send/recv instructions
#20178 merged
Dec 13, 2024 -
Fix an issue in `PartitionGatherTrivialSlicedOperandDimensions` when handling out-of-bound indices.
#20401 merged
Dec 13, 2024 -
Make ReshapeOp return MHLO_AnyTensor instead of MHLO_StaticShapeTensor.
#18964 merged
Dec 13, 2024 -
Adds a "SHARDING" ProfileType to HloModuleProto.
#20283 merged
Dec 13, 2024 -
Implement flatten one level with keys in C++ and use it for the prefix/equality error printing.
#20044 merged
Dec 13, 2024 -
Switch multihost runner to public XLA:GPU target
#20486 merged
Dec 13, 2024 -
Add a tuple sharding when creating get-tuple-element(tuple(single_result)).
#20126 merged
Dec 13, 2024 -
Update `py_import` macros for the ability to unpack additional wheels in the same folder as the main wheel.
#20089 merged
Dec 13, 2024 -
Update the Windows Docker CI image to include the C++ ATL library.
#20487 merged
Dec 12, 2024 -
Make CompiledMemoryStats::ToProto() const.
#20458 merged
Dec 12, 2024 -
#sdy add option to avoid escaping attribute when adding to frontend attrs.
#20472 merged
Dec 12, 2024 -
Integrate LLVM at llvm/llvm-project@0876c11ceeb0
#20465 merged
Dec 12, 2024 -
[XLA:GPU][Emitters] Move gpu/fusions/ir to backends/gpu/codegen/ir
#20471 merged
Dec 12, 2024 -
[XLA:GPU] Propagate all profiling failures to `gemm_fusion_autotuner.cc`.
#20383 merged
Dec 12, 2024 -
PR #20463: Updated multiple typo's
#20470 merged
Dec 12, 2024 -
Automated Code Change
#20411 merged
Dec 12, 2024 -
[XLA:GPU][NFC] Modularize a little bit gpu_hlo_schedule.cc.
#20430 merged
Dec 12, 2024 -
PR #19099: [XLA:CPU][oneDNN] Add post-ops for oneDNN Convolutions
#20445 merged
Dec 12, 2024 -
[XLA:GPU] Remove restriction on `bitcast`s being a no-op with regards to tiling
#20433 merged
Dec 12, 2024 -
[XLA:GPU] Deprecate diamond chains in `SoftmaxRewriterTriton`.
#20431 merged
Dec 12, 2024 -
[XLA:GPU] Rollback introduction of `EmitterLocOpBuilder`.
#20464 merged
Dec 12, 2024 -
[XLA:CPU] Implement ElementalKernelEmitter
#20378 merged
Dec 12, 2024 -
#sdy Add unique module name in Shardy dumps
#20436 merged
Dec 12, 2024 -
Fix build failure in nvjitlink_impl.cc
#20460 merged
Dec 12, 2024 -
Integrate LLVM at llvm/llvm-project@19bc282320ba
#20450 merged
Dec 12, 2024 -
Automated Code Change
#20285 merged
Dec 12, 2024 -
[XLA:GPU] Add a nested builder arg to the EmitXlaLoop builder.
#20443 merged
Dec 12, 2024 -
Automated Code Change
#20362 merged
Dec 12, 2024 -
Automated Code Change
#20361 merged
Dec 12, 2024 -
Skip inserting into frontend attr if the (key, value) pair already exists and override if key exists
#20255 merged
Dec 12, 2024 -
[Mosaic] Pad trailing transposes chunks with zeros.
#20390 merged
Dec 12, 2024 -
Split ROCm-specific backend calls into their own targets.
#20337 merged
Dec 12, 2024 -
[xla-auto-sharding] Fix potential dangling pointer (reference) bug.
#20446 merged
Dec 12, 2024 -
[xla:cpu] Add missing files from openxla/xla#16438
#20447 merged
Dec 11, 2024 -
[XLA:GPU] Schedule send/recv early if pipeline parallelism ops enabled
#20403 merged
Dec 11, 2024 -
Internal CI/CD change
#20212 merged
Dec 11, 2024 -
[XLA] Avoid redundant lookup in ConsumeResource
#20434 merged
Dec 11, 2024 -
Remove the `test_hlo_pjrt_runner` tag.
#20392 merged
Dec 11, 2024 -
Add B100 to default Nvidia gpu backends
#20407 merged
Dec 11, 2024 -
Migrate broadcast_test to always use PjRt for its test backend.
#20345 merged
Dec 11, 2024 -
Add a default error spec field to HloRunnerAgnosticTestBase.
#20397 merged
Dec 11, 2024 -
[XLA:CPU] Benchmark for grouped strided convolutions
#20425 merged
Dec 11, 2024 -
[XLA GPU] Add additional unit tests for `IsPtxRegisterAllocationError`.
#20424 merged
Dec 11, 2024 -
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20420 merged
Dec 11, 2024 -
[XLA:GPU] Implement NcclRaggedAllToAllThunk.
#20265 merged
Dec 11, 2024 -
Fix infinite loop in TopKSplitter
#20422 merged
Dec 11, 2024 -
PR #20214: Evaluate simple offset values, if possible
#20303 merged
Dec 11, 2024 -
PR #20313: Fix async wrapper to walk child computations
#20421 merged
Dec 11, 2024 -
#sdy Swap XLA Shardy passes to use StableHLO instead of MHLO as much as possible.
#19939 merged
Dec 11, 2024 -
Internal: add missing dependency on numpy
#20298 merged
Dec 11, 2024 -
[XLA:GPU] Use `absl::Status` payload to more precisely identify register allocation errors.
#20396 merged
Dec 11, 2024 -
[XLA:CPU] Use KernelApiIrBuilder in IrEmitter2
#20380 merged
Dec 11, 2024 -
[XLA:CPU] Add new KernelApiIrBuilder
#20379 merged
Dec 11, 2024 -
PR #20334: [nfc] clang-format is failing on unrelated PRs because of this
#20419 merged
Dec 11, 2024 -
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20347 merged
Dec 11, 2024 -
[XLA:GPU] Decrease `VLOG` levels to start logging at level `2` in `softmax_rewriter_triton.cc`.
#20415 merged
Dec 11, 2024 -
fix audit wheel compliance issues for pywrap rules
#20356 merged
Dec 11, 2024 -
Cleanup inconsistent names/comments
#20406 merged
Dec 11, 2024 -
Remove unused ErrorSpec.
#20400 merged
Dec 11, 2024 -
[XLA:GPU] Guard send/recv schedule manipulation behind xla_gpu_enable_pipelined_p2p flag
#20382 merged
Dec 10, 2024 -
Migrate all_reduce_test to always use PjRt for its test backend.
#20344 merged
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20349 merged
Dec 10, 2024 -
Respect DeviceAssignment in HloRunnerPjRt.
#20389 merged
Dec 10, 2024 -
Migrate gather_operation_test to always use PjRt for its test backend.
#20339 merged
Dec 10, 2024 -
[HLO->MHLO] Consolidate non-pipelined async ops into MHLO ops.
#20309 merged
Dec 10, 2024 -
Add implicit device step tracking.
#20261 merged
Dec 10, 2024 -
Migrate copy_test to always use PjRt for its test backend.
#20343 merged
Dec 10, 2024 -
[XLA:GPU] Remove `--xla_gpu_experimental_enable_triton_softmax_priority_fusion`.
#20384 merged
Dec 10, 2024 -
Set implicitTrunc on APInt creation
#20387 merged
Dec 10, 2024 -
Integrate LLVM at llvm/llvm-project@0f7b3a9407d2
#20377 merged
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20322 merged
Dec 10, 2024 -
Add MHLO `mhlo.custom_call @ragged_all_to_all` -> HLO RaggedAllToAll pass
#20354 merged
Dec 10, 2024 -
[XLA:CPU] Update ShapeToIrType & PrimitiveTypeToIrType to take a LLVMContext
#20381 merged
Dec 10, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#19734 merged
Dec 10, 2024 -
[xla:gpu] Extracted `CreateTritonPipeline` into a separate target
#20367 merged
Dec 10, 2024 -
[xla:cpu] Add an object pool for efficient xnnpack object pooling
#20307 merged
Dec 10, 2024 -
Add `test_migrated_to_hlo_runner_pjrt` tag to `xla_test`.
#20338 merged
Dec 10, 2024 -
[xla-auto-sharding] Add BRKGA heuristic as an XLA auto-sharding option.
#20330 merged
Dec 10, 2024 -
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20365 merged
Dec 10, 2024 -
Automated Code Change
#20364 merged
Dec 10, 2024 -
PR #16438: aarch64: implement onednn matmul operator with explicit reorders
#20369 merged
Dec 10, 2024 -
[XLA:GPU] Disable cutlass dynamic-update-slice rewrite on V100.
#20366 merged
Dec 10, 2024 -
PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20213 merged
Dec 10, 2024 -
[XLA:GPU] Use absl instead of tensorflow functions/types.
#20311 merged
Dec 10, 2024 -
Add debug option for failing the PTX compilation on register spilling
#20358 merged
Dec 10, 2024 -
Automated Code Change
#20290 merged
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20353 merged
Dec 10, 2024 -
[XLA] Don't bail when encountering complex loop pipelining patterns
#20258 merged
Dec 10, 2024 -
Add an interpreter PjRt client registry for testing.
#20336 merged
Dec 10, 2024 -
Add a new test base class for a default PjRt test runner w/ SE interpreter.
#20320 merged
Dec 10, 2024 -
Add RegisterMlirToHloDependentDialects to register required dependent dialects
#20318 merged
Dec 10, 2024 -
Register WhileLoopAllReduceCodeMotion pass to the opt tool
#20342 merged
Dec 10, 2024 -
Create OpStatsToRooflineModel, in preparation of Roofline Model creation
#20280 merged
Dec 10, 2024 -
IFRT proxy: Add profiler spans to all entrypoints at the client.
#20325 merged
Dec 9, 2024 -
[XLA] Fix latency hiding scheduler when faced with annotated no-op instructions.
#20324 merged
Dec 9, 2024 -
Add CPU specific passes for hlo-opt tool.
#20154 merged
Dec 9, 2024 -
[Cleanup] Use HloPredicateIs(Not)Op
#20012 merged
Dec 9, 2024 -
Split up cusolver_context into CUDA-specific and ROCM-specific parts.
#20036 merged
Dec 9, 2024 -
[XLA:GPU] Remove unused `xla_experimental_exec_time_optimization_effort` flag.
#20297 merged
Dec 9, 2024 -
Add support for CUDA 12.6.3 and CUDNN 9.5.1/9.6.0.
#20310 merged
Dec 9, 2024 -
[xla:gpu] Removed redundant parameter from `CompileTritonToLLVM`
#20301 merged
Dec 9, 2024 -
[XLA:CPU] Add a Python extension for KernelRunner.
#20196 merged
Dec 9, 2024 -
[XLA:Python] Use nanobind::isinstance from upstream nanobind, delete xla::nb_isinstance.
#20250 merged
Dec 9, 2024 -
Reland of PR #19571. Fix test FunctionalHloRunnerTest.ShardedAutotuningWorks
#20197 merged
Dec 9, 2024 -
[XLA:GPU] Use `absl::Microseconds` instead of doing duration arithmetic.
#20305 merged
Dec 9, 2024 -
Automated Code Change
#20194 merged
Dec 9, 2024 -
PR #18989: [AllGatherCSE] Add a pass that CSEs all-gathers on parameters.
#20300 merged
Dec 9, 2024 -
PR #20241: Updated Typo's in multiple documents
#20291 merged
Dec 9, 2024 -
[xla] Update warnings.bazelrc
#20299 merged
Dec 9, 2024 -
Reverts 26df9b97719020df83e882b31bbe4a7f2cbbdff5
#20293 merged
Dec 9, 2024 -
Automated Code Change
#20289 merged
Dec 9, 2024 -
Automated Code Change
#20286 merged
Dec 8, 2024
107 Pull requests opened by 10 people
-
Automated Code Change
#20281 opened
Dec 7, 2024 -
Automated Code Change
#20282 opened
Dec 7, 2024 -
Automated Code Change
#20287 opened
Dec 8, 2024 -
cuda_root_path: Find cuda libraries when installed with conda packages
#20288 opened
Dec 8, 2024 -
PR #18838: [NVIDIA GPU] Support multi-operand collective-permute
#20302 opened
Dec 9, 2024 -
[XLA:CPU] Fix crash due to OOM in XLA's custom convolution algorithm.
#20304 opened
Dec 9, 2024 -
Support dumping unoptimised hlo snapshots with argumnets in pjrt.
#20306 opened
Dec 9, 2024 -
[XLA:GPU] Simplify `AllocatorRetry::AllocateRaw` control flow.
#20308 opened
Dec 9, 2024 -
Added thread annotations to variable that claimed to not need them.
#20314 opened
Dec 9, 2024 -
Removed unused `GetCoordinationServiceInstance` code.
#20315 opened
Dec 9, 2024 -
Fix a breaking test
#20317 opened
Dec 9, 2024 -
Integrate LLVM at llvm/llvm-project@be2df95e9281
#20326 opened
Dec 9, 2024 -
[XLA] Handle empty leaf nodes in an original value
#20327 opened
Dec 9, 2024 -
Migrate gather_operation_test to always use PjRt for its test backend.
#20329 opened
Dec 9, 2024 -
Add runtime support for host calculation of offsets in ds fusion
#20332 opened
Dec 9, 2024 -
Copy result_accuracy when deriving new instruction.
#20333 opened
Dec 9, 2024 -
Fix missing template value
#20340 opened
Dec 9, 2024 -
Add Shape::FromProto static factory method to replace constructor.
#20341 opened
Dec 10, 2024 -
Automated Code Change
#20348 opened
Dec 10, 2024 -
Automated Code Change
#20350 opened
Dec 10, 2024 -
Replace std::string_view with absl::string_view
#20351 opened
Dec 10, 2024 -
Automated Code Change
#20352 opened
Dec 10, 2024 -
Automated Code Change
#20355 opened
Dec 10, 2024 -
Automated Code Change
#20357 opened
Dec 10, 2024 -
Automated Code Change
#20360 opened
Dec 10, 2024 -
Automated Code Change
#20363 opened
Dec 10, 2024 -
[XLA:GPU] Move calls to `allocation_attr.freed_by_func` into `BFCAllocator::AllocateRawInternal`.
#20368 opened
Dec 10, 2024 -
Automated Code Change
#20370 opened
Dec 10, 2024 -
Automated Code Change
#20371 opened
Dec 10, 2024 -
[XLA:GPU] Fix logging bug.
#20372 opened
Dec 10, 2024 -
[XLA:GPU] Re-run the host-offload-legalize pass after CSE
#20374 opened
Dec 10, 2024 -
Improve speed and collision/aliasing resistance of Absl::HashOf() on HloModule/HloComputation:
#20375 opened
Dec 10, 2024 -
Remove `//tensorflow/tsl/platform/strcat.h`.
#20386 opened
Dec 10, 2024 -
Add result accuracy attribute to ExpOp in StableHlo.
#20388 opened
Dec 10, 2024 -
PR #20313: Fix async wrapper to walk child computations
#20391 opened
Dec 10, 2024 -
Revert PR #20025: [NVIDIA GPU] LHS enhancement for collective multi-streaming
#20393 opened
Dec 10, 2024 -
[XLA:GPU] Add NVSHMEM library and initialization test
#20395 opened
Dec 10, 2024 -
Respect DeviceAssignment in HloRunnerPjRt.
#20398 opened
Dec 10, 2024 -
Elementwise Ops in Collective Pipeliner
#20399 opened
Dec 10, 2024 -
Add a HLOPrintOption to control printing of the parameter number for parameters.
#20402 opened
Dec 10, 2024 -
Add has_megacore and has_merged_vmem in XPlane stats.
#20404 opened
Dec 10, 2024 -
Automated Code Change
#20410 opened
Dec 11, 2024 -
Automated Code Change
#20412 opened
Dec 11, 2024 -
Automated Code Change
#20413 opened
Dec 11, 2024 -
Integrate LLVM at llvm/llvm-project@eacdbc269e5f
#20414 opened
Dec 11, 2024 -
Automated Code Change
#20423 opened
Dec 11, 2024 -
Layout assignment: Reset memory space in result layout
#20426 opened
Dec 11, 2024 -
[XLA:FFI] Fix C API
#20428 opened
Dec 11, 2024 -
[PJRT] Expose `ExecutionContext` when executing a `LoadedExecutable`
#20429 opened
Dec 11, 2024 -
Make ErrorSpec constructor `constexpr`.
#20435 opened
Dec 11, 2024 -
PR #19649: [ROCm] Implement hermetic rocm dependency
#20437 opened
Dec 11, 2024 -
Use int64_t for thread ids instead of int32_t
#20442 opened
Dec 11, 2024 -
Add action which automatically runs CI for public OpenXLA GitHub org members
#20444 opened
Dec 11, 2024 -
Add nozapfhahn for opt provider files, to ignore expected mutant testing warnings during submissions
#20451 opened
Dec 11, 2024 -
Reverts 555ba9967696cdea8c768ac5b44e59a3582a9820
#20453 opened
Dec 12, 2024 -
PR #77927: [oneDNN] upgrading oneDNN version to 3.6
#20456 opened
Dec 12, 2024 -
Automated Code Change
#20461 opened
Dec 12, 2024 -
Automated Code Change
#20462 opened
Dec 12, 2024 -
[XLA:CPU] Move kernel prototype properties inside of KernelApiIrBuilder
#20467 opened
Dec 12, 2024 -
[XLA:GPU] Rely on LLVM parser rather than objcopy to load fatbin in tests
#20474 opened
Dec 12, 2024 -
[XLA::CPU] Update `ElementalKernelEmitter` to take HLO instruction instead of shapes.
#20476 opened
Dec 12, 2024 -
[XLA:GPU] Force cuDNN convolutions to be assigned a `NHWC` layout from Hopper.
#20477 opened
Dec 12, 2024 -
[XLA:CPU] Improve F8E4M3 accuracy
#20478 opened
Dec 12, 2024 -
Integrate LLVM at llvm/llvm-project@03cbe42627c7
#20480 opened
Dec 12, 2024 -
Migrate StableHLO Python extension to nanobind.
#20481 opened
Dec 12, 2024 -
Remove CHECK macros from tsl/platform/default/logging.h
#20483 opened
Dec 12, 2024 -
Add missing default for
#20485 opened
Dec 12, 2024 -
Use CUPTI activity markers instead of nvtx driver callbacks for NVTX tracking.
#20488 opened
Dec 12, 2024 -
Add more safety checks to BlockingCounter
#20489 opened
Dec 12, 2024 -
PR #20086: [NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20490 opened
Dec 12, 2024 -
Support host->device dynamic-update-slice in HostOffloader.
#20492 opened
Dec 13, 2024 -
Update slop_factor flag desc in debug_options_flags.cc
#20494 opened
Dec 13, 2024 -
Make max value in Range optional to allow for Unbounded Range calculations.
#20495 opened
Dec 13, 2024 -
Make max value in Range optional to allow for Unbounded Range calculations.
#20496 opened
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@5e53a8dadb00
#20500 opened
Dec 13, 2024 -
Replace some usage of tsl::BlockingCounter with absl::BlockingCounter.
#20501 opened
Dec 13, 2024 -
Update the forward propagation for slice instructions in xla::ShardingPropagation.
#20502 opened
Dec 13, 2024 -
fixed the typos in bfloat16_propagation_test.cc
#20503 opened
Dec 13, 2024 -
[XLA:CPU] Test elemental comparison ops
#20509 opened
Dec 13, 2024 -
[XLA:CPU] Export codegen testlib functionality via __init__.py
#20510 opened
Dec 13, 2024 -
PR #19649: [ROCm] Implement hermetic rocm dependency
#20512 opened
Dec 13, 2024 -
[XLA:CPU] Move CPU ElementalIrEmitter implementation into a separate file
#20514 opened
Dec 13, 2024 -
[XLA:CPU] Add ability to dump the LLvmIrSource to a string in the testlib
#20515 opened
Dec 13, 2024 -
Simplify `WaitAndLogIfStuck()` to use `absl::Barrier` instead of `tsl::BlockingCounter`.
#20516 opened
Dec 13, 2024 -
#sdy make mhlo_export_pipeline test use no rank maximal sharding.
#20518 opened
Dec 13, 2024 -
#sdy add constraints to maximal shardings during verification.
#20519 opened
Dec 13, 2024 -
[XLA:GPU] Synchronize compute and communication streams in NCCL group implementation
#20523 opened
Dec 13, 2024 -
Move some ShapeUtil validation to cpp. Create generic ShapeError.
#20524 opened
Dec 13, 2024 -
Remove CHECKs for inline vectors. Returning 0 is functionally correct.
#20525 opened
Dec 13, 2024 -
Temporary check to avoid cases where wrong result sharding causes a RET_CHECK to fail in hlo_sharding.cc.
#20526 opened
Dec 13, 2024 -
[XLA:GPU] update collective pipeline parallelism execution test to include nccl groups in a while loop
#20527 opened
Dec 13, 2024 -
[XLA:GPU] update microbenchmarks to include nccl groups in a while loop
#20528 opened
Dec 13, 2024 -
Integrate LLVM at llvm/llvm-project@5f72f2c8fd6c
#20531 opened
Dec 13, 2024 -
Make collective selet folder convert-aware
#20532 opened
Dec 13, 2024 -
[XLA] Add S1 and U1 as data types
#20533 opened
Dec 13, 2024 -
Remove CUDA dependencies from jaxlib wheel.
#20534 opened
Dec 13, 2024 -
Reverts a8a0a7ec72ed3960b33a061c12e43d7e5beb9087
#20535 opened
Dec 13, 2024 -
[Shardy] HLO ⇄ MHLO to HLO ⇄ StableHLO
#20536 opened
Dec 13, 2024 -
Remove legacy trivial main programs
#20537 opened
Dec 13, 2024 -
[xla:gpu] Add an option to use persistent collective cliques
#20539 opened
Dec 14, 2024 -
Optimize XLA SPMD Slice partitioner, containing the following 3 steps.
#20540 opened
Dec 14, 2024 -
[xla:cpu] Implement 2D and 3D loop parallelization
#20541 opened
Dec 14, 2024 -
[xla:cpu] Add an xnn_threadpool for wrapping ParallelLoopRunner as pthreadpool API
#20542 opened
Dec 14, 2024
3 Issues closed by 3 people
-
[BUG] Returning a matrix crashes the HLO runner
#9307 closed
Dec 11, 2024 -
Does Tensorflow XLA use MLIR?
#20125 closed
Dec 10, 2024 -
Profiling crashes when run using JAX
#4431 closed
Dec 9, 2024
4 Issues opened by 4 people
-
Make `cache_all_gather` configurable
#20508 opened
Dec 13, 2024 -
Copies supersede OptimizationBarrier
#20440 opened
Dec 11, 2024 -
Activation offloading with scan fails in XLA with same shape pinned-host/device scan outputs
#20373 opened
Dec 10, 2024 -
Darwin CPU "LLVM ERROR: inconsistency in registered CommandLine options"
#20284 opened
Dec 8, 2024
33 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[ROCm] Implement hermetic rocm dependency
#19649 commented on
Dec 11, 2024 • 9 new comments -
Add ExplicitStreamAnnotationAsyncWrapper pass
#19462 commented on
Dec 11, 2024 • 8 new comments -
[ROCm] Fix tests breaking due to change in threads_per_warp
#20164 commented on
Dec 12, 2024 • 1 new comment -
Explicit stream annotation: Set ExecutionStreamId based on frontend attribute
#19699 commented on
Dec 11, 2024 • 1 new comment -
[NVIDIA GPU] Fix mem p2p init in collective permute thunk
#20086 commented on
Dec 12, 2024 • 0 new comments -
PR #20006: [XLA:GPU] Only allow horizontal loop fusion for default memory space
#20121 commented on
Dec 11, 2024 • 0 new comments -
Make Tensorflow runtime handle errors returned by CudaExecutor::CreateDeviceDescription
#20131 commented on
Dec 10, 2024 • 0 new comments -
Add support for emitting a kCall from an async instruction
#20155 commented on
Dec 12, 2024 • 0 new comments -
[XLA:GPU:Rocm] Fix wavefront size on gfx10+
#20170 commented on
Dec 7, 2024 • 0 new comments -
Introduce shape splitting into MSA.
#20182 commented on
Dec 11, 2024 • 0 new comments -
Create shim targets for most commonly used TSL headers in preparation for updating users
#20183 commented on
Dec 13, 2024 • 0 new comments -
[hlo-opt] move the tool to hlo/tools/ directory
#20242 commented on
Dec 11, 2024 • 0 new comments -
Automated Code Change
#20243 commented on
Dec 7, 2024 • 0 new comments -
[XLA:CPU] Expose better parallelism control
#20256 commented on
Dec 10, 2024 • 0 new comments -
In progress experimentation for supporting JAX Arrays with variable-width strings (i.e., with dtype = StringDType).
#20269 commented on
Dec 14, 2024 • 0 new comments -
[ROCm] Emit allocas on function entry in lower_tensors.cc
#20274 commented on
Dec 9, 2024 • 0 new comments -
Add Shape::FromProto static factory method and replace usage.
#20278 commented on
Dec 10, 2024 • 0 new comments -
change usage from constructor to factory method
#20279 commented on
Dec 10, 2024 • 0 new comments -
#sdy support StableHLO from refining Shardy ops with polymorphic shapes
#20081 commented on
Dec 9, 2024 • 0 new comments -
PR #19161: Asymmetrically Replicated Instructions in Replication Analysis
#20042 commented on
Dec 10, 2024 • 0 new comments -
[AllGatherCodeMotion] Add a pass that can code motion all-gathers in while loops.
#20023 commented on
Dec 10, 2024 • 0 new comments -
CHLO defns for a ragged dot that permits ragged batch and contraction.
#19996 commented on
Dec 13, 2024 • 0 new comments -
[XLA:GPU] Clean-up DeviceDescription
#19958 commented on
Dec 11, 2024 • 0 new comments -
[XLA:CPU] Extend the custom algorithm for transposed convolutions
#19921 commented on
Dec 12, 2024 • 0 new comments -
Add support for async dynamic slice fusion
#19834 commented on
Dec 11, 2024 • 0 new comments -
[XLA:Collective] Support normalizing all-reduce
#19819 commented on
Dec 13, 2024 • 0 new comments -
Revert `edf18ce` and fix launch dimension triplet
#19582 commented on
Dec 10, 2024 • 0 new comments -
Split out MemorySpaceAssignmentTest class for re-use.
#19497 commented on
Dec 14, 2024 • 0 new comments -
Create repository rule to generate a file with python wheel version.
#19308 commented on
Dec 10, 2024 • 0 new comments -
[RematerializeLargeAllGather] Add pass to rematerialize large tensor parallel all-gathers. Allow configurable bytes.
#19163 commented on
Dec 9, 2024 • 0 new comments -
PR #18331: [XLA:CPU] upgrading onednn version to 3.6
#19102 commented on
Dec 12, 2024 • 0 new comments -
Add F4E2M1FN and F8E8M0FNU types
#19096 commented on
Dec 12, 2024 • 0 new comments -
[StableHLO] Remove XlaCallModule's MHLO dependency
#19079 commented on
Dec 13, 2024 • 0 new comments