@Copilot Copilot AI commented Oct 6, 2025

Add CUDA Backend Support ✅

This PR implements CUDA backend support for Iris, enabling the framework to run seamlessly on both AMD GPUs (via HIP) and NVIDIA GPUs (via CUDA) with transparent backend auto-detection.

Summary

Iris now supports both AMD GPUs (HIP backend) and NVIDIA GPUs (CUDA backend) with automatic backend detection based on available GPU libraries. All backend logic is consolidated into a single iris/hip.py file with conditional branching.

Changes Made

  1. Unified Backend Module (iris/hip.py):

    • Auto-detects backend at module load time by trying to load libamdhip64.so or libcudart.so
    • Sets _is_amd_backend flag for internal use
    • All functions branch internally based on backend type
    • Dynamic IPC handle size (64 bytes for HIP, 128 bytes for CUDA)
    • Exports get_ipc_handle_size() for use by iris.py
  2. Dynamic Sizing (iris/iris.py):

    • Added get_ipc_handle_size() import
    • Uses dynamic IPC handle size instead of hardcoded 64
    • Ensures proper IPC handle allocation for each backend
  3. Key Features:

    • Automatic backend detection based on available GPU libraries
    • Minimal code changes - original hip.py structure preserved with conditional branching
    • Consolidated implementation - all backend logic in single file
    • Zero configuration required - works out of the box
    • Falls back to HIP by default for backward compatibility
  4. Code Quality:

    • Minimal changes to existing codebase
    • No separate backend files - everything consolidated in iris/hip.py
    • Clean conditional logic throughout
    • No build-time configuration or environment variables
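A minimal sketch of the dynamic IPC handle sizing described above. The names `_is_amd_backend` and `get_ipc_handle_size()` come from the PR; the optional `is_amd` parameter and `make_ipc_handle_buffer()` helper are illustrative additions, not necessarily the actual implementation:

```python
import ctypes

# Flag assumed to be set at module load time by backend detection
_is_amd_backend = True

# Per this PR: hipIpcMemHandle_t uses 64 bytes, the CUDA handle 128 bytes
HIP_IPC_HANDLE_SIZE = 64
CUDA_IPC_HANDLE_SIZE = 128


def get_ipc_handle_size(is_amd: bool = None) -> int:
    """Return the IPC memory handle size for the active backend."""
    if is_amd is None:
        is_amd = _is_amd_backend
    return HIP_IPC_HANDLE_SIZE if is_amd else CUDA_IPC_HANDLE_SIZE


def make_ipc_handle_buffer(is_amd: bool = None) -> ctypes.Array:
    """Allocate a zeroed buffer large enough for the backend's IPC handle."""
    return (ctypes.c_char * get_ipc_handle_size(is_amd))()
```

`iris/iris.py` can then call `get_ipc_handle_size()` instead of hardcoding 64, so allocations are correct on either backend.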

Usage

```shell
# Just install normally - backend auto-detected
pip install git+https://github.com/ROCm/iris.git
```

```python
# Works automatically on both AMD and NVIDIA GPUs
import iris
ctx = iris.iris(heap_size=1 << 30)
```

Implementation Details

Backend Detection (in iris/hip.py):

  • Tries to load libamdhip64.so → sets _is_amd_backend = True
  • Falls back to libcudart.so → sets _is_amd_backend = False
  • Defaults to HIP for backward compatibility if both fail
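The detection steps above can be sketched with `ctypes`. This is a hypothetical standalone version (the PR puts the equivalent logic at module scope in `iris/hip.py`); the `can_load` parameter exists only to make the logic testable on machines without GPU libraries:

```python
import ctypes
from typing import Callable


def _can_load(lib_name: str) -> bool:
    """Return True if the shared library can be dlopen'd."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


def detect_backend(can_load: Callable[[str], bool] = _can_load) -> bool:
    """Return True for the HIP (AMD) backend, False for CUDA.

    Probes libamdhip64.so first, then libcudart.so, and defaults
    to HIP for backward compatibility if neither loads.
    """
    if can_load("libamdhip64.so"):
        return True   # AMD backend
    if can_load("libcudart.so"):
        return False  # NVIDIA backend
    return True       # default: HIP


_is_amd_backend = detect_backend()
```

Because `libamdhip64.so` is probed first, a machine with both runtimes installed resolves to HIP.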

Conditional Functions:

  • hip_try() - branches to hipGetErrorString or cudaGetErrorString
  • hipIpcMemHandle_t - 64 bytes for HIP, 128 bytes for CUDA
  • open_ipc_handle() - calls hipIpcOpenMemHandle or cudaIpcOpenMemHandle
  • All device/memory functions branch based on _is_amd_backend
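A sketch of how the `hip_try()` branching might look. This is a simplification: the `runtime` and `_is_amd_backend` parameters are passed explicitly here for illustration, whereas the actual module presumably uses the CDLL handle and flag set at import time:

```python
import ctypes


def hip_try(err: int, runtime: ctypes.CDLL, _is_amd_backend: bool) -> None:
    """Raise with the backend-appropriate error string on failure."""
    if err == 0:  # hipSuccess and cudaSuccess are both 0
        return
    if _is_amd_backend:
        get_error_string = runtime.hipGetErrorString
    else:
        get_error_string = runtime.cudaGetErrorString
    get_error_string.restype = ctypes.c_char_p
    msg = get_error_string(ctypes.c_int(err)).decode()
    raise RuntimeError(f"GPU runtime error {err}: {msg}")
```

`open_ipc_handle()` and the other device/memory functions follow the same pattern: one function body, with the HIP or CUDA symbol chosen by the flag.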

Backend-Specific Behavior:

  • get_rocm_version() - returns (-1, -1) for CUDA
  • get_arch_string() - returns GCN arch for HIP, compute capability for CUDA (e.g., "sm_90")
  • get_num_xcc() - returns actual count for AMD, 1 for NVIDIA
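For the CUDA branch of `get_arch_string()`, the compute capability can be queried via `cudaDeviceGetAttribute` and formatted as `sm_XY`. The helper names below are hypothetical (not from the PR); the attribute enum values are the ones defined in `cuda_runtime_api.h`:

```python
import ctypes

# Attribute enum values from cuda_runtime_api.h
cudaDevAttrComputeCapabilityMajor = 75
cudaDevAttrComputeCapabilityMinor = 76


def format_arch(major: int, minor: int) -> str:
    """Format a compute capability pair as an sm_XY arch string."""
    return f"sm_{major}{minor}"


def get_cuda_arch_string(cudart: ctypes.CDLL, device: int = 0) -> str:
    """Query the device's compute capability and return e.g. 'sm_90'."""
    major, minor = ctypes.c_int(), ctypes.c_int()
    cudart.cudaDeviceGetAttribute(
        ctypes.byref(major), cudaDevAttrComputeCapabilityMajor, device)
    cudart.cudaDeviceGetAttribute(
        ctypes.byref(minor), cudaDevAttrComputeCapabilityMinor, device)
    return format_arch(major.value, minor.value)
```

On the HIP side the same function would instead return the GCN arch name reported by the HIP runtime.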

Code Statistics

  • Modified files: iris/hip.py, iris/iris.py
  • Total additions: ~40 lines (conditional logic)
  • Files changed: 2
  • Commits: 17

Backend Detection Logic

  1. Try to load libamdhip64.so → use HIP backend
  2. Try to load libcudart.so → use CUDA backend
  3. Default to HIP for backward compatibility

Supported Hardware

AMD GPUs (HIP Backend):

  • MI300X, MI350X, MI355X
  • Requirements: ROCm 6.3.1+, PyTorch with ROCm

NVIDIA GPUs (CUDA Backend):

  • All CUDA-capable GPUs with compute capability 7.0+
  • Requirements: CUDA 11.0+, PyTorch with CUDA

Notes

  • Minimal changes per review feedback - consolidated into single file with conditional branching
  • Pure auto-detection with no configuration needed
  • Original hip.py structure preserved - only added conditional logic
  • Zero breaking changes - fully backward compatible
  • All changes follow existing Iris code style and patterns

References

Original prompt

This section details the original issue to resolve.

<issue_title>Add CUDA backend</issue_title>
<issue_description>#### Summary
Allow Iris to be installed with --config-settings backend=nvidia so that iris.hip.* imports transparently redirect to CUDA equivalents (iris.cuda.*). This enables NVIDIA backend support without changing source code.

Goal

At install or runtime, users should be able to do:

```shell
pip install git+https://github.com/ROCm/iris.git --config-settings backend=nvidia
```

and have:

```python
from iris.hip.runtime import hip_malloc
```

automatically resolve to:

```python
from iris.cuda.runtime import hip_malloc
```

Cherry pick commits from https://github.com/astroC86/iris/tree/astroC86/cuda-port and #118</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #199


Copilot AI and others added 5 commits October 6, 2025 00:47
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] Add CUDA backend Add CUDA backend support for NVIDIA GPUs with transparent runtime switching Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 00:59
Copilot finished work on behalf of mawad-amd October 6, 2025 00:59
- Add custom build backend (build_backend.py) to support --config-settings
- Update pyproject.toml to use custom build backend
- Modify iris/hip.py to prioritize build-time config over env vars
- Update README with pip install --config-settings backend=nvidia examples
- Add 3 new unit tests for build-time configuration (11 tests total pass)
- Update example script to document both build-time and runtime methods
- Add iris/.config/ to .gitignore

Backend detection priority is now:
1. Build-time configuration (--config-settings backend=nvidia/hip)
2. IRIS_BACKEND environment variable (runtime)
3. Auto-detection based on available libraries
4. Default to HIP for backward compatibility

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title Add CUDA backend support for NVIDIA GPUs with transparent runtime switching Add CUDA backend support for NVIDIA GPUs with build-time and runtime switching Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 01:09
Copilot finished work on behalf of mawad-amd October 6, 2025 01:09
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 02:44
Copilot finished work on behalf of mawad-amd October 6, 2025 02:44
- Add build() hook in setup.py to handle --config-settings
- Sets IRIS_BACKEND environment variable during build
- Normalizes backend names (nvidia→cuda, amd/rocm→hip)
- Updated build_backend.py to also set env var for consistency
- All 11 unit tests pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 02:51
- Restore README.md to state before PR changes
- Keep only code changes without documentation updates
- All linting checks pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot finished work on behalf of mawad-amd October 6, 2025 04:24
- Remove build() function from setup.py (back to original)
- Simplify iris/hip.py to use only auto-detection (no build-time config)
- Remove _backend_selected.py from .gitignore (not used)
- Backend selection now purely based on library availability
- Minimal changes - keeps only essential backend auto-detection

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title Add CUDA backend support for NVIDIA GPUs with build-time switching Add CUDA backend support for NVIDIA GPUs with automatic detection Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 05:22
Copilot finished work on behalf of mawad-amd October 6, 2025 05:22
- Remove separate _hip.py and _cuda.py files
- Add backend auto-detection at module load time (_is_amd_backend flag)
- All functions now branch internally based on _is_amd_backend
- Minimal changes - keeps original hip.py structure
- Dynamic IPC handle size (64 for HIP, 128 for CUDA)
- Zero configuration required - works out of the box

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 05:40
Copilot finished work on behalf of mawad-amd October 6, 2025 05:40
@mawad-amd mawad-amd marked this pull request as ready for review October 8, 2025 20:26
@mawad-amd mawad-amd requested review from BKP and neoblizz as code owners October 8, 2025 20:26
@Copilot Copilot AI review requested due to automatic review settings October 8, 2025 20:26
@mawad-amd mawad-amd merged commit 6ba9c79 into main Oct 8, 2025
12 checks passed
@mawad-amd mawad-amd deleted the copilot/fix-4d83afe6-045c-4573-a6ec-6f6dd80f6f0e branch October 8, 2025 21:17
@mawad-amd mawad-amd mentioned this pull request Oct 8, 2025
1 task
mawad-amd referenced this pull request in tom-pollak/iris Oct 9, 2025
tom-pollak added a commit to tom-pollak/iris that referenced this pull request Oct 9, 2025
