push sheeet

2025-10-09 14:15:47 +02:00
commit 646b892680
49168 changed files with 5897842 additions and 0 deletions
--- a/pkgs/development/cuda-modules/README.md
+++ b/pkgs/development/cuda-modules/README.md
@@ -0,0 +1,90 @@
+# CUDA Modules
+
+> [!NOTE]
+> This document is meant to help CUDA maintainers understand the structure of
+> the CUDA packages in Nixpkgs. It is not meant to be a user-facing document.
+> For a user-facing document, see [the CUDA section of the manual](../../../doc/languages-frameworks/cuda.section.md).
+
+The files in this directory are added, directly or indirectly, to the
+`cudaPackages` package set by [cuda-packages.nix](../../top-level/cuda-packages.nix).
+
+## Top-level directories
+
+- `cuda`: CUDA redistributables. Provides an extension to the `cudaPackages` scope.
+- `cudatoolkit`: monolithic CUDA Toolkit run-file installer. Provides an extension
+    to the `cudaPackages` scope.
+- `cudnn`: NVIDIA cuDNN library.
+- `cutensor`: NVIDIA cuTENSOR library.
+- `fixups`: Each file or directory (excluding `default.nix`) should contain a
+    `callPackage`-able expression to be provided to the `overrideAttrs` attribute
+    of a package produced by the generic manifest builder.
+    These fixups are applied by `pname`, so packages with multiple versions
+    (e.g., `cudnn`, `cudnn_8_9`, etc.) all share a single fixup function
+    (i.e., `fixups/cudnn.nix`).
+- `generic-builders`:
+  - Contains a builder `manifest.nix` which operates on the `Manifest` type
+      defined in `modules/generic/manifests`. Most packages are built using this
+      builder.
+  - Contains a builder `multiplex.nix` which leverages the Manifest builder. In
+      short, the Multiplex builder adds multiple versions of a single package to
+      a single instance of the CUDA Packages package set. It is used primarily for
+      packages like `cudnn` and `cutensor`.
+- `modules`: Nixpkgs modules to check the shape and content of CUDA
+    redistributable and feature manifests. These modules additionally use shims
+    provided by some CUDA packages to allow them to re-use the
+    `genericManifestBuilder`, even if they don't have manifest files of their
+    own. `cudnn` and `tensorrt` are examples of packages which provide such
+    shims. These modules are further described in the
+    [Modules](./modules/README.md) documentation.
+- `packages`: Contains packages which exist in every instance of the CUDA
+    package set. These packages are built in a `by-name` fashion.
+- `setup-hooks`: Nixpkgs setup hooks for CUDA.
+- `tensorrt`: NVIDIA TensorRT library.
+
+## Distinguished packages
+
+### CUDA Compatibility
+
+[CUDA Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/),
+available as `cudaPackages.cuda_compat`, is a component which makes it possible
+to run applications built against a newer CUDA toolkit (for example CUDA 12) on
+a machine with an older CUDA driver (for example CUDA 11), which isn't possible
+out of the box. At the time of writing, CUDA Compatibility is only available on
+the NVIDIA Jetson architecture, but NVIDIA might release support for more
+architectures in the future.
+
+As CUDA Compatibility strictly increases the range of supported applications, we
+try our best to enable it by default on supported platforms.
+
+#### Functioning
+
+`cuda_compat` simply provides a new `libcuda.so` (and associated variants) that
+needs to be used in place of the default CUDA driver's `libcuda.so`. However,
+the other shared libraries of the default driver must still be accessible:
+`cuda_compat` isn't a complete drop-in replacement for the driver (that's the
+point; otherwise, it would just be a newer driver).
+
+NVIDIA's recommendation is to set `LD_LIBRARY_PATH` to point to `cuda_compat`'s
+driver. This is fine for manual, one-off usage, but in general setting
+`LD_LIBRARY_PATH` is a red flag. It is global state which short-circuits most
+other dynamic library resolution mechanisms and can break things in
+non-obvious ways, especially with other Nix-built software.
+
+#### CUDA Compat with Nix
+
+Since `cuda_compat` is a known derivation, the easy way to do this in Nix would
+be to add `cuda_compat` as a dependency of CUDA libraries and applications and
+let Nix do its magic by filling the `DT_RUNPATH` fields. However,
+`cuda_compat` itself depends on `libnvrm_mem` and `libnvrm_gpu`, which are loaded
+dynamically at runtime from `/run/opengl-driver`. This is a problem for builds in
+the Nix sandbox, which can't find those libraries (a second, minor issue is that
+`addOpenGLRunpathHook` prepends the `/run/opengl-driver` path, so that path would
+still take precedence).
+
+The current solution is to do something similar to `addOpenGLRunpathHook`: the
+`addCudaCompatRunpathHook` prepends the path to `cuda_compat`'s `libcuda.so`
+to the `DT_RUNPATH` of whichever package includes the hook as a dependency, and
+we include the hook by default for packages in `cudaPackages` (by adding it to
+the inputs in `genericManifestBuilder`). We also make sure it's included after
+`addOpenGLRunpathHook`, so that it appears _earlier_ in the `DT_RUNPATH` and
+takes precedence.
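A toy model of the ordering argument in the README above, in plain Nix with no nixpkgs dependency. It assumes each runpath hook simply prepends one directory and that hooks run in list order; `prependTo` and the placeholder path for `cuda_compat` are hypothetical and do not reflect the real hook implementations.

```nix
let
  # Hypothetical model: a runpath hook prepends one directory to the runpath.
  prependTo = dir: runpath: [ dir ] ++ runpath;

  # Hooks run in the order they appear in this list.
  hooksInOrder = [
    (prependTo "/run/opengl-driver/lib") # stands in for addOpenGLRunpathHook
    (prependTo "<dir containing cuda_compat's libcuda.so>") # stands in for addCudaCompatRunpathHook
  ];

  # Apply each hook in turn, starting from an empty runpath.
  finalRunpath = builtins.foldl' (runpath: hook: hook runpath) [ ] hooksInOrder;
in
finalRunpath
```

Because the dynamic loader searches `DT_RUNPATH` entries in order, the entry added by the hook that runs last is the one found first, which is why `addCudaCompatRunpathHook` is placed after `addOpenGLRunpathHook`.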
--- a/pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix
+++ b/pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix
@@ -0,0 +1,273 @@
+{ lib }:
+{
+
+  /**
+    Attribute set of supported CUDA capabilities mapped to information about each capability.
+
+    NOTE: For more on baseline, architecture-specific, and family-specific feature sets, see
+    https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features.
+
+    NOTE: For information on when support for a given architecture was added, see
+    https://docs.nvidia.com/cuda/parallel-thread-execution/#release-notes
+
+    NOTE: For baseline feature sets, `dontDefaultAfterCudaMajorMinorVersion` is generally set to the CUDA release
+    immediately prior to TensorRT removing support for that architecture.
+
+    Many thanks to Arnon Shimoni for maintaining a list of these architectures and capabilities.
+    Without your work, this would have been much more difficult.
+    https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
+
+    # Type
+
+    ```
+    cudaCapabilityToInfo ::
+      AttrSet
+        CudaCapability
+        { archName :: String
+        , cudaCapability :: CudaCapability
+        , isJetson :: Bool
+        , isArchitectureSpecific :: Bool
+        , isFamilySpecific :: Bool
+        , minCudaMajorMinorVersion :: MajorMinorVersion
+        , maxCudaMajorMinorVersion :: Null | MajorMinorVersion
+        , dontDefaultAfterCudaMajorMinorVersion :: Null | MajorMinorVersion
+        }
+    ```
+
+    `archName`
+
+    : The name of the microarchitecture
+
+    `cudaCapability`
+
+    : The CUDA capability
+
+    `isJetson`
+
+    : Whether this capability is part of NVIDIA's line of Jetson embedded computers. This field is notable
+      because it tells us what architecture to build for (as Jetson devices are aarch64).
+      More on Jetson devices here: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/
+      NOTE: These architectures are only built upon request.
+
+    `isArchitectureSpecific`
+
+    : Whether this capability is an architecture-specific feature set.
+      NOTE: These architectures are only built upon request.
+
+    `isFamilySpecific`
+
+    : Whether this capability is a family-specific feature set.
+      NOTE: These architectures are only built upon request.
+
+    `minCudaMajorMinorVersion`
+
+    : The minimum (inclusive) CUDA version that supports this capability.
+
+    `maxCudaMajorMinorVersion`
+
+    : The maximum (exclusive) CUDA version that supports this capability.
+      `null` means there is no maximum.
+
+    `dontDefaultAfterCudaMajorMinorVersion`
+
+    : The CUDA version after which to exclude this capability from the list of default capabilities we build.
+  */
+  cudaCapabilityToInfo =
+    lib.mapAttrs
+      (
+        cudaCapability:
+        # Supplies default values.
+        {
+          archName,
+          isJetson ? false,
+          isArchitectureSpecific ? (lib.hasSuffix "a" cudaCapability),
+          isFamilySpecific ? (lib.hasSuffix "f" cudaCapability),
+          minCudaMajorMinorVersion,
+          maxCudaMajorMinorVersion ? null,
+          dontDefaultAfterCudaMajorMinorVersion ? null,
+        }:
+        {
+          inherit
+            archName
+            cudaCapability
+            isJetson
+            isArchitectureSpecific
+            isFamilySpecific
+            minCudaMajorMinorVersion
+            maxCudaMajorMinorVersion
+            dontDefaultAfterCudaMajorMinorVersion
+            ;
+        }
+      )
+      {
+        # Tesla/Quadro M series
+        "5.0" = {
+          archName = "Maxwell";
+          minCudaMajorMinorVersion = "10.0";
+          dontDefaultAfterCudaMajorMinorVersion = "11.0";
+        };
+
+        # Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X
+        "5.2" = {
+          archName = "Maxwell";
+          minCudaMajorMinorVersion = "10.0";
+          dontDefaultAfterCudaMajorMinorVersion = "11.0";
+        };
+
+        # Quadro GP100, Tesla P100, DGX-1 (Generic Pascal)
+        "6.0" = {
+          archName = "Pascal";
+          minCudaMajorMinorVersion = "10.0";
+          # Removed from TensorRT 10.0, which corresponds to CUDA 12.4 release.
+          # https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/support-matrix/index.html
+          dontDefaultAfterCudaMajorMinorVersion = "12.3";
+        };
+
+        # GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla
+        # P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
+        "6.1" = {
+          archName = "Pascal";
+          minCudaMajorMinorVersion = "10.0";
+          # Removed from TensorRT 10.0, which corresponds to CUDA 12.4 release.
+          # https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/support-matrix/index.html
+          dontDefaultAfterCudaMajorMinorVersion = "12.3";
+        };
+
+        # DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100
+        "7.0" = {
+          archName = "Volta";
+          minCudaMajorMinorVersion = "10.0";
+          # Removed from TensorRT 10.5, which corresponds to CUDA 12.6 release.
+          # https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1050/support-matrix/index.html
+          dontDefaultAfterCudaMajorMinorVersion = "12.5";
+        };
+
+        # GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000,
+        # Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4
+        "7.5" = {
+          archName = "Turing";
+          minCudaMajorMinorVersion = "10.0";
+        };
+
+        # NVIDIA A100 (the name “Tesla” has been dropped – GA100), NVIDIA DGX-A100
+        "8.0" = {
+          archName = "Ampere";
+          minCudaMajorMinorVersion = "11.2";
+        };
+
+        # Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000,
+        # A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, RTX A10, RTX
+        # A16, RTX A40, A2 Tensor Core GPU
+        "8.6" = {
+          archName = "Ampere";
+          minCudaMajorMinorVersion = "11.2";
+        };
+
+        # Jetson AGX Orin and Drive AGX Orin only
+        "8.7" = {
+          archName = "Ampere";
+          minCudaMajorMinorVersion = "11.5";
+          isJetson = true;
+        };
+
+        # NVIDIA GeForce RTX 4090, RTX 4080, RTX 6000, Tesla L40
+        "8.9" = {
+          archName = "Ada";
+          minCudaMajorMinorVersion = "11.8";
+        };
+
+        # NVIDIA H100 (GH100)
+        "9.0" = {
+          archName = "Hopper";
+          minCudaMajorMinorVersion = "11.8";
+        };
+
+        "9.0a" = {
+          archName = "Hopper";
+          minCudaMajorMinorVersion = "12.0";
+        };
+
+        # NVIDIA B100
+        "10.0" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.7";
+        };
+
+        "10.0a" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.7";
+        };
+
+        "10.0f" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        # NVIDIA Jetson Thor Blackwell
+        "10.1" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.7";
+          isJetson = true;
+        };
+
+        "10.1a" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.7";
+          isJetson = true;
+        };
+
+        "10.1f" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+          isJetson = true;
+        };
+
+        # NVIDIA ???
+        "10.3" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        "10.3a" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        "10.3f" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        # NVIDIA GeForce RTX 5090 (GB202) etc.
+        "12.0" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.8";
+        };
+
+        "12.0a" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.8";
+        };
+
+        "12.0f" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        # NVIDIA ???
+        "12.1" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        "12.1a" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+
+        "12.1f" = {
+          archName = "Blackwell";
+          minCudaMajorMinorVersion = "12.9";
+        };
+      };
+}
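A sketch of how the fields documented in cuda.nix could be combined into a per-release list of default capabilities. It reuses the default-filling shape of `cudaCapabilityToInfo` (with a builtins-only `hasSuffix` standing in for `lib.hasSuffix`) on a four-entry excerpt. The `isDefault` policy is an assumption assembled from the doc comments (minimum inclusive, maximum exclusive, Jetson, architecture-specific and family-specific capabilities only on request, exclusion after `dontDefaultAfterCudaMajorMinorVersion`); it is not the actual selection code, which is not part of this diff.

```nix
let
  # builtins-only stand-in for lib.hasSuffix.
  hasSuffix =
    suffix: s:
    let
      sLen = builtins.stringLength s;
      suffixLen = builtins.stringLength suffix;
    in
    sLen >= suffixLen && builtins.substring (sLen - suffixLen) suffixLen s == suffix;

  # Same default-filling shape as `cudaCapabilityToInfo` (archName is accepted but dropped).
  withDefaults =
    cudaCapability:
    {
      isJetson ? false,
      isArchitectureSpecific ? hasSuffix "a" cudaCapability,
      isFamilySpecific ? hasSuffix "f" cudaCapability,
      minCudaMajorMinorVersion,
      maxCudaMajorMinorVersion ? null,
      dontDefaultAfterCudaMajorMinorVersion ? null,
      ...
    }:
    {
      inherit
        isJetson
        isArchitectureSpecific
        isFamilySpecific
        minCudaMajorMinorVersion
        maxCudaMajorMinorVersion
        dontDefaultAfterCudaMajorMinorVersion
        ;
    };

  # Excerpt of the entries in cuda.nix.
  infos = builtins.mapAttrs withDefaults {
    "6.1" = { archName = "Pascal"; minCudaMajorMinorVersion = "10.0"; dontDefaultAfterCudaMajorMinorVersion = "12.3"; };
    "8.7" = { archName = "Ampere"; minCudaMajorMinorVersion = "11.5"; isJetson = true; };
    "8.9" = { archName = "Ada"; minCudaMajorMinorVersion = "11.8"; };
    "9.0a" = { archName = "Hopper"; minCudaMajorMinorVersion = "12.0"; };
  };

  versionOlder = a: b: builtins.compareVersions a b < 0;

  # Minimum is inclusive, maximum is exclusive (per the doc comment).
  isSupported =
    cudaVersion: info:
    !(versionOlder cudaVersion info.minCudaMajorMinorVersion)
    && (info.maxCudaMajorMinorVersion == null || versionOlder cudaVersion info.maxCudaMajorMinorVersion);

  # Hypothetical default policy assembled from the doc comments.
  isDefault =
    cudaVersion: info:
    isSupported cudaVersion info
    && !info.isJetson
    && !info.isArchitectureSpecific
    && !info.isFamilySpecific
    && (
      info.dontDefaultAfterCudaMajorMinorVersion == null
      || !(versionOlder info.dontDefaultAfterCudaMajorMinorVersion cudaVersion)
    );

  defaultCapabilitiesFor =
    cudaVersion: builtins.filter (cap: isDefault cudaVersion infos.${cap}) (builtins.attrNames infos);
in
{
  cuda_12_2 = defaultCapabilitiesFor "12.2";
  cuda_12_6 = defaultCapabilitiesFor "12.6";
  isArchSpecific_9_0a = infos."9.0a".isArchitectureSpecific;
}
```

Under these assumptions, `nix-instantiate --eval --strict` gives `[ "6.1" "8.9" ]` for CUDA 12.2 and `[ "8.9" ]` for CUDA 12.6: Pascal drops out after 12.3, while the Jetson and `a`-suffixed entries are never defaults.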
--- a/pkgs/development/cuda-modules/_cuda/db/bootstrap/default.nix
+++ b/pkgs/development/cuda-modules/_cuda/db/bootstrap/default.nix
@@ -0,0 +1,30 @@
+{ lib }:
+{
+  # See ./cuda.nix for documentation.
+  inherit (import ./cuda.nix { inherit lib; })
+    cudaCapabilityToInfo
+    ;
+
+  # See ./nvcc.nix for documentation.
+  inherit (import ./nvcc.nix)
+    nvccCompatibilities
+    ;
+
+  # See ./redist.nix for documentation.
+  inherit (import ./redist.nix)
+    redistNames
+    redistSystems
+    redistUrlPrefix
+    ;
+
+  /**
+    The path to the CUDA packages root directory, for use with `callPackage` to create new package sets.
+
+    # Type
+
+    ```
+    cudaPackagesPath :: Path
+    ```
+  */
+  cudaPackagesPath = ./../../..;
+}
--- a/pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix
+++ b/pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix
@@ -0,0 +1,70 @@
+{
+  /**
+      Mapping of CUDA versions to NVCC compatibilities
+
+      Taken from
+      https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#host-compiler-support-policy
+
+        NVCC performs a version check on the host compiler's major version and so newer minor versions
+        of the compilers listed below will be supported, but major versions falling outside the range
+        will not be supported.
+
+      NOTE: These constraints don't apply to Jetson, which uses something else.
+
+      NOTE: NVIDIA can and will add support for newer compilers even during patch releases.
+      E.g.: CUDA 12.2.1 supported up to Clang 15.0; 12.2.2 added support for Clang 16.0.
+
+      NOTE: Because all platforms NVIDIA supports use GCC and Clang, we omit the architectures here.
+
+      # Type
+
+      ```
+      nvccCompatibilities ::
+        AttrSet
+          String
+          { clang :: { maxMajorVersion :: String, minMajorVersion :: String }
+          , gcc :: { maxMajorVersion :: String, minMajorVersion :: String }
+          }
+      ```
+  */
+  nvccCompatibilities = {
+    # Our baseline
+    # https://docs.nvidia.com/cuda/archive/12.6.0/cuda-installation-guide-linux/index.html#host-compiler-support-policy
+    "12.6" = {
+      clang = {
+        maxMajorVersion = "18";
+        minMajorVersion = "7";
+      };
+      gcc = {
+        maxMajorVersion = "13";
+        minMajorVersion = "6";
+      };
+    };
+
+    # Maximum Clang version is 19, maximum GCC version is 14
+    # https://docs.nvidia.com/cuda/archive/12.8.1/cuda-installation-guide-linux/index.html#host-compiler-support-policy
+    "12.8" = {
+      clang = {
+        maxMajorVersion = "19";
+        minMajorVersion = "7";
+      };
+      gcc = {
+        maxMajorVersion = "14";
+        minMajorVersion = "6";
+      };
+    };
+
+    # No changes from 12.8 to 12.9
+    # https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#host-compiler-support-policy
+    "12.9" = {
+      clang = {
+        maxMajorVersion = "19";
+        minMajorVersion = "7";
+      };
+      gcc = {
+        maxMajorVersion = "14";
+        minMajorVersion = "6";
+      };
+    };
+  };
+}
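A sketch of checking a host compiler against `nvccCompatibilities`. The helper `isHostCompilerSupported` is hypothetical; it treats both bounds as inclusive, matching NVIDIA's wording that major versions falling outside the range are unsupported.

```nix
let
  # Excerpt of `nvccCompatibilities` from nvcc.nix.
  nvccCompatibilities = {
    "12.6".gcc = {
      maxMajorVersion = "13";
      minMajorVersion = "6";
    };
    "12.8".gcc = {
      maxMajorVersion = "14";
      minMajorVersion = "6";
    };
    "12.8".clang = {
      maxMajorVersion = "19";
      minMajorVersion = "7";
    };
  };

  # Hypothetical helper: is `majorVersion` within [min, max] for this CUDA release and compiler?
  isHostCompilerSupported =
    cudaMajorMinorVersion: compiler: majorVersion:
    let
      range = nvccCompatibilities.${cudaMajorMinorVersion}.${compiler};
    in
    builtins.compareVersions majorVersion range.minMajorVersion >= 0
    && builtins.compareVersions majorVersion range.maxMajorVersion <= 0;
in
{
  gcc14WithCuda126 = isHostCompilerSupported "12.6" "gcc" "14";
  gcc14WithCuda128 = isHostCompilerSupported "12.8" "gcc" "14";
  clang9WithCuda128 = isHostCompilerSupported "12.8" "clang" "9";
}
```

The bounds are strings, so `builtins.compareVersions` is used instead of `<=`: in Nix, `"9" <= "19"` is `false` (lexicographic), while `builtins.compareVersions "9" "19"` is `-1` (numeric).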
--- a/pkgs/development/cuda-modules/_cuda/db/bootstrap/redist.nix
+++ b/pkgs/development/cuda-modules/_cuda/db/bootstrap/redist.nix
@@ -0,0 +1,56 @@
+{
+  /**
+    A list of redistributable names to use in creation of the `redistName` option type.
+
+    # Type
+
+    ```
+    redistNames :: [String]
+    ```
+  */
+  redistNames = [
+    "cublasmp"
+    "cuda"
+    "cudnn"
+    "cudss"
+    "cuquantum"
+    "cusolvermp"
+    "cusparselt"
+    "cutensor"
+    "nppplus"
+    "nvcomp"
+    # "nvidia-driver",  # NOTE: Some of the earlier manifests don't follow our scheme.
+    "nvjpeg2000"
+    "nvpl"
+    "nvtiff"
+    "tensorrt" # NOTE: not truly a redist; uses different naming convention
+  ];
+
+  /**
+    A list of redistributable systems to use in creation of the `redistSystem` option type.
+
+    # Type
+
+    ```
+    redistSystems :: [String]
+    ```
+  */
+  redistSystems = [
+    "linux-aarch64"
+    "linux-all" # Taken to mean all other linux systems
+    "linux-sbsa"
+    "linux-x86_64"
+    "source" # Source-agnostic platform
+  ];
+
+  /**
+    The prefix of the URL for redistributable files.
+
+    # Type
+
+    ```
+    redistUrlPrefix :: String
+    ```
+  */
+  redistUrlPrefix = "https://developer.download.nvidia.com/compute";
+}
--- a/pkgs/development/cuda-modules/_cuda/db/default.nix
+++ b/pkgs/development/cuda-modules/_cuda/db/default.nix
@@ -0,0 +1,65 @@
+{
+  lib,
+  bootstrapData,
+  db,
+}:
+
+bootstrapData
+// {
+  /**
+    All CUDA capabilities, sorted by version.
+
+    NOTE: Since the capabilities are sorted by version and architecture/family-specific features are
+    appended to the minor version component, the sorted list groups capabilities by baseline feature
+    set.
+
+    # Type
+
+    ```
+    allSortedCudaCapabilities :: [CudaCapability]
+    ```
+
+    # Example
+
+    ```
+    allSortedCudaCapabilities = [
+      "5.0"
+      "5.2"
+      "6.0"
+      "6.1"
+      "7.0"
+      "7.5"
+      "8.0"
+      "8.6"
+      "8.7"
+      "8.9"
+      "9.0"
+      "9.0a"
+      "10.0"
+      "10.0a"
+      "10.0f"
+      "10.1"
+      "10.1a"
+      "10.1f"
+      "10.3"
+      "10.3a"
+      "10.3f"
+      "12.0" "12.0a" "12.0f" "12.1" "12.1a" "12.1f"
+    ];
+    ```
+  */
+  allSortedCudaCapabilities = lib.sort lib.versionOlder (lib.attrNames db.cudaCapabilityToInfo);
+
+  /**
+    Mapping of CUDA micro-architecture name to capabilities belonging to that micro-architecture.
+
+    # Type
+
+    ```
+    cudaArchNameToCapabilities :: AttrSet NonEmptyStr (NonEmptyListOf CudaCapability)
+    ```
+  */
+  cudaArchNameToCapabilities = lib.groupBy (
+    cudaCapability: db.cudaCapabilityToInfo.${cudaCapability}.archName
+  ) db.allSortedCudaCapabilities;
+}
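A builtins-only illustration of why `allSortedCudaCapabilities` needs `lib.versionOlder` rather than the order `builtins.attrNames` returns, and of how `cudaArchNameToCapabilities` groups the result. `builtins.sort` with `compareVersions` stands in for `lib.sort lib.versionOlder`, and `builtins.groupBy` for `lib.groupBy`; the five-entry `archNameOf` table is an excerpt of cuda.nix.

```nix
let
  # Excerpt of `cudaCapabilityToInfo`, reduced to archName.
  archNameOf = {
    "8.6" = "Ampere";
    "8.9" = "Ada";
    "9.0" = "Hopper";
    "9.0a" = "Hopper";
    "10.0" = "Blackwell";
  };

  # builtins.attrNames sorts lexicographically, which puts "10.0" before "8.6".
  lexical = builtins.attrNames archNameOf;

  # Equivalent of `lib.sort lib.versionOlder`.
  sorted = builtins.sort (a: b: builtins.compareVersions a b < 0) lexical;
in
{
  inherit lexical sorted;
  # Groups keep the sorted order of their members.
  byArch = builtins.groupBy (cap: archNameOf.${cap}) sorted;
}
```

`compareVersions` also splits `9.0a` into the components `9`, `0` and `a`, so each architecture-specific capability sorts directly after its baseline, which is what gives the sorted list its grouping by baseline feature set.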
--- a/pkgs/development/cuda-modules/_cuda/default.nix
+++ b/pkgs/development/cuda-modules/_cuda/default.nix
@@ -0,0 +1,31 @@
+# The _cuda attribute set is a fixed-point which contains the static functionality required to construct CUDA package
+# sets. For example, `_cuda.bootstrapData` includes information about NVIDIA's redistributables (such as the names
+# NVIDIA uses for different systems), `_cuda.lib` contains utility functions like `formatCapabilities` (which generates
+# common arguments passed to NVCC and `cmakeFlags`), and `_cuda.fixups` contains `callPackage`-able functions which
+# are provided to the corresponding package's `overrideAttrs` attribute to provide package-specific fixups
+# outside the scope of the generic redistributable builder.
+#
+# Since this attribute set is used to construct the CUDA package sets, it must exist outside the fixed point of the
+# package sets. Making these attributes available directly in the package set construction could cause confusion if
+# users override the attribute set with the expectation that changes will be reflected in the enclosing CUDA package
+# set. To avoid this, we declare `_cuda` and inherit its members here, at top-level. (This also allows us to benefit
+# from import caching, as it should be evaluated once per system, rather than once per system and CUDA package set.)
+
+let
+  lib = import ../../../../lib;
+in
+lib.fixedPoints.makeExtensible (final: {
+  bootstrapData = import ./db/bootstrap {
+    inherit lib;
+  };
+  db = import ./db {
+    inherit (final) bootstrapData db;
+    inherit lib;
+  };
+  extensions = [ ]; # Extensions applied to every CUDA package set.
+  fixups = import ./fixups { inherit lib; };
+  lib = import ./lib {
+    _cuda = final;
+    inherit lib;
+  };
+})
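A toy illustration of the fixed point in _cuda/default.nix: because `db` is built from `final.bootstrapData`, an extension that changes `bootstrapData` is reflected in `db`. It assumes `<nixpkgs>` is on `NIX_PATH`; `toyCuda` and its attributes are made up for the example and only mirror the shape of `_cuda`.

```nix
let
  # Assumes <nixpkgs> is on NIX_PATH; only `lib` is needed.
  lib = import <nixpkgs/lib>;

  # Toy stand-in for `_cuda`: `db` derives its data from `final.bootstrapData`.
  toyCuda = lib.fixedPoints.makeExtensible (final: {
    bootstrapData.capabilities = [ "8.9" "9.0" ];
    db.capabilityCount = builtins.length final.bootstrapData.capabilities;
  });

  # An overlay-style extension that adds a capability to the bootstrap data only.
  extended = toyCuda.extend (
    _final: prev: {
      bootstrapData = prev.bootstrapData // {
        capabilities = prev.bootstrapData.capabilities ++ [ "10.0" ];
      };
    }
  );
in
{
  before = toyCuda.db.capabilityCount;
  after = extended.db.capabilityCount;
}
```

The extension never touches `db`, yet `after` reflects the added capability, because `db` reads from the final (extended) `bootstrapData`.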
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_compat.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_compat.nix
@@ -0,0 +1,12 @@
+{ flags, lib }:
+prevAttrs: {
+  autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
+    "libnvrm_gpu.so"
+    "libnvrm_mem.so"
+    "libnvdla_runtime.so"
+  ];
+  # `cuda_compat` only works on aarch64-linux, and only when building for Jetson devices.
+  badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
+    "Trying to use cuda_compat on aarch64-linux targeting non-Jetson devices" = !flags.isJetsonBuild;
+  };
+}
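A minimal model of what the cuda_compat fixup contributes once `callPackage` has supplied its arguments. `flags.isJetsonBuild` is hard-coded, the incoming `prevAttrs` is made up, and the final `//` only approximates how `overrideAttrs` merges attributes. It also shows that `prevAttrs.x or [ ] ++ [ ... ]` parses as `(prevAttrs.x or [ ]) ++ [ ... ]`, so a missing attribute falls back to the empty value before concatenation.

```nix
let
  # Hard-coded stand-in for the `flags` argument supplied by callPackage.
  flags = {
    isJetsonBuild = false;
  };

  # Body of the fixup in cuda_compat.nix.
  fixup = prevAttrs: {
    autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
      "libnvrm_gpu.so"
      "libnvrm_mem.so"
      "libnvdla_runtime.so"
    ];
    badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
      "Trying to use cuda_compat on aarch64-linux targeting non-Jetson devices" = !flags.isJetsonBuild;
    };
  };

  # Hypothetical attributes from the generic builder; neither field is set yet.
  prevAttrs = {
    pname = "cuda_compat";
  };

  # overrideAttrs merges the returned attributes over the previous ones, roughly like `//`.
  finalAttrs = prevAttrs // fixup prevAttrs;
in
finalAttrs
```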
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_cudart.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_cudart.nix
@@ -0,0 +1,37 @@
+# TODO(@connorbaker): cuda_cudart.dev depends on crt/host_config.h, which is from
+# (getDev cuda_nvcc). It would be nice to be able to encode that.
+{ addDriverRunpath, lib }:
+prevAttrs: {
+  # Remove once cuda-find-redist-features has a special case for libcuda
+  outputs =
+    prevAttrs.outputs or [ ]
+    ++ lib.lists.optionals (!(builtins.elem "stubs" (prevAttrs.outputs or [ ]))) [ "stubs" ];
+
+  allowFHSReferences = false;
+
+  # The libcuda stub's pkg-config doesn't follow the general pattern:
+  postPatch =
+    prevAttrs.postPatch or ""
+    + ''
+      while IFS= read -r -d $'\0' path; do
+        sed -i \
+          -e "s|^libdir\s*=.*/lib\$|libdir=''${!outputLib}/lib/stubs|" \
+          -e "s|^Libs\s*:\(.*\)\$|Libs: \1 -Wl,-rpath,${addDriverRunpath.driverLink}/lib|" \
+          "$path"
+      done < <(find -iname 'cuda-*.pc' -print0)
+    ''
+    # Namelink may not be enough, add a soname.
+    # Cf. https://gitlab.kitware.com/cmake/cmake/-/issues/25536
+    + ''
+      if [[ -f lib/stubs/libcuda.so && ! -f lib/stubs/libcuda.so.1 ]]; then
+        ln -s libcuda.so lib/stubs/libcuda.so.1
+      fi
+    '';
+
+  postFixup = prevAttrs.postFixup or "" + ''
+    mv "''${!outputDev}/share" "''${!outputDev}/lib"
+    moveToOutput lib/stubs "$stubs"
+    ln -s "$stubs"/lib/stubs/* "$stubs"/lib/
+    ln -s "$stubs"/lib/stubs "''${!outputLib}/lib/stubs"
+  '';
+}
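The `postPatch` and `postFixup` strings mix Nix interpolation and bash expansion. A small illustration of the distinction: in an indented string, `''${` yields a literal `${` for bash to expand at build time, while `${...}` is interpolated by Nix at evaluation time. `driverLink` is hard-coded here as a stand-in for `addDriverRunpath.driverLink`.

```nix
let
  # Stand-in for addDriverRunpath.driverLink.
  driverLink = "/run/opengl-driver";

  snippet = ''
    libdir=''${!outputLib}/lib/stubs
    rpath=${driverLink}/lib
  '';
in
# The first line keeps a literal ${!outputLib} for bash; the second has the Nix value spliced in.
snippet
```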
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_demo_suite.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_demo_suite.nix
@@ -0,0 +1,18 @@
+{
+  libglut,
+  libcufft,
+  libcurand,
+  libGLU,
+  libglvnd,
+  libgbm,
+}:
+prevAttrs: {
+  buildInputs = prevAttrs.buildInputs or [ ] ++ [
+    libglut
+    libcufft
+    libcurand
+    libGLU
+    libglvnd
+    libgbm
+  ];
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_gdb.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_gdb.nix
@@ -0,0 +1,35 @@
+{
+  cudaAtLeast,
+  gmp,
+  expat,
+  libxcrypt-legacy,
+  ncurses6,
+  python310,
+  python311,
+  python312,
+  stdenv,
+  lib,
+}:
+prevAttrs: {
+  buildInputs =
+    prevAttrs.buildInputs or [ ]
+    ++ [
+      gmp
+      libxcrypt-legacy
+      ncurses6
+      python310
+      python311
+      python312
+    ]
+    # aarch64 (including sbsa) needs expat
+    ++ lib.lists.optionals (stdenv.hostPlatform.isAarch64) [ expat ];
+
+  installPhase =
+    prevAttrs.installPhase or ""
+    # Python 3.8 and 3.9 are not among the buildInputs above; delete cuda-gdb's support for them
+    # to avoid autoPatchelf failing to find libpython3.8.so and libpython3.9.so.
+    + ''
+      find $bin -name '*python3.8*' -delete
+      find $bin -name '*python3.9*' -delete
+    '';
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_nvcc.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_nvcc.nix
@@ -0,0 +1,62 @@
+{
+  lib,
+  backendStdenv,
+  setupCudaHook,
+}:
+prevAttrs: {
+  # Merge "bin" and "dev" into "out" to avoid circular references
+  outputs = builtins.filter (
+    x:
+    !(builtins.elem x [
+      "dev"
+      "bin"
+    ])
+  ) prevAttrs.outputs or [ ];
+
+  # Patch the nvcc.profile.
+  # Syntax:
+  # - `=` for assignment,
+  # - `?=` for conditional assignment,
+  # - `+=` to "prepend",
+  # - `=+` to "append".
+
+  # Cf. https://web.archive.org/web/20230308044351/https://arcb.csc.ncsu.edu/~mueller/cluster/nvidia/2.0/nvcc_2.0.pdf
+
+  # We set all variables with the lowest priority (=+), but we do force
+  # nvcc to use the fixed backend toolchain. Cf. comments in
+  # backend-stdenv.nix
+
+  postPatch =
+    prevAttrs.postPatch or ""
+    + ''
+      substituteInPlace bin/nvcc.profile \
+        --replace-fail \
+          '$(TOP)/$(_TARGET_DIR_)/include' \
+          "''${!outputDev}/include"
+    ''
+    + ''
+      cat << EOF >> bin/nvcc.profile
+
+      # Fix a compatible backend compiler
+      PATH += "${backendStdenv.cc}/bin":
+
+      # Expose the split-out nvvm
+      LIBRARIES =+ "-L''${!outputBin}/nvvm/lib"
+      INCLUDES =+ "-I''${!outputBin}/nvvm/include"
+      EOF
+    '';
+
+  # Entries here will be in nativeBuildInputs when cuda_nvcc is in nativeBuildInputs.
+  propagatedBuildInputs = prevAttrs.propagatedBuildInputs or [ ] ++ [ setupCudaHook ];
+
+  postInstall = prevAttrs.postInstall or "" + ''
+    moveToOutput "nvvm" "''${!outputBin}"
+  '';
+
+  # The nvcc and cicc binaries contain hard-coded references to /usr
+  allowFHSReferences = true;
+
+  meta = prevAttrs.meta or { } // {
+    mainProgram = "nvcc";
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_nvprof.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_nvprof.nix
@@ -0,0 +1 @@
+{ cuda_cupti }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ cuda_cupti ]; }
--- a/pkgs/development/cuda-modules/_cuda/fixups/cuda_sanitizer_api.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cuda_sanitizer_api.nix
@@ -0,0 +1 @@
+_: _: { outputs = [ "out" ]; }
--- a/pkgs/development/cuda-modules/_cuda/fixups/cudnn.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/cudnn.nix
@@ -0,0 +1,75 @@
+{
+  cudaOlder,
+  cudaMajorMinorVersion,
+  fetchurl,
+  lib,
+  libcublas,
+  patchelf,
+  zlib,
+}:
+let
+  inherit (lib)
+    attrsets
+    maintainers
+    meta
+    strings
+    ;
+in
+finalAttrs: prevAttrs: {
+  src = fetchurl { inherit (finalAttrs.passthru.redistribRelease) hash url; };
+
+  # Useful for inspecting why something went wrong.
+  badPlatformsConditions =
+    let
+      cudaTooOld = cudaOlder finalAttrs.passthru.featureRelease.minCudaVersion;
+      cudaTooNew =
+        (finalAttrs.passthru.featureRelease.maxCudaVersion != null)
+        && strings.versionOlder finalAttrs.passthru.featureRelease.maxCudaVersion cudaMajorMinorVersion;
+    in
+    prevAttrs.badPlatformsConditions or { }
+    // {
+      "CUDA version is too old" = cudaTooOld;
+      "CUDA version is too new" = cudaTooNew;
+    };
+
+  buildInputs = prevAttrs.buildInputs or [ ] ++ [
+    zlib
+    (attrsets.getLib libcublas)
+  ];
+
+  # Tell autoPatchelf about runtime dependencies. *_infer* libraries only
+  # exist in cuDNN 8.
+  # NOTE: Versions from cuDNN releases have four components.
+  postFixup =
+    prevAttrs.postFixup or ""
+    +
+      strings.optionalString
+        (
+          strings.versionAtLeast finalAttrs.version "8.0.5.0"
+          && strings.versionOlder finalAttrs.version "9.0.0.0"
+        )
+        ''
+          ${meta.getExe patchelf} $lib/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so
+          ${meta.getExe patchelf} $lib/lib/libcudnn_ops_infer.so --add-needed libcublas.so --add-needed libcublasLt.so
+        '';
+
+  meta = prevAttrs.meta or { } // {
+    homepage = "https://developer.nvidia.com/cudnn";
+    maintainers =
+      prevAttrs.meta.maintainers or [ ]
+      ++ (with maintainers; [
+        mdaiter
+        samuela
+        connorbaker
+      ]);
+    # TODO(@connorbaker): Temporary workaround to avoid changing the derivation hash since introducing more
+    # brokenConditions would change the derivation as they're top-level and __structuredAttrs is set.
+    teams = prevAttrs.meta.teams or [ ];
+    license = {
+      shortName = "cuDNN EULA";
+      fullName = "NVIDIA cuDNN Software License Agreement (EULA)";
+      url = "https://docs.nvidia.com/deeplearning/sdk/cudnn-sla/index.html#supplement";
+      free = false;
+    };
+  };
+}
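A sketch of the two version checks in the cuDNN fixup, in builtins-only Nix. It assumes `cudaOlder v` means "the package set's CUDA version is older than `v`"; the `featureRelease` bounds and the cuDNN version strings are illustrative values, not taken from a manifest.

```nix
let
  versionOlder = a: b: builtins.compareVersions a b < 0;
  versionAtLeast = a: b: !(versionOlder a b);

  # Assumed semantics of `cudaOlder`: is the package set's CUDA version older than v?
  cudaMajorMinorVersion = "12.8";
  cudaOlder = v: versionOlder cudaMajorMinorVersion v;

  # Illustrative feature-release bounds for one cuDNN release.
  featureRelease = {
    minCudaVersion = "12.0";
    maxCudaVersion = "12.6";
  };

  # Same expressions as in badPlatformsConditions.
  cudaTooOld = cudaOlder featureRelease.minCudaVersion;
  cudaTooNew =
    (featureRelease.maxCudaVersion != null)
    && versionOlder featureRelease.maxCudaVersion cudaMajorMinorVersion;

  # Same range check as the postFixup; cuDNN versions have four components.
  needsInferPatch = version: versionAtLeast version "8.0.5.0" && versionOlder version "9.0.0.0";
in
{
  inherit cudaTooOld cudaTooNew;
  cudnn8 = needsInferPatch "8.9.7.29";
  cudnn9 = needsInferPatch "9.3.0.75";
}
```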
--- a/pkgs/development/cuda-modules/_cuda/fixups/default.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/default.nix
@@ -0,0 +1,11 @@
+{ lib }:
+lib.concatMapAttrs (
+  fileName: _type:
+  let
+    # Fixup is in `./${attrName}.nix` or in `./${fileName}/default.nix`:
+    attrName = lib.removeSuffix ".nix" fileName;
+    fixup = import (./. + "/${fileName}");
+    isFixup = fileName != "default.nix";
+  in
+  lib.optionalAttrs isFixup { ${attrName} = fixup; }
+) (builtins.readDir ./.)
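A sketch of the name mapping performed by fixups/default.nix, run against a made-up `readDir` result instead of the real directory. It assumes `<nixpkgs>` is on `NIX_PATH`, records paths rather than importing them, and uses a hypothetical `some_package` directory to show the `./${fileName}/default.nix` case.

```nix
let
  # Assumes <nixpkgs> is on NIX_PATH; only `lib` is needed.
  lib = import <nixpkgs/lib>;

  # Made-up result of `builtins.readDir ./.` for the fixups directory.
  entries = {
    "cuda_nvcc.nix" = "regular";
    "cudnn.nix" = "regular";
    "default.nix" = "regular";
    "some_package" = "directory"; # hypothetical fixup stored as some_package/default.nix
  };

  # Same mapping as fixups/default.nix, but recording the path instead of importing it.
  fixupPaths = lib.concatMapAttrs (
    fileName: _type:
    lib.optionalAttrs (fileName != "default.nix") {
      ${lib.removeSuffix ".nix" fileName} = "./${fileName}";
    }
  ) entries;
in
fixupPaths
```

Because the attribute name is derived by stripping `.nix`, both `cudnn.nix` and a `cudnn/` directory would map to the same `cudnn` key, which is how every version of a package shares one fixup by `pname`.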
--- a/pkgs/development/cuda-modules/_cuda/fixups/driver_assistant.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/driver_assistant.nix
@@ -0,0 +1,5 @@
+_: prevAttrs: {
+  badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
+    "Package is not supported; use drivers from linuxPackages" = true;
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/fabricmanager.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/fabricmanager.nix
@@ -0,0 +1 @@
+{ zlib }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ zlib ]; }
--- a/pkgs/development/cuda-modules/_cuda/fixups/imex.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/imex.nix
@@ -0,0 +1 @@
+{ zlib }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ zlib ]; }
--- a/pkgs/development/cuda-modules/_cuda/fixups/libcufile.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/libcufile.nix
@@ -0,0 +1,12 @@
+{
+  libcublas,
+  numactl,
+  rdma-core,
+}:
+prevAttrs: {
+  buildInputs = prevAttrs.buildInputs or [ ] ++ [
+    libcublas
+    numactl
+    rdma-core
+  ];
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/libcusolver.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/libcusolver.nix
@@ -0,0 +1,19 @@
+{
+  cudaAtLeast,
+  lib,
+  libcublas,
+  libcusparse ? null,
+  libnvjitlink ? null,
+}:
+prevAttrs: {
+  buildInputs = prevAttrs.buildInputs or [ ] ++ [
+    libcublas
+    libnvjitlink
+    libcusparse
+  ];
+
+  brokenConditions = prevAttrs.brokenConditions or { } // {
+    "libnvjitlink missing (CUDA >= 12.0)" = libnvjitlink == null;
+    "libcusparse missing (CUDA >= 12.1)" = libcusparse == null;
+  };
+}
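The generic builder that consumes `brokenConditions` is not part of this diff. A sketch, under the assumption that a package counts as broken when any condition is true and that the names of true conditions serve as the reasons, using stand-in values where `libcusparse` is absent:

```nix
let
  # Hypothetical stand-ins: libcusparse is absent from this package set.
  libcusparse = null;
  libnvjitlink = {
    pname = "libnvjitlink";
  };

  # Same conditions as the libcusolver fixup.
  brokenConditions = {
    "libnvjitlink missing (CUDA >= 12.0)" = libnvjitlink == null;
    "libcusparse missing (CUDA >= 12.1)" = libcusparse == null;
  };

  # Assumed aggregation: collect the names of all conditions that hold.
  reasons = builtins.filter (name: brokenConditions.${name}) (builtins.attrNames brokenConditions);
in
{
  broken = reasons != [ ];
  inherit reasons;
}
```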
--- a/pkgs/development/cuda-modules/_cuda/fixups/libcusparse.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/libcusparse.nix
@@ -0,0 +1,12 @@
+{
+  cudaAtLeast,
+  lib,
+  libnvjitlink ? null,
+}:
+prevAttrs: {
+  buildInputs = prevAttrs.buildInputs or [ ] ++ [ libnvjitlink ];
+
+  brokenConditions = prevAttrs.brokenConditions or { } // {
+    "libnvjitlink missing (CUDA >= 12.0)" = cudaAtLeast "12.0" && libnvjitlink == null;
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/libcusparse_lt.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/libcusparse_lt.nix
@@ -0,0 +1,23 @@
+{
+  cuda_cudart,
+  lib,
+  libcublas,
+}:
+finalAttrs: prevAttrs: {
+  buildInputs =
+    prevAttrs.buildInputs or [ ]
+    ++ [ (lib.getLib libcublas) ]
+    # For some reason, the 1.4.x release of cuSPARSELt requires the cudart library.
+    ++ lib.optionals (lib.hasPrefix "1.4" finalAttrs.version) [ (lib.getLib cuda_cudart) ];
+  meta = prevAttrs.meta or { } // {
+    description = "cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication";
+    homepage = "https://developer.nvidia.com/cusparselt-downloads";
+    maintainers = prevAttrs.meta.maintainers or [ ] ++ [ lib.maintainers.sepiabrown ];
+    teams = prevAttrs.meta.teams or [ ];
+    license = lib.licenses.unfreeRedistributable // {
+      shortName = "cuSPARSELt EULA";
+      fullName = "cuSPARSELt SUPPLEMENT TO SOFTWARE LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS";
+      url = "https://docs.nvidia.com/cuda/cusparselt/license.html";
+    };
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/libcutensor.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/libcutensor.nix
@@ -0,0 +1,23 @@
+{
+  cuda_cudart,
+  lib,
+  libcublas,
+}:
+finalAttrs: prevAttrs: {
+  buildInputs =
+    prevAttrs.buildInputs or [ ]
+    ++ [ (lib.getLib libcublas) ]
+    # For some reason, the 1.4.x release of cuTENSOR requires the cudart library.
+    ++ lib.optionals (lib.hasPrefix "1.4" finalAttrs.version) [ (lib.getLib cuda_cudart) ];
+  meta = prevAttrs.meta or { } // {
+    description = "cuTENSOR: A High-Performance CUDA Library For Tensor Primitives";
+    homepage = "https://developer.nvidia.com/cutensor";
+    maintainers = prevAttrs.meta.maintainers or [ ] ++ [ lib.maintainers.obsidian-systems-maintenance ];
+    teams = prevAttrs.meta.teams or [ ];
+    license = lib.licenses.unfreeRedistributable // {
+      shortName = "cuTENSOR EULA";
+      fullName = "cuTENSOR SUPPLEMENT TO SOFTWARE LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS";
+      url = "https://docs.nvidia.com/cuda/cutensor/license.html";
+    };
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/nsight_compute.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/nsight_compute.nix
@@ -0,0 +1,86 @@
+{
+  cudaAtLeast,
+  cudaMajorMinorVersion,
+  cudaOlder,
+  e2fsprogs,
+  elfutils,
+  flags,
+  gst_all_1,
+  lib,
+  libjpeg8,
+  qt6,
+  rdma-core,
+  stdenv,
+  ucx,
+}:
+prevAttrs:
+let
+  qtwayland = lib.getLib qt6.qtwayland;
+  inherit (qt6) wrapQtAppsHook qtwebview;
+  archDir =
+    {
+      aarch64-linux = "linux-" + (if flags.isJetsonBuild then "v4l_l4t" else "desktop") + "-t210-a64";
+      x86_64-linux = "linux-desktop-glibc_2_11_3-x64";
+    }
+    .${stdenv.hostPlatform.system} or (throw "Unsupported system: ${stdenv.hostPlatform.system}");
+in
+{
+  outputs = [ "out" ]; # NOTE(@connorbaker): Force a single output so relative lookups work.
+  nativeBuildInputs = prevAttrs.nativeBuildInputs or [ ] ++ [ wrapQtAppsHook ];
+  buildInputs =
+    prevAttrs.buildInputs or [ ]
+    ++ [
+      qtwayland
+      qtwebview
+      (qt6.qtwebengine or qt6.full)
+      rdma-core
+    ]
+    ++ lib.optionals (cudaOlder "12.7") [
+      e2fsprogs
+      ucx
+    ]
+    ++ lib.optionals (cudaMajorMinorVersion == "12.9") [
+      elfutils
+    ];
+  dontWrapQtApps = true;
+  preInstall = prevAttrs.preInstall or "" + ''
+    if [[ -d nsight-compute ]]; then
+      nixLog "Lifting components of Nsight Compute to the top level"
+      mv -v nsight-compute/*/* .
+      nixLog "Removing empty directories"
+      rmdir -pv nsight-compute/*
+    fi
+
+    rm -rf host/${archDir}/Mesa/
+  '';
+  postInstall =
+    prevAttrs.postInstall or ""
+    + ''
+      moveToOutput 'ncu' "''${!outputBin}/bin"
+      moveToOutput 'ncu-ui' "''${!outputBin}/bin"
+      moveToOutput 'host/${archDir}' "''${!outputBin}/bin"
+      moveToOutput 'target/${archDir}' "''${!outputBin}/bin"
+      wrapQtApp "''${!outputBin}/bin/host/${archDir}/ncu-ui.bin"
+    ''
+    # NOTE(@connorbaker): No idea what this platform is or how to patchelf for it.
+    + lib.optionalString (flags.isJetsonBuild && cudaOlder "12.9") ''
+      nixLog "Removing QNX 700 target directory for Jetson builds"
+      rm -rfv "''${!outputBin}/target/qnx-700-t210-a64"
+    ''
+    + lib.optionalString (flags.isJetsonBuild && cudaAtLeast "12.8") ''
+      nixLog "Removing QNX 800 target directory for Jetson builds"
+      rm -rfv "''${!outputBin}/target/qnx-800-tegra-a64"
+    '';
+  # libqtiff.so needs libtiff.so.5, but nixpkgs provides libtiff.so.6
+  preFixup = prevAttrs.preFixup or "" + ''
+    patchelf --replace-needed libtiff.so.5 libtiff.so "''${!outputBin}/bin/host/${archDir}/Plugins/imageformats/libqtiff.so"
+  '';
+  autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
+    "libnvidia-ml.so.1"
+  ];
+  # NOTE(@connorbaker): It might be a problem that, when nsight_compute contains hosts and targets of different
+  # architectures, we patchelf only the binaries matching the builder's platform; autoPatchelfHook prints
+  # messages like
+  #   skipping [$out]/host/linux-desktop-glibc_2_11_3-x64/libQt6Core.so.6 because its architecture (x64) differs from
+  #   target (AArch64)
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/nsight_systems.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/nsight_systems.nix
@@ -0,0 +1,136 @@
+{
+  boost178,
+  cuda_cudart,
+  cudaAtLeast,
+  e2fsprogs,
+  gst_all_1,
+  lib,
+  nss,
+  numactl,
+  pulseaudio,
+  qt6,
+  rdma-core,
+  stdenv,
+  ucx,
+  wayland,
+  xorg,
+}:
+prevAttrs:
+let
+  qtwayland = lib.getLib qt6.qtwayland;
+  qtWaylandPlugins = "${qtwayland}/${qt6.qtbase.qtPluginPrefix}";
+  # NOTE(@connorbaker): nsight_systems doesn't support Jetson, so no need for case splitting on aarch64-linux.
+  hostDir =
+    {
+      aarch64-linux = "host-linux-armv8";
+      x86_64-linux = "host-linux-x64";
+    }
+    .${stdenv.hostPlatform.system} or (throw "Unsupported system: ${stdenv.hostPlatform.system}");
+  targetDir =
+    {
+      aarch64-linux = "target-linux-sbsa-armv8";
+      x86_64-linux = "target-linux-x64";
+    }
+    .${stdenv.hostPlatform.system} or (throw "Unsupported system: ${stdenv.hostPlatform.system}");
+in
+{
+  outputs = [ "out" ]; # NOTE(@connorbaker): Force a single output so relative lookups work.
+
+  # An ad hoc replacement for
+  # https://github.com/ConnorBaker/cuda-redist-find-features/issues/11
+  env = prevAttrs.env or { } // {
+    rmPatterns =
+      prevAttrs.env.rmPatterns or ""
+      + toString [
+        "${hostDir}/lib{arrow,jpeg}*"
+        "${hostDir}/lib{ssl,ssh,crypto}*"
+        "${hostDir}/libboost*"
+        "${hostDir}/libexec"
+        "${hostDir}/libstdc*"
+        "${hostDir}/python/bin/python"
+        "${hostDir}/Mesa"
+      ];
+  };
+
+  # NOTE(@connorbaker): nsight-exporter and nsight-sys are deprecated scripts wrapping nsys, so it's fine to remove them.
+  prePatch = prevAttrs.prePatch or "" + ''
+    if [[ -d bin ]]; then
+      nixLog "Removing bin wrapper scripts"
+      for knownWrapper in bin/{nsys{,-ui},nsight-{exporter,sys}}; do
+        [[ -e $knownWrapper ]] && rm -v "$knownWrapper"
+      done
+      unset -v knownWrapper
+
+      nixLog "Removing empty bin directory"
+      rmdir -v bin
+    fi
+
+    if [[ -d nsight-systems ]]; then
+      nixLog "Lifting components of Nsight Systems to the top level"
+      mv -v nsight-systems/*/* .
+      nixLog "Removing empty nsight-systems directory"
+      rmdir -pv nsight-systems/*
+    fi
+  '';
+
+  postPatch = prevAttrs.postPatch or "" + ''
+    for path in $rmPatterns; do
+      rm -r "$path"
+    done
+    patchShebangs nsight-systems
+  '';
+
+  nativeBuildInputs = prevAttrs.nativeBuildInputs or [ ] ++ [ qt6.wrapQtAppsHook ];
+
+  dontWrapQtApps = true;
+
+  buildInputs =
+    prevAttrs.buildInputs or [ ]
+    ++ [
+      (qt6.qtdeclarative or qt6.full)
+      (qt6.qtsvg or qt6.full)
+      (qt6.qtimageformats or qt6.full)
+      (qt6.qtpositioning or qt6.full)
+      (qt6.qtscxml or qt6.full)
+      (qt6.qttools or qt6.full)
+      (qt6.qtwebengine or qt6.full)
+      (qt6.qtwayland or qt6.full)
+      boost178
+      cuda_cudart.stubs
+      e2fsprogs
+      gst_all_1.gst-plugins-base
+      gst_all_1.gstreamer
+      nss
+      numactl
+      pulseaudio
+      qt6.qtbase
+      qtWaylandPlugins
+      rdma-core
+      ucx
+      wayland
+      xorg.libXcursor
+      xorg.libXdamage
+      xorg.libXrandr
+      xorg.libXtst
+    ]
+    # NOTE(@connorbaker): Seems to be required only for aarch64-linux.
+    ++ lib.optionals stdenv.hostPlatform.isAarch64 [
+      gst_all_1.gst-plugins-bad
+    ];
+
+  postInstall = prevAttrs.postInstall or "" + ''
+    moveToOutput '${hostDir}' "''${!outputBin}"
+    moveToOutput '${targetDir}' "''${!outputBin}"
+    moveToOutput 'bin' "''${!outputBin}"
+    wrapQtApp "''${!outputBin}/${hostDir}/nsys-ui.bin"
+  '';
+
+  # libqtiff.so needs libtiff.so.5, but nixpkgs provides libtiff.so.6
+  preFixup = prevAttrs.preFixup or "" + ''
+    patchelf --replace-needed libtiff.so.5 libtiff.so "''${!outputBin}/${hostDir}/Plugins/imageformats/libqtiff.so"
+  '';
+
+  autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
+    "libnvidia-ml.so.1"
+  ];
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/nvidia_driver.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/nvidia_driver.nix
@@ -0,0 +1,5 @@
+_: prevAttrs: {
+  badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
+    "Package is not supported; use drivers from linuxPackages" = true;
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/fixups/tensorrt.nix
+++ b/pkgs/development/cuda-modules/_cuda/fixups/tensorrt.nix
@@ -0,0 +1,128 @@
+{
+  _cuda,
+  cudaOlder,
+  cudaPackages,
+  cudaMajorMinorVersion,
+  lib,
+  patchelf,
+  requireFile,
+  stdenv,
+}:
+let
+  inherit (lib)
+    attrsets
+    maintainers
+    meta
+    strings
+    versions
+    ;
+  inherit (stdenv) hostPlatform;
+  # targetArch :: String
+  targetArch = attrsets.attrByPath [ hostPlatform.system ] "unsupported" {
+    x86_64-linux = "x86_64-linux-gnu";
+    aarch64-linux = "aarch64-linux-gnu";
+  };
+in
+finalAttrs: prevAttrs: {
+  # Useful for inspecting why something went wrong.
+  brokenConditions =
+    let
+      cudaTooOld = cudaOlder finalAttrs.passthru.featureRelease.minCudaVersion;
+      cudaTooNew =
+        (finalAttrs.passthru.featureRelease.maxCudaVersion != null)
+        && strings.versionOlder finalAttrs.passthru.featureRelease.maxCudaVersion cudaMajorMinorVersion;
+      cudnnVersionIsSpecified = finalAttrs.passthru.featureRelease.cudnnVersion != null;
+      cudnnVersionSpecified = versions.majorMinor finalAttrs.passthru.featureRelease.cudnnVersion;
+      cudnnVersionProvided = versions.majorMinor finalAttrs.passthru.cudnn.version;
+      cudnnTooOld =
+        cudnnVersionIsSpecified && (strings.versionOlder cudnnVersionProvided cudnnVersionSpecified);
+      cudnnTooNew =
+        cudnnVersionIsSpecified && (strings.versionOlder cudnnVersionSpecified cudnnVersionProvided);
+    in
+    prevAttrs.brokenConditions or { }
+    // {
+      "CUDA version is too old" = cudaTooOld;
+      "CUDA version is too new" = cudaTooNew;
+      "CUDNN version is too old" = cudnnTooOld;
+      "CUDNN version is too new" = cudnnTooNew;
+    };
+
+  src = requireFile {
+    name = finalAttrs.passthru.redistribRelease.filename;
+    inherit (finalAttrs.passthru.redistribRelease) hash;
+    message = ''
+      To use the TensorRT derivation, you must join the NVIDIA Developer Program and
+      download the ${finalAttrs.version} TAR package for CUDA ${cudaMajorMinorVersion} from
+      ${finalAttrs.meta.homepage}.
+
+      Once you have downloaded the file, add it to the store with the following
+      command, and try building this derivation again.
+
+      $ nix-store --add-fixed sha256 ${finalAttrs.passthru.redistribRelease.filename}
+    '';
+  };
+
+  # We need to look inside the extracted output to get the files we need.
+  sourceRoot = "TensorRT-${finalAttrs.version}";
+
+  buildInputs = prevAttrs.buildInputs or [ ] ++ [ (finalAttrs.passthru.cudnn.lib or null) ];
+
+  preInstall =
+    prevAttrs.preInstall or ""
+    + strings.optionalString (targetArch != "unsupported") ''
+      # Replace symlinks to bin and lib with the actual directories from targets.
+      for dir in bin lib; do
+        rm "$dir"
+        mv "targets/${targetArch}/$dir" "$dir"
+      done
+
+      # Remove broken symlinks
+      for dir in include samples; do
+        rm "targets/${targetArch}/$dir" || :
+      done
+    '';
+
+  # Tell autoPatchelf about runtime dependencies.
+  postFixup =
+    let
+      versionTriple = "${versions.majorMinor finalAttrs.version}.${versions.patch finalAttrs.version}";
+    in
+    prevAttrs.postFixup or ""
+    + ''
+      ${meta.getExe' patchelf "patchelf"} --add-needed libnvinfer.so \
+        "$lib/lib/libnvinfer.so.${versionTriple}" \
+        "$lib/lib/libnvinfer_plugin.so.${versionTriple}" \
+        "$lib/lib/libnvinfer_builder_resource.so.${versionTriple}"
+    '';
+
+  passthru = prevAttrs.passthru or { } // {
+    # The CUDNN used with TensorRT.
+    # If null, the default cudnn derivation will be used.
+    # If a version is specified, the cudnn derivation with that version will be used,
+    # unless it is not available, in which case the default cudnn derivation will be used.
+    cudnn =
+      let
+        desiredName = _cuda.lib.mkVersionedName "cudnn" (
+          lib.versions.majorMinor finalAttrs.passthru.featureRelease.cudnnVersion
+        );
+      in
+      if finalAttrs.passthru.featureRelease.cudnnVersion == null || !(cudaPackages ? ${desiredName}) then
+        cudaPackages.cudnn
+      else
+        cudaPackages.${desiredName};
+  };
+
+  meta = prevAttrs.meta or { } // {
+    badPlatforms =
+      prevAttrs.meta.badPlatforms or [ ]
+      ++ lib.optionals (targetArch == "unsupported") [ hostPlatform.system ];
+    homepage = "https://developer.nvidia.com/tensorrt";
+    maintainers = prevAttrs.meta.maintainers or [ ] ++ [ maintainers.aidalgol ];
+    teams = prevAttrs.meta.teams or [ ];
+
+    # Building TensorRT on Hydra is impossible because of the non-redistributable
+    # license and because the source needs to be manually downloaded from the
+    # NVIDIA Developer Program (see requireFile above).
+    hydraPlatforms = lib.platforms.none;
+  };
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/assertions.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/assertions.nix
@@ -0,0 +1,139 @@
+{ _cuda, lib }:
+{
+  /**
+    Evaluate assertions and add error context to return value.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _evaluateAssertions
+      :: (assertions :: List { assertion :: Bool, message :: String })
+      -> Bool
+    ```
+  */
+  _evaluateAssertions =
+    assertions:
+    let
+      failedAssertionsString = _cuda.lib._mkFailedAssertionsString assertions;
+    in
+    if failedAssertionsString == "" then
+      true
+    else
+      lib.addErrorContext "with failed assertions:${failedAssertionsString}" false;
+
+  /**
+    Generates a string of failed assertion messages, each on a new line prefixed with `- `.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _mkFailedAssertionsString
+      :: (assertions :: List { assertion :: Bool, message :: String })
+      -> String
+    ```
+
+    # Inputs
+
+    `assertions`
+
+    : A list of assertions to evaluate
+
+    # Examples
+
+    :::{.example}
+    ## `_cuda.lib._mkFailedAssertionsString` usage examples
+
+    ```nix
+    _mkFailedAssertionsString [
+      { assertion = false; message = "Assertion 1 failed"; }
+      { assertion = true; message = "Assertion 2 failed"; }
+    ]
+    => "\n- Assertion 1 failed"
+    ```
+
+    ```nix
+    _mkFailedAssertionsString [
+      { assertion = false; message = "Assertion 1 failed"; }
+      { assertion = false; message = "Assertion 2 failed"; }
+    ]
+    => "\n- Assertion 1 failed\n- Assertion 2 failed"
+    ```
+    :::
+  */
+  _mkFailedAssertionsString = lib.foldl' (
+    failedAssertionsString:
+    { assertion, message }:
+    failedAssertionsString + lib.optionalString (!assertion) ("\n- " + message)
+  ) "";
+
+  /**
+    Utility function to generate assertions for missing packages.
+
+    Used to mark a package as unsupported if any of its required packages are missing (null).
+
+    Expects a set of attributes.
+
+    Most commonly used in overrides files on a callPackage-provided attribute set of packages.
+
+    NOTE: We typically use platformAssertions instead of brokenAssertions because the presence of packages set to null
+    means evaluation will fail if package attributes are accessed without checking for null first. OfBorg evaluation
+    sets allowBroken to true, which means we can't rely on brokenAssertions to prevent evaluation of a package with
+    missing dependencies.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _mkMissingPackagesAssertions
+      :: (attrs :: AttrSet)
+      -> (assertions :: List { assertion :: Bool, message :: String })
+    ```
+
+    # Inputs
+
+    `attrs`
+
+    : The attributes to check for null
+
+    # Examples
+
+    :::{.example}
+    ## `_cuda.lib._mkMissingPackagesAssertions` usage examples
+
+    ```nix
+    {
+      _cuda,
+      lib,
+      libcal ? null,
+      libcublas,
+      utils,
+    }:
+    let
+      inherit (_cuda.lib) _mkMissingPackagesAssertions;
+    in
+    prevAttrs: {
+      passthru = prevAttrs.passthru or { } // {
+        platformAssertions =
+          prevAttrs.passthru.platformAssertions or [ ]
+          ++ _mkMissingPackagesAssertions { inherit libcal; };
+      };
+    }
+    ```
+    :::
+  */
+  _mkMissingPackagesAssertions = lib.flip lib.pipe [
+    # Take the attributes that are null.
+    (lib.filterAttrs (_: value: value == null))
+    lib.attrNames
+    # Map them to assertions.
+    (lib.map (name: {
+      message = "${name} is available";
+      assertion = false;
+    }))
+  ];
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/cuda.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/cuda.nix
@@ -0,0 +1,129 @@
+{ lib }:
+{
+  /**
+    Returns whether a capability should be built by default for a particular CUDA version.
+
+    Capabilities built by default are baseline, non-Jetson capabilities with relatively recent CUDA support.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _cudaCapabilityIsDefault
+      :: (cudaMajorMinorVersion :: Version)
+      -> (cudaCapabilityInfo :: CudaCapabilityInfo)
+      -> Bool
+    ```
+
+    # Inputs
+
+    `cudaMajorMinorVersion`
+
+    : The CUDA version to check
+
+    `cudaCapabilityInfo`
+
+    : The capability information to check
+  */
+  _cudaCapabilityIsDefault =
+    cudaMajorMinorVersion: cudaCapabilityInfo:
+    let
+      recentCapability =
+        cudaCapabilityInfo.dontDefaultAfterCudaMajorMinorVersion == null
+        || lib.versionAtLeast cudaCapabilityInfo.dontDefaultAfterCudaMajorMinorVersion cudaMajorMinorVersion;
+    in
+    recentCapability
+    && !cudaCapabilityInfo.isJetson
+    && !cudaCapabilityInfo.isArchitectureSpecific
+    && !cudaCapabilityInfo.isFamilySpecific;
+
+  /**
+    Returns whether a capability is supported for a particular CUDA version.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _cudaCapabilityIsSupported
+      :: (cudaMajorMinorVersion :: Version)
+      -> (cudaCapabilityInfo :: CudaCapabilityInfo)
+      -> Bool
+    ```
+
+    # Inputs
+
+    `cudaMajorMinorVersion`
+
+    : The CUDA version to check
+
+    `cudaCapabilityInfo`
+
+    : The capability information to check
+  */
+  _cudaCapabilityIsSupported =
+    cudaMajorMinorVersion: cudaCapabilityInfo:
+    let
+      lowerBoundSatisfied = lib.versionAtLeast cudaMajorMinorVersion cudaCapabilityInfo.minCudaMajorMinorVersion;
+      upperBoundSatisfied =
+        cudaCapabilityInfo.maxCudaMajorMinorVersion == null
+        || lib.versionAtLeast cudaCapabilityInfo.maxCudaMajorMinorVersion cudaMajorMinorVersion;
+    in
+    lowerBoundSatisfied && upperBoundSatisfied;
+
+  /**
+    Generates a CUDA variant name from a version.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _mkCudaVariant :: (version :: String) -> String
+    ```
+
+    # Inputs
+
+    `version`
+
+    : The version string
+
+    # Examples
+
+    :::{.example}
+    ## `_cuda.lib._mkCudaVariant` usage examples
+
+    ```nix
+    _mkCudaVariant "11.0"
+    => "cuda11"
+    ```
+    :::
+  */
+  _mkCudaVariant = version: "cuda${lib.versions.major version}";
+
+  /**
+    A predicate which, given a package, returns true if the package has a free license or one of NVIDIA's licenses.
+
+    This function is intended to be provided as `config.allowUnfreePredicate` when `import`-ing Nixpkgs.
+
+    # Type
+
+    ```
+    allowUnfreeCudaPredicate :: (package :: Package) -> Bool
+    ```
+  */
+  allowUnfreeCudaPredicate =
+    package:
+    lib.all (
+      license:
+      license.free
+      || lib.elem license.shortName [
+        "CUDA EULA"
+        "cuDNN EULA"
+        "cuSPARSELt EULA"
+        "cuTENSOR EULA"
+        "NVidia OptiX EULA"
+      ]
+    ) (lib.toList package.meta.license);
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/default.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/default.nix
@@ -0,0 +1,52 @@
+{
+  _cuda,
+  lib,
+}:
+{
+  # See ./assertions.nix for documentation.
+  inherit (import ./assertions.nix { inherit _cuda lib; })
+    _evaluateAssertions
+    _mkFailedAssertionsString
+    _mkMissingPackagesAssertions
+    ;
+
+  # See ./cuda.nix for documentation.
+  inherit (import ./cuda.nix { inherit lib; })
+    _cudaCapabilityIsDefault
+    _cudaCapabilityIsSupported
+    _mkCudaVariant
+    allowUnfreeCudaPredicate
+    ;
+
+  # See ./meta.nix for documentation.
+  inherit (import ./meta.nix { inherit _cuda lib; })
+    _mkMetaBadPlatforms
+    _mkMetaBroken
+    ;
+
+  # See ./redist.nix for documentation.
+  inherit (import ./redist.nix { inherit _cuda lib; })
+    _redistSystemIsSupported
+    getNixSystems
+    getRedistSystem
+    mkRedistUrl
+    ;
+
+  # See ./strings.nix for documentation.
+  inherit (import ./strings.nix { inherit _cuda lib; })
+    dotsToUnderscores
+    dropDots
+    formatCapabilities
+    mkCmakeCudaArchitecturesString
+    mkGencodeFlag
+    mkRealArchitecture
+    mkVersionedName
+    mkVirtualArchitecture
+    ;
+
+  # See ./versions.nix for documentation.
+  inherit (import ./versions.nix { inherit _cuda lib; })
+    majorMinorPatch
+    trimComponents
+    ;
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/meta.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/meta.nix
@@ -0,0 +1,71 @@
+{ _cuda, lib }:
+{
+  /**
+    Returns a list of bad platforms for a given package if assertions in `finalAttrs.passthru.platformAssertions`
+    fail, optionally logging a single evaluation warning that lists each failed assertion.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    NOTE: This function requires `finalAttrs.passthru.platformAssertions` to be a list of assertions and
+    `finalAttrs.finalPackage.name` and `finalAttrs.finalPackage.stdenv` to be available.
+
+    # Type
+
+    ```
+    _mkMetaBadPlatforms :: (warn :: Bool) -> (finalAttrs :: AttrSet) -> List String
+    ```
+  */
+  _mkMetaBadPlatforms =
+    warn: finalAttrs:
+    let
+      failedAssertionsString = _cuda.lib._mkFailedAssertionsString finalAttrs.passthru.platformAssertions;
+      hasFailedAssertions = failedAssertionsString != "";
+      finalStdenv = finalAttrs.finalPackage.stdenv;
+    in
+    lib.warnIf (warn && hasFailedAssertions)
+      "Package ${finalAttrs.finalPackage.name} is unsupported on this platform due to the following failed assertions:${failedAssertionsString}"
+      (
+        lib.optionals hasFailedAssertions (
+          lib.unique [
+            finalStdenv.buildPlatform.system
+            finalStdenv.hostPlatform.system
+            finalStdenv.targetPlatform.system
+          ]
+        )
+      );
+
+  /**
+    Returns a boolean indicating whether the package is broken as a result of `finalAttrs.passthru.brokenAssertions`,
+    optionally logging a single evaluation warning that lists each failed assertion.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    NOTE: This function requires `finalAttrs.passthru.brokenAssertions` to be a list of assertions and
+    `finalAttrs.finalPackage.name` to be available.
+
+    # Type
+
+    ```
+    _mkMetaBroken :: (warn :: Bool) -> (finalAttrs :: AttrSet) -> Bool
+    ```
+
+    # Inputs
+
+    `warn`
+
+    : A boolean indicating whether to log warnings
+
+    `finalAttrs`
+
+    : The final attributes of the package
+  */
+  _mkMetaBroken =
+    warn: finalAttrs:
+    let
+      failedAssertionsString = _cuda.lib._mkFailedAssertionsString finalAttrs.passthru.brokenAssertions;
+      hasFailedAssertions = failedAssertionsString != "";
+    in
+    lib.warnIf (warn && hasFailedAssertions)
+      "Package ${finalAttrs.finalPackage.name} is marked as broken due to the following failed assertions:${failedAssertionsString}"
+      hasFailedAssertions;
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/redist.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/redist.nix
@@ -0,0 +1,196 @@
+{ _cuda, lib }:
+{
+  /**
+    Returns a boolean indicating whether the provided redist system is supported by any of the provided redist systems.
+
+    NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
+
+    # Type
+
+    ```
+    _redistSystemIsSupported
+      :: (redistSystem :: RedistSystem)
+      -> (redistSystems :: List RedistSystem)
+      -> Bool
+    ```
+
+    # Inputs
+
+    `redistSystem`
+
+    : The redist system to check
+
+    `redistSystems`
+
+    : The list of redist systems to check against
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib._redistSystemIsSupported` usage examples
+
+    ```nix
+    _redistSystemIsSupported "linux-x86_64" [ "linux-x86_64" ]
+    => true
+    ```
+
+    ```nix
+    _redistSystemIsSupported "linux-x86_64" [ "linux-aarch64" ]
+    => false
+    ```
+
+    ```nix
+    _redistSystemIsSupported "linux-x86_64" [ "linux-aarch64" "linux-x86_64" ]
+    => true
+    ```
+
+    ```nix
+    _redistSystemIsSupported "linux-x86_64" [ "linux-aarch64" "linux-all" ]
+    => true
+    ```
+    :::
+  */
+  _redistSystemIsSupported =
+    redistSystem: redistSystems:
+    lib.findFirst (
+      redistSystem':
+      redistSystem' == redistSystem || redistSystem' == "linux-all" || redistSystem' == "source"
+    ) null redistSystems != null;
+
+  /**
+    Maps an NVIDIA redistributable system to a list of Nix systems.
+
+    NOTE: This function returns a list of systems because the redistributable systems `"linux-all"` and `"source"` can
+    be built on multiple systems.
+
+    NOTE: This function *will* be called by unsupported systems because `cudaPackages` is evaluated on all systems. As
+    such, we need to handle unsupported systems gracefully.
+
+    # Type
+
+    ```
+    getNixSystems :: (redistSystem :: RedistSystem) -> List String
+    ```
+
+    # Inputs
+
+    `redistSystem`
+
+    : The NVIDIA redistributable system
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.getNixSystems` usage examples
+
+    ```nix
+    getNixSystems "linux-sbsa"
+    => [ "aarch64-linux" ]
+    ```
+
+    ```nix
+    getNixSystems "linux-aarch64"
+    => [ "aarch64-linux" ]
+    ```
+    :::
+  */
+  getNixSystems =
+    redistSystem:
+    if redistSystem == "linux-x86_64" then
+      [ "x86_64-linux" ]
+    else if redistSystem == "linux-sbsa" || redistSystem == "linux-aarch64" then
+      [ "aarch64-linux" ]
+    else if redistSystem == "linux-all" || redistSystem == "source" then
+      [
+        "aarch64-linux"
+        "x86_64-linux"
+      ]
+    else
+      [ ];
+
+  /**
+    Maps a Nix system to an NVIDIA redistributable system, or `"unsupported"` if there is none.
+
+    NOTE: We swap out the default `linux-sbsa` redist (for server-grade ARM chips) with the `linux-aarch64` redist
+    (which is for Jetson devices) if we're building any Jetson devices. Since both are based on aarch64, we can only
+    have one or the other, otherwise there's an ambiguity as to which should be used.
+
+    NOTE: This function *will* be called by unsupported systems because `cudaPackages` is evaluated on all systems. As
+    such, we need to handle unsupported systems gracefully.
+
+    # Type
+
+    ```
+    getRedistSystem :: (hasJetsonCudaCapability :: Bool) -> (nixSystem :: String) -> String
+    ```
+
+    # Inputs
+
+    `hasJetsonCudaCapability`
+
+    : Whether the configuration includes a Jetson CUDA capability
+
+    `nixSystem`
+
+    : The Nix system
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.getRedistSystem` usage examples
+
+    ```nix
+    getRedistSystem true "aarch64-linux"
+    => "linux-aarch64"
+    ```
+
+    ```nix
+    getRedistSystem false "aarch64-linux"
+    => "linux-sbsa"
+    ```
+    :::
+  */
+  getRedistSystem =
+    hasJetsonCudaCapability: nixSystem:
+    if nixSystem == "x86_64-linux" then
+      "linux-x86_64"
+    else if nixSystem == "aarch64-linux" then
+      if hasJetsonCudaCapability then "linux-aarch64" else "linux-sbsa"
+    else
+      "unsupported";
+
+  /**
+    Generates a URL for a file in the redistributable tree (`<redistName>/redist/...`, or `machine-learning/...` for TensorRT).
+
+    # Type
+
+    ```
+    mkRedistUrl :: (redistName :: RedistName) -> (relativePath :: NonEmptyStr) -> RedistUrl
+    ```
+
+    # Inputs
+
+    `redistName`
+
+    : The name of the redistributable
+
+    `relativePath`
+
+    : The relative path to a file in the redistributable tree
+  */
+  mkRedistUrl =
+    redistName: relativePath:
+    lib.concatStringsSep "/" (
+      [ _cuda.db.redistUrlPrefix ]
+      ++ (
+        if redistName != "tensorrt" then
+          [
+            redistName
+            "redist"
+          ]
+        else
+          [ "machine-learning" ]
+      )
+      ++ [ relativePath ]
+    );
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/strings.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/strings.nix
@@ -0,0 +1,382 @@
+{ _cuda, lib }:
+let
+  cudaLib = _cuda.lib;
+in
+{
+  /**
+    Replaces dots in a string with underscores.
+
+    # Type
+
+    ```
+    dotsToUnderscores :: (str :: String) -> String
+    ```
+
+    # Inputs
+
+    `str`
+
+    : The string for which dots shall be replaced by underscores
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.dotsToUnderscores` usage examples
+
+    ```nix
+    dotsToUnderscores "1.2.3"
+    => "1_2_3"
+    ```
+    :::
+  */
+  dotsToUnderscores = lib.replaceStrings [ "." ] [ "_" ];
+
+  /**
+    Removes the dots from a string.
+
+    # Type
+
+    ```
+    dropDots :: (str :: String) -> String
+    ```
+
+    # Inputs
+
+    `str`
+
+    : The string to remove dots from
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.dropDots` usage examples
+
+    ```nix
+    dropDots "1.2.3"
+    => "123"
+    ```
+    :::
+  */
+  dropDots = lib.replaceStrings [ "." ] [ "" ];
+
+  /**
+    Produces an attribute set of useful data and functionality for packaging CUDA software within Nixpkgs.
+
+    # Type
+
+    ```
+    formatCapabilities
+      :: { cudaCapabilityToInfo :: AttrSet CudaCapability CudaCapabilityInfo
+         , cudaCapabilities :: List CudaCapability
+         , cudaForwardCompat :: Bool
+         }
+      -> { cudaCapabilities :: List CudaCapability
+         , cudaForwardCompat :: Bool
+         , gencode :: List String
+         , realArches :: List String
+         , virtualArches :: List String
+         , archNames :: List String
+         , arches :: List String
+         , gencodeString :: String
+         , cmakeCudaArchitecturesString :: String
+         }
+    ```
+
+    # Inputs
+
+    `cudaCapabilityToInfo`
+
+    : A mapping of CUDA capabilities to their information
+
+    `cudaCapabilities`
+
+    : A list of CUDA capabilities to use
+
+    `cudaForwardCompat`
+
+    : A boolean indicating whether to include the forward compatibility gencode (+PTX) to support future GPU
+      generations
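+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.formatCapabilities` usage examples
+
+    Assuming `cudaCapabilityToInfo` records `archName = "Turing"` for `"7.5"` and
+    `archName = "Ampere"` for `"8.6"`:
+
+    ```nix
+    formatCapabilities {
+      inherit cudaCapabilityToInfo; # assumed: "7.5" -> Turing, "8.6" -> Ampere
+      cudaCapabilities = [ "7.5" "8.6" ];
+      cudaForwardCompat = true;
+    }
+    => {
+      archNames = [ "Ampere" "Turing" ];
+      arches = [ "sm_75" "sm_86" "compute_86" ];
+      cmakeCudaArchitecturesString = "75;86";
+      cudaCapabilities = [ "7.5" "8.6" ];
+      cudaForwardCompat = true;
+      gencode = [
+        "-gencode=arch=compute_75,code=sm_75"
+        "-gencode=arch=compute_86,code=sm_86"
+        "-gencode=arch=compute_86,code=compute_86"
+      ];
+      gencodeString = "-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86";
+      realArches = [ "sm_75" "sm_86" ];
+      virtualArches = [ "compute_75" "compute_86" ];
+    }
+    ```
+    :::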
+  */
+  formatCapabilities =
+    {
+      cudaCapabilityToInfo,
+      cudaCapabilities,
+      cudaForwardCompat,
+    }:
+    let
+      /**
+        The real architectures for the given CUDA capabilities.
+
+        # Type
+
+        ```
+        realArches :: List String
+        ```
+      */
+      realArches = lib.map cudaLib.mkRealArchitecture cudaCapabilities;
+
+      /**
+        The virtual architectures for the given CUDA capabilities.
+
+        These are typically used for forward compatibility, when trying to support an architecture newer than the CUDA
+        version allows.
+
+        # Type
+
+        ```
+        virtualArches :: List String
+        ```
+      */
+      virtualArches = lib.map cudaLib.mkVirtualArchitecture cudaCapabilities;
+
+      /**
+        The gencode flags for the given CUDA capabilities.
+
+        # Type
+
+        ```
+        gencode :: List String
+        ```
+      */
+      gencode =
+        let
+          base = lib.map (cudaLib.mkGencodeFlag "sm") cudaCapabilities;
+          forward = cudaLib.mkGencodeFlag "compute" (lib.last cudaCapabilities);
+        in
+        base ++ lib.optionals cudaForwardCompat [ forward ];
+    in
+    {
+      inherit
+        cudaCapabilities
+        cudaForwardCompat
+        gencode
+        realArches
+        virtualArches
+        ;
+
+      /**
+        The architecture names for the given CUDA capabilities.
+
+        # Type
+
+        ```
+        archNames :: List String
+        ```
+      */
+      # E.g. [ "Ampere" "Turing" ]
+      archNames = lib.pipe cudaCapabilities [
+        (lib.map (cudaCapability: cudaCapabilityToInfo.${cudaCapability}.archName))
+        lib.unique
+        lib.naturalSort
+      ];
+
+      /**
+        The architectures for the given CUDA capabilities, including both real and virtual architectures.
+
+        When `cudaForwardCompat` is enabled, the virtual architecture of the last capability in `cudaCapabilities` is appended to provide forward compatibility (PTX).
+
+        # Type
+
+        ```
+        arches :: List String
+        ```
+      */
+      # E.g. [ "sm_75" "sm_86" "compute_86" ]
+      arches = realArches ++ lib.optionals cudaForwardCompat [ (lib.last virtualArches) ];
+
+      /**
+        The CMake-compatible CUDA architectures string for the given CUDA capabilities.
+
+        # Type
+
+        ```
+        cmakeCudaArchitecturesString :: String
+        ```
+      */
+      cmakeCudaArchitecturesString = cudaLib.mkCmakeCudaArchitecturesString cudaCapabilities;
+
+      /**
+        The gencode string for the given CUDA capabilities.
+
+        # Type
+
+        ```
+        gencodeString :: String
+        ```
+      */
+      gencodeString = lib.concatStringsSep " " gencode;
+    };
+
+  /**
+    Produces a CMake-compatible CUDA architecture string from a list of CUDA capabilities.
+
+    # Type
+
+    ```
+    mkCmakeCudaArchitecturesString :: (cudaCapabilities :: List String) -> String
+    ```
+
+    # Inputs
+
+    `cudaCapabilities`
+
+    : The CUDA capabilities to convert
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.mkCmakeCudaArchitecturesString` usage examples
+
+    ```nix
+    mkCmakeCudaArchitecturesString [ "8.9" "10.0a" ]
+    => "89;100a"
+    ```
+    :::
+  */
+  mkCmakeCudaArchitecturesString = lib.concatMapStringsSep ";" cudaLib.dropDots;
+
+  /**
+    Produces a gencode flag from a CUDA capability.
+
+    # Type
+
+    ```
+    mkGencodeFlag :: (archPrefix :: String) -> (cudaCapability :: String) -> String
+    ```
+
+    # Inputs
+
+    `archPrefix`
+
+    : The architecture prefix to use for the `code` field
+
+    `cudaCapability`
+
+    : The CUDA capability to convert
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.mkGencodeFlag` usage examples
+
+    ```nix
+    mkGencodeFlag "sm" "8.9"
+    => "-gencode=arch=compute_89,code=sm_89"
+    ```
+
+    ```nix
+    mkGencodeFlag "compute" "10.0a"
+    => "-gencode=arch=compute_100a,code=compute_100a"
+    ```
+    :::
+  */
+  mkGencodeFlag =
+    archPrefix: cudaCapability:
+    let
+      cap = cudaLib.dropDots cudaCapability;
+    in
+    "-gencode=arch=compute_${cap},code=${archPrefix}_${cap}";
+
+  /**
+    Produces a real architecture string from a CUDA capability.
+
+    # Type
+
+    ```
+    mkRealArchitecture :: (cudaCapability :: String) -> String
+    ```
+
+    # Inputs
+
+    `cudaCapability`
+
+    : The CUDA capability to convert
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.mkRealArchitecture` usage examples
+
+    ```nix
+    mkRealArchitecture "8.9"
+    => "sm_89"
+    ```
+
+    ```nix
+    mkRealArchitecture "10.0a"
+    => "sm_100a"
+    ```
+    :::
+  */
+  mkRealArchitecture = cudaCapability: "sm_" + cudaLib.dropDots cudaCapability;
+
+  /**
+    Creates a versioned attribute name of the form `<name>_<version>`, with the dots in `version` replaced by underscores.
+
+    # Type
+
+    ```
+    mkVersionedName :: (name :: String) -> (version :: Version) -> String
+    ```
+
+    # Inputs
+
+    `name`
+
+    : The name to use
+
+    `version`
+
+    : The version to use
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.mkVersionedName` usage examples
+
+    ```nix
+    mkVersionedName "hello" "1.2.3"
+    => "hello_1_2_3"
+    ```
+
+    ```nix
+    mkVersionedName "cudaPackages" "12.8"
+    => "cudaPackages_12_8"
+    ```
+    :::
+  */
+  mkVersionedName = name: version: "${name}_${cudaLib.dotsToUnderscores version}";
+
+  /**
+    Produces a virtual architecture string from a CUDA capability.
+
+    # Type
+
+    ```
+    mkVirtualArchitecture :: (cudaCapability :: String) -> String
+    ```
+
+    # Inputs
+
+    `cudaCapability`
+
+    : The CUDA capability to convert
+
+    # Examples
+
+    :::{.example}
+    ## `cudaLib.mkVirtualArchitecture` usage examples
+
+    ```nix
+    mkVirtualArchitecture "8.9"
+    => "compute_89"
+    ```
+
+    ```nix
+    mkVirtualArchitecture "10.0a"
+    => "compute_100a"
+    ```
+    :::
+  */
+  mkVirtualArchitecture = cudaCapability: "compute_" + cudaLib.dropDots cudaCapability;
+}
--- a/pkgs/development/cuda-modules/_cuda/lib/versions.nix
+++ b/pkgs/development/cuda-modules/_cuda/lib/versions.nix
@@ -0,0 +1,79 @@
+{ _cuda, lib }:
+let
+  cudaLib = _cuda.lib;
+in
+{
+  /**
+    Extracts the major, minor, and patch version from a string.
+
+    # Type
+
+    ```
+    majorMinorPatch :: (version :: String) -> String
+    ```
+
+    # Inputs
+
+    `version`
+
+    : The version string
+
+    # Examples
+
+    :::{.example}
+    ## `_cuda.lib.majorMinorPatch` usage examples
+
+    ```nix
+    majorMinorPatch "11.0.3.4"
+    => "11.0.3"
+    ```
+    :::
+  */
+  majorMinorPatch = cudaLib.trimComponents 3;
+
+  /**
+    Returns a version string with at most the specified number of components.
+
+    # Type
+
+    ```
+    trimComponents :: (numComponents :: Integer) -> (version :: String) -> String
+    ```
+
+    # Inputs
+
+    `numComponents`
+
+    : A positive integer corresponding to the maximum number of components to keep
+
+    `version`
+
+    : A version string
+
+    # Examples
+
+    :::{.example}
+    ## `_cuda.lib.trimComponents` usage examples
+
+    ```nix
+    trimComponents 1 "1.2.3.4"
+    => "1"
+    ```
+
+    ```nix
+    trimComponents 3 "1.2.3.4"
+    => "1.2.3"
+    ```
+
+    ```nix
+    trimComponents 9 "1.2.3.4"
+    => "1.2.3.4"
+    ```
+    :::
+  */
+  trimComponents =
+    n: v:
+    lib.pipe v [
+      lib.splitVersion
+      (lib.take n)
+      (lib.concatStringsSep ".")
+    ];
+}
--- a/pkgs/development/cuda-modules/aliases.nix
+++ b/pkgs/development/cuda-modules/aliases.nix
@@ -0,0 +1,28 @@
+# Packages which have been deprecated or removed from cudaPackages
+{ lib }:
+let
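+  # Wraps a renamed attribute so that accessing it warns and returns its replacement.
+  # E.g. evaluating `cudaPackages.cudaVersion` warns
+  # "cudaPackages.cudaVersion is deprecated, use cudaPackages.cudaMajorMinorVersion instead"
+  # and evaluates to `cudaPackages.cudaMajorMinorVersion`.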
+  mkRenamed =
+    oldName:
+    { path, package }:
+    lib.warn "cudaPackages.${oldName} is deprecated, use ${path} instead" package;
+in
+final: _:
+builtins.mapAttrs mkRenamed {
+  # A comment to prevent empty { } from collapsing into a single line
+
+  cudaFlags = {
+    path = "cudaPackages.flags";
+    package = final.flags;
+  };
+
+  cudaVersion = {
+    path = "cudaPackages.cudaMajorMinorVersion";
+    package = final.cudaMajorMinorVersion;
+  };
+
+  cudatoolkit-legacy-runfile = {
+    path = "cudaPackages.cudatoolkit";
+    package = final.cudatoolkit;
+  };
+
+}
--- a/pkgs/development/cuda-modules/cuda-library-samples/extension.nix
+++ b/pkgs/development/cuda-modules/cuda-library-samples/extension.nix
@@ -0,0 +1,16 @@
+{ lib, stdenv }:
+let
+  inherit (stdenv) hostPlatform;
+
+  # Samples are built around the CUDA Toolkit, which is not available for
+  # aarch64, so only x86_64 Linux is supported.
+  platformIsSupported = hostPlatform.isx86_64 && hostPlatform.isLinux;
+
+  # Build our extension
+  extension =
+    final: _:
+    lib.attrsets.optionalAttrs platformIsSupported {
+      cuda-library-samples = final.callPackage ./generic.nix { };
+    };
+in
+extension
--- a/pkgs/development/cuda-modules/cuda-library-samples/generic.nix
+++ b/pkgs/development/cuda-modules/cuda-library-samples/generic.nix
@@ -0,0 +1,137 @@
+{
+  addDriverRunpath,
+  autoAddDriverRunpath,
+  autoPatchelfHook,
+  backendStdenv,
+  cmake,
+  cuda_cccl ? null,
+  cuda_cudart ? null,
+  cuda_nvcc ? null,
+  cudatoolkit,
+  cusparselt ? null,
+  cutensor ? null,
+  fetchFromGitHub,
+  lib,
+  libcusparse ? null,
+  setupCudaHook,
+}:
+
+let
+  base = backendStdenv.mkDerivation (finalAttrs: {
+    src = fetchFromGitHub {
+      owner = "NVIDIA";
+      repo = "CUDALibrarySamples";
+      rev = "e57b9c483c5384b7b97b7d129457e5a9bdcdb5e1";
+      sha256 = "0g17afsmb8am0darxchqgjz1lmkaihmnn7k1x4ahg5gllcmw8k3l";
+    };
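+    # E.g. "e57b9c4-12.8" when cudatoolkit.version is "12.8".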
+    version =
+      lib.strings.substring 0 7 finalAttrs.src.rev + "-" + lib.versions.majorMinor cudatoolkit.version;
+    nativeBuildInputs = [
+      cmake
+      addDriverRunpath
+    ];
+    buildInputs = [ cudatoolkit ];
+    postFixup = ''
+      for exe in $out/bin/*; do
+        addDriverRunpath $exe
+      done
+    '';
+    meta = {
+      description = "Examples of using CUDA libraries";
+      longDescription = ''
+        CUDA Library Samples contains examples demonstrating the use of
+        features in the math and image processing libraries cuBLAS, cuTENSOR,
+        cuSPARSE, cuSOLVER, cuFFT, cuRAND, NPP and nvJPEG.
+      '';
+      license = lib.licenses.bsd3;
+      platforms = [ "x86_64-linux" ];
+      maintainers = with lib.maintainers; [ obsidian-systems-maintenance ];
+      teams = [ lib.teams.cuda ];
+    };
+  });
+in
+
+{
+  cublas = base.overrideAttrs (
+    finalAttrs: _: {
+      pname = "cuda-library-samples-cublas";
+      sourceRoot = "${finalAttrs.src.name}/cuBLASLt";
+    }
+  );
+
+  cusolver = base.overrideAttrs (
+    finalAttrs: _: {
+      pname = "cuda-library-samples-cusolver";
+      sourceRoot = "${finalAttrs.src.name}/cuSOLVER/gesv";
+    }
+  );
+
+  cutensor = base.overrideAttrs (
+    finalAttrs: prevAttrs: {
+      pname = "cuda-library-samples-cutensor";
+
+      sourceRoot = "${finalAttrs.src.name}/cuTENSOR";
+
+      buildInputs = prevAttrs.buildInputs or [ ] ++ [ cutensor ];
+
+      cmakeFlags = prevAttrs.cmakeFlags or [ ] ++ [
+        "-DCUTENSOR_EXAMPLE_BINARY_INSTALL_DIR=${placeholder "out"}/bin"
+      ];
+
+      # CUTENSOR_ROOT is double escaped
+      postPatch = prevAttrs.postPatch or "" + ''
+        substituteInPlace CMakeLists.txt \
+          --replace-fail "\''${CUTENSOR_ROOT}/include" "${lib.getDev cutensor}/include"
+      '';
+
+      CUTENSOR_ROOT = cutensor;
+
+      meta = prevAttrs.meta or { } // {
+        broken = cutensor == null;
+      };
+    }
+  );
+
+  cusparselt = base.overrideAttrs (
+    finalAttrs: prevAttrs: {
+      pname = "cuda-library-samples-cusparselt";
+
+      sourceRoot = "${finalAttrs.src.name}/cuSPARSELt/matmul";
+
+      nativeBuildInputs = prevAttrs.nativeBuildInputs or [ ] ++ [
+        cmake
+        addDriverRunpath
+        (lib.getDev cusparselt)
+        (lib.getDev libcusparse)
+        cuda_nvcc
+        (lib.getDev cuda_cudart) # <cuda_runtime_api.h>
+        cuda_cccl # <nv/target>
+      ];
+
+      postPatch = prevAttrs.postPatch or "" + ''
+        substituteInPlace CMakeLists.txt \
+          --replace-fail "''${CUSPARSELT_ROOT}/lib64/libcusparseLt.so" "${lib.getLib cusparselt}/lib/libcusparseLt.so" \
+          --replace-fail "''${CUSPARSELT_ROOT}/lib64/libcusparseLt_static.a" "${lib.getStatic cusparselt}/lib/libcusparseLt_static.a"
+      '';
+
+      postInstall = prevAttrs.postInstall or "" + ''
+        mkdir -p $out/bin
+        cp matmul_example $out/bin/
+        cp matmul_example_static $out/bin/
+      '';
+
+      CUDA_TOOLKIT_PATH = lib.getLib cudatoolkit;
+      CUSPARSELT_PATH = lib.getLib cusparselt;
+
+      meta = prevAttrs.meta or { } // {
+        broken =
+          # Base dependencies
+          cusparselt == null
+          || libcusparse == null
+          || cuda_nvcc == null
+          || cuda_cudart == null
+          || cuda_cccl == null;
+      };
+    }
+  );
+}
--- a/pkgs/development/cuda-modules/cuda/extension.nix
+++ b/pkgs/development/cuda-modules/cuda/extension.nix
@@ -0,0 +1,70 @@
+{ cudaMajorMinorVersion, lib }:
+let
+  inherit (lib) attrsets modules trivial;
+  redistName = "cuda";
+
+  # Manifest files for CUDA redistributables (aka redist). These can be found at
+  # https://developer.download.nvidia.com/compute/cuda/redist/
+  # Maps a cuda version to the specific version of the manifest.
+  cudaVersionMap = {
+    "12.6" = "12.6.3";
+    "12.8" = "12.8.1";
+    "12.9" = "12.9.1";
+  };
+
+  # Check if the current CUDA version is supported.
+  cudaVersionMappingExists = builtins.hasAttr cudaMajorMinorVersion cudaVersionMap;
+
+  # fullCudaVersion : String
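+  # E.g. for cudaMajorMinorVersion "12.8" this is "12.8.1", which selects
+  # ./manifests/redistrib_12.8.1.json and ./manifests/feature_12.8.1.json below.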
+  fullCudaVersion = cudaVersionMap.${cudaMajorMinorVersion};
+
+  evaluatedModules = modules.evalModules {
+    modules = [
+      ../modules
+      # We need to nest the manifests in a config.cuda.manifests attribute so the
+      # module system can evaluate them.
+      {
+        cuda.manifests = {
+          redistrib = trivial.importJSON (./manifests + "/redistrib_${fullCudaVersion}.json");
+          feature = trivial.importJSON (./manifests + "/feature_${fullCudaVersion}.json");
+        };
+      }
+    ];
+  };
+
+  # We generally prefer the feature manifest over the redistrib manifest when enumerating attribute
+  # names: its package entries are keyed *only* by redist system names, whereas the redistrib manifest
+  # also contains keys such as `version`, `name`, `license`, and `license_path`.
+  featureManifest = evaluatedModules.config.cuda.manifests.feature;
+  redistribManifest = evaluatedModules.config.cuda.manifests.redistrib;
+
+  # Builder function which builds a single redist package for a given platform.
+  # buildRedistPackage : callPackage -> PackageName -> Derivation
+  buildRedistPackage =
+    callPackage: pname:
+    callPackage ../generic-builders/manifest.nix {
+      inherit pname redistName;
+      # We pass the whole release to the builder because it has logic to handle
+      # the case where we're trying to build on an unsupported platform.
+      redistribRelease = redistribManifest.${pname};
+      featureRelease = featureManifest.${pname};
+    };
+
+  # Build all the redist packages given final and prev.
+  redistPackages =
+    final: _prev:
+    # Wrap the whole thing in an optionalAttrs so we can return an empty set if the CUDA version
+    # is not supported.
+    # NOTE: We cannot include the call to optionalAttrs *in* the pipe as we would strictly evaluate the
+    # attrNames before we check if the CUDA version is supported.
+    attrsets.optionalAttrs cudaVersionMappingExists (
+      trivial.pipe featureManifest [
+        # Get all the package names
+        builtins.attrNames
+        # Build the redist packages
+        (trivial.flip attrsets.genAttrs (buildRedistPackage final.callPackage))
+      ]
+    );
+in
+redistPackages
--- a/pkgs/development/cuda-modules/cuda/manifests/feature_12.6.3.json
+++ b/pkgs/development/cuda-modules/cuda/manifests/feature_12.6.3.json
--- a/pkgs/development/cuda-modules/cuda/manifests/feature_12.8.1.json
+++ b/pkgs/development/cuda-modules/cuda/manifests/feature_12.8.1.json
--- a/pkgs/development/cuda-modules/cuda/manifests/feature_12.9.1.json
+++ b/pkgs/development/cuda-modules/cuda/manifests/feature_12.9.1.json
--- a/pkgs/development/cuda-modules/cuda/manifests/redistrib_12.6.3.json
+++ b/pkgs/development/cuda-modules/cuda/manifests/redistrib_12.6.3.json
--- a/pkgs/development/cuda-modules/cuda/manifests/redistrib_12.8.1.json
+++ b/pkgs/development/cuda-modules/cuda/manifests/redistrib_12.8.1.json
--- a/pkgs/development/cuda-modules/cuda/manifests/redistrib_12.9.1.json
+++ b/pkgs/development/cuda-modules/cuda/manifests/redistrib_12.9.1.json
--- a/pkgs/development/cuda-modules/cudatoolkit/redist-wrapper.nix
+++ b/pkgs/development/cuda-modules/cudatoolkit/redist-wrapper.nix
@@ -0,0 +1,80 @@
+{
+  lib,
+  symlinkJoin,
+  backendStdenv,
+  cudaMajorMinorVersion,
+  cuda_cccl ? null,
+  cuda_cudart ? null,
+  cuda_cuobjdump ? null,
+  cuda_cupti ? null,
+  cuda_cuxxfilt ? null,
+  cuda_gdb ? null,
+  cuda_nvcc ? null,
+  cuda_nvdisasm ? null,
+  cuda_nvml_dev ? null,
+  cuda_nvprune ? null,
+  cuda_nvrtc ? null,
+  cuda_nvtx ? null,
+  cuda_profiler_api ? null,
+  cuda_sanitizer_api ? null,
+  libcublas ? null,
+  libcufft ? null,
+  libcurand ? null,
+  libcusolver ? null,
+  libcusparse ? null,
+  libnpp ? null,
+}:
+
+let
+  getAllOutputs = p: [
+    (lib.getBin p)
+    (lib.getLib p)
+    (lib.getDev p)
+  ];
+  hostPackages = [
+    cuda_cuobjdump
+    cuda_gdb
+    cuda_nvcc
+    cuda_nvdisasm
+    cuda_nvprune
+  ];
+  targetPackages = [
+    cuda_cccl
+    cuda_cudart
+    cuda_cupti
+    cuda_cuxxfilt
+    cuda_nvml_dev
+    cuda_nvrtc
+    cuda_nvtx
+    cuda_profiler_api
+    cuda_sanitizer_api
+    libcublas
+    libcufft
+    libcurand
+    libcusolver
+    libcusparse
+    libnpp
+  ];
+
+  # This assumes we put `cudatoolkit` in `buildInputs` instead of `nativeBuildInputs`:
+  allPackages = (map (p: p.__spliced.buildHost or p) hostPackages) ++ targetPackages;
+in
+symlinkJoin rec {
+  name = "cuda-merged-${cudaMajorMinorVersion}";
+  version = cudaMajorMinorVersion;
+
+  paths = builtins.concatMap getAllOutputs allPackages;
+
+  passthru = {
+    cc = lib.warn "cudaPackages.cudatoolkit is deprecated, refer to the manual and use splayed packages instead" backendStdenv.cc;
+    lib = symlinkJoin {
+      inherit name;
+      paths = map (p: lib.getLib p) allPackages;
+    };
+  };
+
+  meta = with lib; {
+    description = "Wrapper substituting the deprecated runfile-based CUDA installation";
+    license = licenses.nvidiaCuda;
+  };
+}
--- a/pkgs/development/cuda-modules/cudnn/releases.nix
+++ b/pkgs/development/cuda-modules/cudnn/releases.nix
@@ -0,0 +1,112 @@
+# NOTE: Check the following URLs for support matrices:
+#       v8 -> https://docs.nvidia.com/deeplearning/cudnn/archives/index.html
+#       v9 -> https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/reference/support-matrix.html
+# Version policy is to keep the latest minor release for each major release.
+# Releases are available at https://developer.download.nvidia.com/compute/cudnn/redist/
+{
+  cudnn.releases = {
+    # jetson
+    linux-aarch64 = [
+      {
+        version = "8.9.5.30";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-aarch64/cudnn-linux-aarch64-8.9.5.30_cuda12-archive.tar.xz";
+        hash = "sha256-BJH3sC9VwiB362eL8xTB+RdSS9UHz1tlgjm/mKRyM6E=";
+      }
+      {
+        version = "9.7.1.26";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-aarch64/cudnn-linux-aarch64-9.7.1.26_cuda12-archive.tar.xz";
+        hash = "sha256-jDPWAXKOiJYpblPwg5FUSh7F0Dgg59LLnd+pX9y7r1w=";
+      }
+      {
+        version = "9.8.0.87";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-aarch64/cudnn-linux-aarch64-9.8.0.87_cuda12-archive.tar.xz";
+        hash = "sha256-8D7OP/B9FxnwYhiXOoeXzsG+OHzDF7qrW7EY3JiBmec=";
+      }
+    ];
+    # powerpc
+    linux-ppc64le = [ ];
+    # server-grade arm
+    linux-sbsa = [
+      {
+        version = "8.9.7.29";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.7.29_cuda12-archive.tar.xz";
+        hash = "sha256-6Yt8gAEHheXVygHuTOm1sMjHNYfqb4ZIvjTT+NHUe9E=";
+      }
+      {
+        version = "9.3.0.75";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.6";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.3.0.75_cuda12-archive.tar.xz";
+        hash = "sha256-Eibdm5iciYY4VSlj0ACjz7uKCgy5uvjLCear137X1jk=";
+      }
+      {
+        version = "9.7.1.26";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.7.1.26_cuda12-archive.tar.xz";
+        hash = "sha256-koJFUKlesnWwbJCZhBDhLOBRQOBQjwkFZExlTJ7Xp2Q=";
+      }
+      {
+        version = "9.8.0.87";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.8.0.87_cuda12-archive.tar.xz";
+        hash = "sha256-IvYvR08MuzW+9UCtsdhB2mPJzT33azxOQwEPQ2ss2Fw=";
+      }
+      {
+        version = "9.11.0.98";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.9";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.11.0.98_cuda12-archive.tar.xz";
+        hash = "sha256-X81kUdiKnTt/rLwASB+l4rsV8sptxvhuCysgG8QuzVY=";
+      }
+    ];
+    # x86_64
+    linux-x86_64 = [
+      {
+        version = "8.9.7.29";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz";
+        hash = "sha256-R1MzYlx+QqevPKCy91BqEG4wyTsaoAgc2cE++24h47s=";
+      }
+      {
+        version = "9.3.0.75";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.6";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.3.0.75_cuda12-archive.tar.xz";
+        hash = "sha256-PW7xCqBtyTOaR34rBX4IX/hQC73ueeQsfhNlXJ7/LCY=";
+      }
+      {
+        version = "9.7.1.26";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.7.1.26_cuda12-archive.tar.xz";
+        hash = "sha256-EJpeXGvN9Dlub2Pz+GLtLc8W7pPuA03HBKGxG98AwLE=";
+      }
+      {
+        version = "9.8.0.87";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.8.0.87_cuda12-archive.tar.xz";
+        hash = "sha256-MhubM7sSh0BNk9VnLTUvFv6rxLIgrGrguG5LJ/JX3PQ=";
+      }
+      {
+        version = "9.11.0.98";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.9";
+        url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.11.0.98_cuda12-archive.tar.xz";
+        hash = "sha256-tgyPrQH6FSHS5x7TiIe5BHjX8Hs9pJ/WirEYqf7k2kg=";
+      }
+    ];
+  };
+}
--- a/pkgs/development/cuda-modules/cudnn/shims.nix
+++ b/pkgs/development/cuda-modules/cudnn/shims.nix
@@ -0,0 +1,21 @@
+# Shims to mimic the shape of ../modules/generic/manifests/{feature,redistrib}/release.nix
+{
+  package,
+  # redistSystem :: String
+  # String is "unsupported" if the given architecture is unsupported.
+  redistSystem,
+}:
+{
+  featureRelease = {
+    inherit (package) minCudaVersion maxCudaVersion;
+    ${redistSystem}.outputs = {
+      lib = true;
+      static = true;
+      dev = true;
+    };
+  };
+  redistribRelease = {
+    name = "NVIDIA CUDA Deep Neural Network library (cuDNN)";
+    inherit (package) hash url version;
+  };
+}
--- a/pkgs/development/cuda-modules/cusparselt/extension.nix
+++ b/pkgs/development/cuda-modules/cusparselt/extension.nix
@@ -0,0 +1,96 @@
+# Support matrix can be found at
+# https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-880/support-matrix/index.html
+{
+  cudaLib,
+  lib,
+  redistSystem,
+}:
+let
+  inherit (lib)
+    attrsets
+    lists
+    modules
+    trivial
+    ;
+
+  redistName = "cusparselt";
+  pname = "libcusparse_lt";
+
+  cusparseltVersions = [
+    "0.7.1"
+  ];
+
+  # Manifests :: { redistrib, feature }
+
+  # Each release of cusparselt gets mapped to an evaluated module for that release.
+  # From there, we can get the min/max CUDA versions supported by that release.
+  # listOfManifests :: List Manifests
+  listOfManifests =
+    let
+      configEvaluator =
+        fullCusparseltVersion:
+        modules.evalModules {
+          modules = [
+            ../modules
+            # We need to nest the manifests in a config.cusparselt.manifests attribute so the
+            # module system can evaluate them.
+            {
+              cusparselt.manifests = {
+                redistrib = trivial.importJSON (./manifests + "/redistrib_${fullCusparseltVersion}.json");
+                feature = trivial.importJSON (./manifests + "/feature_${fullCusparseltVersion}.json");
+              };
+            }
+          ];
+        };
+      # Un-nest the manifests attribute set.
+      releaseGrabber = evaluatedModules: evaluatedModules.config.cusparselt.manifests;
+    in
+    lists.map (trivial.flip trivial.pipe [
+      configEvaluator
+      releaseGrabber
+    ]) cusparseltVersions;
+
+  # platformIsSupported :: Manifests -> Boolean
+  platformIsSupported =
+    { feature, redistrib, ... }:
+    (attrsets.attrByPath [
+      pname
+      redistSystem
+    ] null feature) != null;
+
+  # TODO(@connorbaker): With an auxiliary file keeping track of the CUDA versions each release supports,
+  # we could filter out releases that don't support our CUDA version.
+  # However, we don't have that currently, so we make a best effort to build cuSPARSELt with whatever
+  # libPath corresponds to our CUDA version.
+  # supportedManifests :: List Manifests
+  supportedManifests = builtins.filter platformIsSupported listOfManifests;
+
+  # Compute the versioned attribute name to use in this package set.
+  # Patch version changes should not break the build, so we use only the major and minor components.
+  # computeName :: RedistribRelease -> String
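+  # E.g. the 0.7.1 release (redistrib version "0.7.1.0") yields "cusparselt_0_7".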
+  computeName =
+    { version, ... }: cudaLib.mkVersionedName redistName (lib.versions.majorMinor version);
+in
+final: _:
+let
+  # buildCusparseltPackage :: Manifests -> { name :: String, value :: Derivation }
+  buildCusparseltPackage =
+    { redistrib, feature }:
+    let
+      drv = final.callPackage ../generic-builders/manifest.nix {
+        inherit pname redistName;
+        redistribRelease = redistrib.${pname};
+        featureRelease = feature.${pname};
+      };
+    in
+    attrsets.nameValuePair (computeName redistrib.${pname}) drv;
+
+  extension =
+    let
+      nameOfNewest = computeName (lists.last supportedManifests).redistrib.${pname};
+      drvs = builtins.listToAttrs (lists.map buildCusparseltPackage supportedManifests);
+      containsDefault = attrsets.optionalAttrs (drvs != { }) { cusparselt = drvs.${nameOfNewest}; };
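+      # E.g. with only the 0.7.1 manifest and a supported redistSystem such as linux-x86_64,
+      # this extension evaluates to { cusparselt_0_7 = <drv>; cusparselt = <the same drv>; }.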
+    in
+    drvs // containsDefault;
+in
+extension
--- a/pkgs/development/cuda-modules/cusparselt/manifests/feature_0.7.1.json
+++ b/pkgs/development/cuda-modules/cusparselt/manifests/feature_0.7.1.json
@@ -0,0 +1,44 @@
+{
+  "libcusparse_lt": {
+    "linux-aarch64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "linux-sbsa": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "linux-x86_64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "windows-x86_64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": false,
+        "sample": false,
+        "static": false
+      }
+    }
+  }
+}
--- a/pkgs/development/cuda-modules/cusparselt/manifests/redistrib_0.7.1.json
+++ b/pkgs/development/cuda-modules/cusparselt/manifests/redistrib_0.7.1.json
@@ -0,0 +1,35 @@
+{
+    "release_date": "2025-02-25",
+    "release_label": "0.7.1",
+    "release_product": "cusparselt",
+    "libcusparse_lt": {
+        "name": "NVIDIA cuSPARSELt",
+        "license": "cuSPARSELt",
+        "license_path": "libcusparse_lt/LICENSE.txt",
+        "version": "0.7.1.0",
+        "linux-x86_64": {
+            "relative_path": "libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.7.1.0-archive.tar.xz",
+            "sha256": "a0d885837887c73e466a31b4e86aaae2b7d0cc9c5de0d40921dbe2a15dbd6a88",
+            "md5": "b2e5f3c9b9d69e1e0b55b16de33fdc6e",
+            "size": "353151840"
+        },
+        "linux-sbsa": {
+            "relative_path": "libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.7.1.0-archive.tar.xz",
+            "sha256": "4a131d0a54728e53ba536b50bb65380603456f1656e7df8ee52e285618a0b57c",
+            "md5": "612a712c7da6e801ee773687e99af87e",
+            "size": "352406784"
+        },
+        "windows-x86_64": {
+            "relative_path": "libcusparse_lt/windows-x86_64/libcusparse_lt-windows-x86_64-0.7.1.0-archive.zip",
+            "sha256": "004bcb1b700c24ca8d60a8ddd2124640f61138a6c29914d2afaa0bfa0d0e3cf2",
+            "md5": "a1d8df8dc8ff4b3bd0e859f992f8f392",
+            "size": "268594665"
+        },
+        "linux-aarch64": {
+            "relative_path": "libcusparse_lt/linux-aarch64/libcusparse_lt-linux-aarch64-0.7.1.0-archive.tar.xz",
+            "sha256": "d3b0a660fd552e0bd9a4491b15299d968674833483d5f164cfea35e70646136c",
+            "md5": "54e3f3b28c94118991ce54ec38f531fb",
+            "size": "5494380"
+        }
+    }
+}
--- a/pkgs/development/cuda-modules/cutensor/extension.nix
+++ b/pkgs/development/cuda-modules/cutensor/extension.nix
@@ -0,0 +1,124 @@
+# Support matrix can be found at
+# https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-880/support-matrix/index.html
+#
+# TODO(@connorbaker):
+# This is a very similar strategy to CUDA/CUDNN:
+#
+# - Get all versions supported by the current release of CUDA
+# - Build all of them
+# - Make the newest the default
+#
+# Unique twists:
+#
+# - Instead of providing different releases for each version of CUDA, cuTENSOR ships multiple subdirectories in `lib`
+#   -- one for each version of CUDA.
+{
+  cudaLib,
+  cudaMajorMinorVersion,
+  lib,
+  redistSystem,
+}:
+let
+  inherit (lib)
+    attrsets
+    lists
+    modules
+    versions
+    trivial
+    ;
+
+  redistName = "cutensor";
+  pname = "libcutensor";
+
+  cutensorVersions = [
+    "2.0.2"
+    "2.1.0"
+  ];
+
+  # Manifests :: { redistrib, feature }
+
+  # Each release of cutensor gets mapped to an evaluated module for that release.
+  # From there, we can get the min/max CUDA versions supported by that release.
+  # listOfManifests :: List Manifests
+  listOfManifests =
+    let
+      configEvaluator =
+        fullCutensorVersion:
+        modules.evalModules {
+          modules = [
+            ../modules
+            # We need to nest the manifests in a config.cutensor.manifests attribute so the
+            # module system can evaluate them.
+            {
+              cutensor.manifests = {
+                redistrib = trivial.importJSON (./manifests + "/redistrib_${fullCutensorVersion}.json");
+                feature = trivial.importJSON (./manifests + "/feature_${fullCutensorVersion}.json");
+              };
+            }
+          ];
+        };
+      # Un-nest the manifests attribute set.
+      releaseGrabber = evaluatedModules: evaluatedModules.config.cutensor.manifests;
+    in
+    lists.map (trivial.flip trivial.pipe [
+      configEvaluator
+      releaseGrabber
+    ]) cutensorVersions;
+
+  # Our cudaMajorMinorVersion tells us which version of CUDA we're building against.
+  # The subdirectories in lib/ tell us which versions of CUDA are supported.
+  # Typically the names will look like this:
+  #
+  # - 11
+  # - 12
+
+  # libPath :: String
+  libPath = versions.major cudaMajorMinorVersion;
+
+  # A release is supported if it has a libPath that matches our CUDA version for our platform.
+  # The available libPaths are not constant within a single release -- one platform may support
+  # fewer CUDA versions than another.
+  # platformIsSupported :: Manifests -> Boolean
+  platformIsSupported =
+    { feature, redistrib, ... }:
+    (attrsets.attrByPath [
+      pname
+      redistSystem
+    ] null feature) != null;
+
+  # TODO(@connorbaker): With an auxiliary file keeping track of the CUDA versions each release supports,
+  # we could filter out releases that don't support our CUDA version.
+  # However, we don't have that currently, so we make a best effort to build cuTENSOR with whatever
+  # libPath corresponds to our CUDA version.
+  # supportedManifests :: List Manifests
+  supportedManifests = builtins.filter platformIsSupported listOfManifests;
+
+  # Compute versioned attribute name to be used in this package set
+  # Patch version changes should not break the build, so we only use major and minor
+  # computeName :: RedistribRelease -> String
+  computeName =
+    { version, ... }: cudaLib.mkVersionedName redistName (lib.versions.majorMinor version);
+in
+final: _:
+let
+  # buildCutensorPackage :: Manifests -> AttrSet Derivation
+  buildCutensorPackage =
+    { redistrib, feature }:
+    let
+      drv = final.callPackage ../generic-builders/manifest.nix {
+        inherit pname redistName libPath;
+        redistribRelease = redistrib.${pname};
+        featureRelease = feature.${pname};
+      };
+    in
+    attrsets.nameValuePair (computeName redistrib.${pname}) drv;
+
+  extension =
+    let
+      nameOfNewest = computeName (lists.last supportedManifests).redistrib.${pname};
+      drvs = builtins.listToAttrs (lists.map buildCutensorPackage supportedManifests);
+      containsDefault = attrsets.optionalAttrs (drvs != { }) { cutensor = drvs.${nameOfNewest}; };
+    in
+    drvs // containsDefault;
+in
+extension
--- a/pkgs/development/cuda-modules/cutensor/manifests/feature_2.0.2.json
+++ b/pkgs/development/cuda-modules/cutensor/manifests/feature_2.0.2.json
@@ -0,0 +1,44 @@
+{
+  "libcutensor": {
+    "linux-ppc64le": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "linux-sbsa": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "linux-x86_64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "windows-x86_64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": false,
+        "sample": false,
+        "static": false
+      }
+    }
+  }
+}
--- a/pkgs/development/cuda-modules/cutensor/manifests/feature_2.1.0.json
+++ b/pkgs/development/cuda-modules/cutensor/manifests/feature_2.1.0.json
@@ -0,0 +1,34 @@
+{
+  "libcutensor": {
+    "linux-sbsa": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "linux-x86_64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": true,
+        "sample": false,
+        "static": true
+      }
+    },
+    "windows-x86_64": {
+      "outputs": {
+        "bin": false,
+        "dev": true,
+        "doc": false,
+        "lib": false,
+        "sample": false,
+        "static": false
+      }
+    }
+  }
+}
--- a/pkgs/development/cuda-modules/cutensor/manifests/redistrib_2.0.2.json
+++ b/pkgs/development/cuda-modules/cutensor/manifests/redistrib_2.0.2.json
@@ -0,0 +1,35 @@
+{
+    "release_date": "2024-06-24",
+    "release_label": "2.0.2",
+    "release_product": "cutensor",
+    "libcutensor": {
+        "name": "NVIDIA cuTENSOR",
+        "license": "cuTensor",
+        "license_path": "libcutensor/LICENSE.txt",
+        "version": "2.0.2.4",
+        "linux-x86_64": {
+            "relative_path": "libcutensor/linux-x86_64/libcutensor-linux-x86_64-2.0.2.4-archive.tar.xz",
+            "sha256": "957b04ef6343aca404fe5f4a3f1f1d3ac0bd04ceb3acecc93e53f4d63bd91157",
+            "md5": "2b994ecba434e69ee55043cf353e05b4",
+            "size": "545271628"
+        },
+        "linux-ppc64le": {
+            "relative_path": "libcutensor/linux-ppc64le/libcutensor-linux-ppc64le-2.0.2.4-archive.tar.xz",
+            "sha256": "db2c05e231a26fb5efee470e1d8e11cb1187bfe0726b665b87cbbb62a9901ba0",
+            "md5": "6b00e29407452333946744c4084157e8",
+            "size": "543070992"
+        },
+        "linux-sbsa": {
+            "relative_path": "libcutensor/linux-sbsa/libcutensor-linux-sbsa-2.0.2.4-archive.tar.xz",
+            "sha256": "9712b54aa0988074146867f9b6f757bf11a61996f3b58b21e994e920b272301b",
+            "md5": "c9bb31a92626a092d0c7152b8b3eaa18",
+            "size": "540299376"
+        },
+        "windows-x86_64": {
+            "relative_path": "libcutensor/windows-x86_64/libcutensor-windows-x86_64-2.0.2.4-archive.zip",
+            "sha256": "ab2fca16d410863d14f2716cec0d07fb21d20ecd24ee47d309e9970c9c01ed4a",
+            "md5": "f6cfdb29a9a421a1ee4df674dd54028c",
+            "size": "921154033"
+        }
+    }
+}
--- a/pkgs/development/cuda-modules/cutensor/manifests/redistrib_2.1.0.json
+++ b/pkgs/development/cuda-modules/cutensor/manifests/redistrib_2.1.0.json
@@ -0,0 +1,29 @@
+{
+    "release_date": "2025-01-27",
+    "release_label": "2.1.0",
+    "release_product": "cutensor",
+    "libcutensor": {
+        "name": "NVIDIA cuTENSOR",
+        "license": "cuTensor",
+        "license_path": "libcutensor/LICENSE.txt",
+        "version": "2.1.0.9",
+        "linux-x86_64": {
+            "relative_path": "libcutensor/linux-x86_64/libcutensor-linux-x86_64-2.1.0.9-archive.tar.xz",
+            "sha256": "ee59fcb4e8d59fc0d8cebf5f7f23bf2a196a76e6bcdcaa621aedbdcabd20a759",
+            "md5": "ed15120c512dfb3e32b49103850bb9dd",
+            "size": "814871140"
+        },
+        "linux-sbsa": {
+            "relative_path": "libcutensor/linux-sbsa/libcutensor-linux-sbsa-2.1.0.9-archive.tar.xz",
+            "sha256": "cef7819c4ecf3120d4f99b08463b8db1a8591be25147d1688371024885b1d2f0",
+            "md5": "fec00a1a825a05c0166eda6625dc587d",
+            "size": "782008004"
+        },
+        "windows-x86_64": {
+            "relative_path": "libcutensor/windows-x86_64/libcutensor-windows-x86_64-2.1.0.9-archive.zip",
+            "sha256": "ed835ba7fd617000f77e1dff87403d123edf540bd99339e3da2eaab9d32a4040",
+            "md5": "9efcbc0c9c372b0e71e11d4487aa5ffa",
+            "size": "1514752712"
+        }
+    }
+}
--- a/pkgs/development/cuda-modules/generic-builders/manifest.nix
+++ b/pkgs/development/cuda-modules/generic-builders/manifest.nix
@@ -0,0 +1,356 @@
+{
+  # General callPackage-supplied arguments
+  autoAddDriverRunpath,
+  autoAddCudaCompatRunpath,
+  autoPatchelfHook,
+  backendStdenv,
+  callPackage,
+  _cuda,
+  fetchurl,
+  lib,
+  markForCudatoolkitRootHook,
+  flags,
+  stdenv,
+  # Builder-specific arguments
+  # Short package name (e.g., "cuda_cccl")
+  # pname : String
+  pname,
+  # Common name (e.g., "cutensor" or "cudnn") -- used in the URL.
+  # Also known as the Redistributable Name.
+  # redistName : String,
+  redistName,
+  # If libPath is non-null, it must be a subdirectory of `lib`.
+  # The contents of `libPath` will be moved to the root of `lib`.
+  libPath ? null,
+  # See ./modules/generic/manifests/redistrib/release.nix
+  redistribRelease,
+  # See ./modules/generic/manifests/feature/release.nix
+  featureRelease,
+  cudaMajorMinorVersion,
+}:
+let
+  inherit (lib)
+    attrsets
+    lists
+    strings
+    trivial
+    licenses
+    teams
+    sourceTypes
+    ;
+
+  inherit (stdenv) hostPlatform;
+
+  # As the last step before returning control to `callPackage` (which adds the `.override` method),
+  # we apply the necessary package-specific "fixup" function with `overrideAttrs`.
+  # Order is significant.
+  maybeFixup = _cuda.fixups.${pname} or null;
+  fixup = if maybeFixup != null then callPackage maybeFixup { } else { };
+
+  # Get the redist systems for which the package provides redistributables.
+  # These are used by meta.platforms.
+  supportedRedistSystems = builtins.attrNames featureRelease;
+  # redistSystem :: String
+  # The redistSystem is the name of the system for which the redistributable is built.
+  # It is `"unsupported"` if the redistributable is not supported on the target system.
+  redistSystem = _cuda.lib.getRedistSystem backendStdenv.hasJetsonCudaCapability hostPlatform.system;
+
+  sourceMatchesHost = lib.elem hostPlatform.system (_cuda.lib.getNixSystems redistSystem);
+in
+(backendStdenv.mkDerivation (finalAttrs: {
+  # NOTE: Even though there's no actual buildPhase here, the derivations of the redistributables are
+  # still sensitive to the compiler flags provided to stdenv: we depend on the patchelf package, which
+  # is itself sensitive to those flags.
+  inherit pname;
+  inherit (redistribRelease) version;
+
+  # Don't force serialization to string for structured attributes, like outputToPatterns
+  # and brokenConditions.
+  # Avoids "set cannot be coerced to string" errors.
+  __structuredAttrs = true;
+
+  # Keep better track of dependencies.
+  strictDeps = true;
+
+  # NOTE: Outputs are evaluated jointly with meta, so in the case that this is an unsupported platform,
+  # we still need to provide a list of outputs.
+  outputs =
+    let
+      # Checks whether the redistributable provides an output.
+      hasOutput =
+        output:
+        attrsets.attrByPath [
+          redistSystem
+          "outputs"
+          output
+        ] false featureRelease;
+      # Order is important here so we use a list.
+      possibleOutputs = [
+        "bin"
+        "lib"
+        "static"
+        "dev"
+        "doc"
+        "sample"
+        "python"
+      ];
+      # Filter out outputs that don't exist in the redistributable.
+      # NOTE: In the case the redistributable isn't supported on the target platform,
+      # we will have `outputs = [ "out" ] ++ possibleOutputs`. This is of note because platforms which
+      # aren't supported would otherwise have evaluation errors when trying to access outputs other than `out`.
+      # The alternative would be to have `outputs = [ "out" ]` when `redistSystem = "unsupported"`, but that would
+      # require adding guards throughout the entirety of the CUDA package set to ensure `cudaSupport` is true --
+      # recall that OfBorg will evaluate packages marked as broken and that `cudaPackages` will be evaluated with
+      # `cudaSupport = false`!
+      additionalOutputs =
+        if redistSystem == "unsupported" then
+          possibleOutputs
+        else
+          builtins.filter hasOutput possibleOutputs;
+      # The out output is special -- it's the default output and we always include it.
+      outputs = [ "out" ] ++ additionalOutputs;
+    in
+    outputs;
+
+  # Traversed in the order of the outputs specified in outputs;
+  # entries are skipped if they don't exist in outputs.
+  outputToPatterns = {
+    bin = [ "bin" ];
+    dev = [
+      "share/pkgconfig"
+      "**/*.pc"
+      "**/*.cmake"
+    ];
+    lib = [
+      "lib"
+      "lib64"
+    ];
+    static = [ "**/*.a" ];
+    sample = [ "samples" ];
+    python = [ "**/*.whl" ];
+  };
+
+  # Useful for introspecting why something went wrong. Maps descriptions of why the derivation would be marked as
+  # broken or have badPlatforms include the current platform.
+
+  # brokenConditions :: AttrSet Bool
+  # Sets `meta.broken = true` if any of the conditions are true.
+  # Example: Broken on a specific version of CUDA or when a dependency has a specific version.
+  brokenConditions = {
+    # Unclear how this is handled by Nix internals.
+    "Duplicate entries in outputs" = finalAttrs.outputs != lists.unique finalAttrs.outputs;
+    # Typically this results in the static output being empty, as all libraries are moved
+    # back to the lib output.
+    "lib output follows static output" =
+      let
+        libIndex = lists.findFirstIndex (x: x == "lib") null finalAttrs.outputs;
+        staticIndex = lists.findFirstIndex (x: x == "static") null finalAttrs.outputs;
+      in
+      libIndex != null && staticIndex != null && libIndex > staticIndex;
+  };
+
+  # badPlatformsConditions :: AttrSet Bool
+  # Sets `meta.badPlatforms = meta.platforms` if any of the conditions are true.
+  # Example: Broken on a specific architecture when some condition is met (like targeting Jetson).
+  badPlatformsConditions = {
+    "No source" = !sourceMatchesHost;
+  };
+
+  # src :: Optional Derivation
+  # If redistSystem doesn't exist in redistribRelease, return null.
+  src = trivial.mapNullable (
+    { relative_path, sha256, ... }:
+    fetchurl {
+      url = "https://developer.download.nvidia.com/compute/${redistName}/redist/${relative_path}";
+      inherit sha256;
+    }
+  ) (redistribRelease.${redistSystem} or null);
+
+  postPatch =
+    # Pkg-config's setup hook expects configuration files in $out/share/pkgconfig
+    ''
+      for path in pkg-config pkgconfig; do
+        [[ -d "$path" ]] || continue
+        mkdir -p share/pkgconfig
+        mv "$path"/* share/pkgconfig/
+        rmdir "$path"
+      done
+    ''
+    # Rewrite FHS paths with store paths
+    # NOTE: output* fall back to out if the corresponding output isn't defined.
+    + ''
+      for pc in share/pkgconfig/*.pc; do
+        sed -i \
+          -e "s|^cudaroot\s*=.*\$|cudaroot=''${!outputDev}|" \
+          -e "s|^libdir\s*=.*/lib\$|libdir=''${!outputLib}/lib|" \
+          -e "s|^includedir\s*=.*/include\$|includedir=''${!outputDev}/include|" \
+          "$pc"
+      done
+    ''
+    # Generate unversioned names.
+    # E.g. cuda-11.8.pc -> cuda.pc
+    + ''
+      for pc in share/pkgconfig/*-"$majorMinorVersion.pc"; do
+        ln -s "$(basename "$pc")" "''${pc%-$majorMinorVersion.pc}".pc
+      done
+    '';
+
+  env.majorMinorVersion = cudaMajorMinorVersion;
+
+  # We do need some other phases, like configurePhase, so the multiple-output setup hook works.
+  dontBuild = true;
+
+  nativeBuildInputs = [
+    autoPatchelfHook
+    # This hook will make sure libcuda can be found
+    # (typically in /run/opengl-driver/lib) by adding that
+    # directory to the rpath of all ELF binaries.
+    # Check e.g. with `patchelf --print-rpath path/to/my/binary`
+    autoAddDriverRunpath
+    markForCudatoolkitRootHook
+  ]
+  # autoAddCudaCompatRunpath depends on cuda_compat and would cause
+  # infinite recursion if applied to `cuda_compat` itself (beside the fact
+  # that it doesn't make sense in the first place)
+  ++ lib.optionals (pname != "cuda_compat" && flags.isJetsonBuild) [
+    # autoAddCudaCompatRunpath must appear AFTER autoAddDriverRunpath.
+    # See its documentation in ./setup-hooks/extension.nix.
+    autoAddCudaCompatRunpath
+  ];
+
+  buildInputs = [
+    # autoPatchelfHook will search for a libstdc++ and we're giving it
+    # one that is compatible with the rest of nixpkgs, even when
+    # nvcc forces us to use an older gcc
+    # NB: We don't actually know if this is the right thing to do
+    (lib.getLib stdenv.cc.cc)
+  ];
+
+  # Picked up by autoPatchelf
+  # Needed e.g. for libnvrtc to locate (dlopen) libnvrtc-builtins
+  appendRunpaths = [ "$ORIGIN" ];
+
+  # NOTE: We don't need to check for dev or doc, because those outputs are handled by
+  # the multiple-outputs setup hook.
+  # NOTE: moveToOutput operates on all outputs:
+  # https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L105-L107
+  installPhase =
+    let
+      mkMoveToOutputCommand =
+        output:
+        let
+          template = pattern: ''moveToOutput "${pattern}" "${"$" + output}"'';
+          patterns = finalAttrs.outputToPatterns.${output} or [ ];
+        in
+        strings.concatMapStringsSep "\n" template patterns;
+    in
+    # Pre-install hook
+    ''
+      runHook preInstall
+    ''
+    # Handle the existence of libPath, which requires us to re-arrange the lib directory
+    + strings.optionalString (libPath != null) ''
+      full_lib_path="lib/${libPath}"
+      if [[ ! -d "$full_lib_path" ]]; then
+        echo "${finalAttrs.pname}: '$full_lib_path' does not exist, only found:" >&2
+        find lib/ -mindepth 1 -maxdepth 1 >&2
+        echo "This release might not support your CUDA version" >&2
+        exit 1
+      fi
+      echo "Making libPath '$full_lib_path' the root of lib" >&2
+      mv "$full_lib_path" lib_new
+      rm -r lib
+      mv lib_new lib
+    ''
+    # Create the primary output, out, and move everything into it.
+    + ''
+      mkdir -p "$out"
+      mv * "$out"
+    ''
+    # Move the outputs into their respective outputs.
+    + strings.concatMapStringsSep "\n" mkMoveToOutputCommand (builtins.tail finalAttrs.outputs)
+    # Add a newline to the end of the installPhase, so that the post-install hook doesn't
+    # get concatenated with the last moveToOutput command.
+    + "\n"
+    # Post-install hook
+    + ''
+      runHook postInstall
+    '';
+
+  doInstallCheck = true;
+  allowFHSReferences = true; # TODO: Default to `false`
+  postInstallCheck = ''
+    echo "Executing postInstallCheck"
+
+    if [[ -z "''${allowFHSReferences-}" ]]; then
+      mapfile -t outputPaths < <(for o in $(getAllOutputNames); do echo "''${!o}"; done)
+      if grep --max-count=5 --recursive --exclude=LICENSE /usr/ "''${outputPaths[@]}"; then
+        echo "Detected references to /usr" >&2
+        exit 1
+      fi
+    fi
+  '';
+
+  # libcuda needs to be resolved during runtime
+  autoPatchelfIgnoreMissingDeps = [
+    "libcuda.so"
+    "libcuda.so.*"
+  ];
+
+  # _multioutPropagateDev() currently expects a space-separated string rather than an array
+  preFixup = ''
+    export propagatedBuildOutputs="''${propagatedBuildOutputs[@]}"
+  '';
+
+  # Propagate all outputs other than `dev`, including `static`
+  propagatedBuildOutputs = builtins.filter (x: x != "dev") finalAttrs.outputs;
+
+  # Kept in case overrides assume postPhases have already been defined
+  postPhases = [ "postPatchelf" ];
+  postPatchelf = ''
+    true
+  '';
+
+  passthru = {
+    # Provide access to the release information for fixup functions.
+    inherit redistribRelease featureRelease;
+    # Make the CUDA-patched stdenv available
+    stdenv = backendStdenv;
+  };
+
+  meta = {
+    description = "${redistribRelease.name}. By downloading and using the packages you accept the terms and conditions of the ${finalAttrs.meta.license.shortName}";
+    sourceProvenance = [ sourceTypes.binaryNativeCode ];
+    broken = lists.any trivial.id (attrsets.attrValues finalAttrs.brokenConditions);
+    platforms = trivial.pipe supportedRedistSystems [
+      # Map each redist system to the equivalent nix systems.
+      (lib.concatMap _cuda.lib.getNixSystems)
+      # Take all the unique values.
+      lib.unique
+      # Sort the list.
+      lib.naturalSort
+    ];
+    badPlatforms =
+      let
+        isBadPlatform = lists.any trivial.id (attrsets.attrValues finalAttrs.badPlatformsConditions);
+      in
+      lists.optionals isBadPlatform finalAttrs.meta.platforms;
+    license =
+      if redistName == "cuda" then
+        # Add the package-specific license.
+        let
+          licensePath =
+            if redistribRelease.license_path != null then
+              redistribRelease.license_path
+            else
+              "${pname}/LICENSE.txt";
+          url = "https://developer.download.nvidia.com/compute/cuda/redist/${licensePath}";
+        in
+        lib.licenses.nvidiaCudaRedist // { inherit url; }
+      else
+        licenses.unfree;
+    teams = [ teams.cuda ];
+  };
+})).overrideAttrs
+  fixup
--- a/pkgs/development/cuda-modules/generic-builders/multiplex.nix
+++ b/pkgs/development/cuda-modules/generic-builders/multiplex.nix
@@ -0,0 +1,130 @@
+{
+  lib,
+  cudaLib,
+  cudaMajorMinorVersion,
+  redistSystem,
+  stdenv,
+  # Builder-specific arguments
+  # Short package name (e.g., "cuda_cccl")
+  # pname : String
+  pname,
+  # Common name (e.g., "cutensor" or "cudnn") -- used in the URL.
+  # Also known as the Redistributable Name.
+  # redistName : String,
+  redistName,
+  # releasesModule :: Path
+  # A path to a module which provides a `releases` attribute
+  releasesModule,
+  # shimsFn :: {package, redistSystem} -> AttrSet
+  # A callPackage-able function (or a path to one) that produces the `redistribRelease` and
+  # `featureRelease` attributes passed to ./manifest.nix.
+  # The redistribRelease is only used in ./manifest.nix for the package version
+  # and the package description (which NVIDIA's manifest calls the "name").
+  # It's also used for fetching the source, but we override that since we can't
+  # re-use that portion of the functionality (different URLs, etc.).
+  # The featureRelease is used to populate meta.platforms (by way of looking at the attribute names), determine the
+  # outputs of the package, and provide additional package-specific constraints (e.g., min/max supported CUDA versions,
+  # required versions of other packages, etc.).
+  shimsFn ? (throw "shimsFn must be provided"),
+}:
+let
+  evaluatedModules = lib.modules.evalModules {
+    modules = [
+      ../modules
+      releasesModule
+    ];
+  };
+
+  # NOTE: Important types:
+  # - Releases: ../modules/${pname}/releases/releases.nix
+  # - Package: ../modules/${pname}/releases/package.nix
+
+  # Check whether a package supports our CUDA version.
+  # satisfiesCudaVersion :: Package -> Bool
+  satisfiesCudaVersion =
+    package:
+    lib.versionAtLeast cudaMajorMinorVersion package.minCudaVersion
+    && lib.versionAtLeast package.maxCudaVersion cudaMajorMinorVersion;
+
+  # FIXME: do this at the module system level
+  propagatePlatforms = lib.mapAttrs (redistSystem: lib.map (p: { inherit redistSystem; } // p));
+
+  # Releases for all platforms and all CUDA versions.
+  allReleases = propagatePlatforms evaluatedModules.config.${pname}.releases;
+
+  # Releases for all platforms and our CUDA version.
+  allReleases' = lib.mapAttrs (_: lib.filter satisfiesCudaVersion) allReleases;
+
+  # Packages for all platforms and our CUDA version.
+  allPackages = lib.concatLists (lib.attrValues allReleases');
+
+  packageOlder = p1: p2: lib.versionOlder p1.version p2.version;
+  packageSupportedPlatform = p: p.redistSystem == redistSystem;
+
+  # Compute versioned attribute name to be used in this package set
+  # Patch version changes should not break the build, so we only use major and minor
+  # computeName :: Package -> String
+  computeName = { version, ... }: cudaLib.mkVersionedName pname (lib.versions.majorMinor version);
+
+  # The newest package for each major-minor version, with newest first.
+  # newestPackages :: List Package
+  newestPackages =
+    let
+      newestForEachMajorMinorVersion = lib.foldl' (
+        newestPackages: package:
+        let
+          majorMinorVersion = lib.versions.majorMinor package.version;
+          existingPackage = newestPackages.${majorMinorVersion} or null;
+        in
+        newestPackages
+        // {
+          ${majorMinorVersion} =
+            # Only keep the existing package if it is newer than the one we are considering or it is supported on the
+            # current platform and the one we are considering is not.
+            if
+              existingPackage != null
+              && (
+                packageOlder package existingPackage
+                || (!packageSupportedPlatform package && packageSupportedPlatform existingPackage)
+              )
+            then
+              existingPackage
+            else
+              package;
+        }
+      ) { } allPackages;
+    in
+    # Sort the packages by version so the newest is first.
+    # NOTE: builtins.sort requires a strict weak ordering, so we must use versionOlder rather than versionAtLeast.
+    # See https://github.com/NixOS/nixpkgs/commit/9fd753ea84e5035b357a275324e7fd7ccfb1fc77.
+    lib.sort (lib.flip packageOlder) (lib.attrValues newestForEachMajorMinorVersion);
+
+  extension =
+    final: _:
+    let
+      # Builds our package into a derivation and wraps it in a nameValuePair, where the name is the versioned name
+      # of the package.
+      buildPackage =
+        package:
+        let
+          shims = final.callPackage shimsFn { inherit package redistSystem; };
+          name = computeName package;
+          drv = final.callPackage ./manifest.nix {
+            inherit pname redistName;
+            inherit (shims) redistribRelease featureRelease;
+          };
+        in
+        lib.nameValuePair name drv;
+
+      # versionedDerivations :: AttrSet Derivation
+      versionedDerivations = builtins.listToAttrs (lib.map buildPackage newestPackages);
+
+      defaultDerivation = {
+        ${pname} = (buildPackage (lib.head newestPackages)).value;
+      };
+    in
+    # NOTE: Must condition on the length of newestPackages to avoid non-total function lib.head aborting if
+    # newestPackages is empty.
+    lib.optionalAttrs (lib.length newestPackages > 0) (versionedDerivations // defaultDerivation);
+in
+extension
--- a/pkgs/development/cuda-modules/modules/README.md
+++ b/pkgs/development/cuda-modules/modules/README.md
@@ -0,0 +1,56 @@
+# Modules
+
+Modules as they are used in `modules` exist primarily to check the shape and
+content of CUDA redistributable and feature manifests. They are ultimately meant
+to reduce the repetitive nature of repackaging CUDA redistributables.
+
+Most redistributables are built the same way: a manifest lists which packages
+are available at a location, along with their versions and hashes. Rather than
+writing a builder for each and every derivation, we use modules to describe these
+manifests so that a single `genericManifestBuilder` can build all redistributables.
+
+## `generic`
+
+The modules in `generic` are reusable components meant to check the shape and
+content of NVIDIA's CUDA redistributable manifests, our feature manifests (which
+are derived from NVIDIA's manifests), or hand-crafted Nix expressions describing
+available packages. They are used by the `genericManifestBuilder` to build CUDA
+redistributables.
+
+Generally, each package which relies on manifests or Nix release expressions
+will create an alias to the relevant generic module. For example, the [module
+for CUDNN](./cudnn/default.nix) aliases the generic module for release
+expressions, while the [module for CUDA redistributables](./cuda/default.nix)
+aliases the generic module for manifests.
+
+Alternatively, additional fields or values may need to be configured to account
+for the particulars of a package. For example, while the release expressions for
+[CUDNN](../cudnn/releases.nix) and [TensorRT](../tensorrt/releases.nix) are very
+close, they differ slightly in the fields they have. The [module for
+CUDNN](./cudnn/default.nix) is able to use the generic module for
+release expressions, while the [module for
+TensorRT](./tensorrt/default.nix) must add additional fields to the
+generic module.
+
+### `manifests`
+
+The modules in `generic/manifests` define the structure of NVIDIA's CUDA
+redistributable manifests and our feature manifests.
+
+NVIDIA's redistributable manifests are retrieved from their web server, while
+the feature manifests are produced by
+[`cuda-redist-find-features`](https://github.com/connorbaker/cuda-redist-find-features).
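+
+As a rough sketch of how these modules are used, the following Nix expression
+mirrors what `cutensor/extension.nix` does: it evaluates the module tree together
+with one redistributable manifest and its feature manifest, so that the module
+system type-checks both. It assumes the file sits in `pkgs/development/cuda-modules/`
+(so the relative paths resolve) and is called with Nixpkgs' `lib`; the file name
+in the comment is hypothetical, and the cuTENSOR 2.1.0 manifests are used only for
+illustration.
+
+```nix
+# Sketch only: assumes this file lives in pkgs/development/cuda-modules/ and is
+# called as `import ./check-manifests.nix { inherit lib; }` (hypothetical name).
+{ lib }:
+let
+  evaluated = lib.evalModules {
+    modules = [
+      ./modules
+      # Nest the manifests under `cutensor.manifests`, which
+      # modules/cutensor/default.nix declares as an alias of `generic.manifests`.
+      {
+        cutensor.manifests = {
+          redistrib = lib.importJSON ./cutensor/manifests/redistrib_2.1.0.json;
+          feature = lib.importJSON ./cutensor/manifests/feature_2.1.0.json;
+        };
+      }
+    ];
+  };
+in
+# Forcing this value runs the type checks declared in generic/manifests.
+# feature_2.1.0.json sets `lib = true` for linux-x86_64, so this should be `true`.
+evaluated.config.cutensor.manifests.feature.libcutensor.linux-x86_64.outputs.lib
+```
+
+Module evaluation is lazy, so only the values that are actually forced are
+checked; the builders force them when they read versions, hashes, and outputs.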
+
+### `releases`
+
+The modules in `generic/releases` define the structure of our hand-crafted Nix
+expressions containing information necessary to download and repackage CUDA
+redistributables. These expressions are created when NVIDIA-provided manifests
+are unavailable or otherwise unusable. For example, though CUDNN has manifests,
+a bug in NVIDIA's CI/CD causes manifests for different versions of CUDA to use
+the same name, which leads to the manifests overwriting each other.
+
+### `types`
+
+The modules in `generic/types` define reusable types used in both
+`generic/manifests` and `generic/releases`.
--- a/pkgs/development/cuda-modules/modules/cuda/default.nix
+++ b/pkgs/development/cuda-modules/modules/cuda/default.nix
@@ -0,0 +1,4 @@
+{ options, ... }:
+{
+  options.cuda.manifests = options.generic.manifests;
+}
--- a/pkgs/development/cuda-modules/modules/cudnn/default.nix
+++ b/pkgs/development/cuda-modules/modules/cudnn/default.nix
@@ -0,0 +1,12 @@
+{ options, ... }:
+{
+  options.cudnn.releases = options.generic.releases;
+  # TODO(@connorbaker): Figure out how to add additional options to the
+  # generic release.
+  # {
+  #   url = options.mkOption {
+  #     description = "URL to download the tarball from";
+  #     type = types.str;
+  #   };
+  # }
+}
--- a/pkgs/development/cuda-modules/modules/cusparselt/default.nix
+++ b/pkgs/development/cuda-modules/modules/cusparselt/default.nix
@@ -0,0 +1,4 @@
+{ options, ... }:
+{
+  options.cusparselt.manifests = options.generic.manifests;
+}
--- a/pkgs/development/cuda-modules/modules/cutensor/default.nix
+++ b/pkgs/development/cuda-modules/modules/cutensor/default.nix
@@ -0,0 +1,4 @@
+{ options, ... }:
+{
+  options.cutensor.manifests = options.generic.manifests;
+}
--- a/pkgs/development/cuda-modules/modules/default.nix
+++ b/pkgs/development/cuda-modules/modules/default.nix
@@ -0,0 +1,11 @@
+{
+  imports = [
+    ./generic
+    # Always after generic
+    ./cuda
+    ./cudnn
+    ./cusparselt
+    ./cutensor
+    ./tensorrt
+  ];
+}
--- a/pkgs/development/cuda-modules/modules/generic/default.nix
+++ b/pkgs/development/cuda-modules/modules/generic/default.nix
@@ -0,0 +1,7 @@
+{
+  imports = [
+    ./types
+    ./manifests
+    ./releases
+  ];
+}
--- a/pkgs/development/cuda-modules/modules/generic/manifests/default.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/default.nix
@@ -0,0 +1,7 @@
+{ lib, config, ... }:
+{
+  options.generic.manifests = {
+    feature = import ./feature/manifest.nix { inherit lib config; };
+    redistrib = import ./redistrib/manifest.nix { inherit lib; };
+  };
+}
--- a/pkgs/development/cuda-modules/modules/generic/manifests/feature/manifest.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/feature/manifest.nix
@@ -0,0 +1,10 @@
+{ lib, config, ... }:
+let
+  inherit (lib) options trivial types;
+  Release = import ./release.nix { inherit lib config; };
+in
+options.mkOption {
+  description = "Feature manifest is an attribute set which includes a mapping from package name to release";
+  example = trivial.importJSON ../../../../cuda/manifests/feature_11.8.0.json;
+  type = types.attrsOf Release.type;
+}
--- a/pkgs/development/cuda-modules/modules/generic/manifests/feature/outputs.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/feature/outputs.nix
@@ -0,0 +1,60 @@
+{ lib, ... }:
+let
+  inherit (lib) options types;
+in
+# https://github.com/ConnorBaker/cuda-redist-find-features/blob/603407bea2fab47f2dfcd88431122a505af95b42/cuda_redist_find_features/manifest/feature/package/package.py
+options.mkOption {
+  description = "Set of outputs that a package can provide";
+  example = {
+    bin = true;
+    dev = true;
+    doc = false;
+    lib = false;
+    sample = false;
+    static = false;
+  };
+  type = types.submodule {
+    options = {
+      bin = options.mkOption {
+        description = "`bin` output requires that we have a non-empty `bin` directory containing at least one file with the executable bit set";
+        type = types.bool;
+      };
+      dev = options.mkOption {
+        description = ''
+          A `dev` output requires that we have at least one of the following non-empty directories:
+
+          - `include`
+          - `lib/pkgconfig`
+          - `share/pkgconfig`
+          - `lib/cmake`
+          - `share/aclocal`
+        '';
+        type = types.bool;
+      };
+      doc = options.mkOption {
+        description = ''
+          A `doc` output requires that we have at least one of the following non-empty directories:
+
+          - `share/info`
+          - `share/doc`
+          - `share/gtk-doc`
+          - `share/devhelp`
+          - `share/man`
+        '';
+        type = types.bool;
+      };
+      lib = options.mkOption {
+        description = "`lib` output requires that we have a non-empty `lib` directory containing at least one shared library";
+        type = types.bool;
+      };
+      sample = options.mkOption {
+        description = "`sample` output requires that we have a non-empty `samples` directory";
+        type = types.bool;
+      };
+      static = options.mkOption {
+        description = "`static` output requires that we have a non-empty `lib` directory containing at least one static library";
+        type = types.bool;
+      };
+    };
+  };
+}
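The output rules in `outputs.nix` are defined entirely by directory contents, so a few `find` checks can approximate them. Below is a minimal Bash sketch. It assumes GNU `find` (for `-perm /111` and `-quit`) and treats any `*.so` or `*.so.*` name as a shared library. The `has_*_output` functions are hypothetical and are not the logic cuda-redist-find-features actually runs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers approximating the output rules described in outputs.nix.
# Assumes GNU find; this is NOT the cuda-redist-find-features implementation.

# bin: a `bin` directory with at least one regular file that has any execute bit set.
has_bin_output() {
  local root="$1"
  [[ -d "$root/bin" ]] || return 1
  [[ -n "$(find "$root/bin" -type f -perm /111 -print -quit)" ]]
}

# dev: at least one of the listed directories exists and is non-empty.
has_dev_output() {
  local root="$1" dir
  for dir in include lib/pkgconfig share/pkgconfig lib/cmake share/aclocal; do
    if [[ -d "$root/$dir" && -n "$(find "$root/$dir" -mindepth 1 -print -quit)" ]]; then
      return 0
    fi
  done
  return 1
}

# lib: a `lib` directory containing at least one shared library (by file name).
has_lib_output() {
  local root="$1"
  [[ -d "$root/lib" ]] || return 1
  # Parentheses are required: without them, -print would bind only to the second -name.
  [[ -n "$(find "$root/lib" \( -name '*.so' -o -name '*.so.*' \) -print -quit)" ]]
}
```

Checking file names instead of ELF headers keeps the sketch short. A real implementation would likely inspect the files themselves.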
--- a/pkgs/development/cuda-modules/modules/generic/manifests/feature/package.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/feature/package.nix
@@ -0,0 +1,10 @@
+{ lib, ... }:
+let
+  inherit (lib) options types;
+  Outputs = import ./outputs.nix { inherit lib; };
+in
+options.mkOption {
+  description = "Package in the manifest";
+  example = (import ./release.nix { inherit lib; }).linux-x86_64;
+  type = types.submodule { options.outputs = Outputs; };
+}
--- a/pkgs/development/cuda-modules/modules/generic/manifests/feature/release.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/feature/release.nix
@@ -0,0 +1,10 @@
+{ lib, config, ... }:
+let
+  inherit (lib) options types;
+  Package = import ./package.nix { inherit lib config; };
+in
+options.mkOption {
+  description = "Release is an attribute set which includes a mapping from platform to package";
+  example = (import ./manifest.nix { inherit lib; }).cuda_cccl;
+  type = types.attrsOf Package.type;
+}
--- a/pkgs/development/cuda-modules/modules/generic/manifests/redistrib/manifest.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/redistrib/manifest.nix
@@ -0,0 +1,33 @@
+{ lib, ... }:
+let
+  inherit (lib) options trivial types;
+  Release = import ./release.nix { inherit lib; };
+in
+options.mkOption {
+  description = "Redistributable manifest is an attribute set which includes a mapping from package name to release";
+  example = trivial.importJSON ../../../../cuda/manifests/redistrib_11.8.0.json;
+  type = types.submodule {
+    # Allow any attribute name as these will be the package names
+    freeformType = types.attrsOf Release.type;
+    options = {
+      release_date = options.mkOption {
+        description = "Release date of the manifest";
+        type = types.nullOr types.str;
+        default = null;
+        example = "2023-08-29";
+      };
+      release_label = options.mkOption {
+        description = "Release label of the manifest";
+        type = types.nullOr types.str;
+        default = null;
+        example = "12.2.2";
+      };
+      release_product = options.mkOption {
+        example = "cuda";
+        description = "Release product of the manifest";
+        type = types.nullOr types.str;
+        default = null;
+      };
+    };
+  };
+}
--- a/pkgs/development/cuda-modules/modules/generic/manifests/redistrib/package.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/redistrib/package.nix
@@ -0,0 +1,32 @@
+{ lib, ... }:
+let
+  inherit (lib) options types;
+in
+options.mkOption {
+  description = "Package in the manifest";
+  example = (import ./release.nix { inherit lib; }).linux-x86_64;
+  type = types.submodule {
+    options = {
+      relative_path = options.mkOption {
+        description = "Relative path to the package";
+        example = "cuda_cccl/linux-x86_64/cuda_cccl-linux-x86_64-11.5.62-archive.tar.xz";
+        type = types.str;
+      };
+      sha256 = options.mkOption {
+        description = "SHA-256 hash of the package";
+        example = "bbe633d6603d5a96a214dcb9f3f6f6fd2fa04d62e53694af97ae0c7afe0121b0";
+        type = types.str;
+      };
+      md5 = options.mkOption {
+        description = "MD5 hash of the package";
+        example = "e5deef4f6cb71f14aac5be5d5745dafe";
+        type = types.str;
+      };
+      size = options.mkOption {
+        description = "Size of the package as a string";
+        type = types.str;
+        example = "960968";
+      };
+    };
+  };
+}
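Each redistributable `Package` entry pairs a `relative_path` with hex-encoded `sha256` and `md5` digests and a `size` stored as a string. Here is a minimal sketch of checking a downloaded archive against those fields. It assumes GNU coreutils (`stat -c`, `sha256sum`, `md5sum`). The `verify_redist_archive` function is hypothetical and is not part of the Nixpkgs builders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify a file against the size and hex digests recorded
# in a redistributable manifest entry. Assumes GNU coreutils.
verify_redist_archive() {
  local file="$1" expected_sha256="$2" expected_md5="$3" expected_size="$4"
  local actual_size
  actual_size="$(stat -c %s "$file")" || return 1
  # The manifest stores the size as a string, so compare as strings.
  if [[ "$actual_size" != "$expected_size" ]]; then
    echo "size mismatch: expected $expected_size, got $actual_size" >&2
    return 1
  fi
  # `--check` reads "<digest>  <path>" lines; `--status` suppresses output.
  echo "$expected_sha256  $file" | sha256sum --check --status - || return 1
  echo "$expected_md5  $file" | md5sum --check --status - || return 1
}
```

The size check runs first because it is cheap and catches truncated downloads before any hashing.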
--- a/pkgs/development/cuda-modules/modules/generic/manifests/redistrib/release.nix
+++ b/pkgs/development/cuda-modules/modules/generic/manifests/redistrib/release.nix
@@ -0,0 +1,36 @@
+{ lib, ... }:
+let
+  inherit (lib) options types;
+  Package = import ./package.nix { inherit lib; };
+in
+options.mkOption {
+  description = "Release is an attribute set which includes a mapping from platform to package";
+  example = (import ./manifest.nix { inherit lib; }).cuda_cccl;
+  type = types.submodule {
+    # Allow any attribute name as these will be the platform names
+    freeformType = types.attrsOf Package.type;
+    options = {
+      name = options.mkOption {
+        description = "Full name of the package";
+        example = "CXX Core Compute Libraries";
+        type = types.str;
+      };
+      license = options.mkOption {
+        description = "License of the package";
+        example = "CUDA Toolkit";
+        type = types.str;
+      };
+      license_path = options.mkOption {
+        description = "Path to the license of the package";
+        example = "cuda_cccl/LICENSE.txt";
+        default = null;
+        type = types.nullOr types.str;
+      };
+      version = options.mkOption {
+        description = "Version of the package";
+        example = "11.5.62";
+        type = types.str;
+      };
+    };
+  };
+}
--- a/pkgs/development/cuda-modules/modules/generic/releases/default.nix
+++ b/pkgs/development/cuda-modules/modules/generic/releases/default.nix
@@ -0,0 +1,45 @@
+{ lib, config, ... }:
+let
+  inherit (config.generic.types) majorMinorVersion majorMinorPatchBuildVersion;
+  inherit (lib) options types;
+in
+{
+  options.generic.releases = options.mkOption {
+    description = "Collection of packages targeting different platforms";
+    type =
+      let
+        Package = options.mkOption {
+          description = "Package for a specific platform";
+          example = {
+            version = "8.0.3.4";
+            minCudaVersion = "10.2";
+            maxCudaVersion = "10.2";
+            hash = "sha256-LxcXgwe1OCRfwDsEsNLIkeNsOcx3KuF5Sj+g2dY6WD0=";
+          };
+          type = types.submodule {
+            # TODO(@connorbaker): Figure out how to extend option sets.
+            freeformType = types.attrsOf types.anything;
+            options = {
+              version = options.mkOption {
+                description = "Version of the package";
+                type = majorMinorPatchBuildVersion;
+              };
+              minCudaVersion = options.mkOption {
+                description = "Minimum CUDA version supported";
+                type = majorMinorVersion;
+              };
+              maxCudaVersion = options.mkOption {
+                description = "Maximum CUDA version supported";
+                type = majorMinorVersion;
+              };
+              hash = options.mkOption {
+                description = "Hash of the tarball";
+                type = types.str;
+              };
+            };
+          };
+        };
+      in
+      types.attrsOf (types.listOf Package.type);
+  };
+}
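Each `Package` in a hand-crafted release carries `minCudaVersion` and `maxCudaVersion` bounds. Below is a minimal sketch of a range check. It assumes the bounds are inclusive (the option descriptions say "minimum" and "maximum" supported version) and uses GNU `sort -V` for dotted-version comparison. `cuda_version_in_range` is hypothetical and is not the Nix code that consumes these options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if CUDA version $1 lies within [$2, $3]
# (inclusive), comparing dotted versions with GNU `sort -V`.
cuda_version_in_range() {
  local cuda="$1" min="$2" max="$3"
  # min <= cuda  iff  min sorts first (ties keep min first as well).
  [[ "$(printf '%s\n' "$min" "$cuda" | sort -V | head -n1)" == "$min" ]] || return 1
  # cuda <= max  iff  cuda sorts first.
  [[ "$(printf '%s\n' "$cuda" "$max" | sort -V | head -n1)" == "$cuda" ]]
}

# Example: 11.10 lies within [11.8, 12.0] under version ordering,
# even though "11.10" < "11.8" lexicographically.
cuda_version_in_range 11.10 11.8 12.0 && echo "11.10 is in range"
```

Version sorting is the important design choice here. A plain string comparison would misorder two-digit minor versions such as `11.10`.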
--- a/pkgs/development/cuda-modules/modules/generic/types/default.nix
+++ b/pkgs/development/cuda-modules/modules/generic/types/default.nix
@@ -0,0 +1,39 @@
+{ lib, ... }:
+let
+  inherit (lib) options types;
+in
+{
+  options.generic.types = options.mkOption {
+    type = types.attrsOf types.optionType;
+    default = { };
+    description = "Set of generic types";
+  };
+  config.generic.types = {
+    cudaArch = types.strMatching "^sm_[[:digit:]]+[a-z]?$" // {
+      name = "cudaArch";
+      description = "CUDA architecture name";
+    };
+    # https://github.com/ConnorBaker/cuda-redist-find-features/blob/c841980e146f8664bbcd0ba1399e486b7910617b/cuda_redist_find_features/types/_lib_so_name.py
+    libSoName = types.strMatching ".*\\.so(\\.[[:digit:]]+)*$" // {
+      name = "libSoName";
+      description = "Name of a shared object file";
+    };
+
+    majorMinorVersion = types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)$" // {
+      name = "majorMinorVersion";
+      description = "Version number with a major and minor component";
+    };
+
+    majorMinorPatchVersion = types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)\\.([[:digit:]]+)$" // {
+      name = "majorMinorPatchVersion";
+      description = "Version number with a major, minor, and patch component";
+    };
+
+    majorMinorPatchBuildVersion =
+      types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)\\.([[:digit:]]+)\\.([[:digit:]]+)$"
+      // {
+        name = "majorMinorPatchBuildVersion";
+        description = "Version number with a major, minor, patch, and build component";
+      };
+  };
+}
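These types wrap `types.strMatching`, which accepts a string only if the regex matches it in full (`builtins.match` semantics). Bash's `[[ =~ ]]` also uses POSIX extended regular expressions but matches anywhere in the string. For these patterns, the trailing `$` plus either a leading `^` or a leading `.*` makes the two behave the same. The sketch below copies the patterns (Nix's `"\\."` becomes `\.` here) so candidate values can be tested outside Nix. The `matches_type` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Patterns copied from generic/types/default.nix (bash 4+ for associative arrays).
declare -A CUDA_TYPE_PATTERNS=(
  [cudaArch]='^sm_[[:digit:]]+[a-z]?$'
  [libSoName]='.*\.so(\.[[:digit:]]+)*$'
  [majorMinorVersion]='^([[:digit:]]+)\.([[:digit:]]+)$'
  [majorMinorPatchVersion]='^([[:digit:]]+)\.([[:digit:]]+)\.([[:digit:]]+)$'
  [majorMinorPatchBuildVersion]='^([[:digit:]]+)\.([[:digit:]]+)\.([[:digit:]]+)\.([[:digit:]]+)$'
)

# Hypothetical helper: succeeds if "$2" is a valid value of the type named "$1".
matches_type() {
  local pattern="${CUDA_TYPE_PATTERNS[$1]-}"
  if [[ -z "$pattern" ]]; then
    echo "unknown type: $1" >&2
    return 2
  fi
  # The pattern must be unquoted on the right of =~ to be treated as a regex.
  [[ $2 =~ $pattern ]]
}
```

For example, `matches_type majorMinorPatchBuildVersion 8.0.3.4` succeeds (the `version` in the releases example), while `matches_type majorMinorVersion 8.0.3.4` fails.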
--- a/pkgs/development/cuda-modules/modules/tensorrt/default.nix
+++ b/pkgs/development/cuda-modules/modules/tensorrt/default.nix
@@ -0,0 +1,16 @@
+{ options, ... }:
+{
+  options.tensorrt.releases = options.generic.releases;
+  # TODO(@connorbaker): Figure out how to add additional options
+  # to the generic release.
+  # {
+  #   cudnnVersion = lib.options.mkOption {
+  #     description = "cuDNN version supported";
+  #     type = types.nullOr majorMinorVersion;
+  #   };
+  #   filename = lib.options.mkOption {
+  #     description = "Tarball name";
+  #     type = types.str;
+  #   };
+  # }
+}
--- a/pkgs/development/cuda-modules/packages/autoAddCudaCompatRunpath/auto-add-cuda-compat-runpath.sh
+++ b/pkgs/development/cuda-modules/packages/autoAddCudaCompatRunpath/auto-add-cuda-compat-runpath.sh
@@ -0,0 +1,27 @@
+# shellcheck shell=bash
+# Patch all dynamically linked ELF files by prepending the directory containing
+# the CUDA driver (libcuda.so) from the cuda_compat package to their RUNPATH.
+echo "Sourcing auto-add-cuda-compat-runpath-hook"
+
+addCudaCompatRunpath() {
+  local libPath
+  local origRpath
+
+  if [[ $# -eq 0 ]]; then
+    echo "addCudaCompatRunpath: no library path provided" >&2
+    exit 1
+  elif [[ $# -gt 1 ]]; then
+    echo "addCudaCompatRunpath: too many arguments" >&2
+    exit 1
+  elif [[ "$1" == "" ]]; then
+    echo "addCudaCompatRunpath: empty library path" >&2
+    exit 1
+  else
+    libPath="$1"
+  fi
+
+  origRpath="$(patchelf --print-rpath "$libPath")"
+  patchelf --set-rpath "@libcudaPath@:$origRpath" "$libPath"
+}
+
+postFixupHooks+=("autoFixElfFiles addCudaCompatRunpath")
--- a/pkgs/development/cuda-modules/packages/autoAddCudaCompatRunpath/package.nix
+++ b/pkgs/development/cuda-modules/packages/autoAddCudaCompatRunpath/package.nix
@@ -0,0 +1,29 @@
+# The autoAddCudaCompatRunpath hook must be added AFTER `setupCudaHook`. Both
+# hooks prepend a directory containing `libcuda.so` to the `DT_RUNPATH` of
+# patched ELF files, but the `cuda_compat` path must take precedence (otherwise
+# it has no effect) and therefore appear first. Since each hook prepends, this
+# hook must be executed last.
+{
+  autoFixElfFiles,
+  cuda_compat,
+  makeSetupHook,
+}:
+makeSetupHook {
+  name = "auto-add-cuda-compat-runpath-hook";
+  propagatedBuildInputs = [ autoFixElfFiles ];
+
+  substitutions = {
+    libcudaPath = "${cuda_compat}/compat";
+  };
+
+  meta =
+    let
+      # Handle `null`s in pre-`cuda_compat` releases,
+      # and `badPlatform`s for `!isJetsonBuild`.
+      platforms = cuda_compat.meta.platforms or [ ];
+      badPlatforms = cuda_compat.meta.badPlatforms or platforms;
+    in
+    {
+      inherit badPlatforms platforms;
+    };
+} ./auto-add-cuda-compat-runpath.sh
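The ordering rule follows from each hook *prepending* to `DT_RUNPATH`: the hook that runs last ends up first in the search order. The sketch below models this with plain strings instead of `patchelf`. The `prepend_runpath` function and the `/example/...` directories are hypothetical placeholders. As a design choice, the sketch also avoids leaving an empty RUNPATH entry when the original RUNPATH is empty.

```shell
#!/usr/bin/env bash
# String model of two RUNPATH-prepending hooks; real hooks edit ELF files with patchelf.

# Prepend a directory to a colon-separated RUNPATH without creating an empty entry.
prepend_runpath() {
  local dir="$1" orig="$2"
  if [[ -z "$orig" ]]; then
    printf '%s\n' "$dir"
  else
    printf '%s:%s\n' "$dir" "$orig"
  fi
}

runpath=""
# First hook (stands in for setupCudaHook): placeholder driver directory.
runpath="$(prepend_runpath /example/driver/lib "$runpath")"
# Last hook (stands in for autoAddCudaCompatRunpath): placeholder cuda_compat directory.
runpath="$(prepend_runpath /example/cuda_compat/compat "$runpath")"
echo "$runpath"  # prints /example/cuda_compat/compat:/example/driver/lib
```

Swapping the two calls would put the driver directory first, so the dynamic loader would find that `libcuda.so` before the `cuda_compat` one.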
--- a/pkgs/development/cuda-modules/packages/backendStdenv.nix
+++ b/pkgs/development/cuda-modules/packages/backendStdenv.nix
@@ -0,0 +1,154 @@
+# This is the stdenv nvcc uses as its host-compiler backend, and it must be one
+# officially supported by the CUDA release (e.g., gcc14 for CUDA 12).
+#
+# However, it propagates the current stdenv's libstdc++ to avoid "GLIBCXX_* not found"
+# errors when linked with other C++ libraries.
+# E.g., for cudaPackages_12_9 we use gcc14 with the current stdenv's libstdc++.
+# Cf. https://github.com/NixOS/nixpkgs/pull/218265 for context
+{
+  config,
+  _cuda,
+  cudaMajorMinorVersion,
+  lib,
+  pkgs,
+  stdenv,
+  stdenvAdapters,
+}:
+let
+  inherit (builtins) toJSON;
+  inherit (_cuda.db) allSortedCudaCapabilities cudaCapabilityToInfo nvccCompatibilities;
+  inherit (_cuda.lib)
+    _cudaCapabilityIsDefault
+    _cudaCapabilityIsSupported
+    _evaluateAssertions
+    getRedistSystem
+    mkVersionedName
+    ;
+  inherit (lib) addErrorContext;
+  inherit (lib.customisation) extendDerivation;
+  inherit (lib.lists) filter intersectLists subtractLists;
+
+  # NOTE: By virtue of processing a sorted list (allSortedCudaCapabilities), our groups will be sorted.
+
+  architectureSpecificCudaCapabilities = filter (
+    cudaCapability: cudaCapabilityToInfo.${cudaCapability}.isArchitectureSpecific
+  ) allSortedCudaCapabilities;
+
+  familySpecificCudaCapabilities = filter (
+    cudaCapability: cudaCapabilityToInfo.${cudaCapability}.isFamilySpecific
+  ) allSortedCudaCapabilities;
+
+  jetsonCudaCapabilities = filter (
+    cudaCapability: cudaCapabilityToInfo.${cudaCapability}.isJetson
+  ) allSortedCudaCapabilities;
+
+  passthruExtra = {
+    nvccHostCCMatchesStdenvCC = backendStdenv.cc == stdenv.cc;
+
+    # The Nix system of the host platform.
+    hostNixSystem = stdenv.hostPlatform.system;
+
+    # The Nix system of the host platform for the CUDA redistributable.
+    hostRedistSystem = getRedistSystem passthruExtra.hasJetsonCudaCapability stdenv.hostPlatform.system;
+
+    # Sets whether packages should be built with forward compatibility.
+    # TODO(@connorbaker): If the requested CUDA capabilities are not supported by the current CUDA version,
+    # should we throw an evaluation warning and build with forward compatibility?
+    cudaForwardCompat = config.cudaForwardCompat or true;
+
+    # CUDA capabilities which are supported by the current CUDA version.
+    supportedCudaCapabilities = filter (
+      cudaCapability:
+      _cudaCapabilityIsSupported cudaMajorMinorVersion cudaCapabilityToInfo.${cudaCapability}
+    ) allSortedCudaCapabilities;
+
+    # Find the default set of capabilities for this CUDA version using the list of supported capabilities.
+    # Includes only baseline capabilities.
+    defaultCudaCapabilities = filter (
+      cudaCapability:
+      _cudaCapabilityIsDefault cudaMajorMinorVersion cudaCapabilityToInfo.${cudaCapability}
+    ) passthruExtra.supportedCudaCapabilities;
+
+    # The resolved requested or default CUDA capabilities.
+    cudaCapabilities =
+      if config.cudaCapabilities or [ ] != [ ] then
+        config.cudaCapabilities
+      else
+        passthruExtra.defaultCudaCapabilities;
+
+    # Requested architecture-specific CUDA capabilities.
+    requestedArchitectureSpecificCudaCapabilities = intersectLists architectureSpecificCudaCapabilities passthruExtra.cudaCapabilities;
+
+    # Whether the requested CUDA capabilities include architecture-specific CUDA capabilities.
+    hasArchitectureSpecificCudaCapability =
+      passthruExtra.requestedArchitectureSpecificCudaCapabilities != [ ];
+
+    # Requested family-specific CUDA capabilities.
+    requestedFamilySpecificCudaCapabilities = intersectLists familySpecificCudaCapabilities passthruExtra.cudaCapabilities;
+
+    # Whether the requested CUDA capabilities include family-specific CUDA capabilities.
+    hasFamilySpecificCudaCapability = passthruExtra.requestedFamilySpecificCudaCapabilities != [ ];
+
+    # Requested Jetson CUDA capabilities.
+    requestedJetsonCudaCapabilities = intersectLists jetsonCudaCapabilities passthruExtra.cudaCapabilities;
+
+    # Whether the requested CUDA capabilities include Jetson CUDA capabilities.
+    hasJetsonCudaCapability = passthruExtra.requestedJetsonCudaCapabilities != [ ];
+  };
+
+  assertions =
+    let
+      # Jetson devices cannot be targeted by the same binaries which target non-Jetson devices. While
+      # NVIDIA provides both `linux-aarch64` and `linux-sbsa` packages, which both target `aarch64`,
+      # they are built with different settings and cannot be mixed.
+      jetsonMessagePrefix = "Jetson CUDA capabilities (${toJSON passthruExtra.requestedJetsonCudaCapabilities})";
+
+      # Remove all known capabilities from the user's list to find unrecognized capabilities.
+      unrecognizedCudaCapabilities = subtractLists allSortedCudaCapabilities passthruExtra.cudaCapabilities;
+
+      # Remove all supported capabilities from the user's list to find unsupported capabilities.
+      unsupportedCudaCapabilities = subtractLists passthruExtra.supportedCudaCapabilities passthruExtra.cudaCapabilities;
+    in
+    [
+      {
+        message = "Unrecognized CUDA capabilities: ${toJSON unrecognizedCudaCapabilities}";
+        assertion = unrecognizedCudaCapabilities == [ ];
+      }
+      {
+        message = "Unsupported CUDA capabilities: ${toJSON unsupportedCudaCapabilities}";
+        assertion = unsupportedCudaCapabilities == [ ];
+      }
+      {
+        message =
+          "${jetsonMessagePrefix} require hostPlatform (currently ${passthruExtra.hostNixSystem}) "
+          + "to be aarch64-linux";
+        assertion = passthruExtra.hasJetsonCudaCapability -> passthruExtra.hostNixSystem == "aarch64-linux";
+      }
+      {
+        message =
+          let
+            # Find the capabilities which are not Jetson capabilities.
+            requestedNonJetsonCudaCapabilities = subtractLists (
+              passthruExtra.requestedJetsonCudaCapabilities
+              ++ passthruExtra.requestedArchitectureSpecificCudaCapabilities
+              ++ passthruExtra.requestedFamilySpecificCudaCapabilities
+            ) passthruExtra.cudaCapabilities;
+          in
+          "${jetsonMessagePrefix} cannot be specified with non-Jetson capabilities "
+          + "(${toJSON requestedNonJetsonCudaCapabilities})";
+        assertion =
+          passthruExtra.hasJetsonCudaCapability
+          -> passthruExtra.requestedJetsonCudaCapabilities == passthruExtra.cudaCapabilities;
+      }
+    ];
+
+  assertCondition = addErrorContext "while evaluating ${mkVersionedName "cudaPackages" cudaMajorMinorVersion}.backendStdenv" (
+    _evaluateAssertions assertions
+  );
+
+  backendStdenv =
+    stdenvAdapters.useLibsFrom stdenv
+      pkgs."gcc${nvccCompatibilities.${cudaMajorMinorVersion}.gcc.maxMajorVersion}Stdenv";
+in
+# TODO: Consider testing whether we in fact use the newer libstdc++
+extendDerivation assertCondition passthruExtra backendStdenv
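The capability checks in `backendStdenv` reduce to list filtering. The resolver falls back to the defaults when `config.cudaCapabilities` is empty. It then subtracts the known and supported lists from the request, mirroring `lib.lists.subtractLists` (remove every element of the first list from the second). Below is a Bash sketch of that logic. `subtract_lists` is a hypothetical helper, and the capability values are made-up inputs, not NVIDIA's actual support matrix.

```shell
#!/usr/bin/env bash
# subtract_lists REMOVE... -- FROM...
# Prints, one per line, the elements of FROM that are not in REMOVE, keeping FROM's order.
subtract_lists() {
  local -A remove=()
  while [[ $# -gt 0 && $1 != -- ]]; do
    remove[$1]=1
    shift
  done
  shift  # drop the "--" separator
  local item
  for item in "$@"; do
    [[ -n ${remove[$item]-} ]] || printf '%s\n' "$item"
  done
}

# Hypothetical inputs for illustration only.
all_capabilities=(5.0 6.0 7.0 7.5 8.0 8.6 8.7 8.9 9.0)
supported_capabilities=(7.5 8.0 8.6 8.7 8.9 9.0)
default_capabilities=(7.5 8.0 8.6 8.9 9.0)
requested_capabilities=(8.6 9.0 6.0 4.2)

# Fall back to the defaults when nothing was requested.
if (( ${#requested_capabilities[@]} == 0 )); then
  requested_capabilities=("${default_capabilities[@]}")
fi

mapfile -t unrecognized < <(subtract_lists "${all_capabilities[@]}" -- "${requested_capabilities[@]}")
mapfile -t unsupported < <(subtract_lists "${supported_capabilities[@]}" -- "${requested_capabilities[@]}")
echo "Unrecognized: ${unrecognized[*]}"  # prints "Unrecognized: 4.2"
echo "Unsupported: ${unsupported[*]}"    # prints "Unsupported: 6.0 4.2"
```

Every unrecognized capability is also reported as unsupported, because the supported list is a subset of the known list. That overlap is why both assertions fire for `4.2`.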
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/0001-cmake-float-out-common-python-bindings-option.patch
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/0001-cmake-float-out-common-python-bindings-option.patch
@@ -0,0 +1,30 @@
+From eeef96e91bd3453160315bf4618b7b91ae7240ba Mon Sep 17 00:00:00 2001
+From: Connor Baker <ConnorBaker01@gmail.com>
+Date: Sat, 18 Jan 2025 20:48:11 +0000
+Subject: [PATCH 1/4] cmake: float out common python bindings option
+
+---
+ CMakeLists.txt | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index 9739569..8944621 100644
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -5,12 +5,11 @@ project(cudnn_frontend VERSION 1.9.0)
+ option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
+ option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
+ option(CUDNN_FRONTEND_BUILD_TESTS "Defines if unittests are built or not." ON)
++option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
+ 
+ if(MSVC OR MSYS OR MINGW)
+-    option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
+     add_compile_options(/W4 /WX)
+ else()
+-    option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
+     add_compile_options(-Wall -Wextra -Wpedantic -Werror -Wno-error=attributes -Wno-attributes -Wno-error=unused-function -Wno-unused-function)
+ endif()
+ 
+-- 
+2.47.0
+
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/0002-cmake-add-config-so-headers-can-be-discovered-when-i.patch
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/0002-cmake-add-config-so-headers-can-be-discovered-when-i.patch
@@ -0,0 +1,84 @@
+From da16ec51ea78f88f333ecf3df2a249fcc65ead24 Mon Sep 17 00:00:00 2001
+From: Connor Baker <ConnorBaker01@gmail.com>
+Date: Sat, 18 Jan 2025 22:01:03 +0000
+Subject: [PATCH 2/4] cmake: add config so headers can be discovered when
+ installed
+
+---
+ CMakeLists.txt                 | 39 +++++++++++++++++++++++++++++++---
+ cudnn_frontend-config.cmake.in |  3 +++
+ 2 files changed, 39 insertions(+), 3 deletions(-)
+ create mode 100644 cudnn_frontend-config.cmake.in
+
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index 8944621..9b1bfba 100644
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -1,4 +1,4 @@
+-cmake_minimum_required(VERSION 3.17)
++cmake_minimum_required(VERSION 3.23)
+ 
+ project(cudnn_frontend VERSION 1.9.0)
+ 
+@@ -15,6 +15,15 @@ endif()
+ 
+ add_library(cudnn_frontend INTERFACE)
+ 
++# Add header files to library
++file(GLOB_RECURSE CUDNN_FRONTEND_INCLUDE_FILES "include/*")
++target_sources(
++    cudnn_frontend PUBLIC FILE_SET HEADERS
++    BASE_DIRS "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
++    FILES "${CUDNN_FRONTEND_INCLUDE_FILES}"
++)
++unset(CUDNN_FRONTEND_INCLUDE_FILES)
++
+ target_compile_definitions(
+     cudnn_frontend INTERFACE
+     $<$<BOOL:${CUDNN_FRONTEND_SKIP_JSON_LIB}>:CUDNN_FRONTEND_SKIP_JSON_LIB>
+@@ -58,7 +67,31 @@ endif()
+ # * CMAKE_INSTALL_INCLUDEDIR
+ include(GNUInstallDirs)
+ 
++# See https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html#example-generating-package-files
++include(CMakePackageConfigHelpers)
++
++# Install and export the header files
++install(
++    TARGETS cudnn_frontend
++    EXPORT cudnn_frontend_targets FILE_SET HEADERS
++)
++export(
++    EXPORT cudnn_frontend_targets
++    FILE "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend/cudnn_frontend-targets.cmake"
++)
++install(
++    EXPORT cudnn_frontend_targets
++    FILE cudnn_frontend-targets.cmake
++    DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
++)
++
++# Install the CMake configuration file for header discovery
++configure_package_config_file(
++    cudnn_frontend-config.cmake.in
++    "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
++    INSTALL_DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
++)
+ install(
+-    DIRECTORY ${PROJECT_SOURCE_DIR}/include/
+-    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
++    FILES "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
++    DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
+ )
+diff --git a/cudnn_frontend-config.cmake.in b/cudnn_frontend-config.cmake.in
+new file mode 100644
+index 0000000..8b2d843
+--- /dev/null
++++ b/cudnn_frontend-config.cmake.in
+@@ -0,0 +1,3 @@
++@PACKAGE_INIT@
++
++include(${CMAKE_CURRENT_LIST_DIR}/cudnn_frontend-targets.cmake)
+-- 
+2.47.0
+
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/0003-cmake-install-samples-and-tests-when-built.patch
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/0003-cmake-install-samples-and-tests-when-built.patch
@@ -0,0 +1,85 @@
+From 53d5aaaad09b479cd8c0e148c9428baa33204024 Mon Sep 17 00:00:00 2001
+From: Connor Baker <ConnorBaker01@gmail.com>
+Date: Sat, 18 Jan 2025 22:10:41 +0000
+Subject: [PATCH 3/4] cmake: install samples and tests when built
+
+---
+ CMakeLists.txt                        | 12 +++++++++++-
+ samples/cpp/CMakeLists.txt            |  2 ++
+ samples/legacy_samples/CMakeLists.txt |  2 ++
+ test/cpp/CMakeLists.txt               |  2 ++
+ 4 files changed, 17 insertions(+), 1 deletion(-)
+
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index 9b1bfba..f6af111 100644
+--- a/CMakeLists.txt
+++ b/CMakeLists.txt
+@@ -70,11 +70,21 @@ include(GNUInstallDirs)
+ # See https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html#example-generating-package-files
+ include(CMakePackageConfigHelpers)
+ 
+-# Install and export the header files
++# Install the components
+ install(
+     TARGETS cudnn_frontend
+     EXPORT cudnn_frontend_targets FILE_SET HEADERS
+ )
++
++if (CUDNN_FRONTEND_BUILD_SAMPLES)
++    install(TARGETS legacy_samples samples RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
++endif()
++
++if (CUDNN_FRONTEND_BUILD_TESTS)
++    install(TARGETS tests RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
++endif()
++
++# Export the targets
+ export(
+     EXPORT cudnn_frontend_targets
+     FILE "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend/cudnn_frontend-targets.cmake"
+diff --git a/samples/cpp/CMakeLists.txt b/samples/cpp/CMakeLists.txt
+index 9b8a5eb..01b09bb 100644
+--- a/samples/cpp/CMakeLists.txt
++++ b/samples/cpp/CMakeLists.txt
+@@ -69,8 +69,10 @@ target_link_libraries(
+     _cudnn_frontend_pch
+     CUDNN::cudnn
+ 
++    CUDA::cublasLt
+     CUDA::cudart
+     CUDA::cuda_driver # Needed as calls all CUDA calls will eventually move to driver
++    CUDA::nvrtc
+ )
+ 
+ # target cmake properties
+diff --git a/samples/legacy_samples/CMakeLists.txt b/samples/legacy_samples/CMakeLists.txt
+index 019f17c..3b56329 100644
+--- a/samples/legacy_samples/CMakeLists.txt
++++ b/samples/legacy_samples/CMakeLists.txt
+@@ -44,7 +44,9 @@ target_link_libraries(
+     _cudnn_frontend_pch
+     CUDNN::cudnn
+ 
++    CUDA::cublasLt
+     CUDA::cudart
++    CUDA::nvrtc
+ )
+ 
+ # target cmake properties
+diff --git a/test/cpp/CMakeLists.txt b/test/cpp/CMakeLists.txt
+index e244cd0..2750294 100644
+--- a/test/cpp/CMakeLists.txt
++++ b/test/cpp/CMakeLists.txt
+@@ -55,7 +55,9 @@ target_link_libraries(
+ 
+     CUDNN::cudnn
+ 
++    CUDA::cublasLt
+     CUDA::cudart
++    CUDA::nvrtc
+ )
+ 
+ # cuDNN dlopen's its libraries
+-- 
+2.47.0
+
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/0004-samples-fix-instances-of-maybe-uninitialized.patch
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/0004-samples-fix-instances-of-maybe-uninitialized.patch
@@ -0,0 +1,591 @@
+From 4ce40a0c3de0e8a7065caf1cf59a90493e084682 Mon Sep 17 00:00:00 2001
+From: Connor Baker <ConnorBaker01@gmail.com>
+Date: Sat, 18 Jan 2025 22:22:21 +0000
+Subject: [PATCH 4/4] samples: fix instances of maybe-uninitialized
+
+---
+ samples/cpp/convolution/dgrads.cpp                 |  6 +++---
+ samples/cpp/convolution/fp8_fprop.cpp              |  2 +-
+ samples/cpp/convolution/fprop.cpp                  | 10 +++++-----
+ samples/cpp/convolution/int8_fprop.cpp             |  2 +-
+ samples/cpp/convolution/wgrads.cpp                 |  4 ++--
+ samples/cpp/matmul/fp8_matmul.cpp                  |  2 +-
+ samples/cpp/matmul/int8_matmul.cpp                 |  2 +-
+ samples/cpp/matmul/matmuls.cpp                     |  8 ++++----
+ samples/cpp/matmul/mixed_matmul.cpp                |  2 +-
+ samples/cpp/misc/pointwise.cpp                     |  6 +++---
+ samples/cpp/misc/resample.cpp                      |  6 +++---
+ samples/cpp/misc/serialization.cpp                 |  4 ++--
+ samples/cpp/misc/slice.cpp                         |  2 +-
+ samples/cpp/misc/sm_carveout.cpp                   |  2 +-
+ samples/cpp/norm/batchnorm.cpp                     |  8 ++++----
+ samples/cpp/norm/layernorm.cpp                     |  8 ++++----
+ samples/cpp/norm/rmsnorm.cpp                       |  6 +++---
+ samples/cpp/sdpa/fp16_bwd.cpp                      |  2 +-
+ samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp |  2 +-
+ samples/cpp/sdpa/fp16_cached.cpp                   |  2 +-
+ samples/cpp/sdpa/fp16_fwd.cpp                      |  2 +-
+ samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp  |  2 +-
+ samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp |  2 +-
+ samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp    |  2 +-
+ samples/cpp/sdpa/fp8_bwd.cpp                       |  4 ++--
+ samples/cpp/sdpa/fp8_fwd.cpp                       |  2 +-
+ 26 files changed, 50 insertions(+), 50 deletions(-)
+
+diff --git a/samples/cpp/convolution/dgrads.cpp b/samples/cpp/convolution/dgrads.cpp
+index 589cb5f..f66abf4 100644
+--- a/samples/cpp/convolution/dgrads.cpp
++++ b/samples/cpp/convolution/dgrads.cpp
+@@ -65,7 +65,7 @@ TEST_CASE("Convolution Dgrad", "[dgrad][graph]") {
+     Surface<half> w_tensor(64 * 32 * 3 * 3, false);
+     Surface<half> dx_tensor(4 * 32 * 16 * 16, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+ 
+     Surface<int8_t> workspace(workspace_size, false);
+@@ -122,7 +122,7 @@ TEST_CASE("Dgrad Drelu Graph", "[dgrad][graph]") {
+     Surface<half> x_tensor(4 * 32 * 16 * 16, false);
+     Surface<half> dx_tensor(4 * 32 * 16 * 16, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -234,7 +234,7 @@ TEST_CASE("Dgrad Drelu DBNweight Graph", "[dgrad][graph]") {
+     Surface<float> eq_scale_x_tensor(1 * 32 * 1 * 1, false);
+     Surface<float> eq_bias_tensor(1 * 32 * 1 * 1, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/convolution/fp8_fprop.cpp b/samples/cpp/convolution/fp8_fprop.cpp
+index dfcb7e2..8246ce4 100644
+--- a/samples/cpp/convolution/fp8_fprop.cpp
++++ b/samples/cpp/convolution/fp8_fprop.cpp
+@@ -116,7 +116,7 @@ TEST_CASE("Convolution fp8 precision", "[conv][graph]") {
+     Surface<float> Y_scale_gpu(1, false);
+     Surface<float> amax_gpu(1, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/convolution/fprop.cpp b/samples/cpp/convolution/fprop.cpp
+index bc1aaf0..d61fa4e 100644
+--- a/samples/cpp/convolution/fprop.cpp
++++ b/samples/cpp/convolution/fprop.cpp
+@@ -80,7 +80,7 @@ TEST_CASE("Convolution fprop", "[conv][graph][caching]") {
+     std::unordered_map<int64_t, void *> variant_pack = {
+         {X->get_uid(), x_tensor.devPtr}, {W->get_uid(), w_tensor.devPtr}, {Y->get_uid(), y_tensor.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -303,7 +303,7 @@ TEST_CASE("CSBR Graph", "[conv][graph][caching]") {
+     Surface<half> b_tensor(k, false);
+     Surface<half> y_tensor(n * k * h * w, false);  // Should be p, q.
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -550,7 +550,7 @@ TEST_CASE("SBRCS", "[conv][genstats][graph]") {
+         {SUM, sum_tensor.devPtr},
+         {SQ_SUM, sq_sum_tensor.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -651,7 +651,7 @@ TEST_CASE("CBR Graph NCHW", "[conv][graph][caching]") {
+     Surface<half> y_tensor(n * k * h * w, false);  // Should be p, q.
+     Surface<half> z_tensor(n * k * h * w, false);  // Should be p, q.
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -734,7 +734,7 @@ TEST_CASE("Convolution fprop large", "[conv][graph][caching]") {
+     std::unordered_map<int64_t, void *> variant_pack = {
+         {X->get_uid(), x_tensor.devPtr}, {W->get_uid(), w_tensor.devPtr}, {Y->get_uid(), y_tensor.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/convolution/int8_fprop.cpp b/samples/cpp/convolution/int8_fprop.cpp
+index 3d5ac2f..e9248f5 100644
+--- a/samples/cpp/convolution/int8_fprop.cpp
++++ b/samples/cpp/convolution/int8_fprop.cpp
+@@ -94,7 +94,7 @@ TEST_CASE("Conv with Int8 datatypes", "[conv][graph][caching]") {
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
+         {X, x_tensor.devPtr}, {W, w_tensor.devPtr}, {Y, y_tensor.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/convolution/wgrads.cpp b/samples/cpp/convolution/wgrads.cpp
+index 2c58b26..26887dc 100644
+--- a/samples/cpp/convolution/wgrads.cpp
++++ b/samples/cpp/convolution/wgrads.cpp
+@@ -64,7 +64,7 @@ TEST_CASE("Convolution Wgrad", "[wgrad][graph][wgrad][Conv_wgrad]") {
+     Surface<half> dy_tensor(4 * 64 * 16 * 16, false);
+     Surface<half> dw_tensor(64 * 64 * 3 * 3, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -137,7 +137,7 @@ TEST_CASE("scale-bias-relu-wgrad Graph", "[wgrad][graph][scale-bias-relu-wgrad][
+     Surface<half> dy_tensor(4 * 64 * 16 * 16, false);
+     Surface<half> dw_tensor(64 * 64 * 3 * 3, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/matmul/fp8_matmul.cpp b/samples/cpp/matmul/fp8_matmul.cpp
+index c6470cd..f32c627 100644
+--- a/samples/cpp/matmul/fp8_matmul.cpp
++++ b/samples/cpp/matmul/fp8_matmul.cpp
+@@ -115,7 +115,7 @@ TEST_CASE("Matmul fp8 precision", "[matmul][graph]") {
+     REQUIRE(graph.build_plans(handle, fe::BuildPlanPolicy_t::HEURISTICS_CHOICE).is_good());
+ 
+     Surface<float> C_gpu(b * m * n, false);
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/matmul/int8_matmul.cpp b/samples/cpp/matmul/int8_matmul.cpp
+index cf4353a..cb3ce34 100644
+--- a/samples/cpp/matmul/int8_matmul.cpp
++++ b/samples/cpp/matmul/int8_matmul.cpp
+@@ -104,7 +104,7 @@ TEST_CASE("Int8 Matmul", "[matmul][graph]") {
+     // note this is a bf16 tensor, but half is used just for memory allocation
+     Surface<float> C_gpu(b * m * n, false);
+     Surface<float> Bias_gpu(b * m * n, false);
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/matmul/matmuls.cpp b/samples/cpp/matmul/matmuls.cpp
+index ed0f10b..5c95713 100644
+--- a/samples/cpp/matmul/matmuls.cpp
++++ b/samples/cpp/matmul/matmuls.cpp
+@@ -250,7 +250,7 @@ TEST_CASE("Matmul", "[matmul][graph]") {
+ 
+     // Run cudnn graph
+     Surface<float> C_gpu(b * m * n, false);
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -319,7 +319,7 @@ TEST_CASE("Abs + Matmul", "[matmul][graph]") {
+ 
+     // Run cudnn graph
+     Surface<float> C_gpu(b * m * n, false);
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -539,7 +539,7 @@ TEST_CASE("Matmul SBR Graph", "[matmul][graph]") {
+     auto [graph, A, B, bias, scale, O] = lookup_cache_or_build_graph(
+         handle, x_tensor.devPtr, w_tensor.devPtr, s_tensor.devPtr, b_tensor.devPtr, y_tensor.devPtr);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -606,7 +606,7 @@ TEST_CASE("Matmul with restricted shared memory", "[matmul][graph]") {
+ 
+     // Run cudnn graph
+     Surface<float> C_gpu(b * m * n, false);
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/matmul/mixed_matmul.cpp b/samples/cpp/matmul/mixed_matmul.cpp
+index ab3e195..a2b05bd 100644
+--- a/samples/cpp/matmul/mixed_matmul.cpp
++++ b/samples/cpp/matmul/mixed_matmul.cpp
+@@ -96,7 +96,7 @@ TEST_CASE("Mixed Precision Matmul", "[matmul][graph]") {
+     //// Run cudnn graph
+     // note this is a bf16 tensor, but half is used just for memory allocation
+     Surface<half> C_gpu(b * m * n, false);
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/misc/pointwise.cpp b/samples/cpp/misc/pointwise.cpp
+index 8f8d699..e8f4cb1 100644
+--- a/samples/cpp/misc/pointwise.cpp
++++ b/samples/cpp/misc/pointwise.cpp
+@@ -51,7 +51,7 @@ TEST_CASE("Reduction", "[reduction]") {
+     Surface<float> C_gpu(n * n * n * n, false);
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{A, A_gpu.devPtr},
+                                                                                              {C, C_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -88,7 +88,7 @@ TEST_CASE("Fused scalar", "[scalar][graph]") {
+ 
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{A, A_gpu.devPtr},
+                                                                                              {C, C_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -148,7 +148,7 @@ TEST_CASE("Fused Amax Reduction and type conversion", "[reduction]") {
+ 
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
+         {A, A_gpu.devPtr}, {scale, scale_gpu.devPtr}, {amax, amax_gpu.devPtr}, {C, C_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/misc/resample.cpp b/samples/cpp/misc/resample.cpp
+index 3f782e7..21998c3 100644
+--- a/samples/cpp/misc/resample.cpp
++++ b/samples/cpp/misc/resample.cpp
+@@ -69,7 +69,7 @@ TEST_CASE("Resample Max Pooling NHWC Inference", "[resample][pooling][max][graph
+     Surface<half> Y_gpu(N * H * W * C, false);
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{X, X_gpu.devPtr},
+                                                                                              {Y, Y_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -132,7 +132,7 @@ TEST_CASE("Resample Max Pooling NHWC Training", "[resample][pooling][max][graph]
+     Surface<int8_t> Index_gpu(N * H * W * C / 8, false);
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
+         {X, X_gpu.devPtr}, {Y, Y_gpu.devPtr}, {Index, Index_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -186,7 +186,7 @@ TEST_CASE("Resample Avg Pooling", "[resample][pooling][average][graph]") {
+     Surface<half> Y_gpu(N * H * W * C, false);
+     std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{X, X_gpu.devPtr},
+                                                                                              {Y, Y_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/misc/serialization.cpp b/samples/cpp/misc/serialization.cpp
+index a130406..278bad8 100644
+--- a/samples/cpp/misc/serialization.cpp
++++ b/samples/cpp/misc/serialization.cpp
+@@ -178,7 +178,7 @@ TEST_CASE("CSBR Graph with serialization", "[conv][graph][serialization]") {
+     Surface<half> b_device_memory(k, false);
+     Surface<half> y_device_memory(n * k * h * w, false);  // Should be p, q.
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -401,7 +401,7 @@ TEST_CASE("SDPA Graph with serialization", "[sdpa][graph][serialization]") {
+     Surface<int32_t> dropoutSeed(scaleSize, false, seed_value);
+     Surface<int32_t> dropoutOffset(scaleSize, false, (int32_t)1);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/misc/slice.cpp b/samples/cpp/misc/slice.cpp
+index 087ba36..78962c6 100644
+--- a/samples/cpp/misc/slice.cpp
++++ b/samples/cpp/misc/slice.cpp
+@@ -80,7 +80,7 @@ TEST_CASE("Slice gemm", "[slice][gemm][graph][fusion]") {
+     Surface<half> C_gpu(B * M * N, false);
+     std::unordered_map<int64_t, void *> variant_pack = {
+         {a_uid, A_gpu.devPtr}, {b_uid, B_gpu.devPtr}, {c_uid, C_gpu.devPtr}};
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/misc/sm_carveout.cpp b/samples/cpp/misc/sm_carveout.cpp
+index d6818c0..b0e0651 100644
+--- a/samples/cpp/misc/sm_carveout.cpp
++++ b/samples/cpp/misc/sm_carveout.cpp
+@@ -121,7 +121,7 @@ TEST_CASE("SGBN with SM carveout", "[batchnorm][graph][sm_carveout]") {
+     Surface<float> Peer_stats_0_tensor(2 * 4 * c, false, true);
+     Surface<float> Peer_stats_1_tensor(2 * 4 * c, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/norm/batchnorm.cpp b/samples/cpp/norm/batchnorm.cpp
+index 5949365..a91a9bd 100644
+--- a/samples/cpp/norm/batchnorm.cpp
++++ b/samples/cpp/norm/batchnorm.cpp
+@@ -96,7 +96,7 @@ TEST_CASE("BN Finalize Graph", "[batchnorm][graph]") {
+     Surface<float> eq_scale_tensor(32, false);
+     Surface<float> eq_bias_tensor(32, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -226,7 +226,7 @@ TEST_CASE("SGBN Add Relu Graph", "[batchnorm][graph]") {
+     Surface<float> Peer_stats_0_tensor(2 * 4 * 32, false, true);
+     Surface<float> Peer_stats_1_tensor(2 * 4 * 32, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -346,7 +346,7 @@ TEST_CASE("DBN Add Relu Graph", "[BN][graph][backward]") {
+     Surface<float> Peer_stats_0_tensor(2 * 4 * 32, false, true);
+     Surface<float> Peer_stats_1_tensor(2 * 4 * 32, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -454,7 +454,7 @@ TEST_CASE("BN_inference DRelu DBN Graph", "[Batchnorm][graph][backward]") {
+     Surface<float> Dbias_tensor(32, false);
+     Surface<half> DX_tensor(4 * 32 * 16 * 16, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/norm/layernorm.cpp b/samples/cpp/norm/layernorm.cpp
+index bac996f..7f69f34 100644
+--- a/samples/cpp/norm/layernorm.cpp
++++ b/samples/cpp/norm/layernorm.cpp
+@@ -133,7 +133,7 @@ layernorm_fwd_dynamic_shapes(bool train = true) {
+         Surface<float> Mean_tensor(max_stats_volume, false);
+         Surface<float> Var_tensor(max_stats_volume, false);
+ 
+-        int64_t workspace_size;
++        int64_t workspace_size = 0;
+         REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+         Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -232,7 +232,7 @@ TEST_CASE("LayerNorm Training", "[layernorm][graph]") {
+     Surface<float> Bias_tensor(hidden_size, false);
+     Surface<half> Y_tensor(batch_size * seq_length * hidden_size, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -310,7 +310,7 @@ TEST_CASE("LayerNorm Inference", "[layernorm][graph]") {
+     Surface<float> Bias_tensor(hidden_size, false);
+     Surface<half> Y_tensor(batch_size * seq_length * hidden_size, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -392,7 +392,7 @@ TEST_CASE("LayerNorm Backward", "[layernorm][graph]") {
+     Surface<float> Dbias_tensor(hidden_size, false);
+     Surface<half> DX_tensor(batch_size * seq_length * hidden_size, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/norm/rmsnorm.cpp b/samples/cpp/norm/rmsnorm.cpp
+index 878086c..d5c919b 100644
+--- a/samples/cpp/norm/rmsnorm.cpp
++++ b/samples/cpp/norm/rmsnorm.cpp
+@@ -78,7 +78,7 @@ TEST_CASE("RmsNorm Training", "[rmsnorm][graph]") {
+     Surface<float> Scale_tensor(hidden_size, false);
+     Surface<float> Y_tensor(batch_size * seq_length * hidden_size, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -150,7 +150,7 @@ TEST_CASE("RmsNorm Inference", "[rmsnorm][graph]") {
+     Surface<float> Bias_tensor(hidden_size, false);
+     Surface<float> Y_tensor(batch_size * seq_length * hidden_size, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -227,7 +227,7 @@ TEST_CASE("RmsNorm Backward", "[rmsnorm][graph]") {
+     Surface<float> Dbias_tensor(hidden_size, false);
+     Surface<float> DX_tensor(batch_size * seq_length * hidden_size, false);
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_bwd.cpp b/samples/cpp/sdpa/fp16_bwd.cpp
+index 749cbed..1145008 100644
+--- a/samples/cpp/sdpa/fp16_bwd.cpp
++++ b/samples/cpp/sdpa/fp16_bwd.cpp
+@@ -275,7 +275,7 @@ TEST_CASE("Toy sdpa backward", "[graph][sdpa][flash][backward]") {
+     }
+ 
+     // Allocate workspace
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp b/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp
+index 62d6bb3..50205c3 100644
+--- a/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp
++++ b/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp
+@@ -195,7 +195,7 @@ TEST_CASE("Toy sdpa backward with flexible graph", "[graph][sdpa][flash][backwar
+                                                                                    {DV_UID, dV_tensor.devPtr}};
+ 
+     // Allocate workspace
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_cached.cpp b/samples/cpp/sdpa/fp16_cached.cpp
+index d046271..4f0d3f8 100644
+--- a/samples/cpp/sdpa/fp16_cached.cpp
++++ b/samples/cpp/sdpa/fp16_cached.cpp
+@@ -146,7 +146,7 @@ TEST_CASE("Cached sdpa", "[graph][sdpa][flash]") {
+                     {O_UID, o_tensor.devPtr},
+                     {STATS_UID, stats_tensor.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(fwd_graph2->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> fwd_workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_fwd.cpp b/samples/cpp/sdpa/fp16_fwd.cpp
+index b3acf5e..63697a1 100644
+--- a/samples/cpp/sdpa/fp16_fwd.cpp
++++ b/samples/cpp/sdpa/fp16_fwd.cpp
+@@ -210,7 +210,7 @@ TEST_CASE("Toy sdpa forward", "[graph][sdpa][flash][forward]") {
+         variant_pack[STATS_UID] = statsTensor.devPtr;
+     }
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp b/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp
+index 36cfba4..0cb9d2f 100644
+--- a/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp
++++ b/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp
+@@ -178,7 +178,7 @@ TEST_CASE("Toy sdpa forward with dropout", "[graph][sdpa][flash][forward]") {
+         variant_pack[STATS_UID] = statsTensor.devPtr;
+     }
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp b/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp
+index 810de63..7d81afe 100644
+--- a/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp
++++ b/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp
+@@ -186,7 +186,7 @@ TEST_CASE("Toy sdpa forward with flexible graph", "[graph][sdpa][flash][forward]
+         variant_pack[STATS_UID] = statsTensor.devPtr;
+     }
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp b/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp
+index 18dd937..d195f6b 100644
+--- a/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp
++++ b/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp
+@@ -268,7 +268,7 @@ TEST_CASE("Toy sdpa forward with paged caches", "[graph][sdpa][flash][paged][for
+         variant_pack[STATS_UID] = statsTensor.devPtr;
+     }
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(graph->get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp8_bwd.cpp b/samples/cpp/sdpa/fp8_bwd.cpp
+index 82e542b..296f2f9 100644
+--- a/samples/cpp/sdpa/fp8_bwd.cpp
++++ b/samples/cpp/sdpa/fp8_bwd.cpp
+@@ -214,7 +214,7 @@ TEST_CASE("sdpa_fp8_bprop", "[graph][sdpa][fp8][backward]") {
+         {Amax_dV, AMax_dV_Tensor.devPtr},
+         {Amax_dP, AMax_dP_Tensor.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(mha_graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+@@ -385,7 +385,7 @@ TEST_CASE("sdpa_fp8_gqa_bprop", "[graph][sdpa][fp8][backward]") {
+         {amax_dV, amax_dV_gpu.devPtr},
+         {amax_dP, amax_dP_gpu.devPtr}};
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(mha_graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+diff --git a/samples/cpp/sdpa/fp8_fwd.cpp b/samples/cpp/sdpa/fp8_fwd.cpp
+index 6ede98d..23abc3f 100644
+--- a/samples/cpp/sdpa/fp8_fwd.cpp
++++ b/samples/cpp/sdpa/fp8_fwd.cpp
+@@ -146,7 +146,7 @@ TEST_CASE("sdpa_fp8_fprop", "[graph][sdpa][fp8][forward]") {
+         variant_pack[Stats] = stats_tensor.devPtr;
+     }
+ 
+-    int64_t workspace_size;
++    int64_t workspace_size = 0;
+     REQUIRE(mha_graph.get_workspace_size(workspace_size).is_good());
+     Surface<int8_t> workspace(workspace_size, false);
+ 
+-- 
+2.47.0
+
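The `0004-samples-fix-instances-of-maybe-uninitialized.patch` file above changes every `int64_t workspace_size;` in the samples to `int64_t workspace_size = 0;`. The C++ sketch below illustrates the presumed reason. A getter that writes its out-parameter only on success leaves the caller's variable indeterminate on the failure path. Even though `REQUIRE` stops the test before that value is used, GCC may not be able to prove it and can emit `-Wmaybe-uninitialized`, which the `-Werror` in the CMakeLists.txt that follows turns into a build failure. `Status`, `get_workspace_size` and `query_workspace` here are hypothetical stand-ins, not the cudnn-frontend API.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the status object returned by the graph API;
// only the is_good() query used in the samples is modelled.
struct Status {
    bool ok;
    bool is_good() const { return ok; }
};

// Hypothetical getter: like many out-parameter APIs, it writes `size`
// only on success and leaves it untouched on failure.
Status get_workspace_size(bool succeed, std::int64_t &size) {
    if (!succeed) {
        return Status{false};
    }
    size = 1024;
    return Status{true};
}

// Mirrors the patched samples: the variable is initialized before the call,
// so it holds a defined value on every path, even when the query fails.
std::int64_t query_workspace(bool succeed) {
    std::int64_t workspace_size = 0;
    if (!get_workspace_size(succeed, workspace_size).is_good()) {
        std::cerr << "workspace query failed; keeping 0 bytes\n";
    }
    return workspace_size;
}
```

With the initializer, `query_workspace(true)` returns 1024 and `query_workspace(false)` returns 0; without it, the failing call would return an indeterminate value.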
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/CMakeLists.txt
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/CMakeLists.txt
@@ -0,0 +1,133 @@
+cmake_minimum_required(VERSION 3.23)
+
+project(cudnn_frontend VERSION 1.8.0)
+
+option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
+option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
+option(CUDNN_FRONTEND_BUILD_TESTS "Defines if unittests are built or not." ON)
+option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
+
+if(MSVC OR MSYS OR MINGW)
+    add_compile_options(/W4 /WX)
+else()
+    add_compile_options(-Wall -Wextra -Wpedantic -Werror -Wno-error=attributes -Wno-attributes -Wno-error=unused-function -Wno-unused-function)
+endif()
+
+add_library(cudnn_frontend INTERFACE)
+
+# Add header files to library
+file(GLOB_RECURSE CUDNN_FRONTEND_INCLUDE_FILES "include/*")
+target_sources(
+    cudnn_frontend
+    PUBLIC
+        FILE_SET
+            HEADERS
+            BASE_DIRS
+                "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
+            FILES
+                "${CUDNN_FRONTEND_INCLUDE_FILES}"
+)
+unset(CUDNN_FRONTEND_INCLUDE_FILES)
+
+target_compile_definitions(cudnn_frontend INTERFACE $<$<BOOL:${CUDNN_FRONTEND_SKIP_JSON_LIB}>:CUDNN_FRONTEND_SKIP_JSON_LIB>)
+
+target_include_directories(
+    cudnn_frontend
+    INTERFACE
+        "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
+        "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>"
+)
+
+# Find the cuda compiler
+find_package(CUDAToolkit REQUIRED)
+
+target_include_directories(cudnn_frontend INTERFACE ${CUDAToolkit_INCLUDE_DIRS})
+
+target_compile_features(cudnn_frontend INTERFACE cxx_std_17)
+
+# Make PCH for targets to link against
+add_library(_cudnn_frontend_pch INTERFACE)
+target_precompile_headers(_cudnn_frontend_pch INTERFACE ${PROJECT_SOURCE_DIR}/include/cudnn_frontend.h)
+
+if (CUDNN_FRONTEND_BUILD_SAMPLES)
+    add_subdirectory(samples)
+    target_link_libraries(
+        samples
+        PRIVATE
+            CUDA::cublasLt
+            CUDA::nvrtc
+    )
+    target_link_libraries(
+        legacy_samples
+        PRIVATE
+            CUDA::cublasLt
+            CUDA::nvrtc
+    )
+endif()
+
+if (CUDNN_FRONTEND_BUILD_TESTS)
+    add_subdirectory(test)
+    target_link_libraries(
+        tests
+        CUDA::cublasLt
+        CUDA::nvrtc
+    )
+endif()
+
+if (CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS)
+    add_subdirectory(python)
+endif()
+
+# Introduce variables:
+# * CMAKE_INSTALL_LIBDIR
+# * CMAKE_INSTALL_BINDIR
+# * CMAKE_INSTALL_INCLUDEDIR
+include(GNUInstallDirs)
+
+# Install and export the header files
+install(
+    TARGETS
+        cudnn_frontend
+    EXPORT
+        cudnn_frontend_targets
+    FILE_SET HEADERS
+)
+
+if (CUDNN_FRONTEND_BUILD_SAMPLES)
+    install(TARGETS legacy_samples samples RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+endif()
+
+if (CUDNN_FRONTEND_BUILD_TESTS)
+    install(TARGETS tests RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+endif()
+
+# See https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html#example-generating-package-files
+include(CMakePackageConfigHelpers)
+
+export(
+    EXPORT
+        cudnn_frontend_targets
+    FILE
+        "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend/cudnn_frontend-targets.cmake"
+)
+install(
+    EXPORT
+        cudnn_frontend_targets
+    FILE
+        cudnn_frontend-targets.cmake
+    DESTINATION
+        "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
+)
+
+configure_package_config_file(
+    cudnn_frontend-config.cmake.in
+    "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
+    INSTALL_DESTINATION
+        "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
+)
+install(
+    FILES
+        "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
+    DESTINATION
+        "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
+)
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/cudnn_frontend-config.cmake.in
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/cudnn_frontend-config.cmake.in
@@ -0,0 +1,3 @@
+@PACKAGE_INIT@
+
+include(${CMAKE_CURRENT_LIST_DIR}/cudnn_frontend-targets.cmake)
--- a/pkgs/development/cuda-modules/packages/cudnn-frontend/package.nix
+++ b/pkgs/development/cuda-modules/packages/cudnn-frontend/package.nix
@@ -0,0 +1,132 @@
+{
+  autoAddDriverRunpath,
+  catch2_3,
+  cmake,
+  fetchFromGitHub,
+  gitUpdater,
+  lib,
+  ninja,
+  nlohmann_json,
+  stdenv,
+  cuda_cccl ? null,
+  cuda_cudart ? null,
+  cuda_nvcc ? null,
+  cuda_nvrtc ? null,
+  cudnn ? null,
+  libcublas ? null,
+}:
+let
+  inherit (lib.lists) optionals;
+  inherit (lib.strings)
+    cmakeBool
+    cmakeFeature
+    optionalString
+    ;
+in
+
+# TODO(@connorbaker): This should be a hybrid C++/Python package.
+stdenv.mkDerivation (finalAttrs: {
+  pname = "cudnn-frontend";
+  version = "1.9.0";
+
+  src = fetchFromGitHub {
+    owner = "NVIDIA";
+    repo = "cudnn-frontend";
+    tag = "v${finalAttrs.version}";
+    hash = "sha256-Vc5jqB1XHcJEdKG0nxbWLewW2fDezRVwjUSzPDubSGE=";
+  };
+
+  patches = [
+    # https://github.com/NVIDIA/cudnn-frontend/pull/125
+    ./0001-cmake-float-out-common-python-bindings-option.patch
+    ./0002-cmake-add-config-so-headers-can-be-discovered-when-i.patch
+    ./0003-cmake-install-samples-and-tests-when-built.patch
+    ./0004-samples-fix-instances-of-maybe-uninitialized.patch
+  ];
+
+  # nlohmann_json should be the only vendored dependency.
+  postPatch = ''
+    echo "patching source to use nlohmann_json from nixpkgs"
+    rm -rf include/cudnn_frontend/thirdparty/nlohmann
+    rmdir include/cudnn_frontend/thirdparty
+    substituteInPlace include/cudnn_frontend_utils.h \
+      --replace-fail \
+        '#include "cudnn_frontend/thirdparty/nlohmann/json.hpp"' \
+        '#include <nlohmann/json.hpp>'
+  '';
+
+  # TODO: As a header-only library, we should make sure we have an `include` directory or similar which is not a
+  # superset of the `out` (`bin`) or `dev` outputs (which is what the multiple-outputs setup hook does by default).
+  outputs = [
+    "out"
+  ]
+  ++ optionals finalAttrs.doCheck [
+    "legacy_samples"
+    "samples"
+    "tests"
+  ];
+
+  nativeBuildInputs = [
+    autoAddDriverRunpath # Needed for samples because it links against CUDA::cuda_driver
+    cmake
+    cuda_nvcc
+    ninja
+  ];
+
+  buildInputs = [
+    cuda_cccl
+    cuda_cudart
+  ];
+
+  cmakeFlags = [
+    (cmakeBool "FETCHCONTENT_FULLY_DISCONNECTED" true)
+    (cmakeFeature "FETCHCONTENT_TRY_FIND_PACKAGE_MODE" "ALWAYS")
+    (cmakeBool "CUDNN_FRONTEND_BUILD_SAMPLES" finalAttrs.doCheck)
+    (cmakeBool "CUDNN_FRONTEND_BUILD_TESTS" finalAttrs.doCheck)
+    (cmakeBool "CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS" false)
+  ];
+
+  checkInputs = [
+    cudnn
+    cuda_nvrtc
+    catch2_3
+    libcublas
+  ];
+
+  enableParallelBuilding = true;
+
+  propagatedBuildInputs = [
+    nlohmann_json
+  ];
+
+  doCheck = true;
+
+  postInstall = optionalString finalAttrs.doCheck ''
+    moveToOutput "bin/legacy_samples" "$legacy_samples"
+    moveToOutput "bin/samples" "$samples"
+    moveToOutput "bin/tests" "$tests"
+    if [[ -e "$out/bin" ]]
+    then
+      nixErrorLog "The bin directory in \$out should no longer exist."
+      exit 1
+    fi
+  '';
+
+  passthru.updateScript = gitUpdater {
+    inherit (finalAttrs) pname version;
+    rev-prefix = "v";
+  };
+
+  meta = {
+    description = "C++ wrapper for the cuDNN backend API";
+    homepage = "https://github.com/NVIDIA/cudnn-frontend";
+    license = lib.licenses.mit;
+    badPlatforms = optionals (cudnn == null) finalAttrs.meta.platforms;
+    platforms = [
+      "aarch64-linux"
+      "x86_64-linux"
+    ];
+    maintainers = with lib.maintainers; [ connorbaker ];
+    teams = [ lib.teams.cuda ];
+  };
+})
--- a/pkgs/development/cuda-modules/packages/markForCudatoolkitRootHook/mark-for-cudatoolkit-root-hook.sh
+++ b/pkgs/development/cuda-modules/packages/markForCudatoolkitRootHook/mark-for-cudatoolkit-root-hook.sh
@@ -0,0 +1,25 @@
+# shellcheck shell=bash
+
+(( ${hostOffset:?} == -1 && ${targetOffset:?} == 0)) || return 0
+
+echo "Sourcing mark-for-cudatoolkit-root-hook" >&2
+
+markForCUDAToolkit_ROOT() {
+    mkdir -p "${prefix:?}/nix-support"
+    local markerPath="$prefix/nix-support/include-in-cudatoolkit-root"
+
+    # Return early if the file already exists.
+    [[ -f "$markerPath" ]] && return 0
+
+    # Always create the file, even if it's empty, since setup-cuda-hook relies on its existence.
+    # However, only populate it if strictDeps is not set.
+    touch "$markerPath"
+
+    # Return early if strictDeps is set.
+    [[ -n "${strictDeps-}" ]] && return 0
+
+    # Populate the file with the package name and output.
+    echo "${pname:?}-${output:?}" > "$markerPath"
+}
+
+fixupOutputHooks+=(markForCUDAToolkit_ROOT)
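The marker protocol implemented by `markForCUDAToolkit_ROOT` can be exercised outside of a Nix build. Below is a minimal bash sketch. It assumes a temporary directory stands in for an output's `$prefix`, and that `pname`, `output`, and `strictDeps` are set by hand the way stdenv would set them. `markForCudaToolkitRootSketch` is a hypothetical name, not part of nixpkgs. The sketch covers three cases: a populated marker, an empty marker under `strictDeps`, and an idempotent re-run.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for markForCUDAToolkit_ROOT, for illustration only.
# Unlike the real hook, it takes the output prefix as an argument.
set -euo pipefail

markForCudaToolkitRootSketch() {
    local prefix="$1"
    local markerPath="$prefix/nix-support/include-in-cudatoolkit-root"
    mkdir -p "$prefix/nix-support"

    # Idempotent: an existing marker is never rewritten.
    [[ -f "$markerPath" ]] && return 0

    # The marker always exists (possibly empty), since setup-cuda-hook checks for it.
    touch "$markerPath"

    # Under strictDeps the marker is left empty.
    [[ -n "${strictDeps-}" ]] && return 0

    echo "${pname:?}-${output:?}" > "$markerPath"
}

workDir="$(mktemp -d)"
trap 'rm -rf "$workDir"' EXIT

# Values stdenv would normally provide (assumed here).
pname=cuda_cudart
output=lib

# Without strictDeps: the marker records "<pname>-<output>".
strictDeps=
markForCudaToolkitRootSketch "$workDir/loose"
looseMarker="$(cat "$workDir/loose/nix-support/include-in-cudatoolkit-root")"

# With strictDeps: the marker exists but stays empty.
strictDeps=1
markForCudaToolkitRootSketch "$workDir/strict"
if [[ -s "$workDir/strict/nix-support/include-in-cudatoolkit-root" ]]; then
    strictMarkerEmpty=no
else
    strictMarkerEmpty=yes
fi

# Re-running on an already-marked output changes nothing.
markForCudaToolkitRootSketch "$workDir/loose"
looseMarkerAfterRerun="$(cat "$workDir/loose/nix-support/include-in-cudatoolkit-root")"

echo "loose marker: $looseMarker"
echo "strict marker empty: $strictMarkerEmpty"
```

The early return on an existing file is what lets the hook run once per output without clobbering a marker written by an earlier phase.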
--- a/pkgs/development/cuda-modules/packages/markForCudatoolkitRootHook/package.nix
+++ b/pkgs/development/cuda-modules/packages/markForCudatoolkitRootHook/package.nix
@@ -0,0 +1,4 @@
+# Internal hook, used by cudatoolkit and cuda redist packages
+# to accommodate automatic CUDAToolkit_ROOT construction
+{ makeSetupHook }:
+makeSetupHook { name = "mark-for-cudatoolkit-root-hook"; } ./mark-for-cudatoolkit-root-hook.sh
--- a/pkgs/development/cuda-modules/packages/nccl-tests.nix
+++ b/pkgs/development/cuda-modules/packages/nccl-tests.nix
@@ -0,0 +1,83 @@
+# NOTE: Although nccl-tests is called within the cudaPackages package set, we do not take its
+# dependencies from that package set as direct arguments, since doing so would cause evaluation
+# errors when redistributable packages are not available.
+{
+  config,
+  cudaPackages,
+  fetchFromGitHub,
+  gitUpdater,
+  lib,
+  mpi,
+  mpiSupport ? false,
+  which,
+}:
+let
+  inherit (cudaPackages)
+    backendStdenv
+    cuda_cccl
+    cuda_cudart
+    cuda_nvcc
+    cudaAtLeast
+    nccl
+    ;
+in
+backendStdenv.mkDerivation (finalAttrs: {
+
+  pname = "nccl-tests";
+  version = "2.15.0";
+
+  src = fetchFromGitHub {
+    owner = "NVIDIA";
+    repo = "nccl-tests";
+    rev = "v${finalAttrs.version}";
+    hash = "sha256-OgffbW9Vx/sm1I1tpaPGdAhIpV4jbB4hJa9UcEAWkdE=";
+  };
+
+  postPatch = ''
+    # fix build failure with GCC14
+    substituteInPlace src/Makefile --replace-fail "-std=c++11" "-std=c++14"
+  '';
+
+  strictDeps = true;
+
+  nativeBuildInputs = [
+    which
+    cuda_nvcc
+  ];
+
+  buildInputs = [
+    nccl
+    cuda_nvcc # crt/host_config.h
+    cuda_cudart
+    cuda_cccl # <nv/target>
+  ]
+  ++ lib.optionals mpiSupport [ mpi ];
+
+  makeFlags = [
+    "NCCL_HOME=${nccl}"
+    "CUDA_HOME=${cuda_nvcc}"
+  ]
+  ++ lib.optionals mpiSupport [ "MPI=1" ];
+
+  enableParallelBuilding = true;
+
+  installPhase = ''
+    mkdir -p $out/bin
+    cp -r build/* $out/bin/
+  '';
+
+  passthru.updateScript = gitUpdater {
+    inherit (finalAttrs) pname version;
+    rev-prefix = "v";
+  };
+
+  meta = with lib; {
+    description = "Tests to check both the performance and the correctness of NVIDIA NCCL operations";
+    homepage = "https://github.com/NVIDIA/nccl-tests";
+    platforms = platforms.linux;
+    license = licenses.bsd3;
+    broken = !config.cudaSupport || (mpiSupport && mpi == null);
+    maintainers = with maintainers; [ jmillerpdt ];
+    teams = [ teams.cuda ];
+  };
+})
--- a/pkgs/development/cuda-modules/packages/nccl.nix
+++ b/pkgs/development/cuda-modules/packages/nccl.nix
@@ -0,0 +1,98 @@
+# NOTE: Although NCCL is called within the cudaPackages package set, we do not take its
+# dependencies from that package set as direct arguments, since doing so would cause evaluation
+# errors when redistributable packages are not available.
+{
+  lib,
+  fetchFromGitHub,
+  python3,
+  which,
+  autoAddDriverRunpath,
+  cudaPackages,
+  # passthru.updateScript
+  gitUpdater,
+}:
+let
+  inherit (cudaPackages)
+    backendStdenv
+    cuda_cccl
+    cuda_cudart
+    cuda_nvcc
+    cudaAtLeast
+    flags
+    ;
+  version = "2.27.6-1";
+  hash = "sha256-/BiLSZaBbVIqOfd8nQlgUJub0YR3SR4B93x2vZpkeiU=";
+in
+backendStdenv.mkDerivation (finalAttrs: {
+  pname = "nccl";
+  inherit version;
+
+  src = fetchFromGitHub {
+    owner = "NVIDIA";
+    repo = "nccl";
+    rev = "v${finalAttrs.version}";
+    inherit hash;
+  };
+
+  __structuredAttrs = true;
+  strictDeps = true;
+
+  outputs = [
+    "out"
+    "dev"
+  ];
+
+  nativeBuildInputs = [
+    which
+    autoAddDriverRunpath
+    python3
+    cuda_nvcc
+  ];
+
+  buildInputs = [
+    cuda_nvcc # crt/host_config.h
+    cuda_cudart
+    cuda_cccl
+  ];
+
+  env.NIX_CFLAGS_COMPILE = toString [ "-Wno-unused-function" ];
+
+  postPatch = ''
+    patchShebangs ./src/device/generate.py
+    patchShebangs ./src/device/symmetric/generate.py
+  '';
+
+  makeFlags = [
+    "PREFIX=$(out)"
+    "NVCC_GENCODE=${flags.gencodeString}"
+    "CUDA_HOME=${cuda_nvcc}"
+    "CUDA_LIB=${lib.getLib cuda_cudart}/lib"
+    "CUDA_INC=${lib.getDev cuda_cudart}/include"
+  ];
+
+  enableParallelBuilding = true;
+
+  postFixup = ''
+    moveToOutput lib/libnccl_static.a $dev
+  '';
+
+  passthru.updateScript = gitUpdater {
+    inherit (finalAttrs) pname version;
+    rev-prefix = "v";
+  };
+
+  meta = with lib; {
+    description = "Multi-GPU and multi-node collective communication primitives for NVIDIA GPUs";
+    homepage = "https://developer.nvidia.com/nccl";
+    license = licenses.bsd3;
+    platforms = platforms.linux;
+    # NCCL is not supported on Jetson because Jetson devices do not use NVLink or PCIe for inter-GPU communication.
+    # https://forums.developer.nvidia.com/t/can-jetson-orin-support-nccl/232845/9
+    badPlatforms = lib.optionals flags.isJetsonBuild [ "aarch64-linux" ];
+    maintainers = with maintainers; [
+      mdaiter
+      orivej
+    ];
+    teams = [ teams.cuda ];
+  };
+})
--- a/pkgs/development/cuda-modules/packages/saxpy/package.nix
+++ b/pkgs/development/cuda-modules/packages/saxpy/package.nix
@@ -0,0 +1,63 @@
+{
+  autoAddDriverRunpath,
+  cmake,
+  cudaPackages,
+  lib,
+  saxpy,
+}:
+let
+  inherit (cudaPackages)
+    backendStdenv
+    cuda_cccl
+    cuda_cudart
+    cuda_nvcc
+    cudaAtLeast
+    flags
+    libcublas
+    ;
+  inherit (lib) getDev getLib getOutput;
+in
+backendStdenv.mkDerivation {
+  pname = "saxpy";
+  version = "unstable-2023-07-11";
+
+  src = ./src;
+
+  __structuredAttrs = true;
+  strictDeps = true;
+
+  nativeBuildInputs = [
+    cmake
+    autoAddDriverRunpath
+    cuda_nvcc
+  ];
+
+  buildInputs = [
+    (getDev libcublas)
+    (getLib libcublas)
+    (getOutput "static" libcublas)
+    cuda_cudart
+    cuda_cccl
+  ];
+
+  cmakeFlags = [
+    (lib.cmakeBool "CMAKE_VERBOSE_MAKEFILE" true)
+    (lib.cmakeFeature "CMAKE_CUDA_ARCHITECTURES" flags.cmakeCudaArchitecturesString)
+  ];
+
+  passthru.gpuCheck = saxpy.overrideAttrs (_: {
+    requiredSystemFeatures = [ "cuda" ];
+    doInstallCheck = true;
+    postInstallCheck = ''
+      $out/bin/${saxpy.meta.mainProgram or (lib.getName saxpy)}
+    '';
+  });
+
+  meta = {
+    description = "Simple SAXPY (Single-precision AX Plus Y) FindCUDAToolkit.cmake example for testing cross-compilation";
+    license = lib.licenses.mit;
+    teams = [ lib.teams.cuda ];
+    mainProgram = "saxpy";
+    platforms = lib.platforms.unix;
+  };
+}
--- a/pkgs/development/cuda-modules/packages/saxpy/src/CMakeLists.txt
+++ b/pkgs/development/cuda-modules/packages/saxpy/src/CMakeLists.txt
@@ -0,0 +1,12 @@
+cmake_minimum_required(VERSION 3.25)
+project(saxpy LANGUAGES CXX CUDA)
+
+find_package(CUDAToolkit REQUIRED COMPONENTS cudart cublas)
+
+add_executable(saxpy saxpy.cu)
+target_link_libraries(saxpy PUBLIC CUDA::cublas CUDA::cudart m)
+target_compile_features(saxpy PRIVATE cxx_std_14)
+target_compile_options(saxpy PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:
+                                     --expt-relaxed-constexpr>)
+
+install(TARGETS saxpy)
--- a/pkgs/development/cuda-modules/packages/saxpy/src/saxpy.cu
+++ b/pkgs/development/cuda-modules/packages/saxpy/src/saxpy.cu
@@ -0,0 +1,70 @@
+#include <cublas_v2.h>
+#include <cuda_runtime.h>
+#include <vector>
+
+#include <cstdlib>
+#include <math.h>
+#include <stdio.h>
+
+static inline void check(cudaError_t err, const char *context) {
+  if (err != cudaSuccess) {
+    fprintf(stderr, "CUDA error at %s: %s\n", context, cudaGetErrorString(err));
+    std::exit(EXIT_FAILURE);
+  }
+}
+
+#define CHECK(x) check(x, #x)
+
+__global__ void saxpy(int n, float a, float *x, float *y) {
+  int i = blockIdx.x * blockDim.x + threadIdx.x;
+  if (i < n)
+    y[i] = a * x[i] + y[i];
+}
+
+int main(void) {
+  setbuf(stderr, NULL);
+  fprintf(stderr, "Start\n");
+
+  int rtVersion, driverVersion;
+  CHECK(cudaRuntimeGetVersion(&rtVersion));
+  CHECK(cudaDriverGetVersion(&driverVersion));
+
+  fprintf(stderr, "Runtime version: %d\n", rtVersion);
+  fprintf(stderr, "Driver version: %d\n", driverVersion);
+
+  constexpr int N = 1 << 10;
+
+  std::vector<float> xHost(N), yHost(N);
+  for (int i = 0; i < N; i++) {
+    xHost[i] = 1.0f;
+    yHost[i] = 2.0f;
+  }
+
+  fprintf(stderr, "Host memory initialized, copying to the device\n");
+  fflush(stderr);
+
+  float *xDevice, *yDevice;
+  CHECK(cudaMalloc(&xDevice, N * sizeof(float)));
+  CHECK(cudaMalloc(&yDevice, N * sizeof(float)));
+
+  CHECK(cudaMemcpy(xDevice, xHost.data(), N * sizeof(float),
+                   cudaMemcpyHostToDevice));
+  CHECK(cudaMemcpy(yDevice, yHost.data(), N * sizeof(float),
+                   cudaMemcpyHostToDevice));
+  fprintf(stderr, "Scheduled a cudaMemcpy, calling the kernel\n");
+
+  saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, xDevice, yDevice);
+  fprintf(stderr, "Scheduled a kernel call\n");
+  CHECK(cudaGetLastError());
+
+  CHECK(cudaMemcpy(yHost.data(), yDevice, N * sizeof(float),
+                   cudaMemcpyDeviceToHost));
+
+  float maxError = 0.0f;
+  for (int i = 0; i < N; i++)
+    maxError = fmaxf(maxError, fabsf(yHost[i] - 4.0f));
+  fprintf(stderr, "Max error: %f\n", maxError);
+
+  CHECK(cudaFree(xDevice));
+  CHECK(cudaFree(yDevice));
+}
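The launch in `saxpy.cu` rounds the grid size up with `(N + 255) / 256`, which is why the kernel guards its write with `if (i < n)`: the last block may contain threads whose index falls past the end of the arrays. A small bash sketch of that arithmetic follows; `blocksFor` and `idleThreadsFor` are hypothetical helpers used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the saxpy launch arithmetic; not part of nixpkgs.
set -euo pipefail

# Ceiling division: the number of blocks needed to cover n elements.
blocksFor() {
    local n="$1" threadsPerBlock="$2"
    echo $(( (n + threadsPerBlock - 1) / threadsPerBlock ))
}

# Threads launched beyond n; these are the ones the `if (i < n)` guard filters out.
idleThreadsFor() {
    local n="$1" threadsPerBlock="$2"
    local blocks
    blocks="$(blocksFor "$n" "$threadsPerBlock")"
    echo $(( blocks * threadsPerBlock - n ))
}

# N = 1 << 10 in saxpy.cu divides evenly by 256; 1000 elements would not.
for n in 1024 1000; do
    echo "n=$n blocks=$(blocksFor "$n" 256) idle=$(idleThreadsFor "$n" 256)"
done
```

With `N = 1 << 10` no thread is idle, so the guard only starts to matter if `N` is changed to a value that is not a multiple of 256.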
--- a/pkgs/development/cuda-modules/packages/setupCudaHook/package.nix
+++ b/pkgs/development/cuda-modules/packages/setupCudaHook/package.nix
@@ -0,0 +1,14 @@
+# Currently propagated by cuda_nvcc or cudatoolkit, rather than used directly
+{ makeSetupHook, backendStdenv }:
+makeSetupHook {
+  name = "setup-cuda-hook";
+
+  substitutions.setupCudaHook = placeholder "out";
+
+  # Point NVCC at a compatible compiler
+  substitutions.ccRoot = "${backendStdenv.cc}";
+
+  # Required in addition to ccRoot as otherwise bin/gcc is looked up
+  # when building CMakeCUDACompilerId.cu
+  substitutions.ccFullPath = "${backendStdenv.cc}/bin/${backendStdenv.cc.targetPrefix}c++";
+} ./setup-cuda-hook.sh
--- a/pkgs/development/cuda-modules/packages/setupCudaHook/setup-cuda-hook.sh
+++ b/pkgs/development/cuda-modules/packages/setupCudaHook/setup-cuda-hook.sh
@@ -0,0 +1,128 @@
+# shellcheck shell=bash
+
+# Only run the hook from nativeBuildInputs
+(( "$hostOffset" == -1 && "$targetOffset" == 0)) || return 0
+
+guard=Sourcing
+reason=
+
+[[ -n ${cudaSetupHookOnce-} ]] && guard=Skipping && reason=" because the hook has been propagated more than once"
+
+if (( "${NIX_DEBUG:-0}" >= 1 )) ; then
+    echo "$guard hostOffset=$hostOffset targetOffset=$targetOffset setup-cuda-hook$reason" >&2
+else
+    echo "$guard setup-cuda-hook$reason" >&2
+fi
+
+[[ "$guard" = Sourcing ]] || return 0
+
+declare -g cudaSetupHookOnce=1
+declare -Ag cudaHostPathsSeen=()
+declare -Ag cudaOutputToPath=()
+
+extendcudaHostPathsSeen() {
+    (( "${NIX_DEBUG:-0}" >= 1 )) && echo "extendcudaHostPathsSeen $1" >&2
+
+    local markerPath="$1/nix-support/include-in-cudatoolkit-root"
+    [[ ! -f "${markerPath}" ]] && return 0
+    [[ -v cudaHostPathsSeen[$1] ]] && return 0
+
+    cudaHostPathsSeen["$1"]=1
+
+    # E.g. cuda_cudart-lib
+    local cudaOutputName
+    # Fail gracefully if the file is empty.
+    # One reason the file may be empty: the package was built with strictDeps set, but the current build does not have
+    # strictDeps set.
+    read -r cudaOutputName < "$markerPath" || return 0
+
+    [[ -z "$cudaOutputName" ]] && return 0
+
+    local oldPath="${cudaOutputToPath[$cudaOutputName]-}"
+    [[ -n "$oldPath" ]] && echo "extendcudaHostPathsSeen: warning: overwriting $cudaOutputName from $oldPath to $1" >&2
+    cudaOutputToPath["$cudaOutputName"]="$1"
+}
+addEnvHooks "$targetOffset" extendcudaHostPathsSeen
+
+setupCUDAToolkit_ROOT() {
+    (( "${NIX_DEBUG:-0}" >= 1 )) && echo "setupCUDAToolkit_ROOT: cudaHostPathsSeen=${!cudaHostPathsSeen[*]}" >&2
+
+    for path in "${!cudaHostPathsSeen[@]}" ; do
+        addToSearchPathWithCustomDelimiter ";" CUDAToolkit_ROOT "$path"
+        if [[ -d "$path/include" ]] ; then
+            addToSearchPathWithCustomDelimiter ";" CUDAToolkit_INCLUDE_DIR "$path/include"
+        fi
+    done
+
+    # Use array form so semicolon-separated lists are passed safely.
+    if [[ -n "${CUDAToolkit_INCLUDE_DIR-}" ]]; then
+        cmakeFlagsArray+=("-DCUDAToolkit_INCLUDE_DIR=${CUDAToolkit_INCLUDE_DIR}")
+    fi
+    if [[ -n "${CUDAToolkit_ROOT-}" ]]; then
+        cmakeFlagsArray+=("-DCUDAToolkit_ROOT=${CUDAToolkit_ROOT}")
+    fi
+}
+preConfigureHooks+=(setupCUDAToolkit_ROOT)
+
+setupCUDAToolkitCompilers() {
+    echo Executing setupCUDAToolkitCompilers >&2
+
+    if [[ -n "${dontSetupCUDAToolkitCompilers-}" ]] ; then
+        return 0
+    fi
+
+    # Point NVCC at a compatible compiler
+
+    # For CMake-based projects:
+    # https://cmake.org/cmake/help/latest/module/FindCUDA.html#input-variables
+    # https://cmake.org/cmake/help/latest/envvar/CUDAHOSTCXX.html
+    # https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_HOST_COMPILER.html
+
+    appendToVar cmakeFlags "-DCUDA_HOST_COMPILER=@ccFullPath@"
+    appendToVar cmakeFlags "-DCMAKE_CUDA_HOST_COMPILER=@ccFullPath@"
+
+    # For non-CMake projects:
+    # We prepend --compiler-bindir to nvcc flags.
+    # Downstream packages can override these, because NVCC
+    # uses the last --compiler-bindir it gets on the command line.
+    # FIXME: this results in "incompatible redefinition" warnings.
+    # https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compiler-bindir-directory-ccbin
+    if [[ -z "${CUDAHOSTCXX-}" ]]; then
+        export CUDAHOSTCXX="@ccFullPath@"
+    fi
+
+    appendToVar NVCC_PREPEND_FLAGS "--compiler-bindir=@ccRoot@/bin"
+
+    # NOTE: We set -Xfatbin=-compress-all, which reduces the size of the compiled
+    #   binaries. If binaries grow over 2GB, they will fail to link. This is a problem for us, as
+    #   the default set of CUDA capabilities we build can regularly cause this to occur (for
+    #   example, with Magma).
+    #
+    # @SomeoneSerge: original comment was made by @ConnorBaker in .../cudatoolkit/common.nix
+    if [[ -z "${dontCompressFatbin-}" ]]; then
+        appendToVar NVCC_PREPEND_FLAGS "-Xfatbin=-compress-all"
+    fi
+}
+preConfigureHooks+=(setupCUDAToolkitCompilers)
+
+propagateCudaLibraries() {
+    (( "${NIX_DEBUG:-0}" >= 1 )) && echo "propagateCudaLibraries: cudaPropagateToOutput=${cudaPropagateToOutput-} cudaHostPathsSeen=${!cudaHostPathsSeen[*]}" >&2
+
+    [[ -z "${cudaPropagateToOutput-}" ]] && return 0
+
+    mkdir -p "${!cudaPropagateToOutput}/nix-support"
+    # One would expect this to be propagated-build-build-deps, but that doesn't seem to work
+    echo "@setupCudaHook@" >> "${!cudaPropagateToOutput}/nix-support/propagated-native-build-inputs"
+
+    local propagatedBuildInputs=( "${!cudaHostPathsSeen[@]}" )
+    for output in $(getAllOutputNames) ; do
+        if [[ ! "$output" = "$cudaPropagateToOutput" ]] ; then
+            appendToVar propagatedBuildInputs "${!output}"
+        fi
+        break
+    done
+
+    # One would expect this to be propagated-host-host-deps, but that doesn't seem to work
+    printWords "${propagatedBuildInputs[@]}" >> "${!cudaPropagateToOutput}/nix-support/propagated-build-inputs"
+}
+postFixupHooks+=(propagateCudaLibraries)
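The bookkeeping in `extendcudaHostPathsSeen` and `setupCUDAToolkit_ROOT` can be approximated in plain bash. This is a sketch under stated assumptions: `recordCudaPath` and `appendToCudaToolkitRoot` are hypothetical stand-ins for the hook function and for stdenv's `addToSearchPathWithCustomDelimiter`, the dependency paths are fake directories under `mktemp -d`, and the paths are sorted only to make the result deterministic. It shows that unmarked paths are skipped, duplicates are recorded once, an empty marker still contributes its path to `CUDAToolkit_ROOT`, and the semicolon-separated list survives as a single CMake argument when stored in an array.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of setup-cuda-hook's bookkeeping, for illustration only.
# appendToCudaToolkitRoot stands in for stdenv's addToSearchPathWithCustomDelimiter.
set -euo pipefail

declare -A seenPaths=()
declare -A outputToPath=()
CUDAToolkit_ROOT=""

# Mirrors extendcudaHostPathsSeen: remember paths that carry a marker file.
recordCudaPath() {
    local path="$1"
    local markerPath="$path/nix-support/include-in-cudatoolkit-root"
    [[ -f "$markerPath" ]] || return 0              # not marked: ignore
    [[ -n "${seenPaths[$path]-}" ]] && return 0     # already recorded
    seenPaths["$path"]=1

    local outputName=""
    read -r outputName < "$markerPath" || true      # empty marker (strictDeps)
    [[ -n "$outputName" ]] && outputToPath["$outputName"]="$path"
    return 0
}

# Append an existing directory to the semicolon-separated list.
appendToCudaToolkitRoot() {
    [[ -d "$1" ]] || return 0
    CUDAToolkit_ROOT="${CUDAToolkit_ROOT:+$CUDAToolkit_ROOT;}$1"
}

workDir="$(mktemp -d)"
trap 'rm -rf "$workDir"' EXIT

# Fake dependency paths: one populated marker, one empty marker, one unmarked package.
mkdir -p "$workDir/cudart-lib/nix-support" "$workDir/nvcc-bin/nix-support" "$workDir/zlib"
echo "cuda_cudart-lib" > "$workDir/cudart-lib/nix-support/include-in-cudatoolkit-root"
: > "$workDir/nvcc-bin/nix-support/include-in-cudatoolkit-root"

# cudart-lib is listed twice, as it might be when propagated by several inputs.
for dep in "$workDir/cudart-lib" "$workDir/nvcc-bin" "$workDir/zlib" "$workDir/cudart-lib"; do
    recordCudaPath "$dep"
done

# Sorted only for deterministic output; associative array order is unspecified.
mapfile -t sortedPaths < <(printf '%s\n' "${!seenPaths[@]}" | sort)
for path in "${sortedPaths[@]}"; do
    appendToCudaToolkitRoot "$path"
done

# One array element keeps the semicolons intact as a single CMake argument.
cmakeFlagsArray=("-DCUDAToolkit_ROOT=${CUDAToolkit_ROOT}")
printf '%s\n' "${cmakeFlagsArray[@]}"
```

The sketch leaves out the compiler setup and library propagation steps, which depend on stdenv variables such as `NVCC_PREPEND_FLAGS` and the output names.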
--- a/pkgs/development/cuda-modules/packages/writeGpuTestPython.nix
+++ b/pkgs/development/cuda-modules/packages/writeGpuTestPython.nix
@@ -0,0 +1,77 @@
+{
+  lib,
+  runCommand,
+  python3Packages,
+  makeWrapper,
+  writableTmpDirAsHomeHook,
+}:
+{
+  feature ? "cuda",
+  name ? if feature == null then "cpu" else feature,
+  libraries ? [ ], # [PythonPackage] | (PackageSet -> [PythonPackage])
+  gpuCheckArgs ? { },
+  ...
+}@args:
+
+let
+  inherit (builtins) isFunction all;
+  librariesFun = if isFunction libraries then libraries else (_: libraries);
+in
+
+assert lib.assertMsg (
+  isFunction libraries || all (python3Packages.hasPythonModule) libraries
+) "writeGpuTestPython was passed `libraries` from the wrong python release";
+
+content:
+
+let
+  interpreter = python3Packages.python.withPackages librariesFun;
+  tester =
+    runCommand "tester-${name}"
+      (
+        lib.removeAttrs args [
+          "gpuCheckArgs"
+          "libraries"
+          "name"
+        ]
+        // {
+          inherit content;
+          nativeBuildInputs = args.nativeBuildInputs or [ ] ++ [ makeWrapper ];
+          passAsFile = args.passAsFile or [ ] ++ [ "content" ];
+        }
+      )
+      ''
+        mkdir -p "$out"/bin
+        cat << EOF >"$out/bin/$name"
+        #!${lib.getExe interpreter}
+        EOF
+        cat "$contentPath" >>"$out/bin/$name"
+        chmod +x "$out/bin/$name"
+
+        if [[ -n "''${makeWrapperArgs+''${makeWrapperArgs[@]}}" ]] ; then
+          wrapProgram "$out/bin/$name" "''${makeWrapperArgs[@]}"
+        fi
+      '';
+  tester' = tester.overrideAttrs (oldAttrs: {
+    passthru.gpuCheck =
+      runCommand "test-${name}"
+        (
+          gpuCheckArgs
+          // {
+            nativeBuildInputs = [
+              tester'
+            ]
+            ++ gpuCheckArgs.nativeBuildInputs or [ ];
+
+            requiredSystemFeatures =
+              lib.optionals (feature != null) [ feature ] ++ gpuCheckArgs.requiredSystemFeatures or [ ];
+          }
+        )
+        ''
+          set -e
+          ${tester.meta.mainProgram or (lib.getName tester')}
+          touch $out
+        '';
+  });
+in
+tester'
--- a/pkgs/development/cuda-modules/tensorrt/releases.nix
+++ b/pkgs/development/cuda-modules/tensorrt/releases.nix
@@ -0,0 +1,50 @@
+# NOTE: Check https://developer.nvidia.com/nvidia-tensorrt-8x-download
+#             https://developer.nvidia.com/nvidia-tensorrt-10x-download
+
+# Version policy is to keep the latest minor release for each major release.
+{
+  tensorrt.releases = {
+    # jetson
+    linux-aarch64 = [ ];
+    # powerpc
+    linux-ppc64le = [ ];
+    # server-grade arm
+    linux-sbsa = [
+      {
+        version = "10.8.0.43";
+        minCudaVersion = "12.8";
+        maxCudaVersion = "12.8";
+        cudnnVersion = "9.7";
+        filename = "TensorRT-10.8.0.43.Linux.aarch64-gnu.cuda-12.8.tar.gz";
+        hash = "sha256-sB5d0sfGQyUhGdA9ku6pcCNBjpL0Wjvg0Ilulikj5Do=";
+      }
+      {
+        version = "10.9.0.34";
+        minCudaVersion = "12.8";
+        maxCudaVersion = "12.8";
+        cudnnVersion = "9.7";
+        filename = "TensorRT-10.9.0.34.Linux.aarch64-gnu.cuda-12.8.tar.gz";
+        hash = "sha256-uB7CoGf2fwgsE8rsLc71Q4W0Kp3mpOyubzGKotQZZPI=";
+      }
+    ];
+    # x86_64
+    linux-x86_64 = [
+      {
+        version = "10.8.0.43";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        cudnnVersion = "9.7";
+        filename = "TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8.tar.gz";
+        hash = "sha256-V31tivU4FTQUuYZ8ZmtPZYUvwusefA6jogbl+vvH1J4=";
+      }
+      {
+        version = "10.9.0.34";
+        minCudaVersion = "12.0";
+        maxCudaVersion = "12.8";
+        cudnnVersion = "9.7";
+        filename = "TensorRT-10.9.0.34.Linux.x86_64-gnu.cuda-12.8.tar.gz";
+        hash = "sha256-M74OYeO/F3u7yrtIkr8BPwyKxx0r5z8oA4SKOCyxQnI=";
+      }
+    ];
+  };
+}
--- a/pkgs/development/cuda-modules/tensorrt/shims.nix
+++ b/pkgs/development/cuda-modules/tensorrt/shims.nix
@@ -0,0 +1,24 @@
+# Shims to mimic the shape of ../modules/generic/manifests/{feature,redistrib}/release.nix
+{
+  package,
+  # redistSystem :: String
+  # String is `"unsupported"` if the given architecture is unsupported.
+  redistSystem,
+}:
+{
+  featureRelease = {
+    inherit (package) cudnnVersion minCudaVersion maxCudaVersion;
+    ${redistSystem}.outputs = {
+      bin = true;
+      lib = true;
+      static = true;
+      dev = true;
+      sample = true;
+      python = true;
+    };
+  };
+  redistribRelease = {
+    name = "TensorRT: a high-performance deep learning inference library";
+    inherit (package) hash filename version;
+  };
+}
--- a/pkgs/development/cuda-modules/tests/flags.nix
+++ b/pkgs/development/cuda-modules/tests/flags.nix
@@ -0,0 +1,79 @@
+{
+  _cuda,
+  cudaNamePrefix,
+  lib,
+  runCommand,
+}:
+let
+  inherit (builtins) deepSeq toJSON tryEval;
+  inherit (_cuda.bootstrapData) cudaCapabilityToInfo;
+  inherit (_cuda.lib) formatCapabilities;
+  inherit (lib.asserts) assertMsg;
+in
+# When changing names or formats: pause, validate, and update the assert
+assert assertMsg (
+  cudaCapabilityToInfo ? "7.5" && cudaCapabilityToInfo ? "8.6"
+) "The following test requires both 7.5 and 8.6 to be known CUDA capabilities";
+assert
+  let
+    expected = {
+      cudaCapabilities = [
+        "7.5"
+        "8.6"
+      ];
+      cudaForwardCompat = true;
+
+      # Sorted alphabetically
+      archNames = [
+        "Ampere"
+        "Turing"
+      ];
+
+      realArches = [
+        "sm_75"
+        "sm_86"
+      ];
+
+      virtualArches = [
+        "compute_75"
+        "compute_86"
+      ];
+
+      arches = [
+        "sm_75"
+        "sm_86"
+        "compute_86"
+      ];
+
+      gencode = [
+        "-gencode=arch=compute_75,code=sm_75"
+        "-gencode=arch=compute_86,code=sm_86"
+        "-gencode=arch=compute_86,code=compute_86"
+      ];
+
+      gencodeString = "-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86";
+
+      cmakeCudaArchitecturesString = "75;86";
+    };
+    actual = formatCapabilities {
+      inherit cudaCapabilityToInfo;
+      cudaCapabilities = [
+        "7.5"
+        "8.6"
+      ];
+      cudaForwardCompat = true;
+    };
+    actualWrapped = (tryEval (deepSeq actual actual)).value;
+  in
+  assertMsg (expected == actualWrapped) ''
+    Expected: ${toJSON expected}
+    Actual: ${toJSON actualWrapped}
+  '';
+runCommand "${cudaNamePrefix}-tests-flags"
+  {
+    __structuredAttrs = true;
+    strictDeps = true;
+  }
+  ''
+    touch "$out"
+  ''
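The expected values in the flags test follow a simple pattern: one `compute_XX,code=sm_XX` pair per capability, plus, with `cudaForwardCompat`, a PTX-only entry for the last capability. The bash function below is a hypothetical re-creation of that pattern. `formatGencodeSketch` is an illustrative name; the real logic lives in the Nix function `_cuda.lib.formatCapabilities` and may handle cases this sketch ignores. It reproduces `gencodeString` and `cmakeCudaArchitecturesString` for the test's inputs, assuming capabilities are listed in ascending order as they are in the test.

```shell
#!/usr/bin/env bash
# Hypothetical bash sketch of the -gencode formatting checked in tests/flags.nix.
# Not the real implementation (_cuda.lib.formatCapabilities); illustration only.
set -euo pipefail

# Usage: formatGencodeSketch <forwardCompat: true|false> <capability>...
# Sets the globals gencodeString and cmakeCudaArchitecturesString.
formatGencodeSketch() {
    local forwardCompat="$1"
    shift
    local -a caps=("$@") gencode=() cmakeArches=()
    local cap num

    for cap in "${caps[@]}"; do
        num="${cap//./}"    # "8.6" -> "86"
        # One real (SASS) target per capability.
        gencode+=("-gencode=arch=compute_${num},code=sm_${num}")
        cmakeArches+=("$num")
    done

    # Forward compatibility: additionally embed PTX for the last capability,
    # which the driver can JIT-compile for newer GPUs.
    if [[ "$forwardCompat" == true ]]; then
        num="${caps[${#caps[@]}-1]}"
        num="${num//./}"
        gencode+=("-gencode=arch=compute_${num},code=compute_${num}")
    fi

    gencodeString="${gencode[*]}"       # joined with spaces (default IFS)
    local IFS=';'
    cmakeCudaArchitecturesString="${cmakeArches[*]}"
}

formatGencodeSketch true 7.5 8.6
printf '%s\n' "$gencodeString" "$cmakeCudaArchitecturesString"
```

In `nccl.nix`, the corresponding Nix value `flags.gencodeString` is passed to the NCCL makefile as `NVCC_GENCODE`.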
--- a/pkgs/development/cuda-modules/tests/opencv-and-torch/default.nix
+++ b/pkgs/development/cuda-modules/tests/opencv-and-torch/default.nix
@@ -0,0 +1,81 @@
+{
+  cudaPackages,
+  lib,
+  writeGpuTestPython,
+  # Configuration flags
+  openCVFirst,
+  useOpenCVDefaultCuda,
+  useTorchDefaultCuda,
+}:
+let
+  inherit (lib.strings) optionalString;
+
+  openCVBlock = ''
+
+    import cv2
+    print("OpenCV version:", cv2.__version__)
+
+    # Ensure OpenCV can access the GPU.
+    assert cv2.cuda.getCudaEnabledDeviceCount() > 0, "No CUDA devices found for OpenCV"
+    print("OpenCV CUDA device:", cv2.cuda.printCudaDeviceInfo(cv2.cuda.getDevice()))
+
+    # Print the OpenCV build configuration.
+    print(cv2.getBuildInformation())
+
+    a = cv2.cuda.GpuMat(size=(256, 256), type=cv2.CV_32S, s=1)
+    b = cv2.cuda.GpuMat(size=(256, 256), type=cv2.CV_32S, s=1)
+    c = int(cv2.cuda.sum(cv2.cuda.add(a, b))[0]) # OpenCV returns a Scalar float object.
+
+    assert c == 2 * 256 * 256, f"Expected {2 * 256 * 256} for OpenCV, got {c}"
+
+  '';
+
+  torchBlock = ''
+
+    import torch
+    print("Torch version:", torch.__version__)
+
+    # Set up the GPU.
+    torch.cuda.init()
+    # Ensure the GPU is available.
+    assert torch.cuda.is_available(), "CUDA is not available to Torch"
+    print("Torch CUDA device:", torch.cuda.get_device_properties(torch.cuda.current_device()))
+
+    a = torch.ones(256, 256, dtype=torch.int32).cuda()
+    b = torch.ones(256, 256, dtype=torch.int32).cuda()
+    c = (a + b).sum().item()
+    assert c == 2 * 256 * 256, f"Expected {2 * 256 * 256} for Torch, got {c}"
+
+  '';
+
+  content = if openCVFirst then openCVBlock + torchBlock else torchBlock + openCVBlock;
+
+  torchName = "torch" + optionalString useTorchDefaultCuda "-with-default-cuda";
+  openCVName = "opencv4" + optionalString useOpenCVDefaultCuda "-with-default-cuda";
+in
+# TODO: Ensure the expected CUDA libraries are loaded.
+# TODO: Ensure GPU access works as expected.
+writeGpuTestPython {
+  name = if openCVFirst then "${openCVName}-then-${torchName}" else "${torchName}-then-${openCVName}";
+  libraries =
+    # NOTE: These are purposefully in this order.
+    pythonPackages:
+    let
+      effectiveOpenCV = pythonPackages.opencv4.override (prevAttrs: {
+        cudaPackages = if useOpenCVDefaultCuda then prevAttrs.cudaPackages else cudaPackages;
+      });
+      effectiveTorch = pythonPackages.torchWithCuda.override (prevAttrs: {
+        cudaPackages = if useTorchDefaultCuda then prevAttrs.cudaPackages else cudaPackages;
+      });
+    in
+    if openCVFirst then
+      [
+        effectiveOpenCV
+        effectiveTorch
+      ]
+    else
+      [
+        effectiveTorch
+        effectiveOpenCV
+      ];
+} content
@@ -0,0 +1 @@
+{ cuda_cupti }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ cuda_cupti ]; }
@@ -0,0 +1 @@
+{ zlib }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ zlib ]; }