push sheeet
This commit is contained in:
Dark Steveneq
2025-10-09 14:15:47 +02:00
commit 646b892680
49168 changed files with 5897842 additions and 0 deletions


@@ -0,0 +1,90 @@
# CUDA Modules

> [!NOTE]
> This document is meant to help CUDA maintainers understand the structure of
> the CUDA packages in Nixpkgs. It is not meant to be a user-facing document.
> For a user-facing document, see [the CUDA section of the manual](../../../doc/languages-frameworks/cuda.section.md).

The files in this directory are added (in some way) to the `cudaPackages`
package set by [cuda-packages.nix](../../top-level/cuda-packages.nix).

## Top-level directories

- `cuda`: CUDA redistributables! Provides extension to `cudaPackages` scope.
- `cudatoolkit`: monolithic CUDA Toolkit run-file installer. Provides extension
  to `cudaPackages` scope.
- `cudnn`: NVIDIA cuDNN library.
- `cutensor`: NVIDIA cuTENSOR library.
- `fixups`: Each file or directory (excluding `default.nix`) should contain a
  `callPackage`-able expression to be provided to the `overrideAttrs` attribute
  of a package produced by the generic manifest builder.
  These fixups are applied by `pname`, so packages with multiple versions
  (e.g., `cudnn`, `cudnn_8_9`, etc.) all share a single fixup function
  (i.e., `fixups/cudnn.nix`).
- `generic-builders`:
  - Contains a builder `manifest.nix` which operates on the `Manifest` type
    defined in `modules/generic/manifests`. Most packages are built using this
    builder.
  - Contains a builder `multiplex.nix` which leverages the Manifest builder. In
    short, the Multiplex builder adds multiple versions of a single package to
    a single instance of the CUDA Packages package set. It is used primarily for
    packages like `cudnn` and `cutensor`.
- `modules`: Nixpkgs modules to check the shape and content of CUDA
  redistributable and feature manifests. These modules additionally use shims
  provided by some CUDA packages to allow them to re-use the
  `genericManifestBuilder`, even if they don't have manifest files of their
  own. `cudnn` and `tensorrt` are examples of packages which provide such
  shims. These modules are further described in the
  [Modules](./modules/README.md) documentation.
- `packages`: Contains packages which exist in every instance of the CUDA
  package set. These packages are built in a `by-name` fashion.
- `setup-hooks`: Nixpkgs setup hooks for CUDA.
- `tensorrt`: NVIDIA TensorRT library.
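
As a rough sketch of the `fixups` convention listed above: a fixup is a
function from `callPackage`-supplied dependencies to an `overrideAttrs`
argument. The file name `fixups/libfoo.nix` and the package name `libfoo` are
hypothetical and only serve as an illustration; the body mirrors the simplest
fixups in this directory, which only add `zlib` to `buildInputs`.

```nix
# fixups/libfoo.nix (hypothetical file name, for illustration only)
{ zlib }:
prevAttrs: {
  # Extend, rather than replace, whatever the generic manifest builder set.
  buildInputs = prevAttrs.buildInputs or [ ] ++ [ zlib ];
}
```

Because fixups are looked up by `pname`, a file like this would presumably
apply to every version of the hypothetical `libfoo` package in the set.
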

## Distinguished packages

### CUDA Compatibility

[CUDA Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/),
available as `cudaPackages.cuda_compat`, is a component which makes it possible
to run applications built against a newer CUDA toolkit (for example CUDA 12) on
a machine with an older CUDA driver (for example CUDA 11), which isn't possible
out of the box. At the time of writing, CUDA Compatibility is only available on
the Nvidia Jetson architecture, but Nvidia might release support for more
architectures in the future.

As CUDA Compatibility strictly increases the range of supported applications, we
try our best to enable it by default on supported platforms.

#### Functioning

`cuda_compat` simply provides a new `libcuda.so` (and associated variants) that
needs to be used in place of the default CUDA driver's `libcuda.so`. However,
the other shared libraries of the default driver must still be accessible:
`cuda_compat` isn't a complete drop-in replacement for the driver (and that's
the point: otherwise, it would just be a newer driver).

Nvidia's recommendation is to set `LD_LIBRARY_PATH` to point to `cuda_compat`'s
driver. This is fine for a manual, one-shot usage, but in general setting
`LD_LIBRARY_PATH` is a red flag. This is global state which short-circuits most
other dynamic library resolution mechanisms and can break things in
non-obvious ways, especially with other Nix-built software.

#### CUDA Compat with Nix

Since `cuda_compat` is a known derivation, the easy way to do this in Nix would
be to add `cuda_compat` as a dependency of CUDA libraries and applications and
let Nix do its magic by filling the `DT_RUNPATH` fields. However,
`cuda_compat` itself depends on `libnvrm_mem` and `libnvrm_gpu` which are loaded
dynamically at runtime from `/run/opengl-driver`. This doesn't please the Nix
sandbox when building, which can't find those (a second minor issue is that
`addOpenGLRunpathHook` prepends the `/run/opengl-driver` path, so that would
still take precedence).

The current solution is to do something similar to `addOpenGLRunpathHook`: the
`addCudaCompatRunpathHook` prepends the path to `cuda_compat`'s `libcuda.so`
to the `DT_RUNPATH` of whichever package includes the hook as a dependency, and
we include the hook by default for packages in `cudaPackages` (by adding it as
an input in `genericManifestBuilder`). We also make sure it's included after
`addOpenGLRunpathHook`, so that it appears _before_ in the `DT_RUNPATH` and
takes precedence.
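
The ordering argument above can be modelled in a few lines of Nix. This is only
a toy model, not the real hook implementation: it assumes that each hook
prepends exactly one directory and that hooks run in the order they are
included. The directory `/run/opengl-driver/lib` and the placeholder
`<cuda_compat>/compat` are illustrative assumptions, not paths taken from the
hooks themselves.

```nix
# Toy model: every hook prepends its directory to the runpath built so far.
let
  prependEntry = runpath: dir: [ dir ] ++ runpath;
  hooksInOrder = [
    "/run/opengl-driver/lib" # stands in for addOpenGLRunpathHook (assumed path)
    "<cuda_compat>/compat" # stands in for addCudaCompatRunpathHook (placeholder)
  ];
in
# The hook that runs last ends up first in the resulting search path.
builtins.concatStringsSep ":" (builtins.foldl' prependEntry [ ] hooksInOrder)
```

Evaluating the expression (for example with `nix eval --expr`) yields the
`cuda_compat` entry ahead of the driver entry, which is why the hook has to be
included after `addOpenGLRunpathHook`.
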


@@ -0,0 +1,273 @@
{ lib }:
{
/**
Attribute set mapping each supported CUDA capability to information about that capability.
NOTE: For more on baseline, architecture-specific, and family-specific feature sets, see
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features.
NOTE: For information on when support for a given architecture was added, see
https://docs.nvidia.com/cuda/parallel-thread-execution/#release-notes
NOTE: For baseline feature sets, `dontDefaultAfterCudaMajorMinorVersion` is generally set to the CUDA release
immediately prior to TensorRT removing support for that architecture.
Many thanks to Arnon Shimoni for maintaining a list of these architectures and capabilities.
Without your work, this would have been much more difficult.
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
# Type
```
cudaCapabilityToInfo ::
AttrSet
CudaCapability
{ archName :: String
, cudaCapability :: CudaCapability
, isJetson :: Bool
, isArchitectureSpecific :: Bool
, isFamilySpecific :: Bool
, minCudaMajorMinorVersion :: MajorMinorVersion
, maxCudaMajorMinorVersion :: Null | MajorMinorVersion
, dontDefaultAfterCudaMajorMinorVersion :: Null | MajorMinorVersion
}
```
`archName`
: The name of the microarchitecture
`cudaCapability`
: The CUDA capability
`isJetson`
: Whether this capability is part of NVIDIA's line of Jetson embedded computers. This field is notable
because it tells us what architecture to build for (as Jetson devices are aarch64).
More on Jetson devices here: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/
NOTE: These architectures are only built upon request.
`isArchitectureSpecific`
: Whether this capability is an architecture-specific feature set.
NOTE: These architectures are only built upon request.
`isFamilySpecific`
: Whether this capability is a family-specific feature set.
NOTE: These architectures are only built upon request.
`minCudaMajorMinorVersion`
: The minimum (inclusive) CUDA version that supports this capability.
`maxCudaMajorMinorVersion`
: The maximum (exclusive) CUDA version that supports this capability.
`null` means there is no maximum.
`dontDefaultAfterCudaMajorMinorVersion`
: The CUDA version after which to exclude this capability from the list of default capabilities we build.
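# Example
A lookup sketch, assuming the attribute set is evaluated exactly as defined below. Every value shown is
either declared for `"8.7"` or comes from the defaults supplied by the mapping function.
```nix
cudaCapabilityToInfo."8.7" = {
  archName = "Ampere";
  cudaCapability = "8.7";
  isJetson = true;
  # Defaults: "8.7" has neither an "a" nor an "f" suffix.
  isArchitectureSpecific = false;
  isFamilySpecific = false;
  minCudaMajorMinorVersion = "11.5";
  # Defaults: no maximum, and no cut-off for default builds.
  maxCudaMajorMinorVersion = null;
  dontDefaultAfterCudaMajorMinorVersion = null;
};
```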
*/
cudaCapabilityToInfo =
lib.mapAttrs
(
cudaCapability:
# Supplies default values.
{
archName,
isJetson ? false,
isArchitectureSpecific ? (lib.hasSuffix "a" cudaCapability),
isFamilySpecific ? (lib.hasSuffix "f" cudaCapability),
minCudaMajorMinorVersion,
maxCudaMajorMinorVersion ? null,
dontDefaultAfterCudaMajorMinorVersion ? null,
}:
{
inherit
archName
cudaCapability
isJetson
isArchitectureSpecific
isFamilySpecific
minCudaMajorMinorVersion
maxCudaMajorMinorVersion
dontDefaultAfterCudaMajorMinorVersion
;
}
)
{
# Tesla/Quadro M series
"5.0" = {
archName = "Maxwell";
minCudaMajorMinorVersion = "10.0";
dontDefaultAfterCudaMajorMinorVersion = "11.0";
};
# Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X
"5.2" = {
archName = "Maxwell";
minCudaMajorMinorVersion = "10.0";
dontDefaultAfterCudaMajorMinorVersion = "11.0";
};
# Quadro GP100, Tesla P100, DGX-1 (Generic Pascal)
"6.0" = {
archName = "Pascal";
minCudaMajorMinorVersion = "10.0";
# Removed from TensorRT 10.0, which corresponds to CUDA 12.4 release.
# https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/support-matrix/index.html
dontDefaultAfterCudaMajorMinorVersion = "12.3";
};
# GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030 (GP108), GT 1010 (GP108), Titan Xp, Tesla
# P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
"6.1" = {
archName = "Pascal";
minCudaMajorMinorVersion = "10.0";
# Removed from TensorRT 10.0, which corresponds to CUDA 12.4 release.
# https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/support-matrix/index.html
dontDefaultAfterCudaMajorMinorVersion = "12.3";
};
# DGX-1 with Volta, Tesla V100, GTX 1180 (GV104), Titan V, Quadro GV100
"7.0" = {
archName = "Volta";
minCudaMajorMinorVersion = "10.0";
# Removed from TensorRT 10.5, which corresponds to CUDA 12.6 release.
# https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1050/support-matrix/index.html
dontDefaultAfterCudaMajorMinorVersion = "12.5";
};
# GTX/RTX Turing GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000,
# Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4
"7.5" = {
archName = "Turing";
minCudaMajorMinorVersion = "10.0";
};
# NVIDIA A100 (GA100; the name “Tesla” has been dropped), NVIDIA DGX-A100
"8.0" = {
archName = "Ampere";
minCudaMajorMinorVersion = "11.2";
};
# Tesla GA10x cards, RTX Ampere RTX 3080, GA102 RTX 3090, RTX A2000, A3000, RTX A4000,
# A5000, A6000, NVIDIA A40, GA106 RTX 3060, GA104 RTX 3070, GA107 RTX 3050, RTX A10, RTX
# A16, RTX A40, A2 Tensor Core GPU
"8.6" = {
archName = "Ampere";
minCudaMajorMinorVersion = "11.2";
};
# Jetson AGX Orin and Drive AGX Orin only
"8.7" = {
archName = "Ampere";
minCudaMajorMinorVersion = "11.5";
isJetson = true;
};
# NVIDIA GeForce RTX 4090, RTX 4080, RTX 6000, Tesla L40
"8.9" = {
archName = "Ada";
minCudaMajorMinorVersion = "11.8";
};
# NVIDIA H100 (GH100)
"9.0" = {
archName = "Hopper";
minCudaMajorMinorVersion = "11.8";
};
"9.0a" = {
archName = "Hopper";
minCudaMajorMinorVersion = "12.0";
};
# NVIDIA B100
"10.0" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.7";
};
"10.0a" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.7";
};
"10.0f" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
# NVIDIA Jetson Thor Blackwell
"10.1" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.7";
isJetson = true;
};
"10.1a" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.7";
isJetson = true;
};
"10.1f" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
isJetson = true;
};
# NVIDIA ???
"10.3" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
"10.3a" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
"10.3f" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
# NVIDIA GeForce RTX 5090 (GB202) etc.
"12.0" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.8";
};
"12.0a" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.8";
};
"12.0f" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
# NVIDIA ???
"12.1" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
"12.1a" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
"12.1f" = {
archName = "Blackwell";
minCudaMajorMinorVersion = "12.9";
};
};
}


@@ -0,0 +1,30 @@
{ lib }:
{
# See ./cuda.nix for documentation.
inherit (import ./cuda.nix { inherit lib; })
cudaCapabilityToInfo
;
# See ./nvcc.nix for documentation.
inherit (import ./nvcc.nix)
nvccCompatibilities
;
# See ./redist.nix for documentation.
inherit (import ./redist.nix)
redistNames
redistSystems
redistUrlPrefix
;
/**
The path to the CUDA packages root directory, for use with `callPackage` to create new package sets.
# Type
```
cudaPackagesPath :: Path
```
*/
cudaPackagesPath = ./../../..;
}


@@ -0,0 +1,70 @@
{
/**
Mapping of CUDA versions to NVCC compatibilities
Taken from
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#host-compiler-support-policy
NVCC performs a version check on the host compiler's major version and so newer minor versions
of the compilers listed below will be supported, but major versions falling outside the range
will not be supported.
NOTE: These constraints don't apply to Jetson, which uses something else.
NOTE: NVIDIA can and will add support for newer compilers even during patch releases.
E.g.: CUDA 12.2.1 maxed out with support for Clang 15.0; 12.2.2 added support for Clang 16.0.
NOTE: Because all platforms NVIDIA supports use GCC and Clang, we omit the architectures here.
# Type
```
nvccCompatibilities ::
AttrSet
String
{ clang :: { maxMajorVersion :: String, minMajorVersion :: String }
, gcc :: { maxMajorVersion :: String, minMajorVersion :: String }
}
```
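# Example
A usage sketch, not part of the set itself: `hostGccMajor` is a hypothetical value chosen for illustration,
and treating both bounds as inclusive is an assumption based on the note above about major versions
falling outside the range.
```nix
let
  inherit (nvccCompatibilities."12.8") gcc;
  # Hypothetical major version of the host compiler under consideration.
  hostGccMajor = "14";
in
# compareVersions returns -1, 0, or 1, so this checks min <= host <= max.
builtins.compareVersions hostGccMajor gcc.minMajorVersion >= 0
&& builtins.compareVersions hostGccMajor gcc.maxMajorVersion <= 0
```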
*/
nvccCompatibilities = {
# Our baseline
# https://docs.nvidia.com/cuda/archive/12.6.0/cuda-installation-guide-linux/index.html#host-compiler-support-policy
"12.6" = {
clang = {
maxMajorVersion = "18";
minMajorVersion = "7";
};
gcc = {
maxMajorVersion = "13";
minMajorVersion = "6";
};
};
# Maximum Clang version is 19, maximum GCC version is 14
# https://docs.nvidia.com/cuda/archive/12.8.1/cuda-installation-guide-linux/index.html#host-compiler-support-policy
"12.8" = {
clang = {
maxMajorVersion = "19";
minMajorVersion = "7";
};
gcc = {
maxMajorVersion = "14";
minMajorVersion = "6";
};
};
# No changes from 12.8 to 12.9
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#host-compiler-support-policy
"12.9" = {
clang = {
maxMajorVersion = "19";
minMajorVersion = "7";
};
gcc = {
maxMajorVersion = "14";
minMajorVersion = "6";
};
};
};
}


@@ -0,0 +1,56 @@
{
/**
A list of redistributable names to use in creation of the `redistName` option type.
# Type
```
redistNames :: [String]
```
*/
redistNames = [
"cublasmp"
"cuda"
"cudnn"
"cudss"
"cuquantum"
"cusolvermp"
"cusparselt"
"cutensor"
"nppplus"
"nvcomp"
# "nvidia-driver", # NOTE: Some of the earlier manifests don't follow our scheme.
"nvjpeg2000"
"nvpl"
"nvtiff"
"tensorrt" # NOTE: not truly a redist; uses different naming convention
];
/**
A list of redistributable systems to use in creation of the `redistSystem` option type.
# Type
```
redistSystems :: [String]
```
*/
redistSystems = [
"linux-aarch64"
"linux-all" # Taken to mean all other linux systems
"linux-sbsa"
"linux-x86_64"
"source" # Source-agnostic platform
];
/**
The prefix of the URL for redistributable files.
# Type
```
redistUrlPrefix :: String
```
*/
redistUrlPrefix = "https://developer.download.nvidia.com/compute";
}


@@ -0,0 +1,65 @@
{
lib,
bootstrapData,
db,
}:
bootstrapData
// {
/**
All CUDA capabilities, sorted by version.
NOTE: Since the capabilities are sorted by version and architecture/family-specific features are
appended to the minor version component, the sorted list groups capabilities by baseline feature
set.
# Type
```
allSortedCudaCapabilities :: [CudaCapability]
```
# Example
```
allSortedCudaCapabilities = [
"5.0"
"5.2"
"6.0"
"6.1"
"7.0"
"7.5"
"8.0"
"8.6"
"8.7"
"8.9"
"9.0"
"9.0a"
"10.0"
"10.0a"
"10.0f"
"10.1"
"10.1a"
"10.1f"
"10.3"
"10.3a"
"10.3f"
"12.0"
"12.0a"
"12.0f"
"12.1"
"12.1a"
"12.1f"
];
```
*/
allSortedCudaCapabilities = lib.sort lib.versionOlder (lib.attrNames db.cudaCapabilityToInfo);
/**
Mapping of CUDA micro-architecture name to capabilities belonging to that micro-architecture.
# Type
```
cudaArchNameToCapabilities :: AttrSet NonEmptyStr (NonEmptyListOf CudaCapability)
```
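# Example
A sketch of the resulting shape, assuming `db.cudaCapabilityToInfo` contains only the capabilities declared
in the bootstrap data. The Blackwell entry is abbreviated here.
```nix
cudaArchNameToCapabilities = {
  Ada = [ "8.9" ];
  Ampere = [ "8.0" "8.6" "8.7" ];
  # Blackwell holds every 10.x and 12.x capability, omitted for brevity.
  Hopper = [ "9.0" "9.0a" ];
  Maxwell = [ "5.0" "5.2" ];
  Pascal = [ "6.0" "6.1" ];
  Turing = [ "7.5" ];
  Volta = [ "7.0" ];
};
```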
*/
cudaArchNameToCapabilities = lib.groupBy (
cudaCapability: db.cudaCapabilityToInfo.${cudaCapability}.archName
) db.allSortedCudaCapabilities;
}


@@ -0,0 +1,31 @@
# The _cuda attribute set is a fixed-point which contains the static functionality required to construct CUDA package
# sets. For example, `_cuda.bootstrapData` includes information about NVIDIA's redistributables (such as the names
# NVIDIA uses for different systems), `_cuda.lib` contains utility functions like `formatCapabilities` (which generates
# common arguments passed to NVCC and `cmakeFlags`), and `_cuda.fixups` contains `callPackage`-able functions which
# are provided to the corresponding package's `overrideAttrs` attribute to provide package-specific fixups
# out of scope of the generic redistributable builder.
#
# Since this attribute set is used to construct the CUDA package sets, it must exist outside the fixed point of the
# package sets. Making these attributes available directly in the package set construction could cause confusion if
# users override the attribute set with the expectation that changes will be reflected in the enclosing CUDA package
# set. To avoid this, we declare `_cuda` and inherit its members here, at top-level. (This also allows us to benefit
# from import caching, as it should be evaluated once per system, rather than per-system and CUDA package set.)
let
lib = import ../../../../lib;
in
lib.fixedPoints.makeExtensible (final: {
bootstrapData = import ./db/bootstrap {
inherit lib;
};
db = import ./db {
inherit (final) bootstrapData db;
inherit lib;
};
extensions = [ ]; # Extensions applied to every CUDA package set.
fixups = import ./fixups { inherit lib; };
lib = import ./lib {
_cuda = final;
inherit lib;
};
})


@@ -0,0 +1,12 @@
{ flags, lib }:
prevAttrs: {
autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
"libnvrm_gpu.so"
"libnvrm_mem.so"
"libnvdla_runtime.so"
];
# `cuda_compat` only works on aarch64-linux, and only when building for Jetson devices.
badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
"Trying to use cuda_compat on aarch64-linux targeting non-Jetson devices" = !flags.isJetsonBuild;
};
}

View File

@@ -0,0 +1,37 @@
# TODO(@connorbaker): cuda_cudart.dev depends on crt/host_config.h, which is from
# (getDev cuda_nvcc). It would be nice to be able to encode that.
{ addDriverRunpath, lib }:
prevAttrs: {
# Remove once cuda-find-redist-features has a special case for libcuda
outputs =
prevAttrs.outputs or [ ]
++ lib.lists.optionals (!(builtins.elem "stubs" (prevAttrs.outputs or [ ]))) [ "stubs" ];
allowFHSReferences = false;
# The libcuda stub's pkg-config doesn't follow the general pattern:
postPatch =
prevAttrs.postPatch or ""
+ ''
while IFS= read -r -d $'\0' path; do
sed -i \
-e "s|^libdir\s*=.*/lib\$|libdir=''${!outputLib}/lib/stubs|" \
-e "s|^Libs\s*:\(.*\)\$|Libs: \1 -Wl,-rpath,${addDriverRunpath.driverLink}/lib|" \
"$path"
done < <(find -iname 'cuda-*.pc' -print0)
''
# Namelink may not be enough, add a soname.
# Cf. https://gitlab.kitware.com/cmake/cmake/-/issues/25536
+ ''
if [[ -f lib/stubs/libcuda.so && ! -f lib/stubs/libcuda.so.1 ]]; then
ln -s libcuda.so lib/stubs/libcuda.so.1
fi
'';
postFixup = prevAttrs.postFixup or "" + ''
mv "''${!outputDev}/share" "''${!outputDev}/lib"
moveToOutput lib/stubs "$stubs"
ln -s "$stubs"/lib/stubs/* "$stubs"/lib/
ln -s "$stubs"/lib/stubs "''${!outputLib}/lib/stubs"
'';
}


@@ -0,0 +1,18 @@
{
libglut,
libcufft,
libcurand,
libGLU,
libglvnd,
libgbm,
}:
prevAttrs: {
buildInputs = prevAttrs.buildInputs or [ ] ++ [
libglut
libcufft
libcurand
libGLU
libglvnd
libgbm
];
}


@@ -0,0 +1,35 @@
{
cudaAtLeast,
gmp,
expat,
libxcrypt-legacy,
ncurses6,
python310,
python311,
python312,
stdenv,
lib,
}:
prevAttrs: {
buildInputs =
prevAttrs.buildInputs or [ ]
++ [
gmp
libxcrypt-legacy
ncurses6
python310
python311
python312
]
# aarch64,sbsa needs expat
++ lib.lists.optionals (stdenv.hostPlatform.isAarch64) [ expat ];
installPhase =
prevAttrs.installPhase or ""
# Python 3.8 is not in nixpkgs anymore, and Python 3.9 is not among the inputs above. Delete cuda-gdb
# support for both to avoid autopatchelf failing to find libpython3.8.so or libpython3.9.so.
+ ''
find $bin -name '*python3.8*' -delete
find $bin -name '*python3.9*' -delete
'';
}


@@ -0,0 +1,62 @@
{
lib,
backendStdenv,
setupCudaHook,
}:
prevAttrs: {
# Merge "bin" and "dev" into "out" to avoid circular references
outputs = builtins.filter (
x:
!(builtins.elem x [
"dev"
"bin"
])
) prevAttrs.outputs or [ ];
# Patch the nvcc.profile.
# Syntax:
# - `=` for assignment,
# - `?=` for conditional assignment,
# - `+=` to "prepend",
# - `=+` to "append".
# Cf. https://web.archive.org/web/20230308044351/https://arcb.csc.ncsu.edu/~mueller/cluster/nvidia/2.0/nvcc_2.0.pdf
# We set all variables with the lowest priority (=+), but we do force
# nvcc to use the fixed backend toolchain. Cf. comments in
# backend-stdenv.nix
postPatch =
prevAttrs.postPatch or ""
+ ''
substituteInPlace bin/nvcc.profile \
--replace-fail \
'$(TOP)/$(_TARGET_DIR_)/include' \
"''${!outputDev}/include"
''
+ ''
cat << EOF >> bin/nvcc.profile
# Fix a compatible backend compiler
PATH += "${backendStdenv.cc}/bin":
# Expose the split-out nvvm
LIBRARIES =+ "-L''${!outputBin}/nvvm/lib"
INCLUDES =+ "-I''${!outputBin}/nvvm/include"
EOF
'';
# Entries here will be in nativeBuildInputs when cuda_nvcc is in nativeBuildInputs.
propagatedBuildInputs = prevAttrs.propagatedBuildInputs or [ ] ++ [ setupCudaHook ];
postInstall = prevAttrs.postInstall or "" + ''
moveToOutput "nvvm" "''${!outputBin}"
'';
# The nvcc and cicc binaries contain hard-coded references to /usr
allowFHSReferences = true;
meta = prevAttrs.meta or { } // {
mainProgram = "nvcc";
};
}


@@ -0,0 +1 @@
{ cuda_cupti }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ cuda_cupti ]; }


@@ -0,0 +1 @@
_: _: { outputs = [ "out" ]; }


@@ -0,0 +1,75 @@
{
cudaOlder,
cudaMajorMinorVersion,
fetchurl,
lib,
libcublas,
patchelf,
zlib,
}:
let
inherit (lib)
attrsets
maintainers
meta
strings
;
in
finalAttrs: prevAttrs: {
src = fetchurl { inherit (finalAttrs.passthru.redistribRelease) hash url; };
# Useful for inspecting why something went wrong.
badPlatformsConditions =
let
cudaTooOld = cudaOlder finalAttrs.passthru.featureRelease.minCudaVersion;
cudaTooNew =
(finalAttrs.passthru.featureRelease.maxCudaVersion != null)
&& strings.versionOlder finalAttrs.passthru.featureRelease.maxCudaVersion cudaMajorMinorVersion;
in
prevAttrs.badPlatformsConditions or { }
// {
"CUDA version is too old" = cudaTooOld;
"CUDA version is too new" = cudaTooNew;
};
buildInputs = prevAttrs.buildInputs or [ ] ++ [
zlib
(attrsets.getLib libcublas)
];
# Tell autoPatchelf about runtime dependencies. *_infer* libraries only
# exist in CuDNN 8.
# NOTE: Versions from CUDNN releases have four components.
postFixup =
prevAttrs.postFixup or ""
+
strings.optionalString
(
strings.versionAtLeast finalAttrs.version "8.0.5.0"
&& strings.versionOlder finalAttrs.version "9.0.0.0"
)
''
${meta.getExe patchelf} $lib/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so
${meta.getExe patchelf} $lib/lib/libcudnn_ops_infer.so --add-needed libcublas.so --add-needed libcublasLt.so
'';
meta = prevAttrs.meta or { } // {
homepage = "https://developer.nvidia.com/cudnn";
maintainers =
prevAttrs.meta.maintainers or [ ]
++ (with maintainers; [
mdaiter
samuela
connorbaker
]);
# TODO(@connorbaker): Temporary workaround to avoid changing the derivation hash since introducing more
# brokenConditions would change the derivation as they're top-level and __structuredAttrs is set.
teams = prevAttrs.meta.teams or [ ];
license = {
shortName = "cuDNN EULA";
fullName = "NVIDIA cuDNN Software License Agreement (EULA)";
url = "https://docs.nvidia.com/deeplearning/sdk/cudnn-sla/index.html#supplement";
free = false;
};
};
}


@@ -0,0 +1,11 @@
{ lib }:
lib.concatMapAttrs (
fileName: _type:
let
# Fixup is in `./${attrName}.nix` or in `./${fileName}/default.nix`:
attrName = lib.removeSuffix ".nix" fileName;
fixup = import (./. + "/${fileName}");
isFixup = fileName != "default.nix";
in
lib.optionalAttrs isFixup { ${attrName} = fixup; }
) (builtins.readDir ./.)


@@ -0,0 +1,5 @@
_: prevAttrs: {
badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
"Package is not supported; use drivers from linuxPackages" = true;
};
}


@@ -0,0 +1 @@
{ zlib }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ zlib ]; }


@@ -0,0 +1 @@
{ zlib }: prevAttrs: { buildInputs = prevAttrs.buildInputs or [ ] ++ [ zlib ]; }


@@ -0,0 +1,12 @@
{
libcublas,
numactl,
rdma-core,
}:
prevAttrs: {
buildInputs = prevAttrs.buildInputs or [ ] ++ [
libcublas
numactl
rdma-core
];
}


@@ -0,0 +1,19 @@
{
cudaAtLeast,
lib,
libcublas,
libcusparse ? null,
libnvjitlink ? null,
}:
prevAttrs: {
buildInputs = prevAttrs.buildInputs or [ ] ++ [
libcublas
libnvjitlink
libcusparse
];
brokenConditions = prevAttrs.brokenConditions or { } // {
"libnvjitlink missing (CUDA >= 12.0)" = libnvjitlink == null;
"libcusparse missing (CUDA >= 12.1)" = libcusparse == null;
};
}


@@ -0,0 +1,12 @@
{
cudaAtLeast,
lib,
libnvjitlink ? null,
}:
prevAttrs: {
buildInputs = prevAttrs.buildInputs or [ ] ++ [ libnvjitlink ];
brokenConditions = prevAttrs.brokenConditions or { } // {
"libnvjitlink missing (CUDA >= 12.0)" = libnvjitlink == null;
};
}


@@ -0,0 +1,23 @@
{
cuda_cudart,
lib,
libcublas,
}:
finalAttrs: prevAttrs: {
buildInputs =
prevAttrs.buildInputs or [ ]
++ [ (lib.getLib libcublas) ]
# For some reason, the 1.4.x release of cusparselt requires the cudart library.
++ lib.optionals (lib.hasPrefix "1.4" finalAttrs.version) [ (lib.getLib cuda_cudart) ];
meta = prevAttrs.meta or { } // {
description = "cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication";
homepage = "https://developer.nvidia.com/cusparselt-downloads";
maintainers = prevAttrs.meta.maintainers or [ ] ++ [ lib.maintainers.sepiabrown ];
teams = prevAttrs.meta.teams or [ ];
license = lib.licenses.unfreeRedistributable // {
shortName = "cuSPARSELt EULA";
fullName = "cuSPARSELt SUPPLEMENT TO SOFTWARE LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS";
url = "https://docs.nvidia.com/cuda/cusparselt/license.html";
};
};
}


@@ -0,0 +1,23 @@
{
cuda_cudart,
lib,
libcublas,
}:
finalAttrs: prevAttrs: {
buildInputs =
prevAttrs.buildInputs or [ ]
++ [ (lib.getLib libcublas) ]
# For some reason, the 1.4.x release of cuTENSOR requires the cudart library.
++ lib.optionals (lib.hasPrefix "1.4" finalAttrs.version) [ (lib.getLib cuda_cudart) ];
meta = prevAttrs.meta or { } // {
description = "cuTENSOR: A High-Performance CUDA Library For Tensor Primitives";
homepage = "https://developer.nvidia.com/cutensor";
maintainers = prevAttrs.meta.maintainers or [ ] ++ [ lib.maintainers.obsidian-systems-maintenance ];
teams = prevAttrs.meta.teams or [ ];
license = lib.licenses.unfreeRedistributable // {
shortName = "cuTENSOR EULA";
fullName = "cuTENSOR SUPPLEMENT TO SOFTWARE LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS";
url = "https://docs.nvidia.com/cuda/cutensor/license.html";
};
};
}


@@ -0,0 +1,86 @@
{
cudaAtLeast,
cudaMajorMinorVersion,
cudaOlder,
e2fsprogs,
elfutils,
flags,
gst_all_1,
lib,
libjpeg8,
qt6,
rdma-core,
stdenv,
ucx,
}:
prevAttrs:
let
qtwayland = lib.getLib qt6.qtwayland;
inherit (qt6) wrapQtAppsHook qtwebview;
archDir =
{
aarch64-linux = "linux-" + (if flags.isJetsonBuild then "v4l_l4t" else "desktop") + "-t210-a64";
x86_64-linux = "linux-desktop-glibc_2_11_3-x64";
}
.${stdenv.hostPlatform.system} or (throw "Unsupported system: ${stdenv.hostPlatform.system}");
in
{
outputs = [ "out" ]; # NOTE(@connorbaker): Force a single output so relative lookups work.
nativeBuildInputs = prevAttrs.nativeBuildInputs or [ ] ++ [ wrapQtAppsHook ];
buildInputs =
prevAttrs.buildInputs or [ ]
++ [
qtwayland
qtwebview
(qt6.qtwebengine or qt6.full)
rdma-core
]
++ lib.optionals (cudaOlder "12.7") [
e2fsprogs
ucx
]
++ lib.optionals (cudaMajorMinorVersion == "12.9") [
elfutils
];
dontWrapQtApps = true;
preInstall = prevAttrs.preInstall or "" + ''
if [[ -d nsight-compute ]]; then
nixLog "Lifting components of Nsight Compute to the top level"
mv -v nsight-compute/*/* .
nixLog "Removing empty directories"
rmdir -pv nsight-compute/*
fi
rm -rf host/${archDir}/Mesa/
'';
postInstall =
prevAttrs.postInstall or ""
+ ''
moveToOutput 'ncu' "''${!outputBin}/bin"
moveToOutput 'ncu-ui' "''${!outputBin}/bin"
moveToOutput 'host/${archDir}' "''${!outputBin}/bin"
moveToOutput 'target/${archDir}' "''${!outputBin}/bin"
wrapQtApp "''${!outputBin}/bin/host/${archDir}/ncu-ui.bin"
''
# NOTE(@connorbaker): No idea what this platform is or how to patchelf for it.
+ lib.optionalString (flags.isJetsonBuild && cudaOlder "12.9") ''
nixLog "Removing QNX 700 target directory for Jetson builds"
rm -rfv "''${!outputBin}/target/qnx-700-t210-a64"
''
+ lib.optionalString (flags.isJetsonBuild && cudaAtLeast "12.8") ''
nixLog "Removing QNX 800 target directory for Jetson builds"
rm -rfv "''${!outputBin}/target/qnx-800-tegra-a64"
'';
# lib needs libtiff.so.5, but nixpkgs provides libtiff.so.6
preFixup = prevAttrs.preFixup or "" + ''
patchelf --replace-needed libtiff.so.5 libtiff.so "''${!outputBin}/bin/host/${archDir}/Plugins/imageformats/libqtiff.so"
'';
autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
"libnvidia-ml.so.1"
];
# NOTE(@connorbaker): It might be a problem that when nsight_compute contains hosts and targets of different
# architectures, that we patchelf just the binaries matching the builder's platform; autoPatchelfHook prints
# messages like
# skipping [$out]/host/linux-desktop-glibc_2_11_3-x64/libQt6Core.so.6 because its architecture (x64) differs from
# target (AArch64)
}


@@ -0,0 +1,136 @@
{
boost178,
cuda_cudart,
cudaAtLeast,
e2fsprogs,
gst_all_1,
lib,
nss,
numactl,
pulseaudio,
qt6,
rdma-core,
stdenv,
ucx,
wayland,
xorg,
}:
prevAttrs:
let
qtwayland = lib.getLib qt6.qtwayland;
qtWaylandPlugins = "${qtwayland}/${qt6.qtbase.qtPluginPrefix}";
# NOTE(@connorbaker): nsight_systems doesn't support Jetson, so no need for case splitting on aarch64-linux.
hostDir =
{
aarch64-linux = "host-linux-armv8";
x86_64-linux = "host-linux-x64";
}
.${stdenv.hostPlatform.system} or (throw "Unsupported system: ${stdenv.hostPlatform.system}");
targetDir =
{
aarch64-linux = "target-linux-sbsa-armv8";
x86_64-linux = "target-linux-x64";
}
.${stdenv.hostPlatform.system} or (throw "Unsupported system: ${stdenv.hostPlatform.system}");
in
{
outputs = [ "out" ]; # NOTE(@connorbaker): Force a single output so relative lookups work.
# An ad hoc replacement for
# https://github.com/ConnorBaker/cuda-redist-find-features/issues/11
env = prevAttrs.env or { } // {
rmPatterns =
prevAttrs.env.rmPatterns or ""
+ toString [
"${hostDir}/lib{arrow,jpeg}*"
"${hostDir}/lib{ssl,ssh,crypto}*"
"${hostDir}/libboost*"
"${hostDir}/libexec"
"${hostDir}/libstdc*"
"${hostDir}/python/bin/python"
"${hostDir}/Mesa"
];
};
# NOTE(@connorbaker): nsight-exporter and nsight-sys are deprecated scripts wrapping nsys, it's fine to remove them.
prePatch = prevAttrs.prePatch or "" + ''
if [[ -d bin ]]; then
nixLog "Removing bin wrapper scripts"
for knownWrapper in bin/{nsys{,-ui},nsight-{exporter,sys}}; do
[[ -e $knownWrapper ]] && rm -v "$knownWrapper"
done
unset -v knownWrapper
nixLog "Removing empty bin directory"
rmdir -v bin
fi
if [[ -d nsight-systems ]]; then
nixLog "Lifting components of Nsight Systems to the top level"
mv -v nsight-systems/*/* .
nixLog "Removing empty nsight-systems directory"
rmdir -pv nsight-systems/*
fi
'';
postPatch = prevAttrs.postPatch or "" + ''
for path in $rmPatterns; do
rm -r "$path"
done
patchShebangs nsight-systems
'';
nativeBuildInputs = prevAttrs.nativeBuildInputs or [ ] ++ [ qt6.wrapQtAppsHook ];
dontWrapQtApps = true;
buildInputs =
prevAttrs.buildInputs or [ ]
++ [
(qt6.qtdeclarative or qt6.full)
(qt6.qtsvg or qt6.full)
(qt6.qtimageformats or qt6.full)
(qt6.qtpositioning or qt6.full)
(qt6.qtscxml or qt6.full)
(qt6.qttools or qt6.full)
(qt6.qtwebengine or qt6.full)
(qt6.qtwayland or qt6.full)
boost178
cuda_cudart.stubs
e2fsprogs
gst_all_1.gst-plugins-base
gst_all_1.gstreamer
nss
numactl
pulseaudio
qt6.qtbase
qtWaylandPlugins
rdma-core
ucx
wayland
xorg.libXcursor
xorg.libXdamage
xorg.libXrandr
xorg.libXtst
]
# NOTE(@connorbaker): Seems to be required only for aarch64-linux.
++ lib.optionals stdenv.hostPlatform.isAarch64 [
gst_all_1.gst-plugins-bad
];
postInstall = prevAttrs.postInstall or "" + ''
moveToOutput '${hostDir}' "''${!outputBin}"
moveToOutput '${targetDir}' "''${!outputBin}"
moveToOutput 'bin' "''${!outputBin}"
wrapQtApp "''${!outputBin}/${hostDir}/nsys-ui.bin"
'';
# lib needs libtiff.so.5, but nixpkgs provides libtiff.so.6
preFixup = prevAttrs.preFixup or "" + ''
patchelf --replace-needed libtiff.so.5 libtiff.so "''${!outputBin}/${hostDir}/Plugins/imageformats/libqtiff.so"
'';
autoPatchelfIgnoreMissingDeps = prevAttrs.autoPatchelfIgnoreMissingDeps or [ ] ++ [
"libnvidia-ml.so.1"
];
}

@@ -0,0 +1,5 @@
_: prevAttrs: {
badPlatformsConditions = prevAttrs.badPlatformsConditions or { } // {
"Package is not supported; use drivers from linuxPackages" = true;
};
}

@@ -0,0 +1,128 @@
{
_cuda,
cudaOlder,
cudaPackages,
cudaMajorMinorVersion,
lib,
patchelf,
requireFile,
stdenv,
}:
let
inherit (lib)
attrsets
maintainers
meta
strings
versions
;
inherit (stdenv) hostPlatform;
# targetArch :: String
targetArch = attrsets.attrByPath [ hostPlatform.system ] "unsupported" {
x86_64-linux = "x86_64-linux-gnu";
aarch64-linux = "aarch64-linux-gnu";
};
in
finalAttrs: prevAttrs: {
# Useful for inspecting why something went wrong.
brokenConditions =
let
cudaTooOld = cudaOlder finalAttrs.passthru.featureRelease.minCudaVersion;
cudaTooNew =
(finalAttrs.passthru.featureRelease.maxCudaVersion != null)
&& strings.versionOlder finalAttrs.passthru.featureRelease.maxCudaVersion cudaMajorMinorVersion;
cudnnVersionIsSpecified = finalAttrs.passthru.featureRelease.cudnnVersion != null;
cudnnVersionSpecified = versions.majorMinor finalAttrs.passthru.featureRelease.cudnnVersion;
cudnnVersionProvided = versions.majorMinor finalAttrs.passthru.cudnn.version;
cudnnTooOld =
cudnnVersionIsSpecified && (strings.versionOlder cudnnVersionProvided cudnnVersionSpecified);
cudnnTooNew =
cudnnVersionIsSpecified && (strings.versionOlder cudnnVersionSpecified cudnnVersionProvided);
in
prevAttrs.brokenConditions or { }
// {
"CUDA version is too old" = cudaTooOld;
"CUDA version is too new" = cudaTooNew;
"CUDNN version is too old" = cudnnTooOld;
"CUDNN version is too new" = cudnnTooNew;
};
src = requireFile {
name = finalAttrs.passthru.redistribRelease.filename;
inherit (finalAttrs.passthru.redistribRelease) hash;
message = ''
To use the TensorRT derivation, you must join the NVIDIA Developer Program and
download the ${finalAttrs.version} TAR package for CUDA ${cudaMajorMinorVersion} from
${finalAttrs.meta.homepage}.
Once you have downloaded the file, add it to the store with the following
command, and try building this derivation again.
$ nix-store --add-fixed sha256 ${finalAttrs.passthru.redistribRelease.filename}
'';
};
# We need to look inside the extracted output to get the files we need.
sourceRoot = "TensorRT-${finalAttrs.version}";
buildInputs = prevAttrs.buildInputs or [ ] ++ [ (finalAttrs.passthru.cudnn.lib or null) ];
preInstall =
prevAttrs.preInstall or ""
+ strings.optionalString (targetArch != "unsupported") ''
# Replace symlinks to bin and lib with the actual directories from targets.
for dir in bin lib; do
rm "$dir"
mv "targets/${targetArch}/$dir" "$dir"
done
# Remove broken symlinks
for dir in include samples; do
rm "targets/${targetArch}/$dir" || :
done
'';
# Tell autoPatchelf about runtime dependencies.
postFixup =
let
versionTriple = "${versions.majorMinor finalAttrs.version}.${versions.patch finalAttrs.version}";
in
prevAttrs.postFixup or ""
+ ''
${meta.getExe' patchelf "patchelf"} --add-needed libnvinfer.so \
"$lib/lib/libnvinfer.so.${versionTriple}" \
"$lib/lib/libnvinfer_plugin.so.${versionTriple}" \
"$lib/lib/libnvinfer_builder_resource.so.${versionTriple}"
'';
passthru = prevAttrs.passthru or { } // {
# The CUDNN used with TensorRT.
# If null, the default cudnn derivation will be used.
# If a version is specified, the cudnn derivation with that version will be used,
# unless it is not available, in which case the default cudnn derivation will be used.
cudnn =
let
desiredName = _cuda.lib.mkVersionedName "cudnn" (
lib.versions.majorMinor finalAttrs.passthru.featureRelease.cudnnVersion
);
in
if finalAttrs.passthru.featureRelease.cudnnVersion == null || !(cudaPackages ? ${desiredName}) then
cudaPackages.cudnn
else
cudaPackages.${desiredName};
};
meta = prevAttrs.meta or { } // {
badPlatforms =
prevAttrs.meta.badPlatforms or [ ]
++ lib.optionals (targetArch == "unsupported") [ hostPlatform.system ];
homepage = "https://developer.nvidia.com/tensorrt";
maintainers = prevAttrs.meta.maintainers or [ ] ++ [ maintainers.aidalgol ];
teams = prevAttrs.meta.teams or [ ];
# Building TensorRT on Hydra is impossible because of the non-redistributable
# license and because the source needs to be manually downloaded from the
# NVIDIA Developer Program (see requireFile above).
hydraPlatforms = lib.platforms.none;
};
}

@@ -0,0 +1,139 @@
{ _cuda, lib }:
{
/**
Evaluate assertions and add error context to return value.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_evaluateAssertions
:: (assertions :: List { assertion :: Bool, message :: String })
-> Bool
```
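# Examples
A minimal sketch of the two possible outcomes, read off the implementation below. The assertion messages are made up for illustration; on failure the error context is attached to the returned `false`.
:::{.example}
## `_cuda.lib._evaluateAssertions` usage examples
```nix
_evaluateAssertions [
  { assertion = true; message = "Assertion 1 failed"; }
  { assertion = true; message = "Assertion 2 failed"; }
]
=> true
```
```nix
_evaluateAssertions [
  { assertion = false; message = "Assertion 1 failed"; }
  { assertion = true; message = "Assertion 2 failed"; }
]
=> false
```
:::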
*/
_evaluateAssertions =
assertions:
let
failedAssertionsString = _cuda.lib._mkFailedAssertionsString assertions;
in
if failedAssertionsString == "" then
true
else
lib.addErrorContext "with failed assertions:${failedAssertionsString}" false;
/**
Function to generate a string of failed assertions.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_mkFailedAssertionsString
:: (assertions :: List { assertion :: Bool, message :: String })
-> String
```
# Inputs
`assertions`
: A list of assertions to evaluate
# Examples
:::{.example}
## `_cuda.lib._mkFailedAssertionsString` usage examples
```nix
_mkFailedAssertionsString [
{ assertion = false; message = "Assertion 1 failed"; }
{ assertion = true; message = "Assertion 2 failed"; }
]
=> "\n- Assertion 1 failed"
```
```nix
_mkFailedAssertionsString [
{ assertion = false; message = "Assertion 1 failed"; }
{ assertion = false; message = "Assertion 2 failed"; }
]
=> "\n- Assertion 1 failed\n- Assertion 2 failed"
```
:::
*/
_mkFailedAssertionsString = lib.foldl' (
failedAssertionsString:
{ assertion, message }:
failedAssertionsString + lib.optionalString (!assertion) ("\n- " + message)
) "";
/**
Utility function to generate assertions for missing packages.
Used to mark a package as unsupported if any of its required packages are missing (null).
Expects a set of attributes.
Most commonly used in overrides files on a callPackage-provided attribute set of packages.
NOTE: We typically use platformAssertions instead of brokenAssertions because the presence of packages set to null
means evaluation will fail if package attributes are accessed without checking for null first. OfBorg evaluation
sets allowBroken to true, which means we can't rely on brokenAssertions to prevent evaluation of a package with
missing dependencies.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_mkMissingPackagesAssertions
:: (attrs :: AttrSet)
-> (assertions :: List { assertion :: Bool, message :: String })
```
# Inputs
`attrs`
: The attributes to check for null
# Examples
:::{.example}
## `_cuda.lib._mkMissingPackagesAssertions` usage examples
```nix
{
_cuda,
lib,
libcal ? null,
libcublas,
utils,
}:
let
inherit (_cuda.lib) _mkMissingPackagesAssertions;
in
prevAttrs: {
passthru = prevAttrs.passthru or { } // {
platformAssertions =
prevAttrs.passthru.platformAssertions or [ ]
++ _mkMissingPackagesAssertions { inherit libcal; };
};
}
```
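As a further sketch, calling the function directly on a hypothetical attribute set shows the shape of the result. The string standing in for `libcublas` is only a placeholder for a real derivation; note that the message names the assertion that failed.
```nix
_mkMissingPackagesAssertions {
  libcal = null;
  libcublas = "placeholder for a derivation";
}
=> [ { assertion = false; message = "libcal is available"; } ]
```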
:::
*/
_mkMissingPackagesAssertions = lib.flip lib.pipe [
# Take the attributes that are null.
(lib.filterAttrs (_: value: value == null))
lib.attrNames
# Map them to assertions.
(lib.map (name: {
message = "${name} is available";
assertion = false;
}))
];
}

@@ -0,0 +1,129 @@
{ lib }:
{
/**
Returns whether a capability should be built by default for a particular CUDA version.
Capabilities built by default are baseline, non-Jetson capabilities with relatively recent CUDA support.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_cudaCapabilityIsDefault
:: (cudaMajorMinorVersion :: Version)
-> (cudaCapabilityInfo :: CudaCapabilityInfo)
-> Bool
```
# Inputs
`cudaMajorMinorVersion`
: The CUDA version to check
`cudaCapabilityInfo`
: The capability information to check
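# Examples
A minimal sketch, assuming a `CudaCapabilityInfo` carries only the four fields read by the implementation below; the field values are hypothetical and real entries may hold more attributes.
:::{.example}
## `_cuda.lib._cudaCapabilityIsDefault` usage examples
```nix
_cudaCapabilityIsDefault "12.8" {
  dontDefaultAfterCudaMajorMinorVersion = null;
  isJetson = false;
  isArchitectureSpecific = false;
  isFamilySpecific = false;
}
=> true
```
```nix
_cudaCapabilityIsDefault "12.8" {
  dontDefaultAfterCudaMajorMinorVersion = "11.8";
  isJetson = false;
  isArchitectureSpecific = false;
  isFamilySpecific = false;
}
=> false
```
```nix
_cudaCapabilityIsDefault "12.8" {
  dontDefaultAfterCudaMajorMinorVersion = null;
  isJetson = true;
  isArchitectureSpecific = false;
  isFamilySpecific = false;
}
=> false
```
:::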
*/
_cudaCapabilityIsDefault =
cudaMajorMinorVersion: cudaCapabilityInfo:
let
recentCapability =
cudaCapabilityInfo.dontDefaultAfterCudaMajorMinorVersion == null
|| lib.versionAtLeast cudaCapabilityInfo.dontDefaultAfterCudaMajorMinorVersion cudaMajorMinorVersion;
in
recentCapability
&& !cudaCapabilityInfo.isJetson
&& !cudaCapabilityInfo.isArchitectureSpecific
&& !cudaCapabilityInfo.isFamilySpecific;
/**
Returns whether a capability is supported for a particular CUDA version.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_cudaCapabilityIsSupported
:: (cudaMajorMinorVersion :: Version)
-> (cudaCapabilityInfo :: CudaCapabilityInfo)
-> Bool
```
# Inputs
`cudaMajorMinorVersion`
: The CUDA version to check
`cudaCapabilityInfo`
: The capability information to check
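# Examples
A minimal sketch, assuming a `CudaCapabilityInfo` with only the two bounds read by the implementation below; the version bounds are hypothetical. Both bounds are inclusive, and a `null` upper bound means there is no upper limit.
:::{.example}
## `_cuda.lib._cudaCapabilityIsSupported` usage examples
```nix
_cudaCapabilityIsSupported "12.8" {
  minCudaMajorMinorVersion = "11.8";
  maxCudaMajorMinorVersion = null;
}
=> true
```
```nix
_cudaCapabilityIsSupported "12.8" {
  minCudaMajorMinorVersion = "10.0";
  maxCudaMajorMinorVersion = "11.8";
}
=> false
```
:::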
*/
_cudaCapabilityIsSupported =
cudaMajorMinorVersion: cudaCapabilityInfo:
let
lowerBoundSatisfied = lib.versionAtLeast cudaMajorMinorVersion cudaCapabilityInfo.minCudaMajorMinorVersion;
upperBoundSatisfied =
cudaCapabilityInfo.maxCudaMajorMinorVersion == null
|| lib.versionAtLeast cudaCapabilityInfo.maxCudaMajorMinorVersion cudaMajorMinorVersion;
in
lowerBoundSatisfied && upperBoundSatisfied;
/**
Generates a CUDA variant name from a version.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_mkCudaVariant :: (version :: String) -> String
```
# Inputs
`version`
: The version string
# Examples
:::{.example}
## `_cuda.lib._mkCudaVariant` usage examples
```nix
_mkCudaVariant "11.0"
=> "cuda11"
```
:::
*/
_mkCudaVariant = version: "cuda${lib.versions.major version}";
/**
A predicate which, given a package, returns true if the package has a free license or one of NVIDIA's licenses.
This function is intended to be provided as `config.allowUnfreePredicate` when `import`-ing Nixpkgs.
# Type
```
allowUnfreeCudaPredicate :: (package :: Package) -> Bool
```
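# Examples
A minimal sketch of how the predicate treats single licenses and lists of licenses. The package attribute sets are hypothetical stand-ins carrying only the fields the predicate reads (`meta.license`, with `free` and `shortName`).
:::{.example}
## `_cuda.lib.allowUnfreeCudaPredicate` usage examples
```nix
allowUnfreeCudaPredicate {
  meta.license = { free = false; shortName = "CUDA EULA"; };
}
=> true
```
```nix
allowUnfreeCudaPredicate {
  meta.license = [
    { free = true; shortName = "MIT"; }
    { free = false; shortName = "unfree"; }
  ];
}
=> false
```
:::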
*/
allowUnfreeCudaPredicate =
package:
lib.all (
license:
license.free
|| lib.elem license.shortName [
"CUDA EULA"
"cuDNN EULA"
"cuSPARSELt EULA"
"cuTENSOR EULA"
"NVidia OptiX EULA"
]
) (lib.toList package.meta.license);
}

@@ -0,0 +1,52 @@
{
_cuda,
lib,
}:
{
# See ./assertions.nix for documentation.
inherit (import ./assertions.nix { inherit _cuda lib; })
_evaluateAssertions
_mkFailedAssertionsString
_mkMissingPackagesAssertions
;
# See ./cuda.nix for documentation.
inherit (import ./cuda.nix { inherit lib; })
_cudaCapabilityIsDefault
_cudaCapabilityIsSupported
_mkCudaVariant
allowUnfreeCudaPredicate
;
# See ./meta.nix for documentation.
inherit (import ./meta.nix { inherit _cuda lib; })
_mkMetaBadPlatforms
_mkMetaBroken
;
# See ./redist.nix for documentation.
inherit (import ./redist.nix { inherit _cuda lib; })
_redistSystemIsSupported
getNixSystems
getRedistSystem
mkRedistUrl
;
# See ./strings.nix for documentation.
inherit (import ./strings.nix { inherit _cuda lib; })
dotsToUnderscores
dropDots
formatCapabilities
mkCmakeCudaArchitecturesString
mkGencodeFlag
mkRealArchitecture
mkVersionedName
mkVirtualArchitecture
;
# See ./versions.nix for documentation.
inherit (import ./versions.nix { inherit _cuda lib; })
majorMinorPatch
trimComponents
;
}

@@ -0,0 +1,71 @@
{ _cuda, lib }:
{
/**
Returns a list of bad platforms for a given package if assertions in `finalAttrs.passthru.platformAssertions`
fail, optionally logging evaluation warnings for each reason.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
NOTE: This function requires `finalAttrs.passthru.platformAssertions` to be a list of assertions and
`finalAttrs.finalPackage.name` and `finalAttrs.finalPackage.stdenv` to be available.
# Type
```
_mkMetaBadPlatforms :: (warn :: Bool) -> (finalAttrs :: AttrSet) -> List String
```
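# Examples
A minimal sketch, assuming a hand-written attribute set in place of a real `finalAttrs`; only the fields the implementation below reads are provided, and the package name and systems are hypothetical. Warnings are disabled so the result is just the list.
:::{.example}
## `_cuda.lib._mkMetaBadPlatforms` usage examples
```nix
_mkMetaBadPlatforms false {
  passthru.platformAssertions = [
    { assertion = false; message = "libcal is available"; }
  ];
  finalPackage = {
    name = "example-1.0";
    stdenv = {
      buildPlatform.system = "x86_64-linux";
      hostPlatform.system = "x86_64-linux";
      targetPlatform.system = "x86_64-linux";
    };
  };
}
=> [ "x86_64-linux" ]
```
:::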
*/
_mkMetaBadPlatforms =
warn: finalAttrs:
let
failedAssertionsString = _cuda.lib._mkFailedAssertionsString finalAttrs.passthru.platformAssertions;
hasFailedAssertions = failedAssertionsString != "";
finalStdenv = finalAttrs.finalPackage.stdenv;
in
lib.warnIf (warn && hasFailedAssertions)
"Package ${finalAttrs.finalPackage.name} is unsupported on this platform due to the following failed assertions:${failedAssertionsString}"
(
lib.optionals hasFailedAssertions (
lib.unique [
finalStdenv.buildPlatform.system
finalStdenv.hostPlatform.system
finalStdenv.targetPlatform.system
]
)
);
/**
Returns a boolean indicating whether the package is broken as a result of `finalAttrs.passthru.brokenAssertions`,
optionally logging evaluation warnings for each reason.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
NOTE: This function requires `finalAttrs.passthru.brokenAssertions` to be a list of assertions and
`finalAttrs.finalPackage.name` to be available.
# Type
```
_mkMetaBroken :: (warn :: Bool) -> (finalAttrs :: AttrSet) -> Bool
```
# Inputs
`warn`
: A boolean indicating whether to log warnings
`finalAttrs`
: The final attributes of the package
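# Examples
A minimal sketch, assuming a hand-written attribute set in place of a real `finalAttrs`; the package name and assertion messages are hypothetical. Warnings are disabled so only the boolean result is shown.
:::{.example}
## `_cuda.lib._mkMetaBroken` usage examples
```nix
_mkMetaBroken false {
  passthru.brokenAssertions = [
    { assertion = false; message = "CUDA version is too old"; }
  ];
  finalPackage.name = "example-1.0";
}
=> true
```
```nix
_mkMetaBroken false {
  passthru.brokenAssertions = [
    { assertion = true; message = "CUDA version is too old"; }
  ];
  finalPackage.name = "example-1.0";
}
=> false
```
:::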
*/
_mkMetaBroken =
warn: finalAttrs:
let
failedAssertionsString = _cuda.lib._mkFailedAssertionsString finalAttrs.passthru.brokenAssertions;
hasFailedAssertions = failedAssertionsString != "";
in
lib.warnIf (warn && hasFailedAssertions)
"Package ${finalAttrs.finalPackage.name} is marked as broken due to the following failed assertions:${failedAssertionsString}"
hasFailedAssertions;
}

@@ -0,0 +1,196 @@
{ _cuda, lib }:
{
/**
Returns a boolean indicating whether the provided redist system is supported by any of the provided redist systems.
NOTE: No guarantees are made about this function's stability. You may use it at your own risk.
# Type
```
_redistSystemIsSupported
:: (redistSystem :: RedistSystem)
-> (redistSystems :: List RedistSystem)
-> Bool
```
# Inputs
`redistSystem`
: The redist system to check
`redistSystems`
: The list of redist systems to check against
# Examples
:::{.example}
## `cudaLib._redistSystemIsSupported` usage examples
```nix
_redistSystemIsSupported "linux-x86_64" [ "linux-x86_64" ]
=> true
```
```nix
_redistSystemIsSupported "linux-x86_64" [ "linux-aarch64" ]
=> false
```
```nix
_redistSystemIsSupported "linux-x86_64" [ "linux-aarch64" "linux-x86_64" ]
=> true
```
```nix
_redistSystemIsSupported "linux-x86_64" [ "linux-aarch64" "linux-all" ]
=> true
```
:::
*/
_redistSystemIsSupported =
redistSystem: redistSystems:
lib.findFirst (
redistSystem':
redistSystem' == redistSystem || redistSystem' == "linux-all" || redistSystem' == "source"
) null redistSystems != null;
/**
Maps an NVIDIA redistributable system to Nix systems.
NOTE: This function returns a list of systems because the redistributable systems `"linux-all"` and `"source"` can
be built on multiple systems.
NOTE: This function *will* be called by unsupported systems because `cudaPackages` is evaluated on all systems. As
such, we need to handle unsupported systems gracefully.
# Type
```
getNixSystems :: (redistSystem :: RedistSystem) -> List String
```
# Inputs
`redistSystem`
: The NVIDIA redistributable system
# Examples
:::{.example}
## `cudaLib.getNixSystems` usage examples
```nix
getNixSystems "linux-sbsa"
=> [ "aarch64-linux" ]
```
```nix
getNixSystems "linux-aarch64"
=> [ "aarch64-linux" ]
```
:::
*/
getNixSystems =
redistSystem:
if redistSystem == "linux-x86_64" then
[ "x86_64-linux" ]
else if redistSystem == "linux-sbsa" || redistSystem == "linux-aarch64" then
[ "aarch64-linux" ]
else if redistSystem == "linux-all" || redistSystem == "source" then
[
"aarch64-linux"
"x86_64-linux"
]
else
[ ];
/**
Maps a Nix system to an NVIDIA redistributable system.
NOTE: We swap out the default `linux-sbsa` redist (for server-grade ARM chips) with the `linux-aarch64` redist
(which is for Jetson devices) if we're building for any Jetson devices. Since both are based on aarch64, we can only
have one or the other, otherwise there's an ambiguity as to which should be used.
NOTE: This function *will* be called by unsupported systems because `cudaPackages` is evaluated on all systems. As
such, we need to handle unsupported systems gracefully.
# Type
```
getRedistSystem :: (hasJetsonCudaCapability :: Bool) -> (nixSystem :: String) -> String
```
# Inputs
`hasJetsonCudaCapability`
: If configured for a Jetson device
`nixSystem`
: The Nix system
# Examples
:::{.example}
## `cudaLib.getRedistSystem` usage examples
```nix
getRedistSystem true "aarch64-linux"
=> "linux-aarch64"
```
```nix
getRedistSystem false "aarch64-linux"
=> "linux-sbsa"
```
:::
*/
getRedistSystem =
hasJetsonCudaCapability: nixSystem:
if nixSystem == "x86_64-linux" then
"linux-x86_64"
else if nixSystem == "aarch64-linux" then
if hasJetsonCudaCapability then "linux-aarch64" else "linux-sbsa"
else
"unsupported";
/**
Function to generate a URL for something in the redistributable tree.
# Type
```
mkRedistUrl :: (redistName :: RedistName) -> (relativePath :: NonEmptyStr) -> RedistUrl
```
# Inputs
`redistName`
: The name of the redistributable
`relativePath`
: The relative path to a file in the redistributable tree
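# Examples
A minimal sketch of the two URL layouts produced by the implementation below. The relative paths are hypothetical, and the results are written in terms of `_cuda.db.redistUrlPrefix` because its value is defined elsewhere.
:::{.example}
## `cudaLib.mkRedistUrl` usage examples
```nix
mkRedistUrl "cuda" "redistrib_12.9.1.json"
=> "${_cuda.db.redistUrlPrefix}/cuda/redist/redistrib_12.9.1.json"
```
```nix
mkRedistUrl "tensorrt" "tensorrt/example.tar.gz"
=> "${_cuda.db.redistUrlPrefix}/machine-learning/tensorrt/example.tar.gz"
```
:::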
*/
mkRedistUrl =
redistName: relativePath:
lib.concatStringsSep "/" (
[ _cuda.db.redistUrlPrefix ]
++ (
if redistName != "tensorrt" then
[
redistName
"redist"
]
else
[ "machine-learning" ]
)
++ [ relativePath ]
);
}

@@ -0,0 +1,382 @@
{ _cuda, lib }:
let
cudaLib = _cuda.lib;
in
{
/**
Replaces dots in a string with underscores.
# Type
```
dotsToUnderscores :: (str :: String) -> String
```
# Inputs
`str`
: The string for which dots shall be replaced by underscores
# Examples
:::{.example}
## `cudaLib.dotsToUnderscores` usage examples
```nix
dotsToUnderscores "1.2.3"
=> "1_2_3"
```
:::
*/
dotsToUnderscores = lib.replaceStrings [ "." ] [ "_" ];
/**
Removes the dots from a string.
# Type
```
dropDots :: (str :: String) -> String
```
# Inputs
`str`
: The string to remove dots from
# Examples
:::{.example}
## `cudaLib.dropDots` usage examples
```nix
dropDots "1.2.3"
=> "123"
```
:::
*/
dropDots = lib.replaceStrings [ "." ] [ "" ];
/**
Produces an attribute set of useful data and functionality for packaging CUDA software within Nixpkgs.
# Type
```
formatCapabilities
:: { cudaCapabilityToInfo :: AttrSet CudaCapability CudaCapabilityInfo
, cudaCapabilities :: List CudaCapability
, cudaForwardCompat :: Bool
}
-> { cudaCapabilities :: List CudaCapability
, cudaForwardCompat :: Bool
, gencode :: List String
, realArches :: List String
, virtualArches :: List String
, archNames :: List String
, arches :: List String
, gencodeString :: String
, cmakeCudaArchitecturesString :: String
}
```
# Inputs
`cudaCapabilityToInfo`
: A mapping of CUDA capabilities to their information
`cudaCapabilities`
: A list of CUDA capabilities to use
`cudaForwardCompat`
: A boolean indicating whether to include the forward compatibility gencode (+PTX) to support future GPU
generations
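# Examples
A minimal sketch, assuming a `cudaCapabilityToInfo` that carries only the `archName` field read by the implementation below; real entries hold more attributes. The capabilities chosen are for illustration only.
:::{.example}
## `cudaLib.formatCapabilities` usage examples
```nix
formatCapabilities {
  cudaCapabilityToInfo = {
    "7.5".archName = "Turing";
    "8.6".archName = "Ampere";
  };
  cudaCapabilities = [ "7.5" "8.6" ];
  cudaForwardCompat = true;
}
=> {
  cudaCapabilities = [ "7.5" "8.6" ];
  cudaForwardCompat = true;
  gencode = [
    "-gencode=arch=compute_75,code=sm_75"
    "-gencode=arch=compute_86,code=sm_86"
    "-gencode=arch=compute_86,code=compute_86"
  ];
  realArches = [ "sm_75" "sm_86" ];
  virtualArches = [ "compute_75" "compute_86" ];
  archNames = [ "Ampere" "Turing" ];
  arches = [ "sm_75" "sm_86" "compute_86" ];
  cmakeCudaArchitecturesString = "75;86";
  gencodeString = "-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86";
}
```
:::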
*/
formatCapabilities =
{
cudaCapabilityToInfo,
cudaCapabilities,
cudaForwardCompat,
}:
let
/**
The real architectures for the given CUDA capabilities.
# Type
```
realArches :: List String
```
*/
realArches = lib.map cudaLib.mkRealArchitecture cudaCapabilities;
/**
The virtual architectures for the given CUDA capabilities.
These are typically used for forward compatibility, when trying to support an architecture newer than the CUDA
version allows.
# Type
```
virtualArches :: List String
```
*/
virtualArches = lib.map cudaLib.mkVirtualArchitecture cudaCapabilities;
/**
The gencode flags for the given CUDA capabilities.
# Type
```
gencode :: List String
```
*/
gencode =
let
base = lib.map (cudaLib.mkGencodeFlag "sm") cudaCapabilities;
forward = cudaLib.mkGencodeFlag "compute" (lib.last cudaCapabilities);
in
base ++ lib.optionals cudaForwardCompat [ forward ];
in
{
inherit
cudaCapabilities
cudaForwardCompat
gencode
realArches
virtualArches
;
/**
The architecture names for the given CUDA capabilities.
# Type
```
archNames :: List String
```
*/
# E.g. [ "Ampere" "Turing" ]
archNames = lib.pipe cudaCapabilities [
(lib.map (cudaCapability: cudaCapabilityToInfo.${cudaCapability}.archName))
lib.unique
lib.naturalSort
];
/**
The architectures for the given CUDA capabilities, including both real and virtual architectures.
When `cudaForwardCompat` is enabled, the last architecture in the list is used as the forward compatibility architecture.
# Type
```
arches :: List String
```
*/
# E.g. [ "sm_75" "sm_86" "compute_86" ]
arches = realArches ++ lib.optionals cudaForwardCompat [ (lib.last virtualArches) ];
/**
The CMake-compatible CUDA architectures string for the given CUDA capabilities.
# Type
```
cmakeCudaArchitecturesString :: String
```
*/
cmakeCudaArchitecturesString = cudaLib.mkCmakeCudaArchitecturesString cudaCapabilities;
/**
The gencode string for the given CUDA capabilities.
# Type
```
gencodeString :: String
```
*/
gencodeString = lib.concatStringsSep " " gencode;
};
/**
Produces a CMake-compatible CUDA architecture string from a list of CUDA capabilities.
# Type
```
mkCmakeCudaArchitecturesString :: (cudaCapabilities :: List String) -> String
```
# Inputs
`cudaCapabilities`
: The CUDA capabilities to convert
# Examples
:::{.example}
## `cudaLib.mkCmakeCudaArchitecturesString` usage examples
```nix
mkCmakeCudaArchitecturesString [ "8.9" "10.0a" ]
=> "89;100a"
```
:::
*/
mkCmakeCudaArchitecturesString = lib.concatMapStringsSep ";" cudaLib.dropDots;
/**
Produces a gencode flag from a CUDA capability.
# Type
```
mkGencodeFlag :: (archPrefix :: String) -> (cudaCapability :: String) -> String
```
# Inputs
`archPrefix`
: The architecture prefix to use for the `code` field
`cudaCapability`
: The CUDA capability to convert
# Examples
:::{.example}
## `cudaLib.mkGencodeFlag` usage examples
```nix
mkGencodeFlag "sm" "8.9"
=> "-gencode=arch=compute_89,code=sm_89"
```
```nix
mkGencodeFlag "compute" "10.0a"
=> "-gencode=arch=compute_100a,code=compute_100a"
```
:::
*/
mkGencodeFlag =
archPrefix: cudaCapability:
let
cap = cudaLib.dropDots cudaCapability;
in
"-gencode=arch=compute_${cap},code=${archPrefix}_${cap}";
/**
Produces a real architecture string from a CUDA capability.
# Type
```
mkRealArchitecture :: (cudaCapability :: String) -> String
```
# Inputs
`cudaCapability`
: The CUDA capability to convert
# Examples
:::{.example}
## `cudaLib.mkRealArchitecture` usage examples
```nix
mkRealArchitecture "8.9"
=> "sm_89"
```
```nix
mkRealArchitecture "10.0a"
=> "sm_100a"
```
:::
*/
mkRealArchitecture = cudaCapability: "sm_" + cudaLib.dropDots cudaCapability;
/**
Create a versioned attribute name from a version by replacing dots with underscores.
# Type
```
mkVersionedName :: (name :: String) -> (version :: Version) -> String
```
# Inputs
`name`
: The name to use
`version`
: The version to use
# Examples
:::{.example}
## `cudaLib.mkVersionedName` usage examples
```nix
mkVersionedName "hello" "1.2.3"
=> "hello_1_2_3"
```
```nix
mkVersionedName "cudaPackages" "12.8"
=> "cudaPackages_12_8"
```
:::
*/
mkVersionedName = name: version: "${name}_${cudaLib.dotsToUnderscores version}";
/**
Produces a virtual architecture string from a CUDA capability.
# Type
```
mkVirtualArchitecture :: (cudaCapability :: String) -> String
```
# Inputs
`cudaCapability`
: The CUDA capability to convert
# Examples
:::{.example}
## `cudaLib.mkVirtualArchitecture` usage examples
```nix
mkVirtualArchitecture "8.9"
=> "compute_89"
```
```nix
mkVirtualArchitecture "10.0a"
=> "compute_100a"
```
:::
*/
mkVirtualArchitecture = cudaCapability: "compute_" + cudaLib.dropDots cudaCapability;
}

@@ -0,0 +1,79 @@
{ _cuda, lib }:
let
cudaLib = _cuda.lib;
in
{
/**
Extracts the major, minor, and patch version from a string.
# Type
```
majorMinorPatch :: (version :: String) -> String
```
# Inputs
`version`
: The version string
# Examples
:::{.example}
## `_cuda.lib.majorMinorPatch` usage examples
```nix
majorMinorPatch "11.0.3.4"
=> "11.0.3"
```
:::
*/
majorMinorPatch = cudaLib.trimComponents 3;
/**
Get a version string with no more than the specified number of components.
# Type
```
trimComponents :: (numComponents :: Integer) -> (version :: String) -> String
```
# Inputs
`numComponents`
: A positive integer corresponding to the maximum number of components to keep
`version`
: A version string
# Examples
:::{.example}
## `_cuda.lib.trimComponents` usage examples
```nix
trimComponents 1 "1.2.3.4"
=> "1"
```
```nix
trimComponents 3 "1.2.3.4"
=> "1.2.3"
```
```nix
trimComponents 9 "1.2.3.4"
=> "1.2.3.4"
```
:::
*/
trimComponents =
n: v:
lib.pipe v [
lib.splitVersion
(lib.take n)
(lib.concatStringsSep ".")
];
}

@@ -0,0 +1,28 @@
# Packages which have been deprecated or removed from cudaPackages
{ lib }:
let
mkRenamed =
oldName:
{ path, package }:
lib.warn "cudaPackages.${oldName} is deprecated, use ${path} instead" package;
in
final: _:
builtins.mapAttrs mkRenamed {
# A comment to prevent empty { } from collapsing into a single line
cudaFlags = {
path = "cudaPackages.flags";
package = final.flags;
};
cudaVersion = {
path = "cudaPackages.cudaMajorMinorVersion";
package = final.cudaMajorMinorVersion;
};
cudatoolkit-legacy-runfile = {
path = "cudaPackages.cudatoolkit";
package = final.cudatoolkit;
};
}

@@ -0,0 +1,16 @@
{ lib, stdenv }:
let
inherit (stdenv) hostPlatform;
# Samples are built around the CUDA Toolkit, which is not available for
# aarch64, so only x86_64-linux is supported. Only the platform is checked here.
platformIsSupported = hostPlatform.isx86_64 && hostPlatform.isLinux;
# Build our extension
extension =
final: _:
lib.attrsets.optionalAttrs platformIsSupported {
cuda-library-samples = final.callPackage ./generic.nix { };
};
in
extension

@@ -0,0 +1,137 @@
{
addDriverRunpath,
autoAddDriverRunpath,
autoPatchelfHook,
backendStdenv,
cmake,
cuda_cccl ? null,
cuda_cudart ? null,
cuda_nvcc ? null,
cudatoolkit,
cusparselt ? null,
cutensor ? null,
fetchFromGitHub,
lib,
libcusparse ? null,
setupCudaHook,
}:
let
base = backendStdenv.mkDerivation (finalAttrs: {
src = fetchFromGitHub {
owner = "NVIDIA";
repo = "CUDALibrarySamples";
rev = "e57b9c483c5384b7b97b7d129457e5a9bdcdb5e1";
sha256 = "0g17afsmb8am0darxchqgjz1lmkaihmnn7k1x4ahg5gllcmw8k3l";
};
version =
lib.strings.substring 0 7 finalAttrs.src.rev + "-" + lib.versions.majorMinor cudatoolkit.version;
nativeBuildInputs = [
cmake
addDriverRunpath
];
buildInputs = [ cudatoolkit ];
postFixup = ''
for exe in $out/bin/*; do
addDriverRunpath $exe
done
'';
meta = {
description = "Examples of using libraries that use CUDA";
longDescription = ''
CUDA Library Samples contains examples demonstrating the use of
features in the math and image processing libraries cuBLAS, cuTENSOR,
cuSPARSE, cuSOLVER, cuFFT, cuRAND, NPP and nvJPEG.
'';
license = lib.licenses.bsd3;
platforms = [ "x86_64-linux" ];
maintainers = with lib.maintainers; [ obsidian-systems-maintenance ];
teams = [ lib.teams.cuda ];
};
});
in
{
cublas = base.overrideAttrs (
finalAttrs: _: {
pname = "cuda-library-samples-cublas";
sourceRoot = "${finalAttrs.src.name}/cuBLASLt";
}
);
cusolver = base.overrideAttrs (
finalAttrs: _: {
pname = "cuda-library-samples-cusolver";
sourceRoot = "${finalAttrs.src.name}/cuSOLVER/gesv";
}
);
cutensor = base.overrideAttrs (
finalAttrs: prevAttrs: {
pname = "cuda-library-samples-cutensor";
sourceRoot = "${finalAttrs.src.name}/cuTENSOR";
buildInputs = prevAttrs.buildInputs or [ ] ++ [ cutensor ];
cmakeFlags = prevAttrs.cmakeFlags or [ ] ++ [
"-DCUTENSOR_EXAMPLE_BINARY_INSTALL_DIR=${placeholder "out"}/bin"
];
# CUTENSOR_ROOT is double escaped
postPatch = prevAttrs.postPatch or "" + ''
substituteInPlace CMakeLists.txt \
--replace-fail "\''${CUTENSOR_ROOT}/include" "${lib.getDev cutensor}/include"
'';
CUTENSOR_ROOT = cutensor;
meta = prevAttrs.meta or { } // {
broken = cutensor == null;
};
}
);
cusparselt = base.overrideAttrs (
finalAttrs: prevAttrs: {
pname = "cuda-library-samples-cusparselt";
sourceRoot = "${finalAttrs.src.name}/cuSPARSELt/matmul";
nativeBuildInputs = prevAttrs.nativeBuildInputs or [ ] ++ [
cmake
addDriverRunpath
(lib.getDev cusparselt)
(lib.getDev libcusparse)
cuda_nvcc
(lib.getDev cuda_cudart) # <cuda_runtime_api.h>
cuda_cccl # <nv/target>
];
postPatch = prevAttrs.postPatch or "" + ''
substituteInPlace CMakeLists.txt \
--replace-fail "''${CUSPARSELT_ROOT}/lib64/libcusparseLt.so" "${lib.getLib cusparselt}/lib/libcusparseLt.so" \
--replace-fail "''${CUSPARSELT_ROOT}/lib64/libcusparseLt_static.a" "${lib.getStatic cusparselt}/lib/libcusparseLt_static.a"
'';
postInstall = prevAttrs.postInstall or "" + ''
mkdir -p $out/bin
cp matmul_example $out/bin/
cp matmul_example_static $out/bin/
'';
CUDA_TOOLKIT_PATH = lib.getLib cudatoolkit;
CUSPARSELT_PATH = lib.getLib cusparselt;
meta = prevAttrs.meta or { } // {
broken =
# Base dependencies
cusparselt == null
|| libcusparse == null
|| cuda_nvcc == null
|| cuda_cudart == null
|| cuda_cccl == null;
};
}
);
}

@@ -0,0 +1,70 @@
{ cudaMajorMinorVersion, lib }:
let
inherit (lib) attrsets modules trivial;
redistName = "cuda";
# Manifest files for CUDA redistributables (aka redist). These can be found at
# https://developer.download.nvidia.com/compute/cuda/redist/
# Maps a cuda version to the specific version of the manifest.
cudaVersionMap = {
"12.6" = "12.6.3";
"12.8" = "12.8.1";
"12.9" = "12.9.1";
};
# Check if the current CUDA version is supported.
cudaVersionMappingExists = builtins.hasAttr cudaMajorMinorVersion cudaVersionMap;
# fullCudaVersion : String
fullCudaVersion = cudaVersionMap.${cudaMajorMinorVersion};
evaluatedModules = modules.evalModules {
modules = [
../modules
# We need to nest the manifests in a config.cuda.manifests attribute so the
# module system can evaluate them.
{
cuda.manifests = {
redistrib = trivial.importJSON (./manifests + "/redistrib_${fullCudaVersion}.json");
feature = trivial.importJSON (./manifests + "/feature_${fullCudaVersion}.json");
};
}
];
};
# Generally we prefer to do things involving getting attribute names with featureManifest instead
# of redistribManifest because the feature manifest will have *only* the redist system
# names as the keys, whereas the redistrib manifest will also have things like version, name, license,
# and license_path.
featureManifest = evaluatedModules.config.cuda.manifests.feature;
redistribManifest = evaluatedModules.config.cuda.manifests.redistrib;
# Builder function which builds a single redist package for a given platform.
# buildRedistPackage : callPackage -> PackageName -> Derivation
buildRedistPackage =
callPackage: pname:
callPackage ../generic-builders/manifest.nix {
inherit pname redistName;
# We pass the whole release to the builder because it has logic to handle
# the case we're trying to build on an unsupported platform.
redistribRelease = redistribManifest.${pname};
featureRelease = featureManifest.${pname};
};
# Build all the redist packages given final and prev.
redistPackages =
final: _prev:
# Wrap the whole thing in an optionalAttrs so we can return an empty set if the CUDA version
# is not supported.
# NOTE: We cannot include the call to optionalAttrs *in* the pipe as we would strictly evaluate the
# attrNames before we check if the CUDA version is supported.
attrsets.optionalAttrs cudaVersionMappingExists (
trivial.pipe featureManifest [
# Get all the package names
builtins.attrNames
# Build the redist packages
(trivial.flip attrsets.genAttrs (buildRedistPackage final.callPackage))
]
);
in
redistPackages

File diff suppressed because it is too large Load Diff

@@ -0,0 +1,80 @@
{
lib,
symlinkJoin,
backendStdenv,
cudaMajorMinorVersion,
cuda_cccl ? null,
cuda_cudart ? null,
cuda_cuobjdump ? null,
cuda_cupti ? null,
cuda_cuxxfilt ? null,
cuda_gdb ? null,
cuda_nvcc ? null,
cuda_nvdisasm ? null,
cuda_nvml_dev ? null,
cuda_nvprune ? null,
cuda_nvrtc ? null,
cuda_nvtx ? null,
cuda_profiler_api ? null,
cuda_sanitizer_api ? null,
libcublas ? null,
libcufft ? null,
libcurand ? null,
libcusolver ? null,
libcusparse ? null,
libnpp ? null,
}:
let
getAllOutputs = p: [
(lib.getBin p)
(lib.getLib p)
(lib.getDev p)
];
hostPackages = [
cuda_cuobjdump
cuda_gdb
cuda_nvcc
cuda_nvdisasm
cuda_nvprune
];
targetPackages = [
cuda_cccl
cuda_cudart
cuda_cupti
cuda_cuxxfilt
cuda_nvml_dev
cuda_nvrtc
cuda_nvtx
cuda_profiler_api
cuda_sanitizer_api
libcublas
libcufft
libcurand
libcusolver
libcusparse
libnpp
];
# This assumes we put `cudatoolkit` in `buildInputs` instead of `nativeBuildInputs`:
allPackages = (map (p: p.__spliced.buildHost or p) hostPackages) ++ targetPackages;
in
symlinkJoin rec {
name = "cuda-merged-${cudaMajorMinorVersion}";
version = cudaMajorMinorVersion;
paths = builtins.concatMap getAllOutputs allPackages;
passthru = {
cc = lib.warn "cudaPackages.cudatoolkit is deprecated, refer to the manual and use splayed packages instead" backendStdenv.cc;
lib = symlinkJoin {
inherit name;
paths = map (p: lib.getLib p) allPackages;
};
};
meta = with lib; {
description = "Wrapper substituting the deprecated runfile-based CUDA installation";
license = licenses.nvidiaCuda;
};
}

@@ -0,0 +1,112 @@
# NOTE: Check the following URLs for support matrices:
# v8 -> https://docs.nvidia.com/deeplearning/cudnn/archives/index.html
# v9 -> https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/reference/support-matrix.html
# Version policy is to keep the latest minor release for each major release.
# https://developer.download.nvidia.com/compute/cudnn/redist/
{
cudnn.releases = {
# jetson
linux-aarch64 = [
{
version = "8.9.5.30";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-aarch64/cudnn-linux-aarch64-8.9.5.30_cuda12-archive.tar.xz";
hash = "sha256-BJH3sC9VwiB362eL8xTB+RdSS9UHz1tlgjm/mKRyM6E=";
}
{
version = "9.7.1.26";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-aarch64/cudnn-linux-aarch64-9.7.1.26_cuda12-archive.tar.xz";
hash = "sha256-jDPWAXKOiJYpblPwg5FUSh7F0Dgg59LLnd+pX9y7r1w=";
}
{
version = "9.8.0.87";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-aarch64/cudnn-linux-aarch64-9.8.0.87_cuda12-archive.tar.xz";
hash = "sha256-8D7OP/B9FxnwYhiXOoeXzsG+OHzDF7qrW7EY3JiBmec=";
}
];
# powerpc
linux-ppc64le = [ ];
# server-grade arm
linux-sbsa = [
{
version = "8.9.7.29";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-8.9.7.29_cuda12-archive.tar.xz";
hash = "sha256-6Yt8gAEHheXVygHuTOm1sMjHNYfqb4ZIvjTT+NHUe9E=";
}
{
version = "9.3.0.75";
minCudaVersion = "12.0";
maxCudaVersion = "12.6";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.3.0.75_cuda12-archive.tar.xz";
hash = "sha256-Eibdm5iciYY4VSlj0ACjz7uKCgy5uvjLCear137X1jk=";
}
{
version = "9.7.1.26";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.7.1.26_cuda12-archive.tar.xz";
hash = "sha256-koJFUKlesnWwbJCZhBDhLOBRQOBQjwkFZExlTJ7Xp2Q=";
}
{
version = "9.8.0.87";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.8.0.87_cuda12-archive.tar.xz";
hash = "sha256-IvYvR08MuzW+9UCtsdhB2mPJzT33azxOQwEPQ2ss2Fw=";
}
{
version = "9.11.0.98";
minCudaVersion = "12.0";
maxCudaVersion = "12.9";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.11.0.98_cuda12-archive.tar.xz";
hash = "sha256-X81kUdiKnTt/rLwASB+l4rsV8sptxvhuCysgG8QuzVY=";
}
];
# x86_64
linux-x86_64 = [
{
version = "8.9.7.29";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz";
hash = "sha256-R1MzYlx+QqevPKCy91BqEG4wyTsaoAgc2cE++24h47s=";
}
{
version = "9.3.0.75";
minCudaVersion = "12.0";
maxCudaVersion = "12.6";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.3.0.75_cuda12-archive.tar.xz";
hash = "sha256-PW7xCqBtyTOaR34rBX4IX/hQC73ueeQsfhNlXJ7/LCY=";
}
{
version = "9.7.1.26";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.7.1.26_cuda12-archive.tar.xz";
hash = "sha256-EJpeXGvN9Dlub2Pz+GLtLc8W7pPuA03HBKGxG98AwLE=";
}
{
version = "9.8.0.87";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.8.0.87_cuda12-archive.tar.xz";
hash = "sha256-MhubM7sSh0BNk9VnLTUvFv6rxLIgrGrguG5LJ/JX3PQ=";
}
{
version = "9.11.0.98";
minCudaVersion = "12.0";
maxCudaVersion = "12.9";
url = "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.11.0.98_cuda12-archive.tar.xz";
hash = "sha256-tgyPrQH6FSHS5x7TiIe5BHjX8Hs9pJ/WirEYqf7k2kg=";
}
];
};
}

@@ -0,0 +1,21 @@
# Shims to mimic the shape of ../modules/generic/manifests/{feature,redistrib}/release.nix
{
package,
# redistSystem :: String
# String is "unsupported" if the given architecture is unsupported.
redistSystem,
}:
{
featureRelease = {
inherit (package) minCudaVersion maxCudaVersion;
${redistSystem}.outputs = {
lib = true;
static = true;
dev = true;
};
};
redistribRelease = {
name = "NVIDIA CUDA Deep Neural Network library (cuDNN)";
inherit (package) hash url version;
};
}

@@ -0,0 +1,96 @@
# Support matrix can be found at
# https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-880/support-matrix/index.html
{
cudaLib,
lib,
redistSystem,
}:
let
inherit (lib)
attrsets
lists
modules
trivial
;
redistName = "cusparselt";
pname = "libcusparse_lt";
cusparseltVersions = [
"0.7.1"
];
# Manifests :: { redistrib, feature }
# Each release of cusparselt gets mapped to an evaluated module for that release.
# From there, we can get the min/max CUDA versions supported by that release.
# listOfManifests :: List Manifests
listOfManifests =
let
configEvaluator =
fullCusparseltVersion:
modules.evalModules {
modules = [
../modules
# We need to nest the manifests in a config.cusparselt.manifests attribute so the
# module system can evaluate them.
{
cusparselt.manifests = {
redistrib = trivial.importJSON (./manifests + "/redistrib_${fullCusparseltVersion}.json");
feature = trivial.importJSON (./manifests + "/feature_${fullCusparseltVersion}.json");
};
}
];
};
# Un-nest the manifests attribute set.
releaseGrabber = evaluatedModules: evaluatedModules.config.cusparselt.manifests;
in
lists.map (trivial.flip trivial.pipe [
configEvaluator
releaseGrabber
]) cusparseltVersions;
# platformIsSupported :: Manifests -> Boolean
platformIsSupported =
{ feature, redistrib, ... }:
(attrsets.attrByPath [
pname
redistSystem
] null feature) != null;
# TODO(@connorbaker): With an auxiliary file keeping track of the CUDA versions each release supports,
# we could filter out releases that don't support our CUDA version.
# However, we don't have that currently, so we make a best-effort attempt to build every cuSPARSELt
# release which supports our platform.
# supportedManifests :: List Manifests
supportedManifests = builtins.filter platformIsSupported listOfManifests;
# Compute versioned attribute name to be used in this package set
# Patch version changes should not break the build, so we only use major and minor
# computeName :: RedistribRelease -> String
computeName =
{ version, ... }: cudaLib.mkVersionedName redistName (lib.versions.majorMinor version);
in
final: _:
let
# buildCusparseltPackage :: Manifests -> AttrSet Derivation
buildCusparseltPackage =
{ redistrib, feature }:
let
drv = final.callPackage ../generic-builders/manifest.nix {
inherit pname redistName;
redistribRelease = redistrib.${pname};
featureRelease = feature.${pname};
};
in
attrsets.nameValuePair (computeName redistrib.${pname}) drv;
extension =
let
nameOfNewest = computeName (lists.last supportedManifests).redistrib.${pname};
drvs = builtins.listToAttrs (lists.map buildCusparseltPackage supportedManifests);
containsDefault = attrsets.optionalAttrs (drvs != { }) { cusparselt = drvs.${nameOfNewest}; };
in
drvs // containsDefault;
in
extension

@@ -0,0 +1,44 @@
{
"libcusparse_lt": {
"linux-aarch64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"linux-sbsa": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"linux-x86_64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"windows-x86_64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": false,
"sample": false,
"static": false
}
}
}
}

@@ -0,0 +1,35 @@
{
"release_date": "2025-02-25",
"release_label": "0.7.1",
"release_product": "cusparselt",
"libcusparse_lt": {
"name": "NVIDIA cuSPARSELt",
"license": "cuSPARSELt",
"license_path": "libcusparse_lt/LICENSE.txt",
"version": "0.7.1.0",
"linux-x86_64": {
"relative_path": "libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.7.1.0-archive.tar.xz",
"sha256": "a0d885837887c73e466a31b4e86aaae2b7d0cc9c5de0d40921dbe2a15dbd6a88",
"md5": "b2e5f3c9b9d69e1e0b55b16de33fdc6e",
"size": "353151840"
},
"linux-sbsa": {
"relative_path": "libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.7.1.0-archive.tar.xz",
"sha256": "4a131d0a54728e53ba536b50bb65380603456f1656e7df8ee52e285618a0b57c",
"md5": "612a712c7da6e801ee773687e99af87e",
"size": "352406784"
},
"windows-x86_64": {
"relative_path": "libcusparse_lt/windows-x86_64/libcusparse_lt-windows-x86_64-0.7.1.0-archive.zip",
"sha256": "004bcb1b700c24ca8d60a8ddd2124640f61138a6c29914d2afaa0bfa0d0e3cf2",
"md5": "a1d8df8dc8ff4b3bd0e859f992f8f392",
"size": "268594665"
},
"linux-aarch64": {
"relative_path": "libcusparse_lt/linux-aarch64/libcusparse_lt-linux-aarch64-0.7.1.0-archive.tar.xz",
"sha256": "d3b0a660fd552e0bd9a4491b15299d968674833483d5f164cfea35e70646136c",
"md5": "54e3f3b28c94118991ce54ec38f531fb",
"size": "5494380"
}
}
}

@@ -0,0 +1,124 @@
# Support matrix can be found at
# https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-880/support-matrix/index.html
#
# TODO(@connorbaker):
# This is a very similar strategy to CUDA/CUDNN:
#
# - Get all versions supported by the current release of CUDA
# - Build all of them
# - Make the newest the default
#
# Unique twists:
#
# - Instead of providing different releases for each version of CUDA, cuTENSOR has multiple subdirectories in `lib`
# -- one for each version of CUDA.
{
cudaLib,
cudaMajorMinorVersion,
lib,
redistSystem,
}:
let
inherit (lib)
attrsets
lists
modules
versions
trivial
;
redistName = "cutensor";
pname = "libcutensor";
cutensorVersions = [
"2.0.2"
"2.1.0"
];
# Manifests :: { redistrib, feature }
# Each release of cutensor gets mapped to an evaluated module for that release.
# From there, we can get the min/max CUDA versions supported by that release.
# listOfManifests :: List Manifests
listOfManifests =
let
configEvaluator =
fullCutensorVersion:
modules.evalModules {
modules = [
../modules
# We need to nest the manifests in a config.cutensor.manifests attribute so the
# module system can evaluate them.
{
cutensor.manifests = {
redistrib = trivial.importJSON (./manifests + "/redistrib_${fullCutensorVersion}.json");
feature = trivial.importJSON (./manifests + "/feature_${fullCutensorVersion}.json");
};
}
];
};
# Un-nest the manifests attribute set.
releaseGrabber = evaluatedModules: evaluatedModules.config.cutensor.manifests;
in
lists.map (trivial.flip trivial.pipe [
configEvaluator
releaseGrabber
]) cutensorVersions;
# Our cudaMajorMinorVersion tells us which version of CUDA we're building against.
# The subdirectories in lib/ tell us which versions of CUDA are supported.
# Typically the names will look like this:
#
# - 11
# - 12
# libPath :: String
libPath = versions.major cudaMajorMinorVersion;
# A release is supported if it has a libPath that matches our CUDA version for our platform.
# libPath values are not constant across platforms within the same release -- one platform may support fewer
# CUDA versions than another.
# platformIsSupported :: Manifests -> Boolean
platformIsSupported =
{ feature, redistrib, ... }:
(attrsets.attrByPath [
pname
redistSystem
] null feature) != null;
# TODO(@connorbaker): With an auxiliary file keeping track of the CUDA versions each release supports,
# we could filter out releases that don't support our CUDA version.
# However, we don't have that currently, so we make a best-effort attempt to build cuTENSOR with whatever
# libPath corresponds to our CUDA version.
# supportedManifests :: List Manifests
supportedManifests = builtins.filter platformIsSupported listOfManifests;
# Compute versioned attribute name to be used in this package set
# Patch version changes should not break the build, so we only use major and minor
# computeName :: RedistribRelease -> String
computeName =
{ version, ... }: cudaLib.mkVersionedName redistName (lib.versions.majorMinor version);
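# Illustrative example: the 2.1.0 manifest reports version "2.1.0.9", so
# lib.versions.majorMinor yields "2.1". Assuming cudaLib.mkVersionedName follows the
# versioned-name pattern used elsewhere in this package set (e.g., `cudnn_8_9`), the
# resulting attribute name would be of the form `cutensor_2_1`. The exact separator is an
# assumption here; check cudaLib.mkVersionedName before relying on it.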
in
final: _:
let
# buildCutensorPackage :: Manifests -> AttrSet Derivation
buildCutensorPackage =
{ redistrib, feature }:
let
drv = final.callPackage ../generic-builders/manifest.nix {
inherit pname redistName libPath;
redistribRelease = redistrib.${pname};
featureRelease = feature.${pname};
};
in
attrsets.nameValuePair (computeName redistrib.${pname}) drv;
extension =
let
nameOfNewest = computeName (lists.last supportedManifests).redistrib.${pname};
drvs = builtins.listToAttrs (lists.map buildCutensorPackage supportedManifests);
containsDefault = attrsets.optionalAttrs (drvs != { }) { cutensor = drvs.${nameOfNewest}; };
in
drvs // containsDefault;
in
extension

@@ -0,0 +1,44 @@
{
"libcutensor": {
"linux-ppc64le": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"linux-sbsa": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"linux-x86_64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"windows-x86_64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": false,
"sample": false,
"static": false
}
}
}
}

@@ -0,0 +1,34 @@
{
"libcutensor": {
"linux-sbsa": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"linux-x86_64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": true,
"sample": false,
"static": true
}
},
"windows-x86_64": {
"outputs": {
"bin": false,
"dev": true,
"doc": false,
"lib": false,
"sample": false,
"static": false
}
}
}
}

@@ -0,0 +1,35 @@
{
"release_date": "2024-06-24",
"release_label": "2.0.2",
"release_product": "cutensor",
"libcutensor": {
"name": "NVIDIA cuTENSOR",
"license": "cuTensor",
"license_path": "libcutensor/LICENSE.txt",
"version": "2.0.2.4",
"linux-x86_64": {
"relative_path": "libcutensor/linux-x86_64/libcutensor-linux-x86_64-2.0.2.4-archive.tar.xz",
"sha256": "957b04ef6343aca404fe5f4a3f1f1d3ac0bd04ceb3acecc93e53f4d63bd91157",
"md5": "2b994ecba434e69ee55043cf353e05b4",
"size": "545271628"
},
"linux-ppc64le": {
"relative_path": "libcutensor/linux-ppc64le/libcutensor-linux-ppc64le-2.0.2.4-archive.tar.xz",
"sha256": "db2c05e231a26fb5efee470e1d8e11cb1187bfe0726b665b87cbbb62a9901ba0",
"md5": "6b00e29407452333946744c4084157e8",
"size": "543070992"
},
"linux-sbsa": {
"relative_path": "libcutensor/linux-sbsa/libcutensor-linux-sbsa-2.0.2.4-archive.tar.xz",
"sha256": "9712b54aa0988074146867f9b6f757bf11a61996f3b58b21e994e920b272301b",
"md5": "c9bb31a92626a092d0c7152b8b3eaa18",
"size": "540299376"
},
"windows-x86_64": {
"relative_path": "libcutensor/windows-x86_64/libcutensor-windows-x86_64-2.0.2.4-archive.zip",
"sha256": "ab2fca16d410863d14f2716cec0d07fb21d20ecd24ee47d309e9970c9c01ed4a",
"md5": "f6cfdb29a9a421a1ee4df674dd54028c",
"size": "921154033"
}
}
}

@@ -0,0 +1,29 @@
{
"release_date": "2025-01-27",
"release_label": "2.1.0",
"release_product": "cutensor",
"libcutensor": {
"name": "NVIDIA cuTENSOR",
"license": "cuTensor",
"license_path": "libcutensor/LICENSE.txt",
"version": "2.1.0.9",
"linux-x86_64": {
"relative_path": "libcutensor/linux-x86_64/libcutensor-linux-x86_64-2.1.0.9-archive.tar.xz",
"sha256": "ee59fcb4e8d59fc0d8cebf5f7f23bf2a196a76e6bcdcaa621aedbdcabd20a759",
"md5": "ed15120c512dfb3e32b49103850bb9dd",
"size": "814871140"
},
"linux-sbsa": {
"relative_path": "libcutensor/linux-sbsa/libcutensor-linux-sbsa-2.1.0.9-archive.tar.xz",
"sha256": "cef7819c4ecf3120d4f99b08463b8db1a8591be25147d1688371024885b1d2f0",
"md5": "fec00a1a825a05c0166eda6625dc587d",
"size": "782008004"
},
"windows-x86_64": {
"relative_path": "libcutensor/windows-x86_64/libcutensor-windows-x86_64-2.1.0.9-archive.zip",
"sha256": "ed835ba7fd617000f77e1dff87403d123edf540bd99339e3da2eaab9d32a4040",
"md5": "9efcbc0c9c372b0e71e11d4487aa5ffa",
"size": "1514752712"
}
}
}

@@ -0,0 +1,356 @@
{
# General callPackage-supplied arguments
autoAddDriverRunpath,
autoAddCudaCompatRunpath,
autoPatchelfHook,
backendStdenv,
callPackage,
_cuda,
fetchurl,
lib,
markForCudatoolkitRootHook,
flags,
stdenv,
# Builder-specific arguments
# Short package name (e.g., "cuda_cccl")
# pname : String
pname,
# Common name (e.g., "cutensor" or "cudnn") -- used in the URL.
# Also known as the Redistributable Name.
# redistName : String
redistName,
# If libPath is non-null, it must be a subdirectory of `lib`.
# The contents of `libPath` will be moved to the root of `lib`.
libPath ? null,
# See ./modules/generic/manifests/redistrib/release.nix
redistribRelease,
# See ./modules/generic/manifests/feature/release.nix
featureRelease,
cudaMajorMinorVersion,
}:
let
inherit (lib)
attrsets
lists
strings
trivial
licenses
teams
sourceTypes
;
inherit (stdenv) hostPlatform;
# As the last step before returning control to `callPackage` (which adds the `.override` method),
# we apply (via `overrideAttrs`) any necessary package-specific "fixup" functions.
# Order is significant.
maybeFixup = _cuda.fixups.${pname} or null;
fixup = if maybeFixup != null then callPackage maybeFixup { } else { };
# Get the redist systems for which the package provides distributables.
# These are used by meta.platforms.
supportedRedistSystems = builtins.attrNames featureRelease;
# redistSystem :: String
# The redistSystem is the name of the system for which the redistributable is built.
# It is `"unsupported"` if the redistributable is not supported on the target system.
redistSystem = _cuda.lib.getRedistSystem backendStdenv.hasJetsonCudaCapability hostPlatform.system;
sourceMatchesHost = lib.elem hostPlatform.system (_cuda.lib.getNixSystems redistSystem);
in
(backendStdenv.mkDerivation (finalAttrs: {
# NOTE: Even though there's no actual buildPhase going on here, the derivations of the
# redistributables are sensitive to the compiler flags provided to stdenv. The patchelf package
# is sensitive to the compiler flags provided to stdenv, and we depend on it. As such, we are
# also sensitive to the compiler flags provided to stdenv.
inherit pname;
inherit (redistribRelease) version;
# Don't force serialization to string for structured attributes, like outputToPatterns
# and brokenConditions.
# Avoids "set cannot be coerced to string" errors.
__structuredAttrs = true;
# Keep better track of dependencies.
strictDeps = true;
# NOTE: Outputs are evaluated jointly with meta, so in the case that this is an unsupported platform,
# we still need to provide a list of outputs.
outputs =
let
# Checks whether the redistributable provides an output.
hasOutput =
output:
attrsets.attrByPath [
redistSystem
"outputs"
output
] false featureRelease;
# Order is important here so we use a list.
possibleOutputs = [
"bin"
"lib"
"static"
"dev"
"doc"
"sample"
"python"
];
# Filter out outputs that don't exist in the redistributable.
# NOTE: In the case the redistributable isn't supported on the target platform,
# we will have `outputs = [ "out" ] ++ possibleOutputs`. This is of note because platforms which
# aren't supported would otherwise have evaluation errors when trying to access outputs other than `out`.
# The alternative would be to have `outputs = [ "out" ]` when `redistSystem = "unsupported"`, but that would
# require adding guards throughout the entirety of the CUDA package set to ensure `cudaSupport` is true --
# recall that OfBorg will evaluate packages marked as broken and that `cudaPackages` will be evaluated with
# `cudaSupport = false`!
additionalOutputs =
if redistSystem == "unsupported" then
possibleOutputs
else
builtins.filter hasOutput possibleOutputs;
# The out output is special -- it's the default output and we always include it.
outputs = [ "out" ] ++ additionalOutputs;
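# Illustrative example: a feature release whose entry for the current redistSystem is
#   { outputs = { bin = false; dev = true; doc = false; lib = true; sample = false; static = true; }; }
# (the shape of the libcusparse_lt and libcutensor feature manifests) yields
#   outputs = [ "out" "lib" "static" "dev" ]
# because builtins.filter preserves the order of possibleOutputs, and outputs absent from the
# manifest (here, "python") fall back to the attrByPath default of false.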
in
outputs;
# Traversed in the order of the outputs specified in outputs;
# entries are skipped if they don't exist in outputs.
outputToPatterns = {
bin = [ "bin" ];
dev = [
"share/pkgconfig"
"**/*.pc"
"**/*.cmake"
];
lib = [
"lib"
"lib64"
];
static = [ "**/*.a" ];
sample = [ "samples" ];
python = [ "**/*.whl" ];
};
# Useful for introspecting why something went wrong. Maps descriptions of why the derivation would be marked as
# broken, or have badPlatforms include the current platform, to booleans.
# brokenConditions :: AttrSet Bool
# Sets `meta.broken = true` if any of the conditions are true.
# Example: Broken on a specific version of CUDA or when a dependency has a specific version.
brokenConditions = {
# Unclear how this is handled by Nix internals.
"Duplicate entries in outputs" = finalAttrs.outputs != lists.unique finalAttrs.outputs;
# Typically this results in the static output being empty, as all libraries are moved
# back to the lib output.
"lib output follows static output" =
let
libIndex = lists.findFirstIndex (x: x == "lib") null finalAttrs.outputs;
staticIndex = lists.findFirstIndex (x: x == "static") null finalAttrs.outputs;
in
libIndex != null && staticIndex != null && libIndex > staticIndex;
};
# badPlatformsConditions :: AttrSet Bool
# Sets `meta.badPlatforms = meta.platforms` if any of the conditions are true.
# Example: Broken on a specific architecture when some condition is met (like targeting Jetson).
badPlatformsConditions = {
"No source" = !sourceMatchesHost;
};
# src :: Optional Derivation
# If redistSystem doesn't exist in redistribRelease, return null.
src = trivial.mapNullable (
{ relative_path, sha256, ... }:
fetchurl {
url = "https://developer.download.nvidia.com/compute/${redistName}/redist/${relative_path}";
inherit sha256;
}
) (redistribRelease.${redistSystem} or null);
postPatch =
# Pkg-config's setup hook expects configuration files in $out/share/pkgconfig
''
for path in pkg-config pkgconfig; do
[[ -d "$path" ]] || continue
mkdir -p share/pkgconfig
mv "$path"/* share/pkgconfig/
rmdir "$path"
done
''
# Rewrite FHS paths with store paths
# NOTE: output* fall back to out if the corresponding output isn't defined.
+ ''
for pc in share/pkgconfig/*.pc; do
sed -i \
-e "s|^cudaroot\s*=.*\$|cudaroot=''${!outputDev}|" \
-e "s|^libdir\s*=.*/lib\$|libdir=''${!outputLib}/lib|" \
-e "s|^includedir\s*=.*/include\$|includedir=''${!outputDev}/include|" \
"$pc"
done
''
# Generate unversioned names.
# E.g. cuda-11.8.pc -> cuda.pc
+ ''
for pc in share/pkgconfig/*-"$majorMinorVersion.pc"; do
ln -s "$(basename "$pc")" "''${pc%-$majorMinorVersion.pc}".pc
done
'';
env.majorMinorVersion = cudaMajorMinorVersion;
# We do need some other phases, like configurePhase, so the multiple-output setup hook works.
dontBuild = true;
nativeBuildInputs = [
autoPatchelfHook
# This hook will make sure libcuda can be found
# in typically /lib/opengl-driver by adding that
# directory to the rpath of all ELF binaries.
# Check e.g. with `patchelf --print-rpath path/to/my/binary`.
autoAddDriverRunpath
markForCudatoolkitRootHook
]
# autoAddCudaCompatRunpath depends on cuda_compat and would cause
# infinite recursion if applied to `cuda_compat` itself (beside the fact
# that it doesn't make sense in the first place)
++ lib.optionals (pname != "cuda_compat" && flags.isJetsonBuild) [
# autoAddCudaCompatRunpath must appear AFTER autoAddDriverRunpath.
# See its documentation in ./setup-hooks/extension.nix.
autoAddCudaCompatRunpath
];
buildInputs = [
# autoPatchelfHook will search for a libstdc++ and we're giving it
# one that is compatible with the rest of nixpkgs, even when
# nvcc forces us to use an older gcc
# NB: We don't actually know if this is the right thing to do
(lib.getLib stdenv.cc.cc)
];
# Picked up by autoPatchelf
# Needed e.g. for libnvrtc to locate (dlopen) libnvrtc-builtins
appendRunpaths = [ "$ORIGIN" ];
# NOTE: We don't need to check for dev or doc, because those outputs are handled by
# the multiple-outputs setup hook.
# NOTE: moveToOutput operates on all outputs:
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L105-L107
installPhase =
let
mkMoveToOutputCommand =
output:
let
template = pattern: ''moveToOutput "${pattern}" "${"$" + output}"'';
patterns = finalAttrs.outputToPatterns.${output} or [ ];
in
strings.concatMapStringsSep "\n" template patterns;
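# Illustrative example: for output = "lib" and the default outputToPatterns above, this
# expands to the following shell lines:
#   moveToOutput "lib" "$lib"
#   moveToOutput "lib64" "$lib"
# An output with no entry in outputToPatterns (e.g., "doc") expands to the empty string.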
in
# Pre-install hook
''
runHook preInstall
''
# Handle the existence of libPath, which requires us to re-arrange the lib directory
+ strings.optionalString (libPath != null) ''
full_lib_path="lib/${libPath}"
if [[ ! -d "$full_lib_path" ]]; then
echo "${finalAttrs.pname}: '$full_lib_path' does not exist, only found:" >&2
find lib/ -mindepth 1 -maxdepth 1 >&2
echo "This release might not support your CUDA version" >&2
exit 1
fi
echo "Making libPath '$full_lib_path' the root of lib" >&2
mv "$full_lib_path" lib_new
rm -r lib
mv lib_new lib
''
# Create the primary output, out, and move everything into it.
+ ''
mkdir -p "$out"
mv * "$out"
''
# Move the files into their respective outputs.
+ strings.concatMapStringsSep "\n" mkMoveToOutputCommand (builtins.tail finalAttrs.outputs)
# Add a newline to the end of the installPhase, so that the post-install hook doesn't
# get concatenated with the last moveToOutput command.
+ "\n"
# Post-install hook
+ ''
runHook postInstall
'';
doInstallCheck = true;
allowFHSReferences = true; # TODO: Default to `false`
postInstallCheck = ''
echo "Executing postInstallCheck"
if [[ -z "''${allowFHSReferences-}" ]]; then
mapfile -t outputPaths < <(for o in $(getAllOutputNames); do echo "''${!o}"; done)
if grep --max-count=5 --recursive --exclude=LICENSE /usr/ "''${outputPaths[@]}"; then
echo "Detected references to /usr" >&2
exit 1
fi
fi
'';
# libcuda needs to be resolved during runtime
autoPatchelfIgnoreMissingDeps = [
"libcuda.so"
"libcuda.so.*"
];
# _multioutPropagateDev() currently expects a space-separated string rather than an array
preFixup = ''
export propagatedBuildOutputs="''${propagatedBuildOutputs[@]}"
'';
# Propagate all outputs, including `static`
propagatedBuildOutputs = builtins.filter (x: x != "dev") finalAttrs.outputs;
# Kept in case overrides assume postPhases have already been defined
postPhases = [ "postPatchelf" ];
postPatchelf = ''
true
'';
passthru = {
# Provide access to the release information for fixup functions.
inherit redistribRelease featureRelease;
# Make the CUDA-patched stdenv available
stdenv = backendStdenv;
};
meta = {
description = "${redistribRelease.name}. By downloading and using the packages you accept the terms and conditions of the ${finalAttrs.meta.license.shortName}";
sourceProvenance = [ sourceTypes.binaryNativeCode ];
broken = lists.any trivial.id (attrsets.attrValues finalAttrs.brokenConditions);
platforms = trivial.pipe supportedRedistSystems [
# Map each redist system to the equivalent nix systems.
(lib.concatMap _cuda.lib.getNixSystems)
# Take all the unique values.
lib.unique
# Sort the list.
lib.naturalSort
];
badPlatforms =
let
isBadPlatform = lists.any trivial.id (attrsets.attrValues finalAttrs.badPlatformsConditions);
in
lists.optionals isBadPlatform finalAttrs.meta.platforms;
license =
if redistName == "cuda" then
# Add the package-specific license.
let
licensePath =
if redistribRelease.license_path != null then
redistribRelease.license_path
else
"${pname}/LICENSE.txt";
url = "https://developer.download.nvidia.com/compute/cuda/redist/${licensePath}";
in
lib.licenses.nvidiaCudaRedist // { inherit url; }
else
licenses.unfree;
teams = [ teams.cuda ];
};
})).overrideAttrs
fixup

@@ -0,0 +1,130 @@
{
lib,
cudaLib,
cudaMajorMinorVersion,
redistSystem,
stdenv,
# Builder-specific arguments
# Short package name (e.g., "cuda_cccl")
# pname : String
pname,
# Common name (e.g., "cutensor" or "cudnn") -- used in the URL.
# Also known as the Redistributable Name.
# redistName : String
redistName,
# releasesModule :: Path
# A path to a module which provides a `releases` attribute
releasesModule,
# shims :: Path
# A path to a module which provides a `shims` attribute
# The redistribRelease is only used in ./manifest.nix for the package version
# and the package description (which NVIDIA's manifest calls the "name").
# It's also used for fetching the source, but we override that since we can't
# re-use that portion of the functionality (different URLs, etc.).
# The featureRelease is used to populate meta.platforms (by way of looking at the attribute names), determine the
# outputs of the package, and provide additional package-specific constraints (e.g., min/max supported CUDA versions,
# required versions of other packages, etc.).
# shimsFn :: {package, redistSystem} -> AttrSet
shimsFn ? (throw "shimsFn must be provided"),
}:
let
evaluatedModules = lib.modules.evalModules {
modules = [
../modules
releasesModule
];
};
# NOTE: Important types:
# - Releases: ../modules/${pname}/releases/releases.nix
# - Package: ../modules/${pname}/releases/package.nix
# Check whether a package supports our CUDA version.
# satisfiesCudaVersion :: Package -> Bool
satisfiesCudaVersion =
package:
lib.versionAtLeast cudaMajorMinorVersion package.minCudaVersion
&& lib.versionAtLeast package.maxCudaVersion cudaMajorMinorVersion;
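# Illustrative example: with cudaMajorMinorVersion = "12.8", a package with
# minCudaVersion = "12.0" and maxCudaVersion = "12.9" is accepted (12.0 <= 12.8 <= 12.9),
# while a package with maxCudaVersion = "12.6" is rejected because
# lib.versionAtLeast "12.6" "12.8" is false.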
# FIXME: do this at the module system level
propagatePlatforms = lib.mapAttrs (redistSystem: lib.map (p: { inherit redistSystem; } // p));
# Releases for all platforms and all CUDA versions.
allReleases = propagatePlatforms evaluatedModules.config.${pname}.releases;
# Releases for all platforms and our CUDA version.
allReleases' = lib.mapAttrs (_: lib.filter satisfiesCudaVersion) allReleases;
# Packages for all platforms and our CUDA version.
allPackages = lib.concatLists (lib.attrValues allReleases');
packageOlder = p1: p2: lib.versionOlder p1.version p2.version;
packageSupportedPlatform = p: p.redistSystem == redistSystem;
# Compute the versioned attribute name to be used in this package set.
# Patch version changes should not break the build, so we only use the major and minor components
# (e.g., a `cudnn` package at version 8.9.x is exposed as `cudnn_8_9`).
# computeName :: Package -> String
computeName = { version, ... }: cudaLib.mkVersionedName pname (lib.versions.majorMinor version);
# The newest package for each major-minor version, with newest first.
# newestPackages :: List Package
newestPackages =
let
newestForEachMajorMinorVersion = lib.foldl' (
newestPackages: package:
let
majorMinorVersion = lib.versions.majorMinor package.version;
existingPackage = newestPackages.${majorMinorVersion} or null;
in
newestPackages
// {
${majorMinorVersion} =
# Only keep the existing package if it is newer than the one we are considering or it is supported on the
# current platform and the one we are considering is not.
if
existingPackage != null
&& (
packageOlder package existingPackage
|| (!packageSupportedPlatform package && packageSupportedPlatform existingPackage)
)
then
existingPackage
else
package;
}
) { } allPackages;
in
# Sort the packages by version so the newest is first.
# NOTE: builtins.sort requires a strict weak ordering, so we must use versionOlder rather than versionAtLeast.
# See https://github.com/NixOS/nixpkgs/commit/9fd753ea84e5035b357a275324e7fd7ccfb1fc77.
lib.sort (lib.flip packageOlder) (lib.attrValues newestForEachMajorMinorVersion);
extension =
final: _:
let
# Builds our package into a derivation and wraps it in a nameValuePair, where the name is the versioned name
# of the package.
buildPackage =
package:
let
shims = final.callPackage shimsFn { inherit package redistSystem; };
name = computeName package;
drv = final.callPackage ./manifest.nix {
inherit pname redistName;
inherit (shims) redistribRelease featureRelease;
};
in
lib.nameValuePair name drv;
# versionedDerivations :: AttrSet Derivation
versionedDerivations = builtins.listToAttrs (lib.map buildPackage newestPackages);
defaultDerivation = {
${pname} = (buildPackage (lib.head newestPackages)).value;
};
in
# NOTE: Must condition on the length of newestPackages to avoid non-total function lib.head aborting if
# newestPackages is empty.
lib.optionalAttrs (lib.length newestPackages > 0) (versionedDerivations // defaultDerivation);
in
extension

@@ -0,0 +1,56 @@
# Modules
Modules as they are used in `modules` exist primarily to check the shape and
content of CUDA redistributable and feature manifests. They are ultimately meant
to reduce the repetitive nature of repackaging CUDA redistributables.

Building most redistributables follows a pattern of a manifest indicating which
packages are available at a location, their versions, and their hashes. To avoid
creating builders for each and every derivation, modules serve as a way for us
to use a single `genericManifestBuilder` to build all redistributables.
## `generic`
The modules in `generic` are reusable components meant to check the shape and
content of NVIDIA's CUDA redistributable manifests, our feature manifests (which
are derived from NVIDIA's manifests), or hand-crafted Nix expressions describing
available packages. They are used by the `genericManifestBuilder` to build CUDA
redistributables.

Generally, each package which relies on manifests or Nix release expressions
will create an alias to the relevant generic module. For example, the [module
for CUDNN](./cudnn/default.nix) aliases the generic module for release
expressions, while the [module for CUDA redistributables](./cuda/default.nix)
aliases the generic module for manifests.

Alternatively, additional fields or values may need to be configured to account
for the particulars of a package. For example, while the release expressions for
[CUDNN](../cudnn/releases.nix) and [TensorRT](../tensorrt/releases.nix) are very
close, they differ slightly in the fields they have. The [module for
CUDNN](./cudnn/default.nix) is able to use the generic module for
release expressions, while the [module for
TensorRT](./tensorrt/default.nix) must add additional fields to the
generic module.
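
As a rough sketch of how these aliased options are consumed, the snippet below
evaluates the module tree together with one release file and reads back the
checked value, mirroring what the multiplex builder does with
`lib.modules.evalModules`. The relative paths are placeholders that assume the
expression sits next to the `modules` and `cudnn` directories, and `<nixpkgs>`
is used only to obtain `lib`.

```nix
let
  # Assumption: a nixpkgs checkout is available on NIX_PATH; only `lib` is needed.
  inherit (import <nixpkgs> { }) lib;

  evaluated = lib.modules.evalModules {
    modules = [
      # Placeholder path to the directory containing this README.
      ./modules
      # Placeholder path to a file which sets `cudnn.releases`.
      ./cudnn/releases.nix
    ];
  };
in
# Evaluating this value lets the module system check each release against
# the option types declared under `generic/releases`.
evaluated.config.cudnn.releases
```

A release that is missing a required field, or whose version string does not
match the expected pattern, should surface as a module system error when the
value is evaluated.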
### `manifests`
The modules in `generic/manifests` define the structure of NVIDIA's CUDA
redistributable manifests and our feature manifests.

NVIDIA's redistributable manifests are retrieved from their web server, while
the feature manifests are produced by
[`cuda-redist-find-features`](https://github.com/connorbaker/cuda-redist-find-features).
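
For orientation, here is roughly what one entry of a redistributable manifest
looks like once imported into Nix. It is assembled from the `example` values in
the option declarations under `generic/manifests/redistrib`, so the values come
from different examples and are illustrative only; a real manifest lists many
packages and platforms.

```nix
{
  # Top-level metadata; each of these options defaults to null.
  release_date = "2023-08-29";
  release_label = "12.2.2";
  release_product = "cuda";

  # Every other top-level attribute is a package name mapped to a release.
  cuda_cccl = {
    name = "CXX Core Compute Libraries";
    license = "CUDA Toolkit";
    license_path = "cuda_cccl/LICENSE.txt";
    version = "11.5.62";

    # Every other attribute of a release is a platform mapped to a package.
    linux-x86_64 = {
      relative_path = "cuda_cccl/linux-x86_64/cuda_cccl-linux-x86_64-11.5.62-archive.tar.xz";
      sha256 = "bbe633d6603d5a96a214dcb9f3f6f6fd2fa04d62e53694af97ae0c7afe0121b0";
      md5 = "e5deef4f6cb71f14aac5be5d5745dafe";
      size = "960968";
    };
  };
}
```

The matching feature manifest entry uses the same package and platform keys,
but each platform carries only an `outputs` attribute set of booleans (`bin`,
`dev`, `doc`, `lib`, `sample`, `static`).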
### `releases`
The modules in `generic/releases` define the structure of our hand-crafted Nix
expressions containing information necessary to download and repackage CUDA
redistributables. These expressions are created when NVIDIA-provided manifests
are unavailable or otherwise unusable. For example, though CUDNN has manifests,
a bug in NVIDIA's CI/CD causes manifests for different versions of CUDA to use
the same name, which leads to the manifests overwriting each other.
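
A hand-crafted release expression, by contrast, maps each platform to a list of
packages. The sketch below reuses the `example` value from the
`generic.releases` option; the `cudnn` prefix and the `linux-x86_64` key are
assumed here for illustration. Because the package type is freeform, extra
attributes (such as a download URL) may appear alongside the four checked
fields.

```nix
{
  cudnn.releases = {
    linux-x86_64 = [
      {
        version = "8.0.3.4"; # major.minor.patch.build
        minCudaVersion = "10.2"; # major.minor
        maxCudaVersion = "10.2"; # major.minor
        hash = "sha256-LxcXgwe1OCRfwDsEsNLIkeNsOcx3KuF5Sj+g2dY6WD0=";
      }
    ];
  };
}
```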
### `types`
The modules in `generic/types` define reusable types used in both
`generic/manifests` and `generic/releases`.
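
To illustrate what such a type accepts, the following sketch rebuilds the
`majorMinorVersion` pattern with `lib.types.strMatching` and probes it with the
`check` function that option types carry. It assumes `<nixpkgs>` is available
on `NIX_PATH` and is meant for experimentation, for example in `nix repl` or
with `nix-instantiate --eval --strict`.

```nix
let
  # Assumption: a nixpkgs checkout is available on NIX_PATH; only `lib` is needed.
  inherit (import <nixpkgs> { }) lib;

  # Same pattern as `generic.types.majorMinorVersion`.
  majorMinorVersion = lib.types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)$";
in
{
  # Two numeric components separated by a dot.
  accepted = majorMinorVersion.check "12.2";
  # A third component does not fit the pattern.
  rejected = majorMinorVersion.check "12.2.2";
  # Non-string values fail before the pattern is consulted.
  notAString = majorMinorVersion.check 12;
}
```

Under these assumptions, `accepted` should evaluate to `true` and the other two
attributes to `false`.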

@@ -0,0 +1,4 @@
{ options, ... }:
{
options.cuda.manifests = options.generic.manifests;
}

@@ -0,0 +1,12 @@
{ options, ... }:
{
options.cudnn.releases = options.generic.releases;
# TODO(@connorbaker): Figure out how to add additional options to the
# generic release.
# {
# url = options.mkOption {
# description = "URL to download the tarball from";
# type = types.str;
# };
# }
}

@@ -0,0 +1,4 @@
{ options, ... }:
{
options.cusparselt.manifests = options.generic.manifests;
}

@@ -0,0 +1,4 @@
{ options, ... }:
{
options.cutensor.manifests = options.generic.manifests;
}

@@ -0,0 +1,11 @@
{
imports = [
./generic
# Always after generic
./cuda
./cudnn
./cusparselt
./cutensor
./tensorrt
];
}

@@ -0,0 +1,7 @@
{
imports = [
./types
./manifests
./releases
];
}

@@ -0,0 +1,7 @@
{ lib, config, ... }:
{
options.generic.manifests = {
feature = import ./feature/manifest.nix { inherit lib config; };
redistrib = import ./redistrib/manifest.nix { inherit lib; };
};
}

@@ -0,0 +1,10 @@
{ lib, config, ... }:
let
inherit (lib) options trivial types;
Release = import ./release.nix { inherit lib config; };
in
options.mkOption {
description = "Feature manifest is an attribute set which includes a mapping from package name to release";
example = trivial.importJSON ../../../../cuda/manifests/feature_11.8.0.json;
type = types.attrsOf Release.type;
}

@@ -0,0 +1,60 @@
{ lib, ... }:
let
inherit (lib) options types;
in
# https://github.com/ConnorBaker/cuda-redist-find-features/blob/603407bea2fab47f2dfcd88431122a505af95b42/cuda_redist_find_features/manifest/feature/package/package.py
options.mkOption {
description = "Set of outputs that a package can provide";
example = {
bin = true;
dev = true;
doc = false;
lib = false;
sample = false;
static = false;
};
type = types.submodule {
options = {
bin = options.mkOption {
description = "`bin` output requires that we have a non-empty `bin` directory containing at least one file with the executable bit set";
type = types.bool;
};
dev = options.mkOption {
description = ''
A `dev` output requires that we have at least one of the following non-empty directories:
- `include`
- `lib/pkgconfig`
- `share/pkgconfig`
- `lib/cmake`
- `share/aclocal`
'';
type = types.bool;
};
doc = options.mkOption {
description = ''
A `doc` output requires that we have at least one of the following non-empty directories:
- `share/info`
- `share/doc`
- `share/gtk-doc`
- `share/devhelp`
- `share/man`
'';
type = types.bool;
};
lib = options.mkOption {
description = "`lib` output requires that we have a non-empty lib directory containing at least one shared library";
type = types.bool;
};
sample = options.mkOption {
description = "`sample` output requires that we have a non-empty `samples` directory";
type = types.bool;
};
static = options.mkOption {
description = "`static` output requires that we have a non-empty lib directory containing at least one static library";
type = types.bool;
};
};
};
}

@@ -0,0 +1,10 @@
{ lib, ... }:
let
inherit (lib) options types;
Outputs = import ./outputs.nix { inherit lib; };
in
options.mkOption {
description = "Package in the manifest";
example = (import ./release.nix { inherit lib; }).linux-x86_64;
type = types.submodule { options.outputs = Outputs; };
}

@@ -0,0 +1,10 @@
{ lib, config, ... }:
let
inherit (lib) options types;
Package = import ./package.nix { inherit lib config; };
in
options.mkOption {
description = "Release is an attribute set which includes a mapping from platform to package";
example = (import ./manifest.nix { inherit lib; }).cuda_cccl;
type = types.attrsOf Package.type;
}

@@ -0,0 +1,33 @@
{ lib, ... }:
let
inherit (lib) options trivial types;
Release = import ./release.nix { inherit lib; };
in
options.mkOption {
description = "Redistributable manifest is an attribute set which includes a mapping from package name to release";
example = trivial.importJSON ../../../../cuda/manifests/redistrib_11.8.0.json;
type = types.submodule {
# Allow any attribute name as these will be the package names
freeformType = types.attrsOf Release.type;
options = {
release_date = options.mkOption {
description = "Release date of the manifest";
type = types.nullOr types.str;
default = null;
example = "2023-08-29";
};
release_label = options.mkOption {
description = "Release label of the manifest";
type = types.nullOr types.str;
default = null;
example = "12.2.2";
};
release_product = options.mkOption {
example = "cuda";
description = "Release product of the manifest";
type = types.nullOr types.str;
default = null;
};
};
};
}

@@ -0,0 +1,32 @@
{ lib, ... }:
let
inherit (lib) options types;
in
options.mkOption {
description = "Package in the manifest";
example = (import ./release.nix { inherit lib; }).linux-x86_64;
type = types.submodule {
options = {
relative_path = options.mkOption {
description = "Relative path to the package";
example = "cuda_cccl/linux-x86_64/cuda_cccl-linux-x86_64-11.5.62-archive.tar.xz";
type = types.str;
};
sha256 = options.mkOption {
description = "Sha256 hash of the package";
example = "bbe633d6603d5a96a214dcb9f3f6f6fd2fa04d62e53694af97ae0c7afe0121b0";
type = types.str;
};
md5 = options.mkOption {
description = "Md5 hash of the package";
example = "e5deef4f6cb71f14aac5be5d5745dafe";
type = types.str;
};
size = options.mkOption {
description = "Size of the package as a string";
type = types.str;
example = "960968";
};
};
};
}

@@ -0,0 +1,36 @@
{ lib, ... }:
let
inherit (lib) options types;
Package = import ./package.nix { inherit lib; };
in
options.mkOption {
description = "Release is an attribute set which includes a mapping from platform to package";
example = (import ./manifest.nix { inherit lib; }).cuda_cccl;
type = types.submodule {
# Allow any attribute name as these will be the platform names
freeformType = types.attrsOf Package.type;
options = {
name = options.mkOption {
description = "Full name of the package";
example = "CXX Core Compute Libraries";
type = types.str;
};
license = options.mkOption {
description = "License of the package";
example = "CUDA Toolkit";
type = types.str;
};
license_path = options.mkOption {
description = "Path to the license of the package";
example = "cuda_cccl/LICENSE.txt";
default = null;
type = types.nullOr types.str;
};
version = options.mkOption {
description = "Version of the package";
example = "11.5.62";
type = types.str;
};
};
};
}

@@ -0,0 +1,45 @@
{ lib, config, ... }:
let
inherit (config.generic.types) majorMinorVersion majorMinorPatchBuildVersion;
inherit (lib) options types;
in
{
options.generic.releases = options.mkOption {
description = "Collection of packages targeting different platforms";
type =
let
Package = options.mkOption {
description = "Package for a specific platform";
example = {
version = "8.0.3.4";
minCudaVersion = "10.2";
maxCudaVersion = "10.2";
hash = "sha256-LxcXgwe1OCRfwDsEsNLIkeNsOcx3KuF5Sj+g2dY6WD0=";
};
type = types.submodule {
# TODO(@connorbaker): Figure out how to extend option sets.
freeformType = types.attrsOf types.anything;
options = {
version = options.mkOption {
description = "Version of the package";
type = majorMinorPatchBuildVersion;
};
minCudaVersion = options.mkOption {
description = "Minimum CUDA version supported";
type = majorMinorVersion;
};
maxCudaVersion = options.mkOption {
description = "Maximum CUDA version supported";
type = majorMinorVersion;
};
hash = options.mkOption {
description = "Hash of the tarball";
type = types.str;
};
};
};
};
in
types.attrsOf (types.listOf Package.type);
};
}

@@ -0,0 +1,39 @@
{ lib, ... }:
let
inherit (lib) options types;
in
{
options.generic.types = options.mkOption {
type = types.attrsOf types.optionType;
default = { };
description = "Set of generic types";
};
config.generic.types = {
cudaArch = types.strMatching "^sm_[[:digit:]]+[a-z]?$" // {
name = "cudaArch";
description = "CUDA architecture name";
};
# https://github.com/ConnorBaker/cuda-redist-find-features/blob/c841980e146f8664bbcd0ba1399e486b7910617b/cuda_redist_find_features/types/_lib_so_name.py
libSoName = types.strMatching ".*\\.so(\\.[[:digit:]]+)*$" // {
name = "libSoName";
description = "Name of a shared object file";
};
majorMinorVersion = types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)$" // {
name = "majorMinorVersion";
description = "Version number with a major and minor component";
};
majorMinorPatchVersion = types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)\\.([[:digit:]]+)$" // {
name = "majorMinorPatchVersion";
description = "Version number with a major, minor, and patch component";
};
majorMinorPatchBuildVersion =
types.strMatching "^([[:digit:]]+)\\.([[:digit:]]+)\\.([[:digit:]]+)\\.([[:digit:]]+)$"
// {
name = "majorMinorPatchBuildVersion";
description = "Version number with a major, minor, patch, and build component";
};
};
}

@@ -0,0 +1,16 @@
{ options, ... }:
{
options.tensorrt.releases = options.generic.releases;
# TODO(@connorbaker): Figure out how to add additional options to the
# generic release.
# {
# cudnnVersion = lib.options.mkOption {
# description = "CUDNN version supported";
# type = types.nullOr majorMinorVersion;
# };
# filename = lib.options.mkOption {
# description = "Tarball name";
# type = types.str;
# };
# }
}

@@ -0,0 +1,27 @@
# shellcheck shell=bash
# Patch all dynamically linked ELF files with the CUDA driver (libcuda.so)
# coming from the cuda_compat package by adding it to the RUNPATH.
echo "Sourcing auto-add-cuda-compat-runpath-hook"
addCudaCompatRunpath() {
local libPath
local origRpath
if [[ $# -eq 0 ]]; then
echo "addCudaCompatRunpath: no library path provided" >&2
exit 1
elif [[ $# -gt 1 ]]; then
echo "addCudaCompatRunpath: too many arguments" >&2
exit 1
elif [[ "$1" == "" ]]; then
echo "addCudaCompatRunpath: empty library path" >&2
exit 1
else
libPath="$1"
fi
origRpath="$(patchelf --print-rpath "$libPath")"
patchelf --set-rpath "@libcudaPath@:$origRpath" "$libPath"
}
postFixupHooks+=("autoFixElfFiles addCudaCompatRunpath")

@@ -0,0 +1,29 @@
# The autoAddCudaCompatRunpath hook must be added AFTER `setupCudaHook`. Both
# hooks prepend a path containing `libcuda.so` to the `DT_RUNPATH` entry of
# patched ELF files, but the `cuda_compat` path must appear first to take
# precedence (otherwise it has no effect), so this hook must be executed last.
{
autoFixElfFiles,
cuda_compat,
makeSetupHook,
}:
makeSetupHook {
name = "auto-add-cuda-compat-runpath-hook";
propagatedBuildInputs = [ autoFixElfFiles ];
substitutions = {
libcudaPath = "${cuda_compat}/compat";
};
meta =
let
# Handle `null`s in pre-`cuda_compat` releases,
# and `badPlatform`s for `!isJetsonBuild`.
platforms = cuda_compat.meta.platforms or [ ];
badPlatforms = cuda_compat.meta.badPlatforms or platforms;
in
{
inherit badPlatforms platforms;
};
} ./auto-add-cuda-compat-runpath.sh

@@ -0,0 +1,154 @@
# This is what nvcc uses as a backend,
# and it has to be an officially supported one (e.g. gcc14 for cuda12).
#
# It, however, propagates the current stdenv's libstdc++ to avoid "GLIBCXX_* not found" errors
# when linked with other C++ libraries.
# E.g. for cudaPackages_12_9 we use gcc14 with gcc's libstdc++
# Cf. https://github.com/NixOS/nixpkgs/pull/218265 for context
{
config,
_cuda,
cudaMajorMinorVersion,
lib,
pkgs,
stdenv,
stdenvAdapters,
}:
let
inherit (builtins) toJSON;
inherit (_cuda.db) allSortedCudaCapabilities cudaCapabilityToInfo nvccCompatibilities;
inherit (_cuda.lib)
_cudaCapabilityIsDefault
_cudaCapabilityIsSupported
_evaluateAssertions
getRedistSystem
mkVersionedName
;
inherit (lib) addErrorContext;
inherit (lib.customisation) extendDerivation;
inherit (lib.lists) filter intersectLists subtractLists;
# NOTE: By virtue of processing a sorted list (allSortedCudaCapabilities), our groups will be sorted.
architectureSpecificCudaCapabilities = filter (
cudaCapability: cudaCapabilityToInfo.${cudaCapability}.isArchitectureSpecific
) allSortedCudaCapabilities;
familySpecificCudaCapabilities = filter (
cudaCapability: cudaCapabilityToInfo.${cudaCapability}.isFamilySpecific
) allSortedCudaCapabilities;
jetsonCudaCapabilities = filter (
cudaCapability: cudaCapabilityToInfo.${cudaCapability}.isJetson
) allSortedCudaCapabilities;
passthruExtra = {
nvccHostCCMatchesStdenvCC = backendStdenv.cc == stdenv.cc;
# The Nix system of the host platform.
hostNixSystem = stdenv.hostPlatform.system;
# The Nix system of the host platform for the CUDA redistributable.
hostRedistSystem = getRedistSystem passthruExtra.hasJetsonCudaCapability stdenv.hostPlatform.system;
# Sets whether packages should be built with forward compatibility.
# TODO(@connorbaker): If the requested CUDA capabilities are not supported by the current CUDA version,
# should we throw an evaluation warning and build with forward compatibility?
cudaForwardCompat = config.cudaForwardCompat or true;
# CUDA capabilities which are supported by the current CUDA version.
supportedCudaCapabilities = filter (
cudaCapability:
_cudaCapabilityIsSupported cudaMajorMinorVersion cudaCapabilityToInfo.${cudaCapability}
) allSortedCudaCapabilities;
# Find the default set of capabilities for this CUDA version using the list of supported capabilities.
# Includes only baseline capabilities.
defaultCudaCapabilities = filter (
cudaCapability:
_cudaCapabilityIsDefault cudaMajorMinorVersion cudaCapabilityToInfo.${cudaCapability}
) passthruExtra.supportedCudaCapabilities;
# The resolved requested or default CUDA capabilities.
cudaCapabilities =
if config.cudaCapabilities or [ ] != [ ] then
config.cudaCapabilities
else
passthruExtra.defaultCudaCapabilities;
# Requested architecture-specific CUDA capabilities.
requestedArchitectureSpecificCudaCapabilities = intersectLists architectureSpecificCudaCapabilities passthruExtra.cudaCapabilities;
# Whether the requested CUDA capabilities include architecture-specific CUDA capabilities.
hasArchitectureSpecificCudaCapability =
passthruExtra.requestedArchitectureSpecificCudaCapabilities != [ ];
# Requested family-specific CUDA capabilities.
requestedFamilySpecificCudaCapabilities = intersectLists familySpecificCudaCapabilities passthruExtra.cudaCapabilities;
# Whether the requested CUDA capabilities include family-specific CUDA capabilities.
hasFamilySpecificCudaCapability = passthruExtra.requestedFamilySpecificCudaCapabilities != [ ];
# Requested Jetson CUDA capabilities.
requestedJetsonCudaCapabilities = intersectLists jetsonCudaCapabilities passthruExtra.cudaCapabilities;
# Whether the requested CUDA capabilities include Jetson CUDA capabilities.
hasJetsonCudaCapability = passthruExtra.requestedJetsonCudaCapabilities != [ ];
};
assertions =
let
# Jetson devices cannot be targeted by the same binaries which target non-Jetson devices. While
# NVIDIA provides both `linux-aarch64` and `linux-sbsa` packages, which both target `aarch64`,
# they are built with different settings and cannot be mixed.
jetsonMesssagePrefix = "Jetson CUDA capabilities (${toJSON passthruExtra.requestedJetsonCudaCapabilities})";
# Remove all known capabilities from the user's list to find unrecognized capabilities.
unrecognizedCudaCapabilities = subtractLists allSortedCudaCapabilities passthruExtra.cudaCapabilities;
# Remove all supported capabilities from the user's list to find unsupported capabilities.
unsupportedCudaCapabilities = subtractLists passthruExtra.supportedCudaCapabilities passthruExtra.cudaCapabilities;
in
[
{
message = "Unrecognized CUDA capabilities: ${toJSON unrecognizedCudaCapabilities}";
assertion = unrecognizedCudaCapabilities == [ ];
}
{
message = "Unsupported CUDA capabilities: ${toJSON unsupportedCudaCapabilities}";
assertion = unsupportedCudaCapabilities == [ ];
}
{
message =
"${jetsonMesssagePrefix} require hostPlatform (currently ${passthruExtra.hostNixSystem}) "
+ "to be aarch64-linux";
assertion = passthruExtra.hasJetsonCudaCapability -> passthruExtra.hostNixSystem == "aarch64-linux";
}
{
message =
let
# Find the requested capabilities which are neither Jetson, architecture-specific, nor family-specific.
requestedNonJetsonCudaCapabilities = subtractLists (
passthruExtra.requestedJetsonCudaCapabilities
++ passthruExtra.requestedArchitectureSpecificCudaCapabilities
++ passthruExtra.requestedFamilySpecificCudaCapabilities
) passthruExtra.cudaCapabilities;
in
"${jetsonMesssagePrefix} cannot be specified with non-Jetson capabilities "
+ "(${toJSON requestedNonJetsonCudaCapabilities})";
assertion =
passthruExtra.hasJetsonCudaCapability
-> passthruExtra.requestedJetsonCudaCapabilities == passthruExtra.cudaCapabilities;
}
];
assertCondition = addErrorContext "while evaluating ${mkVersionedName "cudaPackages" cudaMajorMinorVersion}.backendStdenv" (
_evaluateAssertions assertions
);
backendStdenv =
stdenvAdapters.useLibsFrom stdenv
pkgs."gcc${nvccCompatibilities.${cudaMajorMinorVersion}.gcc.maxMajorVersion}Stdenv";
in
# TODO: Consider testing whether we in fact use the newer libstdc++
extendDerivation assertCondition passthruExtra backendStdenv

@@ -0,0 +1,30 @@
From eeef96e91bd3453160315bf4618b7b91ae7240ba Mon Sep 17 00:00:00 2001
From: Connor Baker <ConnorBaker01@gmail.com>
Date: Sat, 18 Jan 2025 20:48:11 +0000
Subject: [PATCH 1/4] cmake: float out common python bindings option
---
CMakeLists.txt | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9739569..8944621 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -5,12 +5,11 @@ project(cudnn_frontend VERSION 1.9.0)
option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
option(CUDNN_FRONTEND_BUILD_TESTS "Defines if unittests are built or not." ON)
+option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
if(MSVC OR MSYS OR MINGW)
- option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
add_compile_options(/W4 /WX)
else()
- option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
add_compile_options(-Wall -Wextra -Wpedantic -Werror -Wno-error=attributes -Wno-attributes -Wno-error=unused-function -Wno-unused-function)
endif()
--
2.47.0

@@ -0,0 +1,84 @@
From da16ec51ea78f88f333ecf3df2a249fcc65ead24 Mon Sep 17 00:00:00 2001
From: Connor Baker <ConnorBaker01@gmail.com>
Date: Sat, 18 Jan 2025 22:01:03 +0000
Subject: [PATCH 2/4] cmake: add config so headers can be discovered when
installed
---
CMakeLists.txt | 39 +++++++++++++++++++++++++++++++---
cudnn_frontend-config.cmake.in | 3 +++
2 files changed, 39 insertions(+), 3 deletions(-)
create mode 100644 cudnn_frontend-config.cmake.in
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 8944621..9b1bfba 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.17)
+cmake_minimum_required(VERSION 3.23)
project(cudnn_frontend VERSION 1.9.0)
@@ -15,6 +15,15 @@ endif()
add_library(cudnn_frontend INTERFACE)
+# Add header files to library
+file(GLOB_RECURSE CUDNN_FRONTEND_INCLUDE_FILES "include/*")
+target_sources(
+ cudnn_frontend PUBLIC FILE_SET HEADERS
+ BASE_DIRS "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
+ FILES "${CUDNN_FRONTEND_INCLUDE_FILES}"
+)
+unset(CUDNN_FRONTEND_INCLUDE_FILES)
+
target_compile_definitions(
cudnn_frontend INTERFACE
$<$<BOOL:${CUDNN_FRONTEND_SKIP_JSON_LIB}>:CUDNN_FRONTEND_SKIP_JSON_LIB>
@@ -58,7 +67,31 @@ endif()
# * CMAKE_INSTALL_INCLUDEDIR
include(GNUInstallDirs)
+# See https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html#example-generating-package-files
+include(CMakePackageConfigHelpers)
+
+# Install and export the header files
+install(
+ TARGETS cudnn_frontend
+ EXPORT cudnn_frontend_targets FILE_SET HEADERS
+)
+export(
+ EXPORT cudnn_frontend_targets
+ FILE "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend/cudnn_frontend-targets.cmake"
+)
+install(
+ EXPORT cudnn_frontend_targets
+ FILE cudnn_frontend-targets.cmake
+ DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
+)
+
+# Install the CMake configuration file for header discovery
+configure_package_config_file(
+ cudnn_frontend-config.cmake.in
+ "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
+ INSTALL_DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
+)
install(
- DIRECTORY ${PROJECT_SOURCE_DIR}/include/
- DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+ FILES "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
+ DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
)
diff --git a/cudnn_frontend-config.cmake.in b/cudnn_frontend-config.cmake.in
new file mode 100644
index 0000000..8b2d843
--- /dev/null
+++ b/cudnn_frontend-config.cmake.in
@@ -0,0 +1,3 @@
+@PACKAGE_INIT@
+
+include(${CMAKE_CURRENT_LIST_DIR}/cudnn_frontend-targets.cmake)
--
2.47.0

@@ -0,0 +1,85 @@
From 53d5aaaad09b479cd8c0e148c9428baa33204024 Mon Sep 17 00:00:00 2001
From: Connor Baker <ConnorBaker01@gmail.com>
Date: Sat, 18 Jan 2025 22:10:41 +0000
Subject: [PATCH 3/4] cmake: install samples and tests when built
---
CMakeLists.txt | 12 +++++++++++-
samples/cpp/CMakeLists.txt | 2 ++
samples/legacy_samples/CMakeLists.txt | 2 ++
test/cpp/CMakeLists.txt | 2 ++
4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9b1bfba..f6af111 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -70,11 +70,21 @@ include(GNUInstallDirs)
# See https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html#example-generating-package-files
include(CMakePackageConfigHelpers)
-# Install and export the header files
+# Install the components
install(
TARGETS cudnn_frontend
EXPORT cudnn_frontend_targets FILE_SET HEADERS
)
+
+if (CUDNN_FRONTEND_BUILD_SAMPLES)
+ install(TARGETS legacy_samples samples RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+endif()
+
+if (CUDNN_FRONTEND_BUILD_TESTS)
+ install(TARGETS tests RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+endif()
+
+# Export the targets
export(
EXPORT cudnn_frontend_targets
FILE "${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend/cudnn_frontend-targets.cmake"
diff --git a/samples/cpp/CMakeLists.txt b/samples/cpp/CMakeLists.txt
index 9b8a5eb..01b09bb 100644
--- a/samples/cpp/CMakeLists.txt
+++ b/samples/cpp/CMakeLists.txt
@@ -69,8 +69,10 @@ target_link_libraries(
_cudnn_frontend_pch
CUDNN::cudnn
+ CUDA::cublasLt
CUDA::cudart
CUDA::cuda_driver # Needed as calls all CUDA calls will eventually move to driver
+ CUDA::nvrtc
)
# target cmake properties
diff --git a/samples/legacy_samples/CMakeLists.txt b/samples/legacy_samples/CMakeLists.txt
index 019f17c..3b56329 100644
--- a/samples/legacy_samples/CMakeLists.txt
+++ b/samples/legacy_samples/CMakeLists.txt
@@ -44,7 +44,9 @@ target_link_libraries(
_cudnn_frontend_pch
CUDNN::cudnn
+ CUDA::cublasLt
CUDA::cudart
+ CUDA::nvrtc
)
# target cmake properties
diff --git a/test/cpp/CMakeLists.txt b/test/cpp/CMakeLists.txt
index e244cd0..2750294 100644
--- a/test/cpp/CMakeLists.txt
+++ b/test/cpp/CMakeLists.txt
@@ -55,7 +55,9 @@ target_link_libraries(
CUDNN::cudnn
+ CUDA::cublasLt
CUDA::cudart
+ CUDA::nvrtc
)
# cuDNN dlopen's its libraries
--
2.47.0

@@ -0,0 +1,591 @@
From 4ce40a0c3de0e8a7065caf1cf59a90493e084682 Mon Sep 17 00:00:00 2001
From: Connor Baker <ConnorBaker01@gmail.com>
Date: Sat, 18 Jan 2025 22:22:21 +0000
Subject: [PATCH 4/4] samples: fix instances of maybe-uninitialized
---
samples/cpp/convolution/dgrads.cpp | 6 +++---
samples/cpp/convolution/fp8_fprop.cpp | 2 +-
samples/cpp/convolution/fprop.cpp | 10 +++++-----
samples/cpp/convolution/int8_fprop.cpp | 2 +-
samples/cpp/convolution/wgrads.cpp | 4 ++--
samples/cpp/matmul/fp8_matmul.cpp | 2 +-
samples/cpp/matmul/int8_matmul.cpp | 2 +-
samples/cpp/matmul/matmuls.cpp | 8 ++++----
samples/cpp/matmul/mixed_matmul.cpp | 2 +-
samples/cpp/misc/pointwise.cpp | 6 +++---
samples/cpp/misc/resample.cpp | 6 +++---
samples/cpp/misc/serialization.cpp | 4 ++--
samples/cpp/misc/slice.cpp | 2 +-
samples/cpp/misc/sm_carveout.cpp | 2 +-
samples/cpp/norm/batchnorm.cpp | 8 ++++----
samples/cpp/norm/layernorm.cpp | 8 ++++----
samples/cpp/norm/rmsnorm.cpp | 6 +++---
samples/cpp/sdpa/fp16_bwd.cpp | 2 +-
samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp | 2 +-
samples/cpp/sdpa/fp16_cached.cpp | 2 +-
samples/cpp/sdpa/fp16_fwd.cpp | 2 +-
samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp | 2 +-
samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp | 2 +-
samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp | 2 +-
samples/cpp/sdpa/fp8_bwd.cpp | 4 ++--
samples/cpp/sdpa/fp8_fwd.cpp | 2 +-
26 files changed, 50 insertions(+), 50 deletions(-)
diff --git a/samples/cpp/convolution/dgrads.cpp b/samples/cpp/convolution/dgrads.cpp
index 589cb5f..f66abf4 100644
--- a/samples/cpp/convolution/dgrads.cpp
+++ b/samples/cpp/convolution/dgrads.cpp
@@ -65,7 +65,7 @@ TEST_CASE("Convolution Dgrad", "[dgrad][graph]") {
Surface<half> w_tensor(64 * 32 * 3 * 3, false);
Surface<half> dx_tensor(4 * 32 * 16 * 16, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -122,7 +122,7 @@ TEST_CASE("Dgrad Drelu Graph", "[dgrad][graph]") {
Surface<half> x_tensor(4 * 32 * 16 * 16, false);
Surface<half> dx_tensor(4 * 32 * 16 * 16, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -234,7 +234,7 @@ TEST_CASE("Dgrad Drelu DBNweight Graph", "[dgrad][graph]") {
Surface<float> eq_scale_x_tensor(1 * 32 * 1 * 1, false);
Surface<float> eq_bias_tensor(1 * 32 * 1 * 1, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/convolution/fp8_fprop.cpp b/samples/cpp/convolution/fp8_fprop.cpp
index dfcb7e2..8246ce4 100644
--- a/samples/cpp/convolution/fp8_fprop.cpp
+++ b/samples/cpp/convolution/fp8_fprop.cpp
@@ -116,7 +116,7 @@ TEST_CASE("Convolution fp8 precision", "[conv][graph]") {
Surface<float> Y_scale_gpu(1, false);
Surface<float> amax_gpu(1, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/convolution/fprop.cpp b/samples/cpp/convolution/fprop.cpp
index bc1aaf0..d61fa4e 100644
--- a/samples/cpp/convolution/fprop.cpp
+++ b/samples/cpp/convolution/fprop.cpp
@@ -80,7 +80,7 @@ TEST_CASE("Convolution fprop", "[conv][graph][caching]") {
std::unordered_map<int64_t, void *> variant_pack = {
{X->get_uid(), x_tensor.devPtr}, {W->get_uid(), w_tensor.devPtr}, {Y->get_uid(), y_tensor.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -303,7 +303,7 @@ TEST_CASE("CSBR Graph", "[conv][graph][caching]") {
Surface<half> b_tensor(k, false);
Surface<half> y_tensor(n * k * h * w, false); // Should be p, q.
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -550,7 +550,7 @@ TEST_CASE("SBRCS", "[conv][genstats][graph]") {
{SUM, sum_tensor.devPtr},
{SQ_SUM, sq_sum_tensor.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -651,7 +651,7 @@ TEST_CASE("CBR Graph NCHW", "[conv][graph][caching]") {
Surface<half> y_tensor(n * k * h * w, false); // Should be p, q.
Surface<half> z_tensor(n * k * h * w, false); // Should be p, q.
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -734,7 +734,7 @@ TEST_CASE("Convolution fprop large", "[conv][graph][caching]") {
std::unordered_map<int64_t, void *> variant_pack = {
{X->get_uid(), x_tensor.devPtr}, {W->get_uid(), w_tensor.devPtr}, {Y->get_uid(), y_tensor.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/convolution/int8_fprop.cpp b/samples/cpp/convolution/int8_fprop.cpp
index 3d5ac2f..e9248f5 100644
--- a/samples/cpp/convolution/int8_fprop.cpp
+++ b/samples/cpp/convolution/int8_fprop.cpp
@@ -94,7 +94,7 @@ TEST_CASE("Conv with Int8 datatypes", "[conv][graph][caching]") {
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
{X, x_tensor.devPtr}, {W, w_tensor.devPtr}, {Y, y_tensor.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/convolution/wgrads.cpp b/samples/cpp/convolution/wgrads.cpp
index 2c58b26..26887dc 100644
--- a/samples/cpp/convolution/wgrads.cpp
+++ b/samples/cpp/convolution/wgrads.cpp
@@ -64,7 +64,7 @@ TEST_CASE("Convolution Wgrad", "[wgrad][graph][wgrad][Conv_wgrad]") {
Surface<half> dy_tensor(4 * 64 * 16 * 16, false);
Surface<half> dw_tensor(64 * 64 * 3 * 3, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -137,7 +137,7 @@ TEST_CASE("scale-bias-relu-wgrad Graph", "[wgrad][graph][scale-bias-relu-wgrad][
Surface<half> dy_tensor(4 * 64 * 16 * 16, false);
Surface<half> dw_tensor(64 * 64 * 3 * 3, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/matmul/fp8_matmul.cpp b/samples/cpp/matmul/fp8_matmul.cpp
index c6470cd..f32c627 100644
--- a/samples/cpp/matmul/fp8_matmul.cpp
+++ b/samples/cpp/matmul/fp8_matmul.cpp
@@ -115,7 +115,7 @@ TEST_CASE("Matmul fp8 precision", "[matmul][graph]") {
REQUIRE(graph.build_plans(handle, fe::BuildPlanPolicy_t::HEURISTICS_CHOICE).is_good());
Surface<float> C_gpu(b * m * n, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/matmul/int8_matmul.cpp b/samples/cpp/matmul/int8_matmul.cpp
index cf4353a..cb3ce34 100644
--- a/samples/cpp/matmul/int8_matmul.cpp
+++ b/samples/cpp/matmul/int8_matmul.cpp
@@ -104,7 +104,7 @@ TEST_CASE("Int8 Matmul", "[matmul][graph]") {
// note this is a bf16 tensor, but half is used just for memory allocation
Surface<float> C_gpu(b * m * n, false);
Surface<float> Bias_gpu(b * m * n, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/matmul/matmuls.cpp b/samples/cpp/matmul/matmuls.cpp
index ed0f10b..5c95713 100644
--- a/samples/cpp/matmul/matmuls.cpp
+++ b/samples/cpp/matmul/matmuls.cpp
@@ -250,7 +250,7 @@ TEST_CASE("Matmul", "[matmul][graph]") {
// Run cudnn graph
Surface<float> C_gpu(b * m * n, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -319,7 +319,7 @@ TEST_CASE("Abs + Matmul", "[matmul][graph]") {
// Run cudnn graph
Surface<float> C_gpu(b * m * n, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -539,7 +539,7 @@ TEST_CASE("Matmul SBR Graph", "[matmul][graph]") {
auto [graph, A, B, bias, scale, O] = lookup_cache_or_build_graph(
handle, x_tensor.devPtr, w_tensor.devPtr, s_tensor.devPtr, b_tensor.devPtr, y_tensor.devPtr);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -606,7 +606,7 @@ TEST_CASE("Matmul with restricted shared memory", "[matmul][graph]") {
// Run cudnn graph
Surface<float> C_gpu(b * m * n, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/matmul/mixed_matmul.cpp b/samples/cpp/matmul/mixed_matmul.cpp
index ab3e195..a2b05bd 100644
--- a/samples/cpp/matmul/mixed_matmul.cpp
+++ b/samples/cpp/matmul/mixed_matmul.cpp
@@ -96,7 +96,7 @@ TEST_CASE("Mixed Precision Matmul", "[matmul][graph]") {
//// Run cudnn graph
// note this is a bf16 tensor, but half is used just for memory allocation
Surface<half> C_gpu(b * m * n, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/misc/pointwise.cpp b/samples/cpp/misc/pointwise.cpp
index 8f8d699..e8f4cb1 100644
--- a/samples/cpp/misc/pointwise.cpp
+++ b/samples/cpp/misc/pointwise.cpp
@@ -51,7 +51,7 @@ TEST_CASE("Reduction", "[reduction]") {
Surface<float> C_gpu(n * n * n * n, false);
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{A, A_gpu.devPtr},
{C, C_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -88,7 +88,7 @@ TEST_CASE("Fused scalar", "[scalar][graph]") {
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{A, A_gpu.devPtr},
{C, C_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -148,7 +148,7 @@ TEST_CASE("Fused Amax Reduction and type conversion", "[reduction]") {
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
{A, A_gpu.devPtr}, {scale, scale_gpu.devPtr}, {amax, amax_gpu.devPtr}, {C, C_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/misc/resample.cpp b/samples/cpp/misc/resample.cpp
index 3f782e7..21998c3 100644
--- a/samples/cpp/misc/resample.cpp
+++ b/samples/cpp/misc/resample.cpp
@@ -69,7 +69,7 @@ TEST_CASE("Resample Max Pooling NHWC Inference", "[resample][pooling][max][graph
Surface<half> Y_gpu(N * H * W * C, false);
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{X, X_gpu.devPtr},
{Y, Y_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -132,7 +132,7 @@ TEST_CASE("Resample Max Pooling NHWC Training", "[resample][pooling][max][graph]
Surface<int8_t> Index_gpu(N * H * W * C / 8, false);
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
{X, X_gpu.devPtr}, {Y, Y_gpu.devPtr}, {Index, Index_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -186,7 +186,7 @@ TEST_CASE("Resample Avg Pooling", "[resample][pooling][average][graph]") {
Surface<half> Y_gpu(N * H * W * C, false);
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {{X, X_gpu.devPtr},
{Y, Y_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/misc/serialization.cpp b/samples/cpp/misc/serialization.cpp
index a130406..278bad8 100644
--- a/samples/cpp/misc/serialization.cpp
+++ b/samples/cpp/misc/serialization.cpp
@@ -178,7 +178,7 @@ TEST_CASE("CSBR Graph with serialization", "[conv][graph][serialization]") {
Surface<half> b_device_memory(k, false);
Surface<half> y_device_memory(n * k * h * w, false); // Should be p, q.
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -401,7 +401,7 @@ TEST_CASE("SDPA Graph with serialization", "[sdpa][graph][serialization]") {
Surface<int32_t> dropoutSeed(scaleSize, false, seed_value);
Surface<int32_t> dropoutOffset(scaleSize, false, (int32_t)1);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/misc/slice.cpp b/samples/cpp/misc/slice.cpp
index 087ba36..78962c6 100644
--- a/samples/cpp/misc/slice.cpp
+++ b/samples/cpp/misc/slice.cpp
@@ -80,7 +80,7 @@ TEST_CASE("Slice gemm", "[slice][gemm][graph][fusion]") {
Surface<half> C_gpu(B * M * N, false);
std::unordered_map<int64_t, void *> variant_pack = {
{a_uid, A_gpu.devPtr}, {b_uid, B_gpu.devPtr}, {c_uid, C_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/misc/sm_carveout.cpp b/samples/cpp/misc/sm_carveout.cpp
index d6818c0..b0e0651 100644
--- a/samples/cpp/misc/sm_carveout.cpp
+++ b/samples/cpp/misc/sm_carveout.cpp
@@ -121,7 +121,7 @@ TEST_CASE("SGBN with SM carveout", "[batchnorm][graph][sm_carveout]") {
Surface<float> Peer_stats_0_tensor(2 * 4 * c, false, true);
Surface<float> Peer_stats_1_tensor(2 * 4 * c, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/norm/batchnorm.cpp b/samples/cpp/norm/batchnorm.cpp
index 5949365..a91a9bd 100644
--- a/samples/cpp/norm/batchnorm.cpp
+++ b/samples/cpp/norm/batchnorm.cpp
@@ -96,7 +96,7 @@ TEST_CASE("BN Finalize Graph", "[batchnorm][graph]") {
Surface<float> eq_scale_tensor(32, false);
Surface<float> eq_bias_tensor(32, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -226,7 +226,7 @@ TEST_CASE("SGBN Add Relu Graph", "[batchnorm][graph]") {
Surface<float> Peer_stats_0_tensor(2 * 4 * 32, false, true);
Surface<float> Peer_stats_1_tensor(2 * 4 * 32, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -346,7 +346,7 @@ TEST_CASE("DBN Add Relu Graph", "[BN][graph][backward]") {
Surface<float> Peer_stats_0_tensor(2 * 4 * 32, false, true);
Surface<float> Peer_stats_1_tensor(2 * 4 * 32, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -454,7 +454,7 @@ TEST_CASE("BN_inference DRelu DBN Graph", "[Batchnorm][graph][backward]") {
Surface<float> Dbias_tensor(32, false);
Surface<half> DX_tensor(4 * 32 * 16 * 16, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/norm/layernorm.cpp b/samples/cpp/norm/layernorm.cpp
index bac996f..7f69f34 100644
--- a/samples/cpp/norm/layernorm.cpp
+++ b/samples/cpp/norm/layernorm.cpp
@@ -133,7 +133,7 @@ layernorm_fwd_dynamic_shapes(bool train = true) {
Surface<float> Mean_tensor(max_stats_volume, false);
Surface<float> Var_tensor(max_stats_volume, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -232,7 +232,7 @@ TEST_CASE("LayerNorm Training", "[layernorm][graph]") {
Surface<float> Bias_tensor(hidden_size, false);
Surface<half> Y_tensor(batch_size * seq_length * hidden_size, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -310,7 +310,7 @@ TEST_CASE("LayerNorm Inference", "[layernorm][graph]") {
Surface<float> Bias_tensor(hidden_size, false);
Surface<half> Y_tensor(batch_size * seq_length * hidden_size, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -392,7 +392,7 @@ TEST_CASE("LayerNorm Backward", "[layernorm][graph]") {
Surface<float> Dbias_tensor(hidden_size, false);
Surface<half> DX_tensor(batch_size * seq_length * hidden_size, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/norm/rmsnorm.cpp b/samples/cpp/norm/rmsnorm.cpp
index 878086c..d5c919b 100644
--- a/samples/cpp/norm/rmsnorm.cpp
+++ b/samples/cpp/norm/rmsnorm.cpp
@@ -78,7 +78,7 @@ TEST_CASE("RmsNorm Training", "[rmsnorm][graph]") {
Surface<float> Scale_tensor(hidden_size, false);
Surface<float> Y_tensor(batch_size * seq_length * hidden_size, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -150,7 +150,7 @@ TEST_CASE("RmsNorm Inference", "[rmsnorm][graph]") {
Surface<float> Bias_tensor(hidden_size, false);
Surface<float> Y_tensor(batch_size * seq_length * hidden_size, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -227,7 +227,7 @@ TEST_CASE("RmsNorm Backward", "[rmsnorm][graph]") {
Surface<float> Dbias_tensor(hidden_size, false);
Surface<float> DX_tensor(batch_size * seq_length * hidden_size, false);
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_bwd.cpp b/samples/cpp/sdpa/fp16_bwd.cpp
index 749cbed..1145008 100644
--- a/samples/cpp/sdpa/fp16_bwd.cpp
+++ b/samples/cpp/sdpa/fp16_bwd.cpp
@@ -275,7 +275,7 @@ TEST_CASE("Toy sdpa backward", "[graph][sdpa][flash][backward]") {
}
// Allocate workspace
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp b/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp
index 62d6bb3..50205c3 100644
--- a/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp
+++ b/samples/cpp/sdpa/fp16_bwd_with_flexible_graphs.cpp
@@ -195,7 +195,7 @@ TEST_CASE("Toy sdpa backward with flexible graph", "[graph][sdpa][flash][backwar
{DV_UID, dV_tensor.devPtr}};
// Allocate workspace
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_cached.cpp b/samples/cpp/sdpa/fp16_cached.cpp
index d046271..4f0d3f8 100644
--- a/samples/cpp/sdpa/fp16_cached.cpp
+++ b/samples/cpp/sdpa/fp16_cached.cpp
@@ -146,7 +146,7 @@ TEST_CASE("Cached sdpa", "[graph][sdpa][flash]") {
{O_UID, o_tensor.devPtr},
{STATS_UID, stats_tensor.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(fwd_graph2->get_workspace_size(workspace_size).is_good());
Surface<int8_t> fwd_workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_fwd.cpp b/samples/cpp/sdpa/fp16_fwd.cpp
index b3acf5e..63697a1 100644
--- a/samples/cpp/sdpa/fp16_fwd.cpp
+++ b/samples/cpp/sdpa/fp16_fwd.cpp
@@ -210,7 +210,7 @@ TEST_CASE("Toy sdpa forward", "[graph][sdpa][flash][forward]") {
variant_pack[STATS_UID] = statsTensor.devPtr;
}
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp b/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp
index 36cfba4..0cb9d2f 100644
--- a/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp
+++ b/samples/cpp/sdpa/fp16_fwd_with_custom_dropout.cpp
@@ -178,7 +178,7 @@ TEST_CASE("Toy sdpa forward with dropout", "[graph][sdpa][flash][forward]") {
variant_pack[STATS_UID] = statsTensor.devPtr;
}
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp b/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp
index 810de63..7d81afe 100644
--- a/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp
+++ b/samples/cpp/sdpa/fp16_fwd_with_flexible_graphs.cpp
@@ -186,7 +186,7 @@ TEST_CASE("Toy sdpa forward with flexible graph", "[graph][sdpa][flash][forward]
variant_pack[STATS_UID] = statsTensor.devPtr;
}
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp b/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp
index 18dd937..d195f6b 100644
--- a/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp
+++ b/samples/cpp/sdpa/fp16_fwd_with_paged_caches.cpp
@@ -268,7 +268,7 @@ TEST_CASE("Toy sdpa forward with paged caches", "[graph][sdpa][flash][paged][for
variant_pack[STATS_UID] = statsTensor.devPtr;
}
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(graph->get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp8_bwd.cpp b/samples/cpp/sdpa/fp8_bwd.cpp
index 82e542b..296f2f9 100644
--- a/samples/cpp/sdpa/fp8_bwd.cpp
+++ b/samples/cpp/sdpa/fp8_bwd.cpp
@@ -214,7 +214,7 @@ TEST_CASE("sdpa_fp8_bprop", "[graph][sdpa][fp8][backward]") {
{Amax_dV, AMax_dV_Tensor.devPtr},
{Amax_dP, AMax_dP_Tensor.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(mha_graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
@@ -385,7 +385,7 @@ TEST_CASE("sdpa_fp8_gqa_bprop", "[graph][sdpa][fp8][backward]") {
{amax_dV, amax_dV_gpu.devPtr},
{amax_dP, amax_dP_gpu.devPtr}};
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(mha_graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
diff --git a/samples/cpp/sdpa/fp8_fwd.cpp b/samples/cpp/sdpa/fp8_fwd.cpp
index 6ede98d..23abc3f 100644
--- a/samples/cpp/sdpa/fp8_fwd.cpp
+++ b/samples/cpp/sdpa/fp8_fwd.cpp
@@ -146,7 +146,7 @@ TEST_CASE("sdpa_fp8_fprop", "[graph][sdpa][fp8][forward]") {
variant_pack[Stats] = stats_tensor.devPtr;
}
- int64_t workspace_size;
+ int64_t workspace_size = 0;
REQUIRE(mha_graph.get_workspace_size(workspace_size).is_good());
Surface<int8_t> workspace(workspace_size, false);
--
2.47.0

@@ -0,0 +1,133 @@
cmake_minimum_required(VERSION 3.23)
project(cudnn_frontend VERSION 1.8.0)
option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
option(CUDNN_FRONTEND_BUILD_TESTS "Defines if unittests are built or not." ON)
option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
if(MSVC OR MSYS OR MINGW)
add_compile_options(/W4 /WX)
else()
add_compile_options(-Wall -Wextra -Wpedantic -Werror -Wno-error=attributes -Wno-attributes -Wno-error=unused-function -Wno-unused-function)
endif()
add_library(cudnn_frontend INTERFACE)
# Add header files to library
file(GLOB_RECURSE CUDNN_FRONTEND_INCLUDE_FILES "include/*")
target_sources(
cudnn_frontend
PUBLIC
FILE_SET
HEADERS
BASE_DIRS
"$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
FILES
"${CUDNN_FRONTEND_INCLUDE_FILES}"
)
unset(CUDNN_FRONTEND_INCLUDE_FILES)
target_compile_definitions(cudnn_frontend INTERFACE $<$<BOOL:${CUDNN_FRONTEND_SKIP_JSON_LIB}>:CUDNN_FRONTEND_SKIP_JSON_LIB>)
target_include_directories(
cudnn_frontend
INTERFACE
"$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
"$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>"
)
# Find the cuda compiler
find_package(CUDAToolkit REQUIRED)
target_include_directories(cudnn_frontend INTERFACE ${CUDAToolkit_INCLUDE_DIRS})
target_compile_features(cudnn_frontend INTERFACE cxx_std_17)
# Make PCH for targets to link against
add_library(_cudnn_frontend_pch INTERFACE)
target_precompile_headers(_cudnn_frontend_pch INTERFACE ${PROJECT_SOURCE_DIR}/include/cudnn_frontend.h)
if (CUDNN_FRONTEND_BUILD_SAMPLES)
add_subdirectory(samples)
target_link_libraries(
samples
PRIVATE
CUDA::cublasLt
CUDA::nvrtc
)
target_link_libraries(
legacy_samples
PRIVATE
CUDA::cublasLt
CUDA::nvrtc
)
endif()
if (CUDNN_FRONTEND_BUILD_TESTS)
add_subdirectory(test)
target_link_libraries(
tests
CUDA::cublasLt
CUDA::nvrtc
)
endif()
if (CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS)
add_subdirectory(python)
endif()
# Introduce variables:
# * CMAKE_INSTALL_LIBDIR
# * CMAKE_INSTALL_BINDIR
# * CMAKE_INSTALL_INCLUDEDIR
include(GNUInstallDirs)
# Install and export the header files
install(
TARGETS
cudnn_frontend
EXPORT
cudnn_frontend_targets
FILE_SET HEADERS
)
if (CUDNN_FRONTEND_BUILD_SAMPLES)
install(TARGETS legacy_samples samples RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
endif()
if (CUDNN_FRONTEND_BUILD_TESTS)
install(TARGETS tests RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
endif()
# See https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html#example-generating-package-files
include(CMakePackageConfigHelpers)
export(
EXPORT
cudnn_frontend_targets
FILE
"${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend/cudnn_frontend-targets.cmake"
)
install(
EXPORT
cudnn_frontend_targets
FILE
cudnn_frontend-targets.cmake
DESTINATION
"${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
)
configure_package_config_file(
cudnn_frontend-config.cmake.in
"${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
INSTALL_DESTINATION
"${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
)
install(
FILES
"${CMAKE_CURRENT_BINARY_DIR}/cudnn_frontend-config.cmake"
DESTINATION
"${CMAKE_INSTALL_LIBDIR}/cmake/cudnn_frontend"
)

@@ -0,0 +1,3 @@
@PACKAGE_INIT@
include(${CMAKE_CURRENT_LIST_DIR}/cudnn_frontend-targets.cmake)

@@ -0,0 +1,132 @@
{
autoAddDriverRunpath,
catch2_3,
cmake,
fetchFromGitHub,
gitUpdater,
lib,
ninja,
nlohmann_json,
stdenv,
cuda_cccl ? null,
cuda_cudart ? null,
cuda_nvcc ? null,
cuda_nvrtc ? null,
cudnn ? null,
libcublas ? null,
}:
let
inherit (lib.lists) optionals;
inherit (lib.strings)
cmakeBool
cmakeFeature
optionalString
;
in
# TODO(@connorbaker): This should be a hybrid C++/Python package.
stdenv.mkDerivation (finalAttrs: {
pname = "cudnn-frontend";
version = "1.9.0";
src = fetchFromGitHub {
owner = "NVIDIA";
repo = "cudnn-frontend";
tag = "v${finalAttrs.version}";
hash = "sha256-Vc5jqB1XHcJEdKG0nxbWLewW2fDezRVwjUSzPDubSGE=";
};
patches = [
# https://github.com/NVIDIA/cudnn-frontend/pull/125
./0001-cmake-float-out-common-python-bindings-option.patch
./0002-cmake-add-config-so-headers-can-be-discovered-when-i.patch
./0003-cmake-install-samples-and-tests-when-built.patch
./0004-samples-fix-instances-of-maybe-uninitialized.patch
];
# nlohmann_json should be the only vendored dependency.
postPatch = ''
echo "patching source to use nlohmann_json from nixpkgs"
rm -rf include/cudnn_frontend/thirdparty/nlohmann
rmdir include/cudnn_frontend/thirdparty
substituteInPlace include/cudnn_frontend_utils.h \
--replace-fail \
'#include "cudnn_frontend/thirdparty/nlohmann/json.hpp"' \
'#include <nlohmann/json.hpp>'
'';
# TODO: As a header-only library, we should make sure we have an `include` directory or similar which is not a
# superset of the `out` (`bin`) or `dev` outputs (which is what the multiple-outputs setup hook does by default).
outputs = [
"out"
]
++ optionals finalAttrs.doCheck [
"legacy_samples"
"samples"
"tests"
];
nativeBuildInputs = [
autoAddDriverRunpath # Needed for the samples because they link against CUDA::cuda_driver
cmake
cuda_nvcc
ninja
];
buildInputs = [
cuda_cccl
cuda_cudart
];
cmakeFlags = [
(cmakeBool "FETCHCONTENT_FULLY_DISCONNECTED" true)
(cmakeFeature "FETCHCONTENT_TRY_FIND_PACKAGE_MODE" "ALWAYS")
(cmakeBool "CUDNN_FRONTEND_BUILD_SAMPLES" finalAttrs.doCheck)
(cmakeBool "CUDNN_FRONTEND_BUILD_TESTS" finalAttrs.doCheck)
(cmakeBool "CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS" false)
];
checkInputs = [
cudnn
cuda_nvrtc
catch2_3
libcublas
];
enableParallelBuilding = true;
propagatedBuildInputs = [
nlohmann_json
];
doCheck = true;
postInstall = optionalString finalAttrs.doCheck ''
moveToOutput "bin/legacy_samples" "$legacy_samples"
moveToOutput "bin/samples" "$samples"
moveToOutput "bin/tests" "$tests"
if [[ -e "$out/bin" ]]
then
nixErrorLog "The bin directory in \$out should no longer exist."
exit 1
fi
'';
passthru.updateScript = gitUpdater {
inherit (finalAttrs) pname version;
rev-prefix = "v";
};
meta = {
description = "C++ wrapper for the cuDNN backend API";
homepage = "https://github.com/NVIDIA/cudnn-frontend";
license = lib.licenses.mit;
badPlatforms = optionals (cudnn == null) finalAttrs.meta.platforms;
platforms = [
"aarch64-linux"
"x86_64-linux"
];
maintainers = with lib.maintainers; [ connorbaker ];
teams = [ lib.teams.cuda ];
};
})

@@ -0,0 +1,25 @@
# shellcheck shell=bash
(( ${hostOffset:?} == -1 && ${targetOffset:?} == 0)) || return 0
echo "Sourcing mark-for-cudatoolkit-root-hook" >&2
markForCUDAToolkit_ROOT() {
mkdir -p "${prefix:?}/nix-support"
local markerPath="$prefix/nix-support/include-in-cudatoolkit-root"
# Return early if the file already exists.
[[ -f "$markerPath" ]] && return 0
# Always create the file, even if it's empty, since setup-cuda-hook relies on its existence.
# However, only populate it if strictDeps is not set.
touch "$markerPath"
# Return early if strictDeps is set.
[[ -n "${strictDeps-}" ]] && return 0
# Populate the file with the package name and output.
echo "${pname:?}-${output:?}" > "$markerPath"
}
fixupOutputHooks+=(markForCUDAToolkit_ROOT)

@@ -0,0 +1,4 @@
# Internal hook, used by cudatoolkit and cuda redist packages
# to accommodate automatic CUDAToolkit_ROOT construction
{ makeSetupHook }:
makeSetupHook { name = "mark-for-cudatoolkit-root-hook"; } ./mark-for-cudatoolkit-root-hook.sh

@@ -0,0 +1,83 @@
# NOTE: Though NCCL tests is called within the cudaPackages package set, we avoid passing in
# the names of dependencies from that package set directly to avoid evaluation errors
# in the case redistributable packages are not available.
{
config,
cudaPackages,
fetchFromGitHub,
gitUpdater,
lib,
mpi,
mpiSupport ? false,
which,
}:
let
inherit (cudaPackages)
backendStdenv
cuda_cccl
cuda_cudart
cuda_nvcc
cudaAtLeast
nccl
;
in
backendStdenv.mkDerivation (finalAttrs: {
pname = "nccl-tests";
version = "2.15.0";
src = fetchFromGitHub {
owner = "NVIDIA";
repo = "nccl-tests";
rev = "v${finalAttrs.version}";
hash = "sha256-OgffbW9Vx/sm1I1tpaPGdAhIpV4jbB4hJa9UcEAWkdE=";
};
postPatch = ''
# fix build failure with GCC14
substituteInPlace src/Makefile --replace-fail "-std=c++11" "-std=c++14"
'';
strictDeps = true;
nativeBuildInputs = [
which
cuda_nvcc
];
buildInputs = [
nccl
cuda_nvcc # crt/host_config.h
cuda_cudart
cuda_cccl # <nv/target>
]
++ lib.optionals mpiSupport [ mpi ];
makeFlags = [
"NCCL_HOME=${nccl}"
"CUDA_HOME=${cuda_nvcc}"
]
++ lib.optionals mpiSupport [ "MPI=1" ];
enableParallelBuilding = true;
installPhase = ''
mkdir -p $out/bin
cp -r build/* $out/bin/
'';
passthru.updateScript = gitUpdater {
inherit (finalAttrs) pname version;
rev-prefix = "v";
};
meta = with lib; {
description = "Tests to check both the performance and the correctness of NVIDIA NCCL operations";
homepage = "https://github.com/NVIDIA/nccl-tests";
platforms = platforms.linux;
license = licenses.bsd3;
broken = !config.cudaSupport || (mpiSupport && mpi == null);
maintainers = with maintainers; [ jmillerpdt ];
teams = [ teams.cuda ];
};
})

@@ -0,0 +1,98 @@
# NOTE: Though NCCL is called within the cudaPackages package set, we avoid passing in
# the names of dependencies from that package set directly to avoid evaluation errors
# in the case redistributable packages are not available.
{
lib,
fetchFromGitHub,
python3,
which,
autoAddDriverRunpath,
cudaPackages,
# passthru.updateScript
gitUpdater,
}:
let
inherit (cudaPackages)
backendStdenv
cuda_cccl
cuda_cudart
cuda_nvcc
cudaAtLeast
flags
;
version = "2.27.6-1";
hash = "sha256-/BiLSZaBbVIqOfd8nQlgUJub0YR3SR4B93x2vZpkeiU=";
in
backendStdenv.mkDerivation (finalAttrs: {
pname = "nccl";
version = version;
src = fetchFromGitHub {
owner = "NVIDIA";
repo = "nccl";
rev = "v${finalAttrs.version}";
hash = hash;
};
__structuredAttrs = true;
strictDeps = true;
outputs = [
"out"
"dev"
];
nativeBuildInputs = [
which
autoAddDriverRunpath
python3
cuda_nvcc
];
buildInputs = [
cuda_nvcc # crt/host_config.h
cuda_cudart
cuda_cccl
];
env.NIX_CFLAGS_COMPILE = toString [ "-Wno-unused-function" ];
postPatch = ''
patchShebangs ./src/device/generate.py
patchShebangs ./src/device/symmetric/generate.py
'';
makeFlags = [
"PREFIX=$(out)"
"NVCC_GENCODE=${flags.gencodeString}"
"CUDA_HOME=${cuda_nvcc}"
"CUDA_LIB=${lib.getLib cuda_cudart}/lib"
"CUDA_INC=${lib.getDev cuda_cudart}/include"
];
enableParallelBuilding = true;
postFixup = ''
moveToOutput lib/libnccl_static.a $dev
'';
passthru.updateScript = gitUpdater {
inherit (finalAttrs) pname version;
rev-prefix = "v";
};
meta = with lib; {
description = "Multi-GPU and multi-node collective communication primitives for NVIDIA GPUs";
homepage = "https://developer.nvidia.com/nccl";
license = licenses.bsd3;
platforms = platforms.linux;
# NCCL is not supported on Jetson, because it does not use NVLink or PCI-e for inter-GPU communication.
# https://forums.developer.nvidia.com/t/can-jetson-orin-support-nccl/232845/9
badPlatforms = lib.optionals flags.isJetsonBuild [ "aarch64-linux" ];
maintainers = with maintainers; [
mdaiter
orivej
];
teams = [ teams.cuda ];
};
})

@@ -0,0 +1,63 @@
{
autoAddDriverRunpath,
cmake,
cudaPackages,
lib,
saxpy,
}:
let
inherit (cudaPackages)
backendStdenv
cuda_cccl
cuda_cudart
cuda_nvcc
cudaAtLeast
flags
libcublas
;
inherit (lib) getDev getLib getOutput;
in
backendStdenv.mkDerivation {
pname = "saxpy";
version = "unstable-2023-07-11";
src = ./src;
__structuredAttrs = true;
strictDeps = true;
nativeBuildInputs = [
cmake
autoAddDriverRunpath
cuda_nvcc
];
buildInputs = [
(getDev libcublas)
(getLib libcublas)
(getOutput "static" libcublas)
cuda_cudart
cuda_cccl
];
cmakeFlags = [
(lib.cmakeBool "CMAKE_VERBOSE_MAKEFILE" true)
(lib.cmakeFeature "CMAKE_CUDA_ARCHITECTURES" flags.cmakeCudaArchitecturesString)
];
passthru.gpuCheck = saxpy.overrideAttrs (_: {
requiredSystemFeatures = [ "cuda" ];
doInstallCheck = true;
postInstallCheck = ''
$out/bin/${saxpy.meta.mainProgram or (lib.getName saxpy)}
'';
});
meta = {
description = "Simple (Single-precision AX Plus Y) FindCUDAToolkit.cmake example for testing cross-compilation";
license = lib.licenses.mit;
teams = [ lib.teams.cuda ];
mainProgram = "saxpy";
platforms = lib.platforms.unix;
};
}

@@ -0,0 +1,12 @@
cmake_minimum_required(VERSION 3.25)
project(saxpy LANGUAGES CXX CUDA)
find_package(CUDAToolkit REQUIRED COMPONENTS cudart cublas)
add_executable(saxpy saxpy.cu)
target_link_libraries(saxpy PUBLIC CUDA::cublas CUDA::cudart m)
target_compile_features(saxpy PRIVATE cxx_std_14)
target_compile_options(saxpy PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:
--expt-relaxed-constexpr>)
install(TARGETS saxpy)

@@ -0,0 +1,68 @@
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <stdio.h>
#include <cstdlib> // std::exit, EXIT_FAILURE
static inline void check(cudaError_t err, const char *context) {
if (err != cudaSuccess) {
fprintf(stderr, "CUDA error at %s: %s\n", context, cudaGetErrorString(err));
std::exit(EXIT_FAILURE);
}
}
#define CHECK(x) check(x, #x)
__global__ void saxpy(int n, float a, float *x, float *y) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n)
y[i] = a * x[i] + y[i];
}
int main(void) {
setbuf(stderr, NULL);
fprintf(stderr, "Start\n");
int rtVersion, driverVersion;
CHECK(cudaRuntimeGetVersion(&rtVersion));
CHECK(cudaDriverGetVersion(&driverVersion));
fprintf(stderr, "Runtime version: %d\n", rtVersion);
fprintf(stderr, "Driver version: %d\n", driverVersion);
constexpr int N = 1 << 10;
std::vector<float> xHost(N), yHost(N);
for (int i = 0; i < N; i++) {
xHost[i] = 1.0f;
yHost[i] = 2.0f;
}
fprintf(stderr, "Host memory initialized, copying to the device\n");
fflush(stderr);
float *xDevice, *yDevice;
CHECK(cudaMalloc(&xDevice, N * sizeof(float)));
CHECK(cudaMalloc(&yDevice, N * sizeof(float)));
CHECK(cudaMemcpy(xDevice, xHost.data(), N * sizeof(float),
cudaMemcpyHostToDevice));
CHECK(cudaMemcpy(yDevice, yHost.data(), N * sizeof(float),
cudaMemcpyHostToDevice));
fprintf(stderr, "Scheduled a cudaMemcpy, calling the kernel\n");
saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, xDevice, yDevice);
fprintf(stderr, "Scheduled a kernel call\n");
CHECK(cudaGetLastError());
CHECK(cudaMemcpy(yHost.data(), yDevice, N * sizeof(float),
cudaMemcpyDeviceToHost));
float maxError = 0.0f;
for (int i = 0; i < N; i++)
maxError = max(maxError, abs(yHost[i] - 4.0f));
fprintf(stderr, "Max error: %f\n", maxError);
CHECK(cudaFree(xDevice));
CHECK(cudaFree(yDevice));
}

@@ -0,0 +1,14 @@
# Currently propagated by cuda_nvcc or cudatoolkit, rather than used directly
{ makeSetupHook, backendStdenv }:
makeSetupHook {
name = "setup-cuda-hook";
substitutions.setupCudaHook = placeholder "out";
# Point NVCC at a compatible compiler
substitutions.ccRoot = "${backendStdenv.cc}";
# Required in addition to ccRoot as otherwise bin/gcc is looked up
# when building CMakeCUDACompilerId.cu
substitutions.ccFullPath = "${backendStdenv.cc}/bin/${backendStdenv.cc.targetPrefix}c++";
} ./setup-cuda-hook.sh

@@ -0,0 +1,128 @@
# shellcheck shell=bash
# Only run the hook from nativeBuildInputs
(( "$hostOffset" == -1 && "$targetOffset" == 0)) || return 0
guard=Sourcing
reason=
[[ -n ${cudaSetupHookOnce-} ]] && guard=Skipping && reason=" because the hook has been propagated more than once"
if (( "${NIX_DEBUG:-0}" >= 1 )) ; then
echo "$guard hostOffset=$hostOffset targetOffset=$targetOffset setup-cuda-hook$reason" >&2
else
echo "$guard setup-cuda-hook$reason" >&2
fi
[[ "$guard" = Sourcing ]] || return 0
declare -g cudaSetupHookOnce=1
declare -Ag cudaHostPathsSeen=()
declare -Ag cudaOutputToPath=()
extendcudaHostPathsSeen() {
(( "${NIX_DEBUG:-0}" >= 1 )) && echo "extendcudaHostPathsSeen $1" >&2
local markerPath="$1/nix-support/include-in-cudatoolkit-root"
[[ ! -f "${markerPath}" ]] && return 0
[[ -v cudaHostPathsSeen[$1] ]] && return 0
cudaHostPathsSeen["$1"]=1
# E.g. cuda_cudart-lib
local cudaOutputName
# Fail gracefully if the file is empty.
# One reason the file may be empty: the package was built with strictDeps set, but the current build does not have
# strictDeps set.
read -r cudaOutputName < "$markerPath" || return 0
[[ -z "$cudaOutputName" ]] && return 0
local oldPath="${cudaOutputToPath[$cudaOutputName]-}"
[[ -n "$oldPath" ]] && echo "extendcudaHostPathsSeen: warning: overwriting $cudaOutputName from $oldPath to $1" >&2
cudaOutputToPath["$cudaOutputName"]="$1"
}
addEnvHooks "$targetOffset" extendcudaHostPathsSeen
setupCUDAToolkit_ROOT() {
(( "${NIX_DEBUG:-0}" >= 1 )) && echo "setupCUDAToolkit_ROOT: cudaHostPathsSeen=${!cudaHostPathsSeen[*]}" >&2
for path in "${!cudaHostPathsSeen[@]}" ; do
addToSearchPathWithCustomDelimiter ";" CUDAToolkit_ROOT "$path"
if [[ -d "$path/include" ]] ; then
addToSearchPathWithCustomDelimiter ";" CUDAToolkit_INCLUDE_DIR "$path/include"
fi
done
# Use array form so semicolon-separated lists are passed safely.
if [[ -n "${CUDAToolkit_INCLUDE_DIR-}" ]]; then
cmakeFlagsArray+=("-DCUDAToolkit_INCLUDE_DIR=${CUDAToolkit_INCLUDE_DIR}")
fi
if [[ -n "${CUDAToolkit_ROOT-}" ]]; then
cmakeFlagsArray+=("-DCUDAToolkit_ROOT=${CUDAToolkit_ROOT}")
fi
}
preConfigureHooks+=(setupCUDAToolkit_ROOT)
setupCUDAToolkitCompilers() {
echo Executing setupCUDAToolkitCompilers >&2
if [[ -n "${dontSetupCUDAToolkitCompilers-}" ]] ; then
return 0
fi
# Point NVCC at a compatible compiler
# For CMake-based projects:
# https://cmake.org/cmake/help/latest/module/FindCUDA.html#input-variables
# https://cmake.org/cmake/help/latest/envvar/CUDAHOSTCXX.html
# https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_HOST_COMPILER.html
appendToVar cmakeFlags "-DCUDA_HOST_COMPILER=@ccFullPath@"
appendToVar cmakeFlags "-DCMAKE_CUDA_HOST_COMPILER=@ccFullPath@"
# For non-CMake projects:
# We prepend --compiler-bindir to nvcc flags.
# Downstream packages can override these, because NVCC
# uses the last --compiler-bindir it gets on the command line.
# FIXME: this results in "incompatible redefinition" warnings.
# https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compiler-bindir-directory-ccbin
if [ -z "${CUDAHOSTCXX-}" ]; then
export CUDAHOSTCXX="@ccFullPath@";
fi
appendToVar NVCC_PREPEND_FLAGS "--compiler-bindir=@ccRoot@/bin"
# NOTE: We set -Xfatbin=-compress-all, which reduces the size of the compiled
# binaries. If binaries grow over 2GB, they will fail to link. This is a problem for us, as
# the default set of CUDA capabilities we build can regularly cause this to occur (for
# example, with Magma).
#
# @SomeoneSerge: original comment was made by @ConnorBaker in .../cudatoolkit/common.nix
if [[ -z "${dontCompressFatbin-}" ]]; then
appendToVar NVCC_PREPEND_FLAGS "-Xfatbin=-compress-all"
fi
}
preConfigureHooks+=(setupCUDAToolkitCompilers)
propagateCudaLibraries() {
(( "${NIX_DEBUG:-0}" >= 1 )) && echo "propagateCudaLibraries: cudaPropagateToOutput=${cudaPropagateToOutput-} cudaHostPathsSeen=${!cudaHostPathsSeen[*]}" >&2
[[ -z "${cudaPropagateToOutput-}" ]] && return 0
mkdir -p "${!cudaPropagateToOutput}/nix-support"
# One would expect this to be propagated-build-build-deps, but that doesn't seem to work
echo "@setupCudaHook@" >> "${!cudaPropagateToOutput}/nix-support/propagated-native-build-inputs"
local propagatedBuildInputs=( "${!cudaHostPathsSeen[@]}" )
for output in $(getAllOutputNames) ; do
if [[ ! "$output" = "$cudaPropagateToOutput" ]] ; then
appendToVar propagatedBuildInputs "${!output}"
fi
break
done
# One would expect this to be propagated-host-host-deps, but that doesn't seem to work
printWords "${propagatedBuildInputs[@]}" >> "${!cudaPropagateToOutput}/nix-support/propagated-build-inputs"
}
postFixupHooks+=(propagateCudaLibraries)
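
The marker-file bookkeeping in `extendcudaHostPathsSeen` can be tried outside a Nix build. The following is a minimal sketch, not the hook itself: `record_cuda_path` and `join_paths` are hypothetical stand-ins (the real hook relies on stdenv's `addEnvHooks` and `addToSearchPathWithCustomDelimiter`), and the directories are throwaway temp paths rather than store paths.

```shell
#!/usr/bin/env bash
# Sketch only: a stand-alone imitation of the hook's bookkeeping.
# record_cuda_path and join_paths are hypothetical names, not stdenv functions.

declare -Ag seen=()            # path -> 1, deduplicates repeated inputs
declare -Ag output_to_path=()  # e.g. cuda_cudart-lib -> path

record_cuda_path() {
  local marker="$1/nix-support/include-in-cudatoolkit-root"
  [[ -f "$marker" ]] || return 0        # no marker: not a CUDA output, ignore
  [[ -n "${seen[$1]-}" ]] && return 0   # already recorded
  seen["$1"]=1
  local name
  read -r name < "$marker" || return 0  # empty marker: nothing more to record
  [[ -z "$name" ]] && return 0
  output_to_path["$name"]="$1"
}

# Join all arguments with ';', the list separator CMake expects.
join_paths() { local IFS=';'; echo "$*"; }

demo="$(mktemp -d)"
mkdir -p "$demo/cudart/nix-support" "$demo/zlib"
echo "cuda_cudart-lib" > "$demo/cudart/nix-support/include-in-cudatoolkit-root"

record_cuda_path "$demo/cudart"
record_cuda_path "$demo/cudart"  # duplicate, skipped
record_cuda_path "$demo/zlib"    # no marker file, skipped

CUDAToolkit_ROOT="$(join_paths "${!seen[@]}")"
echo "CUDAToolkit_ROOT=$CUDAToolkit_ROOT"
rm -rf "$demo"
```

Only the directory carrying the marker file ends up in `CUDAToolkit_ROOT`, and it is listed once even though it was offered twice.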

@@ -0,0 +1,77 @@
{
lib,
runCommand,
python3Packages,
makeWrapper,
writableTmpDirAsHomeHook,
}:
{
feature ? "cuda",
name ? if feature == null then "cpu" else feature,
libraries ? [ ], # [PythonPackage] | (PackageSet -> [PythonPackage])
gpuCheckArgs ? { },
...
}@args:
let
inherit (builtins) isFunction all;
librariesFun = if isFunction libraries then libraries else (_: libraries);
in
assert lib.assertMsg (
isFunction libraries || all (python3Packages.hasPythonModule) libraries
) "writeGpuTestPython was passed `libraries` from the wrong python release";
content:
let
interpreter = python3Packages.python.withPackages librariesFun;
tester =
runCommand "tester-${name}"
(
lib.removeAttrs args [
"gpuCheckArgs"
"libraries"
"name"
]
// {
inherit content;
nativeBuildInputs = args.nativeBuildInputs or [ ] ++ [ makeWrapper ];
passAsFile = args.passAsFile or [ ] ++ [ "content" ];
}
)
''
mkdir -p "$out"/bin
cat << EOF >"$out/bin/$name"
#!${lib.getExe interpreter}
EOF
cat "$contentPath" >>"$out/bin/$name"
chmod +x "$out/bin/$name"
if [[ -n "''${makeWrapperArgs+''${makeWrapperArgs[@]}}" ]] ; then
wrapProgram "$out/bin/$name" ''${makeWrapperArgs[@]}
fi
'';
tester' = tester.overrideAttrs (oldAttrs: {
passthru.gpuCheck =
runCommand "test-${name}"
(
gpuCheckArgs
// {
nativeBuildInputs = [
tester'
]
++ gpuCheckArgs.nativeBuildInputs or [ ];
requiredSystemFeatures =
lib.optionals (feature != null) [ feature ] ++ gpuCheckArgs.requiredSystemFeatures or [ ];
}
)
''
set -e
${tester.meta.mainProgram or (lib.getName tester')}
touch $out
'';
});
in
tester'

@@ -0,0 +1,50 @@
# NOTE: Check https://developer.nvidia.com/nvidia-tensorrt-8x-download
# https://developer.nvidia.com/nvidia-tensorrt-10x-download
# Version policy is to keep the latest minor release for each major release.
{
tensorrt.releases = {
# jetson
linux-aarch64 = [ ];
# powerpc
linux-ppc64le = [ ];
# server-grade arm
linux-sbsa = [
{
version = "10.8.0.43";
minCudaVersion = "12.8";
maxCudaVersion = "12.8";
cudnnVersion = "9.7";
filename = "TensorRT-10.8.0.43.Linux.aarch64-gnu.cuda-12.8.tar.gz";
hash = "sha256-sB5d0sfGQyUhGdA9ku6pcCNBjpL0Wjvg0Ilulikj5Do=";
}
{
version = "10.9.0.34";
minCudaVersion = "12.8";
maxCudaVersion = "12.8";
cudnnVersion = "9.7";
filename = "TensorRT-10.9.0.34.Linux.aarch64-gnu.cuda-12.8.tar.gz";
hash = "sha256-uB7CoGf2fwgsE8rsLc71Q4W0Kp3mpOyubzGKotQZZPI=";
}
];
# x86_64
linux-x86_64 = [
{
version = "10.8.0.43";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
cudnnVersion = "9.7";
filename = "TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8.tar.gz";
hash = "sha256-V31tivU4FTQUuYZ8ZmtPZYUvwusefA6jogbl+vvH1J4=";
}
{
version = "10.9.0.34";
minCudaVersion = "12.0";
maxCudaVersion = "12.8";
cudnnVersion = "9.7";
filename = "TensorRT-10.9.0.34.Linux.x86_64-gnu.cuda-12.8.tar.gz";
hash = "sha256-M74OYeO/F3u7yrtIkr8BPwyKxx0r5z8oA4SKOCyxQnI=";
}
];
};
}
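
Each release above carries a `minCudaVersion` and `maxCudaVersion`. How Nixpkgs filters releases against the active CUDA version is not shown in this file, so the following is only a sketch of one plausible range check. `version_le` and `supports_cuda` are hypothetical helpers, and the comparison assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD).

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the selection logic used by Nixpkgs.

# version_le A B: succeeds when A <= B in version order.
version_le() {
  [[ "$1" == "$2" ]] && return 0
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" == "$1" ]]
}

# supports_cuda CUDA MIN MAX: succeeds when MIN <= CUDA <= MAX.
supports_cuda() { version_le "$2" "$1" && version_le "$1" "$3"; }

# Ranges taken from the 10.9.0.34 entries above.
if supports_cuda 12.4 12.0 12.8; then
  echo "linux-x86_64 entry accepts CUDA 12.4"
fi
if ! supports_cuda 12.4 12.8 12.8; then
  echo "linux-sbsa entry rejects CUDA 12.4"
fi
```

Version-aware ordering matters here: a plain string comparison would place a hypothetical "12.10" before "12.8".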

@@ -0,0 +1,24 @@
# Shims to mimic the shape of ../modules/generic/manifests/{feature,redistrib}/release.nix
{
package,
# redistSystem :: String
# String is `"unsupported"` if the given architecture is unsupported.
redistSystem,
}:
{
featureRelease = {
inherit (package) cudnnVersion minCudaVersion maxCudaVersion;
${redistSystem}.outputs = {
bin = true;
lib = true;
static = true;
dev = true;
sample = true;
python = true;
};
};
redistribRelease = {
name = "TensorRT: a high-performance deep learning inference SDK";
inherit (package) hash filename version;
};
}

@@ -0,0 +1,79 @@
{
_cuda,
cudaNamePrefix,
lib,
runCommand,
}:
let
inherit (builtins) deepSeq toJSON tryEval;
inherit (_cuda.bootstrapData) cudaCapabilityToInfo;
inherit (_cuda.lib) formatCapabilities;
inherit (lib.asserts) assertMsg;
in
# When changing names or formats: pause, validate, and update the assert
assert assertMsg (
cudaCapabilityToInfo ? "7.5" && cudaCapabilityToInfo ? "8.6"
) "The following test requires both 7.5 and 8.6 be known CUDA capabilities";
assert
let
expected = {
cudaCapabilities = [
"7.5"
"8.6"
];
cudaForwardCompat = true;
# Sorted alphabetically
archNames = [
"Ampere"
"Turing"
];
realArches = [
"sm_75"
"sm_86"
];
virtualArches = [
"compute_75"
"compute_86"
];
arches = [
"sm_75"
"sm_86"
"compute_86"
];
gencode = [
"-gencode=arch=compute_75,code=sm_75"
"-gencode=arch=compute_86,code=sm_86"
"-gencode=arch=compute_86,code=compute_86"
];
gencodeString = "-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86";
cmakeCudaArchitecturesString = "75;86";
};
actual = formatCapabilities {
inherit cudaCapabilityToInfo;
cudaCapabilities = [
"7.5"
"8.6"
];
cudaForwardCompat = true;
};
actualWrapped = (tryEval (deepSeq actual actual)).value;
in
assertMsg (expected == actualWrapped) ''
Expected: ${toJSON expected}
Actual: ${toJSON actualWrapped}
'';
runCommand "${cudaNamePrefix}-tests-flags"
{
__structuredAttrs = true;
strictDeps = true;
}
''
touch "$out"
''
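
The expected values in the assertion above follow a simple pattern: one real-architecture `-gencode` flag per capability, plus one PTX (virtual architecture) flag for the newest capability when `cudaForwardCompat` is set. The sketch below reproduces only that pattern in shell. It is not the `formatCapabilities` implementation from `_cuda.lib`; `build_gencode` is a hypothetical helper, and it assumes the capabilities are passed in ascending order.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the expected strings from the test, not the real
# formatCapabilities. Assumes capabilities are given in ascending order.

# build_gencode FORWARD_COMPAT CAPABILITY...
# Prints one -gencode flag per line.
build_gencode() {
  local forward_compat="$1"; shift
  local cap arch last=""
  for cap in "$@"; do
    arch="${cap//./}"  # "7.5" -> "75"
    printf -- '-gencode=arch=compute_%s,code=sm_%s\n' "$arch" "$arch"
    last="$arch"
  done
  # Forward compatibility: also embed PTX for the newest capability.
  if [[ "$forward_compat" == true && -n "$last" ]]; then
    printf -- '-gencode=arch=compute_%s,code=compute_%s\n' "$last" "$last"
  fi
}

mapfile -t gencode < <(build_gencode true 7.5 8.6)
gencode_string="${gencode[*]}"  # joined with spaces

# CMake wants a semicolon-separated list of bare architecture numbers.
archs=()
for cap in 7.5 8.6; do archs+=("${cap//./}"); done
cmake_archs="$(IFS=';'; echo "${archs[*]}")"

echo "$gencode_string"
echo "$cmake_archs"
```

For capabilities 7.5 and 8.6 with forward compatibility enabled, the two printed lines correspond to `gencodeString` and `cmakeCudaArchitecturesString` in the expected attribute set.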

@@ -0,0 +1,81 @@
{
cudaPackages,
lib,
writeGpuTestPython,
# Configuration flags
openCVFirst,
useOpenCVDefaultCuda,
useTorchDefaultCuda,
}:
let
inherit (lib.strings) optionalString;
openCVBlock = ''
import cv2
print("OpenCV version:", cv2.__version__)
# Ensure OpenCV can access the GPU.
assert cv2.cuda.getCudaEnabledDeviceCount() > 0, "No CUDA devices found for OpenCV"
print("OpenCV CUDA device:", cv2.cuda.printCudaDeviceInfo(cv2.cuda.getDevice()))
# Print OpenCV's build information, which shows whether CUDA support was compiled in.
print(cv2.getBuildInformation())
a = cv2.cuda.GpuMat(size=(256, 256), type=cv2.CV_32S, s=1)
b = cv2.cuda.GpuMat(size=(256, 256), type=cv2.CV_32S, s=1)
c = int(cv2.cuda.sum(cv2.cuda.add(a, b))[0]) # OpenCV returns a Scalar float object.
assert c == 2 * 256 * 256, f"Expected {2 * 256 * 256} for OpenCV, got {c}"
'';
torchBlock = ''
import torch
print("Torch version:", torch.__version__)
# Set up the GPU.
torch.cuda.init()
# Ensure the GPU is available.
assert torch.cuda.is_available(), "CUDA is not available to Torch"
print("Torch CUDA device:", torch.cuda.get_device_properties(torch.cuda.current_device()))
a = torch.ones(256, 256, dtype=torch.int32).cuda()
b = torch.ones(256, 256, dtype=torch.int32).cuda()
c = (a + b).sum().item()
assert c == 2 * 256 * 256, f"Expected {2 * 256 * 256} for Torch, got {c}"
'';
content = if openCVFirst then openCVBlock + torchBlock else torchBlock + openCVBlock;
torchName = "torch" + optionalString useTorchDefaultCuda "-with-default-cuda";
openCVName = "opencv4" + optionalString useOpenCVDefaultCuda "-with-default-cuda";
in
# TODO: Ensure the expected CUDA libraries are loaded.
# TODO: Ensure GPU access works as expected.
writeGpuTestPython {
name = if openCVFirst then "${openCVName}-then-${torchName}" else "${torchName}-then-${openCVName}";
libraries =
# NOTE: These are purposefully in this order.
pythonPackages:
let
effectiveOpenCV = pythonPackages.opencv4.override (prevAttrs: {
cudaPackages = if useOpenCVDefaultCuda then prevAttrs.cudaPackages else cudaPackages;
});
effectiveTorch = pythonPackages.torchWithCuda.override (prevAttrs: {
cudaPackages = if useTorchDefaultCuda then prevAttrs.cudaPackages else cudaPackages;
});
in
if openCVFirst then
[
effectiveOpenCV
effectiveTorch
]
else
[
effectiveTorch
effectiveOpenCV
];
} content