[0020] - SPIR-V Variable address space

Status	Design In Progress
Author	Nathan Gauër

Introduction

From the HLSL spec:

HLSL programs manipulates data stored in four distinct memory spaces: thread, threadgroup, device and constant.

Those four groups represents the user-facing semantic, and the group this proposal will focus on is thread. Following this model, a function local variable and a static global variable share the same address space.

On the logical SPIR-V side, variables are attached to a storage class. This is a different name to represent the same thing: an address space.

A pointer to one storage class is incompatible with a pointer to another.

This proposal will use address space when speaking in HLSL/LLVM-IR terms, and storage class when speaking in SPIR-V terms. We will not mention C/HLSL style storage classes (static, volatile, etc).

SPIR-V has 2 interesting storage classes:

Function
Private A variable declared with the Function storage class must be declared in the first basic block of a function. It is normaly used to represent function local variables.

A variable declared with the Private storage class is private to the current invocation/thread, but belongs to the global scope. This would be the equivalent of a static global variable in HLSL.

Reconciliating the SPIR-V & HLSL side could be done in two ways:

unify the storage classes in SPIR-V.
separate the address spaces in HLSL.

Implementing constant buffers & other resources is done by creating new address spaces, making explicit the constraints some allocations have. Thus, it seems separating the address spaces for globals & locals would allow us to stay consistent with the rest of the language.

HLSL patterns to look for

This section will explain why some HLSL patterns are hard to lower to SPIR-V.

Note: HLSL does not implement references yet, but we have to make sure our design would allow us to implement them. For this reason, we’ll assume HLSL has references.

Example 1:

static int a = 0;

void foo() {
  int b = 0;
}

a and b both share the same address space. But on the SPIR-V side, a must be a Private variable, while b must be a Function variable. This requires the lowering pass to know the context of a variable.

Example 2:

static int a = 0;

void foo() {
  int& ref = a;
  int b = ref;
}

a is still Private, b still Function. But ref points to a. In SPIR-V, a variable cannot store a pointer pointing to another storage class. This means ref cannot be stored in a variable in the Function class. If a is Private, ref could only be declared as Private.

Example 3:

static int global = 0;

int& foo(int& input, int select) {
  return select ? input : global;
}

void main(int select) {
  int local;
  int& res1 = foo(local, select);
  int& res2 = foo(global, select);
}

global is still Private. local is Function. In SPIR-V, function declarations contains the return and parameters types, including the storage classes. This means, depending on the call-site, and the value of select, the return value and parameter would required either the Function or the Private storage class. When this selection depends on a runtime condition, this cannot be lowered to SPIR-V as-is.

Proposed solution: using 2 HLSL address spaces

Thread-local, global variables will be put in the hlsl_private address space. Thread-local function-local variables will be put in the default address space.

Implementing the solution

A new address space will be added to Clang: hlsl_private. This address space will be mapped to Spirv::Private on the SPIR-V backend, PRIVATE_ADDRESS for AMDGPU, and the address space 0 in DXIL.

Clang codegen will add the new address space annotations, separating the private from the default.

For the time being, the private address space will be marked as a subset of the default address space, allowing overload resolution for class methods:

an object in the private address space will be allowed to use a method declared with a this in the default address space.

Clang will emit an addrspacecast we will have to handle, but that’s a known issue in address-space overload resolution, and not new to this proposal.

Alternative design considered

Force optimizations, and force inlining

One solution is to inline all function call, even those marked as noinline. Then replicate all instruction that use pointers so that the pointer operand has a single known address space. If those transformations were applied, we could avoid address-conflict mismatch for pointers, and all we’d have are direct load/stores to global/local variables. Functions returning incompatible references wouldn’t exist, allowing us to generate valid SPIR-V.

Note that those transformations can get arbitrarily complex. The number of copies is exponential in regards to the number of pointer operands.

Additionally:

HLSL allows using noinline: we would have to ignore it.
HLSL allows exporting functions to compile to a library: if we need to inline to generate functions, we cannot emit libraries exposing such functions.
Runtime conditions causing address-space conflict would require code duplication.
It makes reading the generated assembly harder.

Move all variables to the function scope

HLSL static globals have a known initialization value at compile-time. Meaning we could move the global variables to the entrypoint first basic block, as local variables. If SPIR-V has no global variables, all pointers as Function. This would require passing references to other functions referencing those globals, or inline them, but it would be possible.

But the blocker remains the same: building to a library function. If an exported function references a global variable, we cannot change the signature of the function.

Move all variables to the global scope

By moving all local variables to the global scope, we now have a single storage class Private, and won’t have conflict issues. This also allows us to compile non-optimized code, and to keep functions if required.

HLSL & SPIR-V disallow static recursion. Meaning we know at compile-time that each function requires one instance of each local variable. This would also work with exported functions: static recursion is still not allowed, so cross compile-units recursion is not an issue.

The main issue of this solution can have are:

drivers may have a harder time figuring out variable lifetimes.
SPIR-V has a hard 65536 global variable limit (vs 500k local variables).

I believe those 2 are not hard blockers, but something we need to be aware of.

Selectively move variables to the global scope.

If a variable is only loaded/stored from/to, and remains in the function scope, there should be no pointer incompatibility. This means we could potentially implement the solution 4, but only targeting variables for which addresses are moved across their function scope boundaries.

This would require additional IR analysis, as we would need to determine which address is used in another scope to recreate a global variable.

The motivation we could have for such solution are:

if drivers have a hard time optimizing the global variables.
if the global variable count limit becomes an issue.

Implementing this solution is more complex, and could be more error prone, so until we have a real need, I would recommend against, and moving forward with solution 4. If the need comes, moving from solution 4 to solution 5 would be possible, as it’s just an optimization on top.

Edit on GitHub