Compute shaders in UE5

In this guide we will explore how to run a compute shader in UE5 without modifying the engine source.

DISCLAIMER: this guide is not finished, and several parts still need to be improved; the ones we know of will be marked in red. We are also still learning UE, so there are several steps that we do without being sure why they are needed. Feel free to share better ways of doing this with us at https://twitter.com/arnau_aguilar or https://twitter.com/ignasipelayo.

The purpose of this guide is to help other people who find themselves in the same position as us.


Here you can find the code: https://github.com/arnauaguilar/ComputeShader

The repository contains only the module code; the project has nothing else in it.

Introduction

Running a compute shader in Unreal is complex and largely undocumented, and it usually requires modifying the engine source. We want to avoid that, so we will put all of our code in a separate module.

Refer to this page to learn more about modules:

Code

We will mainly work with two classes: one that encapsulates all the dispatching logic, and another that represents the compute shader in Unreal.
In our example, those are named ForceField and FForceFieldCS respectively.

C++ ComputeShader

We will start with FForceFieldCS, since it illustrates much better why we need a module:

#include "GlobalShader.h"
#include "ShaderParameterStruct.h"

class FForceFieldCS : public FGlobalShader
{
    DECLARE_GLOBAL_SHADER(FForceFieldCS)
    SHADER_USE_PARAMETER_STRUCT(FForceFieldCS, FGlobalShader)

    BEGIN_SHADER_PARAMETER_STRUCT(FParameters,)
        SHADER_PARAMETER_UAV(RWTexture2D<float>, OutputTexture)
        SHADER_PARAMETER(FVector2D, Dimensions)
        SHADER_PARAMETER(uint32, TimeStamp)
    END_SHADER_PARAMETER_STRUCT()

    static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
    {
        return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::SM5);
    }

    static inline void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
    {
        FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);

        //We're using it here to add some preprocessor defines. That way we don't have to change
        //both C++ and HLSL code when we change the value for NUM_THREADS_PER_GROUP_DIMENSION
        OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_X"), NUM_THREADS_PER_GROUP_DIMENSION);
        OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_Y"), NUM_THREADS_PER_GROUP_DIMENSION);
        OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_Z"), 1);
    }
};

IMPLEMENT_GLOBAL_SHADER(FForceFieldCS, "/CustomShaders/ForceFieldCS.usf", "MainCS", SF_Compute);

First of all, notice that the class inherits from FGlobalShader, which makes us implement the following functions:

ShouldCompilePermutation, whose name is fairly descriptive. It decides for which platforms the shader is compiled, here only those that support feature level SM5:

static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
{
    return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::SM5);
}

and ModifyCompilationEnvironment:

static inline void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
{
    FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);

    //We're using it here to add some preprocessor defines. That way we don't have to change
    //both C++ and HLSL code when we change the value for NUM_THREADS_PER_GROUP_DIMENSION
    OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_X"), NUM_THREADS_PER_GROUP_DIMENSION);
    OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_Y"), NUM_THREADS_PER_GROUP_DIMENSION);
    OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_Z"), 1);
}

We found it useful to declare our preprocessor defines here, so the thread group size is set in one place for both the C++ and the HLSL code.

But the most important part of this class is the following:

DECLARE_GLOBAL_SHADER(FForceFieldCS)
SHADER_USE_PARAMETER_STRUCT(FForceFieldCS, FGlobalShader)

BEGIN_SHADER_PARAMETER_STRUCT(FParameters,)
    SHADER_PARAMETER_UAV(RWTexture2D<float>, OutputTexture)
    SHADER_PARAMETER(FVector2D, Dimensions)
    SHADER_PARAMETER(uint32, TimeStamp)
END_SHADER_PARAMETER_STRUCT()


Here we are telling Unreal that this class is our representation of a shader, with DECLARE_GLOBAL_SHADER(FForceFieldCS), and that it contains a parameter struct, with SHADER_USE_PARAMETER_STRUCT(FForceFieldCS, FGlobalShader). The parameter struct needs to match the parameters declared in the .usf compute shader file, in this case:

RWTexture2D<float> OutputTexture;
float2 Dimensions;
uint TimeStamp;

With this we have created a C++ representation of our shader, but we have not yet bound that representation to the actual .usf file. To do so, we need to add this macro somewhere in our code:

IMPLEMENT_GLOBAL_SHADER(FForceFieldCS, "/CustomShaders/ForceFieldCS.usf", "MainCS", SF_Compute);

I like to do it right after the C++ representation of the shader to keep things tidy.

In this macro we are telling Unreal which class, FForceFieldCS, goes with which shader file, "/CustomShaders/ForceFieldCS.usf", which function is the entry point, "MainCS", and that it is a compute shader, SF_Compute.

One thing we have not covered yet is where to create the .usf file and how to link it to UE.


Creating the shader is simple: create a text file and change its extension to .usf. The location of the file does not matter much, as long as it is inside the main folder of the project, the one that contains the .uproject.

What we do is create a Shaders folder and put all of our shaders inside. Note that naming any folder Private might cause problems.

With this we have created the shader, but we still need to let Unreal know about it. To do so, do the following:

IMPLEMENT_MODULE(FComputeShaderRunnerModule, ComputeShaderRunner)

void FComputeShaderRunnerModule::StartupModule()
{
    const FString ShaderDirectory = FPaths::Combine(FPaths::ProjectDir(), TEXT("Shaders"));
    AddShaderSourceDirectoryMapping(FString("/CustomShaders"), ShaderDirectory);
}

void FComputeShaderRunnerModule::ShutdownModule()
{
}

In the module's .cpp file, in StartupModule, call AddShaderSourceDirectoryMapping(FString("/CustomShaders"), ShaderDirectory); with the path to your shader folder. In our case the Shaders folder sits directly in the project directory, so const FString ShaderDirectory = FPaths::Combine(FPaths::ProjectDir(), TEXT("Shaders")); is enough. If your shader folder is nested inside other folders, write its relative path in place of TEXT("Shaders").

Now the path in the IMPLEMENT_GLOBAL_SHADER macro will resolve correctly.
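
To make the idea of the mapping concrete, here is a small standalone model in plain C++ with no engine headers. It only illustrates the concept that a virtual prefix such as /CustomShaders is swapped for a real folder; it is not how the engine implements AddShaderSourceDirectoryMapping. The names FShaderPathMap, AddMapping and Resolve are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the engine's table of virtual shader directories.
class FShaderPathMap
{
public:
    // Registers a virtual directory such as "/CustomShaders" for a real folder on disk.
    void AddMapping(const std::string& VirtualDir, const std::string& RealDir)
    {
        assert(!VirtualDir.empty() && VirtualDir[0] == '/');
        Mappings[VirtualDir] = RealDir;
    }

    // Swaps the virtual directory prefix for the real one.
    // Returns an empty string when no mapping covers the path.
    std::string Resolve(const std::string& VirtualPath) const
    {
        for (const auto& Entry : Mappings)
        {
            const std::string Prefix = Entry.first + "/";
            if (VirtualPath.compare(0, Prefix.size(), Prefix) == 0)
            {
                return Entry.second + "/" + VirtualPath.substr(Prefix.size());
            }
        }
        return std::string();
    }

private:
    std::map<std::string, std::string> Mappings;
};
```

In this model, registering "/CustomShaders" for a folder and then resolving "/CustomShaders/ForceFieldCS.usf" yields that folder followed by "/ForceFieldCS.usf", which is the role the virtual path plays in IMPLEMENT_GLOBAL_SHADER.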

IMPORTANT NOTE

The module has to be configured like this on the .uproject file:

{
    "Name": "ComputeShaderRunner",
    "Type": "Runtime",
    "LoadingPhase": "PostConfigInit",
    "AdditionalDependencies": [
        "Engine"
    ]
}

The LoadingPhase is extremely important. It has to be PostConfigInit because this is the phase in which the engine loads its shaders, so it is also when we want ours to be loaded.

C++ Dispatcher Class

This is the class called ForceField in our example:

class COMPUTESHADERRUNNER_API ForceField
{
public:
    ForceField();

    void BeginRendering();
    void EndRendering();
    void UpdateParameters(FForceFieldCSParameters& DrawParameters);

private:
    void Execute_RenderThread(FRDGBuilder& builder, const FSceneTextures& SceneTextures);

    //The delegate handle to our function that will be executed each frame by the renderer
    FDelegateHandle OnPostResolvedSceneColorHandle;

    //Cached Shader Manager Parameters
    FForceFieldCSParameters cachedParams;

    //Whether we have cached parameters to pass to the shader or not
    volatile bool bCachedParamsAreValid;

    //Reference to a pooled render target where the shader will write its output
    TRefCountPtr<IPooledRenderTarget> ComputeShaderOutput;
};

In the .h file we can see that this class has public BeginRendering() and EndRendering() functions, which we call whenever we want the compute shader to start or stop computing.

The system works like this because of limitations we will discuss in a bit.

Alongside those we have an UpdateParameters() function, used whenever we need to change a parameter. Notice that it receives an FForceFieldCSParameters struct, which we declare to encapsulate all of the compute shader parameters. This is not strictly necessary, but it's how we did it.

This is the structure of the struct (👉😏👉)

struct FForceFieldCSParameters
{
    //Initialized to nullptr so a default-constructed struct never holds a dangling pointer
    UTextureRenderTarget2D* RenderTarget = nullptr;

    FIntPoint GetRenderTargetSize() const
    {
        return CachedRenderTargetSize;
    }

    FForceFieldCSParameters() { }

    FForceFieldCSParameters(UTextureRenderTarget2D* IORenderTarget)
        : RenderTarget(IORenderTarget)
    {
        CachedRenderTargetSize = RenderTarget ? FIntPoint(RenderTarget->SizeX, RenderTarget->SizeY) : FIntPoint::ZeroValue;
    }

private:
    FIntPoint CachedRenderTargetSize = FIntPoint::ZeroValue;

public:
    uint32 TimeStamp = 0;
};
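
The guide does not show the body of UpdateParameters(), so the following is only a sketch of one possible way to hand parameters from the game thread to the render thread. It is not the authors' implementation: it swaps the volatile bool flag for a std::mutex, and FParamsModel, FParamCache, Update and TryRead are hypothetical names written in plain C++ without engine headers.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical, engine-free stand-in for FForceFieldCSParameters.
struct FParamsModel
{
    std::uint32_t TimeStamp = 0;
};

// Hypothetical cache: one thread writes parameters, another reads them.
class FParamCache
{
public:
    // Writer side (game thread in the guide): store the latest parameters
    // and mark them as usable.
    void Update(const FParamsModel& NewParams)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        Cached = NewParams;
        bValid = true;
    }

    // Reader side (render thread in the guide): copy the parameters out.
    // Returns false if Update() has never been called.
    bool TryRead(FParamsModel& OutParams) const
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        if (!bValid)
        {
            return false;
        }
        OutParams = Cached;
        return true;
    }

private:
    mutable std::mutex Mutex;
    FParamsModel Cached;
    bool bValid = false;
};
```

The reader works on a private copy, so a parameter update that arrives mid-frame cannot change values halfway through a dispatch.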


Going back to ForceField, notice the variable FDelegateHandle OnPostResolvedSceneColorHandle. This delegate handle identifies our registration of Execute_RenderThread(..) with the renderer. Before explaining why we do it like this instead of calling Execute_RenderThread(..) ourselves, let's see how BeginRendering() works:

void ForceField::BeginRendering()
{
    //If the handle is already initialized and valid, no need to do anything
    if (OnPostResolvedSceneColorHandle.IsValid())
    {
        return;
    }

    //Get the Renderer Module and add our entry to the callbacks so it can be executed each frame after the scene rendering is done
    const FName RendererModuleName("Renderer");
    IRendererModule* RendererModule = FModuleManager::GetModulePtr<IRendererModule>(RendererModuleName);
    if (RendererModule)
    {
        OnPostResolvedSceneColorHandle = RendererModule->GetResolvedSceneColorCallbacks().AddRaw(this, &ForceField::Execute_RenderThread);
    }
}


If you take a close look at this function, you will see that all we do is get the RendererModule and, if it is valid, add a callback to GetResolvedSceneColorCallbacks(). This way our function is called every frame, after the scene has been rendered. I would have liked to do it another way, but look at the declaration of Execute_RenderThread:

void Execute_RenderThread(FRDGBuilder& builder, const FSceneTextures& SceneTextures);

We need an FRDGBuilder& and an FSceneTextures& because the callback signature requires them, but we only use the builder, and strictly speaking only builder.RHICmdList. This is why we rely on the callback: we found no other way to get a valid FRHICommandListImmediate& 😥.
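
The register-once pattern in BeginRendering(), and the matching removal that an EndRendering() would need, can be modelled without the engine. The sketch below is a standalone illustration in plain C++, not engine code and not the authors' EndRendering(): FCallbackList and FDispatcherModel are hypothetical names, and a handle value of 0 stands in for an invalid FDelegateHandle.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical, engine-free model of a multicast callback list with handles.
class FCallbackList
{
public:
    using FHandle = std::uint64_t;           // 0 plays the role of an invalid handle
    using FCallback = std::function<void()>;

    // Stores the callback and returns a handle that identifies it.
    FHandle Add(FCallback Callback)
    {
        const FHandle Handle = NextHandle++;
        Callbacks[Handle] = std::move(Callback);
        return Handle;
    }

    void Remove(FHandle Handle) { Callbacks.erase(Handle); }

    // In this model the "renderer" calls this once per frame.
    void Broadcast() const
    {
        for (const auto& Entry : Callbacks)
        {
            Entry.second();
        }
    }

private:
    std::map<FHandle, FCallback> Callbacks;
    FHandle NextHandle = 1;
};

// Hypothetical dispatcher mirroring the BeginRendering/EndRendering pattern.
class FDispatcherModel
{
public:
    explicit FDispatcherModel(FCallbackList& InList) : List(InList) {}

    void BeginRendering()
    {
        if (Handle != 0) { return; }         // already registered: do nothing
        Handle = List.Add([this] { ++ExecuteCount; });
    }

    void EndRendering()
    {
        if (Handle == 0) { return; }         // not registered: do nothing
        List.Remove(Handle);
        Handle = 0;                          // makes a later BeginRendering() register again
    }

    int ExecuteCount = 0;                    // number of frames on which the callback ran

private:
    FCallbackList& List;
    FCallbackList::FHandle Handle = 0;
};
```

Calling BeginRendering() twice registers the callback only once, and after EndRendering() further broadcasts no longer reach the dispatcher.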

And this is the perfect introduction to the main function of the system, Execute_RenderThread:

void ForceField::Execute_RenderThread(FRDGBuilder& builder, const FSceneTextures& SceneTextures)
{
    FRHICommandListImmediate& RHICmdList = builder.RHICmdList;

    //If there's no cached parameters to use, skip
    //If no Render Target is supplied in the cachedParams, skip
    if (!(bCachedParamsAreValid && cachedParams.RenderTarget))
    {
        return;
    }

    //Render Thread Assertion
    check(IsInRenderingThread());

    //If the render target is not valid, get an element from the render target pool by supplying a Descriptor
    if (!ComputeShaderOutput.IsValid())
    {
        UE_LOG(LogTemp, Warning, TEXT("Not Valid"));
        FPooledRenderTargetDesc ComputeShaderOutputDesc(
            FPooledRenderTargetDesc::Create2DDesc(
                cachedParams.GetRenderTargetSize(),
                cachedParams.RenderTarget->GetRenderTargetResource()->TextureRHI->GetFormat(),
                FClearValueBinding::None,
                TexCreate_None,
                TexCreate_ShaderResource | TexCreate_UAV,
                false
            )
        );
        ComputeShaderOutputDesc.DebugName = TEXT("ForceFieldCS_Output_RenderTarget1");
        GRenderTargetPool.FindFreeElement(RHICmdList, ComputeShaderOutputDesc, ComputeShaderOutput, TEXT("ForceFieldCS_Output_RenderTarget2"));
    }

    //Specify the resource transition, we're executing this in post scene rendering so we set it to Graphics to Compute
    ERHIAccess transitionType = ERHIAccess::SRVMask;
    RHICmdList.TransitionResource(transitionType, cachedParams.RenderTarget->GetRenderTargetResource()->TextureRHI);

    FForceFieldCS::FParameters PassParameters;
    PassParameters.OutputTexture = ComputeShaderOutput->GetRenderTargetItem().UAV;
    PassParameters.Dimensions = FVector2D(cachedParams.GetRenderTargetSize().X, cachedParams.GetRenderTargetSize().Y);
    PassParameters.TimeStamp = cachedParams.TimeStamp;

    //Get a reference to our shader type from global shader map
    TShaderMapRef<FForceFieldCS> forceFieldCS(GetGlobalShaderMap(GMaxRHIFeatureLevel));

    //Dispatch the compute shader
    FComputeShaderUtils::Dispatch(
        RHICmdList,
        forceFieldCS,
        PassParameters,
        FIntVector(
            FMath::DivideAndRoundUp(cachedParams.GetRenderTargetSize().X, NUM_THREADS_PER_GROUP_DIMENSION),
            FMath::DivideAndRoundUp(cachedParams.GetRenderTargetSize().Y, NUM_THREADS_PER_GROUP_DIMENSION),
            1
        )
    );

    //Copy shader's output to the render target provided by the client
    RHICmdList.CopyTexture(
        ComputeShaderOutput->GetRenderTargetItem().ShaderResourceTexture,
        cachedParams.RenderTarget->GetRenderTargetResource()->TextureRHI,
        FRHICopyTextureInfo()
    );

    //Unbind the previously bound render targets
    GRenderTargetPool.FreeUnusedResource(ComputeShaderOutput);
}

It's a long function, so let's go step by step. First we get the FRHICommandListImmediate& RHICmdList = builder.RHICmdList; that we will need, and then we make some basic checks: that the cached params are valid and that we are on the rendering thread (and we are).

Then, if ComputeShaderOutput is not valid (the first time it won't be), we request one:

if (!ComputeShaderOutput.IsValid())
{
    UE_LOG(LogTemp, Warning, TEXT("Not Valid"));
    FPooledRenderTargetDesc ComputeShaderOutputDesc(
        FPooledRenderTargetDesc::Create2DDesc(
            cachedParams.GetRenderTargetSize(),
            cachedParams.RenderTarget->GetRenderTargetResource()->TextureRHI->GetFormat(),
            FClearValueBinding::None,
            TexCreate_None,
            TexCreate_ShaderResource | TexCreate_UAV,
            false
        )
    );
    ComputeShaderOutputDesc.DebugName = TEXT("ForceFieldCS_Output_RenderTarget1");
    GRenderTargetPool.FindFreeElement(RHICmdList, ComputeShaderOutputDesc, ComputeShaderOutput, TEXT("ForceFieldCS_Output_RenderTarget2"));
}

We do this by creating an FPooledRenderTargetDesc with the same size and format as the render target we have cached. We give it a debug name in case something goes wrong (and it surely did, a lot), and finally we ask the global render target pool, GRenderTargetPool, for a free element matching ComputeShaderOutputDesc, passing our RHICmdList. This will be the output of our compute shader.

Once we have this, we need to specify the resource transition. We are executing this after scene rendering, so we transition it from graphics to compute with these two lines:

ERHIAccess transitionType = ERHIAccess::SRVMask;
RHICmdList.TransitionResource(transitionType, cachedParams.RenderTarget->GetRenderTargetResource()->TextureRHI);

Now it's parameter-setting time!

FForceFieldCS::FParameters PassParameters;
PassParameters.OutputTexture = ComputeShaderOutput->GetRenderTargetItem().UAV;
PassParameters.Dimensions = FVector2D(cachedParams.GetRenderTargetSize().X, cachedParams.GetRenderTargetSize().Y);
PassParameters.TimeStamp = cachedParams.TimeStamp;

Here we create the macro-defined FParameters struct from FForceFieldCS that we talked about earlier and set its values.


We now get a reference to our shader like so:

TShaderMapRef<FForceFieldCS> forceFieldCS(GetGlobalShaderMap(GMaxRHIFeatureLevel));

And now the moment we've been waiting for: we are ready to dispatch our compute shader, like this:

FComputeShaderUtils::Dispatch(
    RHICmdList,
    forceFieldCS,
    PassParameters,
    FIntVector(
        FMath::DivideAndRoundUp(cachedParams.GetRenderTargetSize().X, NUM_THREADS_PER_GROUP_DIMENSION),
        FMath::DivideAndRoundUp(cachedParams.GetRenderTargetSize().Y, NUM_THREADS_PER_GROUP_DIMENSION),
        1
    )
);
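
The group count passed to the dispatch is a ceiling division of the texture size by the thread group size. The standalone sketch below re-implements that arithmetic in plain C++ so it can be checked in isolation. The value 32 for the group size is an assumption made for illustration (the guide does not state the value of NUM_THREADS_PER_GROUP_DIMENSION), and FGroupCount and ComputeGroupCount are hypothetical names.

```cpp
#include <cassert>

// Assumed value for illustration only; the guide does not say what the project uses.
constexpr int NumThreadsPerGroupDimension = 32;

// Integer ceiling division: the smallest count such that Count * Divisor >= Dividend.
constexpr int DivideAndRoundUp(int Dividend, int Divisor)
{
    return (Dividend + Divisor - 1) / Divisor;
}

struct FGroupCount
{
    int X;
    int Y;
    int Z;
};

// Number of thread groups needed to cover a Width x Height texture.
inline FGroupCount ComputeGroupCount(int Width, int Height)
{
    assert(Width > 0 && Height > 0);
    return { DivideAndRoundUp(Width, NumThreadsPerGroupDimension),
             DivideAndRoundUp(Height, NumThreadsPerGroupDimension),
             1 };
}
```

With these assumed numbers, a 1920 x 1080 target needs 60 x 34 x 1 groups. The last row of groups covers 34 * 32 = 1088 rows, so some threads fall outside the texture; checking the thread ID against Dimensions in the shader is a common precaution for that case.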

But remember that we got a render target from the pool? The result of the compute shader is stored there, so we need to copy it to the render target the user provided:

RHICmdList.CopyTexture(
    ComputeShaderOutput->GetRenderTargetItem().ShaderResourceTexture,
    cachedParams.RenderTarget->GetRenderTargetResource()->TextureRHI,
    FRHICopyTextureInfo()
);

IMPORTANT NOTE

This is one of the parts that needs to be improved:

//Unbind the previously bound render targets
GRenderTargetPool.FreeUnusedResource(ComputeShaderOutput);

At the end of the function we free the ComputeShaderOutput that we got from the render target pool. We do this because otherwise the editor crashes when we exit play, since it tries to free that resource from a non-rendering thread. We still need to find a way to release it from the render thread just once, before the game closes and the thread is shut down. If someone knows how to do it, please share it with us so we can update this document.


Important note 2.0

The system works in the editor and in builds, but only if you don't call BeginRendering on BeginPlay. If you wait a few frames it does work; we don't know why.

If it crashes, make sure it is not starting from the very beginning. We will keep investigating to find out when it is a good moment to start rendering.
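
One simple way to "wait a few frames" is a small countdown that is ticked once per frame and fires a single time. The sketch below is a standalone illustration in plain C++. FDelayedStart is a hypothetical helper, and the number of frames to wait is up to you, since the guide does not say how many are needed.

```cpp
#include <cassert>

// Hypothetical helper: waits a number of frames, then signals a one-time start.
class FDelayedStart
{
public:
    explicit FDelayedStart(int InFramesToWait) : FramesRemaining(InFramesToWait)
    {
        assert(InFramesToWait >= 0);
    }

    // Call once per frame. Returns true exactly once: on the first call
    // made after InFramesToWait earlier calls have been counted.
    bool Tick()
    {
        if (bStarted)
        {
            return false;
        }
        if (FramesRemaining > 0)
        {
            --FramesRemaining;
            return false;
        }
        bStarted = true;
        return true;
    }

private:
    int FramesRemaining;
    bool bStarted = false;
};
```

In a project, the frame on which Tick() returns true would be the place to call BeginRendering(), for example from an actor's per-frame update instead of from BeginPlay.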


We would like to give a huge thank you to AyoubKhammassi; his repo on compute shaders for UE4 helped us a lot.