Compiling
To get the x264 encoder to build with CUDA, it was necessary to modify the Makefile.
These changes include:
Add the list of .cu files to be built
Link in the object files generated by nvcc (I gave them the extension .cuo)
Include cuda.mk
cuda.mk is likely overly complicated and can be cleaned up. To do so, we first need to understand the compiler flags that are passed to nvcc.
For example, is -use_fast_math needed?
Structs with long long
When compiling with GCC, special care must be taken with structs that contain 64-bit integers. On 32-bit platforms, GCC aligns long long to a 4-byte boundary by default, while NVCC aligns long long to an 8-byte boundary by default. Thus, when GCC is used to compile a file that has a struct or union containing 64-bit integers, the -malign-double option must be given to GCC. When NVCC invokes GCC, it passes this option automatically. (Source: CUDA Toolkit 2.0 release notes, http://developer.download.nvidia.com/compute/cuda/2_0/docs/CUDA_Toolkit_Release_Notes_linux_2.0_08.pdf)
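In other words, one way to solve the problem is to pass the -malign-double option to GCC, so that GCC uses the same 8-byte alignment as NVCC.
I solved the problem the other way around, by passing the following option to NVCC so that it keeps GCC's default alignment: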
--no-align-double
Specifies that -malign-double should not be passed as a compiler argument on 32-bit platforms.
WARNING: this makes the ABI incompatible with the CUDA kernel ABI for certain 64-bit types.
TODOs:
Figure out whether CUDA requires PIC. If so, figure out how it should be set: as a compiler flag or a linker flag, and with which flag (-fPIC or -Bsymbolic).
Eventually handle CUDA as an option in the configure script, so that the Makefile can still build the non-CUDA version. This is not important right now, but it will be if we are to merge our changes into the trunk.