Largely thanks to LLVM, recent years have seen a significant increase in interest in the research and development of domain-specific compilation tools. With NVIDIA’s release of PTX backends (the open-source NVPTX and the proprietary libNVVM), it has also become possible to build custom LLVM-driven compilers that generate GPU binaries. However, two questions remain:
How can the compilation of CUDA source code be customized? What is NVIDIA’s best set of GPU-specific LLVM optimizations, and how can the IR be modified further after they have been applied? To answer these two questions, we have created a special dynamic library. When attached to the NVIDIA CUDA compiler, this library exposes both the unoptimized and the optimized LLVM IR to the user and allows it to be modified on the fly. As a result, a developer of a domain-specific compiler gains the flexibility to, for example, re-target CUDA-generated LLVM IR to different architectures, or to make additional modifications to the IR after NVIDIA’s optimizations have run.
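To give a feel for what an on-the-fly IR modification could look like, here is a minimal sketch of a re-targeting step applied to textual LLVM IR. It is not the library's actual interface: the function name `retarget_ir`, the `SAMPLE_IR` snippet and the chosen x86-64 triple are all hypothetical, and the sketch assumes the IR is available as plain text.

```python
# Hypothetical hook: takes the textual LLVM IR of a module, returns modified IR.
# Illustrative only; this is not the interface of the library described above.
def retarget_ir(ir_text: str, new_triple: str = "x86_64-unknown-linux-gnu") -> str:
    out = []
    for line in ir_text.splitlines():
        if line.startswith("target triple"):
            # Point the module at a different architecture.
            out.append(f'target triple = "{new_triple}"')
        elif line.startswith("target datalayout"):
            # Drop the GPU data layout so the new target's default is used.
            continue
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up input resembling the header of an NVPTX module (assumption).
SAMPLE_IR = '''target datalayout = "e-p:64:64:64-i64:64:64"
target triple = "nvptx64-nvidia-cuda"

define void @kernel(float* %x) {
  ret void
}
'''

if __name__ == "__main__":
    print(retarget_ir(SAMPLE_IR))
```

Rewriting the triple is only the first step. A real re-targeting pass would also have to deal with GPU-specific intrinsics, address spaces and kernel metadata, which this sketch leaves untouched.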
The source code, a sample modification and a description are available on our GitHub page. Tested with CUDA 6.0.
Dmitry Mikushin
If you need help with Machine Learning, Computer Vision or with GPU computing in general, please reach out to us at Applied Parallel Computing LLC.