Notes on work mostly hosted on my personal GitHub and the GitHub of APC LLC

Thrust/CUDA tip: reuse temporary buffer across multiple transforms

Thrust is a very handy STL-like template library for rapid data processing on GPUs.

On-the-fly modification of LLVM IR code of CUDA sources

Largely thanks to LLVM, in recent years we’ve seen a significant increase in interest in the research and development of domain-specific compilation tools. With the re...

How to find the version of CUDA's LLVM backend

It is well known that the CUDA toolkit uses an LLVM backend, but the version it uses is not shown. We can use gdb and an LLVM API function to print the version string:

NVIDIA Visual Profiler can connect to a 64-bit Linux server from 32-bit Windows

The CUDA 6.0 release added an extremely handy feature to Visual Profiler: support for remote profiling. This means that you can run the profiler GUI from ...

Calling a CUDA device function from an OpenACC Fortran kernel

OpenACC is known to be a fast method of developing quite efficient GPU-enabled applications. It is also possible to mix CUDA kernels and libraries with OpenACC ke...

Jetson K1: bandwidthTest

The chart on the left shows the bandwidths of memory transfers on Jetson K1 (click to enlarge). As a baseline we also added GTX680M’s host-device and device-host (...

Jetson K1: from unboxing straight to CUDA in 5 steps

We finally got the most wanted Jetson K1 board in the house! In this post we show how to turn a freshly unboxed tiny board into a fully functional CUDA development nod...

How to break Ubuntu 13.04/14.04 with the vanilla CUDA driver, and how to unbreak it

After installing the CUDA driver from the NVIDIA website, Ubuntu 13.04/14.04 window manager decorations (Unity, via Compiz) may stop working properly on Optimus machines ...

Improving CUDA profiler output for an MPI-CUDA program

Suppose we need to profile the following MPI-CUDA program on a GPU cluster. The most obvious way to profile this code on a console-only cluster would be to invoke th...

One non-obvious reason for 'Illegal instruction' in GPU code

If cuda-gdb throws 'Program received signal CUDA_EXCEPTION_4, Warp Illegal Instruction.' for the following code line: