Dmitry Mikushin: Coding Blog

Use CUDA 7.0 NVRTC with Thrust

Runtime Compilation (NVRTC), introduced in CUDA 7.0, makes it possible to compile CUDA kernels dynamically during program execution (see example). This functionality allows to...

Apr 29, 2015 Software Engineering Comments

Get extra 8% perf in bilinear interpolation on GPU using __restrict__ keyword

Starting from GK110 (Tesla Kepler), the “const __restrict__” annotation on a kernel argument has an extra GPU-specific meaning: accesses to that argument should go through ...

Mar 26, 2015 Software Engineering Comments

Thrust/CUDA tip: reuse temporary buffer across multiple transforms

Thrust is a very handy STL-like template library for rapid data processing on GPUs.

Oct 09, 2014 Software Engineering Comments

On-the-fly modification of LLVM IR code of CUDA sources

Largely thanks to LLVM, recent years have seen a significant increase in interest in the research and development of domain-specific compilation tools. With the re...

Sep 23, 2014 Software Engineering Comments

How to find CUDA's version of LLVM backend

It is well known that the CUDA toolkit uses an LLVM backend, but the version number it uses is not shown. We can use gdb and an LLVM API function to print the version string:

Jul 14, 2014 Software Engineering Comments

NVIDIA Visual Profiler allows to connect 64-bit Linux server from 32-bit Windows

In the CUDA 6.0 release, an extremely handy feature was added to Visual Profiler: support for remote profiling. This means that you can run the profiler GUI from ...

Jul 13, 2014 Software Engineering Comments

Calling CUDA device function from OpenACC Fortran kernel

OpenACC is known to be a fast method of developing quite efficient GPU-enabled applications. It is also possible to mix CUDA kernels and libraries with OpenACC ke...

Jul 11, 2014 Software Engineering Comments

Jetson K1: bandwidthTest

The chart on the left shows the bandwidths of memory transfers on Jetson K1. For the baseline we also added GTX680M’s host-device and device-host (...

Jun 15, 2014 Software Engineering Comments

Jetson K1: from unboxing straight to CUDA in 5 steps

We finally got the most wanted Jetson K1 board in the house! In this post we show how to turn a freshly unboxed tiny board into a fully functional CUDA development nod...

Jun 14, 2014 Software Engineering Comments

How to break Ubuntu 13.04/14.04 with vanilla CUDA driver and unbreak it back

After installing the CUDA driver from the NVIDIA website, Ubuntu 13.04/14.04 window manager decorations (Unity, via Compiz) may stop working properly on Optimus machines ...

Jun 01, 2014 Software Engineering Comments
