Building PyTorch without AVX2 on macOS
In order to quickly explore PyTorch internals, I decided to compile and install a Debug build on my local machine. The first problem was that modern Clang surpris...
We all love our CV/blog websites hosted on GitHub Pages. We also love Jupyter notebooks for revolutionizing the look and feel of daily data processing. Now imagine that...
Web-conferencing platforms are on the rise during these unprecedented times. On the other hand, the vulnerabilities of Zoom and its lack of privacy motivate us to ...
In order to debug a GPU kernel with cuda-gdb, we add -G -O0 to the nvcc command line, which in the case of CMake would be:
You may want to keep your NVIDIA GPU out of any desktop rendering, for many reasons. While this is the default on headless servers, personal sys...
The CUDA compiler does not handle infinite loops properly. For instance, the loop below will be completely eliminated from the resulting assembly, along with its ...
Recent 5.x and 6.x GCC compilers are causing NVCC to produce the following kind of weird compile errors:
CVS is still around for many important projects, making it difficult to scale their development. Tutorials available for this topic are not robust enough for ease...
Overleaf deploys Git to track collaborative modifications to projects. Moreover, a user has an option to work with Overleaf’s Git backend directly. It supports Gi...
Suppose we have a crash while compiling a huge application from source, e.g. a Python package with native C++ code. A source file fails to compile with the followin...
There is an interesting and little-known feature of the APT package manager: the ability to automatically choose a download mirror for every individual operation.
The number of providers having problems with IPv4 support is growing. Recently we came across an ISP that offers only IPv6 and is able to connect only to IPv6 ...
Database corruption always happens before we prepare for it. “Back up or give up” is the most frequently recommended solution. The main reason is database engines...
Upgrading from Zulip 5 to Zulip 6 requires updating PostgreSQL from version 10 to 14.
In the most recent version of PyQt5, QWebEngineView refuses to draw any page content. Apparently, the solution is to disable sandboxing, as mentioned in this comme...
GPU-equipped clusters are often managed by the SLURM job control system. Essentially, a developer logs into the frontend node via SSH, builds the application and then qu...
OpenACC enables a rapid transition of serial C/C++/Fortran into GPU-enabled parallel code. However, due to its high-level nature, OpenACC does not offer access to GPU-s...
The performance of GPUs can be exposed to applications through two principal kinds of programming interfaces: manual parallel programming (CUDA or Open...
Multiple presentations about OpenMP 4.0 support on NVIDIA GPUs date back to 2012. There is however still very limited OpenMP 4.0 production-ready tools availabili...
Runtime Compilation (NVRTC), introduced in CUDA 7.0, allows CUDA kernels to be compiled dynamically during program execution (see example). This functionality allows to...
Starting from GK110 (Tesla Kepler), the “const restrict” annotation on a kernel argument has an extra GPU-specific meaning: accesses to that argument should go through ...
Thrust is a very handy STL-like template library for rapid data processing on GPUs.
Largely thanks to LLVM, in recent years we’ve seen a significant increase of interest in domain-specific compilation tools research & development. With the re...
It is well known that the CUDA toolkit uses an LLVM backend, but the version number it uses is not shown. We can use gdb and an LLVM API function to print the version string:
In the CUDA 6.0 release, an extremely handy feature was added to Visual Profiler: support for remote profiling. This means that you can run the profiler GUI from ...
OpenACC is known to be a fast method of developing quite efficient GPU-enabled applications. It is also possible to mix CUDA kernels and libraries with OpenACC ke...
The chart on the left shows the bandwidths of memory transfers on Jetson K1. For the baseline we also added GTX680M’s host-device and device-host (...
We finally got the most wanted Jetson K1 board in the house! In this post we show how to turn a just unboxed tiny board into a fully functional CUDA development nod...
After installing the CUDA driver from the NVIDIA website, Ubuntu 13.04/14.04 window manager decorations (Unity, via Compiz) may stop working properly on Optimus machines ...
Suppose we need to profile the following MPI-CUDA program on a GPU cluster. The most obvious way to profile this code on a console-only cluster would be to invoke th...
If cuda-gdb throws Program received signal CUDA_EXCEPTION_4, Warp Illegal Instruction. for the following code line: