Notes on work mostly hosted on my personal GitHub and the GitHub of APC LLC


Embedding Jupyter-quality rich visual Python into a static website

We all love our CV/blog websites hosted on GitHub Pages. We also love Jupyter notebooks for revolutionizing the look and feel of daily data processing. Now imagine that...

Continue Reading »

Setting up Jitsi Meet

Web-conferencing platforms are on the rise during these unprecedented times. On the other hand, the vulnerabilities of Zoom and its lack of privacy motivate us to ...

Continue Reading »

Building PyTorch without AVX2 on macOS

In order to quickly explore PyTorch internals, I decided to compile and install a Debug build on my local machine. The first problem was that modern Clang surpris...

Continue Reading »

How to get infinite loops to work in CUDA

The CUDA compiler does not handle infinite loops properly. For instance, the loop below will be completely eliminated from the resulting assembly, along with its ...

Continue Reading »

How to fix the CUDA and avx512vlintrin.h incompatibility issue

Recent 5.x and 6.x GCC compilers are causing NVCC to produce the following kind of weird compile errors:

Continue Reading »

Remote profiling with NVIDIA Visual Profiler on a SLURM-based cluster

GPU-equipped clusters are often managed by the SLURM job control system. Essentially, a developer logs into the frontend node via SSH, builds the application and then qu...

Continue Reading »

Using CUDA device functions from OpenACC

OpenACC enables rapid transition of serial C/C++/Fortran into GPU-enabled parallel code. However, due to its high-level nature, OpenACC does not offer access to GPU-s...

Continue Reading »

CUDA-like runtime interface for Xeon Phi

The performance of GPUs can be exposed to applications using two principal kinds of programming interfaces: with manual parallel programming (CUDA or Open...

Continue Reading »

OpenMP 4.0 on NVIDIA CUDA GPUs

Multiple presentations about OpenMP 4.0 support on NVIDIA GPUs date back to 2012. There is, however, still very limited OpenMP 4.0 production-ready tools availabili...

Continue Reading »

Use CUDA 7.0 NVRTC with Thrust

Runtime Compilation (NVRTC), introduced in CUDA 7.0, makes it possible to dynamically compile CUDA kernels during program execution (see example). This functionality allows to...

Continue Reading »
 