Middleware for Using Multiple GPUs in Python


Project Description

To enable programmers to take advantage of multiple GPUs, NVIDIA has improved data transfer rates between locally hosted GPUs and improved the interaction between GPUs and multithreaded/multiprocess applications. The latest releases of NVIDIA's CUDA programming environment also provide mechanisms for interleaving kernel execution with data transfers between host and device memory. Although these features can be accessed from Python via the PyCUDA package [Klöckner2012], using them effectively to accelerate a neural circuit emulation that spans multiple GPUs is currently nontrivial. To decouple the efficient management of, and communication between, GPUs from the neural emulation applications that require such functionality, we wish to develop a scalable middleware layer that exploits the above mechanisms to achieve optimal data transfer rates without programmer intervention, and that provides a high-level Python API through which developers can easily access multiple local and remote GPUs.
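
The snippet below illustrates the interleaving mechanisms mentioned above as PyCUDA exposes them through CUDA streams and asynchronous copies. It is only a sketch: the doubling kernel, array sizes, and launch configuration are arbitrary placeholders rather than part of the proposed middleware.

    import numpy as np
    import pycuda.autoinit              # creates a context on the default device
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void scale(float *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }
    """)
    scale = mod.get_function("scale")

    n = 1 << 20
    # Page-locked host buffers are needed for the copies to be truly asynchronous.
    a = drv.pagelocked_empty(n, np.float32); a[:] = 1.0
    b = drv.pagelocked_empty(n, np.float32); b[:] = 2.0
    a_gpu, b_gpu = drv.mem_alloc(a.nbytes), drv.mem_alloc(b.nbytes)

    s1, s2 = drv.Stream(), drv.Stream()

    # The copy and kernel launch on stream s1 can overlap with the copy on stream s2.
    drv.memcpy_htod_async(a_gpu, a, s1)
    scale(a_gpu, np.int32(n), block=(256, 1, 1), grid=(n // 256, 1), stream=s1)
    drv.memcpy_htod_async(b_gpu, b, s2)
    scale(b_gpu, np.int32(n), block=(256, 1, 1), grid=(n // 256, 1), stream=s2)

    drv.memcpy_dtoh_async(a, a_gpu, s1)
    drv.memcpy_dtoh_async(b, b_gpu, s2)
    s1.synchronize()
    s2.synchronize()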

Possible Project Goals

  • Use GPUDirect peer-to-peer transfers to accelerate communication between local Fermi-based GPUs (see the first sketch after this list).
  • Take advantage of existing networking packages such as ZeroMQ or OpenMPI to avoid reinventing the wheel.
  • Exploit asynchronous execution by automatically interleaving kernel launches and data transfers. One possible approach is to use non-stop kernels [Sun2010].
  • Support useful high-level communication patterns (e.g., scatter/gather).
  • Provide a consistent API for accessing both local and remote GPUs (see the second sketch after this list).
  • Enable scaling over a range of GPU hardware configurations.
  • Ensure interoperability with PyCUDA (or at least provide a PyCUDA-like interface to the infrastructure).
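
As a rough illustration of the first goal, the sketch below performs a direct device-to-device copy between two GPUs through PyCUDA's driver bindings. It assumes two CUDA-capable devices in one host; whether the copy actually takes the peer-to-peer PCIe path depends on the hardware and driver version, and context cleanup is omitted for brevity.

    import numpy as np
    import pycuda.driver as drv

    drv.init()
    dev0, dev1 = drv.Device(0), drv.Device(1)
    print("direct P2P path available:", bool(dev1.can_access_peer(dev0)))

    # Stage some data on device 0.
    ctx0 = dev0.make_context()
    a = np.arange(1 << 20, dtype=np.float32)
    src = drv.mem_alloc(a.nbytes)
    drv.memcpy_htod(src, a)
    ctx0.pop()

    # Copy it to a buffer on device 1; the driver uses the peer-to-peer
    # path when available and stages through the host otherwise.
    ctx1 = dev1.make_context()
    dst = drv.mem_alloc(a.nbytes)
    drv.memcpy_peer(dst, src, a.nbytes, dest_context=ctx1, src_context=ctx0)
    drv.Context.synchronize()
    ctx1.pop()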
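
The second sketch is a hypothetical illustration of how ZeroMQ could provide the transport for remote GPUs and for scatter/gather-style patterns. All names here (gpu_worker, scatter_gather, the endpoints, and the doubling computation) are placeholders for illustration, not a proposed API.

    import numpy as np
    import zmq

    def gpu_worker(endpoint):
        """Run on a GPU-equipped host: apply a placeholder computation to each array received."""
        import pycuda.autoinit           # one CUDA context per worker process
        import pycuda.gpuarray as gpuarray
        sock = zmq.Context().socket(zmq.REP)
        sock.bind(endpoint)
        while True:
            x = sock.recv_pyobj()                    # receive a NumPy slice
            y = (2 * gpuarray.to_gpu(x)).get()       # placeholder GPU computation
            sock.send_pyobj(y)

    def scatter_gather(endpoints, x):
        """Scatter slices of x to the workers at the given endpoints and gather the results."""
        ctx = zmq.Context()
        socks = [ctx.socket(zmq.REQ) for _ in endpoints]
        for s, e in zip(socks, endpoints):
            s.connect(e)
        chunks = np.array_split(x, len(socks))
        for s, c in zip(socks, chunks):              # scatter
            s.send_pyobj(c)
        return np.concatenate([s.recv_pyobj() for s in socks])   # gather

    # Hypothetical usage:
    #   gpu_worker("tcp://*:5555")                                # on each worker host
    #   scatter_gather(["tcp://hostA:5555", "tcp://hostB:5555"],  # on the client
    #                  np.ones(1 << 20, np.float32))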

Skills Gained

  • Familiarity with cutting-edge features of GPU programming platforms.
  • Experience developing parallel software for platforms comprising both CPUs and GPUs.
  • Experience using real-world networking/messaging platforms to develop performance-critical applications.