Friday 11:45 a.m.–noon

pyMIC: A Python Offload Module for Intel(R) Xeon Phi(tm) Coprocessors

Michael Klemm

Audience level:


The Intel Xeon Phi coprocessor strives to provide additional compute power to HPC applications. This talk introduces the pyMIC module for offloading to the coprocessor from scientific Python codes. It offers an easy-to-use interface for invoking compute kernels while handling data transfers in a flexible yet performant way. We will describe pyMIC's design and show how to use pyMIC in Python HPC applications.


Python is one of the most commonly used programming languages throughout the computing industry. Python has proven to be an easy-to-use, elegant scripting language that allows for rapid prototyping and development of highly flexible software. In recent years, Python has also gained a lot of attention from the high-performance computing (HPC) community. Add-on packages such as NumPy and SciPy provide efficient implementations of key data structures and algorithms. Since implementing extensions with compiled languages such as C or Fortran is relatively straightforward, performance aspects no longer prohibit the use of Python as an HPC language.

The growing need for speed in HPC also drives the adoption of coprocessor hardware that accelerates the compute-intensive floating-point operations of typical applications. General-purpose graphics processing units (GPGPUs) and the Intel(R) Xeon Phi(tm) coprocessor are examples of discrete extension cards that provide additional compute power on top of traditional CPUs such as the Intel(R) Xeon(R) processors. These extensions typically require programming in native languages such as C/C++ plus a device-specific programming model (e.g., OpenCL).

The pyMIC offload module for Python strives to improve the situation by providing an easy-to-use, slim interface that enables offloading to the Intel Xeon Phi coprocessor from the Python level. The current version of pyMIC supports asynchronous invocation of kernels that have been implemented in C/C++ together with a native multi-threading model. Its design allows users to start with a rather simple offloading solution that can later be refined by adding progressively finer-grained control over buffer management, data transfers, and the offload process. To enable offloading for scientific Python applications, pyMIC integrates with the NumPy ndarray class and SciPy. Key pyMIC features are built around NumPy arrays; pyMIC provides array operations as offloaded kernels.
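The step from a simple offload to a refined one can be illustrated with a small, self-contained sketch. It does not use pyMIC or any coprocessor: `FakeDevice`, `bind`, `fetch`, `invoke`, and `scale_kernel` are hypothetical stand-ins invented for this illustration, and the "device" merely counts host-to-device and device-to-host copies. The sketch only shows why explicit control over buffers and transfers can pay off.

```python
# Hypothetical stand-in for an offload device; this is NOT the pyMIC API.
class FakeDevice:
    """Simulates a coprocessor by counting host<->device copies."""

    def __init__(self):
        self.transfers = 0

    def bind(self, host_data):
        # Copy host data into a device-resident buffer (one transfer).
        self.transfers += 1
        return list(host_data)

    def fetch(self, device_buffer):
        # Copy a device buffer back to the host (one transfer).
        self.transfers += 1
        return list(device_buffer)

    def invoke(self, kernel, *device_buffers):
        # Run a kernel on data that already resides on the device.
        kernel(*device_buffers)


def scale_kernel(buf, factor_buf):
    # Stand-in for a compiled C/C++ kernel: scale the buffer in place.
    factor = factor_buf[0]
    for i in range(len(buf)):
        buf[i] *= factor


def naive_offload(device, data, factor, steps):
    # Naive version: ship all data in and out around every kernel call.
    for _ in range(steps):
        d_data = device.bind(data)
        d_factor = device.bind([factor])
        device.invoke(scale_kernel, d_data, d_factor)
        data = device.fetch(d_data)
    return data


def refined_offload(device, data, factor, steps):
    # Refined version: keep buffers resident on the device and
    # transfer only once before and once after the loop.
    d_data = device.bind(data)
    d_factor = device.bind([factor])
    for _ in range(steps):
        device.invoke(scale_kernel, d_data, d_factor)
    return device.fetch(d_data)


naive_dev, refined_dev = FakeDevice(), FakeDevice()
naive_result = naive_offload(naive_dev, [1.0, 2.0], 2.0, steps=3)
refined_result = refined_offload(refined_dev, [1.0, 2.0], 2.0, steps=3)
print(naive_dev.transfers, refined_dev.transfers)
```

Both variants compute the same result; in this toy model the refined variant simply performs fewer copies because the data stays on the simulated device between kernel invocations.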

The presentation will introduce and discuss pyMIC's design principles and will show its usage through several instructive examples that are inspired by real-world HPC applications. The examples will explain how to start from a very simple, naive offload solution and how to incrementally refine the solution to optimize for performance. The presentation will show how pyMIC can be used to add offloading capabilities to GPAW, a software package for electronic-structure calculations. The talk closes with a discussion of micro-benchmark results as well as the performance of two pyMIC-enabled applications, GPAW and PyFR.