Session: Scalable Hierarchical Parallel Computing

Target-audience:
Advanced

This tutorial is targeted at the intermediate Python user who wants to extend Python into hierarchical parallel computing. It provides hands-on examples and the essential performance tips every developer should know for writing effective parallel Python. Attendees will come away with a clear sense of the possibilities and best practices for using Python in simple parallel computing environments.

Many of the parallel Python examples you find focus on the mechanics of getting the parallel infrastructure working with your code, not on actually building good, portable parallel Python. This tutorial is intended to be a broad introduction to writing high-performance parallel Python that is well suited to both the beginner and the veteran developer. Parallel efficiency starts with the speed of the target code itself, so we will start with how to evolve code from for-loops to Python looping constructs and vector programming. We will also discuss tools and techniques to optimize your code for speed and memory performance.
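As a rough illustration of evolving a for-loop into Python looping constructs, the sketch below times three equivalent ways of building a list of squares. The function names, the problem size, and the repeat count are hypothetical choices for illustration and are not taken from the tutorial materials; timings will vary by machine and interpreter.

```python
import timeit


def squares_loop(n):
    """Explicit for-loop with a repeated append call."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """The same computation as a list comprehension."""
    return [i * i for i in range(n)]


def squares_map(n):
    """The same computation as a map over the range."""
    return list(map(lambda i: i * i, range(n)))


n = 10_000  # illustrative problem size
for func in (squares_loop, squares_comprehension, squares_map):
    # Time 20 calls of each variant and report the total.
    elapsed = timeit.timeit(lambda: func(n), number=20)
    print(f"{func.__name__:>22}: {elapsed:.4f} s")
```

With NumPy installed, the vector-programming analogue would be `numpy.arange(n) ** 2`, which moves the loop out of the interpreter entirely.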

The tutorial will give an overview of working with the common parallel communication technologies (threading and multiprocessing) and introduce parallel programming models such as blocking and non-blocking pipes, asynchronous and iterative conditional maps, and map-reduce. We will discuss strategies for extending parallel workflows to utilize hierarchical and heterogeneous computing, distributed parallel computing, and job schedulers.
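The map variants named above can be sketched with the standard library alone. This minimal example assumes a thread-backed pool (`multiprocessing.pool.ThreadPool`) so that it runs as a plain script; `multiprocessing.Pool` offers the same methods, but with processes the pool should be created under an `if __name__ == '__main__':` guard. The `square` function and the input data are hypothetical.

```python
import operator
from functools import reduce
from multiprocessing.pool import ThreadPool


def square(x):
    """Toy task standing in for real work."""
    return x * x


data = range(10)

with ThreadPool(4) as pool:
    # Blocking map: returns only when every result is ready.
    blocking = pool.map(square, data)

    # Iterative map: results are yielded in input order as they arrive.
    iterative = list(pool.imap(square, data))

    # Unordered map: results are yielded in completion order,
    # so they are sorted here to make them comparable.
    unordered = sorted(pool.imap_unordered(square, data))

    # Asynchronous map: returns a handle immediately; get() waits.
    pending = pool.map_async(square, data)
    asynchronous = pending.get(timeout=10)

# Map-reduce: combine the mapped results into a single value.
total = reduce(operator.add, blocking)
print(blocking, total)
```

The asynchronous form is the one that lets the caller do other work between submitting the map and collecting its results.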

At the end of the tutorial, participants should be able to write simple parallel Python scripts, make use of effective parallel programming techniques, and have a framework in place to leverage the power of Python in parallel computing environments.

OUTLINE:

parallel programming ~ (45 min)
  • vector programming vs looping constructs
  • timing, profiling, and code optimization
  • coding for speed and portability
  • scalability with asynchronous computing
  • Exercise(s)
multiprocessing and threads ~~ (45 min)
  • point to point communication
  • blocking and non-blocking (iterative, unordered, and asynchronous) communication
  • task pools and collective communication
  • issues: serialization, working in main
  • Exercise(s)
distributed parallel computing ~~~ (45 min)
  • point to point communication
  • blocking and non-blocking (iterative, unordered, and asynchronous) communication
  • task pools and collective communication
  • issues: serialization and heterogeneity
  • issues: failure detection and reporting
  • Exercise(s)
hierarchical workflow ~~ (45 min)
  • workflow management: ipyparallel, pathos, or scoop?
  • server-client computing
  • cluster schedulers
  • ssh-tunneling
  • Exercise(s)
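
The point-to-point and blocking versus non-blocking items in the outline can be sketched with a pipe. This is a minimal sketch, assuming a worker thread rather than a separate process so that it runs as a plain script; the `worker` function and the values sent are hypothetical.

```python
import threading
from multiprocessing import Pipe


def worker(conn):
    """Receive one value, double it, and send it back."""
    value = conn.recv()      # blocking: waits until a message arrives
    conn.send(value * 2)
    conn.close()


parent_end, child_end = Pipe()  # duplex pipe with two connection ends
thread = threading.Thread(target=worker, args=(child_end,))
thread.start()

# Non-blocking check: the worker has not been sent anything yet,
# so it cannot have replied and there is nothing to read.
nothing_yet = parent_end.poll()

parent_end.send(21)

# Wait up to 5 seconds for the reply instead of blocking indefinitely.
reply = None
if parent_end.poll(5):
    reply = parent_end.recv()

thread.join()
parent_end.close()
print(nothing_yet, reply)
```

Polling with a timeout is one simple way to keep a caller responsive when a peer is slow or has failed.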

This tutorial assumes attendees have basic knowledge of Python and NumPy. It requires Python, NumPy, and pathos to be installed, and optionally ipyparallel and scoop. All packages can be installed under Anaconda or Canopy, with setuptools or pip, and may also be installed with Linux or Macintosh package managers.

An earlier version of the tutorial is available at http://www.pyvideo.org/video/1345/efficient-parallel-python-for-high-performance-co, and a preliminary version of the tutorial is at https://github.com/mmckerns/tuthpc.