Cover up that code, which I can't endure to look on

Serge Guelton, Ninon Eyrolles

Abstract

When running scientific software on a remote machine or cluster of machines, we temporarily entrust the code to a third party. For scientific Python software, given how easily Python bytecode can be reversed, this means the algorithms are freely available to anyone who has read access to the remote platform.
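To make the point about reversibility concrete, here is a minimal sketch that uses only the standard library. `secret_algorithm` and its constants are hypothetical stand-ins for a private formula. Local names and literal constants are stored as-is in the function's code object, and `dis` lists the instructions that use them. The exact listing varies between CPython versions, so it is not reproduced here.

```python
import dis


def secret_algorithm(x):
    # Hypothetical stand-in for a "proprietary" formula we would like to hide.
    return 2.5 * x + 1000003


code = secret_algorithm.__code__

# Variable names and literal constants survive compilation verbatim.
print(code.co_varnames)  # ('x',)
print(code.co_consts)

# The standard disassembler turns the bytecode back into a readable listing,
# from which the original expression is straightforward to reconstruct.
dis.dis(secret_algorithm)
```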

When this is undesirable, the whole application can be packed into a single binary using frozen modules, but this does not provide a good level of security. We propose a complementary approach based on automatic Python code obfuscation and automatic native code generation, which makes it harder for an attacker to recover the algorithms from the obfuscated application.
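Freezing does not change the nature of the code. CPython's freeze tooling embeds the module's code object in its `marshal` serialization, stored as bytes inside the executable. The sketch below, which assumes that layout, shows that such bytes load straight back into a runnable and disassemblable function. Here `payload` stands in for bytes an attacker would carve out of the binary; locating them is not shown. The marshal format is specific to the Python version, so the round trip is only guaranteed on the same interpreter.

```python
import dis
import marshal

SOURCE = '''
def secret_algorithm(x):
    return 2.5 * x + 1000003
'''

# Freezing a module essentially embeds the marshalled code object of the
# compiled module as a byte array inside the executable.
payload = marshal.dumps(compile(SOURCE, "<secret>", "exec"))

# Anyone who can locate those bytes in the binary can load them back ...
recovered = marshal.loads(payload)
namespace = {}
exec(recovered, namespace)

# ... and then run or disassemble the recovered function.
print(namespace["secret_algorithm"](2.0))  # 1000008.0
dis.dis(namespace["secret_algorithm"])
```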

To achieve this goal, several obfuscation techniques are applied at the Python level, at the interpreter level, and at both levels jointly. They include control-flow and data-flow obfuscations as well as Python opcode remapping, self-modifying code, and junk bytecode injection. Additionally, the Pythran compiler is used to turn part of the Python code into native code, which is harder to reverse.
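Opcode remapping can be pictured as a secret permutation of opcode values that the modified interpreter and the obfuscated bytecode share. The sketch below only illustrates the bytecode side. It uses hypothetical helper names and a seeded `random` permutation as the key. It scrambles the opcode bytes of a function's `co_code` and checks that only the inverse table restores them. It assumes CPython 3.6+ wordcode, where each code unit is two bytes with the opcode first. Patching the interpreter so that it decodes the scrambled opcodes is not shown.

```python
import random


def make_permutation(seed):
    """Return a secret bijection on byte values 0..255 and its inverse."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    inverse = [0] * 256
    for plain, scrambled in enumerate(table):
        inverse[scrambled] = plain
    return table, inverse


def remap_opcodes(co_code, table):
    """Apply `table` to every opcode byte of CPython wordcode.

    Since CPython 3.6 each code unit is two bytes (opcode, argument), so
    opcodes sit at even offsets; argument bytes are left untouched.
    """
    out = bytearray(co_code)
    for i in range(0, len(out), 2):
        out[i] = table[out[i]]
    return bytes(out)


def example(a, b):
    return a * b + 1


table, inverse = make_permutation(seed=1234)
original = example.__code__.co_code
scrambled = remap_opcodes(original, table)

# A stock interpreter (and stock disassembler) would misread `scrambled`;
# only an interpreter built with the matching table could decode it.
assert remap_opcodes(scrambled, inverse) == original
```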

The transformations are applied either systematically or under user control, which helps keep the performance cost within reasonable bounds. They rely on high-level transformations of Python code using the ast module, on low-level modifications of the CPython implementation, and on the Pythran compiler infrastructure.
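As an illustration of an ast-based data-flow transformation, the following toy sketch hides integer literals behind an XOR encoding. It is not the transformation used in this work. `ConstantEncoder`, `KEY`, and the run-time global `_k` are hypothetical names. The key is referenced by name rather than as a literal. Otherwise CPython's compile-time constant folding would typically compute the expression back into the original value.

```python
import ast

KEY = 0x5A3C  # hypothetical obfuscation key


class ConstantEncoder(ast.NodeTransformer):
    """Rewrite each int literal n as `(n ^ KEY) ^ _k`.

    `_k` is a global bound to KEY at run time; referencing it by name keeps
    the compiler's constant folding from undoing the encoding.
    """

    def visit_Constant(self, node):
        # bool is a subclass of int; leave True/False alone.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            encoded = ast.BinOp(
                left=ast.Constant(node.value ^ KEY),
                op=ast.BitXor(),
                right=ast.Name(id="_k", ctx=ast.Load()),
            )
            return ast.copy_location(encoded, node)
        return node


SOURCE = '''
def secret_algorithm(x):
    return 3 * x + 42
'''

tree = ConstantEncoder().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # ast.unparse needs Python 3.9+

namespace = {"_k": KEY}
exec(compile(tree, "<obfuscated>", "exec"), namespace)
print(namespace["secret_algorithm"](10))  # 72
```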

Experiments show that it is possible to turn a small NumPy-based application into a single obfuscated binary that runs almost as fast as the original implementation while being much harder to reverse.