1. Do not repeatedly create new objects.
  2. Avoid nested loops where possible.
  3. When chasing speed, avoid file I/O; it is very time-consuming.
  4. For enumeration strategies, the performance bottleneck is usually the evaluation of the subproblem objective function, because every iteration has to compute it: a 5 ms versus 0.5 ms per-call difference can blow up to 1000 s versus 100 s overall. This is the key target for optimization: use NumPy matrix operations wherever possible instead of native Python data structures, and use Numba to enable parallel computation on the CPU for a further speedup.
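As a minimal sketch of tip 4, here is the same reduction written both ways: a native Python loop versus a single vectorized NumPy call (the function names are illustrative, not from the original notes). The loop pays interpreter overhead on every iteration, while the vectorized version runs its loop in C.

```python
import numpy as np

a = np.arange(10000, dtype=np.float64).reshape(100, 100)

# Native-Python style: loop over indices (slow; each iteration pays
# interpreter and dispatch overhead)
def trace_tanh_loop(a):
    total = 0.0
    for i in range(a.shape[0]):
        total += np.tanh(a[i, i])
    return total

# NumPy style: one vectorized call over the diagonal (fast; the loop runs in C)
def trace_tanh_vec(a):
    return np.tanh(np.diag(a)).sum()

assert np.isclose(trace_tanh_loop(a), trace_tanh_vec(a))
```

Both compute the same value; on large inputs the vectorized form is typically one to two orders of magnitude faster.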
The Numba decorator that enables parallel CPU computation:

from numba import njit

@njit(parallel=True)  # parallel computation on the CPU

A ~5 minute guide to Numba

Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops. The most common way to use Numba is through its collection of decorators that can be applied to your functions to instruct Numba to compile them. When a call is made to a Numba-decorated function it is compiled to machine code “just-in-time” for execution and all or part of your code can subsequently run at native machine code speed!

For information on supported platforms, operating systems, and architectures, please refer to the version support table (see Numba docs) and the support tiers documentation.

How do I get it?

Numba is available as a conda package for the Anaconda Python distribution:

$ conda install numba

Numba also has wheels available:

$ pip install numba

Numba can also be compiled from source, although we do not recommend it for first-time Numba users.

Numba is often used as a core package so its dependencies are kept to an absolute minimum, however, extra packages can be installed as follows to provide additional functionality:

  • scipy - enables support for compiling numpy.linalg functions.
  • colorama - enables support for color highlighting in backtraces/error messages.
  • pyyaml - enables configuration of Numba via a YAML config file.
  • intel-cmplr-lib-rt - allows the use of the Intel SVML (high performance short vector math library, x86_64 only). Installation instructions are in the performance tips (see Numba docs).

Will Numba work for my code?

This depends on what your code looks like. If your code is numerically oriented (does a lot of math), uses NumPy a lot and/or has a lot of loops, then Numba is often a good choice. In these examples we’ll apply the most fundamental of Numba’s JIT decorators, @jit, to try and speed up some functions to demonstrate what works well and what does not.

Numba works well on code that looks like this:

from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

print(go_fast(x))

It won’t work very well, if at all, on code that looks like this:

from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit(forceobj=True, looplift=True) # Need to use object mode, try and compile loops!
def use_pandas(a): # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))

Note that Pandas is not understood by Numba and as a result Numba would simply run this code via the interpreter but with the added cost of the Numba internal overheads!

What is object mode?

The Numba @jit decorator fundamentally operates in two compilation modes, nopython mode and object mode. In the go_fast example above, the @jit decorator defaults to operating in nopython mode. The behaviour of the nopython compilation mode is to essentially compile the decorated function so that it will run entirely without the involvement of the Python interpreter. This is the recommended and best-practice way to use the Numba jit decorator as it leads to the best performance.

Should the compilation in nopython mode fail, Numba can compile using object mode. This is achieved through using the forceobj=True keyword argument to the @jit decorator (as seen in the use_pandas example above). In this mode Numba will compile the function with the assumption that everything is a Python object and essentially run the code in the interpreter. Specifying looplift=True might gain some performance over pure object mode as Numba will try and compile loops into functions that run in machine code, and it will run the rest of the code in the interpreter. For best performance avoid using object mode in general!

How to measure the performance of Numba?

First, recall that Numba has to compile your function for the argument types given before it executes the machine code version of your function. This takes time. However, once the compilation has taken place Numba caches the machine code version of your function for the particular types of arguments presented. If it is called again with the same types, it can reuse the cached version instead of having to compile again.

A really common mistake when measuring performance is to not account for the above behaviour and to time code once with a simple timer that includes the time taken to compile your function in the execution time.

For example:

from numba import jit
import numpy as np
import time

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a): # Function is compiled and runs in machine code
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.perf_counter()
go_fast(x)
end = time.perf_counter()
print("Elapsed (with compilation) = {}s".format((end - start)))

# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.perf_counter()
go_fast(x)
end = time.perf_counter()
print("Elapsed (after compilation) = {}s".format((end - start)))

This, for example, prints:

Elapsed (with compilation) = 0.33030009269714355s
Elapsed (after compilation) = 6.67572021484375e-06s

A good way to measure the impact Numba JIT has on your code is to time execution using the timeit module functions; these measure multiple iterations of execution and, as a result, can be made to accommodate for the compilation time in the first execution.

As a side note, if compilation time is an issue, Numba JIT supports on-disk caching of compiled functions and also has an Ahead-Of-Time compilation mode.

How fast is it?

Assuming Numba can operate in nopython mode, or at least compile some loops, it will target compilation to your specific CPU. Speed up varies depending on application but can be one to two orders of magnitude. Numba has a performance guide that covers common options for gaining extra performance.

How does Numba work?

Numba reads the Python bytecode for a decorated function and combines this with information about the types of the input arguments to the function. It analyzes and optimizes your code, and finally uses the LLVM compiler library to generate a machine code version of your function, tailored to your CPU capabilities. This compiled version is then used every time your function is called.

Other things of interest:

Numba has quite a few decorators, we’ve seen @jit, but there’s also:

  • @njit - this is an alias for @jit(nopython=True) as it is so commonly used!
  • @vectorize - produces NumPy ufuncs (with all the ufunc methods supported) (see Numba docs).
  • @guvectorize - produces NumPy generalized ufuncs (see Numba docs).
  • @stencil - declare a function as a kernel for a stencil-like operation (see Numba docs).
  • @jitclass - for jit-aware classes (see Numba docs).
  • @cfunc - declare a function for use as a native callback (to be called from C/C++ etc.) (see Numba docs).
  • @overload - register your own implementation of a function for use in nopython mode, e.g. @overload(scipy.special.j0) (see Numba docs).

Extra options available in some decorators:

  • parallel = True - enable the automatic parallelization of the function.
  • fastmath = True - enable fast-math behaviour for the function.

ctypes/cffi/cython interoperability:

  • cffi - The calling of CFFI functions is supported in nopython mode.
  • ctypes - The calling of ctypes wrapped functions is supported in nopython mode.
  • Cython exported functions are callable.

GPU targets:

Numba can target Nvidia CUDA GPUs. You can write a kernel in pure Python and have Numba handle the computation and data movement (or do this explicitly). See the Numba documentation on CUDA.

Why are interpreted languages slower than compiled languages?

Aspect | Compiled languages | Interpreted languages
Core difference | Translate ahead of time; the CPU executes machine code directly. | Translate while executing, line by line.
Startup speed | Slow (must compile first). | Fast (no compile step; runs immediately).
Execution speed | Fast: thoroughly optimized, no runtime translation overhead. | Relatively slow: every step carries translation and type-checking overhead.
Optimization | Strong: global, deep static optimization is possible. | Limited: a JIT can optimize hot code paths, but constraints remain.
Memory usage | Usually lower. | Usually higher (the interpreter and JIT compiler themselves consume memory).
Typical examples | C, C++, Rust, Go | Python, Ruby, JavaScript (originally), PHP (traditionally)

How does the Python interpreter work?

Python also compiles before it runs, but it compiles to bytecode (.pyc files) rather than machine code. (The strength of an interpreted language is flexibility: dynamic typing, the ability to modify code at runtime, and so on. And although compilation is a one-time cost, the compilation itself takes time; if a piece of code only ever runs once, compiling it is wasted work, and compiling to machine code only pays off when the code is reused.) Taking CPython as an example, the interpreter's job is to read your bytecode instruction by instruction and dispatch to the corresponding method implemented in C (which is already compiled machine code). This is why interpreted languages are slow in loop-heavy scenarios: every line has to be translated again on every pass (bytecode -> corresponding C method -> machine code; the CPU cannot read bytecode, only machine code can be executed directly).
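The bytecode described above can be inspected with the standard-library dis module; this small sketch (with an illustrative function name) prints the instructions CPython actually interprets.

```python
import dis

def add_one(x):
    return x + 1

# dis.dis shows the compiled bytecode: one interpreter instruction per line,
# each of which dispatches to a C routine at runtime
dis.dis(add_one)
```

Even this one-line function expands to several bytecode instructions (load, add, return), and each one is dispatched separately on every call, which is exactly the per-step overhead the text describes.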

References

  1. https://numba.pydata.org/
  2. https://numpy.org/