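Abstract
On Windows, building numpy yourself lets you drop the 36 MiB openblas dependency, and the build is not as hard as you might expect.
Preface
I recently needed to package a tool that does some numpy computation. A first run of pyinstaller showed one enormous openblas dependency: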
36.4 MiB [##########] libopenblas64__v0.3.23-246-g3d31191b-gcc_10_3_0.dll
10.6 MiB [## ] main.exe
5.5 MiB [# ] python311.dll
4.9 MiB [# ] /numpy
4.8 MiB [# ] /pydantic_core
...
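I have a soft spot for the powerful tools of the past that got a lot done in a few megabytes, and this program does not demand much performance from numpy, so I set out to remove the dependency. There are plenty of discussions online about packaging and shrinking bundles, but they mostly amount to swapping mkl for openblas, and openblas is large enough on its own.
After some searching, I found a page that provides builds without openblas or mkl, but its last update was in 2022 and it is stuck at version 1.22.4, which is a bit dated. Since a third-party build already exists, I figured doing it myself should not be too hard.
Preparation
Compiler
To build numpy on Windows, download Microsoft C++ Build Tools, launch the installer, tick "Desktop development with C++", and wait for it to finish. No environment variables need to be set; numpy locates the compiler by itself later on.
Source code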
git clone --recurse-submodules https://github.com/numpy/numpy.git
cd numpy
git checkout maintenance/1.24.x # or 1.25.x, either works
Disabling openblas
In numpy/distutils/, add a file named site.cfg with the following content:
[openblas]
libraries =
library_dirs =
include_dirs =
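For a full description of this file, see site.cfg.example in the repository root.
According to the official documentation, openblas can also be disabled by setting environment variables.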
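The site.cfg step above can also be scripted, which helps when the checkout is recreated often. The following is a minimal sketch, not part of the numpy tooling: the helper name `write_blank_openblas_cfg` is made up for illustration, and it assumes the numpy/distutils/ directory already exists in the checkout.

```python
import configparser
from pathlib import Path


def write_blank_openblas_cfg(numpy_root: Path) -> Path:
    """Write numpy/distutils/site.cfg with an empty [openblas] section.

    Hypothetical helper: it reproduces the hand-written file from this post
    and does nothing else.
    """
    cfg = configparser.ConfigParser()
    # Empty values mean: no libraries to link, no directories to search.
    cfg["openblas"] = {
        "libraries": "",
        "library_dirs": "",
        "include_dirs": "",
    }
    target = numpy_root / "numpy" / "distutils" / "site.cfg"
    with target.open("w", encoding="utf-8") as fh:
        cfg.write(fh)
    return target


if __name__ == "__main__":
    # Assumes the script is run from the root of the numpy checkout.
    print(write_blank_openblas_cfg(Path(".")))
```

Writing the file through configparser rather than by hand keeps the section syntax valid if more sections are added later.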
Virtual environment and dependencies
python -m venv .#env
./.#env/Scripts/activate
pip install -r build_requirements.txt
From here on, all commands are assumed to run inside the .#env virtual environment.
Build
For versions <= 1.24.x, a bare MSVC build works:
python setup.py build -j 16
From 1.25 onward, however, this approach generates a single compile command longer than 32768 characters on Windows and then fails silently.
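To make the command-length failure easier to recognize, here is a small sketch that measures how long an argument list becomes once it is joined into one Windows command line. It is only an illustration: the object file names are invented, and the 32,767-character figure is the limit Microsoft documents for CreateProcess, used here as the default threshold.

```python
import subprocess

# Documented CreateProcess limit: 32,767 characters including the terminator.
WINDOWS_CMDLINE_LIMIT = 32767


def command_line_length(args: list[str]) -> int:
    """Length of the single string Windows would receive for these arguments."""
    # list2cmdline applies the MSVC runtime quoting rules to the argument list.
    return len(subprocess.list2cmdline(args))


def fits_on_windows(args: list[str], limit: int = WINDOWS_CMDLINE_LIMIT) -> bool:
    """True if the joined command line stays below the limit."""
    return command_line_length(args) < limit


if __name__ == "__main__":
    # Hypothetical link step with many object files, for illustration only.
    objects = [f"obj{i:05d}.obj" for i in range(3000)]
    cmd = ["link.exe", *objects]
    print(command_line_length(cmd), fits_on_windows(cmd))
```

Build tools usually work around this limit with response files, which may be part of why a different build path avoids the problem.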
A workable alternative is cibuildwheel.
Edit pyproject.toml: find [tool.cibuildwheel] and put a # in front of the before-build, before-test, and test-command lines (deleting them works too). The result looks like this:
[tool.cibuildwheel]
skip = "cp36-* cp37-* pp37-* *-manylinux_i686 *_ppc64le *_s390x *-musllinux_aarch64"
build-verbosity = "3"
# before-build = "bash {project}/tools/wheels/cibw_before_build.sh {project}"
# before-test = "pip install -r {project}/test_requirements.txt"
# test-command = "bash {project}/tools/wheels/cibw_test_command.sh {project}"
The before-build script is what downloads openblas; the two test-related entries did not run successfully on my machine.
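The pyproject.toml edit above can be automated as well. This is a rough sketch under stated assumptions: `comment_out_keys` is a hypothetical helper, it works line by line rather than with a real TOML parser, and it treats any line starting with `[` as a table header, so it would misread a multi-line array.

```python
def comment_out_keys(toml_text: str, keys: tuple[str, ...],
                     table: str = "[tool.cibuildwheel]") -> str:
    """Prefix '# ' to lines inside `table` that assign one of `keys`."""
    in_table = False
    out = []
    for line in toml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new table header: only the exact target table is edited.
            in_table = stripped == table
        elif in_table and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in keys:
                line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    from pathlib import Path

    path = Path("pyproject.toml")  # assumes the numpy checkout root as cwd
    keys = ("before-build", "before-test", "test-command")
    path.write_text(comment_out_keys(path.read_text(encoding="utf-8"), keys),
                    encoding="utf-8")
```

Lines that are already commented start with `#`, so their key no longer matches and running the script twice changes nothing.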
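Then build from PowerShell:
$env:CIBW_BUILD="cp311-win_amd64" # cp311 means CPython 3.11; change it to another version if needed
$env:CIBW_ENVIRONMENT="NPY_USE_BLAS_ILP64=0" # I do not know why the later steps ignore site.cfg, so this environment variable reinforces the setting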
cibuildwheel --platform windows
If everything goes well, a .whl file appears in the wheelhouse directory, and that is the one we want.
-rwxrwxrwx 1 root root 6.1M Aug 7 23:37 wheelhouse/numpy-1.25.2-cp310-cp310-win_amd64.whl
-rwxrwxrwx 1 root root 6.1M Aug 7 23:44 wheelhouse/numpy-1.25.2-cp311-cp311-win_amd64.whl
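Before installing, it can be worth checking that the wheel matches the interpreter it is meant for. The sketch below splits a wheel file name into its tags following the standard wheel naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag). `WheelTags` and `parse_wheel_filename` are names invented for this example.

```python
from pathlib import Path
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    base = Path(filename).name
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = base[: -len(".whl")].split("-")
    if len(parts) == 5:
        return WheelTags(*parts)
    if len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
        return WheelTags(name, version, python_tag, abi_tag, platform_tag)
    raise ValueError(f"unexpected wheel name: {filename}")


if __name__ == "__main__":
    print(parse_wheel_filename("wheelhouse/numpy-1.25.2-cp311-cp311-win_amd64.whl"))
```

A python_tag of cp311 corresponds to the CIBW_BUILD value used in the build step.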
Testing
After installing the wheel, look at the diagnostic output:
> python -c "import numpy as np; np.show_config(); print(np.__version__)"
blas_armpl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
blas_ssl2_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
NOT AVAILABLE
accelerate_info:
NOT AVAILABLE
atlas_3_10_blas_threads_info:
NOT AVAILABLE
atlas_3_10_blas_info:
NOT AVAILABLE
atlas_blas_threads_info:
NOT AVAILABLE
atlas_blas_info:
NOT AVAILABLE
blas_info:
NOT AVAILABLE
blas_src_info:
NOT AVAILABLE
blas_opt_info:
NOT AVAILABLE
lapack_armpl_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
lapack_ssl2_info:
NOT AVAILABLE
openblas_lapack_info:
NOT AVAILABLE
openblas_clapack_info:
NOT AVAILABLE
flame_info:
NOT AVAILABLE
atlas_3_10_threads_info:
NOT AVAILABLE
atlas_3_10_info:
NOT AVAILABLE
atlas_threads_info:
NOT AVAILABLE
atlas_info:
NOT AVAILABLE
lapack_info:
NOT AVAILABLE
lapack_src_info:
NOT AVAILABLE
lapack_opt_info:
NOT AVAILABLE
numpy_linalg_lapack_lite:
language = c
define_macros = [('HAVE_BLAS_ILP64', None), ('BLAS_SYMBOL_SUFFIX', '64_')]
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
1.25.2
Good, no openblas anywhere. Now run the tests:
> python runtests.py -v --no-build
...
=============== 35028 passed, 1100 skipped, 1308 deselected, 29 xfailed, 2 xpassed in 345.30s (0:05:45) ===============
For comparison, a regular binary install gives:
> python -c "import numpy as np; np.show_config(); print(np.__version__)"
openblas64__info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['openblas\\lib']
blas_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['openblas\\lib']
openblas64__lapack_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['openblas\\lib']
lapack_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['openblas\\lib']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
1.25.2
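The two dumps above are long, so a script can do the comparison. This sketch parses text in the layout shown in this post (a `name:` line followed by either `NOT AVAILABLE` or key/value lines) and reports whether any openblas section is populated. Both function names are invented for the example, and other numpy versions may print a different layout.

```python
import re


def available_sections(show_config_text: str) -> dict[str, bool]:
    """Map each section name to True if it has content, False if NOT AVAILABLE."""
    sections: dict[str, bool] = {}
    current = None
    for raw in show_config_text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\w+:", line):
            # A bare identifier followed by a colon starts a new section.
            current = line[:-1]
            sections[current] = True
        elif line == "NOT AVAILABLE" and current is not None:
            sections[current] = False
    return sections


def uses_openblas(show_config_text: str) -> bool:
    """True if any section whose name mentions openblas is populated."""
    return any(
        ok and "openblas" in name
        for name, ok in available_sections(show_config_text).items()
    )


if __name__ == "__main__":
    sample = "openblas_info:\n  NOT AVAILABLE\nnumpy_linalg_lapack_lite:\n    language = c\n"
    print(uses_openblas(sample))
```

To feed it live data, the output of np.show_config() can be captured with contextlib.redirect_stdout into an io.StringIO, assuming that version of numpy prints the report to stdout.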
Packaging
# test.py
import numpy as np
print(np.random.rand(5, 5))
np.show_config()
print(np.__version__)
With the openblas-free install, pyinstaller test.py produces:
6.9 MiB [##########] /numpy
5.5 MiB [####### ] python311.dll
3.9 MiB [##### ] test.exe
3.3 MiB [#### ] libcrypto-1_1.dll
1.7 MiB [## ] base_library.zip
1.1 MiB [# ] unicodedata.pyd
996.0 KiB [# ] ucrtbase.dll
...
Total disk usage: 26.1 MiB Apparent size: 25.9 MiB Items: 79
With the default install, the result is:
36.4 MiB [##########] libopenblas64__v0.3.23-246-g3d31191b-gcc_10_3_0.dll
5.5 MiB [# ] python311.dll
4.9 MiB [# ] /numpy
3.9 MiB [# ] test.exe
3.3 MiB [ ] libcrypto-1_1.dll
1.7 MiB [ ] base_library.zip
1.1 MiB [ ] unicodedata.pyd
996.0 KiB [ ] ucrtbase.dll
...
Total disk usage: 60.6 MiB Apparent size: 60.5 MiB Items: 81
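The difference is roughly 35 MiB (60.6 MiB versus 26.1 MiB), which is substantial. Without openblas, my tool fits in a 15 MiB single-file executable. That is still some way from the few megabytes I had in mind, but this is Python after all, and I am quite satisfied.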
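For readers without ncdu, a similar top-level size listing can be approximated in a few lines. This is a sketch under assumptions: `largest_entries` is a made-up helper, the `dist/test` path is a guess at where pyinstaller placed the output, and the numbers are apparent file sizes rather than disk usage.

```python
from pathlib import Path


def largest_entries(root, top: int = 5) -> list[tuple[int, str]]:
    """Return (size_in_bytes, name) for the top-level entries, largest first."""
    sizes = []
    for entry in Path(root).iterdir():
        if entry.is_dir():
            # Sum every regular file below the directory, recursively.
            size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            size = entry.stat().st_size
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    # Assumed output folder; adjust to wherever the bundle actually is.
    for size, name in largest_entries("dist/test"):
        print(f"{size / 2**20:8.1f} MiB  {name}")
```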
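Miscellany
- The HEAD I built 1.25.x from was ea677928332c37e8052b4d599bf6ee52cf363cf9. If anything behaves differently for you, you can move to that commit with
git reset ea677928332c37e8052b4d599bf6ee52cf363cf9
- My Windows version is 22H2 19045.3271
- You need a reliable way to reach the official Python website (a proxy, if your network requires one), because cibuildwheel downloads a fresh Python from there
- On my second-hand E5-2678, the compilation itself takes about two minutes
- The sizes above are ncdu output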