Compile numpy without openblas to reduce pack size

Summary

Compiling numpy yourself on Windows removes the roughly 36 MiB openblas dependency. Building numpy from source on Windows is not as difficult as you might imagine.

Preface

I recently needed to package a tool that does some computation with numpy. A first pass with pyinstaller revealed a huge openblas dependency:

36.4 MiB [##########] libopenblas64__v0.3.23-246-g3d31191b-gcc_10_3_0.dll
10.6 MiB [## ] main.exe
5.5 MiB [# ] python311.dll
4.9 MiB [# ] /numpy
4.8 MiB [# ] /pydantic_core
...

I admire the tools of the past that could do a lot in just a few megabytes. Besides, this program does not need much performance from numpy. So I tried to remove the dependency. There are many discussions online about packaging and reducing package size, but most only go as far as replacing mkl with openblas, and openblas is still quite large.

After some searching, I found that this site provides builds without openblas or mkl, but it was last updated in 2022 and is still at version 1.22.4, which is a bit old. Since third-party builds already exist, I guessed that building my own would not be too difficult.

Preparation

Compiler

On Windows, just download Microsoft C++ Build Tools, run the installer and select Desktop development with C++. Wait for it to finish and you are good to go. There is no need to set environment variables, as the numpy build will locate the compiler on its own later.

Source Code

git clone --recurse-submodules https://github.com/numpy/numpy.git
cd numpy
git checkout maintenance/1.24.x # or 1.25.x, doesn't matter

Disable openblas

In numpy/distutils/, add a file site.cfg with the following content:

[openblas]
libraries =
library_dirs =
include_dirs =

See site.cfg.example in the repository root for a complete description.
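If you rebuild often, the file can also be generated by a script. Here is a minimal sketch using the standard configparser module. The helper name write_blas_free_site_cfg is made up for illustration, and it assumes you pass it the root of a numpy checkout:

```python
import configparser
from pathlib import Path


def write_blas_free_site_cfg(numpy_checkout: Path) -> Path:
    """Write numpy/distutils/site.cfg with an empty [openblas] section.

    Hypothetical helper: the name and the layout check are illustrative only.
    """
    target_dir = Path(numpy_checkout) / "numpy" / "distutils"
    if not target_dir.is_dir():
        raise FileNotFoundError(f"not a numpy checkout: {target_dir}")

    config = configparser.ConfigParser()
    # Empty values: no library names and no search paths for this section.
    config["openblas"] = {
        "libraries": "",
        "library_dirs": "",
        "include_dirs": "",
    }

    target = target_dir / "site.cfg"
    with target.open("w", encoding="utf-8") as fh:
        config.write(fh)
    return target
```

Calling write_blas_free_site_cfg(Path("numpy")) from the directory that contains the checkout should produce the same four lines as above.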

According to the official docs, you can also disable openblas by setting environment variables.
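I did not go down the environment-variable route myself, so treat the following as an untested sketch. It assumes the variables in question are NPY_BLAS_ORDER and NPY_LAPACK_ORDER, and that an empty value tells the distutils-based build to try no accelerated library at all. The function name blas_free_build_env is hypothetical:

```python
import os


def blas_free_build_env(base=None):
    """Return a copy of the environment with accelerated BLAS/LAPACK lookups disabled.

    Assumption: the distutils-based build honours NPY_BLAS_ORDER and
    NPY_LAPACK_ORDER, and an empty value means "use no accelerated library".
    """
    env = dict(os.environ if base is None else base)
    env["NPY_BLAS_ORDER"] = ""
    env["NPY_LAPACK_ORDER"] = ""
    return env
```

The returned dict can be passed as env= to subprocess.run when launching the build. Whether empty values survive on your shell and platform is worth checking with np.show_config() afterwards.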

Virtual environment and dependencies

python -m venv .#env
./.#env/Scripts/activate
pip install -r build_requirements.txt

From here on, all commands are run inside the .#env virtual environment.

Build

For <=1.24.x, you can build with bare msvc like this:

python setup.py build -j 16

However, for 1.25+ this silently fails on Windows, because it generates a compilation command longer than the limit of roughly 32,768 characters that Windows places on a single command line.
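The limit applies to the characters of one command line, not to a number of lines. A rough way to see whether a given argument list would cross it is sketched below. It assumes the CreateProcess limit of 32,767 characters including the terminating null, and it uses subprocess.list2cmdline, an undocumented helper in the standard library. The function name is my own:

```python
import subprocess

# CreateProcess accepts at most 32,767 characters, terminating null included.
WINDOWS_CMDLINE_LIMIT = 32767


def exceeds_windows_cmdline_limit(argv):
    """Return True if argv, joined the way subprocess joins it on Windows, is too long."""
    command_line = subprocess.list2cmdline(argv)
    # +1 accounts for the null terminator.
    return len(command_line) + 1 > WINDOWS_CMDLINE_LIMIT
```

A link step that names a few thousand object files explicitly gets past this limit quickly, which is why build systems usually fall back to response files.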

A viable solution is to switch to using cibuildwheel.

Open pyproject.toml, find [tool.cibuildwheel], and disable before-build, before-test and test-command, either by putting a # in front of each or by deleting them. It should look like:

[tool.cibuildwheel]
skip = "cp36-* cp37-* pp37-* *-manylinux_i686 *_ppc64le *_s390x *-musllinux_aarch64"
build-verbosity = "3"
# before-build = "bash {project}/tools/wheels/cibw_before_build.sh {project}"
# before-test = "pip install -r {project}/test_requirements.txt"
# test-command = "bash {project}/tools/wheels/cibw_test_command.sh {project}"

The before-build script is for downloading openblas. The two test steps don’t work for me.
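If you would rather not edit the file by hand every time you reset the checkout, the same change can be scripted. This is a small sketch, not something from the numpy repository: the function name is made up, and it assumes each of the three keys sits on a single line, as in the snippet above.

```python
import re

KEYS_TO_DISABLE = ("before-build", "before-test", "test-command")


def comment_out_cibw_keys(pyproject_text, keys=KEYS_TO_DISABLE):
    """Prefix the given keys with '# ' inside the [tool.cibuildwheel] table only."""
    key_pattern = re.compile(r"(%s)\s*=" % "|".join(map(re.escape, keys)))
    out = []
    in_table = False
    for line in pyproject_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("["):
            # Every table header switches the flag on or off.
            in_table = stripped == "[tool.cibuildwheel]"
        elif in_table and key_pattern.match(stripped):
            line = "# " + line
        out.append(line)
    return "".join(out)
```

Read pyproject.toml as text, pass it through the function and write the result back. Sub-tables such as [tool.cibuildwheel.linux] are left alone, and running it twice changes nothing.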

Then build in PowerShell:

$env:CIBW_BUILD="cp311-win_amd64" # cp311 means CPython3.11, change to other versions if needed
$env:CIBW_ENVIRONMENT="NPY_USE_BLAS_ILP64=0" # site.cfg does not seem to be read here (not sure why), so set the env var explicitly
cibuildwheel --platform windows
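To build for several Python versions in a row, the three PowerShell lines can be wrapped in a small Python driver. The sketch below only mirrors the variables used above. The function names are hypothetical, and it assumes cibuildwheel is on PATH and that you run it from the numpy checkout:

```python
import os
import subprocess


def cibuildwheel_env(python_tag="cp311", base=None):
    """Environment for a single-target Windows build, mirroring the PowerShell lines."""
    env = dict(os.environ if base is None else base)
    env["CIBW_BUILD"] = f"{python_tag}-win_amd64"
    env["CIBW_ENVIRONMENT"] = "NPY_USE_BLAS_ILP64=0"
    return env


def build_wheel(python_tag="cp311"):
    """Run cibuildwheel in the current directory; raises if the build fails."""
    subprocess.run(
        ["cibuildwheel", "--platform", "windows"],
        env=cibuildwheel_env(python_tag),
        check=True,
    )
```

For example, looping over ("cp310", "cp311") and calling build_wheel for each tag should leave one wheel per version in wheelhouse.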

If everything goes well, you should see a whl file generated under wheelhouse:

-rwxrwxrwx 1 root root 6.1M Aug  7 23:37 wheelhouse/numpy-1.25.2-cp310-cp310-win_amd64.whl
-rwxrwxrwx 1 root root 6.1M Aug 7 23:44 wheelhouse/numpy-1.25.2-cp311-cp311-win_amd64.whl
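The listing above comes from ls. If you are staying inside Windows, a few lines of Python give the same information. This is a sketch with an invented function name, and it relies on the standard wheel file naming (name-version-python-abi-platform.whl):

```python
from pathlib import Path


def list_wheels(wheelhouse="wheelhouse"):
    """Return (filename, python_tag, size_in_MiB) for each wheel, sorted by name."""
    rows = []
    for wheel in sorted(Path(wheelhouse).glob("*.whl")):
        # Wheel names end in ...-python-abi-platform, so the tag is third from the end.
        parts = wheel.stem.split("-")
        python_tag = parts[-3]
        rows.append((wheel.name, python_tag, wheel.stat().st_size / 2**20))
    return rows
```

A wheel of around 6 MiB is a good sign: the openblas DLL alone is several times that size.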

Test

After installing the wheel, check the build configuration:

> python -c "import numpy as np; np.show_config(); print(np.__version__)"
blas_armpl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
blas_ssl2_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
NOT AVAILABLE
accelerate_info:
NOT AVAILABLE
atlas_3_10_blas_threads_info:
NOT AVAILABLE
atlas_3_10_blas_info:
NOT AVAILABLE
atlas_blas_threads_info:
NOT AVAILABLE
atlas_blas_info:
NOT AVAILABLE
blas_info:
NOT AVAILABLE
blas_src_info:
NOT AVAILABLE
blas_opt_info:
NOT AVAILABLE
lapack_armpl_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
lapack_ssl2_info:
NOT AVAILABLE
openblas_lapack_info:
NOT AVAILABLE
openblas_clapack_info:
NOT AVAILABLE
flame_info:
NOT AVAILABLE
atlas_3_10_threads_info:
NOT AVAILABLE
atlas_3_10_info:
NOT AVAILABLE
atlas_threads_info:
NOT AVAILABLE
atlas_info:
NOT AVAILABLE
lapack_info:
NOT AVAILABLE
lapack_src_info:
NOT AVAILABLE
lapack_opt_info:
NOT AVAILABLE
numpy_linalg_lapack_lite:
language = c
define_macros = [('HAVE_BLAS_ILP64', None), ('BLAS_SYMBOL_SUFFIX', '64_')]
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
1.25.2

Very good, no openblas at all. Next, run the test suite:

> python runtests.py -v --no-build 

...
=============== 35028 passed, 1100 skipped, 1308 deselected, 29 xfailed, 2 xpassed in 345.30s (0:05:45) ===============

For comparison, the default binary install reports:

> python -c "import numpy as np; np.show_config(); print(np.__version__)"
openblas64__info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['openblas\\lib']
blas_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['openblas\\lib']
openblas64__lapack_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['openblas\\lib']
lapack_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['openblas\\lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['openblas\\lib']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
1.25.2
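Besides reading the show_config() output, you can check directly whether an openblas or mkl DLL is sitting next to numpy. The sketch below simply scans a directory tree by file name. The function name is made up, and it only looks at .dll files since this post is about Windows:

```python
from pathlib import Path


def find_bundled_blas(root):
    """Return .dll files under root whose name mentions openblas or mkl."""
    hits = []
    for path in Path(root).rglob("*"):
        name = path.name.lower()
        if not path.is_file() or path.suffix.lower() != ".dll":
            continue
        if "openblas" in name or "mkl" in name:
            hits.append(path)
    return sorted(hits)
```

Point it at the site-packages directory of the virtual environment, or later at the pyinstaller output directory. An empty list means no such DLL was bundled.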

Packaging

# test.py

import numpy as np

print(np.random.rand(5, 5))
np.show_config()
print(np.__version__)

With the openblas-free build installed, running pyinstaller test.py gives:

6.9 MiB [##########] /numpy
5.5 MiB [####### ] python311.dll
3.9 MiB [##### ] test.exe
3.3 MiB [#### ] libcrypto-1_1.dll
1.7 MiB [## ] base_library.zip
1.1 MiB [# ] unicodedata.pyd
996.0 KiB [# ] ucrtbase.dll
...
Total disk usage: 26.1 MiB Apparent size: 25.9 MiB Items: 79

With the default install, the result is:

36.4 MiB [##########] libopenblas64__v0.3.23-246-g3d31191b-gcc_10_3_0.dll
5.5 MiB [# ] python311.dll
4.9 MiB [# ] /numpy
3.9 MiB [# ] test.exe
3.3 MiB [ ] libcrypto-1_1.dll
1.7 MiB [ ] base_library.zip
1.1 MiB [ ] unicodedata.pyd
996.0 KiB [ ] ucrtbase.dll
...
Total disk usage: 60.6 MiB Apparent size: 60.5 MiB Items: 81

The difference is about 35 MiB (60.6 MiB versus 26.1 MiB), which is quite significant. With openblas removed, my tool fits in a single 15 MiB file. That is still some way from my goal of a few megabytes, but this is Python after all, so I'm quite satisfied.
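I took the size listings with ncdu, but a similar per-entry breakdown can be produced with a short Python script. This is only a sketch with invented function names. It sums file sizes, so it corresponds to the "apparent size" figure rather than to disk usage:

```python
from pathlib import Path


def entry_sizes(dist_dir):
    """Map each top-level entry in dist_dir to its total size in bytes."""
    sizes = {}
    for entry in Path(dist_dir).iterdir():
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
        else:
            sizes[entry.name] = entry.stat().st_size
    return sizes


def report(dist_dir, top=8):
    """Print the largest entries, biggest first, with sizes in MiB."""
    sizes = entry_sizes(dist_dir)
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ranked[:top]:
        print(f"{size / 2**20:8.1f} MiB  {name}")
    print(f"{sum(sizes.values()) / 2**20:8.1f} MiB  total")
```

Running report on the pyinstaller output directory for both builds makes it easy to see which entry accounts for the difference.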

Misc

  • The commit I built 1.25.x on is ea677928332c37e8052b4d599bf6ee52cf363cf9. Check out that commit if you want to reproduce my build exactly.
  • My Windows version is 22H2 19045.3271.
  • The build itself takes around 2 minutes on my trashy E5-2678.
  • Sizes above are from ncdu output.

https://myuan.fun/Compile-numpy-without-openblas-to-reduce-pack-size/

Author: myuan

Published: 2023-08-07

Updated: 2024-06-04
