Faiss is a library for efficient similarity search and clustering of dense vectors, often used for similarity-based anomaly detection. It has a GPU-accelerated version, but that is officially supported only on Linux. This article covers building faiss-GPU on Windows and installing it as a Python package.
In theory I should add a Windows+CUDA GitHub Actions workflow and get it merged upstream, but I have no intention of wrestling with that for now.
Preparation
- Install Winget, used to download some of the tools. Windows should come with it pre-installed; if not, visit getwinget or Microsoft Docs.
- VS 2019 or 2022 Build Tools. As of now (2025/09/23), Intel oneAPI does not support VS 2026. Run winget install Microsoft.VisualStudio.2022.BuildTools, or open the Visual Studio Installer and select the VS 2022 Build Tools.
- CMake: winget install Kitware.CMake
- Ninja, much faster than MSBuild: winget install Ninja-build.Ninja
- SWIG, which faiss uses to generate the Python extension: winget install SWIG.SWIG
- BLAS, per the official recommendation: on an Intel platform choose Intel oneAPI MKL, otherwise use OpenBLAS
  - Intel oneAPI: winget install Intel.OneAPI.BaseToolkit. It is very large (~2.8 GB). After installation, make sure you can find a file named setvars.bat; with the default install path it is at C:\Program Files (x86)\Intel\oneAPI\setvars.bat
  - OpenBLAS? I haven't tried it; I only succeeded with MKL
- CUDA 12.x: visit https://developer.nvidia.com/cuda-toolkit-archive and pick a version. I tested 12.4 and 12.9 successfully.
- [Optional] Python; if you don't need the Python package, you can skip it.

A quick check that these tools are all on PATH is sketched right after this list.
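This snippet is only an illustration of mine (not part of the faiss repo); it assumes the default executable names and should be run in a fresh terminal so PATH changes are picked up:
import shutil

# Tools the later build steps call; nvcc comes with the CUDA toolkit
for tool in ("git", "cmake", "ninja", "swig", "nvcc", "python"):
    path = shutil.which(tool)
    print(f"{tool:>8} -> {path or 'NOT FOUND'}")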
Install gflags
faiss depends on gflags; just run the following commands in order to install it:
git clone https://github.com/gflags/gflags.git
cd gflags
cmake -B build-out . -G "Ninja" -DCMAKE_INSTALL_PREFIX=C:/opt/gflags # directory will be auto-created and registered
cmake --build build-out --config Release
cmake --install build-out
[Optional] Configure Python
Choose a suitable directory, create a virtual environment, activate it, and install three packages; the later build steps assume this environment is active when you enter the Faiss source directory:
python -m venv .venv
.venv\Scripts\activate
pip install setuptools packaging numpy
Activate environment
First find x64 Native Tools Command Prompt for VS 2022 in the Start menu, open it, and then run C:\Program Files (x86)\Intel\oneAPI\setvars.bat inside it. The result looks like:
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.14.14
** Copyright (c) 2025 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools>"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
:: initializing oneAPI environment...
Initializing Visual Studio command-line environment...
Visual Studio version 17.14.14 environment configured.
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\"
Visual Studio command-line environment initialized for: 'x64'
: advisor -- latest
: compiler -- latest
: dal -- latest
: debugger -- latest
: dev-utilities -- latest
: dnnl -- latest
: dpcpp-ct -- latest
: dpl -- latest
: ipp -- latest
: ippcp -- latest
: mkl -- latest
: ocloc -- latest
: pti -- latest
: tbb -- latest
: umf -- latest
: vtune -- latest
:: oneAPI environment initialized ::
If I run C:\Program Files (x86)\Intel\oneAPI\setvars.bat directly in a plain cmd, it complains that it cannot find the Visual Studio environment.
Clone faiss
I made some modifications to the upstream code; until they are merged upstream, you can clone my branch:
git clone https://github.com/myuanz/faiss.git
cd faiss
From here on, all commands are run in the Faiss source directory.
Configure
# If you need the Python extension, run
cmake . -B build -DFAISS_ENABLE_PYTHON=ON -G "Ninja" -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=ON
# If you don't need the Python extension, run
cmake . -B build -DFAISS_ENABLE_PYTHON=OFF -G "Ninja" -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=ON
If SWIG is not found by CMake, run this in PowerShell to resolve the winget link to the real swig.exe:
' -DSWIG_EXECUTABLE=' + (Get-Item (where.exe swig | Select-Object -First 1) -Force).Target
One possible output is:
-DSWIG_EXECUTABLE=C:\Users\pc\AppData\Local\Microsoft\WinGet\Packages\SWIG.SWIG_Microsoft.Winget.Source_8wekyb3d8bbwe\swigwin-4.3.1\swig.exe
Append this string to the end of the cmake configure command above and run it again.
Build C++ part
cmake --build build -j
cmake --install build --prefix install
I have also uploaded this exported install directory to the releases.
Build Python part
cd build/faiss/python
python setup.py install
Modify loader.py
I’m not sure why, but I needed to modify loader.py:
- from .swigfaiss import *
+ from swigfaiss import *
to import successfully.
Check correctness
I provide an extra try_import_faiss_python.py in the repo. After installing the faiss Python package and modifying the loader, running this file prints some GPU info and performance comparisons. Since the runtime also needs several DLLs, the file adds a few paths at the top:
import os, sys, ctypes
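# NOTE: adjust the CUDA and oneAPI version segments below (v12.9, 2025.2) to match your installation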
for p in (
r'./build/faiss/python',
r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin',
r'C:\Program Files (x86)\Intel\oneAPI\compiler\2025.2\bin',
r'C:\Program Files (x86)\Intel\oneAPI\mkl\2025.2\bin'
):
p = os.path.abspath(p)
os.add_dll_directory(p)
sys.path.append(p)
import time
import numpy as np
import faiss  # must be imported only after the DLL directories above are registered

# The code below was generated by an LLM:
# GPU self-check
ngpu = faiss.get_num_gpus()
print(f"[INFO] Num GPUs detected by FAISS: {ngpu}")
if ngpu == 0:
raise RuntimeError("Can't find GPU")
# %%
# Parameters (adjust as needed; Flat complexity ~ nb*nq*d)
d = 64
nb = 200_000
nq = 40_000 # Increase batch size to avoid launch/scheduling overhead dominating
k = 10
seed = 123
rs = np.random.RandomState(seed)
xb = rs.randn(nb, d).astype('float32')
xq = rs.randn(nq, d).astype('float32')
# Give the vectors a small amount of structure for the sanity check
xb[:100, :] += 3
xq[:5, :] += 3
# ---- CPU baseline ----
cpu = faiss.IndexFlatL2(d)
t0 = time.perf_counter()
cpu.add(xb)
t1 = time.perf_counter()
Dcpu, Icpu = cpu.search(xq, k)
t2 = time.perf_counter()
print(f"[CPU] add {1e3*(t1-t0):.1f} ms search {1e3*(t2-t1):.1f} ms")
... # truncated
Running the script, you will first see the CPU at full load, then the GPU. The code only proves that the build works; it is not meant to be elegant. Maybe in the future I will figure out how to publish a wheel.
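For reference, the truncated part of the script handles the GPU-side comparison. The snippet below is not that code; it is only a minimal sketch of the typical pattern, using faiss.StandardGpuResources and faiss.index_cpu_to_gpu and mirroring the CPU baseline above:
import time
import numpy as np
import faiss  # assumes the DLL directories have been registered as shown earlier

d, nb, nq, k = 64, 200_000, 40_000, 10
rs = np.random.RandomState(123)
xb = rs.randn(nb, d).astype('float32')
xq = rs.randn(nq, d).astype('float32')

# Move a flat L2 index onto GPU 0 and time add/search, like the CPU baseline
res = faiss.StandardGpuResources()  # scratch/temporary GPU memory for this device
gpu = faiss.index_cpu_to_gpu(res, 0, faiss.IndexFlatL2(d))

t0 = time.perf_counter()
gpu.add(xb)
t1 = time.perf_counter()
Dgpu, Igpu = gpu.search(xq, k)
t2 = time.perf_counter()
print(f"[GPU] add {1e3*(t1-t0):.1f} ms  search {1e3*(t2-t1):.1f} ms")
If more than one GPU is available, faiss also provides index_cpu_to_all_gpus, which clones the index across all visible devices.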