Linux Wheel Gotchas
libstdc++ is never bundled, and must never be
The single most important rule for the Linux wheels: the gcc runtime
(libstdc++.so.6, libgcc_s.so.1) must come from the host or the active
conda/pixi env, never from the wheel. scripts/build_linux_wheel.sh skips both
explicitly in the ldd-closure bundling loop, and the test-wheel CI has a
regression guard asserting that the wheel ships no copy and that exactly one
copy is mapped at runtime.
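The first half of that guard (no gcc runtime inside the wheel) needs nothing more than the wheel's zip listing. A minimal Python sketch, not the project's actual CI code; the function name bundled_gcc_runtime is hypothetical, and matching on the file-name prefix (so hash-suffixed copies such as libstdc++-<hash>.so.6 are caught too) is an assumption about how a stray copy would be named:

```python
import zipfile
from pathlib import PurePosixPath

# gcc runtime libraries that must come from the host or the active env.
GCC_RUNTIME_PREFIXES = ("libstdc++", "libgcc_s")


def bundled_gcc_runtime(wheel):
    """Archive members of a wheel (path or file object) that are gcc runtime libs."""
    with zipfile.ZipFile(wheel) as zf:
        hits = []
        for name in zf.namelist():
            # Zip member names always use "/" separators.
            base = PurePosixPath(name).name
            if base.startswith(GCC_RUNTIME_PREFIXES) and ".so" in base:
                hits.append(name)
        return hits
```

In CI this would run against the built artifact, e.g. assert not bundled_gcc_runtime(path_to_wheel).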
Why (gh-119 forensics, 2026-06-10)
Wheels ≤ 0.35.0.4 bundled the conda env's gcc-14 libstdc++.so.6 (the
ldd-closure copied everything outside /usr/lib, and the conda prefix carries
its own gcc runtime). Our libraries resolved it via $ORIGIN rpath while
numpy/scipy — manylinux wheels that never vendor libstdc++ — loaded the system
copy. Result: two libstdc++ copies in every numpy-importing process.
That dual-copy state corrupts the heap: gcc emits template statics (the
std::locale facet registries that std::regex depends on) as
STB_GNU_UNIQUE symbols, which the dynamic loader unifies across the two
copies — so code inlined against one copy operates on the other copy's
internal state. Concretely, Environment.init → PropertyTree::rebuildAutoValidators
→ isSequenceType → std::regex compilation ends up calling
free() on a stack address (valgrind shows the wrong-vtable-slot callee
codecvt::do_unshift inside _BracketMatcher::_M_ready()).
Key experimental facts (seven rounds of no-build CI experiments against released PyPI wheels; results emitted as check-run annotations):
- Every released Linux wheel since at least 0.35.0.3 crashes, both arches — the bug predates the conda-deps migration and the GH #35 preload.
- Import order is irrelevant (numpy-first and tesseract-first both crash).
- Any single active copy is clean: LD_PRELOADing either the system or the bundled copy fixes it, as does not importing numpy, as does deleting the bundled copy so the RPATH falls through to the system one.
- glibc's "double free or corruption (top)" abort is detection, not occurrence: free(stack-address) is undefined behavior that glibc only catches when the fake chunk trips its heuristics. In allocation-rich processes (the pytest suite) the same free corrupts the heap silently; the suite was never protected, only undetected.
- With the bundled copy removed, env.init + kinematics run clean, and the process then hits the second latent bug: the gh-72 teardown crash (the Environment is destroyed before plugin-backed objects → the plugin .sos are dlclose'd → their destructors make virtual calls into unmapped pages), fixed by nb::keep_alive<0, 1> on the Environment getters.
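The "exactly one copy mapped" condition these experiments revolve around can be checked from inside a live process by reading /proc/self/maps, whose last column is the path of the mapped file. A minimal sketch under that assumption (Linux only); the function names are hypothetical:

```python
import os


def mapped_libstdcxx_paths(maps_text):
    """Distinct files named libstdc++* mapped in a /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]; anonymous
        # mappings have no pathname and therefore only five fields.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith("libstdc++"):
                paths.add(path)
    return paths


def count_mapped_libstdcxx():
    """Number of distinct libstdc++ copies mapped into the current process."""
    with open("/proc/self/maps") as maps:
        return len(mapped_libstdcxx_paths(maps.read()))
```

Called after importing numpy and then the bindings, a result greater than 1 reproduces the gh-119 dual-copy state. Counting distinct paths rather than lines matters because each library is mapped several times (text, read-only data, writable data).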
The GH #35 preload is gone
__init__.py used to RTLD_GLOBAL-preload the bundled copy so it would win
the SONAME race against numpy on old distros. Its premise — "two copies
coexist safely, the newer one wins" — is exactly what the forensics falsified,
and it could never help numpy-first processes anyway (it ran too late). Do not
reintroduce it.
Supported libstdc++ floor
The conda-forge-built tesseract libs need a gcc-13+ era libstdc++
(GLIBCXX_3.4.32-ish): ubuntu 24.04+, debian 13+, fedora 38+, or any
conda/pixi env. On older distros, importing the wheel fails loudly with a
missing-GLIBCXX/CXXABI version error. That is by design, and strictly better
than the silent corruption it replaces. The manylinux_2_35 platform tag encodes
only the glibc floor, not the libstdc++ floor, so pip will still install the
wheel on, e.g., ubuntu 22.04 (glibc 2.35), where it then fails at import.
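Whether a given libstdc++.so.6 clears the floor can be checked without importing anything, by scanning it for its GLIBCXX_3.4.N version strings (what strings | grep GLIBCXX does by hand). A heuristic sketch with hypothetical function names; treating GLIBCXX_3.4.32 as a hard threshold is an assumption, since the floor above is only "-ish":

```python
import re

# Version-definition strings look like "GLIBCXX_3.4.32" in the library image.
GLIBCXX_RE = re.compile(rb"GLIBCXX_3\.4\.(\d+)")
# Assumed threshold; the floor stated above is only "GLIBCXX_3.4.32-ish".
REQUIRED_MINOR = 32


def max_glibcxx_minor(data):
    """Highest N among GLIBCXX_3.4.N strings in data (bytes), or None."""
    minors = [int(m.group(1)) for m in GLIBCXX_RE.finditer(data)]
    return max(minors) if minors else None


def libstdcxx_meets_floor(path, required=REQUIRED_MINOR):
    """True if the libstdc++ at path advertises GLIBCXX_3.4.<required> or newer."""
    with open(path, "rb") as lib:
        newest = max_glibcxx_minor(lib.read())
    return newest is not None and newest >= required
```

Point it at the system copy (e.g. /usr/lib/x86_64-linux-gnu/libstdc++.so.6 on Debian/Ubuntu x86_64) or at $CONDA_PREFIX/lib/libstdc++.so.6 to see which one can actually serve the wheel.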
Wheel inside a conda/pixi env on an old distro: the env carries a suitable
libstdc++ at $CONDA_PREFIX/lib, but env activation does not put it on the
loader path for foreign (pip-installed) wheels — export
LD_LIBRARY_PATH="$CONDA_PREFIX/lib" (the ABI canary in wheels-linux.yml
does exactly this on its 22.04 build host).
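When the wheel is driven from Python (test runners, subprocess-based tools), the same fix has to be applied to a child process: the dynamic loader reads LD_LIBRARY_PATH at process startup, so changing os.environ inside an already-running interpreter does not affect later library loads. A sketch with hypothetical helper names; unlike the plain export above, it prepends $CONDA_PREFIX/lib and keeps any existing entries:

```python
import os
import subprocess


def env_with_conda_runtime(environ=None):
    """Return a copy of environ with $CONDA_PREFIX/lib first on LD_LIBRARY_PATH.

    Without an active conda/pixi env (no CONDA_PREFIX) the copy is unchanged.
    """
    env = dict(os.environ if environ is None else environ)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return env
    lib_dir = os.path.join(prefix, "lib")
    # Keep existing entries, dropping empty ones and duplicates of lib_dir.
    rest = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
            if p and p != lib_dir]
    env["LD_LIBRARY_PATH"] = os.pathsep.join([lib_dir] + rest)
    return env


def run_with_conda_runtime(argv):
    """Run argv in a child process whose loader sees the env's libstdc++."""
    return subprocess.run(argv, env=env_with_conda_runtime(), check=True)
```

For example, run_with_conda_runtime([sys.executable, "-m", "pytest"]) runs the suite against the env's libstdc++ rather than the distro's older one.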
Plugin bundling
Plugin factory libraries are dlopen'd by boost_plugin_loader, so the linker
never sees them — build_linux_wheel.sh copies them into the package root
explicitly and they seed the ldd closure. Objects created by those plugins
carry vtables that live in the plugin .sos; binding-level
nb::keep_alive<0, 1> on every Environment getter returning plugin-backed
objects keeps the Environment (and so the loader and the mapped plugins) alive
as long as any such object exists.
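The bundling rules described on this page (plugin factory libraries as explicit seeds, system libraries left to the host, the gcc runtime always skipped) can be sketched as a filter over ldd output. This is a Python illustration, not a transcription of build_linux_wheel.sh; the directory list, the names, and the single-pass structure are assumptions. Because ldd already reports transitive dependencies, one pass over each seed is enough:

```python
import re
import subprocess

# Assumed skip rules: libraries resolved from these system directories are left
# to the host, and the gcc runtime is skipped no matter where it resolves
# (a conda prefix carries its own copy outside these directories).
SYSTEM_DIRS = ("/lib/", "/lib64/", "/usr/lib/", "/usr/lib64/")
GCC_RUNTIME = ("libstdc++.so", "libgcc_s.so")

# Matches "\tlibfoo.so.1 => /abs/path/libfoo.so.1 (0x...)"; lines without an
# absolute path after "=>" (vdso, the loader, "not found") are ignored.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def run_ldd(path):
    """ldd output for one shared object."""
    return subprocess.run(["ldd", path], capture_output=True, text=True,
                          check=True).stdout


def should_bundle(soname, resolved):
    if soname.startswith(GCC_RUNTIME):
        return False  # never vendor libstdc++ / libgcc_s (gh-119)
    return not resolved.startswith(SYSTEM_DIRS)


def ldd_closure(seeds, ldd=run_ldd):
    """Map soname -> path for every dependency of the seeds that should be bundled."""
    bundle = {}
    for seed in seeds:
        for line in ldd(seed).splitlines():
            match = LDD_LINE.match(line)
            if match and should_bundle(*match.groups()):
                bundle.setdefault(match.group(1), match.group(2))
    return bundle
```

The seeds themselves (the compiled extension modules plus the dlopen'd plugin factories) are copied separately; passing ldd as a parameter keeps the filtering logic testable without real binaries.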