Subject
memchr is a #![no_std] Rust library by Andrew Gallant (BurntSushi) and bluss that provides byte and substring search primitives optimised with SIMD. The crate exposes memchr, memchr2, memchr3 and their reverse counterparts, the corresponding iterator wrappers, and the memmem sub-module for forward/reverse substring search (Finder, FinderRev, find_iter, rfind_iter, find, rfind). Under the arch:: namespace it also re-exports the underlying per-architecture searchers (x86_64 SSE2 and AVX2, aarch64 NEON, wasm32 simd128) and the architecture-independent fall-back implementations (Two-Way, Rabin-Karp, Shift-Or, plus a "packedpair" SIMD prefilter). At construction time a meta-searcher selects an implementation based on the needle and runtime CPU features; for small inputs a Rabin-Karp fast-path is used. The crate has no build.rs, no proc macros, and two optional dependencies: log (for diagnostic trace! messages when the logging feature is enabled) and rustc-std-workspace-core (only when consumed from inside std itself via rustc-dep-of-std).
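To make the Rabin-Karp fall-back mentioned above concrete, here is a minimal rolling-hash sketch in std-only Rust. It is not the crate's implementation: the function name rabin_karp_find, the BASE constant and the 32-bit wrapping hash are all assumptions chosen for illustration.

```rust
/// Illustrative Rabin-Karp substring search (NOT memchr's actual code).
/// Returns the index of the first occurrence of `needle` in `haystack`.
fn rabin_karp_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let (n, m) = (haystack.len(), needle.len());
    if m == 0 {
        return Some(0); // by convention, the empty needle matches at offset 0
    }
    if m > n {
        return None;
    }

    // Hypothetical hash base; arithmetic wraps modulo 2^32.
    const BASE: u32 = 257;

    // BASE^(m-1), needed to remove the outgoing byte when the window slides.
    let mut pow = 1u32;
    for _ in 1..m {
        pow = pow.wrapping_mul(BASE);
    }

    // Polynomial hash of a byte window.
    let hash = |bytes: &[u8]| {
        bytes
            .iter()
            .fold(0u32, |h, &b| h.wrapping_mul(BASE).wrapping_add(b as u32))
    };

    let target = hash(needle);
    let mut h = hash(&haystack[..m]);
    let mut i = 0;
    loop {
        // A hash match is only a candidate; confirm with a byte comparison.
        if h == target && &haystack[i..i + m] == needle {
            return Some(i);
        }
        if i + m >= n {
            return None;
        }
        // Slide the window: drop haystack[i], append haystack[i + m].
        h = h
            .wrapping_sub(pow.wrapping_mul(haystack[i] as u32))
            .wrapping_mul(BASE)
            .wrapping_add(haystack[i + m] as u32);
        i += 1;
    }
}
```

Because there is no table or vector setup beyond hashing the needle, this style of search has low start-up cost, which is one plausible reason to prefer it when the input is small.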
Methodology
Tooling used:
- openvet audit new (0.6.0) to fetch and unpack the crate from crates.io and clone the upstream GitHub repository at the commit recorded in .cargo_vcs_info.json.
- diff -r (Apple Darwin) to compare published crate contents against the upstream VCS checkout.
- grep to scan contents/src/ for unsafe, extern "C", process::*, std::net::*, std::fs::*, env::*, std::thread, transmute, raw-pointer manipulation, allocator usage, and panic-prone calls.
- Manual reading of the crate-level entry point (src/lib.rs), the Vector and MoveMask traits (src/vector.rs), the pointer-extension helpers (src/ext.rs), the meta-searcher dispatch logic (src/memmem/searcher.rs), the safety-documentation conventions in src/arch/generic/memchr.rs, and the prose headers of src/arch/all/twoway.rs, src/arch/all/rabinkarp.rs and src/arch/all/shiftor.rs. The four per-architecture vector implementations (~5K LOC) were spot-checked but not read end to end.
- Survey of the test suite under src/tests/ (~1300 LOC of quickcheck-based property tests and naive reference implementations) and of the upstream-only fuzz/ directory (8 cargo-fuzz targets covering every public byte- and substring-search entry point).
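The "naive reference implementation" style of testing surveyed above can be sketched as follows. This is an assumption-laden illustration, not code from src/tests/: the names naive_memchr, naive_find, xorshift and check_agreement are hypothetical, a hand-written xorshift generator stands in for quickcheck, and the standard library's slice methods stand in for the optimised searcher under test.

```rust
/// Obviously-correct byte search used as the oracle.
fn naive_memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    for (i, &b) in haystack.iter().enumerate() {
        if b == needle {
            return Some(i);
        }
    }
    None
}

/// Obviously-correct substring search used as the oracle.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.len() > haystack.len() {
        return None;
    }
    (0..=haystack.len() - needle.len()).find(|&i| &haystack[i..i + needle.len()] == needle)
}

/// Tiny deterministic xorshift32 generator (stand-in for quickcheck input generation).
fn xorshift(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

/// Compares the oracle against a candidate on many small pseudo-random inputs.
/// The "candidate" here is std's own search, purely as a placeholder.
fn check_agreement(rounds: usize) -> bool {
    let mut s = 0x9E37_79B9u32;
    for _ in 0..rounds {
        // A three-letter alphabet makes matches and near-misses frequent.
        let len = (xorshift(&mut s) % 16) as usize;
        let haystack: Vec<u8> = (0..len).map(|_| b'a' + (xorshift(&mut s) % 3) as u8).collect();
        let nlen = (xorshift(&mut s) % 4) as usize;
        let needle: Vec<u8> = (0..nlen).map(|_| b'a' + (xorshift(&mut s) % 3) as u8).collect();
        let byte = b'a' + (xorshift(&mut s) % 3) as u8;

        if naive_memchr(byte, &haystack) != haystack.iter().position(|&b| b == byte) {
            return false;
        }
        // `windows(0)` panics, so the empty needle is handled separately.
        let candidate = if needle.is_empty() {
            Some(0)
        } else {
            haystack.windows(needle.len()).position(|w| w == &needle[..])
        };
        if naive_find(&haystack, &needle) != candidate {
            return false;
        }
    }
    true
}
```

The value of this pattern is that the oracle is simple enough to verify by inspection, so any disagreement points at the optimised implementation rather than at the test.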
The published memchr-2.8.1.crate was diffed against the upstream repository at the commit pinned in .cargo_vcs_info.json. Every file under src/ matches the upstream tree byte-for-byte; differences are limited to cargo's Cargo.toml normalisation, the auto-generated Cargo.lock, and the upstream-only .github/, benchmarks/, fuzz/ and scripts/ directories (excluded via the exclude list in Cargo.toml.orig).
Results
The diff between published contents and the upstream repository shows no unexpected changes. The crate contains no binary artefacts (justifying has-binaries) and no build.rs; Cargo.toml declares build = false and [lib] has no proc-macro = true. There is no install-time hook either, justifying has-build-exec and has-install-exec.
The codebase was reviewed for cryptography, process spawning, network I/O, filesystem I/O, environment-variable access, concurrency primitives, JIT, interpreters and parsing of external data formats. None was found, justifying uses-crypto, uses-exec, uses-network, uses-filesystem, uses-environment, uses-concurrency, uses-jit, uses-interpreter, and the corresponding implementation claims impl-crypto, impl-protocol, impl-interpreter, impl-jit, impl-parser, impl-datastructure, and impl-concurrency. The crate's purpose is implementing the Two-Way, Rabin-Karp, Shift-Or, and SIMD-accelerated byte/substring search algorithms; their published time and space bounds are documented in the source (twoway.rs references the Crochemore-Perrin 1991 paper; the substring find / rfind doc-comments document the O(n + m) worst-case guarantee), justifying impl-algorithm together with algorithm-impl-safe, algorithm-impl-correct, algorithm-impl-tested, and algorithm-impl-bounds.
unsafe is used pervasively (~340 occurrences across the crate) — this is unavoidable for the SIMD intrinsics and raw-pointer-based search loops the crate is built around. Every pub unsafe fn in the arch/ tree carries a # Safety block enumerating pointer validity, allocated-object provenance, isize-distance non-overflow and no-wrap requirements; the Vector, MoveMask and Pointer traits document their safety contracts at the trait level. The memmem::searcher::Searcher uses a hand-rolled union for dispatching between substring implementations, with the safety invariant (paired function pointer must read the populated variant) documented on the SearcherKindFn type and on every dispatch function. The unsafe portions are exercised by quickcheck property tests under src/tests/ that compare every searcher against a naive reference implementation, and the codebase carries #[cfg(miri)] gates indicating routine validation under miri; the upstream fuzz/ directory carries cargo-fuzz harnesses for every public entry point. Together this justifies uses-unsafe, unsafe-safe, unsafe-documented, unsafe-minimal, and unsafe-tested.
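The union-plus-function-pointer dispatch pattern described above can be illustrated with a small sketch. It is only a model of the idea under stated assumptions: the names Kind, FindFn, Searcher, find_one and find_two are hypothetical and do not correspond to the crate's types, and the two "implementations" are trivial std-based searches.

```rust
/// Untagged storage for one of several searcher states (hypothetical variants).
#[derive(Clone, Copy)]
union Kind {
    one_byte: u8,
    two_bytes: [u8; 2],
}

/// Function pointer that knows which union variant it is allowed to read.
type FindFn = unsafe fn(&Kind, &[u8]) -> Option<usize>;

struct Searcher {
    kind: Kind,
    call: FindFn,
}

impl Searcher {
    /// The only constructor: it always pairs `call` with the variant it populated.
    fn new(needle: &[u8]) -> Option<Searcher> {
        match needle {
            &[b] => Some(Searcher { kind: Kind { one_byte: b }, call: find_one }),
            &[a, b] => Some(Searcher { kind: Kind { two_bytes: [a, b] }, call: find_two }),
            _ => None, // this sketch supports only 1- and 2-byte needles
        }
    }

    fn find(&self, haystack: &[u8]) -> Option<usize> {
        // SAFETY: `new` guarantees that `call` reads exactly the variant
        // that was written into `kind`.
        unsafe { (self.call)(&self.kind, haystack) }
    }
}

/// # Safety
/// `kind.one_byte` must be the populated variant.
unsafe fn find_one(kind: &Kind, haystack: &[u8]) -> Option<usize> {
    let b = unsafe { kind.one_byte };
    haystack.iter().position(|&x| x == b)
}

/// # Safety
/// `kind.two_bytes` must be the populated variant.
unsafe fn find_two(kind: &Kind, haystack: &[u8]) -> Option<usize> {
    let pair = unsafe { kind.two_bytes };
    haystack.windows(2).position(|w| w == &pair[..])
}
```

Compared with an enum, this layout avoids a tag check on every call at the cost of an invariant the compiler cannot verify, which is why documenting the pairing at the type and at each dispatch function matters.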
No malicious behaviour was identified, justifying is-benign. No findings were recorded.
Conclusion
memchr is a mature, widely-deployed library (used by regex, ripgrep, the Rust standard library's own build via rustc-dep-of-std, and many others) whose safety-critical sections are well-documented, well-tested and well-bounded. The crate's unsafe-heavy design is intrinsic to its goal of SIMD-accelerated byte search, and the safety contracts at every unsafe boundary are explicit. The test suite combines per-implementation property tests, a miri-aware configuration, and an upstream fuzz harness. The audit produced no findings.