Subject
encoding_rs 0.8.35 is Henri Sivonen's Gecko-oriented implementation of
the WHATWG Encoding Standard, the character-encoding conversion library
used by Firefox and much of the Rust HTTP/HTML ecosystem. It decodes
UTF-8, UTF-16LE/BE, and the legacy single-byte and CJK encodings
(Big5, EUC-JP, EUC-KR, GBK/GB18030, Shift_JIS, ISO-2022-JP,
x-user-defined, and the windows/ISO single-byte families), and encodes
to each of them except UTF-16LE/BE, whose output encoding under the
Encoding Standard is UTF-8. The public
API exposes the streaming Decoder/Encoder state machines, the
Encoding label registry, and the mem module of in-RAM
representation-conversion and Latin1/UTF-16 bidi-check helpers. SIMD
acceleration is available behind the off-by-default simd-accel feature.
Methodology
Tools: openvet 0.6.0, ripgrep, grep, awk, diff. I verified VCS
byte-equivalence with diff -rq contents vcs: the only difference is the
synthetic .cargo_vcs_info.json, so contents/ matches the published
VCS tree. I read the manifests, the CI script, and surveyed the source
for I/O, FFI, crypto, RNG, and concurrency. I read the SIMD/unsafe
conversion core: simd_funcs.rs in full, the unsafe macros and ASCII
fast paths in ascii.rs, the converter write machinery in handles.rs
including its file-level safety note, and the headers and a
representative sample of the unsafe blocks in mem.rs, utf_8.rs, and
lib.rs. The large generated index
tables in data.rs (about 2.5 MB) were treated as data. Roughly 6 to 8K
lines of hand-written core were read in detail.
Scope. This is a large crate (about 138K LOC including generated tables,
271 unsafe occurrences across ~11 files, much of it macro-expanded and
SIMD-feature-gated). The following claims were not evaluated and are left
unasserted; they must not be read as either satisfied or violated:
unsafe-safe, unsafe-documented, unsafe-minimal, unsafe-tested,
algorithm-impl-safe, algorithm-impl-correct, algorithm-impl-bounds,
parser-impl-correct, and parser-impl-tested. Full WHATWG-conformance
verification and exhaustive review of every SIMD unsafe block are out of
scope. This audit verifies supply-chain integrity, the capability
surface, build/install execution, test presence, the implementation
categorization, and the representative unsafe blocks I read.
Results
contents/ is byte-equivalent to the VCS tree apart from the expected
.cargo_vcs_info.json. The crate is pure computation: a source-wide
search found no network, filesystem, process, or environment access, so
uses-network, uses-filesystem, uses-exec, and uses-environment are
false; there is no cryptography or RNG (uses-crypto), no JIT or
interpreter (uses-jit, uses-interpreter), and no concurrency
primitives, unsafe impl Send/Sync, or atomics (uses-concurrency).
The manifest sets build = false with no proc-macro = true, so nothing
runs at build or install time (has-build-exec, has-install-exec), and
the published tree carries no compiled artifacts (has-binaries);
test_data/ holds plain-text encoding fixtures and data.rs holds
generated lookup tables. The one extern block declares compiler
intrinsics under the "platform-intrinsic" ABI; it is not C FFI.
The crate decodes byte streams into Unicode and is therefore a parser
(impl-parser) and implements the WHATWG conversion algorithms
(impl-algorithm); it does not implement cryptography, an interpreter, a
JIT, a network protocol, a general data structure, or concurrency
primitives, so impl-crypto, impl-interpreter, impl-jit,
impl-protocol, impl-datastructure, and impl-concurrency are false.
The decoders handle malformed input per the Encoding Standard by emitting
U+FFFD replacement characters rather than invoking undefined behavior;
the representative decode paths I read keep their raw-pointer writes
within capacity reserved by the converter handles, supporting
parser-impl-safe. Testing is extensive: 388 #[test] functions inline
in src/ plus the test_data/ round-trip fixtures (has-unit-tests,
algorithm-impl-tested). There is no tests/ directory, no fuzz/
targets, and no proptest/quickcheck usage, so has-integration-tests,
has-fuzz-tests, and has-property-tests are false.
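The replacement-character behavior described above can be illustrated with the Rust standard library alone. This is an analogue under stated assumptions, not encoding_rs code: it uses String::from_utf8_lossy, which covers UTF-8 only, and the wrapper name decode_lossy is hypothetical.

```rust
use std::borrow::Cow;

/// Hypothetical wrapper around the standard library's lossy UTF-8 decoder.
/// Returns the decoded text and whether any malformed sequence was replaced.
/// This does not call encoding_rs; it only mirrors the "replace, never
/// fail" policy the Encoding Standard prescribes for decoders.
fn decode_lossy(bytes: &[u8]) -> (String, bool) {
    let decoded = String::from_utf8_lossy(bytes);
    // from_utf8_lossy borrows the input when it is already valid UTF-8 and
    // allocates a new String only when it had to insert U+FFFD.
    let had_errors = matches!(decoded, Cow::Owned(_));
    (decoded.into_owned(), had_errors)
}
```

For the input b"ab\xFFcd" the function returns the text "ab\u{FFFD}cd" with the error flag set; for b"plain" it returns the text unchanged with the flag clear.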
The unsafe surface (uses-unsafe) is real and performance-motivated:
ascii.rs and single_byte.rs use macro-generated pub unsafe fn
fast paths that elide bounds checks over src.add(i)/dst.add(i),
handles.rs writes through dst.add(self.pos) after reserving space,
and simd_funcs.rs reinterprets byte pointers as SIMD vectors and
transmutes between same-size vector types. The representative blocks read
are consistent with their stated contracts. The audit's one finding, a
low-severity quality issue, is that per-block // SAFETY: documentation
is uneven: handles.rs and mem.rs document their contracts, while
ascii.rs (which acknowledges this explicitly at lines 12-14),
single_byte.rs, lib.rs, and macros.rs carry no such comments. No
obfuscation, telemetry, base64 blobs,
include_bytes!, or suspicious endpoints were found (is-benign).
Dependencies are minimal: cfg-if (always on) and the optional
any_all_workaround, packed_simd, and serde.
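The bounds-check-eliding pattern noted for the ASCII fast paths and the converter handles (establish capacity once, then write through raw pointers) can be sketched as follows. This is an illustration of the general technique under stated assumptions, not code from the crate: the function copy_ascii_prefix is hypothetical and far simpler than the macro-generated originals.

```rust
/// Hypothetical helper, not part of encoding_rs: copies the leading ASCII
/// run of `src` into `dst` and returns the number of bytes copied.
fn copy_ascii_prefix(src: &[u8], dst: &mut [u8]) -> usize {
    // One up-front length computation replaces a bounds check per access.
    let len = src.len().min(dst.len());
    let src_ptr = src.as_ptr();
    let dst_ptr = dst.as_mut_ptr();
    let mut i = 0;
    while i < len {
        // SAFETY: i < len, and len <= src.len() and len <= dst.len(), so
        // both offsets stay inside their slices. The slices cannot overlap
        // because `dst` is a unique mutable borrow.
        unsafe {
            let byte = *src_ptr.add(i);
            if byte >= 0x80 {
                // First non-ASCII byte: stop and let the caller handle it.
                break;
            }
            *dst_ptr.add(i) = byte;
        }
        i += 1;
    }
    i
}
```

The // SAFETY: comment in the sketch is the kind of per-block contract the documentation finding refers to: it states which earlier check makes the unchecked accesses sound, so a reviewer can verify the block without reconstructing the invariant.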
Conclusion
The audit found one low-severity quality finding (uneven per-block
unsafe documentation) and no security, safety, or correctness defects
in the code that was read. encoding_rs 0.8.35 is byte-equivalent to its
VCS tree, performs no I/O and no build- or install-time execution, has no
concurrency or FFI to external libraries, and ships 388 inline tests
plus round-trip fixtures. The 271 unsafe occurrences are concentrated
in SIMD primitives and the bounds-check-eliding conversion fast paths;
exhaustive verification of every block and full WHATWG-conformance
checking were scoped out and left unasserted.