Subject
encoding_rs 0.8.35 is Henri Sivonen's Gecko-oriented implementation of
the WHATWG Encoding Standard, the character-encoding conversion library
used by Firefox and much of the Rust HTTP/HTML ecosystem. It decodes
UTF-8, UTF-16LE/BE, and the legacy single-byte and CJK encodings
(Big5, EUC-JP, EUC-KR, GBK/GB18030, Shift_JIS, ISO-2022-JP,
x-user-defined, and the windows/ISO single-byte families), and it encodes
to the same set except UTF-16LE/BE, for which the Encoding Standard
specifies UTF-8 as the output encoding. The public API exposes the
streaming Decoder/Encoder state machines, the Encoding label registry,
and the mem module of helpers for in-memory representation conversion
and for Latin1 and bidi checks on UTF-8/UTF-16 text. SIMD
acceleration is available behind the off-by-default simd-accel feature.
Methodology
Tools: openvet 0.6.0, ripgrep, grep, awk, diff. I verified VCS
byte-equivalence with diff -rq contents vcs: the only difference is the
synthetic .cargo_vcs_info.json, so contents/ matches the published
VCS tree. I read the manifests (Cargo.toml, Cargo.toml.orig) and the CI
script, and surveyed the source for I/O, FFI, crypto, RNG, and
concurrency. I read the SIMD/unsafe conversion core: simd_funcs.rs in
full, the unsafe macros and ASCII fast paths in ascii.rs, the converter
write machinery in handles.rs including its file-level safety note, and
the headers and representative unsafe blocks of mem.rs, utf_8.rs, and
lib.rs. The large generated index tables in data.rs (~2.5 MB) were
treated as data. Roughly 6-8K lines of hand-written core were read in
detail.
Scope. This is a large crate (~138K LOC including generated tables,
271 unsafe occurrences across ~11 files, much of it macro-expanded and
SIMD-feature-gated). The following claims were not evaluated and are left
unasserted; they must not be read as either satisfied or violated:
!unsafe-safe, !unsafe-documented, !unsafe-minimal,
!algorithm-impl-correct, and !parser-impl-correct. Full
WHATWG-conformance verification and exhaustive review of every SIMD
unsafe block are out of scope. This audit verifies supply-chain
integrity, the capability surface, build/install execution, test
presence, the implementation categorization, and the representative
unsafe blocks I read.
Results
contents/ is byte-equivalent to the VCS tree apart from the expected
.cargo_vcs_info.json. The crate is pure computation: a source-wide
search found no network, filesystem, process, or environment access
(!uses-network, !uses-filesystem, !uses-exec,
!uses-environment), no cryptography or RNG (!uses-crypto), no JIT or
interpreter (!uses-jit, !uses-interpreter), and no concurrency
primitives, unsafe impl Send/Sync, or atomics (!uses-concurrency).
The manifest sets build = false with no proc-macro = true, so nothing
runs at build or install time (!has-build-exec, !has-install-exec),
and the published tree carries no compiled artifacts (!has-binaries);
test_data/ holds plain-text encoding fixtures and data.rs holds
generated lookup tables. The one extern block is a
"platform-intrinsic" compiler-intrinsic declaration, not C FFI.
The crate decodes byte streams into Unicode and is therefore a parser
(!impl-parser) and implements the WHATWG conversion algorithms
(!impl-algorithm); it does not implement cryptography, an interpreter,
a JIT, a network protocol, a general data structure, or concurrency
primitives (!impl-crypto, !impl-interpreter, !impl-jit,
!impl-protocol, !impl-datastructure, !impl-concurrency). The
decoders handle malformed input per the Encoding Standard by emitting
U+FFFD replacement characters rather than invoking undefined behavior;
the representative decode paths I read keep their raw-pointer writes
within capacity reserved by the converter handles, supporting
!parser-impl-safe. Testing is extensive: 388 #[test] functions inline
in src/ plus the test_data/ round-trip fixtures (!has-unit-tests,
!algorithm-impl-tested). There is no tests/ directory, no fuzz/
targets, and no proptest/quickcheck usage (!has-integration-tests,
!has-fuzz-tests, !has-property-tests).
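The replacement behavior described above can be illustrated with a minimal sketch. It uses only the Rust standard library's UTF-8 decoder, not encoding_rs itself, and decode_lossy is a hypothetical helper name introduced here for illustration. It shows the general pattern of turning malformed bytes into U+FFFD and reporting that a replacement happened, instead of failing or reading out of bounds.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of encoding_rs): decodes `bytes` as UTF-8,
/// replacing each malformed sequence with U+FFFD, and reports whether any
/// replacement was made.
fn decode_lossy(bytes: &[u8]) -> (String, bool) {
    let decoded = String::from_utf8_lossy(bytes);
    // The standard library returns `Cow::Borrowed` for fully valid input and
    // `Cow::Owned` only when it had to insert replacement characters.
    let had_errors = matches!(decoded, Cow::Owned(_));
    (decoded.into_owned(), had_errors)
}
```

Calling decode_lossy(b"a\xFFb") yields the string "a\u{FFFD}b" with the error flag set, while valid input passes through unchanged with the flag clear.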
The unsafe surface (!uses-unsafe) is real and performance-motivated:
ascii.rs and single_byte.rs use macro-generated pub unsafe fn
fast paths that elide bounds checks over src.add(i)/dst.add(i),
handles.rs writes through dst.add(self.pos) after reserving space,
and simd_funcs.rs reinterprets byte pointers as SIMD vectors and
transmutes between same-size vector types. The representative blocks I
read are consistent with their stated contracts. One low-severity quality
finding: per-block // SAFETY: documentation is uneven. handles.rs and
mem.rs document their contracts, while ascii.rs (which acknowledges the
omission at lines 12-14), single_byte.rs, lib.rs, and macros.rs carry
none. No obfuscation, telemetry, base64 blobs,
include_bytes!, or suspicious endpoints were found (!is-benign).
Dependencies are minimal: cfg-if (always on) and the optional
any_all_workaround, packed_simd, and serde.
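For readers unfamiliar with the bounds-check-eliding pattern noted in the unsafe surface above, the following is a minimal sketch of its general shape: a safe wrapper establishes the bound once, and an unsafe inner loop then uses src.add(i)/dst.add(i) without per-element checks. The function names copy_ascii_prefix and copy_ascii_prefix_raw are hypothetical and the code is not taken from the crate; it only illustrates the kind of contract a // SAFETY: comment would be expected to state.

```rust
/// Hypothetical safe wrapper: copies the leading ASCII run of `src` into
/// `dst` and returns the number of bytes copied.
fn copy_ascii_prefix(src: &[u8], dst: &mut [u8]) -> usize {
    // Establish the bound once, up front.
    let len = src.len().min(dst.len());
    // SAFETY: `len` does not exceed either slice's length, and a shared
    // slice and a mutable slice cannot overlap, so both pointers are valid
    // for `len` bytes.
    unsafe { copy_ascii_prefix_raw(src.as_ptr(), dst.as_mut_ptr(), len) }
}

/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `len` bytes,
/// and the two ranges must not overlap.
unsafe fn copy_ascii_prefix_raw(src: *const u8, dst: *mut u8, len: usize) -> usize {
    let mut i = 0;
    while i < len {
        // SAFETY: `i < len`, so `src.add(i)` is within the readable range.
        let b = unsafe { *src.add(i) };
        if b >= 0x80 {
            break; // Stop at the first non-ASCII byte.
        }
        // SAFETY: `i < len`, so `dst.add(i)` is within the writable range.
        unsafe { *dst.add(i) = b };
        i += 1;
    }
    i
}
```

The design point is that the unsafe function's soundness depends entirely on its caller upholding the documented contract, which is why uneven per-block documentation is recorded as a quality finding even when the code read is consistent with its contracts.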
Conclusion
The audit found one low-severity quality finding (uneven per-block
unsafe documentation) and no security, safety, or correctness defects
in the code that was read. encoding_rs 0.8.35 is byte-equivalent to its
VCS tree, performs no I/O and no build- or install-time execution, has no
concurrency or FFI to external libraries, and ships 388 inline tests
plus round-trip fixtures. The 271 unsafe occurrences are concentrated
in SIMD primitives and the bounds-check-eliding conversion fast paths;
exhaustive verification of every block and full WHATWG-conformance
checking were scoped out and left unasserted.