Audit · compact_str@0.9.0

cargo : compact_str @ 0.9.0

Patrick Elsen · signed 2026-05-27 · published 2026-05-27

Claims

datastructure-impl-bounds, datastructure-impl-correct, datastructure-impl-safe, datastructure-impl-tested, has-binaries, has-build-exec, has-fuzz-tests, has-install-exec, has-integration-tests, has-property-tests, has-unit-tests, impl-algorithm, impl-concurrency, impl-crypto, impl-datastructure, impl-interpreter, impl-jit, impl-parser, impl-protocol, is-benign, unsafe-documented, unsafe-minimal, unsafe-safe, unsafe-tested, uses-concurrency, uses-crypto, uses-environment, uses-exec, uses-filesystem, uses-interpreter, uses-jit, uses-network, uses-unsafe

Summary

compact_str 0.9.0 is a no_std crate providing CompactString, a String-sized small-string-optimised type with three variants (inline, heap, &'static str). Heavy use of unsafe (~165 sites) is locally justified, fully documented, and exercised by proptest + quickcheck under miri plus libfuzzer/AFL/honggfuzz harnesses. No build scripts, proc-macros, binaries, or system I/O. No findings.

Report

Subject

compact_str is a no_std Rust library that provides CompactString, a String replacement that adds the small-string optimisation (SSO) String itself lacks: strings of up to 24 bytes are stored inline on 64-bit targets (12 bytes on 32-bit) without a heap allocation. A CompactString occupies exactly size_of::<String>() and is laid out as a tagged union of three variants (inline buffer, heap buffer, and &'static str), discriminated by the last byte. The crate also exposes a specialised ToCompactString trait (zero-cost dispatch for numeric and boolean primitives via castaway), a CompactStringExt trait (concat_compact, join_compact), and a format_compact! macro, plus optional integrations with serde, bytes, rkyv, borsh, diesel, sqlx, markup, arbitrary, proptest, quickcheck, smallvec, and zeroize.

Methodology

The published .crate contents were compared against the upstream Git repository at the commit recorded in .cargo_vcs_info.json (70361f7b) using diff -r. The full source tree was read: src/lib.rs (2662 lines), the src/repr/ submodules (~3300 lines), src/traits.rs, src/tests.rs (~2000 lines), src/macros.rs, src/unicode_data.rs, and every src/features/*.rs adapter. unsafe usage was enumerated with grep -rn "unsafe" contents/src (165 sites), and each unsafe block was paired with its SAFETY comment in src/repr/. System surface (std::process, std::net, std::fs, std::env, Command::) was checked with grep; no matches were found. The Cargo manifest was reviewed for build scripts (build = false), proc-macros (no [lib] proc-macro entry), and default features (default = ["std"]). The upstream fuzz/ workspace and .github/workflows/ were inspected to confirm the test surface (libfuzzer + AFL + honggfuzz harnesses, miri, proptest with PROPTEST_CASES=10000, MSRV CI, cross-platform CI). Tools used: diff (FreeBSD), grep (BSD), wc, ls, find, all at default versions shipped with macOS 25.5.

Results

The published crate contents match the upstream repository byte-for-byte in all source files; the only differences are cargo's standard Cargo.toml normalisation and the addition of .cargo_vcs_info.json. No binary artefacts are shipped (justifying has-binaries = false). There is no build.rs, and the manifest's [lib] table does not set proc-macro = true, so the crate executes no code at compile time on consumers' machines (justifying has-build-exec = false and has-install-exec = false).

The crate is no_std at its root and pulls in std only through its std feature, which is enabled by default (default = ["std"]). No code path opens files, sockets, or processes, or reads environment variables; the only system interaction is through the global allocator. This justifies setting uses-network, uses-filesystem, uses-exec, and uses-environment to false. There are no cryptographic primitives, no JIT, no embedded interpreter, no parser, and no protocol implementation, so uses-crypto, uses-jit, uses-interpreter, impl-crypto, impl-interpreter, impl-jit, impl-protocol, and impl-parser are false. The implementation is a string data structure (justifying impl-datastructure) and is not an algorithm in the sense the taxonomy contemplates (impl-algorithm = false). The crate declares unsafe impl Send/Sync for the representation but spawns no threads and uses no concurrency primitives (uses-concurrency and impl-concurrency = false).

unsafe is pervasive (165 sites) because the SSO scheme requires hand-managed memory: a NonNull<u8> heap pointer with capacity packed into spare bits of the discriminant word, reinterpretation between Repr, HeapBuffer, InlineBuffer, and StaticStr via mem::transmute, manual alloc::alloc/realloc/dealloc calls, raw ptr::copy[_nonoverlapping] in replace_range_*, insert_str, and the integer-to-string fast path adapted from libcore. Every unsafe block reviewed carries a SAFETY comment naming the invariant it relies on (length bound, char boundary, discriminant check, allocator layout). The invariants themselves are narrow and locally checked: e.g. replace_range calls ensure_range before the unsafe arms; InlineBuffer::new has debug_assert!(text.len() <= MAX_SIZE) paired with its safety contract; from_string checks the capacity tag before reusing the String's buffer.

The test surface backs this. src/tests.rs alone defines ~80 functions mixing test_case, quickcheck, and proptest over Unicode inputs, random byte slices, and pathological lengths up to 18 MB (justifying has-unit-tests and has-property-tests). The upstream fuzz/ workspace defines a single Scenario enum (fuzz/src/actions.rs) covering every mutating API (push, pop, replace_range, drain, insert, truncate, split_off, retain, shrink_to, zeroize, repeat), driven by three harnesses (libfuzzer, AFL, honggfuzz); this justifies has-fuzz-tests. There is no tests/ directory in either the published crate or the upstream workspace. The exhaustive in-tree suite fills the role an integration suite would, but it consists of in-crate unit and property tests rather than integration tests (justifying has-integration-tests = false). CI runs miri with -Zmiri-strict-provenance on every PR. Together these justify unsafe-safe, unsafe-documented, unsafe-minimal, unsafe-tested, datastructure-impl-safe, datastructure-impl-correct, datastructure-impl-tested, and datastructure-impl-bounds (heap growth follows the documented 1.5x amortised policy, clone is documented as O(n), and no operation has an adversarial degenerate case).
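The 1.5x amortised growth cited for datastructure-impl-bounds can be illustrated with a small std-only sketch. The helper names (`amortized_capacity`, `reallocations_for`) and the exact rule (take the larger of the requested length and 1.5x the current capacity) are assumptions for illustration, not code taken from compact_str.

```rust
/// Hypothetical growth rule: grow to at least `required` bytes, but never by
/// less than 1.5x the current capacity.
fn amortized_capacity(current_cap: usize, required: usize) -> usize {
    let amortized = current_cap.saturating_add(current_cap / 2);
    required.max(amortized)
}

/// Counts reallocations when appending one byte at a time up to `target_len`.
fn reallocations_for(target_len: usize) -> usize {
    let mut cap = 0usize;
    let mut reallocs = 0usize;
    for len in 1..=target_len {
        if len > cap {
            // Only reallocate when the next byte does not fit.
            cap = amortized_capacity(cap, len);
            reallocs += 1;
        }
    }
    reallocs
}
```

Because each reallocation multiplies the capacity by roughly 1.5, appending n bytes one at a time triggers only O(log n) reallocations, which keeps push amortised O(1).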

The codebase shows no signs of malicious intent — no obfuscation, no data exfiltration, no telemetry — justifying is-benign.

No findings were raised.

Conclusion

compact_str is a high-quality, heavily-tested implementation of a non-trivial unsafe data structure. The unsafe surface is large but each block is small, locally justified, and exercised both by property-based tests under miri and by three independent fuzzers. The crate is suitable for production use.

Findings

No findings.

Annotations(15)

`src/features`

Each optional feature module (serde, bytes, smallvec, rkyv, borsh, diesel, sqlx, markup, arbitrary, proptest, quickcheck, zeroize) is a thin adapter to a well-known third-party trait. None of these features change the core invariants reviewed above.

`src/features/serde.rs`

Serialize/Deserialize for CompactString. Deserialization validates UTF-8 on every byte path (visit_bytes, visit_borrowed_bytes, visit_byte_buf).

`src/features/zeroize.rs`

Adapter around Repr::zeroize which performs a volatile-write loop plus a SeqCst compiler fence (src/repr/mod.rs:538). Volatile writes prevent the compiler from eliding the zeroing, but the helper is not constant-time and does not flush registers/caches; this is consistent with the zeroize crate's contract.
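The volatile-write-plus-fence pattern can be sketched with std alone. `zeroize_bytes` is a hypothetical helper illustrating the general technique, not compact_str's Repr::zeroize or the zeroize crate's implementation.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical helper: overwrite every byte with zero using volatile writes,
/// which the compiler may not elide even if the buffer is never read again.
fn zeroize_bytes(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Stop the compiler from reordering later memory accesses before the
    // zeroing writes. This fence emits no machine code and does not flush
    // CPU caches or registers.
    compiler_fence(Ordering::SeqCst);
}
```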

`src/lib.rs`

`src/lib.rs`, line 847-903

    pub fn replace_range(&mut self, range: impl RangeBounds<usize>, replace_with: &str) {
        let (start, end) = self.ensure_range(range);
        let dest_len = end - start;
        match dest_len.cmp(&replace_with.len()) {
            Ordering::Equal => unsafe { self.replace_range_same_size(start, end, replace_with) },
            Ordering::Greater => unsafe { self.replace_range_shrink(start, end, replace_with) },
            Ordering::Less => unsafe { self.replace_range_grow(start, end, replace_with) },
        }
    }

    /// Replace into the same size.
    unsafe fn replace_range_same_size(&mut self, start: usize, end: usize, replace_with: &str) {
        core::ptr::copy_nonoverlapping(
            replace_with.as_ptr(),
            self.as_mut_ptr().add(start),
            end - start,
        );
    }

    /// Replace, so self.len() gets smaller.
    unsafe fn replace_range_shrink(&mut self, start: usize, end: usize, replace_with: &str) {
        let total_len = self.len();
        let dest_len = end - start;
        let new_len = total_len - (dest_len - replace_with.len());
        let amount = total_len - end;
        let data = self.as_mut_ptr();
        // first insert the replacement string, overwriting the current content
        core::ptr::copy_nonoverlapping(replace_with.as_ptr(), data.add(start), replace_with.len());
        // then move the tail of the CompactString forward to its new place, filling the gap
        core::ptr::copy(
            data.add(total_len - amount),
            data.add(new_len - amount),
            amount,
        );
        // and lastly we set the new length
        self.set_len(new_len);
    }

    /// Replace, so self.len() gets bigger.
    unsafe fn replace_range_grow(&mut self, start: usize, end: usize, replace_with: &str) {
        let dest_len = end - start;
        self.reserve(replace_with.len() - dest_len);
        let total_len = self.len();
        let new_len = total_len + (replace_with.len() - dest_len);
        let amount = total_len - end;
        // first grow the string, so MIRI knows that the full range is usable
        self.set_len(new_len);
        let data = self.as_mut_ptr();
        // then move the tail of the CompactString back to its new place
        core::ptr::copy(
            data.add(total_len - amount),
            data.add(new_len - amount),
            amount,
        );
        // and lastly insert the replacement string
        core::ptr::copy_nonoverlapping(replace_with.as_ptr(), data.add(start), replace_with.len());
    }

replace_range and its three unsafe helpers (same_size, shrink, grow). Each path is reached only after ensure_range validates the bounds and char boundaries; reserve is called before set_len for the grow path. Exercised by both proptests and the ReplaceRange fuzz action.
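To make the index arithmetic of the shrink and grow helpers easier to check, the following sketch models them safely over a Vec<u8>, replacing the raw ptr::copy calls with slice::copy_within (which, like ptr::copy, permits overlapping ranges). `replace_range_model` is hypothetical and assumes the range has already been validated, as ensure_range does in the original.

```rust
use std::cmp::Ordering;

/// Hypothetical safe model of replace_range's three arms.
/// Assumes `start <= end <= buf.len()` (what ensure_range guarantees).
fn replace_range_model(buf: &mut Vec<u8>, start: usize, end: usize, replace_with: &[u8]) {
    let total_len = buf.len();
    let dest_len = end - start;
    let amount = total_len - end; // length of the tail after the range
    match dest_len.cmp(&replace_with.len()) {
        Ordering::Equal => buf[start..end].copy_from_slice(replace_with),
        Ordering::Greater => {
            let new_len = total_len - (dest_len - replace_with.len());
            // Write the replacement first; it ends before `end`, so the tail is untouched.
            buf[start..start + replace_with.len()].copy_from_slice(replace_with);
            // Move the tail forward to close the gap.
            buf.copy_within(total_len - amount..total_len, new_len - amount);
            buf.truncate(new_len);
        }
        Ordering::Less => {
            let new_len = total_len + (replace_with.len() - dest_len);
            // Grow first (the model's equivalent of reserve + set_len).
            buf.resize(new_len, 0);
            // Move the tail back to open room for the longer replacement.
            buf.copy_within(total_len - amount..total_len, new_len - amount);
            buf[start..start + replace_with.len()].copy_from_slice(replace_with);
        }
    }
}
```

In both the shrink and grow arms the tail lands at new_len - amount, which simplifies to start + replace_with.len(); comparing the model against String::replace_range is a cheap way to confirm that.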

`src/lib.rs`, line 998-1022

    pub fn insert_str(&mut self, idx: usize, string: &str) {
        assert!(self.is_char_boundary(idx), "idx must lie on char boundary");

        let new_len = self.len() + string.len();
        self.reserve(string.len());

        // SAFETY: We just checked that we may split self at idx.
        //         We set the length only after reserving the memory.
        //         We fill the gap with valid UTF-8 data.
        unsafe {
            // first move the tail to the new back
            let data = self.as_mut_ptr();
            core::ptr::copy(
                data.add(idx),
                data.add(idx + string.len()),
                new_len - idx - string.len(),
            );

            // then insert the new bytes
            core::ptr::copy_nonoverlapping(string.as_ptr(), data.add(idx), string.len());

            // and lastly resize the string
            self.set_len(new_len);
        }
    }

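insert_str asserts that idx lies on a char boundary (str::is_char_boundary also returns false for idx > len, so the assertion doubles as a bounds check) and reserves capacity before taking the data pointer, so the pointer reflects any reallocation. It then shifts the tail with an overlapping ptr::copy and writes the new bytes with a non-overlapping copy. Exercised by the InsertStr fuzz action.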
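The same shuffle can be modelled without raw pointers. `insert_str_model` is a hypothetical safe sketch operating on a std String, not compact_str's code; it keeps the original's sequence of checks and byte moves (boundary check, grow, shift tail, write new bytes).

```rust
/// Hypothetical safe model of insert_str's byte shuffling.
fn insert_str_model(target: &mut String, idx: usize, s: &str) {
    // Same precondition as the original; also rejects idx > len.
    assert!(target.is_char_boundary(idx), "idx must lie on char boundary");
    let old_len = target.len();
    let new_len = old_len + s.len();
    let mut bytes = std::mem::take(target).into_bytes();
    bytes.resize(new_len, 0);
    // Shift the tail [idx, old_len) right by s.len(); copy_within handles overlap.
    bytes.copy_within(idx..old_len, idx + s.len());
    // Fill the gap with the inserted bytes.
    bytes[idx..idx + s.len()].copy_from_slice(s.as_bytes());
    // Splicing valid UTF-8 at a char boundary keeps the whole buffer valid.
    *target = String::from_utf8(bytes).expect("splice at a char boundary stays UTF-8");
}
```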

`src/lib.rs`, line 2491-2525

/// An iterator over the exacted data by [`CompactString::drain()`].
#[must_use = "iterators are lazy and do nothing unless consumed"]
pub struct Drain<'a> {
    compact_string: *mut CompactString,
    start: usize,
    end: usize,
    chars: core::str::Chars<'a>,
}

// SAFETY: Drain keeps the lifetime of the CompactString it belongs to.
unsafe impl Send for Drain<'_> {}
unsafe impl Sync for Drain<'_> {}

impl fmt::Debug for Drain<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_tuple("Drain").field(&self.as_str()).finish()
    }
}

impl fmt::Display for Drain<'_> {
    #[inline]
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl Drop for Drain<'_> {
    #[inline]
    fn drop(&mut self) {
        // SAFETY: Drain keeps a mutable reference to compact_string, so one one else can access
        //         the CompactString, but this function right now. CompactString::drain() ensured
        //         that the new extracted range does not split a UTF-8 character.
        unsafe { (*self.compact_string).replace_range_shrink(self.start, self.end, "") };
    }
}

Drain holds a raw *mut CompactString. Aliasing safety relies on Drain<'a> borrowing the parent for its lifetime via Chars<'a>. The Drop impl calls replace_range_shrink on the original string. Backed by the Drain fuzz action.
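A fully safe analogue shows why the lifetime carries the aliasing argument. Safe Rust cannot store both the parent string and a Chars<'a> borrowed from it in one struct, which is why the original falls back to a raw pointer. Holding `&'a mut String` alone and re-slicing on each call lets the borrow checker enforce the same exclusivity, with Drop performing the removal. `DrainModel` is a hypothetical type for illustration; std's own String::drain likewise removes the range when the iterator is dropped, even if it was not fully consumed.

```rust
use std::ops::Range;

/// Hypothetical safe draining iterator: the exclusive borrow guarantees no
/// other access to the string while the iterator is alive.
struct DrainModel<'a> {
    string: &'a mut String,
    range: Range<usize>,
    pos: usize, // byte offset of the next char to yield
}

impl<'a> DrainModel<'a> {
    fn new(string: &'a mut String, range: Range<usize>) -> Self {
        assert!(range.start <= range.end, "range start must not exceed end");
        // Mirror the original's precondition: the range must not split a char.
        assert!(
            string.is_char_boundary(range.start) && string.is_char_boundary(range.end),
            "range must lie on char boundaries"
        );
        let pos = range.start;
        DrainModel { string, range, pos }
    }
}

impl Iterator for DrainModel<'_> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let c = self.string[self.pos..self.range.end].chars().next()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}

impl Drop for DrainModel<'_> {
    fn drop(&mut self) {
        // Remove the whole range, whether or not the iterator was exhausted.
        self.string.replace_range(self.range.clone(), "");
    }
}
```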

`src/repr`

Implementation of the core Repr data structure (SSO string with three variants). Justifies impl-datastructure.

`src/repr/bytes.rs`

bytes feature integration. from_utf8_buf defers the UTF-8 check until after a fast unchecked collect; the loop explicitly handles the edge case where the last inline byte would collide with the heap discriminant (test_fake_heap_variant covers this).

`src/repr/capacity.rs`

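Encodes the heap capacity in size_of::<usize>() - 1 bytes, leaving the final discriminant byte free. On 64-bit targets the 7 available bytes exceed any realistic allocation, so the capacity is always stored inline; on 32-bit targets, capacities that do not fit in 3 bytes (about 16 MB) are replaced by the sentinel CAPACITY_IS_ON_THE_HEAP, and the real value is kept in a leading word of the heap allocation. Round-trip encoding is exhaustively tested in test_all_valid_32bit_values.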
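The encoding can be sketched as below. The byte layout (payload in the first size_of::<usize>() - 1 bytes in little-endian order, tag in the last byte), the tag value HEAP_TAG = 0xD8, and the use of an all-ones payload as the sentinel are assumptions made for illustration; compact_str's actual constants and byte handling may differ.

```rust
use std::mem::size_of;

const WORD: usize = size_of::<usize>();
/// Assumed tag value for the heap variant, stored in the final byte.
const HEAP_TAG: u8 = 0xD8;
/// Largest capacity storable in the WORD - 1 payload bytes. The all-ones
/// payload (MAX_INLINE_CAP + 1) is reserved as the "capacity is on the heap" sentinel.
const MAX_INLINE_CAP: usize = (1usize << (8 * (WORD - 1))) - 2;

/// Pack `cap` into a word-sized byte array whose last byte is the tag.
fn encode_capacity(cap: usize) -> [u8; WORD] {
    let payload = if cap <= MAX_INLINE_CAP { cap } else { MAX_INLINE_CAP + 1 };
    let mut word = [0u8; WORD];
    word[..WORD - 1].copy_from_slice(&payload.to_le_bytes()[..WORD - 1]);
    word[WORD - 1] = HEAP_TAG;
    word
}

/// Unpack; `None` means the sentinel was found and the real capacity lives
/// in a leading word of the heap allocation.
fn decode_capacity(word: [u8; WORD]) -> Option<usize> {
    assert_eq!(word[WORD - 1], HEAP_TAG, "not a heap-tagged word");
    let mut le = [0u8; WORD];
    le[..WORD - 1].copy_from_slice(&word[..WORD - 1]);
    let payload = usize::from_le_bytes(le);
    if payload == MAX_INLINE_CAP + 1 {
        None
    } else {
        Some(payload)
    }
}
```

On a 32-bit target this sketch's MAX_INLINE_CAP is 16,777,214, matching the roughly 16 MB threshold described above.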

`src/repr/heap.rs`

Heap variant of the SSO representation. Manages a manual allocation via alloc::alloc::{alloc, realloc, dealloc}, with the capacity stored either inline (in the bytes of the cap word other than the final discriminant byte) or as a leading word on the heap (the 32-bit fallback for capacities > 16 MB). Every unsafe block carries a SAFETY comment covering allocator invariants, overflow checks, and aliasing. This supports the unsafe-safe claim.

`src/repr/inline.rs`

`src/repr/inline.rs`, line 23-46

    pub(crate) unsafe fn new(text: &str) -> Self {
        debug_assert!(text.len() <= MAX_SIZE);

        let len = text.len();
        let mut buffer = InlineBuffer([0u8; MAX_SIZE]);

        // set the length in the last byte
        buffer.0[MAX_SIZE - 1] = len as u8 | LENGTH_MASK;

        // copy the string into our buffer
        //
        // note: in the case where len == MAX_SIZE, we'll overwrite the len, but that's okay because
        // when reading the length we can detect that the last byte is part of UTF-8 and return a
        // length of MAX_SIZE
        //
        // SAFETY:
        // * src (`text`) is valid for `len` bytes because `len` comes from `text`
        // * dst (`buffer`) is valid for `len` bytes because we assert src is less than MAX_SIZE
        // * src and dst don't overlap because we created dst
        //
        ptr::copy_nonoverlapping(text.as_ptr(), buffer.0.as_mut_ptr(), len);

        buffer
    }

InlineBuffer::new is an unsafe constructor: the caller must ensure text.len() <= MAX_SIZE. The last byte stores len | LENGTH_MASK; when len == MAX_SIZE that byte is overwritten by the final byte of the text, which for valid UTF-8 is always an ASCII or continuation byte (below 0xC0), a range disjoint from the length tags. The buffer width and this encoding are verified exhaustively by test_unused_utf8_bytes over every char.
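A safe sketch of the tag scheme follows. MAX_SIZE = 24 matches the 64-bit inline capacity stated in the Subject section; LENGTH_MASK = 0xC0 and the helper names are assumptions for illustration. The property the scheme relies on holds for any valid UTF-8: a string's final byte is ASCII (below 0x80) or a continuation byte (0x80 to 0xBF), never 0xC0 or above.

```rust
const MAX_SIZE: usize = 24; // assumed 64-bit inline capacity
const LENGTH_MASK: u8 = 0xC0; // assumed tag bits for "inline, length in low bits"

/// Hypothetical inline constructor following the excerpt above.
fn new_inline(text: &str) -> [u8; MAX_SIZE] {
    // The original uses debug_assert! plus an unsafe contract.
    assert!(text.len() <= MAX_SIZE);
    let mut buffer = [0u8; MAX_SIZE];
    buffer[MAX_SIZE - 1] = text.len() as u8 | LENGTH_MASK;
    // When text.len() == MAX_SIZE this overwrites the tag with the text's last byte.
    buffer[..text.len()].copy_from_slice(text.as_bytes());
    buffer
}

/// Recover the length from the last byte.
fn inline_len(buffer: &[u8; MAX_SIZE]) -> usize {
    let last = buffer[MAX_SIZE - 1];
    if last < LENGTH_MASK {
        // Only a full buffer ends in text, and valid UTF-8 never ends in a byte >= 0xC0.
        MAX_SIZE
    } else {
        (last - LENGTH_MASK) as usize
    }
}
```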

`src/repr/iter.rs`

FromIterator impls for char and string-like types. The string-like path eagerly fills an inline buffer and only heap-allocates when content exceeds MAX_SIZE; correctness is exercised by short_char_iter, packed_char_iter, long_char_iter, short_string_iter, long_short_string_iter.
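The inline-first strategy can be sketched for a char iterator as follows. `Collected` and `collect_chars` are hypothetical and use a plain enum rather than compact_str's packed Repr; MAX_SIZE = 24 is the assumed 64-bit inline capacity.

```rust
const MAX_SIZE: usize = 24; // assumed 64-bit inline capacity

/// Hypothetical two-variant string: inline bytes or a heap String.
enum Collected {
    Inline { buf: [u8; MAX_SIZE], len: usize },
    Heap(String),
}

/// Fill the inline buffer first; allocate only once a char would not fit.
fn collect_chars<I: IntoIterator<Item = char>>(iter: I) -> Collected {
    let mut buf = [0u8; MAX_SIZE];
    let mut len = 0;
    let mut iter = iter.into_iter();
    while let Some(c) = iter.next() {
        let n = c.len_utf8();
        if len + n > MAX_SIZE {
            // Spill: copy the inline prefix to the heap, then append the rest.
            let prefix = std::str::from_utf8(&buf[..len]).expect("only whole chars were written");
            let mut s = String::with_capacity(len + n);
            s.push_str(prefix);
            s.push(c);
            s.extend(iter);
            return Collected::Heap(s);
        }
        c.encode_utf8(&mut buf[len..len + n]);
        len += n;
    }
    Collected::Inline { buf, len }
}
```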

`src/repr/last_utf8_char.rs`

An enum with explicit discriminants up to 217, restricting the last byte's range of valid values so the compiler can use the unused values above 217 as a niche. This is what makes size_of::<Option<CompactString>>() == size_of::<CompactString>().
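The niche mechanism can be demonstrated with a much smaller hypothetical enum. The types below are illustrations, not compact_str's LastUtf8Char; rustc applies this layout optimisation in practice, although Rust does not guarantee it for arbitrary user types.

```rust
/// Hypothetical restricted tag: only 0..=3 are valid bit patterns, so the
/// remaining 252 byte values form a niche.
#[allow(dead_code)]
#[repr(u8)]
enum Tag {
    A = 0,
    B = 1,
    C = 2,
    D = 3,
}

/// Eight bytes whose last byte is the restricted tag.
#[allow(dead_code)]
#[repr(C)]
struct Packed {
    data: [u8; 7],
    tag: Tag,
}

/// Same size, but the last byte may hold any value, so there is no niche.
#[allow(dead_code)]
#[repr(C)]
struct Unrestricted {
    data: [u8; 7],
    last: u8,
}
```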

`src/repr/mod.rs`

`src/repr/mod.rs`, line 43-61

#[repr(C)]
pub(crate) struct Repr(
    /// We have a pointer in the representation to properly carry provenance.
    *const (),
    /// Then we need two `usize`s (aka WORDs) of data, for the first we just define a `usize`...
    usize,
    /// ...but the second we breakup into multiple pieces...
    #[cfg(target_pointer_width = "64")]
    u32,
    u16,
    u8,
    /// ...so that the last byte can be a [`LastByte`], which allows the compiler to see a niche
    /// value.
    LastByte,
);
static_assertions::assert_eq_size!([u8; MAX_SIZE], Repr);

unsafe impl Send for Repr {}
unsafe impl Sync for Repr {}

Core SSO representation: Repr is a 24-byte struct on 64-bit targets (12 bytes on 32-bit, where the u32 field is compiled out) that aliases the inline, heap, and static-str variants, discriminated by the last byte. It carries unsafe impl Send/Sync and backs the uses-unsafe and impl-datastructure claims.
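The stated sizes follow from summing the fields of the excerpt: 8 + 8 + 4 + 2 + 1 + 1 = 24 bytes on 64-bit, and 4 + 4 + 2 + 1 + 1 = 12 bytes on 32-bit. A hypothetical mirror struct, with u8 standing in for LastByte, lets the compiler check that arithmetic; defining MAX_SIZE as three machine words here is likewise an illustrative assumption.

```rust
/// Hypothetical mirror of Repr's field list, with u8 standing in for LastByte.
#[allow(dead_code)]
#[repr(C)]
struct ReprMirror(
    *const (),
    usize,
    #[cfg(target_pointer_width = "64")] u32,
    u16,
    u8,
    u8,
);

/// Illustrative inline capacity: the whole struct, i.e. three machine words.
const MAX_SIZE: usize = 3 * std::mem::size_of::<usize>();
```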

`src/repr/num.rs`

Integer-to-string conversion adapted from libcore/fmt/num.rs. Uses unsafe pointer arithmetic into a pre-sized buffer; the number of digits is computed up-front via NumChars so the writer never overruns. Validated by proptests against u*::to_string() / i*::to_string().
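The digits-first approach can be sketched safely as below. The helper names are hypothetical, and the loop is deliberately simple; the libcore-derived original writes through raw pointers and is more heavily optimised.

```rust
/// Number of decimal digits needed for `n` (1 for zero).
fn num_digits(mut n: u64) -> usize {
    let mut digits = 1;
    while n >= 10 {
        n /= 10;
        digits += 1;
    }
    digits
}

/// Hypothetical sketch: size the buffer exactly, then fill it from the least
/// significant digit backwards, so no write can fall outside the buffer.
fn u64_to_string(n: u64) -> String {
    let mut buf = vec![0u8; num_digits(n)];
    let mut rest = n;
    for slot in buf.iter_mut().rev() {
        *slot = b'0' + (rest % 10) as u8;
        rest /= 10;
    }
    debug_assert_eq!(rest, 0, "digit count was exact");
    String::from_utf8(buf).expect("ASCII digits are valid UTF-8")
}

/// Signed variant: format |n| with the unsigned path, then add the sign.
fn i64_to_string(n: i64) -> String {
    let digits = u64_to_string(n.unsigned_abs());
    if n < 0 { format!("-{digits}") } else { digits }
}
```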

`src/repr/static_str.rs`

Variant that wraps a &'static str in O(1). The pointer is taken from text.as_ptr(), which is non-null for every &str (including an empty one), justifying the NonNull::new_unchecked.
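A minimal sketch of storing a &'static str as a (pointer, length) pair and reconstructing it. `StaticParts`, `from_static`, and `as_str` are hypothetical; the sketch uses the checked NonNull::new to show that it never returns None for a &str pointer.

```rust
use std::ptr::NonNull;

/// Hypothetical (pointer, length) form of a &'static str.
#[derive(Clone, Copy)]
struct StaticParts {
    ptr: NonNull<u8>,
    len: usize,
}

fn from_static(text: &'static str) -> StaticParts {
    // A &str pointer is never null, so this check always succeeds.
    let ptr = NonNull::new(text.as_ptr() as *mut u8).expect("&str pointers are non-null");
    StaticParts { ptr, len: text.len() }
}

fn as_str(parts: StaticParts) -> &'static str {
    // SAFETY: assumes `parts` came from `from_static`, so ptr/len describe
    // valid UTF-8 that lives for 'static and is never written through.
    unsafe {
        let bytes = std::slice::from_raw_parts(parts.ptr.as_ptr(), parts.len);
        std::str::from_utf8_unchecked(bytes)
    }
}
```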

`src/repr/traits.rs`

Defines the IntoRepr specialisation trait and implementations for f32/f64 (via ryu), bool, and char. The unreachable_unchecked() on char is gated on the UTF-8 length matching 1..=4, which holds for every char by construction.
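The char argument can be checked in a safe sketch: char::encode_utf8 always produces between 1 and 4 bytes, so a match on the encoded length needs no real fallback arm. `char_to_utf8` is hypothetical, and it uses unreachable!() where the original uses unreachable_unchecked().

```rust
/// Hypothetical sketch: encode a char into a fixed 4-byte buffer and return
/// the buffer with the encoded length.
fn char_to_utf8(c: char) -> ([u8; 4], usize) {
    let mut buf = [0u8; 4];
    let len = c.encode_utf8(&mut buf).len();
    match len {
        1..=4 => (buf, len),
        // Every char encodes to 1..=4 bytes, so this arm cannot be reached.
        _ => unreachable!("char::encode_utf8 always yields 1 to 4 bytes"),
    }
}
```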
