cargo : compact_str @ 0.9.1
Patrick Elsen, signed 2026-06-01, published 2026-06-01

Claims

datastructure-impl-bounds, datastructure-impl-correct, datastructure-impl-safe, datastructure-impl-tested, has-binaries, has-build-exec, has-fuzz-tests, has-install-exec, has-integration-tests, has-property-tests, has-unit-tests, impl-algorithm, impl-concurrency, impl-crypto, impl-datastructure, impl-interpreter, impl-jit, impl-parser, impl-protocol, is-benign, unsafe-documented, unsafe-minimal, unsafe-safe, unsafe-tested, uses-concurrency, uses-crypto, uses-environment, uses-exec, uses-filesystem, uses-interpreter, uses-jit, uses-network, uses-unsafe

Summary

Audit of compact_str 0.9.1, a Rust small-string-optimization library exposing CompactString, a 24-byte (12 on 32-bit) String replacement with three internal variants discriminated by the last byte. Heavy unsafe use, but every site is justified and exercised by property tests, quickcheck, libFuzzer, AFL++, and miri in CI. One low-severity correctness finding (FINDING-1): f32/f64 to_compact_string formatting diverges from std::fmt on powerpc64.

Report

Subject

compact_str provides CompactString, a memory-efficient drop-in replacement for std::string::String that stores strings of up to 24 bytes (12 on 32-bit) inline on the stack and only spills to the heap for longer strings. Three internal variants — inline, heap-allocated, and a borrowed &'static str — share a single 24-byte layout discriminated by the final byte, which is chosen so that the niche range [218, 255] also makes Option<CompactString> the same size as CompactString. The crate is #![no_std], offers an extensive set of opt-in integrations (serde, borsh, bytes, markup, diesel, sqlx-{mysql,postgres,sqlite}, arbitrary, proptest, quickcheck, rkyv, smallvec, zeroize), and exposes specialized ToCompactString impls for primitive numeric types using castaway for zero-cost compile-time type dispatch.

Methodology

Tools used:

  • openvet 0.x for workspace creation, claims, annotations, findings, and dependency narratives.
  • diff (BSD) to compare contents/ against vcs/.
  • grep -rn to enumerate unsafe blocks, unsafe impl, FFI (extern), filesystem/network/process syscalls, and // SAFETY: comments.

The published compact_str-0.9.1.crate was unpacked and compared against the upstream https://github.com/ParkMyCar/compact_str checkout at the commit recorded in .cargo_vcs_info.json (fbe932e7d03c). The only differences in contents/ versus vcs/ are cargo's standard Cargo.toml normalisation and the addition of Cargo.lock and .cargo_vcs_info.json; the entire src/ tree matches byte-for-byte.

Source review walked all of src/: lib.rs (~2.6k lines, public API surface), the repr/ module (mod.rs, heap.rs, inline.rs, static_str.rs, capacity.rs, iter.rs, last_utf8_char.rs, num.rs, bytes.rs, smallvec.rs, traits.rs), the per-feature integrations in features/, the traits.rs ToCompactString/CompactStringExt definitions, the format_compact! macro, and the unicode_data.rs table adapted from the standard library. Tests in src/tests.rs (~108 attributed tests using proptest, quickcheck, test_case, and test_strategy) and the per-module test submodules were skimmed for coverage of unsafe paths.

The VCS checkout was also inspected for the CI workflows (.github/workflows/ci.yml, fuzz.yml, cross_platform.yml, msrv.yml, clippy.yml) and the fuzz/ crate (libFuzzer target plus AFL++ scenario generator). The crate has no build.rs, no proc-macros ([lib] proc-macro is not set; Cargo.toml explicitly has build = false), and no binaries.

Results

The crate contains no binaries (justifying has-binaries), no build script (build = false in Cargo.toml and no build.rs, justifying has-build-exec), and no installation-time hooks (cargo has no install scripts, justifying has-install-exec). It performs no filesystem, network, environment-variable, process-execution, JIT, interpreter, or cryptographic work, and implements no parser, protocol, or standalone algorithm (justifying uses-filesystem, uses-network, uses-environment, uses-exec, uses-jit, uses-interpreter, uses-crypto, impl-crypto, impl-parser, impl-interpreter, impl-jit, impl-protocol, impl-algorithm). No concurrency primitives are implemented and no threads or async tasks are spawned; the only concurrency surface is the unsafe impl of Send and Sync for Drain (mirroring the &mut CompactString borrow that Drain holds) and for Repr (justifying uses-concurrency, impl-concurrency).

unsafe is used extensively (~165 mentions across the crate; justifying uses-unsafe), but each use is localised and paired with a // SAFETY: justification (~91 such comments). The unsafe code consistently relies on internal invariants (last-byte discriminant, capacity tagging, UTF-8 validity at boundaries) that are enforced at construction time, asserted at compile time with static_assertions, and tested under miri in CI (justifying unsafe-documented, unsafe-minimal, unsafe-tested). The single use of inline assembly is an empty no-op shim in repr/mod.rs:840-863 that forces the heap-length field to be eagerly loaded so len() compiles to a conditional move; it is bypassed under miri.

The data structure correctness story is strong: the type's invariants — discriminant byte authoritatively identifies the variant; inline length encoded in the final byte; capacity inlined on 64-bit, optionally on-heap on 32-bit — are stated in code, checked with static_assertions, and exercised by an extensive test surface (justifying datastructure-impl-correct, datastructure-impl-tested). The 0.9.1 release fixes a realloc-layout mismatch UB on heap shrink into the (MAX_SIZE, MIN_HEAP_SIZE) window (see annotation on src/repr/heap.rs:104-214); the fix is regression-tested by test_realloc_shrink_to_min_heap_gap and the rest of the unsafe-heavy code is exercised by property tests, quickcheck, and the dedicated fuzz/ crate kept in the upstream repository: a libFuzzer target plus AFL++ scenario generator running on both x86_64 and ARMv7 in CI (justifying has-fuzz-tests, unsafe-safe, datastructure-impl-safe). Time/space behaviour matches String (O(1) length/capacity, amortized O(1) push with 1.5× growth in amortized_growth, O(n) clone) (justifying datastructure-impl-bounds).
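The 1.5× growth policy mentioned above can be illustrated with a minimal sketch. This is not the crate's source: the function name amortized_growth_sketch and the use of saturating arithmetic are assumptions made for illustration.

```rust
/// Hypothetical sketch of a 1.5x amortized growth policy (not the crate's code).
/// Returns the capacity to allocate so that `additional` more bytes fit on top
/// of the `cur_len` bytes already stored.
pub fn amortized_growth_sketch(cur_len: usize, additional: usize) -> usize {
    // Minimum capacity that satisfies the request; saturate instead of overflowing.
    let required = cur_len.saturating_add(additional);
    // Geometric candidate: 1.5x the current length.
    let amortized = cur_len.saturating_mul(3) / 2;
    // Take whichever is larger so the request is always satisfied.
    amortized.max(required)
}
```

Growing geometrically means a long run of single-byte pushes triggers only a logarithmic number of reallocations, which is what makes push amortized O(1).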

One low-severity finding (FINDING-1) was identified: on powerpc64 (64-bit) the f32/f64 to_compact_string specialization (ryu-backed) disagrees with std::fmt::Display, and the crate works around this by skipping the test rather than disabling the specialization — so users on that target may observe a different float formatting than std. No memory-safety or data-corruption consequence.

No malicious behaviour was identified during review of the source tree, the CI configuration, or the published artefact (justifying is-benign).

Conclusion

compact_str 0.9.1 is a small-string-optimization library with a single, well-defined purpose. Its correctness rests on tightly-controlled unsafe code, but that code is documented at every site, kept minimal, validated under miri, property-tested with proptest and quickcheck, and additionally fuzzed with both libFuzzer and AFL++ in CI. The 0.9.1 release fixes the only known UB (HeapBuffer::realloc layout mismatch on shrink) and adds a regression test that miri exercises directly. The remaining finding is a narrow correctness deviation on powerpc64 that does not affect memory safety. The dependency surface is mostly opt-in integration crates with well-scoped descriptions; mandatory runtime dependencies (castaway, cfg-if, itoa, rustversion, ryu, static_assertions) are widely-used utility crates with no capability footprint of their own.

Findings(1)

FINDING-1 correctness low

f32/f64 to_compact_string diverges from std::fmt on powerpc64

src/tests.rs:941-942 documents a known formatting discrepancy in the ryu-backed float-to-string path on target_arch = "powerpc64" with target_pointer_width = "64": the f32/f64 specialization produces output that disagrees with std::fmt's Display for the same values. The crate works around the divergence by skipping the float assertions on that target rather than disabling the specialization, so f32::to_compact_string / f64::to_compact_string still run on powerpc64 and can format values differently from std. Impact is narrow: only the ToCompactString specialization for floats, and only on 64-bit powerpc64. No data-loss or memory-safety implication.

Annotations(7)

src/repr/bytes.rs

src/repr/bytes.rs, line 41-77

        while buf.has_remaining() {
            let chunk = buf.chunk();
            let chunk_len = chunk.len();

            // There's an edge case where the final byte of this buffer == `HEAP_MASK`, which is
            // invalid UTF-8, but would result in us creating an inline variant, that identifies as
            // a heap variant. If a user ever tried to reference the data at all, we'd incorrectly
            // try and read data from an invalid memory address, causing undefined behavior.
            if bytes_written < MAX_SIZE && bytes_written + chunk_len == MAX_SIZE {
                let last_byte = chunk[chunk_len - 1];
                // If we hit the edge case, reserve additional space to make the repr becomes heap
                // allocated, which prevents us from writing this last byte inline
                if last_byte >= 0b11000000 {
                    repr.reserve(MAX_SIZE + 1).unwrap_with_msg();
                }
            }

            // reserve at least enough space to fit this chunk
            repr.reserve(chunk_len).unwrap_with_msg();

            // SAFETY: The caller is responsible for making sure the provided buffer is UTF-8. This
            // invariant is documented in the public API
            let slice = repr.as_mut_buf();
            // write the chunk into the Repr
            slice[bytes_written..bytes_written + chunk_len].copy_from_slice(chunk);

            // Set the length of the Repr
            // SAFETY: We just wrote an additional `chunk_len` bytes into the Repr
            bytes_written += chunk_len;
            repr.set_len(bytes_written);

            // advance the pointer of the buffer
            buf.advance(chunk_len);
        }

        (repr, bytes_written)
    }

collect_buf handles a subtle inline/heap edge case: if a chunk brings the total to exactly MAX_SIZE bytes and its final byte is >= 0b11000000 (i.e. would be interpreted as a length tag or heap discriminant rather than UTF-8 content), the buffer is forced onto the heap via reserve(MAX_SIZE + 1). The check is broader than the HEAP_MASK case named in the code comment, since it covers every byte value in the tag range. Without this, the inline path would produce a value that lies about its variant: the invalid UTF-8 would still be caught by the UTF-8 check, but reads during that check would already touch the wrong memory.
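The decision described above can be modelled as a small predicate. This is a simplified sketch, not the crate's code: the helper name must_spill_to_heap is hypothetical, and MAX_SIZE = 24 assumes a 64-bit target.

```rust
/// Assumed inline capacity on a 64-bit target (24 bytes, per the report).
pub const MAX_SIZE: usize = 24;
/// Byte values at or above this are reserved as length tags or discriminants.
pub const LENGTH_MASK: u8 = 0b1100_0000;

/// Hypothetical helper: returns true when `total_len` bytes ending in
/// `last_byte` must not be stored inline.
pub fn must_spill_to_heap(total_len: usize, last_byte: u8) -> bool {
    // Longer than the inline buffer: always heap.
    total_len > MAX_SIZE
        // Exactly full: the content's last byte lands in the discriminant slot,
        // so it must not look like a length tag or variant marker.
        || (total_len == MAX_SIZE && last_byte >= LENGTH_MASK)
}
```

A buffer shorter than MAX_SIZE is unaffected, because its final slot holds the length tag rather than a content byte.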

src/repr/heap.rs

src/repr/heap.rs, line 104-214

    pub(crate) fn realloc(&mut self, new_capacity: usize) -> Result<usize, ()> {
        // We can't reallocate to a size less than our length, or else we'd clip the string
        if new_capacity < self.len {
            return Err(());
        }

        // HeapBuffer doesn't support 0 byte heap sizes
        if new_capacity == 0 {
            return Err(());
        }

        // Always allocate at least MIN_HEAP_SIZE
        let new_capacity = cmp::max(new_capacity, MIN_HEAP_SIZE);
        // N.B. must be computed _after_ the MIN_HEAP_SIZE clamp so the stored capacity matches the
        // layout we actually (re)allocate with, otherwise dealloc/realloc get a mismatched layout.
        let new_cap = Capacity::new(new_capacity);

        let (new_cap, new_ptr) = match (self.cap.is_heap(), new_cap.is_heap()) {
            // both current and new capacity can be stored inline
            (false, false) => {
                // SAFETY: checked above that our capacity is valid
                let cap = unsafe { self.cap.as_usize() };

                // current capacity is the same as the new, nothing to do!
                if cap == new_capacity {
                    return Ok(new_capacity);
                }

                let cur_layout = inline_capacity::layout(cap);
                let new_layout = inline_capacity::layout(new_capacity);
                let new_size = new_layout.size();

                // It's possible `new_size` could overflow since inline_capacity::layout pads for
                // alignment
                if new_size < new_capacity {
                    return Err(());
                }

                // SAFETY:
                // * We're using the same allocator that we used for `ptr`
                // * The layout is the same because we checked that the capacity is inline
                // * `new_size` will be > 0, we return early if the requested capacity is 0
                // * Checked above if `new_size` overflowed when rounding to alignment
                match ptr::NonNull::new(unsafe {
                    ::alloc::alloc::realloc(self.ptr.as_ptr(), cur_layout, new_size)
                }) {
                    Some(ptr) => (new_cap, ptr),
                    None => return Err(()),
                }
            }
            // both current and new capacity need to be stored on the heap
            (true, true) => {
                let cur_layout = heap_capacity::layout(self.capacity());
                let new_layout = heap_capacity::layout(new_capacity);
                let new_size = new_layout.size();

                // alloc::realloc requires that size > 0
                debug_assert!(new_size > 0);

                // It's possible `new_size` could overflow since heap_capacity::layout requires a
                // few additional bytes
                if new_size < new_capacity {
                    return Err(());
                }

                // move our pointer back one WORD since our capacity is behind it
                let raw_ptr = self.ptr.as_ptr();
                let adj_ptr = raw_ptr.wrapping_sub(mem::size_of::<usize>());

                // SAFETY:
                // * We're using the same allocator that we used for `ptr`
                // * The layout is the same because we checked that the capacity is on the heap
                // * `new_size` will be > 0, we return early if the requested capacity is 0
                // * Checked above if `new_size` overflowed when rounding to alignment
                let cap_ptr = unsafe { alloc::alloc::realloc(adj_ptr, cur_layout, new_size) };
                // Check if reallocation succeeded
                if cap_ptr.is_null() {
                    return Err(());
                }

                // Our allocation succeeded! Write the new capacity
                //
                // SAFETY:
                // * `src` and `dst` are both valid for reads of `usize` number of bytes
                // * `src` and `dst` don't overlap because we created `src`
                unsafe {
                    ptr::copy_nonoverlapping(
                        new_capacity.to_ne_bytes().as_ptr(),
                        cap_ptr,
                        mem::size_of::<usize>(),
                    )
                };

                // Finally, adjust our pointer backup so it points at the string content
                let str_ptr = cap_ptr.wrapping_add(mem::size_of::<usize>());
                // SAFETY: We checked above to make sure the pointer was non-null
                let ptr = unsafe { ptr::NonNull::new_unchecked(str_ptr) };

                (new_cap, ptr)
            }
            // capacity is currently inline or on the heap, but needs to move, can't realloc because
            // we'd need to change the layout!
            (false, true) | (true, false) => return Err(()),
        };

        // set our new pointer and new capacity
        self.ptr = new_ptr;
        self.cap = new_cap;

        Ok(new_capacity)
    }

HeapBuffer::realloc is the function fixed for v0.9.1's known undefined-behavior bug (PR #459 in upstream). The fix is the comment and code at lines 117-119: new_capacity is clamped to MIN_HEAP_SIZE before new_cap = Capacity::new(new_capacity) is computed, so the recorded capacity matches the layout actually (re)allocated — otherwise the subsequent dealloc/realloc would be passed a mismatched layout, which is UB. The fix is exercised by test_realloc_shrink_to_min_heap_gap (lines 522-534) under miri.
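The ordering issue can be shown with a simplified model. It is not the pre-fix source: plan_buggy, plan_fixed, and layout_from_recorded are hypothetical names, a plain byte-array layout stands in for the crate's capacity layouts, and MIN_HEAP_SIZE = 32 is an assumed 64-bit value.

```rust
use std::alloc::Layout;

/// Assumed minimum heap allocation on a 64-bit target; illustrative only.
pub const MIN_HEAP_SIZE: usize = 32;

/// Buggy ordering: the capacity to record is taken before the clamp.
/// Returns (recorded capacity, layout actually allocated).
pub fn plan_buggy(requested: usize) -> (usize, Layout) {
    let recorded = requested;
    let clamped = requested.max(MIN_HEAP_SIZE);
    (recorded, Layout::array::<u8>(clamped).expect("valid layout"))
}

/// Fixed ordering: clamp first, then record the clamped value.
pub fn plan_fixed(requested: usize) -> (usize, Layout) {
    let clamped = requested.max(MIN_HEAP_SIZE);
    (clamped, Layout::array::<u8>(clamped).expect("valid layout"))
}

/// Layout that a later dealloc/realloc would derive from the recorded capacity.
pub fn layout_from_recorded(recorded: usize) -> Layout {
    Layout::array::<u8>(recorded).expect("valid layout")
}
```

For a request of 28 bytes, the buggy plan records 28 but allocates 32, so a later call would pass the allocator a layout that differs from the one used at allocation time. The fixed plan records 32 and the layouts agree.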

src/repr/inline.rs

InlineBuffer is a 24-byte (12 on 32-bit) on-stack buffer that stores both the bytes and the length encoded in the last byte (len | LENGTH_MASK). When the string fully fills the buffer the last byte is part of the UTF-8 content, and Repr::len infers length = MAX_SIZE because the byte is < LENGTH_MASK. Two constructors: new (runtime, debug-asserts text.len() <= MAX_SIZE and the caller must uphold this) and new_const (panics if violated).
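The length encoding can be sketched as a round trip. This is a minimal illustration under assumptions: encode_inline and decode_inline_len are hypothetical names, MAX_SIZE = 24 assumes a 64-bit target, and the safe array copy stands in for the crate's pointer-based code.

```rust
/// Assumed inline capacity on a 64-bit target.
pub const MAX_SIZE: usize = 24;
/// Tag OR-ed into the length stored in the last byte.
pub const LENGTH_MASK: u8 = 0b1100_0000;

/// Hypothetical encoder: returns None when the text does not fit inline.
pub fn encode_inline(text: &str) -> Option<[u8; MAX_SIZE]> {
    let bytes = text.as_bytes();
    if bytes.len() > MAX_SIZE {
        return None;
    }
    let mut buf = [0u8; MAX_SIZE];
    // Store the tagged length in the last byte.
    buf[MAX_SIZE - 1] = bytes.len() as u8 | LENGTH_MASK;
    // Copy the content; a full-length string overwrites the tag with its own
    // final byte, which is always < LENGTH_MASK for valid UTF-8.
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(buf)
}

/// Hypothetical decoder; assumes the buffer was produced by `encode_inline`.
pub fn decode_inline_len(buf: &[u8; MAX_SIZE]) -> usize {
    let last = buf[MAX_SIZE - 1];
    if last < LENGTH_MASK {
        // Last byte is string content, so the buffer is completely full.
        MAX_SIZE
    } else {
        // Strip the tag bits to recover the length.
        (last & !LENGTH_MASK) as usize
    }
}
```

The sketch takes a &str, so the UTF-8 invariant that keeps a full buffer's last byte below LENGTH_MASK is upheld by the type system rather than by the caller.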

src/repr/last_utf8_char.rs

Defines LastByte, a #[repr(u8)] enum with 218 valid discriminants [0, 217]: the 192 byte values [0, 191] that can legally appear as the final byte of a UTF-8 string, the 24 inline length tags [192, 215], and the two sentinels Heap = 216 and Static = 217. The remaining 38 values [218, 255] are unused, giving rustc a niche for Option<CompactString>. This file is data, not logic; correctness rests on the UTF-8 invariant, which is checked by the test InlineBuffer::test_unused_utf8_bytes.
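The niche effect can be reproduced with a truncated stand-in. The types below are hypothetical: LastByteSketch lists only a handful of discriminants, and ReprSketch is byte-aligned, unlike the real layout. The size equality relies on rustc's niche optimization as it behaves in practice.

```rust
use std::mem::size_of;

/// Hypothetical, heavily truncated last-byte enum. Only a few discriminants
/// are listed; what matters is that 218..=255 are never used.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LastByteSketch {
    Ascii0 = 0,
    Continuation191 = 191,
    Len0 = 192,
    Len23 = 215,
    Heap = 216,
    Static = 217,
}

/// Hypothetical 24-byte layout whose final byte is the discriminant.
#[repr(C)]
pub struct ReprSketch {
    pub head: [u8; 23],
    pub last: LastByteSketch,
}

/// Returns (size of ReprSketch, size of Option<ReprSketch>).
pub fn sizes() -> (usize, usize) {
    (size_of::<ReprSketch>(), size_of::<Option<ReprSketch>>())
}
```

Because the compiler can represent None with one of the unused byte values in the last field, wrapping the type in Option adds no extra storage.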

src/repr/mod.rs

Defines the internal Repr discriminated layout: a 24-byte value (12 on 32-bit) whose final byte encodes the variant (heap-allocated, static-str, or inline-with-length), making this an in-memory data structure with a non-trivial layout (justifying impl-datastructure). The heap-allocated and static checks examine only the last byte, and the niche range [218, 255] is reserved so that Option<CompactString> stays the same size as CompactString. Numerous unsafe operations (transmute between variants via from_inline/from_heap/from_static and the matching projections; raw-pointer reads in as_slice/len) rely on the discriminant being authoritative, justifying uses-unsafe.
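Dispatch on the last byte can be sketched as a single match. The enum VariantSketch and the function classify are hypothetical names; the byte ranges follow the values stated in this report, with MAX_SIZE = 24 assumed for a 64-bit target.

```rust
pub const MAX_SIZE: usize = 24;
pub const LENGTH_MASK: u8 = 0b1100_0000; // 192
pub const HEAP_MASK: u8 = 216;
pub const STATIC_STR_MASK: u8 = 217;

/// Hypothetical view of what the last byte says about the value.
#[derive(Debug, PartialEq, Eq)]
pub enum VariantSketch {
    Inline { len: usize },
    Heap,
    Static,
    /// 218..=255: never a valid value, reserved as the Option niche.
    Niche,
}

/// Classifies a value by looking only at its final byte.
pub fn classify(last_byte: u8) -> VariantSketch {
    match last_byte {
        // Valid final UTF-8 byte: the inline buffer is completely full.
        0..=191 => VariantSketch::Inline { len: MAX_SIZE },
        // Tagged length of a partially filled inline buffer.
        192..=215 => VariantSketch::Inline {
            len: (last_byte - LENGTH_MASK) as usize,
        },
        HEAP_MASK => VariantSketch::Heap,
        STATIC_STR_MASK => VariantSketch::Static,
        _ => VariantSketch::Niche,
    }
}
```

Every unsafe projection depends on this classification being correct, which is why the report treats the discriminant as the central invariant.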

src/repr/mod.rs, line 840-863

#[inline(always)]
fn ensure_read(value: usize) -> usize {
    // SAFETY: This assembly instruction is a noop that only affects the instruction ordering.
    //
    // TODO(parkmycar): Re-add loongarch and riscv once we have CI coverage for them.
    #[cfg(all(
        not(miri),
        any(
            target_arch = "x86",
            target_arch = "x86_64",
            target_arch = "arm",
            target_arch = "aarch64",
        )
    ))]
    unsafe {
        core::arch::asm!(
            "/* {value} */",
            value = in(reg) value,
            options(nomem, nostack),
        );
    };

    value
}

ensure_read uses inline assembly (core::arch::asm!) as a no-op shim to force eager loading of the heap-length field on x86/x86_64/arm/aarch64, so that the branchless len() / is_empty() paths above compile to conditional moves rather than branches. The asm is empty (/* {value} */) with options(nomem, nostack) and is bypassed under miri. This is the only direct asm use in the crate.

src/repr/static_str.rs

StaticStr is the third Repr variant: holds a &'static str (pointer + length) plus a discriminant array whose last byte equals STATIC_STR_MASK (LastByte::Static = 217). get_text reconstructs the original &'static str from the stored pointer/length via from_utf8_unchecked; soundness depends on the pointer/length having been recorded from a real &'static str (only constructor is new(text: &'static str)) — justifying unsafe-documented for this file.
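The soundness argument can be seen in a stripped-down sketch. StaticStrSketch is a hypothetical type that omits the discriminant array; it only shows why restricting construction to a real &'static str makes the unchecked reconstruction valid.

```rust
/// Hypothetical stand-in that stores only the pointer and length of a
/// `&'static str` (the real variant also carries discriminant bytes).
#[derive(Clone, Copy)]
pub struct StaticStrSketch {
    ptr: *const u8,
    len: usize,
}

impl StaticStrSketch {
    /// The only constructor: pointer and length always come from a real
    /// `&'static str`.
    pub fn new(text: &'static str) -> Self {
        Self { ptr: text.as_ptr(), len: text.len() }
    }

    /// Rebuilds the original string slice from the stored parts.
    pub fn get_text(&self) -> &'static str {
        // SAFETY: `ptr` and `len` were taken from a `&'static str` in `new`,
        // so they describe valid, immutable UTF-8 that lives for 'static.
        unsafe {
            std::str::from_utf8_unchecked(std::slice::from_raw_parts(self.ptr, self.len))
        }
    }
}
```

Keeping the fields private is what makes the SAFETY comment hold: no caller can supply a pointer or length that did not originate from a string slice.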

src/tests.rs

Crate-level test module: ~108 tests across unit (test_case), property (proptest/test_strategy), and quickcheck-backed cases, plus per-feature test submodules elsewhere (justifying has-unit-tests, has-property-tests). Property tests are systematically gated with #[cfg_attr(miri, ignore)], so CI exercises the crate in two passes: a native run that includes the property tests for coverage, and a miri run of the remaining tests for soundness (justifying unsafe-tested and datastructure-impl-tested). The published crate ships no tests/ directory, only in-module #[cfg(test)] tests (justifying has-integration-tests).