cargo : regex-automata @ 0.4.14
PE Patrick Elsen signed 2026-05-28 published 2026-05-28

Claims

algorithm-impl-boundsalgorithm-impl-safealgorithm-impl-testedconcurrency-documentedconcurrency-impl-correctconcurrency-impl-documentedconcurrency-impl-safeconcurrency-safedatastructure-impl-safedatastructure-impl-testedhas-binarieshas-build-exechas-fuzz-testshas-install-exechas-integration-testshas-property-testshas-unit-testsimpl-algorithmimpl-concurrencyimpl-cryptoimpl-datastructureimpl-interpreterimpl-jitimpl-parserimpl-protocolis-benignparser-impl-safeparser-impl-testedunsafe-documentedunsafe-minimalunsafe-safeunsafe-testeduses-concurrencyuses-cryptouses-environmentuses-execuses-filesystemuses-interpreteruses-jituses-networkuses-unsafe

Summary

regex-automata 0.4.14 is the linear-time finite-automata engine behind the regex crate, with a constant-time DFA serialization format. Read 64.5K LOC; the 57 unsafe sites cluster in the from_bytes deserialization casts (guarded by a documented validation chain over untrusted bytes), the bounds-elided DFA search loops, and a sharded object pool. No findings.

Report

Subject

regex-automata 0.4.14 is the automata layer beneath the regex crate. It compiles a regex (via regex-syntax's HIR) into finite automata and runs matches over them. It ships several engines: a Thompson NFA compiler, a PikeVM, a bounded backtracker, a fully-compiled dense DFA, a sparse DFA, a lazy/hybrid DFA, a one-pass DFA, and a meta engine that chooses among them. A distinctive feature is the dense and sparse DFA serialization: a DFA can be compiled offline, serialized to a &[u8] wire format, and later loaded in constant time with from_bytes (validated) or unsafe from_bytes_unchecked (unvalidated), including in no_std/no-alloc environments. The crate is no_std-capable; all four dependencies (aho-corasick, memchr, regex-syntax, log) are optional and feature-gated.

Methodology

Tools: openvet 0.6.0, ripgrep, diff, git, wc. The crate is ~64.5K lines across 70 source files, the largest in this batch. I surveyed for unsafe (57 sites: 37 unsafe {} blocks, 12 unsafe fn, 8 unsafe impl/trait), FFI, network, filesystem, process, environment, crypto, RNG, and concurrency. I read the unsafe-bearing files in full: util/wire.rs (the deserialization primitives), util/pool.rs and util/lazy.rs (the concurrency primitives), dfa/dense.rs, dfa/sparse.rs, dfa/accel.rs, dfa/search.rs, dfa/automaton.rs, and hybrid/search.rs/hybrid/dfa.rs, concentrating on the from_bytes deserialization path. I read lib.rs and the bounded-backtracker module for the time-complexity guarantees. I did not fetch a written specification for the bespoke binary format or formally verify algorithmic correctness against a reference, and I did not run miri, loom, or the fuzz targets.

Results

Every source file under contents/src is byte-identical to the VCS tree; only Cargo.toml differs, which is the expected cargo normalization. The pre-loaded vcs is a worktree (no .git directory at the symlink target), but it is fully present and diffable.

There is no FFI, no network, no filesystem access (the only std::fs hits are in rustdoc examples), no process execution, no environment access, no cryptography, and no RNG; hence uses-network, uses-filesystem, uses-exec, uses-environment, uses-crypto, uses-jit, and uses-interpreter are all false, as are the corresponding impl-crypto, impl-jit, impl-protocol, and impl-interpreter. The manifest sets build = false and the crate is not a proc macro, so has-build-exec and has-install-exec are false, and there are no precompiled assets (has-binaries). The crate carries extensive unit tests (94 #[test] in src, has-unit-tests), an integration test target plus suite-driven tests under tests/ (has-integration-tests), quickcheck property tests in src/util/primitives.rs and src/util/determinize/state.rs (has-property-tests), and a fuzz-regression harness in tests/fuzz with checked-in crash inputs from prior fuzzer findings on the DFA deserializer (has-fuzz-tests).

The crate implements automata data structures, the matching algorithms over them, a deserialization parser for its wire format, and its own concurrency primitives. The deserialization parser (impl-parser) is the highest-risk surface and is annotated on dfa/dense.rs: from_bytes decodes cheaply via unsafe from_bytes_unchecked and then runs a deliberately ordered validation chain that proves all transition targets are valid state IDs, all special-tagged states are genuinely special, and all accelerator indices and needle lengths are in range. The decode path validates stride2 (1..=9), rejects alphabet_len > stride, and uses the checked arithmetic helpers in util/wire.rs before its single from_raw_parts cast; alignment and slice length are checked first. These casts are sound because StateID/PatternID are repr(transparent) over u32 and their validity is a logical, not memory-safety, property. This justifies parser-impl-safe and uses-unsafe. The DFA search loops use get_unchecked under a documented two-invariant argument (valid state IDs from the unsafe trait Automaton contract, in-bounds haystack index from the loop guards), justifying unsafe-safe; every unsafe {} block carries a // SAFETY: comment and every unsafe fn/impl documents its contract, justifying unsafe-documented, and unsafe is confined to the deserialization casts, the search fast paths, and the pool/lazy primitives, justifying unsafe-minimal. The upstream code is structured with cfg(miri) accommodations throughout, indicating the suite is exercised under miri, and the DFA deserializer has dedicated fuzz-regression tests; this supports unsafe-tested.

The object pool and lazy initializer (util/pool.rs, util/lazy.rs) implement the crate's concurrency (impl-concurrency, uses-concurrency). The pool's hand-written unsafe impl Sync is justified by an owner-thread CAS protocol that gates the only non-Sync field, with a wrap-around panic guard against thread-ID reuse; the public types express thread-safety through Send/Sync bounds. This justifies concurrency-safe, concurrency-documented, concurrency-impl-safe, concurrency-impl-correct, and concurrency-impl-documented. The finite-automata engines carry a documented worst-case O(m * n) time bound, and the single backtracking engine bounds itself with a visited set to avoid catastrophic blowup, justifying impl-algorithm, algorithm-impl-safe, algorithm-impl-bounds, and impl-datastructure with datastructure-impl-safe. The breadth of the test suite (unit, integration suite, property, and fuzz-regression) supports parser-impl-tested, algorithm-impl-tested, and datastructure-impl-tested. I found no obfuscation, embedded blobs, telemetry, or suspicious endpoints, so is-benign. No findings were recorded.

I did not assert the *-correct claims for the parser, algorithm, or data structures, nor concurrency-impl-tested: verifying the bespoke binary format against a written spec and exercising the concurrency under loom/ThreadSanitizer were out of scope for this review.

Conclusion

regex-automata 0.4.14 is a linear-time regex engine with a bespoke, constant-time DFA serialization format. Its 57 unsafe sites are concentrated in the wire-format deserialization casts, the bounds-elided search fast paths, and a sharded object pool with a hand-written Sync impl; each is accompanied by a safety comment and, in the deserialization path, a documented validation chain that establishes the state-ID-validity invariant the unsafe code depends on. The crate performs no I/O of any kind, has no build-time execution, depends only on optional feature-gated crates, and ships unit, integration, property, and fuzz-regression tests including checked-in crash inputs targeting the deserializer. No security, safety, or correctness findings were recorded.

Findings

No findings.

Annotations(5)

src/dfa/dense.rs

src/dfa/dense.rs, line 2340-2447

    pub fn from_bytes(
        slice: &'a [u8],
    ) -> Result<(DFA<&'a [u32]>, usize), DeserializeError> {
        // SAFETY: This is safe because we validate the transition table, start
        // table, match states and accelerators below. If any validation fails,
        // then we return an error.
        let (dfa, nread) = unsafe { DFA::from_bytes_unchecked(slice)? };
        // Note that validation order is important here:
        //
        // * `MatchState::validate` can be called with an untrusted DFA.
        // * `TransistionTable::validate` uses `dfa.ms` through `match_len`.
        // * `StartTable::validate` needs a valid transition table.
        //
        // So... validate the match states first.
        dfa.accels.validate()?;
        dfa.ms.validate(&dfa)?;
        dfa.tt.validate(&dfa)?;
        dfa.st.validate(&dfa)?;
        // N.B. dfa.special doesn't have a way to do unchecked deserialization,
        // so it has already been validated.
        for state in dfa.states() {
            // If the state is an accel state, then it must have a non-empty
            // accelerator.
            if dfa.is_accel_state(state.id()) {
                let index = dfa.accelerator_index(state.id());
                if index >= dfa.accels.len() {
                    return Err(DeserializeError::generic(
                        "found DFA state with invalid accelerator index",
                    ));
                }
                let needles = dfa.accels.needles(index);
                if !(1 <= needles.len() && needles.len() <= 3) {
                    return Err(DeserializeError::generic(
                        "accelerator needles has invalid length",
                    ));
                }
            }
        }
        Ok((dfa, nread))
    }

    /// Deserialize a DFA with a specific state identifier representation in
    /// constant time by omitting the verification of the validity of the
    /// transition table and other data inside the DFA.
    ///
    /// This is just like [`DFA::from_bytes`], except it can potentially return
    /// a DFA that exhibits undefined behavior if its transition table contains
    /// invalid state identifiers.
    ///
    /// This routine is useful if you need to deserialize a DFA cheaply
    /// and cannot afford the transition table validation performed by
    /// `from_bytes`.
    ///
    /// # Example
    ///
    /// ```
    /// use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};
    ///
    /// let initial = DFA::new("foo[0-9]+")?;
    /// let (bytes, _) = initial.to_bytes_native_endian();
    /// // SAFETY: This is guaranteed to be safe since the bytes given come
    /// // directly from a compatible serialization routine.
    /// let dfa: DFA<&[u32]> = unsafe { DFA::from_bytes_unchecked(&bytes)?.0 };
    ///
    /// let expected = Some(HalfMatch::must(0, 8));
    /// assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);
    /// # Ok::<(), Box<dyn std::error::Error>>(())
    /// ```
    pub unsafe fn from_bytes_unchecked(
        slice: &'a [u8],
    ) -> Result<(DFA<&'a [u32]>, usize), DeserializeError> {
        let mut nr = 0;

        nr += wire::skip_initial_padding(slice);
        wire::check_alignment::<StateID>(&slice[nr..])?;
        nr += wire::read_label(&slice[nr..], LABEL)?;
        nr += wire::read_endianness_check(&slice[nr..])?;
        nr += wire::read_version(&slice[nr..], VERSION)?;

        let _unused = wire::try_read_u32(&slice[nr..], "unused space")?;
        nr += size_of::<u32>();

        let (flags, nread) = Flags::from_bytes(&slice[nr..])?;
        nr += nread;

        let (tt, nread) = TransitionTable::from_bytes_unchecked(&slice[nr..])?;
        nr += nread;

        let (st, nread) = StartTable::from_bytes_unchecked(&slice[nr..])?;
        nr += nread;

        let (ms, nread) = MatchStates::from_bytes_unchecked(&slice[nr..])?;
        nr += nread;

        let (special, nread) = Special::from_bytes(&slice[nr..])?;
        nr += nread;
        special.validate_state_len(tt.len(), tt.stride2)?;

        let (accels, nread) = Accels::from_bytes_unchecked(&slice[nr..])?;
        nr += nread;

        let (quitset, nread) = ByteSet::from_bytes(&slice[nr..])?;
        nr += nread;

        // Prefilters don't support serialization, so they're always absent.
        let pre = None;
        Ok((DFA { tt, st, ms, special, accels, pre, quitset, flags }, nr))
    }

DFA::from_bytes is the highest-risk surface: it deserializes an untrusted serialized automaton. It calls the unsafe fn from_bytes_unchecked (which performs only the cheap, constant-time decode) and then runs a full validation chain whose order is documented and load-bearing: accels.validate, ms.validate, tt.validate, st.validate, special.validate_state_len, and a per-state accelerator-index/needle-length bounds check. tt.validate (dense.rs:3620) walks every state and every transition, rejecting any transition target that is not a valid state-ID boundary and any special-tagged state that is not actually special. This establishes the "all transition targets are valid state IDs" invariant that the search hot path relies on. The from_bytes_unchecked decode itself validates stride2 in 1..=9, rejects alphabet_len > stride, and uses the checked-arithmetic helpers wire::shl/wire::mul/wire::add plus check_slice_len and check_alignment::<StateID> before the single core::slice::from_raw_parts cast at dense.rs:3442 (the comment notes it is the only unsafe code in that function). Justifies impl-parser, parser-impl-safe, uses-unsafe, and unsafe-safe.

src/dfa/search.rs

src/dfa/search.rs, line 83-124

    while at < input.end() {
        // SAFETY: There are two safety invariants we need to uphold here in
        // the loops below: that 'sid' and 'prev_sid' are valid state IDs
        // for this DFA, and that 'at' is a valid index into 'haystack'.
        // For the former, we rely on the invariant that next_state* and
        // start_state_forward always returns a valid state ID (given a valid
        // state ID in the former case). For the latter safety invariant, we
        // always guard unchecked access with a check that 'at' is less than
        // 'end', where 'end <= haystack.len()'. In the unrolled loop below, we
        // ensure that 'at' is always in bounds.
        //
        // PERF: See a similar comment in src/hybrid/search.rs that justifies
        // this extra work to make the search loop fast. The same reasoning and
        // benchmarks apply here.
        let mut prev_sid;
        while at < input.end() {
            prev_sid = unsafe { next_unchecked!(sid, at) };
            if dfa.is_special_state(prev_sid) || at + 3 >= input.end() {
                core::mem::swap(&mut prev_sid, &mut sid);
                break;
            }
            at += 1;

            sid = unsafe { next_unchecked!(prev_sid, at) };
            if dfa.is_special_state(sid) {
                break;
            }
            at += 1;

            prev_sid = unsafe { next_unchecked!(sid, at) };
            if dfa.is_special_state(prev_sid) {
                core::mem::swap(&mut prev_sid, &mut sid);
                break;
            }
            at += 1;

            sid = unsafe { next_unchecked!(prev_sid, at) };
            if dfa.is_special_state(sid) {
                break;
            }
            at += 1;
        }

The DFA search hot path uses get_unchecked on both the haystack and the transition table for speed, with a documented safety argument (lines 84-96). Two invariants are upheld: every state ID handed to next_state_unchecked is valid (guaranteed by the unsafe trait Automaton contract that a valid input state yields a valid output state, plus the from_bytes validation that all transition targets are valid), and at is always a valid haystack index (the while at < input.end() guards plus the at + 3 >= input.end() check before the four-step unrolled block keep at < end <= haystack.len()). The Automaton trait is declared unsafe (automaton.rs:108) precisely so that implementors must uphold "valid in implies valid out," which is what licenses the bounds-check elision here. next_state (the safe variant, dense.rs:3195) uses checked indexing. Justifies uses-unsafe, unsafe-safe, impl-algorithm, and algorithm-impl-safe.

src/lib.rs

src/lib.rs, line 1-30

/*!
This crate exposes a variety of regex engines used by the `regex` crate.
It provides a vast, sprawling and "expert" level API to each regex engine.
The regex engines provided by this crate focus heavily on finite automata
implementations and specifically guarantee worst case `O(m * n)` time
complexity for all searches. (Where `m ~ len(regex)` and `n ~ len(haystack)`.)

The primary goal of this crate is to serve as an implementation detail for the
`regex` crate. A secondary goal is to make its internals available for use by
others.

# Table of contents

* [Should I be using this crate?](#should-i-be-using-this-crate) gives some
reasons for and against using this crate.
* [Examples](#examples) provides a small selection of things you can do with
this crate.
* [Available regex engines](#available-regex-engines) provides a hyperlinked
list of all regex engines in this crate.
* [API themes](#api-themes) discusses common elements used throughout this
crate.
* [Crate features](#crate-features) documents the extensive list of Cargo
features available.

# Should I be using this crate?

If you find yourself here because you just want to use regexes, then you should
first check out whether the [`regex` crate](https://docs.rs/regex) meets
your needs. It provides a streamlined and difficult-to-misuse API for regex
searching.

The crate's documented design centers on finite-automata engines with a worst-case O(m * n) time bound (lib.rs:4-5), where m is the regex size and n the haystack length. The fully-compiled dense and sparse DFAs, the lazy/hybrid DFA, the PikeVM, and the one-pass DFA all run in linear time in the haystack. The bounded backtracker (nfa/thompson/backtrack.rs) is the one backtracking engine, and it tracks a visited set keyed by (state, offset) to avoid revisiting work, which is what keeps it within O(m * n) and prevents the catastrophic exponential blowup characteristic of unbounded backtracking regex engines. The backtracker is only usable on haystacks small enough for its visited capacity. This linear-time guarantee under adversarial patterns and inputs is the basis for algorithm-impl-bounds. The automata themselves (dense/sparse transition tables, NFA states, sparse sets) constitute the data structures the crate implements; justifies impl-datastructure, datastructure-impl-safe, and impl-algorithm.

src/util/pool.rs

src/util/pool.rs, line 374-431

    pub(super) struct Pool<T, F> {
        /// A function to create more T values when stack is empty and a caller
        /// has requested a T.
        create: F,
        /// Multiple stacks of T values to hand out. These are used when a Pool
        /// is accessed by a thread that didn't create it.
        ///
        /// Conceptually this is `Mutex<Vec<Box<T>>>`, but sharded out to make
        /// it scale better under high contention work-loads. We index into
        /// this sequence via `thread_id % stacks.len()`.
        stacks: Vec<CacheLine<Mutex<Vec<Box<T>>>>>,
        /// The ID of the thread that owns this pool. The owner is the thread
        /// that makes the first call to 'get'. When the owner calls 'get', it
        /// gets 'owner_val' directly instead of returning a T from 'stack'.
        /// See comments elsewhere for details, but this is intended to be an
        /// optimization for the common case that makes getting a T faster.
        ///
        /// It is initialized to a value of zero (an impossible thread ID) as a
        /// sentinel to indicate that it is unowned.
        owner: AtomicUsize,
        /// A value to return when the caller is in the same thread that
        /// first called `Pool::get`.
        ///
        /// This is set to None when a Pool is first created, and set to Some
        /// once the first thread calls Pool::get.
        owner_val: UnsafeCell<Option<T>>,
    }

    // SAFETY: Since we want to use a Pool from multiple threads simultaneously
    // behind an Arc, we need for it to be Sync. In cases where T is sync,
    // Pool<T> would be Sync. However, since we use a Pool to store mutable
    // scratch space, we wind up using a T that has interior mutability and is
    // thus itself not Sync. So what we *really* want is for our Pool<T> to by
    // Sync even when T is not Sync (but is at least Send).
    //
    // The only non-sync aspect of a Pool is its 'owner_val' field, which is
    // used to implement faster access to a pool value in the common case of
    // a pool being accessed in the same thread in which it was created. The
    // 'stack' field is also shared, but a Mutex<T> where T: Send is already
    // Sync. So we only need to worry about 'owner_val'.
    //
    // The key is to guarantee that 'owner_val' can only ever be accessed from
    // one thread. In our implementation below, we guarantee this by only
    // returning the 'owner_val' when the ID of the current thread matches the
    // ID of the thread that first called 'Pool::get'. Since this can only ever
    // be one thread, it follows that only one thread can access 'owner_val' at
    // any point in time. Thus, it is safe to declare that Pool<T> is Sync when
    // T is Send.
    //
    // If there is a way to achieve our performance goals using safe code, then
    // I would very much welcome a patch. As it stands, the implementation
    // below tries to balance safety with performance. The case where a Regex
    // is used from multiple threads simultaneously will suffer a bit since
    // getting a value out of the pool will require unlocking a mutex.
    //
    // We require `F: Send + Sync` because we call `F` at any point on demand,
    // potentially from multiple threads simultaneously.
    unsafe impl<T: Send, F: Send + Sync> Sync for Pool<T, F> {}

util/pool.rs implements a sharded object pool used to hand out reusable scratch state to concurrent searches. The std variant guards a Vec<Box<T>> per shard with std::sync::Mutex and adds an owner-thread fast path: the first caller's thread ID claims owner, and only that thread reads the UnsafeCell<Option<T>> field owner_val without locking. The hand-written unsafe impl Sync for Pool<T, F> (line 431) is justified by a comment: the only non-Sync field is owner_val, and access to it is gated by a CAS on the owner thread ID, so at most one thread ever touches it. Thread-ID reuse, which could let two threads believe they own the pool, is prevented by allocating IDs from a monotonically increasing AtomicUsize counter and panicking if it wraps to zero (lines 337-352). The companion util/lazy.rs implements a similar Lazy<T, F> with a documented unsafe impl Sync and a state machine guarding one-time initialization. Public types expose their thread-safety through Send/Sync bounds. Justifies uses-concurrency, impl-concurrency, concurrency-safe, concurrency-impl-safe, concurrency-impl-correct, concurrency-impl-documented, and concurrency-documented.

src/util/wire.rs

src/util/wire.rs, line 264-404

/// Safely converts a `&[u32]` to `&[StateID]` with zero cost.
#[cfg_attr(feature = "perf-inline", inline(always))]
pub(crate) fn u32s_to_state_ids(slice: &[u32]) -> &[StateID] {
    // SAFETY: This is safe because StateID is defined to have the same memory
    // representation as a u32 (it is repr(transparent)). While not every u32
    // is a "valid" StateID, callers are not permitted to rely on the validity
    // of StateIDs for memory safety. It can only lead to logical errors. (This
    // is why StateID::new_unchecked is safe.)
    unsafe {
        core::slice::from_raw_parts(
            slice.as_ptr().cast::<StateID>(),
            slice.len(),
        )
    }
}

/// Safely converts a `&mut [u32]` to `&mut [StateID]` with zero cost.
pub(crate) fn u32s_to_state_ids_mut(slice: &mut [u32]) -> &mut [StateID] {
    // SAFETY: This is safe because StateID is defined to have the same memory
    // representation as a u32 (it is repr(transparent)). While not every u32
    // is a "valid" StateID, callers are not permitted to rely on the validity
    // of StateIDs for memory safety. It can only lead to logical errors. (This
    // is why StateID::new_unchecked is safe.)
    unsafe {
        core::slice::from_raw_parts_mut(
            slice.as_mut_ptr().cast::<StateID>(),
            slice.len(),
        )
    }
}

/// Safely converts a `&[u32]` to `&[PatternID]` with zero cost.
#[cfg_attr(feature = "perf-inline", inline(always))]
pub(crate) fn u32s_to_pattern_ids(slice: &[u32]) -> &[PatternID] {
    // SAFETY: This is safe because PatternID is defined to have the same
    // memory representation as a u32 (it is repr(transparent)). While not
    // every u32 is a "valid" PatternID, callers are not permitted to rely
    // on the validity of PatternIDs for memory safety. It can only lead to
    // logical errors. (This is why PatternID::new_unchecked is safe.)
    unsafe {
        core::slice::from_raw_parts(
            slice.as_ptr().cast::<PatternID>(),
            slice.len(),
        )
    }
}

/// Checks that the given slice has an alignment that matches `T`.
///
/// This is useful for checking that a slice has an appropriate alignment
/// before casting it to a &[T]. Note though that alignment is not itself
/// sufficient to perform the cast for any `T`.
pub(crate) fn check_alignment<T>(
    slice: &[u8],
) -> Result<(), DeserializeError> {
    let alignment = core::mem::align_of::<T>();
    let address = slice.as_ptr().as_usize();
    if address % alignment == 0 {
        return Ok(());
    }
    Err(DeserializeError::alignment_mismatch(alignment, address))
}

/// Reads a possibly empty amount of padding, up to 7 bytes, from the beginning
/// of the given slice. All padding bytes must be NUL bytes.
///
/// This is useful because it can be theoretically necessary to pad the
/// beginning of a serialized object with NUL bytes to ensure that it starts
/// at a correctly aligned address. These padding bytes should come immediately
/// before the label.
///
/// This returns the number of bytes read from the given slice.
pub(crate) fn skip_initial_padding(slice: &[u8]) -> usize {
    let mut nread = 0;
    while nread < 7 && nread < slice.len() && slice[nread] == 0 {
        nread += 1;
    }
    nread
}

/// Allocate a byte buffer of the given size, along with some initial padding
/// such that `buf[padding..]` has the same alignment as `T`, where the
/// alignment of `T` must be at most `8`. In particular, callers should treat
/// the first N bytes (second return value) as padding bytes that must not be
/// overwritten. In all cases, the following identity holds:
///
/// ```ignore
/// let (buf, padding) = alloc_aligned_buffer::<StateID>(SIZE);
/// assert_eq!(SIZE, buf[padding..].len());
/// ```
///
/// In practice, padding is often zero.
///
/// The requirement for `8` as a maximum here is somewhat arbitrary. In
/// practice, we never need anything bigger in this crate, and so this function
/// does some sanity asserts under the assumption of a max alignment of `8`.
#[cfg(feature = "alloc")]
pub(crate) fn alloc_aligned_buffer<T>(size: usize) -> (Vec<u8>, usize) {
    // NOTE: This is a kludge because there's no easy way to allocate a Vec<u8>
    // with an alignment guaranteed to be greater than 1. We could create a
    // Vec<u32>, but this cannot be safely transmuted to a Vec<u8> without
    // concern, since reallocing or dropping the Vec<u8> is UB (different
    // alignment than the initial allocation). We could define a wrapper type
    // to manage this for us, but it seems like more machinery than it's worth.
    let buf = vec![0; size];
    let align = core::mem::align_of::<T>();
    let address = buf.as_ptr().as_usize();
    if address % align == 0 {
        return (buf, 0);
    }
    // Let's try this again. We have to create a totally new alloc with
    // the maximum amount of bytes we might need. We can't just extend our
    // pre-existing 'buf' because that might create a new alloc with a
    // different alignment.
    let extra = align - 1;
    let mut buf = vec![0; size + extra];
    let address = buf.as_ptr().as_usize();
    // The code below handles the case where 'address' is aligned to T, so if
    // we got lucky and 'address' is now aligned to T (when it previously
    // wasn't), then we're done.
    if address % align == 0 {
        buf.truncate(size);
        return (buf, 0);
    }
    let padding = ((address & !(align - 1)).checked_add(align).unwrap())
        .checked_sub(address)
        .unwrap();
    assert!(padding <= 7, "padding of {padding} is bigger than 7");
    assert!(
        padding <= extra,
        "padding of {padding} is bigger than extra {extra} bytes",
    );
    buf.truncate(size + padding);
    assert_eq!(size + padding, buf.len());
    assert_eq!(
        0,
        buf[padding..].as_ptr().as_usize() % align,
        "expected end of initial padding to be aligned to {align}",
    );
    (buf, padding)
}

The wire-format foundation centralizes the untrusted-input handling for all DFA deserialization. check_alignment::<T> (line 316) rejects misaligned slices before any cast; alloc_aligned_buffer (line 361) produces a correctly aligned owned buffer with bounded padding (asserts padding <= 7). read_label (line 414) bounds its NUL scan to 256 bytes so a corrupt input cannot cause an unbounded scan. The arithmetic helpers mul, add, and shl (lines 790-831) use checked operations and return DeserializeError on overflow, which is what prevents length computations on attacker-controlled sizes from wrapping. The zero-cost casts u32s_to_state_ids / u32s_to_pattern_ids (lines 266-309) are sound because StateID/PatternID are repr(transparent) over u32 and their validity is a logical property, not a memory-safety one (an out-of-range value can only cause a logic error, never UB, which is why the *_unchecked constructors are safe). Justifies unsafe-documented and unsafe-minimal.