Audit · syn@2.0.117

cargo : syn @ 2.0.117

Patrick Elsen · signed 2026-05-28 · published 2026-05-28

Claims

datastructure-impl-bounds, datastructure-impl-correct, datastructure-impl-safe, datastructure-impl-tested, has-binaries, has-build-exec, has-fuzz-tests, has-install-exec, has-integration-tests, has-property-tests, has-unit-tests, impl-algorithm, impl-concurrency, impl-crypto, impl-datastructure, impl-interpreter, impl-jit, impl-parser, impl-protocol, is-benign, parser-impl-safe, parser-impl-tested, unsafe-documented, unsafe-minimal, unsafe-safe, unsafe-tested, uses-concurrency, uses-crypto, uses-environment, uses-exec, uses-filesystem, uses-interpreter, uses-jit, uses-network, uses-unsafe

Summary

syn 2.0.117 is a #![no_std] recursive-descent parser for Rust source, used by most proc-macros. Its unsafe is concentrated in one heavily documented module, the Cursor over a flat TokenBuffer, whose offset encoding keeps pointer arithmetic in-bounds. One low-severity finding: unbounded parser recursion can stack-overflow at compile time on adversarially nested input.

Report

Subject

syn 2.0.117 parses a stream of Rust tokens into a syntax tree. It is the parser underneath nearly every Rust procedural macro: a downstream proc-macro feeds it a proc_macro::TokenStream and gets back typed AST nodes (Expr, Item, Type, DeriveInput, and so on), then walks or rewrites them and emits new tokens. The public surface centres on the Parse trait, the ParseStream/ParseBuffer cursor API, and the generated AST types under src/gen/. The crate is #![no_std] (with extern crate alloc), has no binaries, no build script, and the library declares no proc-macro = true of its own. Authored by David Tolnay; published under MIT OR Apache-2.0.

Methodology

Tools: openvet 0.6.0, ripgrep, diff, git, Read. I set ai-model = claude-opus-4-7. diff -rq contents vcs reported only Cargo.toml differing, which is expected cargo normalisation; vcs/.git is present and intact, so contents are byte-equivalent to the tagged source. The library is roughly 50K lines across ~55 files. I surveyed all of src/ for the dangerous-API set (network, filesystem, process, env, crypto, FFI, concurrency); every hit was in a doc comment, not live code. I read in full the files holding non-trivial unsafe: src/buffer.rs (the TokenBuffer/Cursor), src/thread.rs (ThreadBound), src/parse.rs (the Cursor lifetime transmutes and StepCursor variance proof), src/token.rs (the WithSpan deref cast), and src/drops.rs (NoDrop/TrivialDrop). I traced the recursive-descent paths in src/expr.rs and the group recursion in src/buffer.rs. I did not fetch the Rust reference grammar, so I make no spec-conformance assertion.

Results

Contents match VCS byte-for-byte apart from manifest normalisation. The crate performs no network, filesystem, process, environment, or cryptographic operations, so uses-network, uses-filesystem, uses-exec, uses-environment, and uses-crypto are all false, as are uses-jit and uses-interpreter. The library uses no concurrency primitives (uses-concurrency false; the only thread interaction is ThreadBound's thread-id check), and it implements no JIT, interpreter, protocol, concurrency primitive, cryptography, or standalone algorithm (impl-jit, impl-interpreter, impl-protocol, impl-concurrency, impl-crypto, impl-algorithm all false). It is a parser (impl-parser) backed by a flat-array token buffer data structure (impl-datastructure). There are no binaries and no build-time or install-time execution (has-binaries, has-build-exec, has-install-exec all false); the library declares no proc-macro of its own. Inspection found no obfuscation, encoded payloads, telemetry, or suspicious endpoints, so is-benign is asserted.

Memory-unsafe code exists (uses-unsafe) and is concentrated almost entirely in the Cursor over TokenBuffer in src/buffer.rs. The buffer is a single Box<[Entry]> where each group is encoded inline followed by an End(start_offset, group_offset) sentinel carrying signed offsets back to the buffer start and the matching group header. All cursor pointer arithmetic (ptr.add, ptr.offset, ptr.sub) stays inside this one allocation because the offsets are computed from entries.len() at construction and a trailing sentinel bounds forward motion; Cursor::create skips interior End entries so a cursor never points past its scope. The module compiles under #![deny(unsafe_op_in_unsafe_fn)], which forces each raw operation into its own unsafe block.

The three mem::transmute calls in src/parse.rs and src/discouraged.rs only re-stamp a Cursor's lifetime, relying on Cursor covariance and a StepCursor<'c, 'a> existence-proof that 'c outlives 'a; no bytes change. src/token.rs casts #[repr(transparent)] single-span token structs to the #[repr(transparent)] WithSpan, a layout-compatible reinterpretation gated to the length-1 macro arm. ThreadBound's unsafe impl Sync/Send are justified in-comment and hold (off-thread access returns None; Send requires Copy, excluding Drop and UnsafeCell). Each block's invariant is documented and holds, and unsafe is used only where the cursor design requires it, supporting unsafe-safe, unsafe-documented, and unsafe-minimal.

The cursor and parser are exercised by an extensive integration suite under tests/ (test_parse_buffer.rs, test_expr.rs, a rustc-corpus round-trip, and more) plus two in-source unit tests in src/drops.rs and src/attr.rs, giving has-unit-tests, has-integration-tests, parser-impl-tested, datastructure-impl-tested, and, since test_parse_buffer.rs exercises the cursor directly and upstream CI runs the whole suite under miri, unsafe-tested. There are no fuzz or property tests in the published crate (has-fuzz-tests, has-property-tests false).

The TokenBuffer maintains its offset invariants by construction (the back-pointers in each End are derived from entries.len() at build time), and cursor operations are pointer increments over a flat array with no adversarial-input degradation, supporting datastructure-impl-correct and datastructure-impl-bounds. I did not fetch the Rust reference grammar, so I make no assertion on parser-impl-correct.

One low-severity safety finding: the recursive-descent expression and type parsers recurse one stack frame per nesting level with no depth limit (unary_expr at src/expr.rs:1543, group recursion via recursive_new at src/buffer.rs:42), so deeply nested adversarial input can exhaust the stack and abort. This is a clean stack-overflow guard-page fault at the downstream macro's compile time, not undefined behaviour or an out-of-bounds access, which is why parser-impl-safe and datastructure-impl-safe are asserted on memory-safety grounds with the recursion behaviour recorded as the documented caveat.

Conclusion

syn 2.0.117 is a #![no_std] recursive-descent parser for Rust source, with no I/O, no build-time execution, and three small dependencies (proc-macro2, optional quote, unicode-ident). Its unsafe is concentrated in one heavily commented module, the Cursor over a flat TokenBuffer, where the End-sentinel offset encoding keeps every pointer step provably in-bounds, and is compiled under deny(unsafe_op_in_unsafe_fn). The lifetime transmutes and the WithSpan cast are layout- and variance-preserving. One low-severity characteristic was found: unbounded parser recursion can stack-overflow at compile time on adversarially nested input. No memory-safety, correctness, or malicious-code issues were observed.

Findings (1)

FINDING-1 safety low

Unbounded parser recursion can stack-overflow on deeply nested input

The recursive-descent expression and type parsers recurse one stack frame per nesting level with no depth limit. For example unary_expr in src/expr.rs:1543 calls itself directly when parsing a chain of prefix operators (&&&&...&x, ----...-x, ***...*x), and the grouping/parenthesis parsers recurse through recursive_new in src/buffer.rs:42 and the nested-group expression rules. Adversarial input with sufficient nesting depth (on the order of tens of thousands of tokens) exhausts the call stack and aborts the process.

This is compile-time behaviour: syn runs inside the proc-macro host (rustc or rust-analyzer) at the build time of a downstream macro, not at the runtime of a shipped binary. The impact is a build-time abort, not a memory-safety violation. The abort is a clean stack-overflow guard-page fault, not undefined behaviour; no out-of-bounds write occurs. There is no documented recursion limit in the public API.

parser-impl-safe is asserted on the basis of memory safety and absence of undefined behaviour. The stack-overflow-on-deep-nesting characteristic is the documented caveat referenced by that claim's "panic-free on all inputs (unless explicitly documented otherwise)" qualifier. Justifies parser-impl-safe.
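
The recursion shape described in this finding, and one way a parser could bound it, can be sketched with a toy prefix-operator parser. This is not syn's code: Expr, ParseError, parse_unary and the depth_left parameter are hypothetical names introduced for illustration, and per the finding syn itself exposes no such limit.

```rust
/// Toy AST for chains of prefix operators such as `--x`.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Var,
    Neg(Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The hypothetical depth budget ran out before the operand was reached.
    TooDeep,
    /// The input is not of the form `-...-x`.
    Unexpected,
}

/// Recursive descent over `-` prefixes: one stack frame per operator, the
/// same shape as the recursion described in the finding. `depth_left` is an
/// illustrative guard that turns unbounded recursion into an ordinary error.
pub fn parse_unary(input: &[u8], depth_left: usize) -> Result<Expr, ParseError> {
    match input.split_first() {
        Some((b'-', rest)) => {
            if depth_left == 0 {
                return Err(ParseError::TooDeep);
            }
            // Recurse once per prefix operator, spending one unit of budget.
            Ok(Expr::Neg(Box::new(parse_unary(rest, depth_left - 1)?)))
        }
        // The operand must be the final byte of the input.
        Some((b'x', [])) => Ok(Expr::Var),
        _ => Err(ParseError::Unexpected),
    }
}
```

With a budget of 2, parse_unary(b"---x", 2) returns Err(ParseError::TooDeep) instead of descending a third level. A caller that cannot change the parser could instead run it on a thread created with std::thread::Builder::stack_size, which raises the ceiling without removing it.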

Annotations (4)

`src/buffer.rs`

`src/buffer.rs`, line 99-383

pub struct Cursor<'a> {
    // The current entry which the `Cursor` is pointing at.
    ptr: *const Entry,
    // This is the only `Entry::End` object which this cursor is allowed to
    // point at. All other `End` objects are skipped over in `Cursor::create`.
    scope: *const Entry,
    // Cursor is covariant in 'a. This field ensures that our pointers are still
    // valid.
    marker: PhantomData<&'a Entry>,
}

impl<'a> Cursor<'a> {
    /// Creates a cursor referencing a static empty TokenStream.
    pub fn empty() -> Self {
        // It's safe in this situation for us to put an `Entry` object in global
        // storage, despite it not actually being safe to send across threads
        // (`Ident` is a reference into a thread-local table). This is because
        // this entry never includes a `Ident` object.
        //
        // This wrapper struct allows us to break the rules and put a `Sync`
        // object in global storage.
        struct UnsafeSyncEntry(Entry);
        unsafe impl Sync for UnsafeSyncEntry {}
        static EMPTY_ENTRY: UnsafeSyncEntry = UnsafeSyncEntry(Entry::End(0, 0));

        Cursor {
            ptr: &EMPTY_ENTRY.0,
            scope: &EMPTY_ENTRY.0,
            marker: PhantomData,
        }
    }

    /// This create method intelligently exits non-explicitly-entered
    /// `None`-delimited scopes when the cursor reaches the end of them,
    /// allowing for them to be treated transparently.
    unsafe fn create(mut ptr: *const Entry, scope: *const Entry) -> Self {
        // NOTE: If we're looking at a `End`, we want to advance the cursor
        // past it, unless `ptr == scope`, which means that we're at the edge of
        // our cursor's scope. We should only have `ptr != scope` at the exit
        // from None-delimited groups entered with `ignore_none`.
        while let Entry::End(..) = unsafe { &*ptr } {
            if ptr::eq(ptr, scope) {
                break;
            }
            ptr = unsafe { ptr.add(1) };
        }

        Cursor {
            ptr,
            scope,
            marker: PhantomData,
        }
    }

    /// Get the current entry.
    fn entry(self) -> &'a Entry {
        unsafe { &*self.ptr }
    }

    /// Bump the cursor to point at the next token after the current one. This
    /// is undefined behavior if the cursor is currently looking at an
    /// `Entry::End`.
    ///
    /// If the cursor is looking at an `Entry::Group`, the bumped cursor will
    /// point at the first token in the group (with the same scope end).
    unsafe fn bump_ignore_group(self) -> Cursor<'a> {
        unsafe { Cursor::create(self.ptr.add(1), self.scope) }
    }

    /// While the cursor is looking at a `None`-delimited group, move it to look
    /// at the first token inside instead. If the group is empty, this will move
    /// the cursor past the `None`-delimited group.
    ///
    /// WARNING: This mutates its argument.
    fn ignore_none(&mut self) {
        while let Entry::Group(group, _) = self.entry() {
            if group.delimiter() == Delimiter::None {
                unsafe { *self = self.bump_ignore_group() };
            } else {
                break;
            }
        }
    }

    /// Checks whether the cursor is currently pointing at the end of its valid
    /// scope.
    pub fn eof(self) -> bool {
        // We're at eof if we're at the end of our scope.
        ptr::eq(self.ptr, self.scope)
    }

    /// If the cursor is pointing at a `Ident`, returns it along with a cursor
    /// pointing at the next `TokenTree`.
    pub fn ident(mut self) -> Option<(Ident, Cursor<'a>)> {
        self.ignore_none();
        match self.entry() {
            Entry::Ident(ident) => Some((ident.clone(), unsafe { self.bump_ignore_group() })),
            _ => None,
        }
    }

    /// If the cursor is pointing at a `Punct`, returns it along with a cursor
    /// pointing at the next `TokenTree`.
    pub fn punct(mut self) -> Option<(Punct, Cursor<'a>)> {
        self.ignore_none();
        match self.entry() {
            Entry::Punct(punct) if punct.as_char() != '\'' => {
                Some((punct.clone(), unsafe { self.bump_ignore_group() }))
            }
            _ => None,
        }
    }

    /// If the cursor is pointing at a `Literal`, return it along with a cursor
    /// pointing at the next `TokenTree`.
    pub fn literal(mut self) -> Option<(Literal, Cursor<'a>)> {
        self.ignore_none();
        match self.entry() {
            Entry::Literal(literal) => Some((literal.clone(), unsafe { self.bump_ignore_group() })),
            _ => None,
        }
    }

    /// If the cursor is pointing at a `Lifetime`, returns it along with a
    /// cursor pointing at the next `TokenTree`.
    pub fn lifetime(mut self) -> Option<(Lifetime, Cursor<'a>)> {
        self.ignore_none();
        match self.entry() {
            Entry::Punct(punct) if punct.as_char() == '\'' && punct.spacing() == Spacing::Joint => {
                let next = unsafe { self.bump_ignore_group() };
                let (ident, rest) = next.ident()?;
                let lifetime = Lifetime {
                    apostrophe: punct.span(),
                    ident,
                };
                Some((lifetime, rest))
            }
            _ => None,
        }
    }

    /// If the cursor is pointing at a `Group` with the given delimiter, returns
    /// a cursor into that group and one pointing to the next `TokenTree`.
    pub fn group(mut self, delim: Delimiter) -> Option<(Cursor<'a>, DelimSpan, Cursor<'a>)> {
        // If we're not trying to enter a none-delimited group, we want to
        // ignore them. We have to make sure to _not_ ignore them when we want
        // to enter them, of course. For obvious reasons.
        if delim != Delimiter::None {
            self.ignore_none();
        }

        if let Entry::Group(group, end_offset) = self.entry() {
            if group.delimiter() == delim {
                let span = group.delim_span();
                let end_of_group = unsafe { self.ptr.add(*end_offset) };
                let inside_of_group = unsafe { Cursor::create(self.ptr.add(1), end_of_group) };
                let after_group = unsafe { Cursor::create(end_of_group, self.scope) };
                return Some((inside_of_group, span, after_group));
            }
        }

        None
    }

    /// If the cursor is pointing at a `Group`, returns a cursor into the group
    /// and one pointing to the next `TokenTree`.
    pub fn any_group(self) -> Option<(Cursor<'a>, Delimiter, DelimSpan, Cursor<'a>)> {
        if let Entry::Group(group, end_offset) = self.entry() {
            let delimiter = group.delimiter();
            let span = group.delim_span();
            let end_of_group = unsafe { self.ptr.add(*end_offset) };
            let inside_of_group = unsafe { Cursor::create(self.ptr.add(1), end_of_group) };
            let after_group = unsafe { Cursor::create(end_of_group, self.scope) };
            return Some((inside_of_group, delimiter, span, after_group));
        }

        None
    }

    pub(crate) fn any_group_token(self) -> Option<(Group, Cursor<'a>)> {
        if let Entry::Group(group, end_offset) = self.entry() {
            let end_of_group = unsafe { self.ptr.add(*end_offset) };
            let after_group = unsafe { Cursor::create(end_of_group, self.scope) };
            return Some((group.clone(), after_group));
        }

        None
    }

    /// Copies all remaining tokens visible from this cursor into a
    /// `TokenStream`.
    pub fn token_stream(self) -> TokenStream {
        let mut tokens = TokenStream::new();
        let mut cursor = self;
        while let Some((tt, rest)) = cursor.token_tree() {
            tokens.append(tt);
            cursor = rest;
        }
        tokens
    }

    /// If the cursor is pointing at a `TokenTree`, returns it along with a
    /// cursor pointing at the next `TokenTree`.
    ///
    /// Returns `None` if the cursor has reached the end of its stream.
    ///
    /// This method does not treat `None`-delimited groups as transparent, and
    /// will return a `Group(None, ..)` if the cursor is looking at one.
    pub fn token_tree(self) -> Option<(TokenTree, Cursor<'a>)> {
        let (tree, len) = match self.entry() {
            Entry::Group(group, end_offset) => (group.clone().into(), *end_offset),
            Entry::Literal(literal) => (literal.clone().into(), 1),
            Entry::Ident(ident) => (ident.clone().into(), 1),
            Entry::Punct(punct) => (punct.clone().into(), 1),
            Entry::End(..) => return None,
        };

        let rest = unsafe { Cursor::create(self.ptr.add(len), self.scope) };
        Some((tree, rest))
    }

    /// Returns the `Span` of the current token, or `Span::call_site()` if this
    /// cursor points to eof.
    pub fn span(mut self) -> Span {
        match self.entry() {
            Entry::Group(group, _) => group.span(),
            Entry::Literal(literal) => literal.span(),
            Entry::Ident(ident) => ident.span(),
            Entry::Punct(punct) => punct.span(),
            Entry::End(_, offset) => {
                self.ptr = unsafe { self.ptr.offset(*offset) };
                if let Entry::Group(group, _) = self.entry() {
                    group.span_close()
                } else {
                    Span::call_site()
                }
            }
        }
    }

    /// Returns the `Span` of the token immediately prior to the position of
    /// this cursor, or of the current token if there is no previous one.
    #[cfg(any(feature = "full", feature = "derive"))]
    pub(crate) fn prev_span(mut self) -> Span {
        if start_of_buffer(self) < self.ptr {
            self.ptr = unsafe { self.ptr.sub(1) };
        }
        self.span()
    }

    /// Skip over the next token that is not a None-delimited group, without
    /// cloning it. Returns `None` if this cursor points to eof.
    ///
    /// This method treats `'lifetimes` as a single token.
    pub(crate) fn skip(mut self) -> Option<Cursor<'a>> {
        self.ignore_none();

        let len = match self.entry() {
            Entry::End(..) => return None,

            // Treat lifetimes as a single tt for the purposes of 'skip'.
            Entry::Punct(punct) if punct.as_char() == '\'' && punct.spacing() == Spacing::Joint => {
                match unsafe { &*self.ptr.add(1) } {
                    Entry::Ident(_) => 2,
                    _ => 1,
                }
            }

            Entry::Group(_, end_offset) => *end_offset,
            _ => 1,
        };

        Some(unsafe { Cursor::create(self.ptr.add(len), self.scope) })
    }

    pub(crate) fn scope_delimiter(self) -> Delimiter {
        match unsafe { &*self.scope } {
            Entry::End(_, offset) => match unsafe { &*self.scope.offset(*offset) } {
                Entry::Group(group, _) => group.delimiter(),
                _ => Delimiter::None,
            },
            _ => unreachable!(),
        }
    }
}

The Cursor over TokenBuffer is where almost all of syn's unsafe lives. The buffer is a flat Box<[Entry]>; groups are encoded inline as a Group entry followed by their contents and a terminating End(start_offset, group_offset) whose offsets point back to the buffer start and the matching group header (src/buffer.rs:42-62). Every pointer step (ptr.add, ptr.offset, ptr.sub) stays within this allocation because the offsets are computed from entries.len() at construction and the final sentinel End (pushed at src/buffer.rs:77) bounds forward traversal. Cursor::create (src/buffer.rs:134-151) skips interior End entries until reaching scope, so a Cursor never points past its scope. scope is always an End entry, which start_of_buffer and scope_delimiter rely on via unreachable!. The module is compiled under #![deny(unsafe_op_in_unsafe_fn)], so each raw dereference carries its own unsafe block. The unsafe impl Sync on UnsafeSyncEntry (src/buffer.rs:121) guards only a static End(0,0) sentinel that holds no Ident, so no thread-local table reference crosses threads. Justifies uses-unsafe, unsafe-safe, unsafe-documented, unsafe-minimal, impl-datastructure, datastructure-impl-safe.
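
The flat encoding described above can be modelled in safe Rust with indices in place of raw pointers. This is a simplified sketch, not syn's implementation: Tree, Entry::Leaf, flatten, create and next_tree are illustrative names, and bounds-checked indexing stands in for the pointer arithmetic, so a broken invariant would panic rather than cause undefined behaviour.

```rust
/// Nested input, standing in for a token tree (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Tree {
    Leaf(char),
    Group(Vec<Tree>),
}

/// One slot of the flat buffer.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Leaf(char),
    /// Group header: distance forward to the group's matching `End`.
    Group(usize),
    /// End marker: (offset back to buffer start, offset back to group header).
    End(isize, isize),
}

fn push_all(trees: &[Tree], out: &mut Vec<Entry>) {
    for tree in trees {
        match tree {
            Tree::Leaf(c) => out.push(Entry::Leaf(*c)),
            Tree::Group(inner) => {
                let header = out.len();
                out.push(Entry::Group(0)); // placeholder, patched below
                push_all(inner, out);
                let end = out.len();
                // Both offsets are derived from the buffer length at build time.
                out.push(Entry::End(-(end as isize), header as isize - end as isize));
                out[header] = Entry::Group(end - header);
            }
        }
    }
}

/// Flatten nested trees and append the trailing sentinel `End`.
pub fn flatten(trees: &[Tree]) -> Vec<Entry> {
    let mut out = Vec::new();
    push_all(trees, &mut out);
    let end = out.len();
    out.push(Entry::End(-(end as isize), 0));
    out
}

/// Skip interior `End` entries, but never move past `scope`.
pub fn create(buf: &[Entry], mut pos: usize, scope: usize) -> usize {
    while pos != scope && matches!(buf[pos], Entry::End(..)) {
        pos += 1;
    }
    pos
}

/// Step over one whole token tree; `None` when `pos` is at an `End`.
pub fn next_tree(buf: &[Entry], pos: usize, scope: usize) -> Option<usize> {
    let len = match &buf[pos] {
        Entry::Leaf(_) => 1,
        Entry::Group(end_offset) => *end_offset,
        Entry::End(..) => return None,
    };
    Some(create(buf, pos + len, scope))
}
```

For the input a ( b c ) d this model produces seven entries: the group header at index 1 stores a forward distance of 3 to its End at index 4, and stepping over the group from index 1 lands on index 5 because create skips the interior End.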

`src/parse.rs`

`src/parse.rs`, line 377-398

pub(crate) fn advance_step_cursor<'c, 'a>(proof: StepCursor<'c, 'a>, to: Cursor<'c>) -> Cursor<'a> {
    // Refer to the comments within the StepCursor definition. We use the
    // fact that a StepCursor<'c, 'a> exists as proof that 'c outlives 'a.
    // Cursor is covariant in its lifetime parameter so we can cast a
    // Cursor<'c> to one with the shorter lifetime Cursor<'a>.
    let _ = proof;
    unsafe { mem::transmute::<Cursor<'c>, Cursor<'a>>(to) }
}

pub(crate) fn new_parse_buffer(
    scope: Span,
    cursor: Cursor,
    unexpected: Rc<Cell<Unexpected>>,
) -> ParseBuffer {
    ParseBuffer {
        scope,
        // See comment on `cell` in the struct definition.
        cell: Cell::new(unsafe { mem::transmute::<Cursor, Cursor<'static>>(cursor) }),
        marker: PhantomData,
        unexpected: Cell::new(Some(unexpected)),
    }
}

The two mem::transmute calls here only change a Cursor's lifetime parameter, never its bytes. advance_step_cursor (src/parse.rs:377-384) uses the existence of a StepCursor<'c, 'a> as a type-level proof that 'c outlives 'a (see the variance construction at src/parse.rs:336-348), then shortens Cursor<'c> to Cursor<'a>, which is sound because Cursor is covariant in its lifetime. new_parse_buffer (src/parse.rs:394) stores a Cursor as Cursor<'static> inside a Cell; the surrounding ParseBuffer carries a PhantomData tying the real lifetime back, and the cursor is only ever read back at the original lifetime. Neither transmute changes layout or extends a borrow beyond the backing TokenBuffer. Justifies uses-unsafe, unsafe-safe, unsafe-documented, parser-impl-safe.
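
The covariance argument can be illustrated with a small stand-in type. In this sketch View is a hypothetical analogue of Cursor (a raw pointer plus a PhantomData lifetime marker). When the outlives relation is written as an explicit bound, safe Rust already permits the shortening, and a transmute between the same two types performs the identical conversion; the assumption here is that syn reaches for transmute only because its outlives evidence is carried by StepCursor rather than by a bound the compiler can see.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a cursor: a raw pointer plus a marker that makes
/// the type covariant in 'a.
#[derive(Clone, Copy)]
pub struct View<'a> {
    ptr: *const u32,
    marker: PhantomData<&'a u32>,
}

impl<'a> View<'a> {
    pub fn new(value: &'a u32) -> Self {
        View {
            ptr: value,
            marker: PhantomData,
        }
    }

    pub fn get(self) -> &'a u32 {
        // SAFETY: `ptr` was created from a `&'a u32`, so it is valid for 'a.
        unsafe { &*self.ptr }
    }
}

/// With the bound `'long: 'short` spelled out, covariance lets safe code
/// shorten the lifetime with no conversion at all.
pub fn shorten<'long: 'short, 'short>(view: View<'long>) -> View<'short> {
    view
}

/// The same conversion written as a transmute. Only the lifetime parameter
/// differs between the two types, so no bytes change.
pub fn shorten_by_transmute<'long: 'short, 'short>(view: View<'long>) -> View<'short> {
    // SAFETY: 'long outlives 'short and View is covariant in its lifetime.
    unsafe { std::mem::transmute::<View<'long>, View<'short>>(view) }
}
```

Both functions return a View that reads the same value; the difference is only in who checks the outlives relation, the compiler in the first case and the author in the second.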

`src/thread.rs`

`src/thread.rs`, line 12-54

unsafe impl<T> Sync for ThreadBound<T> {}

// Send bound requires Copy, as otherwise Drop could run in the wrong place.
//
// Today Copy and Drop are mutually exclusive so `T: Copy` implies `T: !Drop`.
// This impl needs to be revisited if that restriction is relaxed in the future.
unsafe impl<T: Copy> Send for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub(crate) fn new(value: T) -> Self {
        ThreadBound {
            value,
            thread_id: thread::current().id(),
        }
    }

    pub(crate) fn get(&self) -> Option<&T> {
        if thread::current().id() == self.thread_id {
            Some(&self.value)
        } else {
            None
        }
    }
}

impl<T: Debug> Debug for ThreadBound<T> {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        match self.get() {
            Some(value) => Debug::fmt(value, formatter),
            None => formatter.write_str("unknown"),
        }
    }
}

// Copy the bytes of T, even if the currently running thread is the "wrong"
// thread. This is fine as long as the original thread is not simultaneously
// mutating this value via interior mutability, which would be a data race.
//
// Currently `T: Copy` is sufficient to guarantee that T contains no interior
// mutability, because _all_ interior mutability in Rust is built on
// core::cell::UnsafeCell, which has no Copy impl. This impl needs to be
// revisited if that restriction is relaxed in the future.
impl<T: Copy> Copy for ThreadBound<T> {}

ThreadBound<T> carries a value plus the ThreadId of the constructing thread and hands out &T only when accessed from that same thread (src/thread.rs:28-34). The unconditional unsafe impl Sync is sound because get returns None off-thread, so &self.value is never observable from another thread. The unsafe impl<T: Copy> Send is justified in-comment: Copy today implies the type has no Drop impl, so moving the bytes to another thread cannot run a Drop impl in the wrong place. The comment on the Copy impl adds that Copy excludes UnsafeCell-based interior mutability, so copying the bytes from another thread cannot race with a mutation on the original thread. Both comments flag the future-Rust assumptions they rely on. Justifies uses-unsafe, unsafe-safe, unsafe-documented.
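
The off-thread behaviour can be checked with a reduced copy of the type quoted above. This sketch keeps only new, get and the Sync impl; the observe helper is a hypothetical addition, and Cell<u32> is chosen as the payload because it is not Sync on its own, so sharing the wrapper across threads depends on the unsafe impl.

```rust
use std::cell::Cell;
use std::thread::{self, ThreadId};

pub struct ThreadBound<T> {
    value: T,
    thread_id: ThreadId,
}

// SAFETY (the argument made in the annotation): `get` only hands out `&T` on
// the thread that constructed the value, so no other thread can observe it.
unsafe impl<T> Sync for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub fn new(value: T) -> Self {
        ThreadBound {
            value,
            thread_id: thread::current().id(),
        }
    }

    pub fn get(&self) -> Option<&T> {
        if thread::current().id() == self.thread_id {
            Some(&self.value)
        } else {
            None
        }
    }
}

/// Hypothetical helper: read the value on the constructing thread, then try
/// again from a scoped thread that only borrows the wrapper.
pub fn observe() -> (Option<u32>, Option<u32>) {
    let bound = ThreadBound::new(Cell::new(42u32));
    let here = bound.get().map(|cell| cell.get());
    let there = thread::scope(|scope| {
        scope
            .spawn(|| bound.get().map(|cell| cell.get()))
            .join()
            .unwrap()
    });
    (here, there)
}
```

Calling observe() yields Some(42) for the constructing thread and None for the spawned one, which is the property the Sync impl relies on.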

`src/token.rs`

`src/token.rs`, line 303-321

macro_rules! impl_deref_if_len_is_1 {
    ($name:ident/1) => {
        impl Deref for $name {
            type Target = WithSpan;

            fn deref(&self) -> &Self::Target {
                unsafe { &*(self as *const Self).cast::<WithSpan>() }
            }
        }

        impl DerefMut for $name {
            fn deref_mut(&mut self) -> &mut Self::Target {
                unsafe { &mut *(self as *mut Self).cast::<WithSpan>() }
            }
        }
    };

    ($name:ident/$len:literal) => {};
}

impl_deref_if_len_is_1 lets a single-span token struct deref to WithSpan so callers can write token.span. The cast (self as *const Self).cast::<WithSpan>() (src/token.rs:309,315) is sound because the affected token structs are #[repr(transparent)] wrappers over a single Span (the define_punctuation_structs and keyword macros generate them with one Span/[Span; 1] field) and WithSpan is #[repr(transparent)] over Span (src/token.rs:145-149); the two layouts therefore coincide. The macro arm only emits the impl for the /1 length case, so multi-span tokens never reach this cast. Justifies uses-unsafe, unsafe-safe, unsafe-documented.
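
The layout argument can be reproduced on stand-in types. In this sketch Span is a hypothetical placeholder (a wrapper over u32, not proc_macro2's Span) and Plus is an illustrative single-span token; the const assertions make the size and alignment agreement that the cast relies on a compile-time check.

```rust
use std::mem::{align_of, size_of};
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a span.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span(pub u32);

#[repr(transparent)]
pub struct WithSpan {
    pub span: Span,
}

/// Illustrative single-span token, laid out as one `[Span; 1]` field.
#[repr(transparent)]
pub struct Plus {
    pub spans: [Span; 1],
}

// Compile-time check that the two wrappers agree in size and alignment.
const _: () = assert!(size_of::<Plus>() == size_of::<WithSpan>());
const _: () = assert!(align_of::<Plus>() == align_of::<WithSpan>());

impl Deref for Plus {
    type Target = WithSpan;

    fn deref(&self) -> &WithSpan {
        // SAFETY: both types are #[repr(transparent)] over exactly one Span,
        // so a pointer to one is a valid pointer to the other.
        unsafe { &*(self as *const Self).cast::<WithSpan>() }
    }
}

impl DerefMut for Plus {
    fn deref_mut(&mut self) -> &mut WithSpan {
        // SAFETY: same layout argument as in `deref`.
        unsafe { &mut *(self as *mut Self).cast::<WithSpan>() }
    }
}
```

With these impls, plus.span reads through to plus.spans[0], and assigning plus.span writes the same storage. A token with two or more spans would fail the size assertion, which mirrors why the macro emits the impl only for the /1 arm.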