Audit · syn@1.0.109

cargo: syn@1.0.109

Patrick Elsen · signed 2026-05-27 · published 2026-05-27

Claims

build-exec-deterministic, build-exec-minimal, build-exec-no-network, build-exec-no-write-out, build-exec-safe, concurrency-impl-correct, concurrency-impl-documented, concurrency-impl-safe, concurrency-impl-tested, datastructure-impl-bounds, datastructure-impl-correct, datastructure-impl-safe, datastructure-impl-tested, environment-safe, has-binaries, has-build-exec, has-fuzz-tests, has-install-exec, has-integration-tests, has-property-tests, has-unit-tests, impl-algorithm, impl-concurrency, impl-crypto, impl-datastructure, impl-interpreter, impl-jit, impl-parser, impl-protocol, is-benign, parser-impl-correct, parser-impl-safe, parser-impl-tested, unsafe-documented, unsafe-minimal, unsafe-safe, unsafe-tested, uses-concurrency, uses-crypto, uses-environment, uses-exec, uses-filesystem, uses-interpreter, uses-jit, uses-network, uses-unsafe

Summary

syn 1.0.109 is a mature Rust source-code parser with a narrow runtime surface (no I/O, no concurrency, no crypto) and concentrated, well-motivated unsafe code that is exercised under miri on upstream CI. There are two low-severity quality findings, both documentation gaps; safe to deploy.

Report

Subject

syn is a parser for Rust source code. It consumes a proc_macro2::TokenStream and produces a strongly-typed syntax tree (syn::File, syn::Item, syn::Expr, syn::Type, etc.) that procedural macros, build-time code generators, and source-analysis tools can pattern-match against. The crate also exposes a hand-written parser-combinator API (Parse, ParseStream) so callers can describe custom syntaxes against the same token stream. Functionality is split across feature flags (derive, full, parsing, printing, fold, visit, visit-mut, extra-traits, clone-impls, proc-macro) so downstream consumers pay compile-time cost only for what they enable.

Methodology

The published crate contents (contents/) were compared against the upstream Git repository (vcs/, pinned via .cargo_vcs_info.json to commit bfa790b8e445dc67b7ab94d75adb1a92d6296c9a) using diff -rq over src/, tests/, benches/, build.rs, and the root files. The build script and Cargo.toml were read in full. src/ (~23k lines of hand-written code plus ~21k lines of generated visitor/fold/eq/hash/clone/debug impls under src/gen/) was surveyed end-to-end, with detailed reads of buffer.rs, parse.rs, thread.rs, token.rs, lit.rs, bigint.rs, and discouraged.rs — the modules carrying unsafe code or non-trivial parsing logic. Every unsafe block in non-generated code was located and its soundness argument reconstructed from surrounding context. The tests/ directory (~5k lines, 145 #[test] functions across 26 files) was surveyed, with particular attention to test_round_trip.rs, which differentially tests against rustc's parser. The upstream fuzz/ targets and CI configuration were read from vcs/ to understand testing posture, though neither ships in the published crate.

Results

The published crate contents match the upstream repository byte-for-byte once cargo's standard Cargo.toml normalisation (the original manifest is preserved verbatim as Cargo.toml.orig) and the explicit include whitelist are accounted for. Files present only in the repository (fuzz/, examples/, dev/, json/, codegen/, tests/crates, tests/features, .github/, .clippy.toml, syn.json) are excluded by the include list and are out of scope for downstream consumers. The crate ships no binary artefacts (has-binaries = false) and no installation hooks (cargo has no install-time execution model; has-install-exec = false). The [lib] section does not declare proc-macro = true, so the only compile-time code execution surface is build.rs, which spawns $RUSTC --version to detect the compiler's minor version and emits up to five cargo:rustc-cfg directives. The build script makes no network access, no filesystem writes and no other process invocations.

The runtime library is purely functional with respect to the operating system: no std::net, no std::process, no std::fs, no environment-variable reads, no thread spawning, no async runtime, no cryptography, no JIT and no embedded interpreter. This justifies uses-crypto, uses-exec, uses-jit, uses-interpreter, uses-network, uses-filesystem, uses-concurrency, impl-crypto, impl-interpreter, impl-jit, impl-protocol and impl-algorithm, all as false. The only standard-library concurrency surface is std::thread::current().id() inside ThreadBound<T>, a small custom Sync/Send wrapper. That justifies impl-concurrency without implying uses-concurrency in the thread-spawning or async-runtime sense.

unsafe is used in five modules: buffer.rs (pointer-based cursor over a flat Box<[Entry]> token buffer), parse.rs and discouraged.rs (lifetime transmutes that recover covariance for ParseBuffer<'a>), token.rs (#[repr(C)] layout casts for single-span tokens), and thread.rs (the Sync/Send impls of ThreadBound). Each use rests on an invariant that holds by construction: the buffer is built by a single recursive_new pass that establishes End sentinels and group offsets; ParseBuffer::cell is only ever assigned a cursor derived from that same cell, and the Speculative::advance_to path in discouraged.rs additionally checks scope identity before writing; the layout casts target #[repr(C)] structs whose first field is a Span; and ThreadBound gates access through a thread-id check and restricts Send to Copy types so that Drop cannot run on another thread. This justifies uses-unsafe, unsafe-safe and unsafe-minimal.

Upstream CI runs cargo miri test --all-features with -Zmiri-strict-provenance, plus cargo fuzz build, on every push. Both are visible in vcs/.github/workflows/ci.yml, but the fuzz targets are not shipped in the published crate, so has-fuzz-tests = false. The published test suite contains 145 #[test] functions in 26 integration-test files (justifying has-integration-tests). tests/test_round_trip.rs parses the .rs files of a pinned rust-lang/rust source tree with syn, prints the result back to tokens, and checks that rustc parses the printed output to the same AST as the original source. This provides strong evidence for parser-impl-correct, parser-impl-tested, datastructure-impl-tested, concurrency-impl-tested and unsafe-tested. No #[cfg(test)] unit tests are defined inside src/ (has-unit-tests = false), and no property-test harness is used (has-property-tests = false).

Two low-severity quality findings were recorded. FINDING-1 notes that several unsafe blocks (particularly in src/token.rs and parts of src/buffer.rs) lack inline // SAFETY: comments at the use-site, even though the invariants are documented at module/type level; this justifies unsafe-documented = false. FINDING-2 notes that the internal literal-cooking helpers in src/lit.rs use panic!/unreachable! to signal byte patterns that the upstream proc_macro2::Literal invariant rules out, and that this contract is not stated in the public docs of the affected Lit* types. The round-trip test exercises these paths against the entire rust-lang/rust source tree without hitting them, so parser-impl-safe holds in practice.

No supply-chain anomalies were observed: the repository URL matches crates.io's record for syn 1.0.109, the author and history are consistent with David Tolnay's other widely-used crates, the build script and runtime code make no network or filesystem access, and the test-only network/fs usage (reqwest, flate2, tar, walkdir) is confined to dev-dependencies that are not pulled into downstream builds. Justifies is-benign and environment-safe (the build script reads only RUSTC, which cargo itself supplies).

Conclusion

syn 1.0.109 is a mature, widely-deployed parser whose runtime surface is narrow (no I/O, no concurrency, no crypto) and whose unsafe is concentrated, well-motivated, and exercised under miri on upstream CI. The two findings are documentation gaps.

Findings(2)

FINDING-1 quality low

Some unsafe blocks lack inline safety comments

Several unsafe blocks in the crate do not carry an inline // SAFETY: comment justifying the soundness invariants at the use-site.

Examples:

src/token.rs:349 and src/token.rs:355: the Deref/DerefMut impls convert &Self to &WithSpan (and &mut Self to &mut WithSpan) via a raw-pointer cast. Soundness depends on both WithSpan and the wrapping token struct being #[repr(C)] with layout-compatible Span data at offset 0. The #[repr(C)] attributes are present, but neither unsafe block carries a // SAFETY: comment tying them to the cast.
src/buffer.rs:131: unsafe impl Sync for UnsafeSyncEntry is justified by an inline comment, but several pointer-arithmetic unsafe blocks in the same file (e.g. begin, token_tree, skip) rely on the buffer's End sentinel and group offsets being well-formed and carry no SAFETY: comment at the use-site.

The invariants involved are well-documented at the module/type level (the buffer.rs header comment explicitly states that the module is "heavily commented as it contains most of the unsafe code" and the ParseBuffer::cell field is annotated), so reviewers can reconstruct the rationale. This finding therefore justifies unsafe-documented = false but does not affect unsafe-safe, which holds.
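For illustration, a use-site SAFETY: comment for the single-span layout cast described in this finding might look like the minimal sketch below. Span, WithSpan and Dot are hypothetical stand-ins defined locally (Span is a plain byte range rather than proc_macro2::Span, and Dot is only loosely modelled on a one-span punctuation token); this is not syn's code.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for proc_macro2::Span: a byte range in the source.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: u32,
    hi: u32,
}

/// Named-field view of a single span (analogous to WithSpan).
#[repr(C)]
struct WithSpan {
    span: Span,
}

/// Hypothetical one-span token that stores its span in a one-element array.
#[repr(C)]
struct Dot {
    spans: [Span; 1],
}

impl Deref for Dot {
    type Target = WithSpan;

    fn deref(&self) -> &WithSpan {
        // SAFETY: Dot and WithSpan are both #[repr(C)] with a single field at
        // offset 0, and [Span; 1] has the same size and alignment as Span, so
        // the two structs are layout-identical. The returned reference borrows
        // from `self` and therefore cannot outlive the token.
        unsafe { &*(self as *const Dot as *const WithSpan) }
    }
}

impl DerefMut for Dot {
    fn deref_mut(&mut self) -> &mut WithSpan {
        // SAFETY: same layout argument as in `deref`; exclusive access is
        // inherited from the `&mut self` borrow.
        unsafe { &mut *(self as *mut Dot as *mut WithSpan) }
    }
}
```

With the comment in place, a reviewer can check the cast against the two #[repr(C)] definitions without consulting module-level prose.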

FINDING-2 quality low

Internal literal-parsing helpers panic on malformed input

The literal-cooking helpers in src/lit.rs (parse_lit_str_cooked, backslash_x, backslash_u, parse_lit_byte_str_cooked, etc.) call panic!, unreachable! and assert!/assert_eq! on byte patterns they do not recognise. They are intended to be reached only with strings produced by proc_macro2::Literal::to_string, which normally come from rustc's lexer (or, outside a procedural macro, from proc_macro2's fallback lexer) and are therefore already known to be well-formed Rust literals.

This contract is not stated in the public docs of LitStr, LitByteStr, LitInt and the other Lit* types. A consumer who builds a proc_macro2::Literal via Literal::from_str("...") and feeds something pathological to syn (e.g. by passing a hand-crafted TokenStream to syn::parse2) could in principle trigger one of these panics. In practice proc_macro2::Literal::from_str validates its input, so the panics are guarded by an upstream invariant, and the round-trip test suite (tests/test_round_trip.rs) exercises these paths against the entire rust-lang/rust source tree without hitting them.

Documenting the precondition (or replacing panic!/unreachable! with returning an error) would tighten the panic-freedom story. As-is, parser-impl-safe holds in practice because the inputs are always validated upstream.
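A minimal sketch of the error-returning alternative, applied to a \x escape decoder. backslash_x_checked is a hypothetical name, and the sketch takes a plain byte slice instead of the generic S parameter used in src/lit.rs; malformed or truncated input yields None rather than a panic.

```rust
/// Hypothetical non-panicking `\x` decoder. `s` is the input just after `\x`;
/// returns the decoded byte and the remaining input, or None if fewer than
/// two hex digits follow.
fn backslash_x_checked(s: &[u8]) -> Option<(u8, &[u8])> {
    fn hex(b: u8) -> Option<u8> {
        match b {
            b'0'..=b'9' => Some(b - b'0'),
            b'a'..=b'f' => Some(10 + (b - b'a')),
            b'A'..=b'F' => Some(10 + (b - b'A')),
            _ => None,
        }
    }

    // `get` returns None past the end instead of indexing out of bounds.
    let hi = hex(*s.get(0)?)?;
    let lo = hex(*s.get(1)?)?;
    // hi and lo are at most 15, so hi * 16 + lo always fits in a u8.
    Some((hi * 0x10 + lo, &s[2..]))
}
```

A caller such as a LitStr::value-style accessor could then map None to a syn::Error at the literal's span instead of aborting the macro expansion.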

Annotations(7)

`Cargo.toml`

Standard cargo-normalised manifest. The original is in Cargo.toml.orig, which matches the upstream repository's Cargo.toml byte-for-byte. The published crate excludes workspace members (dev, examples, fuzz, json, codegen, tests/crates, tests/features) and metadata files (.github, .gitattributes, .clippy.toml, syn.json) via the explicit include whitelist.

`build.rs`

Build script detects the installed rustc minor version by spawning $RUSTC --version and emits up to five cargo:rustc-cfg=... directives to gate compatibility shims for older compilers, justifying has-build-exec. It makes no network access, no filesystem writes and no other process invocations, which justifies build-exec-safe, build-exec-no-network, build-exec-no-write-out and build-exec-minimal. Its output depends only on the version string reported by the compiler that the RUSTC environment variable (supplied by cargo) points to, justifying build-exec-deterministic. Reading RUSTC is normal cargo build-script behaviour and justifies uses-environment and environment-safe.
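The version-probe pattern described above can be sketched as follows. This is not syn's build.rs: the helper names (parse_minor, cfg_directives, probe), the cfg names and the version thresholds are hypothetical placeholders chosen for illustration.

```rust
use std::env;
use std::process::Command;

/// Extracts the minor version from output such as
/// "rustc 1.67.0 (fc594f156 2023-01-24)"; None if the format is unexpected.
fn parse_minor(version: &str) -> Option<u32> {
    let mut pieces = version.split('.');
    if pieces.next() != Some("rustc 1") {
        return None;
    }
    pieces.next()?.parse().ok()
}

/// Maps a minor version to `cargo:rustc-cfg` lines. The cfg names and
/// thresholds are hypothetical placeholders.
fn cfg_directives(minor: u32) -> Vec<String> {
    let gates: &[(u32, &str)] = &[(40, "example_no_feature_a"), (56, "example_no_feature_b")];
    gates
        .iter()
        .filter(|&&(threshold, _)| minor < threshold)
        .map(|&(_, cfg)| format!("cargo:rustc-cfg={}", cfg))
        .collect()
}

/// Spawns `$RUSTC --version` (the only side effect) and returns the lines to
/// print. Any failure yields no directives, i.e. the newest-compiler paths.
fn probe() -> Vec<String> {
    let rustc = match env::var_os("RUSTC") {
        Some(rustc) => rustc,
        None => return Vec::new(),
    };
    let output = match Command::new(rustc).arg("--version").output() {
        Ok(output) => output,
        Err(_) => return Vec::new(),
    };
    let version = String::from_utf8_lossy(&output.stdout);
    parse_minor(&version).map(cfg_directives).unwrap_or_default()
}
```

A build script would call probe() from its fn main and print each returned line with println!. Because the result depends only on the compiler's version string, the same toolchain always produces the same directives.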

`src/buffer.rs`

Contains most of the crate's unsafe code, organised around a flat Box<[Entry]> buffer with cursor-based traversal. The cursor uses raw pointers plus a PhantomData<&'a Entry> lifetime witness for covariance; group entries store the offset to their matching End so that group and eof checks reduce to pointer comparisons. The module-level comment explicitly flags that the implementation is fragile but the public API is safe. Pointer manipulations rely on the invariants established by recursive_new: every Group has a matching End, and every End in scope has a valid negative offset back to the start. The buffer is O(n) in token-stream length with O(1) cursor advance, justifying datastructure-impl-bounds. Justifies uses-unsafe, unsafe-safe, unsafe-minimal, impl-datastructure, datastructure-impl-safe and datastructure-impl-correct. See FINDING-1 regarding the missing use-site SAFETY: comments on these unsafe blocks. Together with the absence of any I/O imports, this also justifies the negative claims uses-network, uses-filesystem, uses-exec, uses-concurrency, uses-crypto, uses-jit, uses-interpreter, impl-crypto, impl-interpreter, impl-jit, impl-protocol and impl-algorithm.
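The Group/End layout can be illustrated with a safe, index-based sketch. Tree, Entry, flatten, build and skip are hypothetical: syn uses raw pointers into a Box<[Entry]> and real proc_macro2 token types, and the "End offset points back to index 0" convention here is an assumption about what "the start" means.

```rust
/// Hypothetical nested input: a token is a leaf or a delimited group.
enum Tree {
    Leaf(char),
    Group(Vec<Tree>),
}

/// Flat entry, loosely modelled on the description of buffer.rs.
#[derive(Debug, PartialEq)]
enum Entry {
    Leaf(char),
    /// Distance from this entry to its matching End.
    Group(usize),
    /// Negative distance from this entry back to index 0 of the buffer.
    End(isize),
}

fn flatten(trees: &[Tree], out: &mut Vec<Entry>) {
    for tree in trees {
        match tree {
            Tree::Leaf(c) => out.push(Entry::Leaf(*c)),
            Tree::Group(inner) => {
                let group_at = out.len();
                out.push(Entry::Group(0)); // placeholder, patched once End is known
                flatten(inner, out);
                let end_at = out.len();
                out.push(Entry::End(-(end_at as isize)));
                out[group_at] = Entry::Group(end_at - group_at);
            }
        }
    }
}

/// Builds the buffer in one recursive pass and appends a top-level End sentinel.
fn build(trees: &[Tree]) -> Vec<Entry> {
    let mut out = Vec::new();
    flatten(trees, &mut out);
    let end_at = out.len();
    out.push(Entry::End(-(end_at as isize)));
    out
}

/// Index just past the token tree at `i`: O(1) even for a large group.
/// Returns None at an End, i.e. at the end of the current scope.
fn skip(buf: &[Entry], i: usize) -> Option<usize> {
    match buf[i] {
        Entry::Leaf(_) => Some(i + 1),
        Entry::Group(offset) => Some(i + offset + 1),
        Entry::End(_) => None,
    }
}
```

Because every Group is patched with the distance to its End during construction, skipping a group never walks its contents; the raw-pointer version in syn depends on exactly this well-formedness.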

`src/lit.rs`

`src/lit.rs`, lines 1078-1402

    pub fn parse_lit_str(s: &str) -> (Box<str>, Box<str>) {
        match byte(s, 0) {
            b'"' => parse_lit_str_cooked(s),
            b'r' => parse_lit_str_raw(s),
            _ => unreachable!(),
        }
    }

    // Clippy false positive
    // https://github.com/rust-lang-nursery/rust-clippy/issues/2329
    #[allow(clippy::needless_continue)]
    fn parse_lit_str_cooked(mut s: &str) -> (Box<str>, Box<str>) {
        assert_eq!(byte(s, 0), b'"');
        s = &s[1..];

        let mut content = String::new();
        'outer: loop {
            let ch = match byte(s, 0) {
                b'"' => break,
                b'\\' => {
                    let b = byte(s, 1);
                    s = &s[2..];
                    match b {
                        b'x' => {
                            let (byte, rest) = backslash_x(s);
                            s = rest;
                            assert!(byte <= 0x80, "Invalid \\x byte in string literal");
                            char::from_u32(u32::from(byte)).unwrap()
                        }
                        b'u' => {
                            let (chr, rest) = backslash_u(s);
                            s = rest;
                            chr
                        }
                        b'n' => '\n',
                        b'r' => '\r',
                        b't' => '\t',
                        b'\\' => '\\',
                        b'0' => '\0',
                        b'\'' => '\'',
                        b'"' => '"',
                        b'\r' | b'\n' => loop {
                            let b = byte(s, 0);
                            match b {
                                b' ' | b'\t' | b'\n' | b'\r' => s = &s[1..],
                                _ => continue 'outer,
                            }
                        },
                        b => panic!("unexpected byte {:?} after \\ character in byte literal", b),
                    }
                }
                b'\r' => {
                    assert_eq!(byte(s, 1), b'\n', "Bare CR not allowed in string");
                    s = &s[2..];
                    '\n'
                }
                _ => {
                    let ch = next_chr(s);
                    s = &s[ch.len_utf8()..];
                    ch
                }
            };
            content.push(ch);
        }

        assert!(s.starts_with('"'));
        let content = content.into_boxed_str();
        let suffix = s[1..].to_owned().into_boxed_str();
        (content, suffix)
    }

    fn parse_lit_str_raw(mut s: &str) -> (Box<str>, Box<str>) {
        assert_eq!(byte(s, 0), b'r');
        s = &s[1..];

        let mut pounds = 0;
        while byte(s, pounds) == b'#' {
            pounds += 1;
        }
        assert_eq!(byte(s, pounds), b'"');
        let close = s.rfind('"').unwrap();
        for end in s[close + 1..close + 1 + pounds].bytes() {
            assert_eq!(end, b'#');
        }

        let content = s[pounds + 1..close].to_owned().into_boxed_str();
        let suffix = s[close + 1 + pounds..].to_owned().into_boxed_str();
        (content, suffix)
    }

    // Returns (content, suffix).
    pub fn parse_lit_byte_str(s: &str) -> (Vec<u8>, Box<str>) {
        assert_eq!(byte(s, 0), b'b');
        match byte(s, 1) {
            b'"' => parse_lit_byte_str_cooked(s),
            b'r' => parse_lit_byte_str_raw(s),
            _ => unreachable!(),
        }
    }

    // Clippy false positive
    // https://github.com/rust-lang-nursery/rust-clippy/issues/2329
    #[allow(clippy::needless_continue)]
    fn parse_lit_byte_str_cooked(mut s: &str) -> (Vec<u8>, Box<str>) {
        assert_eq!(byte(s, 0), b'b');
        assert_eq!(byte(s, 1), b'"');
        s = &s[2..];

        // We're going to want to have slices which don't respect codepoint boundaries.
        let mut v = s.as_bytes();

        let mut out = Vec::new();
        'outer: loop {
            let byte = match byte(v, 0) {
                b'"' => break,
                b'\\' => {
                    let b = byte(v, 1);
                    v = &v[2..];
                    match b {
                        b'x' => {
                            let (b, rest) = backslash_x(v);
                            v = rest;
                            b
                        }
                        b'n' => b'\n',
                        b'r' => b'\r',
                        b't' => b'\t',
                        b'\\' => b'\\',
                        b'0' => b'\0',
                        b'\'' => b'\'',
                        b'"' => b'"',
                        b'\r' | b'\n' => loop {
                            let byte = byte(v, 0);
                            let ch = char::from_u32(u32::from(byte)).unwrap();
                            if ch.is_whitespace() {
                                v = &v[1..];
                            } else {
                                continue 'outer;
                            }
                        },
                        b => panic!("unexpected byte {:?} after \\ character in byte literal", b),
                    }
                }
                b'\r' => {
                    assert_eq!(byte(v, 1), b'\n', "Bare CR not allowed in string");
                    v = &v[2..];
                    b'\n'
                }
                b => {
                    v = &v[1..];
                    b
                }
            };
            out.push(byte);
        }

        assert_eq!(byte(v, 0), b'"');
        let suffix = s[s.len() - v.len() + 1..].to_owned().into_boxed_str();
        (out, suffix)
    }

    fn parse_lit_byte_str_raw(s: &str) -> (Vec<u8>, Box<str>) {
        assert_eq!(byte(s, 0), b'b');
        let (value, suffix) = parse_lit_str_raw(&s[1..]);
        (String::from(value).into_bytes(), suffix)
    }

    // Returns (value, suffix).
    pub fn parse_lit_byte(s: &str) -> (u8, Box<str>) {
        assert_eq!(byte(s, 0), b'b');
        assert_eq!(byte(s, 1), b'\'');

        // We're going to want to have slices which don't respect codepoint boundaries.
        let mut v = s[2..].as_bytes();

        let b = match byte(v, 0) {
            b'\\' => {
                let b = byte(v, 1);
                v = &v[2..];
                match b {
                    b'x' => {
                        let (b, rest) = backslash_x(v);
                        v = rest;
                        b
                    }
                    b'n' => b'\n',
                    b'r' => b'\r',
                    b't' => b'\t',
                    b'\\' => b'\\',
                    b'0' => b'\0',
                    b'\'' => b'\'',
                    b'"' => b'"',
                    b => panic!("unexpected byte {:?} after \\ character in byte literal", b),
                }
            }
            b => {
                v = &v[1..];
                b
            }
        };

        assert_eq!(byte(v, 0), b'\'');
        let suffix = s[s.len() - v.len() + 1..].to_owned().into_boxed_str();
        (b, suffix)
    }

    // Returns (value, suffix).
    pub fn parse_lit_char(mut s: &str) -> (char, Box<str>) {
        assert_eq!(byte(s, 0), b'\'');
        s = &s[1..];

        let ch = match byte(s, 0) {
            b'\\' => {
                let b = byte(s, 1);
                s = &s[2..];
                match b {
                    b'x' => {
                        let (byte, rest) = backslash_x(s);
                        s = rest;
                        assert!(byte <= 0x80, "Invalid \\x byte in string literal");
                        char::from_u32(u32::from(byte)).unwrap()
                    }
                    b'u' => {
                        let (chr, rest) = backslash_u(s);
                        s = rest;
                        chr
                    }
                    b'n' => '\n',
                    b'r' => '\r',
                    b't' => '\t',
                    b'\\' => '\\',
                    b'0' => '\0',
                    b'\'' => '\'',
                    b'"' => '"',
                    b => panic!("unexpected byte {:?} after \\ character in byte literal", b),
                }
            }
            _ => {
                let ch = next_chr(s);
                s = &s[ch.len_utf8()..];
                ch
            }
        };
        assert_eq!(byte(s, 0), b'\'');
        let suffix = s[1..].to_owned().into_boxed_str();
        (ch, suffix)
    }

    fn backslash_x<S>(s: &S) -> (u8, &S)
    where
        S: Index<RangeFrom<usize>, Output = S> + AsRef<[u8]> + ?Sized,
    {
        let mut ch = 0;
        let b0 = byte(s, 0);
        let b1 = byte(s, 1);
        ch += 0x10
            * match b0 {
                b'0'..=b'9' => b0 - b'0',
                b'a'..=b'f' => 10 + (b0 - b'a'),
                b'A'..=b'F' => 10 + (b0 - b'A'),
                _ => panic!("unexpected non-hex character after \\x"),
            };
        ch += match b1 {
            b'0'..=b'9' => b1 - b'0',
            b'a'..=b'f' => 10 + (b1 - b'a'),
            b'A'..=b'F' => 10 + (b1 - b'A'),
            _ => panic!("unexpected non-hex character after \\x"),
        };
        (ch, &s[2..])
    }

    fn backslash_u(mut s: &str) -> (char, &str) {
        if byte(s, 0) != b'{' {
            panic!("{}", "expected { after \\u");
        }
        s = &s[1..];

        let mut ch = 0;
        let mut digits = 0;
        loop {
            let b = byte(s, 0);
            let digit = match b {
                b'0'..=b'9' => b - b'0',
                b'a'..=b'f' => 10 + b - b'a',
                b'A'..=b'F' => 10 + b - b'A',
                b'_' if digits > 0 => {
                    s = &s[1..];
                    continue;
                }
                b'}' if digits == 0 => panic!("invalid empty unicode escape"),
                b'}' => break,
                _ => panic!("unexpected non-hex character after \\u"),
            };
            if digits == 6 {
                panic!("overlong unicode escape (must have at most 6 hex digits)");
            }
            ch *= 0x10;
            ch += u32::from(digit);
            digits += 1;
            s = &s[1..];
        }
        assert!(byte(s, 0) == b'}');
        s = &s[1..];

        if let Some(ch) = char::from_u32(ch) {
            (ch, s)
        } else {
            panic!("character code {:x} is not a valid unicode character", ch);
        }
    }

    // Returns base 10 digits and suffix.
    pub fn parse_lit_int(mut s: &str) -> Option<(Box<str>, Box<str>)> {
        let negative = byte(s, 0) == b'-';
        if negative {
            s = &s[1..];
        }

        let base = match (byte(s, 0), byte(s, 1)) {
            (b'0', b'x') => {
                s = &s[2..];
                16
            }
            (b'0', b'o') => {
                s = &s[2..];

Literal cooking routines (parse_lit_str_cooked, parse_lit_byte_str_cooked, backslash_x, backslash_u, etc.) assume the input is a syntactically valid Rust literal, as recovered through proc_macro2::Literal::to_string from a token that rustc's lexer (or proc_macro2's fallback lexer) has already accepted. They reach for panic!/unreachable! on unexpected byte patterns rather than returning errors, which is the basis for FINDING-2. The crate is a parser implementation, justifying impl-parser; under the upstream-validated input contract, parser-impl-safe holds.

`src/parse.rs`

`src/parse.rs`, lines 252-268

pub struct ParseBuffer<'a> {
    scope: Span,
    // Instead of Cell<Cursor<'a>> so that ParseBuffer<'a> is covariant in 'a.
    // The rest of the code in this module needs to be careful that only a
    // cursor derived from this `cell` is ever assigned to this `cell`.
    //
    // Cell<Cursor<'a>> cannot be covariant in 'a because then we could take a
    // ParseBuffer<'a>, upcast to ParseBuffer<'short> for some lifetime shorter
    // than 'a, and then assign a Cursor<'short> into the Cell.
    //
    // By extension, it would not be safe to expose an API that accepts a
    // Cursor<'a> and trusts that it lives as long as the cursor currently in
    // the cell.
    cell: Cell<Cursor<'static>>,
    marker: PhantomData<Cursor<'a>>,
    unexpected: Cell<Option<Rc<Cell<Unexpected>>>>,
}

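The ParseBuffer struct documents why it stores Cell<Cursor<'static>> instead of Cell<Cursor<'a>>. Because Cell<T> is invariant in T, a Cell<Cursor<'a>> field would make ParseBuffer<'a> invariant in 'a; and Cell must be invariant, since a covariant cell would let a caller upcast ParseBuffer<'a> to ParseBuffer<'short> and then write a shorter-lived Cursor<'short> into it. Erasing the lifetime to 'static and adding PhantomData<Cursor<'a>> makes ParseBuffer covariant in 'a, and soundness then rests on the module-internal invariant stated in the comment: only a cursor derived from this cell is ever assigned to it. This supports the unsafe-safe assessment; for unsafe-documented, it is an example of the type-level documentation that FINDING-1 credits even though individual unsafe blocks lack use-site SAFETY: comments.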
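The variance technique can be reproduced in a few lines. In this minimal sketch, Cursor is a hypothetical stand-in (a borrowed byte slice) and Buf mirrors only the cell and marker fields of ParseBuffer; the method names are assumptions. It shows why a lifetime-shortening function type-checks, and where the "only derived cursors are stored" invariant has to be upheld.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem;

/// Hypothetical stand-in for syn's Cursor<'a>: a position in a borrowed buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cursor<'a> {
    rest: &'a [u8],
}

/// Mirrors the `cell` + `marker` layout of ParseBuffer<'a>.
struct Buf<'a> {
    // Lifetime erased so the (invariant) Cell does not mention 'a.
    cell: Cell<Cursor<'static>>,
    // Re-introduces 'a covariantly.
    marker: PhantomData<Cursor<'a>>,
}

impl<'a> Buf<'a> {
    fn new(start: Cursor<'a>) -> Self {
        // SAFETY: the erased cursor is only handed back out as Cursor<'a>
        // (see `cursor`), and the cell is only overwritten with a cursor
        // derived from its own contents (see `bump`), never with one supplied
        // by a caller who might have shortened 'a.
        let erased = unsafe { mem::transmute::<Cursor<'a>, Cursor<'static>>(start) };
        Buf { cell: Cell::new(erased), marker: PhantomData }
    }

    fn cursor(&self) -> Cursor<'a> {
        self.cell.get()
    }

    /// Advances by one byte; the new cursor is derived from the old one.
    fn bump(&self) {
        let cur = self.cell.get();
        if let Some((_, rest)) = cur.rest.split_first() {
            self.cell.set(Cursor { rest });
        }
    }
}

/// Compiles only because Buf is covariant in 'a. With a Cell<Cursor<'a>>
/// field instead, rustc rejects this function.
fn shorten<'short, 'long: 'short>(b: Buf<'long>) -> Buf<'short> {
    b
}
```

Adding a public method that stored an arbitrary caller-supplied Cursor<'a> into the cell would break the invariant, which is exactly the hazard the source comment warns about.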

`src/thread.rs`

ThreadBound<T> wraps a value together with the ThreadId of the thread that constructed it, and unsafely implements Sync for all T and Send for T: Copy. get() returns Some(&T) only when called from the constructing thread, so observers on other threads cannot reach the inner T. The Copy bound on the Send impl ensures T's Drop never runs on the wrong thread, since Copy types cannot implement Drop. This is a small custom Sync-maker rather than a general-purpose primitive, justifying impl-concurrency, concurrency-impl-safe, concurrency-impl-correct and concurrency-impl-documented.
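A minimal sketch of the pattern described above, with field and method names assumed rather than copied from src/thread.rs. The test uses Rc (neither Send nor Sync) to show that sharing the wrapper across threads compiles while access from the other thread is refused at runtime.

```rust
use std::thread::{self, ThreadId};

/// Sketch of a thread-bound wrapper (names are assumptions).
struct ThreadBound<T> {
    value: T,
    thread_id: ThreadId,
}

// SAFETY: `get` hands out `&T` only on the thread that created the wrapper,
// so sharing `&ThreadBound<T>` never exposes `&T` on another thread.
unsafe impl<T> Sync for ThreadBound<T> {}

// SAFETY: moving the wrapper to another thread means it may be dropped there;
// restricting to Copy guarantees no destructor runs, because Copy types
// cannot implement Drop.
unsafe impl<T: Copy> Send for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    fn new(value: T) -> Self {
        ThreadBound { value, thread_id: thread::current().id() }
    }

    /// Some(&T) on the constructing thread, None everywhere else.
    fn get(&self) -> Option<&T> {
        if thread::current().id() == self.thread_id {
            Some(&self.value)
        } else {
            None
        }
    }
}
```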

`tests/test_round_trip.rs`

Differential test: fetches (downloads and unpacks) a pinned revision of the rust-lang/rust source tree, walks its .rs files and parses each with syn::parse_file. The syn AST is printed back to tokens; the original source and the printed output are then both parsed with rustc's rustc_parse (via #![feature(rustc_private)]), and the two rustc ASTs are compared with SpanlessEq. This is the crate's primary correctness check against the reference parser. The test runs only in nightly CI and is gated by #![cfg(not(syn_disable_nightly_tests))] and #![cfg(not(miri))]. Justifies parser-impl-correct, parser-impl-tested, datastructure-impl-tested, concurrency-impl-tested, unsafe-tested and has-integration-tests.
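The round-trip shape of such a differential test can be sketched on a toy grammar. Everything here is hypothetical: reference_parse plays rustc's role, subject_parse plays syn's, and print plays the role of printing the AST back to tokens, all for Rust-style decimal integers with `_` separators rather than whole source files.

```rust
/// Reference parser (stand-in for rustc): decimal integer with `_` separators.
fn reference_parse(src: &str) -> Option<u64> {
    if !src.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    src.replace('_', "").parse().ok()
}

/// Hypothetical implementation under test (stand-in for syn).
fn subject_parse(src: &str) -> Option<u64> {
    let mut bytes = src.bytes();
    let first = bytes.next()?;
    if !first.is_ascii_digit() {
        return None;
    }
    let mut value = u64::from(first - b'0');
    for b in bytes {
        match b {
            b'_' => continue,
            b'0'..=b'9' => value = value.checked_mul(10)?.checked_add(u64::from(b - b'0'))?,
            _ => return None,
        }
    }
    Some(value)
}

/// Prints the subject's result back to source text.
fn print(value: u64) -> String {
    value.to_string()
}

/// Oracle: reference(src) must equal reference(print(subject(src))), and the
/// two parsers must agree on which inputs they reject.
fn round_trip_ok(src: &str) -> bool {
    match (reference_parse(src), subject_parse(src)) {
        (Some(before), Some(ast)) => reference_parse(&print(ast)) == Some(before),
        (None, None) => true,
        _ => false,
    }
}
```

Comparing two outputs of the reference parser sidesteps the need for the subject's AST to share a type with the reference's, which is the same reason the real test compares rustc ASTs rather than a syn AST against a rustc AST.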