Audit · wasmparser@0.244.0

cargo : wasmparser @ 0.244.0

Patrick Elsen · signed 2026-05-28 · published 2026-05-28

Claims

algorithm-impl-safe, has-binaries, has-build-exec, has-fuzz-tests, has-install-exec, has-integration-tests, has-property-tests, has-unit-tests, impl-algorithm, impl-concurrency, impl-crypto, impl-datastructure, impl-interpreter, impl-jit, impl-parser, impl-protocol, is-benign, parser-impl-safe, unsafe-documented, unsafe-minimal, unsafe-safe, uses-concurrency, uses-crypto, uses-environment, uses-exec, uses-filesystem, uses-interpreter, uses-jit, uses-network, uses-unsafe

Summary

wasmparser 0.244.0 is the bytecodealliance WebAssembly binary parser and validator. The audit read the binary reader, validator core, and resource-limit code; LEB128 decoding rejects overlong and out-of-range encodings, section and count fields are bounds-checked with lazy per-item iteration, and validator limits use checked arithmetic. The crate contains one unsafe block, a documented transmute over a #[repr(transparent)] type. No findings.

Report

Subject

wasmparser 0.244.0 is the WebAssembly binary parser and validator from the bytecodealliance wasm-tools workspace. It is an event-driven, zero-copy reader: Parser yields a stream of Payload values over a wasm module or component, and Validator consumes those payloads to type-check the module against the core wasm and component-model specifications. The public API exposes the low-level BinaryReader, the section readers in src/readers, the operator decoder, and the full validator. It is the front-end stage shared by downstream engines such as wasmtime; it parses and validates but does not compile or execute wasm. The crate is no_std-capable, with std, validate, serde, component-model, hash-collections, and simd features enabled by default.

Methodology

Tools: openvet 0.6.0, ripgrep, diff, git, Read. The published contents/ tree holds 39,831 lines of Rust across 38 files. I read binary_reader.rs (LEB128 decoders, EOF bounds checks, string and reader length-prefix handling), limits.rs (resource caps) and parser.rs (section dispatch, the delimited length-tracking helper) in full or near-full; the validator core (validator.rs count checks, validator/operators.rs control-flow and operand stack, validator/func.rs locals, validator/core/canonical.rs subtype-depth limiting, validator/names.rs for the single unsafe site) in the load-bearing sections; and the src/readers section-iterator machinery and src/collections wrappers more selectively. I surveyed the whole tree with ripgrep for unsafe, FFI, network, filesystem, process, environment, crypto, RNG, and concurrency primitives. A diff -rq contents vcs shows the only difference is the cargo-normalised Cargo.toml; the source is byte-identical to the seeded VCS snapshot. I did not fetch the WebAssembly specification, so I did not evaluate spec conformance.

Results

The source under contents/src is byte-equivalent to vcs/src; only Cargo.toml differs, as expected from cargo normalisation. The survey found no network, filesystem, process, environment, cryptographic, RNG, or concurrency usage anywhere in the crate, so uses-network, uses-filesystem, uses-environment, uses-exec, uses-crypto, and uses-concurrency are all false; the sha256 occurrences are component-model import-name string parsing, not cryptographic operations. There is no build.rs and build = false in the manifest, no install hooks, and no pre-compiled assets, so has-build-exec, has-install-exec, and has-binaries are false. The crate neither compiles nor runs wasm, so uses-jit, uses-interpreter, impl-jit, impl-interpreter, impl-protocol, and impl-crypto are false.

The crate implements a parser (impl-parser) and the validation algorithms that type-check it (impl-algorithm). The collection types in src/collections are thin wrappers that delegate to indexmap, hashbrown, or BTreeMap rather than novel storage, so impl-datastructure is false, and the crate uses no concurrency primitives of its own, so impl-concurrency is false.

The crate contains exactly one unsafe block (uses-unsafe), at validator/names.rs:38: a core::mem::transmute coercing &str to &KebabStr, where KebabStr is #[repr(transparent)] over str. The transparent representation makes the layouts identical, the safety comment states the invariant, and this is the only unsafe in a crate that otherwise sets unsafe_code = "deny", so unsafe-safe, unsafe-documented, and unsafe-minimal hold. I did not run miri or a sanitizer against it, so I did not assert unsafe-tested.

Parser safety on adversarial input (parser-impl-safe) rests on three mechanisms. First, the LEB128 decoders in binary_reader.rs:441-658 reject overlong encodings (a continuation bit on the final permitted byte) and out-of-range values (unused high bits set), and every byte read goes through read_u8, which is EOF-checked, so a truncated integer yields an error rather than a panic or over-read. Second, section and string lengths are bounds-checked: read_size/read_string cap lengths against limits.rs constants, ensure_has_bytes rejects reads past the buffer end, and delimited (parser.rs:1280) uses checked_sub to keep function-body reads inside the declared section byte range. Third, attacker-controlled count fields do not drive eager allocation: SectionLimited (readers.rs:84) stores only the count and reads items one at a time through its iterator, hitting EOF when the section data is exhausted.

The validator additionally enforces the WebAssembly JS API resource limits in limits.rs (types, functions, imports, exports, globals, tables, memories, tags, locals, subtyping depth, and the component-model caps) through check_max, which uses checked arithmetic. Locals are capped at MAX_WASM_FUNCTION_LOCALS via checked_add (operators.rs:4601) and stored in a bounded cache plus a binary-searched overflow list. The control-flow and operand stacks are pushed and popped in lockstep, pop_ctrl verifies the operand stack returns to the frame height, branch targets are bounds-checked by jump, and out-of-bounds local indices are rejected by local; these support algorithm-impl-safe. Subtype-chain depth is capped at MAX_WASM_SUBTYPING_DEPTH (63) in canonical.rs, bounding recursion in type matching.
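The third mechanism, lazy per-item iteration, can be illustrated with a minimal sketch. LazyItems is a hypothetical stand-in and not wasmparser's SectionLimited; items are single bytes for brevity. The point it shows is that a declared count only bounds how many times the iterator will try to decode, and never sizes an allocation.

```rust
/// Hypothetical stand-in for a count-prefixed section reader. It stores the
/// declared count but allocates nothing up front.
struct LazyItems<'a> {
    data: &'a [u8],
    remaining: u32,
    failed: bool,
}

impl<'a> LazyItems<'a> {
    fn new(declared_count: u32, data: &'a [u8]) -> Self {
        LazyItems { data, remaining: declared_count, failed: false }
    }
}

impl<'a> Iterator for LazyItems<'a> {
    type Item = Result<u8, &'static str>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.failed || self.remaining == 0 {
            return None;
        }
        match self.data.split_first() {
            Some((&item, rest)) => {
                // One item decoded; advance the window and the counter.
                self.data = rest;
                self.remaining -= 1;
                Some(Ok(item))
            }
            None => {
                // The declared count promised more items than the bytes hold.
                self.failed = true;
                Some(Err("unexpected end-of-file"))
            }
        }
    }
}
```

Under these assumptions, a section that declares u32::MAX items but carries two bytes yields two items and then a single error, with memory use proportional to the input rather than to the declared count.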

The crate carries unit tests in #[cfg(test)] mod tests blocks in five source files (has-unit-tests) and one integration test, tests/big-module.rs, which exercises hundred-thousand-element index spaces (has-integration-tests). The published crate ships no fuzz/ directory and no proptest or quickcheck harness, so has-fuzz-tests and has-property-tests are false; upstream fuzz targets live in the parent workspace, outside this crate. Because I did not run the fuzzers or the spec test suite, I did not assert parser-impl-tested, parser-impl-correct, algorithm-impl-tested, algorithm-impl-correct, or algorithm-impl-bounds. No malicious or obfuscated code, telemetry, hidden endpoints, or time-based behaviour was found, so is-benign holds. No findings were recorded.

Conclusion

wasmparser 0.244.0 parses and validates WebAssembly with a single documented unsafe transmute whose #[repr(transparent)] invariant holds. Its adversarial-input handling is layered: EOF-checked LEB128 decoding that rejects overlong and out-of-range encodings, length and byte-range bounds checks on sections and strings, lazy per-item section iteration that prevents count-driven allocation amplification, and validator resource limits enforced with checked arithmetic against the WebAssembly JS API caps. The crate performs no I/O of any kind and pulls in four optional dependencies behind features. No security, safety, or correctness findings were recorded. Spec conformance and fuzz coverage were not independently re-run, and the corresponding *-correct and *-tested claims were left unasserted.

Findings

No findings.

Annotations (7)

`src/binary_reader.rs`

`src/binary_reader.rs`, line 441-512

    pub fn read_var_u32(&mut self) -> Result<u32> {
        // Optimization for single byte i32.
        let byte = self.read_u8()?;
        if (byte & 0x80) == 0 {
            Ok(u32::from(byte))
        } else {
            self.read_var_u32_big(byte)
        }
    }

    fn read_var_u32_big(&mut self, byte: u8) -> Result<u32> {
        let mut result = (byte & 0x7F) as u32;
        let mut shift = 7;
        loop {
            let byte = self.read_u8()?;
            result |= ((byte & 0x7F) as u32) << shift;
            if shift >= 25 && (byte >> (32 - shift)) != 0 {
                let msg = if byte & 0x80 != 0 {
                    "invalid var_u32: integer representation too long"
                } else {
                    "invalid var_u32: integer too large"
                };
                // The continuation bit or unused bits are set.
                return Err(BinaryReaderError::new(msg, self.original_position() - 1));
            }
            shift += 7;
            if (byte & 0x80) == 0 {
                break;
            }
        }
        Ok(result)
    }

    /// Advances the `BinaryReader` up to four bytes to parse a variable
    /// length integer as a `u64`.
    ///
    /// # Errors
    ///
    /// If `BinaryReader` has less than one or up to eight bytes remaining, or
    /// the integer is larger than 64 bits.
    #[inline]
    pub fn read_var_u64(&mut self) -> Result<u64> {
        // Optimization for single byte u64.
        let byte = u64::from(self.read_u8()?);
        if (byte & 0x80) == 0 {
            Ok(byte)
        } else {
            self.read_var_u64_big(byte)
        }
    }

    fn read_var_u64_big(&mut self, byte: u64) -> Result<u64> {
        let mut result = byte & 0x7F;
        let mut shift = 7;
        loop {
            let byte = u64::from(self.read_u8()?);
            result |= (byte & 0x7F) << shift;
            if shift >= 57 && (byte >> (64 - shift)) != 0 {
                let msg = if byte & 0x80 != 0 {
                    "invalid var_u64: integer representation too long"
                } else {
                    "invalid var_u64: integer too large"
                };
                // The continuation bit or unused bits are set.
                return Err(BinaryReaderError::new(msg, self.original_position() - 1));
            }
            shift += 7;
            if (byte & 0x80) == 0 {
                break;
            }
        }
        Ok(result)

LEB128 decoding for var_u32 and var_u64. Both detect overlong encodings (continuation bit set on the final permitted byte) and integer-too-large (unused high bits non-zero). Because shift advances in steps of 7, the shift >= 25 guard for u32 fires only at shift = 28, the fifth byte, where byte >> 4 must be zero; the shift >= 57 guard for u64 fires only at shift = 63, the tenth byte, where byte >> 1 must be zero. Both checks are correct: at the final byte only the bits that fit in the target width may be set, and any continuation bit there is rejected. The signed variants (read_var_i32, read_var_s33, read_var_i64) apply the analogous sign-and-unused-bit check and sign-extend the result. All decoders advance only via read_u8, which is EOF-bounds-checked. Justifies parser-impl-safe.
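To make the three failure modes concrete, here is a minimal standalone sketch of a u32 LEB128 decoder over a byte slice. It is not the crate's code: decode_var_u32 and Leb128Error are hypothetical names, and the slice-based signature is an assumption chosen so the example needs no reader type.

```rust
/// Errors a decoder can report. Names are illustrative, not wasmparser's.
#[derive(Debug, PartialEq)]
enum Leb128Error {
    UnexpectedEof,
    TooLong,
    TooLarge,
}

/// Decodes an unsigned LEB128 `u32` from the front of `bytes`, returning the
/// value and the number of bytes consumed.
fn decode_var_u32(bytes: &[u8]) -> Result<(u32, usize), Leb128Error> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        // On the fifth byte (shift == 28) only the low 4 bits fit in a u32.
        // A set continuation bit also makes `byte >> 4` non-zero.
        if shift == 28 && (byte >> 4) != 0 {
            return Err(if byte & 0x80 != 0 {
                Leb128Error::TooLong
            } else {
                Leb128Error::TooLarge
            });
        }
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i + 1));
        }
        shift += 7;
    }
    // Ran out of input while the continuation bit was still set.
    Err(Leb128Error::UnexpectedEof)
}
```

In this sketch a truncated input, a sixth byte, and a fifth byte with excess high bits each map to a distinct error, and none of them can shift past bit 31.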

`src/limits.rs`

All resource limits are defined here. They cover types, functions, imports, exports, globals, tables, memories, tags, br_table size, struct fields, subtyping depth, and string size. For the component model they additionally cover module size (1 GiB), type declarations, record and variant fields, and similar caps. These limits are consistent with the WebAssembly JS API spec. Supports the bounds checking behind parser-impl-safe.

`src/parser.rs`

`src/parser.rs`, line 980-995

            //
            // Limiting via `Parser::max_size` (nested parsing) happens above in
            // `fn parse`, and limiting by our section size happens via
            // `delimited`. Actual parsing of the function body is delegated to
            // the caller to iterate over the `FunctionBody` structure.
            State::FunctionBody { remaining, mut len } => {
                let body = delimited(reader, &mut len, |r| {
                    Ok(FunctionBody::new(r.read_reader()?))
                })?;
                self.state = State::FunctionBody {
                    remaining: remaining - 1,
                    len,
                };
                Ok(CodeSectionEntry(body))
            }
        }

Section-level size tracking: each function body is read via read_reader which uses read_var_u32 for the body size, and delimited enforces that reads stay within the declared byte range. The parent parser also tracks max_size across nested parsers. Truncated or oversized sections result in EOF or bounds errors, not undefined behaviour.
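The byte-range accounting described above can be sketched as follows. This is a simplified illustration under stated assumptions: Cursor is a hypothetical stand-in for BinaryReader, errors are plain strings, and the delimited shown here is a reconstruction of the idea (measure bytes consumed, subtract with checked_sub), not a copy of the helper in parser.rs.

```rust
/// Minimal cursor over a byte slice. Hypothetical stand-in for `BinaryReader`.
struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// Reads `n` bytes, failing instead of reading past the buffer end.
    fn read_bytes(&mut self, n: usize) -> Result<&'a [u8], String> {
        let end = self.pos.checked_add(n).ok_or("length overflow")?;
        let out = self.data.get(self.pos..end).ok_or("unexpected end-of-file")?;
        self.pos = end;
        Ok(out)
    }
}

/// Runs `f`, then charges the bytes it consumed against `remaining`.
/// Fails if `f` read more than the section still had left.
fn delimited<'a, T>(
    cursor: &mut Cursor<'a>,
    remaining: &mut u32,
    f: impl FnOnce(&mut Cursor<'a>) -> Result<T, String>,
) -> Result<T, String> {
    let start = cursor.pos;
    let value = f(cursor)?;
    let consumed = u32::try_from(cursor.pos - start).ok();
    *remaining = consumed
        .and_then(|n| remaining.checked_sub(n))
        .ok_or_else(|| String::from("read past end of section"))?;
    Ok(value)
}
```

The sketch has two independent guards: read_bytes keeps reads inside the buffer, while delimited keeps them inside the declared section length even when the buffer itself holds more data.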

`src/validator.rs`

`src/validator.rs`, line 70-93

fn check_max(cur_len: usize, amt_added: u32, max: usize, desc: &str, offset: usize) -> Result<()> {
    if max
        .checked_sub(cur_len)
        .and_then(|amt| amt.checked_sub(amt_added as usize))
        .is_none()
    {
        if max == 1 {
            bail!(offset, "multiple {desc}");
        }

        bail!(offset, "{desc} count exceeds limit of {max}");
    }

    Ok(())
}

fn combine_type_sizes(a: u32, b: u32, offset: usize) -> Result<u32> {
    match a.checked_add(b) {
        Some(sum) if sum < MAX_WASM_TYPE_SIZE => Ok(sum),
        _ => Err(format_err!(
            offset,
            "effective type size exceeds the limit of {MAX_WASM_TYPE_SIZE}",
        )),
    }

check_max prevents count fields (types, functions, imports, exports, globals, tables, memories, tags) from exceeding their per-category limits. combine_type_sizes enforces MAX_WASM_TYPE_SIZE (1 000 000) to bound recursive type complexity.
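A standalone variant of the check makes its overflow behaviour easy to exercise. This sketch mirrors the excerpt's logic but is an adaptation, not the crate's function: it drops the offset parameter and returns String errors in place of the bail! macro.

```rust
/// Returns an error if adding `amt_added` items to `cur_len` would exceed
/// `max`. Subtracting from `max` instead of adding to `cur_len` means no
/// intermediate value can overflow.
fn check_max(cur_len: usize, amt_added: u32, max: usize, desc: &str) -> Result<(), String> {
    let fits = max
        .checked_sub(cur_len)
        .and_then(|room| room.checked_sub(amt_added as usize))
        .is_some();
    if fits {
        Ok(())
    } else if max == 1 {
        Err(format!("multiple {desc}"))
    } else {
        Err(format!("{desc} count exceeds limit of {max}"))
    }
}
```

Calling this with cur_len = usize::MAX and amt_added = u32::MAX returns an error rather than wrapping, which is the property the audit relies on for attacker-supplied counts.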

`src/validator/core/canonical.rs`

`src/validator/core/canonical.rs`, line 125-163

    fn check_subtype(
        &mut self,
        rec_group: RecGroupId,
        id: CoreTypeId,
        types: &mut TypeAlloc,
        offset: usize,
    ) -> Result<()> {
        let ty = &types[id];
        if !self.features().gc() && (!ty.is_final || ty.supertype_idx.is_some()) {
            bail!(offset, "gc proposal must be enabled to use subtypes");
        }

        self.check_composite_type(&ty.composite_type, &types, offset)?;

        let depth = if let Some(supertype_index) = ty.supertype_idx {
            debug_assert!(supertype_index.is_canonical());
            let sup_id = self.at_packed_index(types, rec_group, supertype_index, offset)?;
            if types[sup_id].is_final {
                bail!(offset, "sub type cannot have a final super type");
            }
            if !types.matches(id, sup_id) {
                bail!(offset, "sub type must match super type");
            }
            let depth = types.get_subtyping_depth(sup_id) + 1;
            if usize::from(depth) > crate::limits::MAX_WASM_SUBTYPING_DEPTH {
                bail!(
                    offset,
                    "sub type hierarchy too deep: found depth {}, cannot exceed depth {}",
                    depth,
                    crate::limits::MAX_WASM_SUBTYPING_DEPTH,
                );
            }
            depth
        } else {
            0
        };
        types.set_subtyping_depth(id, depth);

        Ok(())

Subtype hierarchy depth is explicitly limited by MAX_WASM_SUBTYPING_DEPTH (63), preventing deep subtype chains from causing unbounded recursion in type checking. This is part of parser-impl-safe resource bounds.
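The depth bookkeeping can be sketched in isolation. Depths and declare are hypothetical names; the only details taken from the excerpt are that each type records its depth as the supertype's depth plus one, and that a depth above 63 is rejected.

```rust
const MAX_SUBTYPING_DEPTH: u8 = 63;

/// Records the subtyping depth of each declared type, indexed by type id.
struct Depths(Vec<u8>);

impl Depths {
    /// Declares a new type with an optional supertype and returns its id.
    fn declare(&mut self, supertype: Option<usize>) -> Result<usize, String> {
        let depth = match supertype {
            None => 0,
            Some(sup) => {
                let parent = *self.0.get(sup).ok_or("unknown supertype")?;
                // `parent` is at most 63, so this addition cannot overflow a u8.
                let depth = parent + 1;
                if depth > MAX_SUBTYPING_DEPTH {
                    return Err(format!(
                        "sub type hierarchy too deep: found depth {depth}, cannot exceed depth {MAX_SUBTYPING_DEPTH}"
                    ));
                }
                depth
            }
        };
        self.0.push(depth);
        Ok(self.0.len() - 1)
    }
}
```

Because the depth is stored at declaration time, the check is a constant-time lookup instead of a walk up the chain, and any later traversal of the hierarchy is bounded by the same constant.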

`src/validator/names.rs`

`src/validator/names.rs`, line 34-41

    pub(crate) fn new_unchecked<'a>(s: impl AsRef<str> + 'a) -> &'a Self {
        // Safety: `KebabStr` is a transparent wrapper around `str`
        // Therefore transmuting `&str` to `&KebabStr` is safe.
        #[allow(unsafe_code)]
        unsafe {
            core::mem::transmute::<_, &Self>(s.as_ref())
        }
    }

The only unsafe block in this crate. Uses core::mem::transmute to coerce &str to &KebabStr, which is marked #[repr(transparent)] over str. The safety comment is accurate and the invariant holds. Justifies uses-unsafe and unsafe-safe, unsafe-documented, unsafe-minimal.
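The pattern can be reproduced in a few lines. Wrapped is a hypothetical stand-in for KebabStr, and the validation rule in new is a deliberately simplified assumption, not wasmparser's kebab-case grammar. The sketch shows the usual split: a safe constructor that validates, and an unchecked one whose soundness rests only on the transparent layout.

```rust
/// Hypothetical stand-in for `KebabStr`: a transparent wrapper around `str`.
#[repr(transparent)]
struct Wrapped(str);

impl Wrapped {
    /// Safe constructor with a simplified, illustrative validity rule.
    fn new(s: &str) -> Option<&Wrapped> {
        let ok = !s.is_empty() && s.bytes().all(|b| b.is_ascii_lowercase() || b == b'-');
        if ok { Some(Wrapped::new_unchecked(s)) } else { None }
    }

    /// Skips validation. Memory safety does not depend on the contents.
    fn new_unchecked(s: &str) -> &Wrapped {
        // SAFETY: `Wrapped` is `#[repr(transparent)]` over `str`, so `&str`
        // and `&Wrapped` share the same layout and pointer metadata.
        unsafe { std::mem::transmute::<&str, &Wrapped>(s) }
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

In this sketch, passing an invalid string to new_unchecked breaks a logical invariant but cannot cause undefined behaviour, because the wrapper adds no layout or validity requirements beyond those of str.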

`src/validator/operators.rs`

`src/validator/operators.rs`, line 947-1005

    /// Pushes a new frame onto the control stack.
    ///
    /// This operation is used when entering a new block such as an if, loop,
    /// or block itself. The `kind` of block is specified which indicates how
    /// breaks interact with this block's type. Additionally the type signature
    /// of the block is specified by `ty`.
    fn push_ctrl(&mut self, kind: FrameKind, ty: BlockType) -> Result<()> {
        // Push a new frame which has a snapshot of the height of the current
        // operand stack.
        let height = self.operands.len();
        let init_height = self.local_inits.push_ctrl();
        self.control.push(Frame {
            kind,
            block_type: ty,
            height,
            unreachable: false,
            init_height,
        });
        // All of the parameters are now also available in this control frame,
        // so we push them here in order.
        for ty in self.params(ty)? {
            self.push_operand(ty)?;
        }
        Ok(())
    }

    /// Pops a frame from the control stack.
    ///
    /// This function is used when exiting a block and leaves a block scope.
    /// Internally this will validate that blocks have the correct result type.
    fn pop_ctrl(&mut self) -> Result<Frame> {
        // Read the expected type and expected height of the operand stack the
        // end of the frame.
        let frame = self.control.last().unwrap();
        let ty = frame.block_type;
        let height = frame.height;
        let init_height = frame.init_height;

        // reset_locals in the spec
        self.local_inits.pop_ctrl(init_height);

        // Pop all the result types, in reverse order, from the operand stack.
        // These types will, possibly, be transferred to the next frame.
        for ty in self.results(ty)?.rev() {
            self.pop_operand(Some(ty))?;
        }

        // Make sure that the operand stack has returned to is original
        // height...
        if self.operands.len() != height {
            bail!(
                self.offset,
                "type mismatch: values remaining on stack at end of block"
            );
        }

        // And then we can remove it!
        Ok(self.control.pop().unwrap())
    }

Control-flow frame push/pop. push_ctrl grows the control stack unboundedly but is indirectly bounded by MAX_WASM_FUNCTION_SIZE (7.6 MB): each block instruction requires at least one opcode byte plus an end byte, so stack depth is bounded by roughly 3.8 M. No explicit per-function nesting limit is enforced, but unbounded stack growth cannot occur within the byte-size constraint.

`src/validator/operators.rs`, line 920-935

        }
        Ok(MaybeType::Known(actual))
    }

    /// Fetches the type for the local at `idx`, returning an error if it's out
    /// of bounds.
    fn local(&self, idx: u32) -> Result<ValType> {
        match self.locals.get(idx) {
            Some(ty) => Ok(ty),
            None => bail!(
                self.offset,
                "unknown local {}: local index out of bounds",
                idx
            ),
        }
    }

The validator implements WebAssembly type checking and structured-control-flow analysis. local (operators.rs) returns an error on out-of-bounds local indices; the operand stack and control stack are pushed and popped in lockstep, with pop_ctrl verifying the operand stack returns to the frame's recorded height and jump rejecting branch depths beyond the control stack. Type-size growth is bounded by combine_type_sizes against MAX_WASM_TYPE_SIZE. All index and count checks use checked arithmetic and return Result errors rather than panicking or indexing out of bounds. Justifies algorithm-impl-safe.
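A much-reduced model of the lockstep discipline is sketched below. All names are hypothetical and several features of the real validator are omitted by assumption (unreachable frames, block parameters, local initialisation). Unlike the excerpt's pop_ctrl, this version also pushes the block results onto the parent frame itself.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty { I32, I64 }

struct Frame { height: usize, results: Vec<Ty> }

#[derive(Default)]
struct Stacks { operands: Vec<Ty>, control: Vec<Frame> }

impl Stacks {
    /// Enters a block, snapshotting the current operand-stack height.
    fn push_ctrl(&mut self, results: Vec<Ty>) {
        let height = self.operands.len();
        self.control.push(Frame { height, results });
    }

    /// Pops one operand, refusing to dip below the current frame's height.
    fn pop_operand(&mut self, expected: Ty) -> Result<(), String> {
        let floor = self.control.last().map_or(0, |f| f.height);
        if self.operands.len() <= floor {
            return Err("type mismatch: operand stack underflow".into());
        }
        match self.operands.pop() {
            Some(t) if t == expected => Ok(()),
            _ => Err("type mismatch: unexpected operand type".into()),
        }
    }

    /// Leaves a block: results must be on top and nothing else may remain.
    fn pop_ctrl(&mut self) -> Result<(), String> {
        let frame = self.control.last().ok_or("control stack empty")?;
        let (height, results) = (frame.height, frame.results.clone());
        for ty in results.iter().rev() {
            self.pop_operand(*ty)?;
        }
        if self.operands.len() != height {
            return Err("type mismatch: values remaining on stack at end of block".into());
        }
        self.control.pop();
        self.operands.extend(results); // results become visible to the parent
        Ok(())
    }

    /// Resolves a relative branch depth, rejecting out-of-range targets.
    fn jump(&self, depth: u32) -> Result<&Frame, String> {
        let idx = self.control.len()
            .checked_sub(1)
            .and_then(|top| top.checked_sub(depth as usize))
            .ok_or("unknown label: branch depth too large")?;
        Ok(&self.control[idx])
    }
}
```

In this model every failure path returns an error: an out-of-range branch depth, an underflowing pop, and leftover operands at block end are all rejected without indexing past either stack.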
