Audit · shlex@1.3.0

cargo : shlex @ 1.3.0

Patrick Elsen · signed 2026-05-27 · published 2026-05-27

Claims

has-binaries, has-build-exec, has-fuzz-tests, has-install-exec, has-integration-tests, has-property-tests, has-unit-tests
impl-algorithm, impl-concurrency, impl-crypto, impl-datastructure, impl-interpreter, impl-jit, impl-parser, impl-protocol
is-benign
parser-impl-correct, parser-impl-safe, parser-impl-tested
unsafe-documented, unsafe-minimal, unsafe-safe, unsafe-tested
uses-concurrency, uses-crypto, uses-environment, uses-exec, uses-filesystem, uses-interpreter, uses-jit, uses-network, uses-unsafe

Summary

shlex 1.3.0 is the post-CVE remediation release of a small, no_std-compatible POSIX shell word splitter and quoter; its five from_utf8_unchecked calls are all justified by a documented byte-level UTF-8 invariant and exercised by upstream fuzz harnesses. No findings; safe to deploy for non-interactive shell quoting.

Report

Subject

shlex is a Rust library that splits strings into shell words and quotes strings for safe inclusion in shell commands, modelled on Python's shlex. It exposes both a &str-based API (Shlex, split, Quoter, try_quote, try_join) and a byte-string equivalent under shlex::bytes. It is no_std when the default std feature is disabled.

Methodology

The published crate contents were compared against the upstream Git repository at the v1.3.0 tag using diff -r. All source files (lib.rs and bytes.rs, about 930 lines in total including unit tests) were read in full, and the manifest was inspected. The source tree was grepped for unsafe, and every unsafe block was reviewed against the documented invariant ("given valid UTF-8, the byte API always returns valid UTF-8"). Particular attention was paid to the Quoter logic, because RUSTSEC-2024-0006 (CVE-2024-28854/28861) tracked the nul-byte and control-character shell-injection issues for which v1.3.0 is the remediation release.

Results

The diff between the published crate contents and the upstream Git repository shows only Cargo's standard manifest normalisation and the omission of the fuzz/ directory, which is not part of the published package. All source files match byte-for-byte.

The crate ships no binary artefacts (so has-binaries is false), has no build.rs, and is not declared as a proc-macro library (so has-build-exec is false). There are no install-time hooks (so has-install-exec is false). The crate performs no filesystem access, no network calls, no process spawning (despite the topic of shell words), no environment-variable reads, no cryptographic operations, no JIT compilation, no interpretation, and no concurrency, so uses-filesystem, uses-network, uses-exec, uses-environment, uses-crypto, uses-concurrency, uses-jit, and uses-interpreter are all false.

The crate does not implement cryptography, an interpreter, a JIT, a protocol, a data structure, a non-trivial algorithm, or any concurrency primitive, so impl-crypto, impl-interpreter, impl-jit, impl-protocol, impl-datastructure, impl-algorithm, and impl-concurrency are all false. It does implement both a shell-word parser and a shell quoter (justifying impl-parser). The parser handles POSIX single/double quoting and backslash escapes; unclosed quotes and trailing backslashes are reported via the had_error flag without panicking. The quoter explicitly enumerates the ASCII characters that are safe to emit unquoted, single-quoted, or double-quoted, treats every byte >= 0x80 as requiring quoting, and rejects nul bytes by default.

Unit tests cover the canonical POSIX inputs, multi-shell compatibility cases, and invalid UTF-8, justifying has-unit-tests; there are no separate integration or property tests, so has-integration-tests and has-property-tests are false. The upstream Git repository contains libfuzzer-sys harnesses that round-trip quoted output through real shells, Python, and wordexp, justifying has-fuzz-tests. Together, these justify parser-impl-safe, parser-impl-correct, and parser-impl-tested.
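To make the quoter's per-byte classification concrete, here is a minimal sketch, assuming three candidate strategies (unquoted, single-quoted, double-quoted). The type `Allowed`, the functions `classify` and `word_allowed`, and the exact character sets are hypothetical illustrations, not a copy of shlex's tables.

```rust
/// Which quoting strategies can represent a byte (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Allowed {
    unquoted: bool,
    single: bool,
    double: bool,
}

/// Hypothetical per-byte classifier; the character sets are assumptions.
fn classify(b: u8) -> Option<Allowed> {
    match b {
        // NUL cannot be carried in an argv string at all: reject.
        0 => None,
        // Conservative set of bytes a POSIX shell treats as inert.
        b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'-' | b'.' | b'/' | b',' | b'+' | b':' | b'@' | b'%' => {
            Some(Allowed { unquoted: true, single: true, double: true })
        }
        // A single quote cannot appear inside '...'.
        b'\'' => Some(Allowed { unquoted: false, single: false, double: true }),
        // Bytes still special inside "..." (plus `!`, which triggers history
        // expansion in interactive bash): this sketch only allows them in '...'.
        b'$' | b'`' | b'"' | b'\\' | b'!' => Some(Allowed { unquoted: false, single: true, double: false }),
        // Everything else, including every byte >= 0x80, needs quotes of either kind.
        _ => Some(Allowed { unquoted: false, single: true, double: true }),
    }
}

/// A strategy works for a word only if it works for every byte.
/// Returns None if any byte is NUL.
fn word_allowed(word: &[u8]) -> Option<Allowed> {
    let mut acc = Allowed { unquoted: true, single: true, double: true };
    for &b in word {
        let a = classify(b)?;
        acc.unquoted &= a.unquoted;
        acc.single &= a.single;
        acc.double &= a.double;
    }
    Some(acc)
}
```

When no single strategy covers a whole word (for example `it's $HOME`, which contains both `'` and `$`), a quoter has to split the word into differently quoted chunks; that is the job of the loop over quoting_strategy in bytes::Quoter::quote.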

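The crate uses unsafe only in the &str API surface, where five String::from_utf8_unchecked / core::str::from_utf8_unchecked calls convert the byte-level output back to UTF-8 strings without revalidating it (justifying uses-unsafe). Each unsafe block carries a SAFETY comment stating the invariant that, given valid UTF-8, the byte API returns valid UTF-8. For the quoter, this invariant is documented at bytes.rs lines 200-205: the quoter only inserts ASCII bytes around existing content and never modifies a multibyte UTF-8 character. The parser only copies input bytes in order or drops them, and every drop is either a single ASCII byte (a quote, backslash, or separator) or a whole comment ending at a newline. Because every byte of a multibyte UTF-8 sequence is >= 0x80, an ASCII byte always lies on a codepoint boundary, so neither component can split a character. This justifies unsafe-documented, unsafe-safe, and unsafe-minimal (the only alternative is revalidating every output). The upstream fuzz harnesses indirectly exercise these unsafe blocks, justifying unsafe-tested.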
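The soundness argument rests on a property of UTF-8 itself rather than on anything shlex-specific. This small self-contained check (the function names are hypothetical) verifies over every Unicode scalar value that no byte of a multibyte encoding is ASCII, and shows that stripping an ASCII byte from valid UTF-8 leaves valid UTF-8.

```rust
/// Checks, over every non-ASCII Unicode scalar value, that each byte of its
/// UTF-8 encoding is >= 0x80. If so, an ASCII byte can never occur inside a
/// multibyte character and therefore always sits on a codepoint boundary.
fn ascii_never_inside_multibyte() -> bool {
    (0x80u32..=0x10FFFF)
        .filter_map(char::from_u32) // skips the surrogate range D800..=DFFF
        .all(|c| {
            let mut buf = [0u8; 4];
            let encoded = c.encode_utf8(&mut buf);
            encoded.len() >= 2 && encoded.bytes().all(|b| b >= 0x80)
        })
}

/// Removes every occurrence of one ASCII byte and re-validates the result,
/// mimicking what the parser does when it drops quote characters.
fn strip_ascii_is_valid(s: &str, ascii: u8) -> bool {
    assert!(ascii.is_ascii());
    let stripped: Vec<u8> = s.bytes().filter(|&b| b != ascii).collect();
    std::str::from_utf8(&stripped).is_ok()
}
```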

No findings were raised. Prior versions were subject to a security advisory (RUSTSEC-2024-0006) concerning nul bytes and control characters in quote; v1.3.0 introduces QuoteError::Nul, deprecates the old quote/join in favour of try_quote/try_join, and documents the residual control-character risks in quoting_warning.md. The is-benign claim holds.
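Nul bytes are rejected rather than quoted because arguments reach execve as NUL-terminated C strings, so no quoting scheme can carry an embedded NUL; everything after it would be silently truncated. Rust's standard library enforces the same rule in `std::ffi::CString::new`. The `QuoteErr` enum and `reject_nul` function below are hypothetical stand-ins for the crate's QuoteError::Nul check, not its API.

```rust
use std::ffi::CString;

/// Hypothetical mirror of the crate's nul error (not shlex's actual type).
#[derive(Debug, PartialEq, Eq)]
enum QuoteErr {
    Nul,
}

/// Rejects any input containing a NUL byte before quoting is attempted.
fn reject_nul(word: &[u8]) -> Result<&[u8], QuoteErr> {
    if word.contains(&0) {
        Err(QuoteErr::Nul)
    } else {
        Ok(word)
    }
}

/// True if the OS-facing C-string conversion would accept `word`.
fn cstring_accepts(word: &str) -> bool {
    CString::new(word).is_ok()
}
```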

Conclusion

shlex 1.3.0 is the post-CVE remediation release of a small, well-scoped shell-quoting library. The byte/&str split is clean, the documented UTF-8 invariant holds across all five unsafe conversions, and the parser and quoter are thoroughly unit-tested and additionally fuzzed in the upstream repository. The crate is safe to deploy for non-interactive shell-quoting use cases; callers piping output to an interactive shell should still consult quoting_warning.md.

Findings

No findings.

Annotations (2)

`src/bytes.rs`

`src/bytes.rs`, lines 206-230

    pub fn quote<'a>(&self, mut in_bytes: &'a [u8]) -> Result<Cow<'a, [u8]>, QuoteError> {
        if in_bytes.is_empty() {
            // Empty string.  Special case that isn't meaningful as only part of a word.
            return Ok(b"''"[..].into());
        }
        if !self.allow_nul && in_bytes.iter().any(|&b| b == b'\0') {
            return Err(QuoteError::Nul);
        }
        let mut out: Vec<u8> = Vec::new();
        while !in_bytes.is_empty() {
            // Pick a quoting strategy for some prefix of the input.  Normally this will cover the
            // entire input, but in some case we might need to divide the input into multiple chunks
            // that are quoted differently.
            let (cur_len, strategy) = quoting_strategy(in_bytes);
            if cur_len == in_bytes.len() && strategy == QuotingStrategy::Unquoted && out.is_empty() {
                // Entire string can be represented unquoted.  Reuse the allocation.
                return Ok(in_bytes.into());
            }
            let (cur_chunk, rest) = in_bytes.split_at(cur_len);
            assert!(rest.len() < in_bytes.len()); // no infinite loop
            in_bytes = rest;
            append_quoted_chunk(&mut out, cur_chunk, strategy);
        }
        Ok(out.into())
    }

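quote operates on bytes and only inserts ASCII delimiters (single quotes, double quotes, backslashes) around existing bytes. Multibyte UTF-8 codepoints are preserved because only bytes < 0x80 receive special handling: bytes >= 0x80 are uniformly treated as requiring quoting and are otherwise copied verbatim, so a switch between quoting strategies (and therefore an inserted delimiter) is triggered only by an ASCII byte, which always lies on a codepoint boundary. Unit tests cover empty, ASCII, and invalid UTF-8 inputs.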
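The "insert ASCII around existing bytes" idea can be shown with a deliberately simpler quoter. This sketch swaps in the classic POSIX `'\''` idiom (close quote, escaped quote, reopen quote) instead of shlex's chunked strategy selection, and omits the nul check and the unquoted fast path; `quote_single` is a hypothetical name. It uses the checked `String::from_utf8`, which by the argument above never fails.

```rust
/// Hypothetical single-strategy quoter using the POSIX '\'' idiom;
/// not shlex's algorithm.
fn quote_single(input: &str) -> String {
    if input.is_empty() {
        // Same special case as bytes::Quoter::quote: an empty word becomes ''.
        return "''".to_string();
    }
    let mut out: Vec<u8> = Vec::with_capacity(input.len() + 2);
    out.push(b'\'');
    for &b in input.as_bytes() {
        if b == b'\'' {
            // Splice in ASCII bytes at an ASCII (hence boundary) position.
            out.extend_from_slice(b"'\\''");
        } else {
            // Bytes >= 0x80 (parts of multibyte chars) are copied unchanged.
            out.push(b);
        }
    }
    out.push(b'\'');
    // Only whole ASCII bytes were inserted, so this conversion cannot fail.
    String::from_utf8(out).expect("ASCII-only insertions preserve UTF-8")
}
```

For example, `it's` becomes `'it'\''s'`, while `héllo wörld` is simply wrapped as `'héllo wörld'`.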

`src/bytes.rs`, lines 60-156

    fn parse_word(&mut self, mut ch: u8) -> Option<Vec<u8>> {
        let mut result: Vec<u8> = Vec::new();
        loop {
            match ch as char {
                '"' => if let Err(()) = self.parse_double(&mut result) {
                    self.had_error = true;
                    return None;
                },
                '\'' => if let Err(()) = self.parse_single(&mut result) {
                    self.had_error = true;
                    return None;
                },
                '\\' => if let Some(ch2) = self.next_char() {
                    if ch2 != '\n' as u8 { result.push(ch2); }
                } else {
                    self.had_error = true;
                    return None;
                },
                ' ' | '\t' | '\n' => { break; },
                _ => { result.push(ch as u8); },
            }
            if let Some(ch2) = self.next_char() { ch = ch2; } else { break; }
        }
        Some(result)
    }

    fn parse_double(&mut self, result: &mut Vec<u8>) -> Result<(), ()> {
        loop {
            if let Some(ch2) = self.next_char() {
                match ch2 as char {
                    '\\' => {
                        if let Some(ch3) = self.next_char() {
                            match ch3 as char {
                                // \$ => $
                                '$' | '`' | '"' | '\\' => { result.push(ch3); },
                                // \<newline> => nothing
                                '\n' => {},
                                // \x => =x
                                _ => { result.push('\\' as u8); result.push(ch3); }
                            }
                        } else {
                            return Err(());
                        }
                    },
                    '"' => { return Ok(()); },
                    _ => { result.push(ch2); },
                }
            } else {
                return Err(());
            }
        }
    }

    fn parse_single(&mut self, result: &mut Vec<u8>) -> Result<(), ()> {
        loop {
            if let Some(ch2) = self.next_char() {
                match ch2 as char {
                    '\'' => { return Ok(()); },
                    _ => { result.push(ch2); },
                }
            } else {
                return Err(());
            }
        }
    }

    fn next_char(&mut self) -> Option<u8> {
        let res = self.in_iter.next().copied();
        if res == Some(b'\n') { self.line_no += 1; }
        res
    }
}

impl<'a> Iterator for Shlex<'a> {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(mut ch) = self.next_char() {
            // skip initial whitespace
            loop {
                match ch as char {
                    ' ' | '\t' | '\n' => {},
                    '#' => {
                        while let Some(ch2) = self.next_char() {
                            if ch2 as char == '\n' { break; }
                        }
                    },
                    _ => { break; }
                }
                if let Some(ch2) = self.next_char() { ch = ch2; } else { return None; }
            }
            self.parse_word(ch)
        } else { // no initial character
            None
        }
    }

}

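The Shlex iterator implements POSIX-shell word splitting with single quotes, double quotes, and backslash escapes; a # at the start of a word begins a comment that runs to the end of the line. Errors (an unclosed quote or a trailing backslash) set had_error and make next return None, ending iteration. The implementation is a straightforward byte-by-byte state machine driven by next_char, with no index arithmetic that could overflow or go out of bounds.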
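A reduced sketch of the same byte-level state machine may help when reasoning about it. It assumes only unquoted text, single quotes, and backslash escapes (double quotes and # comments are omitted, and edge cases such as a lone escaped newline may differ from the crate). The function `split_sketch` and its `(words, had_error)` return shape are hypothetical; like the crate, it reports errors through a flag instead of panicking.

```rust
/// Hypothetical reduced splitter: whitespace, '...' and \x only.
/// Returns the words completed before any error, plus an error flag.
fn split_sketch(input: &str) -> (Vec<String>, bool) {
    let mut bytes = input.bytes();
    let mut words: Vec<String> = Vec::new();
    let mut cur: Vec<u8> = Vec::new();
    let mut in_word = false;
    while let Some(b) = bytes.next() {
        match b {
            b' ' | b'\t' | b'\n' => {
                if in_word {
                    words.push(finish(&mut cur));
                    in_word = false;
                }
            }
            b'\'' => {
                in_word = true;
                // Copy bytes verbatim until the closing quote.
                loop {
                    match bytes.next() {
                        Some(b'\'') => break,
                        Some(c) => cur.push(c),
                        None => return (words, true), // unclosed quote
                    }
                }
            }
            b'\\' => match bytes.next() {
                Some(b'\n') => {} // line continuation: drop both bytes
                Some(c) => {
                    in_word = true;
                    cur.push(c);
                }
                None => return (words, true), // trailing backslash
            },
            _ => {
                in_word = true;
                cur.push(b);
            }
        }
    }
    if in_word {
        words.push(finish(&mut cur));
    }
    (words, false)
}

/// Converts a finished word; only ASCII bytes were dropped, so it is valid UTF-8.
fn finish(cur: &mut Vec<u8>) -> String {
    String::from_utf8(std::mem::take(cur)).expect("only ASCII bytes were dropped")
}
```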

`src/lib.rs`

`src/lib.rs`, lines 38-40

#![cfg_attr(not(feature = "std"), no_std)]

extern crate alloc;

The crate is no_std when the default std feature is disabled; it uses alloc for Vec, String, and Cow.

`src/lib.rs`, lines 67-72

    type Item = String;
    fn next(&mut self) -> Option<String> {
        self.0.next().map(|byte_word| {
            // Safety: given valid UTF-8, bytes::Shlex will always return valid UTF-8.
            unsafe { String::from_utf8_unchecked(byte_word) }
        })

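from_utf8_unchecked is sound here: Shlex::new(in_str: &str) hands in_str.as_bytes() to bytes::Shlex, so the input is valid UTF-8, and bytes::Shlex only copies input bytes in order or drops them. Every drop is either a single ASCII byte (a quote, a backslash, a separator, or an escaped newline) or a whole #-comment running to the next newline or the end of input. Because all bytes of a multibyte UTF-8 sequence are >= 0x80, these drops always fall on codepoint boundaries, so each output word is valid UTF-8. Justifies uses-unsafe, unsafe-safe, and unsafe-documented.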
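A common hardening pattern for invariant-backed conversions like this one is to keep the unchecked call in release builds but revalidate in debug builds (and therefore under most fuzzing setups). The helper below is a hypothetical illustration, not part of shlex.

```rust
/// Hypothetical helper: converts parser output to a String, revalidating
/// only when debug assertions are enabled.
///
/// # Safety
/// `bytes` must be valid UTF-8. This holds when every byte came from a `&str`
/// input and only whole ASCII bytes (or whole lines) were dropped.
unsafe fn word_to_string(bytes: Vec<u8>) -> String {
    debug_assert!(std::str::from_utf8(&bytes).is_ok(), "parser broke the UTF-8 invariant");
    // SAFETY: guaranteed by the caller, per the contract above.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```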

`src/lib.rs`, lines 156-174

    pub fn join<'a, I: IntoIterator<Item = &'a str>>(&self, words: I) -> Result<String, QuoteError> {
        // Safety: given valid UTF-8, bytes::join() will always return valid UTF-8.
        self.inner.join(words.into_iter().map(|s| s.as_bytes()))
            .map(|bytes| unsafe { String::from_utf8_unchecked(bytes) })
    }

    /// Given a single word, return a string suitable to encode it as a shell argument.
    pub fn quote<'a>(&self, in_str: &'a str) -> Result<Cow<'a, str>, QuoteError> {
        Ok(match self.inner.quote(in_str.as_bytes())? {
            Cow::Borrowed(out) => {
                // Safety: given valid UTF-8, bytes::quote() will always return valid UTF-8.
                unsafe { core::str::from_utf8_unchecked(out) }.into()
            }
            Cow::Owned(out) => {
                // Safety: given valid UTF-8, bytes::quote() will always return valid UTF-8.
                unsafe { String::from_utf8_unchecked(out) }.into()
            }
        })
    }

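Quoter::join and Quoter::quote rely on the invariant documented for bytes::Quoter::quote (lines 200-205): the quoter only inserts ASCII characters around existing content and never modifies a multibyte UTF-8 character. bytes::Quoter::join additionally separates the quoted words with single ASCII spaces. Both outputs are therefore valid UTF-8 whenever the input is, so the from_utf8_unchecked conversions are sound.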
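For comparison, a fully checked version of the Cow conversion in Quoter::quote could look like the sketch below (`cow_bytes_to_str` is a hypothetical name). It keeps borrowed input borrowed and reuses owned buffers without copying, but pays one extra linear UTF-8 scan per call, which is exactly the revalidation the crate avoids by relying on the invariant.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical checked counterpart of the unsafe conversion in Quoter::quote.
fn cow_bytes_to_str(bytes: Cow<'_, [u8]>) -> Result<Cow<'_, str>, Utf8Error> {
    Ok(match bytes {
        // Borrowed stays borrowed: validation only, no allocation.
        Cow::Borrowed(b) => Cow::Borrowed(std::str::from_utf8(b)?),
        // Owned buffer is moved into the String without copying.
        Cow::Owned(v) => Cow::Owned(String::from_utf8(v).map_err(|e| e.utf8_error())?),
    })
}
```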