cargo : regex-syntax @ 0.8.10
PE Patrick Elsen signed 2026-05-28 published 2026-05-28

regex-syntax
============
This crate provides a robust regular expression parser.

[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
[![Crates.io](https://img.shields.io/crates/v/regex-syntax.svg)](https://crates.io/crates/regex-syntax)

### Documentation

https://docs.rs/regex-syntax

### Overview

There are two primary types exported by this crate: `Ast` and `Hir`. The former
is a faithful abstract syntax of a regular expression, and can convert regular
expressions back to their concrete syntax while mostly preserving their
original form. The latter type is a high-level intermediate representation of a
regular expression that is amenable to analysis and compilation into byte codes
or automata. An `Hir` achieves this by drastically simplifying the syntactic
structure of the regular expression. While an `Hir` can be converted back to
its equivalent concrete syntax, the result is unlikely to resemble the original
concrete syntax that produced the `Hir`.

### Example

This example shows how to parse a pattern string into its HIR:

```rust
use regex_syntax::{hir::Hir, parse};

fn main() {
    // Parse the pattern straight to its high-level representation.
    let hir = parse("a|b").unwrap();
    // The result equals an alternation built by hand from two literals.
    assert_eq!(hir, Hir::alternation(vec![
        Hir::literal("a".as_bytes()),
        Hir::literal("b".as_bytes()),
    ]));
}
```

### Safety

This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's
possible this crate could use `unsafe` code in the future, the standard
for doing so is extremely high. In general, most code in this crate is not
performance critical, since it tends to be dwarfed by the time it takes to
compile a regular expression into an automaton. There is therefore little need
for extreme optimization and, in turn, little need for `unsafe`.

The standard for using `unsafe` in this crate is extremely high because this
crate is intended to be reasonably safe to use with user-supplied regular
expressions. Therefore, while there may be bugs in the regex parser itself,
they should _never_ result in memory unsafety unless there is a bug in either
the compiler or the standard library. (These are the only two possible sources
because `regex-syntax` has zero dependencies.)

### Crate features

By default, this crate bundles a fairly large amount of Unicode data tables
(a source size of ~750KB). Because of their large size, one can disable some
or all of these data tables. If a regular expression attempts to use Unicode
data that is not available, then an error will occur when translating the `Ast`
to the `Hir`.

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features).

### Testing

Simply running `cargo test` will give you very good coverage. However, because
of the large number of features exposed by this crate, a `test` script is
included in this directory which will test several feature combinations. This
is the same script that is run in CI.

### Motivation

The primary purpose of this crate is to provide the parser used by `regex`.
Specifically, this crate is treated as an implementation detail of `regex`,
and is primarily developed for the needs of `regex`.

Since this crate is an implementation detail of `regex`, it may experience
breaking change releases at a different cadence from `regex`. This is only
possible because this crate is _not_ a public dependency of `regex`.

Another consequence of this decoupling is that there is no direct way to
compile a `regex::Regex` from a `regex_syntax::hir::Hir`. Instead, one must
first convert the `Hir` to a string (via its `std::fmt::Display`) and then
compile that via `Regex::new`. While this does repeat some work, compilation
typically takes much longer than parsing.

Stated differently, the coupling between `regex` and `regex-syntax` exists only
at the level of the concrete syntax.