| Category | Feature | Example | ES2018 | ES2024+ | Subfeatures & JS differences |
|---|---|---|---|---|---|
| Characters | Literal | `E`, `!` | ✅ | ✅ | ✔ Code point based matching (same as JS with flag `u`, `v`) ✔ Standalone `]`, `{`, `}` don't require escaping |
| | Identity escape | `\E`, `\!` | ✅ | ✅ | ✔ Different set than JS ✔ Allows multibyte chars |
| | Escaped metachar | `\\`, `\.` | ✅ | ✅ | ✔ Same as JS |
| | Control code escape | `\t` | ✅ | ✅ | ✔ The JS set plus `\a`, `\e` |
| | `\xNN` | `\x7F` | ✅ | ✅ | ✔ Allows 1 hex digit ✔ Above `7F`, is UTF-8 encoded byte (≠ JS) ✔ Error for invalid encoded bytes |
| | `\uNNNN` | `\uFFFF` | ✅ | ✅ | ✔ Same as JS with flag `u`, `v` |
| | 🆚 `\x{…}` | `\x{A}` | ✅ | ✅ | ✔ Allows leading 0s up to 8 total hex digits |
| | Escaped num | `\20` | ✅ | ✅ | ✔ Can be backref, error, null, octal, identity escape, or any of these combined with literal digits, based on complex rules that differ from JS ✔ Always handles escaped single digit 1-9 outside char class as backref ✔ Allows null with 1-3 0s ✔ Error for octal ≥ `200` |
| | Caret notation | `\cA`, 🆚 `\C-A` | ✅ | ✅ | ✔ With A-Za-z (JS: only `\c` form) |
| Character sets | Digit | `\d`, `\D` | ✅ | ✅ | ✔ Unicode by default (≠ JS) |
| | Word | `\w`, `\W` | ✅ | ✅ | ✔ Unicode by default (≠ JS) |
| | Whitespace | `\s`, `\S` | ✅ | ✅ | ✔ Unicode by default ✔ No JS adjustments to Unicode set (− `\uFEFF`, + `\x85`) |
| | 🆕 Hex digit | `\h`, `\H` | ✅ | ✅ | ✔ ASCII |
| | Dot | `.` | ✅ | ✅ | ✔ Excludes only `\n` (≠ JS) |
| | 🆕 Any | `\O` | ✅ | ✅ | ✔ Any char (with any flags) ✔ Identity escape in char class |
| | 🆕 Not `\n` | `\N` | ✅ | ✅ | ✔ Identity escape in char class |
| | 🆕 Newline | `\R` | ✅ | ✅ | ✔ Matched atomically ✔ Identity escape in char class |
| | 🆕 Grapheme | `\X` | ☑️ | ☑️ | ● Uses a close approximation ✔ Matched atomically ✔ Identity escape in char class |
| | Unicode property | `\p{L}`, `\P{L}` | ✅ | ✅ | ✔ Binary properties ✔ Categories ✔ Scripts ✔ Aliases ✔ POSIX properties ✔ Invert with `\p{^…}`, `\P{^…}` ✔ Insignificant spaces, hyphens, underscores, and casing in names ✔ `\p`, `\P` without `{` is an identity escape ✔ Error for key prefixes ✔ Error for props of strings ❌ Blocks (wontfix[1]) |
| Character classes | Base | `[…]`, `[^…]` | ✅ | ✅ | ✔ Unescaped `-` outside of range is literal in some contexts (different than JS rules in any mode) ✔ Leading unescaped `]` is literal ✔ Fewer chars require escaping than JS |
| | Range | `[a-z]` | ✅ | ✅ | ✔ Same as JS with flag `u`, `v` ✔ Allows `\x{…}` above `10FFFF` at end of range to mean last valid code point |
| | 🆕 POSIX class | `[[:word:]]`, `[[:^word:]]` | ☑️[2] | ✅ | ✔ All use Unicode definitions |
| | Nested class | `[…[…]]` | ☑️[3] | ✅ | ✔ Same as JS with flag `v` |
| | Intersection | `[…&&…]` | ❌ | ✅ | ✔ Doesn't require nested classes for intersection of union and ranges ✔ Allows empty segments |
| Assertions | Line start, end | `^`, `$` | ✅ | ✅ | ✔ Always "multiline" ✔ Only `\n` as newline ✔ `^` doesn't match after string-terminating `\n` |
| | 🆕 String start, end | `\A`, `\z` | ✅ | ✅ | ✔ Same as JS `^` `$` without JS flag `m` |
| | 🆕 String end or before terminating newline | `\Z` | ✅ | ✅ | ✔ Only `\n` as newline |
| | 🆕 Search start | `\G` | ✅ | ✅ | ✔ Matches at start of match attempt (not end of prev match; advances after 0-length match) |
| | Word boundary | `\b`, `\B` | ✅ | ✅ | ✔ Unicode based (≠ JS) |
| | Lookaround | `(?=…)`, `(?!…)`, `(?<=…)`, `(?<!…)` | ✅ | ✅ | ✔ Allows variable-length quantifiers and alternation within lookbehind ✔ Lookahead invalid within lookbehind ✔ Capturing groups invalid within negative lookbehind ✔ Negative lookbehind invalid within positive lookbehind |
| Quantifiers | Greedy, lazy | `*`, `+?`, `{2,}`, etc. | ✅ | ✅ | ✔ Includes all JS forms ✔ Adds `{,n}` for min 0 ✔ Explicit bounds have upper limit of 100,000 (unlimited in JS) ✔ Error with assertions (same as JS with flag `u`, `v`) and directives |
| | 🆕 Possessive | `?+`, `*+`, `++`, `{3,2}` | ✅ | ✅ | ✔ `+` suffix doesn't make `{…}` quantifiers possessive (creates a quantifier chain) ✔ Reversed `{…}` ranges are possessive |
| | 🆕 Chained | `**`, `??+*`, `{2,3}+`, etc. | ✅ | ✅ | ✔ Further repeats the preceding repetition |
| Groups | Noncapturing | `(?:…)` | ✅ | ✅ | ✔ Same as JS |
| | 🆕 Atomic | `(?>…)` | ✅ | ✅ | ✔ Supported |
| | Capturing | `(…)` | ✅ | ✅ | ✔ Is noncapturing if named capture present |
| | Named capturing | `(?<a>…)`, 🆚 `(?'a'…)` | ✅ | ✅ | ✔ Duplicate names allowed (including within the same alternation path) unless directly referenced by a subroutine ✔ Error for names invalid in Oniguruma (more permissive than JS) |
| Backreferences | Numbered | `\1` | ✅ | ✅ | ✔ Error if named capture used ✔ Refs the most recent of a capture/subroutine set |
| | 🆕 Enclosed numbered, relative | `\k<1>`, `\k'1'`, `\k<-1>`, `\k'-1'` | ✅ | ✅ | ✔ Error if named capture used ✔ Allows leading 0s ✔ Refs the most recent of a capture/subroutine set ✔ `\k` without `<` or `'` is an identity escape |
| | Named | `\k<a>`, 🆚 `\k'a'` | ✅ | ✅ | ✔ For duplicate group names, rematch any of their matches (multiplex), atomically ✔ Refs the most recent of a capture/subroutine set (no multiplex) ✔ Combination of multiplex and most recent of capture/subroutine set if duplicate name is indirectly created by a subroutine ✔ Error for backref to valid group name that includes `-`/`+` |
| | To nonparticipating groups | | ☑️ | ☑️ | ✔ Error if group to the right[4] ✔ Duplicate names (and subroutines) to the right not included in multiplex ✔ Fail to match (or don't include in multiplex) ancestor groups and groups in preceding alternation paths ❌ Some rare cases are indeterminable at compile time and use the JS behavior of matching an empty string |
| Subroutines | 🆕 Numbered, relative | `\g<1>`, `\g'1'`, `\g<-1>`, `\g'-1'`, `\g<+1>`, `\g'+1'` | ✅ | ✅ | ✔ Error if named capture used ✔ Allows leading 0s **All subroutines (incl. named):** ✔ Allowed before reffed group ✔ Can be nested (any depth) ✔ Reuses flags from the reffed group (ignores local flags) ✔ Replaces most recent captured values (for backrefs) ✔ `\g` without `<` or `'` is an identity escape |
| | 🆕 Named | `\g<a>`, `\g'a'` | ✅ | ✅ | ● Same behavior as numbered ✔ Error if reffed group uses duplicate name |
| Recursion | 🆕 Full pattern | `\g<0>`, `\g'0'` | ☑️[5] | ☑️[5] | ✔ 20-level depth limit |
| | 🆕 Numbered, relative, named | `(…\g<1>?…)`, `(…\g<-1>?…)`, `(?<a>…\g<a>?…)`, etc. | ☑️[5] | ☑️[5] | ✔ 20-level depth limit |
| Other | Alternation | `…\|…` | ✅ | ✅ | ✔ Same as JS |
| | 🆕 Absence repeater[6] | `(?~…)` | ✅ | ✅ | ✔ Supported |
| | 🆕 Comment group | `(?#…)` | ✅ | ✅ | ✔ Allows escaping `\)`, `\\` ✔ Comments allowed between a token and its quantifier ✔ Comments between a quantifier and the `?`/`+` that makes it lazy/possessive change it to a quantifier chain |
| | 🆕 Fail[7] | `(*FAIL)` | ✅ | ✅ | ✔ Supported |
| | 🆕 Keep | `\K` | ☑️ | ☑️ | ● Supported at top level if no top-level alternation is used |
| | JS features unknown to Oniguruma are handled using Oniguruma syntax rules | | ✅ | ✅ | ✔ `\u{…}` is an error ✔ `[]`, `[^]` are errors ✔ `[\q{…}]` matches `q`, etc. ✔ `[a--b]` includes the invalid reversed range `a` to `-` |
| | Invalid Oniguruma syntax | | ✅ | ✅ | ✔ Error |
| Flags | Supported in top-level flags and flag modifiers | | | | |
| | Ignore case | `i` | ✅ | ✅ | ✔ Unicode case folding (same as JS with flag `u`, `v`)[8] |
| | 🆚 Dot all | `m` | ✅ | ✅ | ✔ Equivalent to JS flag `s` |
| | 🆕 Extended | `x` | ✅ | ✅ | ✔ Unicode whitespace ignored ✔ Line comments with `#` ✔ Whitespace/comments allowed between a token and its quantifier ✔ Whitespace/comments between a quantifier and the `?`/`+` that makes it lazy/possessive change it to a quantifier chain ✔ Whitespace/comments separate tokens (ex: `\1 0`) ✔ Whitespace and `#` not ignored in char classes |
| | Currently supported only in top-level flags | | | | |
| | 🆕 Digit is ASCII | `D` | ✅ | ✅ | ✔ ASCII `\d`, `\p{Digit}`, etc. |
| | 🆕 Space is ASCII | `S` | ✅ | ✅ | ✔ ASCII `\s`, `\p{Space}`, etc. |
| | 🆕 Word is ASCII[9] | `W` | ✅ | ✅ | ✔ ASCII `\w`, `\p{Word}`, `\b`, etc. |
| | 🆕 Text segment mode is grapheme | `y{g}` | ✅ | ✅ | ✔ Grapheme based `\X`, `\y` |
| Flag modifiers | Group | `(?im-x:…)` | ✅ | ✅ | ✔ Unicode case folding for `i` ✔ Allows enabling and disabling the same flag (priority: disable) ✔ Allows lone or multiple `-` |
| | 🆕 Directive | `(?im-x)` | ✅ | ✅ | ✔ Continues until end of pattern or group (spanning alternatives) |
| Compile-time options | `ONIG_OPTION_CAPTURE_GROUP` | | ✅ | ✅ | ✔ Unnamed captures and numbered calls allowed when using named capture |
| | `ONIG_OPTION_SINGLELINE` | | ✅ | ✅ | ✔ `^` → `\A` ✔ `$` → `\Z` |
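
Several of the "≠ JS" entries in the table can be observed with plain native JS regexes. The sketch below uses no transpiler and no library API. Each `oniLike…` pattern is a hand-written approximation, assumed here purely for illustration, of the Oniguruma behavior the table describes. The atomic-group emulation uses the common lookahead-plus-backreference idiom and is not claimed to be how any particular tool implements it.

```javascript
// Hand-written native-JS approximations of a few Oniguruma behaviors from the table.
// These patterns are illustrative assumptions, not the output of any transpiler.

const checks = {
  // \d: ASCII-only in JS; Unicode by default in Oniguruma (approximated with \p{Nd}).
  jsDigit: /\d/.test("\u0663"), // U+0663 ARABIC-INDIC DIGIT THREE
  oniLikeDigit: /\p{Nd}/u.test("\u0663"),

  // Dot: JS also excludes \r, \u2028, \u2029; Oniguruma excludes only \n.
  jsDot: /a.b/.test("a\rb"),
  oniLikeDot: /a[^\n]b/.test("a\rb"),

  // $: JS flag m treats \r as a line break; Oniguruma treats only \n as a newline.
  jsLineEnd: /a$/m.test("a\rb"),
  oniLikeLineEnd: /a(?=\n|$)/.test("a\rb"),

  // \Z: end of string, or just before a string-terminating \n.
  jsStringEnd: /foo$/.test("foo\n"),
  oniLikeBigZ: /foo(?=\n?$)/.test("foo\n"),

  // (?>…): atomic group, emulated with a lookahead that captures, then a backreference.
  // JS does not backtrack into a lookahead once it has succeeded.
  jsPlainGroup: /a(?:bc|b)c/.test("abc"),
  oniLikeAtomic: /a(?=(bc|b))\1c/.test("abc"),
};

console.log(checks);
```

Under these assumptions, every `js…` result differs from its `oniLike…` counterpart, which is the kind of gap the "Subfeatures & JS differences" column records.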