Hacker News

Yes, it's inconsistent with Perl. But there are many things in ripgrep's default regex engine that are inconsistent with Perl, including the fact that all patterns are guaranteed to finish a search in linear time with respect to the haystack. (So no look-around or back-references are supported.) It is a non-goal of ripgrep to be consistent with Perl. Thankfully, if you want that, then you can get pretty close by passing the -P/--pcre2 flag.

With that said, I do like Perl's philosophy here. And it was my philosophy too up until recently. I decided to make an exception for \< and \> given their prevalence.

It was also only relatively recently that I made it possible for superfluous escapes to exist. Prior to ripgrep 14, unrecognized escapes were forbidden:

    $ echo '@' | rg-13.0.0 '\@'
    regex parse error:
        \@
        ^^
    error: unrecognized escape sequence
    $ echo '@' | rg '\@'
    @

I had done it this way to make it possible to add new escape sequences in a semver compatible release. But in reality, if I were to ever add new escape sequences, it use one of the ascii alpha-numeric characters, as Perl does. So I decided it was okay to forever and always give up the ability to make, e.g., `\@` mean something other than just matching a literal `@`.

`\<` and `\>` are forever and always the lone exceptions to this. It is perhaps a trap for beginners, but there are many traps in regexes, and this seemed worth it.

Note that `\b{start}` and `\b{end}` also exist and are aliases for `\<` and `\>`. The more niche `\b{start-half}` and `\b{end-half}` also exist, and those are what are used to implement the -w/--word-regexp flag. (Their semantics match GNU grep's -w/--word-regexp.) For example, `\b-2\b` will not match in `foo -2 bar` since `-` is not a word character and `\b` demands `\w` on one side and `\W` on the other. However, `rg -w -e -2` will match `-2` in `foo -2 bar`:

    $ echo 'foo -2 bar' | rg -w -e '\b-2\b'
    $ echo 'foo -2 bar' | rg -w -e -2
    foo -2 bar

xoranth 2 years ago [ - ]

> Also, \< and \> are in ripgrep 14

Isn't that inconsistent with the way Perl's regex syntax was designed? In Perl's syntax an escaped non-ASCII character is always a literal [^1], and that is guaranteed not to change.

That's nice for beginners because it saves you from having to memorize all the metacharacters. If you are in doubt you on whether something has a special meaning, you just escape it.

[^1]: https://perldoc.perl.org/perlrebackslash#The-backslash

burntsushi 2 years ago [ - ]

Ok, makes sense. And thanks for the detailed explaination about word boundaries and the hint about the --pcre flag (I hadn't realized it existed).