Hacker News

lizmat 2 years ago

- Support for NFG (Normalization Form Grapheme), which means, for example, that you only need to specify 'é' to look for an 'é', without having to worry about whether the text you're searching contains the single codepoint 'é' or its decomposed form ('e' followed by a combining acute accent).

- Support for --ignoremark, which means that 'e' will match any accented 'e', such as éëêèęėē.
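To make the two points above concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It is not how rak or Raku implement this: NFC normalization is used as a rough stand-in for NFG (which goes further and treats every grapheme cluster as a single unit), and `strip_marks` is a hypothetical helper that only approximates what --ignoremark does.

```python
import unicodedata


def nfc(text: str) -> str:
    """Compose base characters and combining marks where possible (NFC)."""
    return unicodedata.normalize("NFC", text)


def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


composed = "caf\u00e9"      # 'é' as the single codepoint U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The raw strings differ, so a naive search for one form misses the other.
print(composed == decomposed)            # False

# After normalizing both sides, the two spellings compare equal.
print(nfc(composed) == nfc(decomposed))  # True

# Stripping marks lets a plain 'e' stand in for any accented 'e'.
print(strip_marks("éëêèęėē"))            # eeeeeee
```

Normalizing both the needle and the haystack is what costs time on large inputs, which is where the performance concern comes from.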

burntsushi 2 years ago

Ah yeah, that's a good one! It very likely has enormous implications for performance though. I wonder if I should add it as an opt-in feature to ripgrep. Its support would be inherently limited in some capacity, though, since character classes will always be limited to matching a single codepoint (i.e., no UTS#18 Level 2 support).
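The single-codepoint limitation of character classes can be illustrated with a small sketch. It uses Python's `re` module purely as an example of a codepoint-based engine; it is not ripgrep's regex engine, and the pattern and sample strings are made up for illustration.

```python
import re
import unicodedata

# A character class holding two precomposed codepoints: U+00E9 and U+00E8.
pattern = re.compile("caf[\u00e9\u00e8]$")

composed = "caf\u00e9"      # 'é' as one codepoint
decomposed = "cafe\u0301"   # 'e' + combining acute accent (two codepoints)

# The class matches exactly one codepoint, so only the composed form matches.
match_composed = pattern.search(composed)
match_decomposed = pattern.search(decomposed)

# Normalizing the haystack to NFC first makes the decomposed text match too.
match_normalized = pattern.search(unicodedata.normalize("NFC", decomposed))

print(match_composed is not None)    # True
print(match_decomposed is not None)  # False
print(match_normalized is not None)  # True
```

Normalizing the input works here only because 'é' has a precomposed form; a grapheme with no single-codepoint equivalent would still not fit in a class like this.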

Have you found folks using these particular Unicode features in practice? I don't think anyone has requested them for ripgrep.

lizmat 2 years ago

> It does very likely have enormous implications for performance though.

Well, but that's only one of the reasons why rak is a lot slower. There's something else going on, but I currently don't have the mindset to investigate this deeply.