Hacker News

GuB-42 2 years ago

It is "better" in the sense that it uses Raku regexes, which are incredibly powerful, and if that's not enough for you, you can even write code. "rak" gives you a general-purpose programming language (Raku) that is optimized for text processing, with convenient shortcuts for the most common tasks.

In contrast, "rg" is only designed to search text files, but it does that really, really fast.

It makes sense to have both.

tialaramex 2 years ago

It definitely makes sense to have this tool if you're comfortable writing Raku. I'm not sure it makes a whole lot of sense for most people to learn Raku to use this tool, and if you're not going to learn Raku I don't think there's much value add over tools like ripgrep.

gugod 2 years ago

Both learning and the abilities of tools are gradual, though. For naive substring search all grep-alike tools work perfectly, and anyone who is willing to spend some effort learning a bit of each tool's specific features can gain a lot more.

rak is backed by the entirety of the Raku language and therefore makes it much easier to craft something that's less trivial to express in a one-liner. For most grep-alike tools, the regex engine is the ceiling on what can be done, but for rak, the ceiling is as high as the Raku programming language itself. That's rak's niche.
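To make the "ceiling" point concrete, here is a minimal Python sketch of a grep-like loop whose matcher can be either a regex or an arbitrary function. It is not rak's actual interface; `mini_grep` and `over_20` are hypothetical names. The second call selects lines by numeric value, which is awkward to express as a regex but trivial as code.

```python
import re
from typing import Callable, Iterable, Iterator, Union

# A matcher is either a regex pattern or an arbitrary predicate over one line.
Matcher = Union[str, Callable[[str], object]]


def mini_grep(lines: Iterable[str], matcher: Matcher) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) for each line the matcher accepts.

    Hypothetical helper for illustration; not rak's (or ripgrep's) interface.
    """
    # A string is compiled as a regex; anything else is called as code.
    accept = re.compile(matcher).search if isinstance(matcher, str) else matcher
    for number, line in enumerate(lines, start=1):
        if accept(line):
            yield number, line


def over_20(line: str) -> bool:
    """Accept lines containing a whole-number token greater than 20."""
    return any(int(tok) > 20 for tok in line.split() if tok.isdecimal())


text = ["alpha 10", "beta 25", "gamma 7", "delta 42"]

# Regex-only search: the regex engine is the ceiling.
print(list(mini_grep(text, r"\d{2}")))  # [(1, 'alpha 10'), (2, 'beta 25'), (4, 'delta 42')]

# Code as the matcher: a numeric comparison instead of a pattern.
print(list(mini_grep(text, over_20)))   # [(2, 'beta 25'), (4, 'delta 42')]
```

The design choice is simply that the "pattern" slot accepts a callable, so anything the host language can compute becomes a valid search criterion.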

librasteve 2 years ago

In a strange way I both agree and disagree with this:

+ I use Raku a lot, so it is very natural to embed Raku code in a rak CLI.
- I think that there are several benefits of the rak tool that vanilla grep (or ripgrep) does not offer (e.g. embedding code snippets in the expression, Unicode, better regex syntax) that would probably make it worthwhile to learn the rak examples without having to bother with the wider Raku language.

burntsushi 2 years ago

What Unicode support does Rak (or Raku) have that ripgrep does not?

lizmat 2 years ago

- Support for NFG (Normalization Form Grapheme), which means, e.g., that you only need to specify 'é' if you want to look for an 'é', without having to worry about whether the text you're searching consists of the single codepoint 'é' or its decomposed version ('e' followed by a combining acute accent).

- Support for --ignoremark, which means that 'e' will match any accented 'e', such as éëêèęėē.
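Both features can be roughly approximated in Python with the standard `unicodedata` module. This is a sketch under stated assumptions, not how rak or Raku implement them: NFC normalization stands in for NFG (it only helps when a precomposed codepoint exists, whereas NFG treats every grapheme as a single unit), and stripping combining marks after NFD decomposition stands in for `--ignoremark`. The helper names are hypothetical.

```python
import unicodedata


def nfg_contains(haystack: str, needle: str) -> bool:
    """Substring search that treats precomposed and decomposed forms as equal.

    Approximation of NFG: normalize both sides to NFC before comparing.
    """
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks (general categories Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))


def ignoremark_contains(haystack: str, needle: str) -> bool:
    """Mark-insensitive substring search, in the spirit of rak's --ignoremark."""
    return strip_marks(needle) in strip_marks(haystack)


decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print("\u00e9" in decomposed)              # False: a plain codepoint search misses it
print(nfg_contains(decomposed, "\u00e9"))  # True
print(all(ignoremark_contains(ch, "e") for ch in "éëêèęėē"))  # True
print(strip_marks("éëêèęėē"))              # eeeeeee
```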

burntsushi 2 years ago

Ah yeah, that's a good one! It very likely has enormous implications for performance, though. I wonder if I should add it as an opt-in feature to ripgrep, although its support would be inherently limited in some capacity, since character classes will always be limited to matching a single codepoint (i.e., no UTS#18 Level 2 support).
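The single-codepoint limitation can be illustrated with Python's `re`, which, like the engine described here, has character classes and `.` consume exactly one codepoint. The "normalize the haystack first" step is only an assumed opt-in strategy for illustration, not an existing ripgrep feature; it helps when a precomposed codepoint exists, but not for graphemes such as x + acute, which have no single-codepoint form.

```python
import re
import unicodedata

precomposed = "\u00e9"      # 'é' as one codepoint
decomposed = "e\u0301"      # 'e' + COMBINING ACUTE ACCENT: one grapheme, two codepoints
no_precomposed = "x\u0301"  # 'x' + acute: no single-codepoint form exists

char_class = re.compile("[\u00e9]")

print(bool(char_class.fullmatch(precomposed)))  # True
print(bool(char_class.fullmatch(decomposed)))   # False: the class consumes one codepoint only

# Assumed opt-in workaround: normalize the haystack to NFC before matching.
# Match offsets would then refer to the normalized text, not the original.
print(bool(char_class.fullmatch(unicodedata.normalize("NFC", decomposed))))  # True

# Normalization cannot help when no precomposed codepoint exists.
print(re.fullmatch(".", unicodedata.normalize("NFC", no_precomposed)) is None)  # True
```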

Have you found folks using these particular Unicode features in practice? I don't think anyone has requested it for ripgrep.

lizmat 2 years ago

> It does very likely have enormous implications for performance though.

Well, but that's only one of the reasons why rak is a lot slower. There's something else going on, but I currently don't have the mindset to investigate this deeply.