The language here is Mojo, which the article seems to assume you know; it doesn't give you enough to deduce that until halfway through, after multiple code examples. I don't know how you're supposed to know this, as even the blog it's on is mostly about Vale. From the intro I was expecting it to be about C++.
The author works for Modular. He shared the write-up on the Mojo Discord. I think Mojo users were the intended audience.
> Eliminate redundant matrix operations (like two transposes next to each other)
In 2016, I was trying to construct orthogonal irreducible matrix representations of various groups (“irreps”). The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.
I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n for each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile that a single time, after which all successive calls for a specific group and irrep dimension were extremely fast. Was really quite cool. The only downside was that you couldn’t generate very high dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.
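The commenter's Julia code isn't shown, but the general idea (stamp out specialized, unrolled construction code once per value of n, then reuse it on every later call) has a rough C++ template analogue. This is a hypothetical sketch with placeholder math, not the actual irrep construction:

#include <array>
#include <cstddef>
#include <utility>

// Placeholder for "compute element (i, j) of the N-dimensional irrep";
// the real recurrence from the papers would go here.
template <std::size_t N>
constexpr double element(std::size_t i, std::size_t j) {
    return i == j ? 1.0 : 0.0;
}

// Expands to one flat initializer of N*N calls, so each distinct N gets its
// own fully unrolled construction, compiled exactly once.
template <std::size_t N, std::size_t... I>
constexpr std::array<double, N * N> build_impl(std::index_sequence<I...>) {
    return { element<N>(I / N, I % N)... };
}

template <std::size_t N>
constexpr std::array<double, N * N> build_irrep() {
    return build_impl<N>(std::make_index_sequence<N * N>{});
}

// The first use of a given N instantiates it; later uses just reuse the result.
constexpr auto rep3 = build_irrep<3>();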
> Mojo, D, Nim, and Zig can do it, and C++ as of C++20. There are likely some other languages that can do it, but these are the only ones that can truly run normal run-time code at compile time
Pretty sure Julia can do it.
https://bur.gy/2022/05/27/what-makes-julia-delightful.html confirms your hunch
For several years now I have been working on a dedicated compiler for regular expressions. You basically pass in a regular expression and get back an object file containing optimized matching code. It uses the LLVM library internally to perform optimizations and machine code generation. It should generally be faster to compile than solutions involving constexpr-based metaprogramming.
I am surprised that there is no programming language doing something similar - having regular expressions compiled to native code instead of just using a runtime library like PCRE2. Implementing this in C++ or Rust should be relatively easy.
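For concreteness, "compiled to native code" means the pattern itself disappears and only the matching logic remains. For a pattern like [0-9]+, the emitted code would look something like this (an illustrative sketch, not the output of the commenter's compiler):

#include <cstddef>
#include <string_view>

// What a regex compiler might emit for "[0-9]+": no pattern string and no
// interpreter loop over opcodes, just straight-line matching logic.
bool match_digits(std::string_view s) {
    std::size_t i = 0;
    if (i >= s.size() || s[i] < '0' || s[i] > '9') return false;  // need at least one digit
    ++i;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9') ++i;       // greedy repeat
    return i == s.size();                                          // anchored full match
}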
.NET has had this since 2.0, if not 1.0.
To tie this specific example to a larger framework: in Scala land, Tiark Rompf's Lightweight Modular Staging system handled this class of metaprogramming elegantly, and the 'modular' part included support for multiple compilation targets. The idea was that one could incrementally define/extend DSLs that produce an IR, optimizations on that IR, and code generation for chunks of DSLs. Distinctions about stage are straightforward type-signature changes. The worked example in this post is very similar to one of the tutorials for that system: https://scala-lms.github.io/tutorials/regex.html
Unfortunately, so far as I can tell:
- LMS has not been updated for years and never moved to Scala 3. https://github.com/TiarkRompf/virtualization-lms-core
- LMS was written to also use "scala-virtualized" which is in a similar situation
There's a small project to attempt to support it with virtualization implemented in Scala 3 macros, but it's missing some components: https://github.com/metareflection/scala3-lms?tab=readme-ov-f...
I'd love to see this fully working again.
I did something similar to this with C++ templates - it's Parsing Expression Grammar based, so not full regex, but enough for a lot of tasks:

using sign = Atoms<'+', '-'>;
using digit = Range<'0', '9'>;
using onenine = Range<'1', '9'>;
using digits = Some<digit>;
using integer = Seq<Opt<Atom<'-'>>, Oneof<Seq<onenine, digits>, digit>>;
using fraction = Seq<Atom<'.'>, digits>;
using exponent = Seq<Atoms<'e', 'E'>, Opt<sign>, digits>;
using number = Seq<integer, Opt<fraction>, Opt<exponent>>;

and I've confirmed that it does all get inlined and optimized on -O3. JSON parser example here - https://github.com/aappleby/matcheroni/blob/main/examples/js...
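For anyone curious how combinators like those can boil down to inlinable code, here is a minimal sketch of the idea. This is not the commenter's matcheroni library, and Atom, Atoms, and Oneof are omitted (they follow the same shape); each match() returns the number of characters consumed, or -1 on failure:

// Minimal sketch, not the matcheroni library: every rule is a concrete type,
// so the compiler can see through and inline the whole grammar.
template <char A, char B>
struct Range {
    static long match(const char* s, const char* end) {
        return (s != end && *s >= A && *s <= B) ? 1 : -1;
    }
};

template <typename P>
struct Opt {  // zero or one
    static long match(const char* s, const char* end) {
        long n = P::match(s, end);
        return n < 0 ? 0 : n;
    }
};

template <typename P>
struct Some {  // one or more, greedy
    static long match(const char* s, const char* end) {
        long total = P::match(s, end);
        if (total < 0) return -1;
        for (long n; (n = P::match(s + total, end)) > 0; total += n) {}
        return total;
    }
};

template <typename P, typename... Rest>
struct Seq {  // each part in order
    static long match(const char* s, const char* end) {
        long n = P::match(s, end);
        if (n < 0) return -1;
        if constexpr (sizeof...(Rest) == 0) {
            return n;
        } else {
            long m = Seq<Rest...>::match(s + n, end);
            return m < 0 ? -1 : n + m;
        }
    }
};

Under this scheme, Some<digit> compiles down to essentially the loop you would write by hand, which is why -O3 can flatten the whole number rule.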
> can compilers really execute general code at compile-time?
Cue the smug Lisp weenies laughing quietly in the background.
Gave me a smile to see the shout out to LISP in there.
Reading this take on it, it feels like a JIT compiler could also accomplish a fair bit of this? I'm also reminded of the way a lot of older programs would generate tables during build time. I'm assuming that is still fairly common?
Yes, this is all straightforward with Lisp macros. Beyond that, you can call the compile function in Common Lisp and do all this at run time too.
In fact, there's https://github.com/telekons/one-more-re-nightmare for CL.
Seems like an optimization that could be applied quite generally - as the author mentions at the end, there are lots of places this could be used.
The problem with applying this technique generally is the amount of code generated. But what if you could optimize that too... perhaps share the common parts of the AST between the copies of the code that are generated, and overlay the changes with some data structure.
https://github.com/hanickadot/compile-time-regular-expressio...
FYI: This is a C++ template version of a compile-time regex class.
A 54:47 presentation at CppCon 2018 is worth more than a thousand words...
see https://www.youtube.com/watch?v=QM3W36COnE4
followup CppCon 2019 video at https://www.youtube.com/watch?v=8dKWdJzPwHw
As the above github repo mentions, more info at https://www.compile-time.re/
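Usage, going from memory of the library's README (so treat the exact details as approximate): the pattern is a template argument, so the matcher is generated at compile time rather than interpreted at run time.

#include <ctre.hpp>
#include <string_view>

// C++20 syntax; the regex is baked into the type, so the generated matcher
// can be inlined and optimized like any other code.
bool is_date(std::string_view s) {
    return static_cast<bool>(ctre::match<"[0-9]{4}-[0-9]{2}-[0-9]{2}">(s));
}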
Depending on the userbase of the site, simply checking for @gmail.com at the end, I'd bet, would result in a quick win, as well as restricting the username's alphabet to allowed Gmail characters.
The other optimization I'd guess at would be to async/thread/process the checking before and after the @ symbol, so they can run in parallel (ish). Extra CPU time, but speed > CPU cycle count for this benchmark.
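A rough sketch of the suffix fast path from the first paragraph (hypothetical names; non-Gmail addresses would still fall through to the full regex):

#include <string_view>

// Hypothetical fast path: settle the common case cheaply before paying for
// the full regex. Only the "@gmail.com" suffix check is shown here.
bool gmail_fast_path(std::string_view addr) {
    constexpr std::string_view suffix = "@gmail.com";
    if (addr.size() <= suffix.size()) return false;
    if (addr.substr(addr.size() - suffix.size()) != suffix) return false;
    // A real check would also restrict the local part to Gmail's allowed
    // characters, or fall back to the complete regex otherwise.
    return true;
}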
[Rehashing an old comment]
In the math department, we had a Moodle for the first-year students at my university in Argentina.
When we started, like 15 years ago, the emails of the students and TAs were evenly split: roughly 30% Gmail, 30% Yahoo!, 30% Hotmail and 10% others (very approximate numbers).
Now the students are like 80% Gmail, 10% Live/Outlook/Hotmail and 10% others/Yahoo. Some of the TAs are much older, so perhaps "only" 50% use Gmail.
The difference is huge. I blame the mandatory Gmail account for the cell phone.
So, checking only @gmail.com is too strict, but a fast first check for @gmail.com followed by the complete regex may improve the speed a lot in the real world.
Maybe I am old, but I like to keep as much communication as possible going through the university email. It just feels more official somehow.
Tell us you used to work at Google, without telling us.
"simply do X" is such a programmer fallacy at this point I'm surprised we don't have a catchy name for it yet, together with a XKCD for making the point extra clear.
Tell us you don't actually work with any Google engineers... blah blah blah
The trope is "At Google we..." and then casually mentioning "violating" the CAP theorem with Spanner or something.
It is simple, and I really do hope any first year CS student could extract a substring from a string. Have LLMs so atrophied our programming ability that extraction of a substring is considered evidence of a superior programmer?
Having never heard of Mojo before, I found this article fascinating. It provides a great example of how a toy regex parser works and an excellent explanation of why vanilla regex tends to be slow. It also presents a novel solution: compiling the regex into regular code, which can then be optimized by the compiler.
This is literally how 'lex' works - the one written in 1987 by Vern Paxson.
The original is 'lex', written in 1975 by Mike Lesk and Eric Schmidt.
Yes, that Eric Schmidt, CEO of Google.
1987 was the clone, 'flex' :-)
It did "compiling the regex into regular code, which can then be optimized by the compiler" before the C programming language as we know it was created. I think 'lex' was compiling regex to C before the C language even had 'struct' types, 'printf' or 'malloc'.