I just noticed it takes literally ~5 minutes to train millions of parameters on a slow CPU... but before you call Yudkowsky to say "it's over", an important note: the main bottleneck is the corpus size; parameters are just 'cleverness', and given limited information, cleverness is powerless.

Anyway, here is the project:

https://github.com/bggb7781-collab/lrnnsmdds/tree/main

couple of notes:

1. Single C file, no dependencies. Below are literally all the "dependencies" — not even a custom header (copy-pasted from the top of the single C file):

#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <stdint.h>
#include <stdbool.h>
#include <float.h>
#include <getopt.h>
#include <errno.h>

4136 lines of code in one file at the moment, that's all.

2. Easiest way to compile on Windows: download Cygwin (https://www.cygwin.com/), navigate to the directory containing your lrnnsmdds.c file, and run gcc on it with some optimizations, such as:

gcc -std=c17 -O3 -march=native -ffast-math -o lrnn lrnnsmdds.c -lm

On Linux just run gcc; if for whatever reason you don't have gcc, install it with sudo apt-get install -y gcc (on Debian/Ubuntu), or your distro's equivalent.

On Apple: I've no idea; maybe just use VMware to install Ubuntu and run it there.

Of course you can 'git clone' and cd into the directory, but again: it's one file! Just copy it...

The repo includes a tiny toy corpus where I've borrowed (hopefully it's not plagiarism!) the name "John Gordon" from one of my favorite books, "Star Kings" by E. Hamilton. Just the first and last name are copied; the content is unique (well, several poorly written sentences of my own...). Obviously it will overfit and produce copy-paste output on such a small corpus; the sole goal is to check that everything runs, not whether it's the A-G-I. You'd need your own corpus of 100kb+ if you want to generate unique, meaningful text.

3. why/what/when/how?

The github repo is self-explanatory, I believe, about features, uses, and goals, but to attempt a summary:

My main motivation was to create a fast, CPU-only alternative to transformers, hence the bizarre/not-easy choice of doing this in C rather than Python, and the lack of dependencies. I was also hoping it would be a clever alternative, hence all those features, stacked more heavily than a 90s BMW 850. The 'reservoir' is the most novel feature though: it offers quick exact recall, arguably different from RWKV 8 or the latest Mamba. In fact, the architecture name SMDDS comes from the first letters of the implemented features:

* S. SwiGLU in Channel Mixing (more coherence)
* M. Multi-Scale Token Shift (larger context)
* D. Data-Dependent Decay with Low-Rank (speed in large context)
* D. Dynamic State Checkpointing (faster/linear generation)
* S. Slot-memory reservoir (perfect recall, transformer-style)

If you face some issue just email me (easiest).

The good, the bad, and the ugly:

It is a more-or-less working, novel text-to-text architecture; it's not trying to imitate transformers, LSTM, Mamba, or RWKV, though it shares many features with them. The bad: it's not blazing fast. If you're armed with a 16-core Ryzen/i7 (or whatever) and patience, you can try training it on several small books via the word tokenizer down to low perplexity (under 1.2...) and see if it looks smarter/faster. Since this is open source, the hope is obviously that it gets improved: make it CUDA-friendly, improve the features, port it to Python, etc.

Depending on many factors I may try to push for v2 in July, August, or September. My focus at the moment will be to test and scale, since the features are many; it compiles with zero warnings on the 2 laptops I've tested (Windows/Cygwin and Ubuntu), and the speed is comparable to transformers. 10x!

This file _really_ needs a license header. Anyone who's even marginally license-conscious is not going to touch this without a license declaration in the source file.

Edit: I now see there's a separate LICENSE file in the github repo, but (A) that's not what this post directly links to, and (B) there's no mention of that license in the source file.

Hmmm, 10x. I've now added this line to the comment at the top of the file:

"PolyForm Noncommercial License 1.0.0"

(free for non-commercial) ^ I'm not sure it's the best license though; actually gpt 5.4 says it's a bad idea. But after long puzzling over whether to make it GPL (free for everything) or refrain from posting on github at all, I picked the middle ground... I may change it later.