I've been experimenting since 2019 with ways to minimize RAM usage for tiny MLP inference on microcontrollers. [0]

This project is the result of that exploration: a fully static-allocation approach to MLP inference in ANSI C that uses a simple 2-slot ring buffer to keep memory usage predictable and extremely low while staying fast.
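
A minimal sketch of how a 2-slot ring buffer can drive a dense forward pass, assuming ReLU hidden layers, a linear output layer, and weights stored row-major in flat const arrays. Everything here (`mlp_forward`, `ring`, `MAX_WIDTH`, the toy 2-3-1 topology and its weights) is hypothetical and not taken from the project's source. Each layer reads from one slot and writes into the other, then the roles swap, so activation RAM stays fixed at two slots no matter how deep the network is.

```c
/* Hypothetical sketch, not the project's actual code. */
#define N_LAYERS  2   /* number of weight layers in the toy net */
#define MAX_WIDTH 3   /* widest layer, input layer included */

/* Toy topology: 2 inputs -> 3 hidden (ReLU) -> 1 output (linear). */
static const int layer_sizes[N_LAYERS + 1] = { 2, 3, 1 };

/* Weights per layer, row-major (n_out x n_in), concatenated. */
static const float weights[] = {
    1.0f, 0.0f,            /* layer 0, neuron 0 */
    0.0f, 1.0f,            /* layer 0, neuron 1 */
    1.0f, 1.0f,            /* layer 0, neuron 2 */
    1.0f, 1.0f, -1.0f      /* layer 1, neuron 0 */
};
static const float biases[] = {
    0.0f, 0.0f, -1.0f,     /* layer 0 */
    0.5f                   /* layer 1 */
};

/* The only activation storage: two statically allocated slots. */
static float ring[2][MAX_WIDTH];

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Runs one forward pass. The returned pointer points into `ring`
   and is only valid until the next call. */
const float *mlp_forward(const float *input)
{
    const float *w = weights;
    const float *b = biases;
    int cur = 0, l, i, j;

    /* Load the input into slot 0. */
    for (i = 0; i < layer_sizes[0]; i++)
        ring[cur][i] = input[i];

    for (l = 0; l < N_LAYERS; l++) {
        int n_in = layer_sizes[l], n_out = layer_sizes[l + 1];
        int nxt = 1 - cur;                 /* the other slot */

        for (j = 0; j < n_out; j++) {
            float acc = b[j];
            for (i = 0; i < n_in; i++)
                acc += w[j * n_in + i] * ring[cur][i];
            /* ReLU on hidden layers, identity on the output layer. */
            ring[nxt][j] = (l < N_LAYERS - 1) ? relu(acc) : acc;
        }
        w += n_in * n_out;                 /* advance to next layer's params */
        b += n_out;
        cur = nxt;                         /* output slot becomes next input */
    }
    return ring[cur];
}
```

For input `{1, 2}` the hidden layer produces `{1, 2, 2}` and the output is `1.5`. The design choice is that nothing is allocated at run time: the worst-case activation RAM is known at compile time as `2 * MAX_WIDTH` floats.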

I believe this is close to the practical lower bound for RAM usage in general-purpose CPU MLP inference without sacrificing speed or introducing runtime complexity.
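
For a rough sense of what "close to the lower bound" can mean, here is the activation-memory arithmetic for a dense network with layer widths n_0 through n_L (input included). It counts activation buffers only and assumes the weights stay in flash/ROM. It also assumes both slots are sized to the widest layer, which is my assumption rather than something stated in the post. A dense layer reads every input for every output, so layer l's input has to stay alive until all n_{l+1} outputs are written. That gives M_min for any scheme that evaluates one layer at a time:

```latex
\begin{aligned}
M_{\text{all}}   &= \sum_{l=0}^{L} n_l                                  && \text{keep every layer's output} \\
M_{\text{2slot}} &= 2 \max_{0 \le l \le L} n_l                          && \text{two slots, each as wide as the widest layer} \\
M_{\min}         &= \max_{0 \le l < L} \left( n_l + n_{l+1} \right)     && \text{exact-size input + output per layer} \\
M_{\min}         &\le M_{\text{2slot}} \le 2\, M_{\min}                 && (L \ge 1)
\end{aligned}
```

For widths (16, 32, 32, 4), for example, M_2slot = M_min = 64 floats, versus M_all = 84.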

A more aggressive approach I've used before is to allocate and free memory for each layer-to-layer pair during inference, but that adds allocation overhead and can fragment the heap if not used carefully. [1]
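
For contrast, a minimal sketch of the per-layer allocate/free idea. This is a hypothetical illustration and not the library's `REDUCE_RAM_DELETE_OUTPUTS` code: `mlp_forward_heap` and its parameters are made up, and the same ReLU/linear layer layout as a flat row-major weight array is assumed. Each layer's output is `malloc`'d at its exact size and the previous buffer is freed as soon as it has been consumed, so peak activation RAM is `n_in + n_out` for the current layer. The price is a `malloc`/`free` pair per layer and the risk that other allocations made in between leave holes in the heap.

```c
#include <stdlib.h>

static float relu_h(float x) { return x > 0.0f ? x : 0.0f; }

/* Hypothetical sketch. Returns a heap buffer with sizes[n_layers] outputs
   that the caller must free(), or NULL on allocation failure
   (or if n_layers == 0). */
float *mlp_forward_heap(const float *input, const int *sizes, int n_layers,
                        const float *w, const float *b)
{
    const float *src = input;   /* current layer's input */
    float *owned = NULL;        /* heap buffer backing src, if any */
    float *next;
    int l, i, j;

    for (l = 0; l < n_layers; l++) {
        int n_in = sizes[l], n_out = sizes[l + 1];

        /* Exact-size output buffer: peak RAM here is n_in + n_out floats. */
        next = (float *)malloc((size_t)n_out * sizeof(float));
        if (next == NULL) {
            free(owned);        /* free(NULL) is a no-op */
            return NULL;
        }
        for (j = 0; j < n_out; j++) {
            float acc = b[j];
            for (i = 0; i < n_in; i++)
                acc += w[j * n_in + i] * src[i];
            /* ReLU on hidden layers, identity on the output layer. */
            next[j] = (l < n_layers - 1) ? relu_h(acc) : acc;
        }
        free(owned);            /* this layer's input is no longer needed */
        owned = next;
        src = next;
        w += n_in * n_out;      /* advance to next layer's params */
        b += n_out;
    }
    return owned;
}
```

Compared with the static 2-slot buffer, this can use slightly less RAM when layer widths vary a lot, but its peak usage depends on the allocator and on whatever else is living on the heap at the time.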

Curious how it compares to other minimal inference implementations people have seen (or built). Feedback and edge cases welcome. Hope you like it. Have fun. <3

[0]: https://github.com/GiorgosXou/NeuralNetworks#-research
[1]: look for REDUCE_RAM_DELETE_OUTPUTS in the source of [0]
