Author here.

Thank you for all the thoughtful comments and questions.

For 72B models, around *36GB of memory works fine*, by the way. I ran the benchmark and shared the results on the website: https://opengraviton.github.io/index.html
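For context, 36GB is roughly what the weights of a 72B model occupy at 4-bit quantization. A back-of-the-envelope sketch (my own arithmetic, assuming weights dominate memory; KV cache and activations add overhead on top):

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# 72B parameters at 4 bits each -> 36.0 GB of weights
print(weight_memory_gb(72, 4))  # → 36.0
```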

While working on this research I realized something important: the way most current models are trained is extremely inefficient. Because of that, I started developing *graviton-native*, which trains AI models from scratch using more efficient architectures.

The idea is to design models that are optimized for efficiency from the start. My expectation is that this approach could yield roughly a *70% efficiency improvement*. Combined with OpenGraviton, I believe this could eventually make it possible to run *trillion-parameter-scale models locally*.

You can find the paper here: https://opengraviton.github.io/paper.html

And the repository here: https://github.com/opengraviton/graviton-native

Right now I’m training a *72B model* using this approach. I’ll share the results soon and update the website.