Had to fix hardware detection myself, only to find that engine.generate() isn't implemented and yields "".

Maybe the author could get a large-parameter model to help get this done, though.

Happy to help if needed. The project is already tested and benchmarked with several models and everything is working as expected. If you run into any specific issues, feel free to open an issue or PR.