It depends on the model size, batch size, input sequence length, and so on. With a model this small you'll never get a 'good' output, but you can maximise its potential.