Does this work by just training once with next-token prediction? I'd like to better understand how it produces fluent sentences, if anyone can offer some insight.