This is very typical in reinforcement learning. You just expand the state to include some more time periods. It definitely raises some academic eyebrows (since it’s not technically memory less), but hey if it works, it works

It is memoryless just in a different state space.

In the state space that includes all of the time periods, with infinitesimal granularity since the birth of the universe.

Welcome to a well formulated POMDP.