I'm pretty sure that most cases of x86 reordering issues are a matter of the compiler reordering things, which isn't (afaik) solved with just "volatile". Caveat - haven't dealt with this for at least over a year (multicore sync without using OS primitives in general).
Literally the volatile keyword in Java is to make the Java compiler aware in order to insert memory barriers. That only guarantees consistent reads or writes, but it doesn't make it thread safe (i.e. write after read), that's what atomics are for.
Also not only compilers reorder things, most processors nowadays do OoOE; even if the order from the compiler is perfect in theory, different latencies for different instruction operands may lead to execute later things earlier not to stall the CPU.
Note that this is only true of the Java/C# volatile keyword. The volatile keyword in C/C++ is solely about direct access in hardware to memory-mapped locations, for such purposes as controlling external devices; it is entirely unrelated to the C11 memory model for concurrency, does not provide the same guarantees, and should never be used for that purpose.
x86 has a Total Store Order (TSO) memory model, which effectively means (in a mental model where only 1 shared memory operation happens at once and completes before the next) stores are queued but loads can be executed immediately even if stores are queued in the store buffer.
On a single core a load can be served from the store buffer (queue), but other cores can't see those stores yet, which is where all the inconsistencies come from.