There are two unavoidable atomic updates for RC: one at allocation and one at free. That alone will significantly increase the traffic each thread sends back to main memory.
A lifetime system could possibly eliminate those, but it'd be hard to add to the JVM at this point. The JVM sort of has one in the form of escape analysis, but that's notoriously easy to defeat with pretty typical Java code.
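To make the escape analysis point concrete, here is a minimal sketch of two methods that do the same arithmetic. The names `EscapeDemo`, `Point`, `lengthSquaredLocal`, and `lengthSquaredStored` are made up for illustration, and whether a given JIT actually removes the first allocation depends on inlining and other heuristics, so treat the comments as what the optimizer may do, not what it is guaranteed to do.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeDemo {
    // Hypothetical value holder, used only for illustration.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never leaves this method, so the JIT may be able to
    // replace it with two locals and skip the heap allocation.
    static int lengthSquaredLocal(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Same arithmetic, but the Point is stored in a caller-visible list.
    // It now escapes, so it has to exist as a real heap object.
    static int lengthSquaredStored(int x, int y, List<Object> sink) {
        Point p = new Point(x, y);
        sink.add(p);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        List<Object> sink = new ArrayList<>();
        System.out.println(lengthSquaredLocal(3, 4));
        System.out.println(lengthSquaredStored(3, 4, sink));
        System.out.println(sink.size());
    }
}
```

Adding an object to a collection, as the second method does, is about as typical as Java code gets, which is the sense in which ordinary code defeats the optimization.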
Why would an allocation require an atomic write for a reference count?
Swift routinely optimizes out reference count traffic.
> Why would an allocation require an atomic write for a reference count?
It won't always require it, but it usually will, because you have to ensure the memory containing the reference count is correctly set before handing off a pointer to the item. This has to happen almost first thing during construction of the item.
It's not impossible that a smart compiler could see and remove that initialization and destruction if it can determine that the item never escapes the current scope. But if it does escape, for example by being added to a list or returned from a function, then those two atomic writes are required.
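As a rough sketch of where those two writes sit, here is a hand-rolled reference-counted object in Java. Everything in it (`RefCountSketch`, `Counted`, `retain`, `release`, `makeEscaping`) is hypothetical and only models the idea; it is not how the JVM or any particular RC runtime implements it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {
    // Hypothetical manually reference-counted object.
    static final class Counted {
        // Write 1: the count is set during construction, before any
        // pointer to the object is handed to anyone else.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean freed = false;

        void retain() { refs.incrementAndGet(); }

        // Write 2: the decrement that reaches zero decides who frees the object.
        void release() {
            if (refs.decrementAndGet() == 0) {
                freed = true; // stand-in for returning memory to the allocator
            }
        }

        int count() { return refs.get(); }
        boolean isFreed() { return freed; }
    }

    // The object is returned to the caller, so it escapes this scope
    // and the count cannot be optimized away.
    static Counted makeEscaping() {
        return new Counted();
    }

    public static void main(String[] args) {
        Counted c = makeEscaping();
        c.retain();
        c.release();
        c.release();
        System.out.println(c.isFreed());
    }
}
```

If `makeEscaping` were instead a local allocation that never left its method, a compiler that could prove that would be free to drop both the initial store and the final decrement.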