You can optimize further by unrolling the loop, which spreads the dbra overhead over several writes. For example:

  .L2:
        move.w d1,(a0)
        move.w d1,(a0)
        move.w d1,(a0)
        move.w d1,(a0)
        dbra d0,.L2
        rts
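
One thing to watch with the unrolled version: dbra runs the body d0+1 times, so the loop above does 4*(d0+1) writes, and d0 has to be set up as (count/4)-1. If the count isn't a multiple of 4, the leftover 1 to 3 writes need their own small loop. A rough sketch of that setup, assuming the total number of writes arrives in d2 as an unsigned word (d2 and the labels .tail, .L3 and .done are made up for illustration, they are not from the code above):

        move.w  d2,d0
        lsr.w   #2,d0           ; d0 = count / 4 = number of unrolled passes
        beq.s   .tail           ; fewer than 4 writes: skip the unrolled loop
        subq.w  #1,d0           ; dbra executes the body d0+1 times
  .L2:
        move.w  d1,(a0)
        move.w  d1,(a0)
        move.w  d1,(a0)
        move.w  d1,(a0)
        dbra    d0,.L2
  .tail:
        and.w   #3,d2           ; d2 = count mod 4 = leftover writes
        beq.s   .done           ; nothing left over
        subq.w  #1,d2           ; same d0+1 adjustment for the tail loop
  .L3:
        move.w  d1,(a0)
        dbra    d2,.L3
  .done:
        rts

If the count is known at assembly time and divides evenly by 4, none of this is needed and a plain constant in d0 will do.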

But what about the effect on cache… oh, wait!

;-)