Curious whether anyone has run experiments comparing LLM output on plain Python versus type-annotated Python. Do type hints help LLMs?
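To make the comparison concrete, here is a minimal sketch of the kind of pair I have in mind. The function names are hypothetical and purely illustrative, not taken from any actual experiment; the two versions behave identically at runtime, so the only difference a model would see is the annotations.

```python
# Illustrative pair only: same logic, with and without type hints.
# Assumes Python 3.9+ for the built-in generic `list[float]`.

def mean_plain(values):
    # Unannotated: nothing states the expected input or output types.
    return sum(values) / len(values)


def mean_annotated(values: list[float]) -> float:
    # Annotated: the signature documents a list of floats in, a float out.
    return sum(values) / len(values)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    # Type hints are not enforced at runtime, so both calls give the same result.
    print(mean_plain(data), mean_annotated(data))
    # The annotations are still available as metadata on the function object.
    print(mean_annotated.__annotations__)
```

An experiment along these lines would presumably give a model one variant or the other as context (or ask it to generate one or the other) and compare correctness on the same tasks.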