The .md task format as persistent source of truth is the key insight. Context rot is a real problem mid-sprint when agents lose track of earlier decisions. Do you diff against previous state if the agent rewrites a task file mid-execution, or does the human review before committing?