When will the rest be scanned and incorporated into the LLM training corpus?