Hasn't that been scanned by Google already? Their model has presumably been trained on most of those texts.