User can give example documents. SBERT would then test them against the Dewey Decimal Classification or the Library of Congress Classification, plus categories for home users (bills, manuals, bank documents, etc.) and standard business categories (HR, production, and so on). The user verifies the suggested categories.
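A rough sketch of the matching step, just to make the idea concrete. Everything here is an assumption for illustration: the category names and seed descriptions are made up, and `embed` is a plain bag-of-words stand-in, not SBERT. In a real version, `embed` would be swapped for a sentence-embedding model, and the category list would come from Dewey, Library of Congress, or the home/business sets.

```python
import math
import re
from collections import Counter

# Hypothetical seed descriptions for a few home-user categories.
CATEGORIES = {
    "bills": "invoice payment due amount electricity utility bill",
    "manuals": "manual instructions installation setup warranty troubleshooting",
    "bank documents": "bank account statement balance transaction deposit",
}


def embed(text):
    """Stand-in for an SBERT embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_categories(document, categories=CATEGORIES, top_k=2):
    """Rank categories by similarity to the document, best first."""
    doc_vec = embed(document)
    scored = [(name, cosine(doc_vec, embed(desc)))
              for name, desc in categories.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# An example document supplied by the user (invented for this sketch).
example = "Your electricity bill: amount due 42.10, payment due by March 3."
suggestions = suggest_categories(example)

# The suggestions are shown to the user, who confirms or corrects them.
for name, score in suggestions:
    print(f"{name}: {score:.2f}")
```

The ranked list is only a proposal. The user's confirmation or correction is what actually assigns the category.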

Onboarding would ask the user about their work, research, and hobby interests. An LLM could generate word lists for each interest and ask the user whether they match the user's own understanding. And so on.

Also support open document formats (CSV and TSV too).