A nice way to use traditional ML models today is to do feature extraction with an LLM and classification on top with a trad ML model. Why? Because this way you can tune your own decision boundary, and piggyback on features from a generic LLM to power the classifier.

For example, CV triage: you use an LLM with a rubric to extract features; choosing the features you are going to rely on does a lot of the work here. Then collect a few hundred examples, label them (accept/reject), and train your trad ML model on top. It will not inherit the LLM's biases.

You can probably use any LLM for feature preparation, and retrain the small model in seconds as new data is added. A coding agent can write its own small-model-as-a-tool on the fly and use it in the same session.
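A minimal sketch of the extraction half, with the LLM call stubbed out (the rubric keys and the canned JSON response are made up for illustration; in practice you'd prompt your model with the rubric and the CV text and ask for JSON back):

```python
import json

# Rubric: choosing these features is where most of the judgment lives.
RUBRIC = {
    "years_experience": "integer, total years of relevant experience",
    "knows_python": "0 or 1",
    "education_level": "0=none, 1=bachelor, 2=master, 3=phd",
}

def extract_features(cv_text: str) -> dict:
    """Stub standing in for a real LLM call.

    A real version would send RUBRIC plus cv_text to the model and
    parse the JSON it returns; here we hardcode a response so the
    shape of the pipeline is visible.
    """
    llm_response = '{"years_experience": 6, "knows_python": 1, "education_level": 2}'
    return json.loads(llm_response)

def to_vector(features: dict) -> list[float]:
    # Fixed key order so every CV maps to the same feature layout.
    return [float(features[k]) for k in RUBRIC]

vec = to_vector(extract_features("...CV text..."))
print(vec)  # [6.0, 1.0, 2.0]
```

The point is that the downstream classifier only ever sees these fixed-layout numeric vectors, so any LLM that can fill in the rubric reliably will do.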

What do you mean by "feature extraction with an LLM"? I can see this for text-based data, but would you do that on numeric data? Seems like there are better tools you could use for auto-ML in that sphere?

Unless by LLM feature extraction you mean something like "have Claude Code write some preprocessing pipeline"?

It's for unstructured inputs, text and images, where you need to extract specific features such as education level or experience with various technologies and tasks. The trick is to choose the features that actually matter for your company, and build a classifier on top so the decision is also calibrated to your own triage policy with a small training/test set. It works with few examples because only a small classifier with few parameters has to learn.
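To make the "few parameters" point concrete, here's a toy logistic regression over three extracted features, trained from scratch in plain Python. The feature vectors and labels are invented for illustration, not real CV data; with only 4 parameters (3 weights + a bias), a few hundred labeled examples is plenty, and retraining is near-instant:

```python
import math

# Toy labeled set: each row is an LLM-extracted feature vector
# [years_experience, knows_python, education_level]; accept=1, reject=0.
# All values are made up for illustration.
X = [[6, 1, 2], [1, 0, 1], [8, 1, 3], [2, 1, 1], [0, 0, 0], [5, 0, 2]]
y = [1, 0, 1, 0, 0, 1]

# Logistic regression: 3 weights + 1 bias = 4 parameters total.
w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.1

def predict(x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Plain stochastic gradient descent on log loss. At this scale it
# finishes in milliseconds, so you can refit whenever new labels arrive.
for _ in range(2000):
    for x, t in zip(X, y):
        err = predict(x) - t
        for i in range(3):
            w[i] -= lr * err * x[i]
        b -= lr * err

print([round(predict(x)) for x in X])  # recovers the training labels
```

In practice you'd reach for something off the shelf like scikit-learn's LogisticRegression instead of hand-rolling the loop, but the parameter count is the same either way.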

Isn't the whole point for it to learn what features to extract?