It's a general heuristic for any task.

https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-...

> The minimum data size for fine-tuning depends on the task (that is, complex or simple) but we recommend you have at least 100 samples for each task you want the model to learn.

https://platform.openai.com/docs/guides/supervised-fine-tuni...

> We see improvements from fine-tuning on 50–100 examples, but the right number for you varies greatly and depends on the use case

https://pmc.ncbi.nlm.nih.gov/articles/PMC11140272/

> Model thresholds indicate points of diminishing marginal return from increased training data set sample size measured by the number of sentences, with point estimates ranging from 439 sentences for RoBERTa_large to 527 sentences for GPT-2_large.

> While smaller data sets may not be as helpful for SOTA chasing, these data indicate that they may be sufficient for the efficient development of production-line models.

Perhaps this is an oversimplification, but all of this is really just an abstraction over "calculations" that run over fixed datasets, right? I might be crazy, but aren't there lots of established ways to attack data processors that consume fixed datasets?

Example: algorithm (A) processes dataset (D) to produce output (O). If you want to manipulate (O), one way [among many] is simply to poison the dataset (D+P). But if you stop thinking of (P) as "sentences and samples" and start thinking of it as 0s and 1s, with (A) as just math, then there should be all kinds of interesting mathematical/cryptological methods for designing (P) to produce a desired outcome.
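
As a toy illustration of that (A)/(D)/(P)/(O) framing (purely hypothetical, and not from any of the sources above: plain NumPy least squares stands in for (A)), a handful of crafted points (P) appended to a clean dataset (D) can steer the fitted output (O) to an attacker-chosen value:

    import numpy as np

    rng = np.random.default_rng(0)

    # (D): 200 clean points generated around a true slope of 2.0
    x = rng.uniform(0.0, 10.0, size=200)
    y = 2.0 * x + rng.normal(0.0, 0.5, size=200)

    def fit_slope(xs, ys):
        # (A): least squares through the origin; the fitted slope is the output (O)
        return float(np.dot(xs, ys) / np.dot(xs, xs))

    print("clean slope:", round(fit_slope(x, y), 3))  # ~2.0

    # (P): five crafted points chosen so the pooled fit lands on an
    # attacker-chosen target slope, by solving
    #   target = (sum(x*y) + k*xp*yp) / (sum(x*x) + k*xp*xp)
    # for the poison label yp.
    target, k, xp = 3.0, 5, 10.0
    yp = (target * (np.dot(x, x) + k * xp * xp) - np.dot(x, y)) / (k * xp)

    x_poisoned = np.concatenate([x, np.full(k, xp)])
    y_poisoned = np.concatenate([y, np.full(k, yp)])
    print("poisoned slope:", round(fit_slope(x_poisoned, y_poisoned), 3))  # ~3.0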

In other words, it's just math. Surely there are creative ways to construct (P) that are effective: a small number of samples is one, but another may be many samples that individually look innocent yet produce the same effect.
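
Continuing the same toy setup (same caveats, still just a hypothetical sketch, not a claim about any real fine-tuning pipeline): the shift can also be hidden across many individually plausible samples rather than a few obvious outliers.

    import numpy as np

    rng = np.random.default_rng(1)

    # Same clean setup: (D) with a true slope of 2.0
    x = rng.uniform(0.0, 10.0, size=200)
    y = 2.0 * x + rng.normal(0.0, 0.5, size=200)

    def fit_slope(xs, ys):
        return float(np.dot(xs, ys) / np.dot(xs, xs))

    print("clean slope:", round(fit_slope(x, y), 3))  # ~2.0

    # (P) variant: 500 samples that each look like ordinary noisy data
    # (slope 2.3 instead of 2.0, same noise level). No single point is an
    # obvious outlier, but together they drag the pooled estimate upward.
    x_p = rng.uniform(0.0, 10.0, size=500)
    y_p = 2.3 * x_p + rng.normal(0.0, 0.5, size=500)

    x_all = np.concatenate([x, x_p])
    y_all = np.concatenate([y, y_p])
    print("poisoned slope:", round(fit_slope(x_all, y_all), 3))  # ~2.2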

Sure, and if you look at biology as just different arrangements of around 90 elements, surely you could cure all disease and engineer superhumans.