It doesn't matter how accurate the models are, it's not a "data set" (in the scientific sense), it's more of a conclusion set. Maybe the conclusions are spot on. Maybe not. I have no idea.

Right. At my most generous, this is a dataset about LLM behavior when asked to infer nutritional value. It is in no way a nutrition dataset. It is perhaps useful as half of a benchmark for accuracy, compared to actual ground truth. Unlike a scientist, you're not motivated or resourced enough to create the ground truth dataset. So you took a shortcut and hid it from the landing page.

This workflow, this motivation, this business model, this marketing is an affront to truth itself.

I think there is a real conversation to be had about “data” in a post LMM world, but I actually don’t care about debating definitions here, I care about whether the product works within a reasonable margin of error.

I envisioned many lines of inquiry from HN but the idea that a compressed TSV of nutritional data is not a "dataset" (definition: a collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer) was unexpected.

Your response is such a perfect example of why the "data science" movement is a cancer on actual science. So many graduate from programs and boot camps (or just read blog posts) that teach them all the technical mechanics of working with data, but nothing about actual science.

You sound like you're having a bad day. Go take a walk, its just someones side project on HN. They arent trying to destroy science for you, they were simply sharing something they enjoyed building. You dont have to use it or like it, but it has nothing to do with "science". Its not that deep bro.

The problem is that it’s _not_ simply data. Definition: is information collected from the world.

This is data from the world that has altered and augmented with stuff from a model. The informational content has been altered by stuff not from the world. Therefore it’s no longer data, according to the above definition.

That isn’t to say that it can’t be useful, or anything like that. But it’s _not_ information collected from the world. And that’s why people who care about science and a strict definition of data would be offended by calling this a dataset.

FWIW, I like that you include water content, libraries like google's health connect seem to have completely separate data structures for nutrition and hydration.

Thank you :)

Ignore them. Congratz on finishing your project!

> a compressed TSV of nutritional data

What is the source of that nutritional data?

There are many HN users who are opposed to LLM.

Some of them are fundamentalists, and no amount of reason will reach them (read the comments on the Ghibli-style images to get a sample), others are opposed for very self-interested reasons: "It is difficult to get a man to understand something when his income depends on his not understanding it"

Yesterday, I vibe coded a DNS server in python from scratch in half a day (!) and it works extremely well after spending a few minutes on manually improving a specific edge case for reverse DNS using AAAA records: dig -x requests use the exploded form in the ip6.arpa, while I think it's better for the AAAA entries to keep using the compressed form, and I wanted to generate the reverse algorithmically from AAAA and A records.

Just ignore them, as your approach is sound: I have experience creating, curating and improving datasets with LLMs.

Like vibe coding, it works very well if you know what you are doing: here, you just have to use statistics to leverage the non deterministic aspects of AI in your favor.

Good luck with your app!

> Like vibe coding, it works very well if you know what you are doing (emphasis mine)

This is true of so very many things involving computers (and tools in general, really) and LLMs are no exception. Just like any tool, "knowing what you are doing" is the really important part, but so many folks are convinced that these "AI" things can do the thinking part for them, and it's just not the case (yet). You gotta know what you're doing and how to properly use the tool to avoid a lotta the "foot-guns" and get the most benefit outta these things.