So it's the same concept as an LLM training on and inferring tokenized language, except it's doing tokenized amino acids. Instead of artificial intelligence/language it's doing artificial evolution/life, I guess?
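To make the analogy concrete, here's a minimal sketch (my own illustration, not the model's actual tokenizer) of treating amino acids exactly like language tokens: each of the 20 standard one-letter residue codes gets an integer id, so a protein sequence becomes a token-id list an LLM-style model could train on, just like a character-level language model.

```python
# Hypothetical illustration: a character-level "vocabulary" over the
# 20 standard amino-acid one-letter codes. Not OpenAI's real tokenizer.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(seq: str) -> list[int]:
    """Map a protein sequence to token ids, like a character-level LM."""
    return [TOKEN_ID[aa] for aa in seq.upper()]

def detokenize(ids: list[int]) -> str:
    """Invert tokenize: recover the one-letter sequence from token ids."""
    return "".join(AMINO_ACIDS[i] for i in ids)

print(tokenize("MKT"))  # methionine, lysine, threonine -> [10, 8, 16]
```

Real protein language models use richer vocabularies (special tokens, and here structure tokens too), but the training objective over these ids is the same next-token prediction as text.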
If I’m understanding this right:
1. They have a protein model similar to AlphaFold
2. A biotech startup used this model to engineer a protein that converts adult cells into stem cells, at a higher efficiency than existing techniques. (But still only a tiny fraction of cells convert)
Application to life extension seems speculative.
Yes, that's how I read it too.
Seems odd that OpenAI would want to get involved in this space; it feels like DeepMind already has a huge head start.
> We initialized it from a scaled-down version of GPT‑4o to take advantage of GPT models’ existing knowledge, then further trained it on a dataset composed mostly of protein sequences, along with biological text and tokenized 3D structure data, elements most protein language models omit.
> A large portion of the data was enriched to contain additional contextual information about the proteins in the form of textual descriptions, co-evolutionary homologous sequences, and groups of proteins that are known to interact.
These bits made me wonder what would have happened if they had only used the supplementary biological data with an untrained LLM model.
[flagged]
You are probably going to be downvoted for this, and with some justification even (i.e. it implies something about visitors).
Anyway, this is old, and the original post didn't get much attention: https://news.ycombinator.com/item?id=44985844
[flagged]
Please don't do this here.
Edit: we asked you to stop posting unsubstantive comments quite recently (https://news.ycombinator.com/item?id=44832751) and have done so many times before (https://news.ycombinator.com/item?id=33700337).
If you continue doing this, we'll end up banning you. I don't want to ban you, so if you'd please review https://news.ycombinator.com/newsguidelines.html and fix this, that would be good.