The non-professional side of Organic Chemistry is one place where I think AI would really shine.
Writing down synthesis steps feels complex, like solving a Rubik's cube, but it is all a sequence of memorized tricks. Do Cannizzaro if you want this, Bergmann to do that.
But the synthesis plan is only 10% of the actual work.
The gap between writing down the synthesis step and actually doing it is also extremely large.
Even if you get the right molecule, it might be the wrong way around or just clump up into a useless mess.
The Ritonavir episode of Veritasium is a great example of how all chemistry on paper is a mere shadow of what actually happens in real life.
> Writing down synthesis steps feels complex, like solving a Rubik's cube, but it is all a sequence of memorized tricks. Do Cannizzaro if you want this, Bergmann to do that.
I remember two years ago, when I actually got into using graph data structures, wondering if maybe the "space" of available reactions for any given starter and target molecules could be mapped as a graph, with intermediates as nodes and reactions as weighted directed edges, so synthesis becomes pathfinding through chemical space.
Turns out, it’s a thing! [0]
Edit: Makes you wonder how much interesting stuff is sitting in plain sight, waiting for someone with the right cross-domain awareness / knowledge / whatever to notice it.
[0]: https://pmc.ncbi.nlm.nih.gov/articles/PMC9574932/
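The pathfinding idea above can be sketched in a few lines. This is only a toy, assuming each reaction is a single directed edge from one reactant to one product, which real reactions often are not. The molecule names, reaction labels, and costs are all made up for illustration, and the cost could stand for anything you want to minimise (yield loss, price, number of steps).

```python
import heapq

# Hypothetical toy network: molecule -> list of (product, reaction label, cost).
# Names, labels, and costs are invented for illustration only.
REACTIONS = {
    "A": [("B", "step 1", 2.0), ("C", "step 2", 5.0)],
    "B": [("C", "step 3", 1.0), ("D", "step 4", 6.0)],
    "C": [("D", "step 5", 2.0)],
    "D": [],
}


def cheapest_route(graph, start, target):
    """Dijkstra's algorithm: return (total_cost, [reaction labels]) or None."""
    queue = [(0.0, start, [])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for product, label, step_cost in graph.get(node, []):
            new_cost = cost + step_cost
            if new_cost < best.get(product, float("inf")):
                best[product] = new_cost
                heapq.heappush(queue, (new_cost, product, path + [label]))
    return None  # target not reachable from start


print(cheapest_route(REACTIONS, "A", "D"))
```

Here the direct-looking route A -> B -> D loses to the longer A -> B -> C -> D because the edge weights, not the step count, decide the winner.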
There is a lot of graph theory in Chemistry - modelling chemicals as (vertex/edge coloured) graphs, reaction networks, etc.
Of course, some molecules (e.g. aromatic systems, like ferrocene) are not naturally representable as graphs. I wonder if it is the same with synthesis: are there reactions that are hard to model as a graph (or Petri net or whatever)? One simple example I know is that you have to be careful about including a node for 'water', as it gets connected to everything else! Or at least in biochemistry it does.
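A rough sketch of the 'water' problem, assuming a tiny set of heavily simplified reactions written as (reactants, products) pairs. The function names and the 50% cutoff are my own arbitrary choices, not a standard method: the idea is just to flag species that take part in too many reactions and leave them out when building the species graph.

```python
from collections import Counter

# Simplified toy reactions as (reactants, products); stoichiometry is ignored.
REACTIONS = [
    ({"glucose", "ATP"}, {"G6P", "ADP"}),
    ({"sucrose", "water"}, {"glucose", "fructose"}),
    ({"ATP", "water"}, {"ADP", "phosphate"}),
    ({"urea", "water"}, {"ammonia", "CO2"}),
]


def participation(reactions):
    """Count how many reactions each species appears in."""
    counts = Counter()
    for reactants, products in reactions:
        for species in reactants | products:
            counts[species] += 1
    return counts


def hubs(reactions, max_fraction=0.5):
    """Species appearing in more than max_fraction of all reactions.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    limit = max_fraction * len(reactions)
    return {s for s, n in participation(reactions).items() if n > limit}


def species_edges(reactions, exclude=frozenset()):
    """Directed reactant -> product edges, skipping any excluded species."""
    edges = set()
    for reactants, products in reactions:
        for r in reactants - exclude:
            for p in products - exclude:
                edges.add((r, p))
    return edges


hub_species = hubs(REACTIONS)
print(hub_species)
print(len(species_edges(REACTIONS)), len(species_edges(REACTIONS, hub_species)))
```

Without the filter, water links sucrose, ATP, and urea chemistry together even though those reactions have little to do with each other.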
Why is ferrocene ungraphable or in this context unable to be modelled in that way?
I meant metallocenes in general:
https://en.wikipedia.org/wiki/Metallocene
A metal atom sandwiched between two Cp (cyclopentadienyl) rings. You _can_ model this as five single bonds from the metal to the atoms of each ring (so 10 C-M bonds in total), or else you need some kind of 'edge' (bond) between the ring as a whole and the metal.
The more general issue is that a graph model of a chemical assumes a 'bond' is between exactly two atoms. Three-center hydrogen bonds are another example where this model fails to capture the chemistry very well.
Of course, it's a tradeoff - you can model _most_ compounds with just graphs (plus atom type, charge, chirality) and the relatively few that do not quite fit are special cases.
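To make the two options for a metallocene concrete, here is a minimal sketch assuming bonds are stored as sets of atom labels. The labels (Fe, C1..C10) are illustrative and hydrogens are left out. A plain graph only allows two-atom sets, while a hypergraph lets one 'bond' span the metal and a whole ring.

```python
# Illustrative labels: one Fe atom plus ten ring carbons (hydrogens omitted).
ring_a = [f"C{i}" for i in range(1, 6)]
ring_b = [f"C{i}" for i in range(6, 11)]


def ring_bonds(ring):
    """C-C bonds around a ring, each joining exactly two atoms."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


# Option 1: plain graph with ten pairwise Fe-C edges.
pairwise = ring_bonds(ring_a) | ring_bonds(ring_b)
pairwise |= {frozenset(("Fe", c)) for c in ring_a + ring_b}

# Option 2: hypergraph with one multi-atom "bond" per ring.
hyper = ring_bonds(ring_a) | ring_bonds(ring_b)
hyper |= {frozenset(["Fe", *ring_a]), frozenset(["Fe", *ring_b])}


def is_plain_graph(bonds):
    """True if every bond joins exactly two atoms."""
    return all(len(bond) == 2 for bond in bonds)


print(len(pairwise), is_plain_graph(pairwise))
print(len(hyper), is_plain_graph(hyper))
```

The pairwise version stays compatible with ordinary graph algorithms but overstates the bonding as ten separate single bonds. The hyperedge version is closer to the chemistry but no longer fits a tool that assumes two-atom edges.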
Hamilton Morris and his stuff on clandestine chemistry is super interesting in this domain. Sometimes the chemistry is straightforward but access to certain chemicals is hard, so the procedure must change based on what's available, not necessarily what's ideal.
> Even if you get the right molecule, it might be the wrong way around or just clump up into a useless mess.
Sounds a lot like vibe coding lol
Modern biochemistry (so far) IS vibe coding lol. You mostly have vibes on how the chemistry should work, based on (very strong) natural evidence coupled with theoretical development and lab studies. Then you mix and match, goading bacteria and praying that they produce what you want in good measure. Then you take their secretions and run chromatography studies on them to check if that's what you actually want, or whether it's just some random bullshit. If it's the latter, you have to toss that out and start all over again.
Or in some instances that "random bullshit" turns out to be something actually novel and/or useful.
At least vibe coding can only explode in your face metaphorically.
Until vibe code is used for weapons systems, or explosive manufacturers, or... or...
The world today is coding.
Already is. Missile that hit the Iranian girl's school was vibe targeted.
Organic chemistry seems like a discipline better done by chemists than by forward-deployed staff with their payoff function sharply truncated at an IPO, which at this point may or may not happen on schedule.