So this happened: DeepMind (with 48 authors, including a new member of the British nobility) decided to compete with me. Or rather, with some of my work from 10+ years ago. Apparently, AlphaFold 3 can now predict how any drug-like molecule will bind to its target protein. And it does so better than AutoDock Vina (the most cited molecular docking program, which I built).
I agree with the author's skepticism. Given the lack of clarity around how training/test bias was avoided and how the evaluation was done, it isn't yet clear that AlphaFold 3's results on protein-ligand interactions are conclusive.
I have seen this first hand when working with ChEMBL data: a naive data split will cause you to overestimate your model's performance because (1) assay results for a particular ligand can come from the same or different experiments, which leads to data leakage, and (2) many assays test ligands with similar molecular structures, and those similar structures can end up split across the train and test sets.
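To make point (2) concrete, here is a minimal sketch of how one might split by Bemis-Murcko scaffold instead of splitting ligands at random, so that close analogues of a test ligand cannot also sit in the training set. The function names, the use of RDKit, and the 80/20 ratio are my own illustrative choices, not anything from the AlphaFold 3 or ChEMBL pipelines.

```python
# Illustrative scaffold-based split (an assumed approach, not from the AF3 paper):
# ligands sharing a Bemis-Murcko scaffold stay on the same side of the split,
# which avoids the near-analogue leakage a random split allows.
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, test_fraction=0.2):
    """Group SMILES by Murcko scaffold, then assign whole groups to train or test."""
    groups = defaultdict(list)
    for smi in smiles_list:
        groups[MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)].append(smi)

    train, test = [], []
    train_target = (1.0 - test_fraction) * len(smiles_list)
    # Fill the training set with the largest scaffold groups first;
    # whatever is left over becomes the (scaffold-disjoint) test set.
    for _, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        (train if len(train) < train_target else test).extend(members)
    return train, test
```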
One thing I would also like to see the AlphaFold team address is model generalization. Here is a good experiment (a rough code sketch follows the list):
1. Take your train and test ligands.
2. For each test ligand, compute its mean normalized Jaccard index (Tanimoto similarity) against the train ligands.
3. Bucketize the test ligands into deciles of that mean similarity.
4. Report performance in each bucket, comparing AlphaFold 3 and Vina.
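A minimal sketch of that experiment, assuming RDKit Morgan fingerprints as the ligand representation and per-ligand success indicators (e.g. pose RMSD under 2 Å) for each method. `train_smiles`, `test_smiles`, `af3_success`, and `vina_success` are hypothetical inputs, and the fingerprint choice is mine, not the paper's.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprints(smiles_list):
    """Morgan (ECFP4-like) bit fingerprints; assumes all SMILES parse."""
    return [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s),
                                                  radius=2, nBits=2048)
            for s in smiles_list]

def mean_train_similarity(test_smiles, train_smiles):
    """For each test ligand, mean Tanimoto (Jaccard) similarity to all train ligands."""
    train_fps = fingerprints(train_smiles)
    return np.array([np.mean(DataStructs.BulkTanimotoSimilarity(fp, train_fps))
                     for fp in fingerprints(test_smiles)])

def report_by_decile(similarity, af3_success, vina_success):
    """Bucket test ligands into deciles of mean train similarity and
    print each method's success rate per bucket."""
    edges = np.percentile(similarity, np.arange(0, 101, 10))
    buckets = np.clip(np.digitize(similarity, edges[1:-1]), 0, 9)
    for b in range(10):
        mask = buckets == b
        if not mask.any():
            continue
        print(f"decile {b}: sim {edges[b]:.2f}-{edges[b + 1]:.2f}  "
              f"n={mask.sum()}  AF3={af3_success[mask].mean():.2f}  "
              f"Vina={vina_success[mask].mean():.2f}")
```

If AlphaFold 3's advantage over Vina shrinks or vanishes in the low-similarity deciles, that would be the overfitting signal I am describing.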
In other words, to test generalizability, we want to see how similar a test ligand has to be to the training data before AlphaFold 3 shows gains over docking methods like Vina. From experience, I expect to see some signs of overfitting.
Also, thank you for your work on Vina; I learned quite a lot about this space just from reading the repo code!