Mar 12, 2020
 

CrystEngComm, 2020, DOI: 10.1039/D0CE00111B

The performance of a model depends on the quality and information content of the data used to build it. By applying machine-learning approaches to a standard chemical dataset, we developed a 4-class classification algorithm that predicts the hydrogen-bond network dimensionality a molecule adopts in its crystal form with an accuracy of 59% (compared with a 25% random baseline), using only two- and lower-dimensional molecular descriptors. Although better than random, this performance did not meet the standard required for reliable application. We improved the model's practical value by wrapping it in a confidence tool that increases model robustness, quantifies trust in each prediction, and allows the classifier to be operated at virtually any accuracy level. Using this tool, the model's accuracy could be raised to 73% or 89%, at the cost of making predictions for only 34% or 8% of the test examples, respectively. We anticipate that the ability to reliably adjust the performance of 2D-based models to the requirements of different applications may increase their practical value, making them suitable for tasks ranging from initial virtual library filtering to profile-specific compound identification.
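The trade-off described above (higher accuracy in exchange for predicting fewer examples) can be illustrated with a generic "reject option": the classifier only answers when the probability of its predicted class clears a threshold. The sketch below assumes that simple max-probability thresholding stands in for the paper's confidence tool, which it may not match; the probabilities, labels and the function name `accuracy_coverage` are hypothetical.

```python
import numpy as np

# Hypothetical predicted class probabilities for six molecules over four
# hydrogen-bond network classes (e.g. 0D, 1D, 2D, 3D), plus true labels.
PROBS = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.80, 0.05, 0.05],
    [0.25, 0.25, 0.35, 0.15],
    [0.05, 0.05, 0.10, 0.80],
    [0.30, 0.20, 0.45, 0.05],
])
Y_TRUE = np.array([0, 1, 1, 3, 3, 2])


def accuracy_coverage(probs, y_true, threshold):
    """Accuracy on accepted predictions and the fraction accepted.

    A prediction is accepted only if the probability of the predicted
    class is at least `threshold`; all other examples are abstained on.
    """
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    predicted = probs.argmax(axis=1)      # most probable class per row
    confidence = probs.max(axis=1)        # probability of that class
    accepted = confidence >= threshold    # reject-option mask
    coverage = float(accepted.mean())
    if not accepted.any():
        return float("nan"), 0.0          # nothing predicted at this threshold
    accuracy = float((predicted[accepted] == y_true[accepted]).mean())
    return accuracy, coverage


for t in (0.0, 0.5):
    acc, cov = accuracy_coverage(PROBS, Y_TRUE, t)
    print(f"threshold={t:.2f} accuracy={acc:.2f} coverage={cov:.2f}")
# threshold=0.00 accuracy=0.67 coverage=1.00
# threshold=0.50 accuracy=1.00 coverage=0.50
```

Sweeping the threshold traces an accuracy-coverage curve; picking a point on it is what lets a modest classifier be run at a higher accuracy on a smaller, more confident subset.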

Oct 04, 2019
 

Noah is researching the development of better molecular descriptors for use in the machine-learning prediction of crystal properties. Despite being stuck in the basement, Noah’s favourite building on the Chemistry estate is the CRL. His favourite functional is B3LYP and when not in the lab he plays hockey and 5-a-side football.

Oct 04, 2019
 

Aditya is developing machine-learning methods for classifying various crystallographic properties. He mostly works with Python 3 and its associated machine-learning (ML) and deep-learning (DL) libraries.

In his spare time he enjoys baking and hiking. His go-to space group is Fmmm and his favourite intermolecular interaction is pi-pi stacking – classic.

Oct 03, 2019
 

George is investigating the extent to which macroscopic material properties (e.g., habit) can be predicted from information about the constituent molecules of a material. The association is expected to be weak but significant, and could be useful for the in silico screening of molecules for particular properties.

Oct 05, 2017
 

George is following up his Part II year in the group with a DPhil project in collaboration with the ANSTO neutron scattering facility in Sydney. He is spending most of 2019 at ANSTO collecting data on crystalline materials for which neutron diffraction is needed to extract crucial information about their structure.

In his spare time George may be found running.

Publications

A hexagonal planar transition-metal complex

Jul 06, 2017
 

CrystEngComm, 2017, 19, 5336-5340 [ doi:10.1039/C7CE00587C ]

A data-driven approach to predicting co-crystal formation reduces the number of experiments required to produce new co-crystals. A machine-learning algorithm trained on an in-house set of co-crystallization experiments achieves a 2.6-fold enrichment of successful co-crystal formation in a ranked list of co-formers, evaluated on an unseen set of paracetamol test experiments.
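A "2.6-fold enrichment" compares how often successful co-formers appear near the top of the ranked list with how often they appear overall. The sketch below uses one common definition of the enrichment factor (hit rate in the top *k* divided by the overall hit rate); whether the paper used exactly this cut-off-based definition is an assumption, and the outcome list and `enrichment_factor` name are hypothetical.

```python
def enrichment_factor(ranked_outcomes, top_k):
    """Hit rate among the top_k ranked co-formers divided by the overall hit rate.

    ranked_outcomes: list of 1 (co-crystal formed) or 0 (no co-crystal),
    ordered from the highest- to the lowest-ranked co-former.
    """
    n = len(ranked_outcomes)
    if not 0 < top_k <= n:
        raise ValueError("top_k must be between 1 and the list length")
    total_hits = sum(ranked_outcomes)
    if total_hits == 0:
        raise ValueError("no successful co-crystallisations in the list")
    top_rate = sum(ranked_outcomes[:top_k]) / top_k   # success rate in the top_k
    base_rate = total_hits / n                        # success rate if unranked
    return top_rate / base_rate


# Hypothetical ranked outcomes for ten co-formers, two of which succeeded.
RANKED = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(RANKED, 5))   # 2.0: twice the unranked success rate
```

An enrichment of 1.0 means the ranking is no better than testing co-formers in random order; evaluating over the whole list always gives 1.0.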

Jun 20, 2017
 

CrystEngComm, 2017, 19, 3737-3745 [ doi:10.1039/C7CE00738H ]

We present the crystallisation outcomes for 319 publicly available compounds in up to 18 different solvents, spread over 5710 individual single-solvent evaporation trials. The recorded data is part of a much larger in-house database and includes both positive and negative crystallisation outcomes. Such data can be used for statistical analyses of solvent performance, for machine-learning approaches, or for investigating the crystallisation behaviour of structurally similar compound classes.
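As a simple example of the statistical analyses of solvent performance mentioned above, the sketch below tallies the fraction of trials in each solvent that gave crystals. The record layout `(compound_id, solvent, crystallised)`, the `solvent_success_rates` name and the trial data are hypothetical and are not the format of the published dataset.

```python
from collections import defaultdict


def solvent_success_rates(trials):
    """Return {solvent: fraction of trials in that solvent that gave crystals}.

    trials: iterable of (compound_id, solvent, crystallised) tuples, where
    crystallised is True for a positive outcome and False for a negative one.
    """
    counts = defaultdict(lambda: [0, 0])  # solvent -> [successes, trials]
    for _compound, solvent, crystallised in trials:
        counts[solvent][1] += 1
        if crystallised:
            counts[solvent][0] += 1
    return {solvent: succ / n for solvent, (succ, n) in counts.items()}


# Hypothetical evaporation trials: four compounds, two solvents each.
TRIALS = [
    ("cpd1", "methanol", True), ("cpd1", "toluene", False),
    ("cpd2", "methanol", True), ("cpd2", "toluene", True),
    ("cpd3", "methanol", False), ("cpd3", "toluene", False),
    ("cpd4", "methanol", True), ("cpd4", "toluene", False),
]

rates = solvent_success_rates(TRIALS)
for solvent, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{solvent}: {rate:.2f}")
# methanol: 0.75
# toluene: 0.25
```

Recording negative outcomes is what makes these rates meaningful: with only successful trials, every solvent would appear to succeed 100% of the time.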

The presented data suggests that crystallisation behaviour in different solvents is not correlated with chemical similarity among clusters of highly similar compounds. Further, our machine-learning models can be used to guide the choice of solvent when crystallising a compound. In a retrospective evaluation, these models reduced the workload to a third of our initial protocol while still guaranteeing crystallisation success rates greater than 92%.
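A retrospective evaluation of this kind can be sketched as follows: for each compound, only the *k* solvents with the highest model scores are "tested", and we check whether crystals would still have been obtained. The sketch assumes that success rate means the fraction of compounds crystallising in at least one solvent of the full screen that are still recovered, and that workload is trials run divided by trials in the full protocol; these definitions, the `evaluate_top_k` name and all scores and outcomes are hypothetical.

```python
def evaluate_top_k(scores, outcomes, k):
    """Retrospectively evaluate testing only the k highest-scored solvents.

    scores:   {compound: {solvent: model score}}
    outcomes: {compound: {solvent: True if crystals formed}}
    Returns (success_rate, workload_fraction), where success_rate is measured
    over compounds that crystallised in at least one solvent of the full screen.
    """
    crystallisable = recovered = trials_run = trials_full = 0
    for compound, solvent_scores in scores.items():
        ranked = sorted(solvent_scores, key=solvent_scores.get, reverse=True)
        chosen = ranked[:k]                      # solvents we would actually test
        trials_run += len(chosen)
        trials_full += len(solvent_scores)
        if any(outcomes[compound].values()):     # crystallises somewhere
            crystallisable += 1
            if any(outcomes[compound][s] for s in chosen):
                recovered += 1
    success_rate = recovered / crystallisable if crystallisable else float("nan")
    return success_rate, trials_run / trials_full


# Hypothetical model scores and observed outcomes for three compounds.
SCORES = {
    "cpd1": {"ethanol": 0.9, "acetone": 0.4, "hexane": 0.1},
    "cpd2": {"ethanol": 0.2, "acetone": 0.7, "hexane": 0.3},
    "cpd3": {"ethanol": 0.6, "acetone": 0.5, "hexane": 0.2},
}
OUTCOMES = {
    "cpd1": {"ethanol": True, "acetone": False, "hexane": False},
    "cpd2": {"ethanol": False, "acetone": True, "hexane": False},
    "cpd3": {"ethanol": False, "acetone": False, "hexane": True},
}

for k in (1, 2, 3):
    rate, workload = evaluate_top_k(SCORES, OUTCOMES, k)
    print(f"k={k} success={rate:.2f} workload={workload:.2f}")
```

Choosing the smallest *k* whose success rate stays above a target (such as the 92% quoted above) gives the workload reduction achievable with a given model.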
