MIT Researchers Pushing Machine Learning to Speed Drug Development

Designing new molecules for pharmaceuticals is primarily a manual, time-consuming process that’s prone to error. But MIT researchers have now taken a step toward fully automating the design process, which could drastically speed things up — and produce better results.

Drug discovery relies on lead optimization. In this process, chemists select a promising (“lead”) molecule with known potential to interact with a specific biological target, then tweak its chemical properties to improve potency and other factors.

Chemists draw on expert knowledge to tweak molecular structures by hand, adding and removing functional groups (groups of atoms and bonds with specific properties). Even when they use systems that predict desired properties, chemists still must carry out each modification step themselves. Each step can take a significant amount of time and may still fail to produce molecules with the desired properties.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Department of Electrical Engineering and Computer Science (EECS) have developed a model that better selects lead molecule candidates based on desired properties. It also modifies the molecular structure as needed to achieve higher potency, while ensuring the molecule remains chemically valid.

The model takes molecular structure data as input and directly generates molecular graphs: detailed representations of a molecular structure, with nodes representing atoms and edges representing bonds. It breaks those graphs down into smaller clusters of valid functional groups, which it uses as “building blocks” to more accurately reconstruct and better modify molecules.
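To make the graph idea concrete, here is a minimal Python sketch of a molecular graph (atoms as nodes, bonds as edges) being split into smaller clusters. It is an illustration only, not the researchers' model: the function names are hypothetical, hydrogens and bond orders are omitted, and the splitting rule (each ring becomes one cluster, each bond outside a ring becomes a two-atom cluster) is a simplifying assumption standing in for "clusters of valid functional groups."

```python
from collections import defaultdict


def build_graph(bonds):
    """Return an adjacency map from a list of (i, j) atom-index pairs."""
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency


def is_bridge(adjacency, i, j):
    """A bond is a bridge if j cannot be reached from i without using it,
    which means the bond does not belong to any ring."""
    seen, stack = {i}, [i]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if {node, nxt} == {i, j}:
                continue  # pretend the bond under test has been removed
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return j not in seen


def decompose(atoms, bonds):
    """Split a molecule into ring clusters and single-bond clusters.

    Toy rule (an assumption for illustration): atoms connected through
    ring bonds form one cluster; every non-ring bond is its own cluster.
    """
    adjacency = build_graph(bonds)
    bridges = [bond for bond in bonds if is_bridge(adjacency, *bond)]
    bridge_set = {frozenset(bond) for bond in bridges}

    clusters, visited = [], set()
    for start in range(len(atoms)):
        if start in visited:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if frozenset((node, nxt)) in bridge_set or nxt in component:
                    continue
                component.add(nxt)
                stack.append(nxt)
        visited |= component
        if len(component) > 1:  # a lone atom is covered by its bridge bonds
            clusters.append(sorted(component))

    clusters.extend(sorted(bond) for bond in bridges)
    return clusters


# Phenol-like skeleton: a six-carbon ring (atoms 0-5) with an oxygen (atom 6).
atoms = ["C", "C", "C", "C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

for cluster in decompose(atoms, bonds):
    print([f"{atoms[k]}{k}" for k in cluster])
```

For this example the sketch yields two building blocks: the six-membered ring and the carbon-oxygen bond that hangs off it. A generative model working with such blocks can swap or attach whole clusters rather than placing atoms one at a time, which is one way to keep intermediate structures chemically sensible.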

“The motivation behind this was to replace the inefficient human modification process of designing molecules with automated iteration and assure the validity of the molecules we generate,” says Wengong Jin, a PhD student in CSAIL and lead author of a paper describing the model that’s being presented at the 2018 International Conference on Machine Learning in July.

Joining Jin on the paper are Regina Barzilay, the Delta Electronics Professor at CSAIL and EECS, and Tommi S. Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science in CSAIL, EECS, and at the Institute for Data, Systems, and Society.

The research was conducted as part of the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium between MIT and eight pharmaceutical companies, announced in May. The consortium identified lead optimization as one key challenge in drug discovery.

“Today, it’s really a craft, which requires a lot of skilled chemists to succeed, and that’s what we want to improve,” Barzilay says. “The next step is to take this technology from academia to use on real pharmaceutical design cases, and demonstrate that it can assist human chemists in doing their work, which can be challenging.”

“Automating the process also presents new machine-learning challenges,” Jaakkola says. “Learning to relate, modify, and generate molecular graphs drives new technical ideas and methods.”

Read the source article at MIT News.