Language of chemistry is unveiled by molecular make-up

Published on August 8th, 2014

This post was added by Dr P. Richardson

DO YOU speak chemistry? Analysing molecular structures as if they were sentences has revealed hidden "words" that are key to their make-up.

The approach suggests that algorithms like the ones Google uses in search engines might reveal ways to mix up molecules or invent drugs.

Linguists can analyse text by ranking words according to how often they appear. This "bag-of-words" approach can help to distinguish different kinds of texts. For example, the word rankings for spam emails are different to those of genuine messages, which helps algorithms filter out spam.
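To make the idea concrete, here is a minimal Python sketch of a bag-of-words ranking. The two sample messages and the function name `word_ranking` are invented for illustration; a real spam filter would learn from many thousands of messages and use far more than raw counts.

```python
import re
from collections import Counter


def word_ranking(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Lower-case the text and split it into words, ignoring word order entirely.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


# Made-up example messages (assumptions, not real data).
spam = "win cash now win a free prize now claim your free cash"
genuine = "the meeting is moved to friday please send the report before the meeting"

spam_rank = word_ranking(spam)
genuine_rank = word_ranking(genuine)

print(spam_rank[:3])
print(genuine_rank[:3])
```

The two rankings are headed by quite different words, and that difference is the kind of signal a filter can exploit.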

Bartosz Grzybowski of Northwestern University in Evanston, Illinois, and his colleagues wondered if a similar approach could find the most important parts of a molecule. "Chemistry is about recognising certain patterns of atoms," he says. "In linguistics we also recognise patterns: those are words."

The team took thousands of molecules, each representing a sentence, and applied a bag-of-words algorithm. By comparing pairs of molecules, they noted arrangements of atoms that appear in both, like a ring of carbons or a particular group that connects to oxygen and hydrogen, and ranked the frequency of these common fragments.
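A toy version of that pairwise comparison might look like the sketch below. It is not the team's method: the molecules are described by hand-labelled sets of fragments (an assumption made here for simplicity), whereas finding shared arrangements of atoms in real structures requires substructure matching with a cheminformatics toolkit.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-labelled fragment sets standing in for real structures.
molecules = {
    "ethanol": {"hydroxyl", "ethyl"},
    "phenol": {"hydroxyl", "benzene ring"},
    "toluene": {"methyl", "benzene ring"},
    "benzoic acid": {"carboxyl", "benzene ring"},
}


def rank_shared_fragments(mols):
    """Count how often each fragment is shared by a pair of molecules."""
    counts = Counter()
    for (_, frags_a), (_, frags_b) in combinations(mols.items(), 2):
        # Fragments present in both molecules of the pair.
        counts.update(frags_a & frags_b)
    return counts.most_common()


ranking = rank_shared_fragments(molecules)
print(ranking)
```

In this tiny set the benzene ring is shared by three pairs and the hydroxyl group by one, so the ring ranks first.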

The researchers had expected the "words" to correspond to functional groups, the clusters of atoms that chemists recognise as controlling a molecule's chemical reactions. But surprisingly, it was other, larger fragments that seemed to make up crucial chemical "words". The distribution of the functional groups was much less language-like.

To test the chemical dictionary that this produced, the team borrowed the logic of search engines to find the fragments that carry the most information. Search engines serve up the most relevant sites by looking at how often your search term appears on a particular page in comparison with the internet as a whole. For example, the word "the" crops up frequently across all internet pages, so it doesn't carry much weight in determining what a page is about. The same technique is used to produce "word clouds" that can visually summarise a document.
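The weighting described here is commonly known as term frequency-inverse document frequency (TF-IDF). The sketch below uses one standard textbook form of it on three invented "pages"; the exact formula used by any search engine, or by the researchers, is not given in the article, so treat the details as assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score each term in each document by one common TF-IDF formula."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        scores.append({
            # Frequency within this document, scaled down for terms
            # that appear in many documents.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


# Invented example pages (assumptions, not real data).
pages = [
    "the cat sat on the mat".split(),
    "the dog chased the ball".split(),
    "the chemist made the molecule".split(),
]

scores = tf_idf(pages)
print(scores[0])
```

Because "the" appears on every page, its score is zero everywhere, while a word found on only one page, such as "cat", receives a positive weight.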

So the team ran an algorithm to identify the fragments in the molecule with the highest information content. When chemists synthesise complex molecules from scratch, they look for key bonds that connect simpler compounds to serve as building blocks. It turned out that the bonds connecting the most informative fragments were often these key bonds.

In a test of 68 molecules, a panel of 10 chemists agreed that one of the top three bonds the algorithm chose was an important bond for 66 of the molecules. "The most informative ones appear to be the best," says Grzybowski (Angewandte Chemie, doi.org/f2s2vc). This shows the algorithm, with no chemical knowledge of its own, can replicate some of the skill of human chemists.

"We're trained to recognise patterns as organic chemists, and the patterns are related to the functionalities," says Robert Paton of the University of Oxford. "This approach is obviously not limited by the constraints of the human mind, so it can pick out unique fragments that you might not always spot."
