Software Mines Science Papers to Make New Discoveries

November 26th, 2013

In other news: Majority of Landmark Cancer Studies Cannot Be Replicated.

Via: MIT Technology Review:

Software that read tens of thousands of research papers and then predicted new discoveries about the workings of a protein that’s key to cancer could herald a faster approach to developing new drugs.

The software, developed jointly by IBM and Baylor College of Medicine, was set loose on more than 60,000 research papers that focused on p53, a protein involved in cell growth and implicated in most cancers. By parsing sentences in the documents, the software could build an understanding of what is known about enzymes called kinases that act on p53 and regulate its behavior; these enzymes are common targets for cancer treatments. It then generated a list of other proteins mentioned in the literature that were probably undiscovered kinases, based on what it knew about those already identified. Most of its predictions tested so far have turned out to be correct.
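The article does not say how the IBM/Baylor system works internally. As a rough illustration of the general idea of ranking "probable kinases" by how closely their literature context resembles that of known kinases, here is a minimal sketch. It assumes a much simpler technique than the real system likely uses: bag-of-words co-occurrence profiles compared by cosine similarity to the profile of the known kinases. The protein names `PROTA`, `PROTB`, `PROTC` and the example sentences are invented toy data, not results from the study.

```python
import math
import re
from collections import Counter


def build_profiles(sentences, proteins):
    """Count the words that co-occur with each protein name in the same sentence."""
    profiles = {p: Counter() for p in proteins}
    for s in sentences:
        words = re.findall(r"[a-z0-9]+", s.lower())
        for p in proteins:
            if p.lower() in words:
                profiles[p].update(w for w in words if w != p.lower())
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(profiles, known):
    """Rank unlabeled proteins by similarity to the combined profile of known kinases."""
    centroid = Counter()
    for p in known:
        centroid.update(profiles[p])
    scores = {p: cosine(profiles[p], centroid) for p in profiles if p not in known}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus: ATM and CHK2 stand in for already-identified p53 kinases.
sentences = [
    "ATM phosphorylates p53 at serine 15 after DNA damage.",
    "CHK2 phosphorylates p53 at serine 20.",
    "PROTA phosphorylates p53 on a serine residue in vitro.",
    "PROTB binds p53 and promotes its degradation.",
    "PROTC is expressed in liver tissue.",
]
proteins = ["ATM", "CHK2", "PROTA", "PROTB", "PROTC"]
known = {"ATM", "CHK2"}

profiles = build_profiles(sentences, proteins)
ranking = rank_candidates(profiles, known)
print(ranking)  # PROTA ranks first: its context shares "phosphorylates", "p53", "serine"
```

A real system would need entity recognition, synonym handling, and far richer features than raw word counts; the sketch only shows why "based on what it knew about those already identified" can be framed as a similarity ranking.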

“We have tested 10,” Olivier Lichtarge of Baylor said Tuesday. “Seven seem to be true kinases.” He presented preliminary results of his collaboration with IBM at a meeting on the topic of Cognitive Computing held at IBM’s Almaden research lab.

Lichtarge also described an earlier test of the software in which it was given access to research literature published prior to 2003 to see if it could predict p53 kinases that have been discovered since. The software found seven of the nine kinases discovered after 2003.

“P53 biology is central to all kinds of disease,” says Lichtarge, and so it seemed to be the perfect way to show that software-generated discoveries might speed up research that leads to new treatments. He believes the results so far show that to be true, although the kinase-hunting experiments have yet to be peer-reviewed and published in a scientific journal, and further lab tests are planned to confirm the findings. “Kinases are typically discovered at a rate of one per year,” says Lichtarge. “The rate of discovery can be vastly accelerated.”

Lichtarge said that although the software was configured to look only for kinases, it also seems capable of identifying previously unidentified phosphatases, which are enzymes that reverse the action of kinases. It can also identify other types of protein that may interact with p53.

The Baylor collaboration is intended to test a way of extending a set of tools that IBM researchers already offer to pharmaceutical companies. Under the banner of accelerated discovery, text-analyzing tools are used to mine publications, patents, and molecular databases. For example, a company in search of a new malaria drug might use IBM’s tools to find molecules with characteristics that are similar to existing treatments. Because software can search more widely, it might turn up molecules in overlooked publications or patents that no human would otherwise find.
