3 May 2018

Search Engines for Personalised Cancer Medicine

HU computer scientists want to support medical professionals' decision-making through improved data analysis

Ulf Leser

When a patient is diagnosed with a malignant tumour, the question is which treatment is appropriate: surgery or another form of therapy? And which one is better? Ulf Leser, professor of knowledge management in bioinformatics at the Department of Computer Science of Humboldt-Universität zu Berlin, wants to support doctors in making this decision by developing specialised search engines. His software helps to sift through a multitude of publications and databases and to identify those that contain information on the successful treatment of specific diseases.

This sounds like an easy enough task, given that PubMed, a freely accessible online database, indexes most medical publications. However, its search engine sorts results chronologically rather than by medical relevance. Moreover, it returns enormous numbers of hits: every year, a million new scientific papers are added to the 30 million already indexed. Common tumours are studied by hundreds of research groups worldwide, which continually publish new findings from dozens of clinical trials and an even larger number of sequencing projects. While this is an encouraging development, it also puts doctors in a dilemma. “It is hard to stay on top of developments in one's own field, even for experts,” says Leser. “In complicated cases, doctors spend anywhere from 30 minutes to a whole day on a single patient. We want to save them time and improve the results.”

The focus is on finding information about tumours with specific genetic mutations that do not respond to standard therapy. Increasingly, new drugs are being developed that act on the products of certain genes, but only when those genes are mutated in a particular way.

Sequencing a tumour’s genome yields a list of anywhere from a few dozen to thousands of mutations. Ideally, doctors would pick out those that offer a starting point for therapy. “The aim of our projects PERSONS and PREDICT, both run by interdisciplinary teams, is to support this extraction process,” says Leser. The Charité in Berlin, the University of Tübingen and the University Hospital Tübingen are partners in the projects. “Our software does not make decisions. It provides the doctors who make the decisions with comprehensive and up-to-date data,” Leser emphasises. “What does the community know about specific mutations, their clinical implications and their effects on disease progression? How can we present this information to doctors in a way that is quick, intuitive, clear and well structured?”

This requires optimised search engines, which are quite tricky to program. The first challenge is finding suitable search terms. The existing literature is by no means consistent in how it names genes and mutations. The average human gene has about eight names, which are often extremely similar to those of other genes or identical to the names of genes in other mammals. Disease names vary as well: one author may write “intestinal cancer”, another “colorectal carcinoma”, and in either case the cancer may also be metastatic. The researchers’ main concern, however, is finding publications and database entries that are clinically relevant to a given patient.
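To make the naming problem concrete, here is a minimal sketch of dictionary-based query expansion in Python. It assumes a hand-made lexicon that maps each concept to the names used for it in the literature; the lexicon, its concept IDs and the gene names GENE_A, GENE_B and ALIAS-2 are hypothetical placeholders, not curated data and not the team's method. A name shared by two genes comes back with two candidate concepts, which is the kind of ambiguity described above.

```python
# Hypothetical toy lexicon: concept ID -> surface names seen in the literature.
# The gene entries are invented placeholders, not real gene symbols.
LEXICON: dict[str, list[str]] = {
    "DISEASE:colorectal_cancer": ["colorectal carcinoma", "intestinal cancer", "colorectal cancer"],
    "GENE:GENE_A": ["GENE_A", "ALIAS-1", "ALIAS-2"],
    "GENE:GENE_B": ["GENE_B", "ALIAS-2"],  # shares "ALIAS-2" with GENE_A, so it is ambiguous
}


def concepts_for(name: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Return every concept ID whose synonym list contains `name` (case-insensitive)."""
    key = name.casefold()
    return [cid for cid, names in lexicon.items()
            if any(n.casefold() == key for n in names)]


def expand_query(name: str, lexicon: dict[str, list[str]]) -> set[str]:
    """Collect all surface names of every concept that `name` may refer to."""
    expanded = {name}
    for cid in concepts_for(name, lexicon):
        expanded.update(lexicon[cid])
    return expanded


if __name__ == "__main__":
    # A search for one disease name should also match its synonyms.
    print(sorted(expand_query("intestinal cancer", LEXICON)))
    # More than one candidate concept means the name needs disambiguation.
    print(concepts_for("ALIAS-2", LEXICON))
```

In practice one would likely draw the lexicon from curated vocabularies and resolve ambiguous names using the surrounding text; the sketch only shows why a plain keyword search misses synonyms and trips over shared names.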

These are two of the major data integration and text mining problems that Ulf Leser is working on. He and his team develop methods, first, to find relevant publications automatically and, second, to extract relevant data from them: on genes, mutations, diseases, medication and so on. The researchers employ machine learning methods; that is, they train their software on a set of thousands of clinically relevant studies. A so-called classifier looks at individual words, measures how often they occur in a given text and compares these frequencies with those in thousands of documents that are not clinically relevant. Based on the differences it has learned, the system can estimate the probability that a new publication is clinically relevant. The team recently finished the first version of their search engine, which is now being tested and evaluated by doctors. Leser concludes: “The accuracy of our results is promising and significantly better than that of PubMed, the de facto standard search engine for oncologists.”
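The article describes the classifier only in outline: word frequencies in relevant versus non-relevant documents, turned into a probability. As an illustration, here is a minimal Python sketch assuming multinomial naive Bayes, a common word-frequency classifier; the source does not say which model the team uses, and the class name `NaiveBayesRelevance`, the tokenizer and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately crude tokenizer for illustration."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial naive Bayes over word counts (an assumed technique, not the team's)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing so a word unseen in one class doesn't zero it out

    def fit(self, relevant: list[str], irrelevant: list[str]) -> "NaiveBayesRelevance":
        # Count word frequencies separately for each class; both lists must be non-empty.
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, relevant), (False, irrelevant)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        n = len(relevant) + len(irrelevant)
        self.log_prior = {True: math.log(len(relevant) / n),
                          False: math.log(len(irrelevant) / n)}
        return self

    def _log_likelihood(self, words: list[str], label: bool) -> float:
        # Smoothed per-word probabilities; words never seen in training are skipped.
        denom = self.totals[label] + self.alpha * len(self.vocab)
        return sum(math.log((self.counts[label][w] + self.alpha) / denom)
                   for w in words if w in self.vocab)

    def prob_relevant(self, text: str) -> float:
        """Posterior probability that `text` belongs to the clinically relevant class."""
        words = tokenize(text)
        score = {label: self.log_prior[label] + self._log_likelihood(words, label)
                 for label in (True, False)}
        # Normalise in log space to avoid underflow on long documents.
        m = max(score.values())
        exp = {label: math.exp(s - m) for label, s in score.items()}
        return exp[True] / (exp[True] + exp[False])


if __name__ == "__main__":
    # Hypothetical toy training data, standing in for thousands of real abstracts.
    relevant = ["patients with the mutation responded to targeted therapy in a phase II trial",
                "clinical trial of inhibitor therapy in metastatic colorectal carcinoma patients"]
    irrelevant = ["crystal structure of the protein kinase domain",
                  "gene expression in mouse embryonic development"]
    model = NaiveBayesRelevance().fit(relevant, irrelevant)
    print(model.prob_relevant("phase II trial of targeted therapy in patients"))  # high for this toy data
    print(model.prob_relevant("protein structure in mouse development"))        # low for this toy data
```

Naive Bayes is used here only because it maps directly onto the description of comparing word frequencies between two document sets; other classifiers over the same word counts would fit the article's description equally well.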

By Uta Deffke for Adlershof Journal