uQuery: Leveraging All Clinical Information to Benefit Patients
The amount of data we generate has grown exponentially in recent years, and the health sector is one of the main contributors to this trend. However, a growing proportion of that information is unstructured, such as free text. Because human language is highly complex, it is difficult for a machine to understand such data and extract value from it automatically. Natural Language Processing (NLP) technologies help overcome that obstacle.
Applying these technologies, and with medical consultancy from Azierta, GMV has structured textual health data to facilitate the analysis and exploitation of clinical information on patients with Renal Cell Carcinoma (RCC), with a view to offering them personalized treatments. This would not have been possible without the invaluable collaboration of Joaquín Carballido Rodríguez, Head of the Urology Service at the Puerta de Hierro University Hospital in Madrid, which provided data on nearly 600 patients with malignant renal neoplasms collected over a period of 10 years. The forms, reports, and medical notes were interpreted and exploited with GMV’s natural language processing technologies and the knowledge of doctor and researcher Eduardo Ródenas and his team.
Development consisted of three clearly differentiated phases. In the first, a Renal Cell Carcinoma ontology was generated. It represents the pertinent knowledge in this area by defining relevant entities (symptoms, medical tests, treatments, etc.) and the interactions among them. A series of relevant concepts that take values of interest was also defined (e.g. ECOG performance status or blood platelet count).
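As an illustration of what such an ontology can look like in code, here is a minimal Python sketch. It is not GMV's actual ontology or data model: the class names, the example entities, and the `investigated_by` relation are hypothetical and chosen only to show entities, the interactions among them, and concepts that carry values. The 0 to 5 range used for ECOG performance status reflects the standard scale.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A node in the ontology (hypothetical structure)."""
    name: str
    kind: str  # e.g. "symptom", "medical_test", "treatment"


@dataclass(frozen=True)
class ValuedConcept:
    """A concept whose value is of interest, with a plausible range."""
    name: str
    unit: Optional[str]
    valid_range: Tuple[float, float]

    def accepts(self, value: float) -> bool:
        low, high = self.valid_range
        return low <= value <= high


@dataclass
class Ontology:
    entities: Dict[str, Entity] = field(default_factory=dict)
    concepts: Dict[str, ValuedConcept] = field(default_factory=dict)
    # Each relation is a (source, predicate, target) triple.
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, kind: str) -> None:
        self.entities[name] = Entity(name, kind)

    def add_concept(self, concept: ValuedConcept) -> None:
        self.concepts[concept.name] = concept

    def relate(self, source: str, predicate: str, target: str) -> None:
        # Only allow relations between entities that were declared first.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relations.append((source, predicate, target))

    def related(self, source: str, predicate: str) -> List[str]:
        return [t for s, p, t in self.relations if s == source and p == predicate]


def build_example_ontology() -> Ontology:
    """Tiny illustrative ontology; the contents are assumptions, not clinical guidance."""
    onto = Ontology()
    onto.add_entity("hematuria", "symptom")
    onto.add_entity("CT scan", "medical_test")
    onto.add_entity("nephrectomy", "treatment")
    onto.relate("hematuria", "investigated_by", "CT scan")
    onto.add_concept(ValuedConcept("ECOG performance status", None, (0, 5)))
    return onto
```

Storing relations as plain triples keeps the sketch small. A real clinical ontology would more likely build on an established formalism and vocabulary.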
In the second phase, useful knowledge was extracted from the texts. To do that, the GMV team used uQuery, a proprietary Natural Language Processing tool. With it, the previously defined concepts and patterns were located in the data, and typical natural language processing problems that appear frequently in medical texts, such as gender and negation management, could be tackled. This phase also sought to assign a temporal context to the findings so that they could be sorted chronologically afterwards. That was one of the major challenges of the project, largely because of the particular nature of many medical texts, which often use schematic narration or mix temporal contexts. Finally, the last phase consisted of leveraging the results through a series of views that made it possible to reconstruct the chronology of the disease in each patient and analyze it more intuitively.
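The negation handling and temporal ordering described for the second phase can be illustrated with a small rule-based Python sketch. uQuery is proprietary and its internals are not described here, so this is not its implementation. The concept list, the negation cues, the four-token look-back window, the day/month/year date format, and the rule that a finding inherits the most recent date seen earlier in the note are all assumptions made for the example. Gender management is not covered.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical vocabulary; a real system would take these from the ontology.
CONCEPTS = ["hematuria", "metastasis", "nephrectomy"]
NEGATION_CUES = ["no", "not", "without", "denies", "negative for"]
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # assumed day/month/year


@dataclass
class Finding:
    concept: str
    negated: bool
    when: Optional[date]


def is_negated(sentence: str, concept_start: int, window: int = 4) -> bool:
    """True if a negation cue appears in the few tokens before the concept."""
    preceding = " ".join(sentence[:concept_start].split()[-window:])
    return any(
        re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES
    )


def parse_date(sentence: str) -> Optional[date]:
    """Return the first valid date found in the sentence, if any."""
    match = DATE_RE.search(sentence)
    if match is None:
        return None
    day, month, year = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. 31/02/2015
        return None


def build_timeline(note: str) -> List[Finding]:
    """Extract findings from a note and sort them chronologically."""
    findings: List[Finding] = []
    current: Optional[date] = None
    # Naive sentence split on "." and ";" followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", note.lower()):
        # Assumption: an undated sentence inherits the last date seen.
        current = parse_date(sentence) or current
        for concept in CONCEPTS:
            for match in re.finditer(rf"\b{re.escape(concept)}\b", sentence):
                negated = is_negated(sentence, match.start())
                findings.append(Finding(concept, negated, current))
    # Undated findings go last; the sort is stable for equal dates.
    return sorted(findings, key=lambda f: (f.when is None, f.when or date.min))
```

For a note such as "12/03/2015: patient reports hematuria. No evidence of metastasis. 20/05/2015: radical nephrectomy performed.", `build_timeline` yields hematuria and a negated metastasis finding dated 12 March 2015, followed by nephrectomy dated 20 May 2015. Real clinical notes, with schematic narration and mixed temporal contexts, need far more robust handling than this window-based rule.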
The aim of the work was to expand global knowledge of the behavior of RCCs and to delve deeper into the pathology in order to improve care for patients with these tumors. By applying natural language technology, GMV and Azierta have provided specialists at the Puerta de Hierro Hospital with highly valuable information on, among other things, the diagnostic procedures applied during the study phase, how those procedures have evolved over the years, and how different therapies were approached according to the comorbidities and lifestyle reported by the patient. uQuery, GMV’s Natural Language Processing tool, has made it possible to analyze and exploit the clinical information on patients with Renal Cell Carcinoma, building a timeline of their illness and enabling new clinical approaches.
Author: Paloma López de Arenosa Barbeito, GMV Artificial Intelligence and Big Data Division Data Scientist.