Sergio Peñafiel

Automatic Support System for Tumor Coding in Pathology Reports in Spanish


Sergio Peñafiel holds a degree in Computer Engineering and a Master's in Computer Science from the Universidad de Chile, Santiago, Chile. He is currently head of the Medical Informatics team at the Arturo López Pérez Cancer Institute, Santiago, Chile. He is also the CEO of Create, a startup focused on healthcare applications and interoperability in Latin America, and an external expert professor at the Department of Computer Science of the Universidad de Chile. His research interests include artificial intelligence, interpretability in machine learning, decision support systems, and the application of technology in healthcare.

Description of the Talk:

Pathology reports provide valuable information that cancer registries use to understand, plan, and implement strategies to mitigate the impact of cancer. However, coding key information from these unstructured reports is a time-consuming manual process carried out by experts. Here we present an automatic, deep learning-based system that recognizes tumor morphology and topography mentions in Spanish free text and suggests codes from the International Classification of Diseases for Oncology (ICD-O). This work combined an in-house corpus annotated with tumor morphology and topography mentions and the CANTEMIST (CANcer TExt Mining Shared Task – tumor named entity recognition) corpus, an open-source dataset annotated with tumor morphology mentions. Specifically, we built a Named Entity Recognition (NER) model with a BiLSTM-CRF architecture and applied transfer learning from state-of-the-art pre-trained language models to obtain high-quality contextual representations, which improved entity detection. The mentions detected by this model were then coded using a search engine tailored to ICD-O codes. Our NER models achieved F1 scores of 0.86 for tumor morphology and 0.90 for topography. The complete automatic coding system achieved an accuracy at five suggestions (the share of mentions whose correct code appears among the top five suggestions) of 0.72 for tumor morphology and 0.65 for topography. These results demonstrate the feasibility of integrating natural language processing (NLP) tools into the routine of a cancer center to extract and code valuable information from pathology reports.
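The coding step and the "accuracy at five suggestions" metric can be illustrated with a small sketch. The talk does not describe how the ICD-O search engine ranks candidates, so the code below substitutes a simple character-trigram cosine similarity, an assumption and not the authors' method. It searches a hypothetical six-entry excerpt of Spanish ICD-O-3 morphology terms, written without accents. The names `ICDO_MORPHOLOGY`, `suggest_codes`, and `accuracy_at_k` are illustrative only.

```python
from collections import Counter
import math

# Hypothetical excerpt of a Spanish ICD-O-3 morphology dictionary (code -> term).
# A real system would index the full vocabulary and its synonyms.
ICDO_MORPHOLOGY = {
    "8010/3": "carcinoma",
    "8070/3": "carcinoma de celulas escamosas",
    "8140/3": "adenocarcinoma",
    "8500/3": "carcinoma ductal infiltrante",
    "8520/3": "carcinoma lobulillar",
    "8720/3": "melanoma maligno",
}


def char_ngrams(text, n=3):
    """Bag of character n-grams; padding with spaces encodes word boundaries."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_codes(mention, dictionary, k=5):
    """Stand-in search engine: rank codes by similarity to a detected mention."""
    query = char_ngrams(mention)
    ranked = sorted(dictionary,
                    key=lambda code: cosine(query, char_ngrams(dictionary[code])),
                    reverse=True)
    return ranked[:k]


def accuracy_at_k(gold_codes, suggestions, k=5):
    """Fraction of mentions whose gold code appears among the first k suggestions."""
    hits = sum(gold in sugg[:k] for gold, sugg in zip(gold_codes, suggestions))
    return hits / len(gold_codes)


if __name__ == "__main__":
    # Mentions as a NER model might extract them from a report (hypothetical).
    mentions = ["adenocarcinoma moderadamente diferenciado",
                "carcinoma ductal infiltrante",
                "melanoma"]
    gold = ["8140/3", "8500/3", "8720/3"]
    suggestions = [suggest_codes(m, ICDO_MORPHOLOGY) for m in mentions]
    print(accuracy_at_k(gold, suggestions, k=5))  # → 1.0
```

Character n-grams make this toy ranker tolerant of extra qualifiers and small spelling variations: "adenocarcinoma moderadamente diferenciado" still ranks "adenocarcinoma" first. Returning five candidates instead of one lets a human coder choose among plausible codes, which is why the system's performance is reported as accuracy at five suggestions.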