
Project: AI for histopathology of melanoma


Melanoma is a form of skin cancer that originates in melanin-producing cells known as melanocytes. While other skin cancer types occur more frequently, melanoma is the most dangerous due to the high likelihood of metastasis if not treated early. The incidence rate of melanoma has increased over the past decades.

To determine whether a melanocytic lesion is benign (i.e., a nevus) or malignant (i.e., a melanoma), the lesion is first surgically removed. The tissue is then prepared and cut into cross-sections, which are stained and scanned using a microscope. The scanned microscopy images are examined by a pathologist, who is responsible for diagnosing the lesion and recommending follow-up treatment if necessary. If the diagnosis is still unclear based on the regular hematoxylin & eosin (H&E) staining, so-called immunohistochemical (IHC) stainings can be performed, which highlight specific features of the lesion. If uncertainty remains, pathologists also have the option to perform molecular tests on the lesion to determine whether there are genetic mutations that indicate a lesion type.

One of the challenges that pathologists currently face is an increasing workload as a consequence of the growing number of cases that need to be examined. A potential solution is the implementation of automated methods that reduce the total time spent per case. For example, a predictive model that assigns the more challenging cases directly to the pathologist with the most expertise could make the workflow more efficient, and a machine learning tool for time-consuming tasks such as counting specific cells could shorten the time needed to diagnose a lesion.

We have been working on the curation of a large-scale, high-quality dataset of melanocytic lesions. At the moment, this dataset includes 52,000+ H&E stained images (and 10,000+ uncurated IHC stained images) from 21,000+ patients. One of the unique features of microscopy images is the high resolution, e.g., a single image can be 150,000 by 150,000 pixels. In addition to the image data, text descriptions of the lesion characteristics and diagnosis written by a pathologist are available.
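To give a sense of what this resolution means in practice, the short calculation below estimates the uncompressed size of one such image and the number of tiles it would be split into. It is a rough sketch only: the 8-bit RGB encoding and the tile size of 512 pixels are assumptions made for illustration, not properties of the dataset.

```python
import math


def uncompressed_size_gb(width: int, height: int,
                         channels: int = 3, bytes_per_channel: int = 1) -> float:
    """Size of the raw pixel array in gigabytes (10**9 bytes).

    The defaults assume 8-bit RGB, which is an assumption, not a dataset fact.
    """
    return width * height * channels * bytes_per_channel / 1e9


def tile_count(width: int, height: int, tile: int) -> int:
    """Number of non-overlapping square tiles needed to cover the image."""
    return math.ceil(width / tile) * math.ceil(height / tile)


size_gb = uncompressed_size_gb(150_000, 150_000)
tiles = tile_count(150_000, 150_000, tile=512)  # tile size is an assumption
print(f"{size_gb:.1f} GB uncompressed, {tiles} tiles of 512 x 512 pixels")
# prints "67.5 GB uncompressed, 85849 tiles of 512 x 512 pixels"
```

Numbers of this order are why models for such images usually operate on tiles or on lower-resolution levels rather than on the full image at once.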

Potential projects
The projects are in collaboration with researchers from the department of Biomedical Engineering at the TU/e and the department of Pathology at the UMC Utrecht.

1. Pathology Report Segmentation for Image-Conditioned Report Generation
Pathology reports typically include a description of the patient information, lesion information based on the H&E stained images (both the characteristic visual features and measurements), optionally lesion information based on IHC stained images and the results from molecular tests, as well as a final diagnosis and treatment recommendation. We would like to train a language model for text segmentation, which is able to classify all sentences of a report based on the type of information that each sentence contains.

The end goal is to train a second language model to write part of the pathology report based on the image information. The text segmentation model is needed to select only the information from the report that can be derived from the image (e.g., excluding the findings from molecular tests).
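The intended input and output of the segmentation step can be sketched as follows. This is only an illustration of the interface: the keyword rules stand in for the language model that the project would train, the label names are hypothetical, the example report is invented and has no clinical meaning, and which labels count as derivable from the image is an assumption exposed as a parameter.

```python
import re
from typing import FrozenSet, List

# Hypothetical keyword rules, checked in order. A trained language model
# would replace this lookup; the rules only make the sketch runnable.
KEYWORD_RULES = [
    ("molecular", ("mutation", "sequencing", "molecular")),
    ("ihc_findings", ("immunohistochem",)),
    ("diagnosis", ("diagnosis", "conclusion")),
    ("patient_info", ("year-old", "patient")),
]


def split_sentences(report: str) -> List[str]:
    """Naive split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def classify_sentence(sentence: str) -> str:
    """Return the first label whose keywords occur in the sentence."""
    lowered = sentence.lower()
    for label, keywords in KEYWORD_RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    return "he_findings"  # fallback label, an assumption of this sketch


def image_derivable_sentences(
    report: str, keep: FrozenSet[str] = frozenset({"he_findings"})
) -> List[str]:
    """Keep only the sentences whose label is in `keep`."""
    return [s for s in split_sentences(report) if classify_sentence(s) in keep]


# Invented example text, for illustration only.
EXAMPLE_REPORT = (
    "Patient is a 54-year-old male. "
    "The lesion shows asymmetric nests of atypical melanocytes. "
    "Sequencing shows a BRAF mutation. "
    "Diagnosis: melanoma."
)

print(image_derivable_sentences(EXAMPLE_REPORT))
```

The filtered sentences would then serve as the target text when training the report generation model on the corresponding images.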

2. Extending the Melanocytic Lesion Classification Model to IHC Stained Images
At the moment, we are developing predictive models for melanocytic lesion classification based only on H&E stained images. We expect that the model performance can increase substantially for some lesion types by also incorporating the information from IHC stained images. These images highlight specific features of a lesion, for example the invasiveness, malignancy, or genetic aberrations that are indicative of specific lesion types. In brief, this project is expected to consist of the following steps:
Dataset curation: Performing quality assurance by selecting the IHC images to include. We have already developed and used a selection tool to streamline this process for H&E stained images.
Tissue segmentation: Fine-tuning a tissue segmentation model we developed for H&E on a small set of IHC stained images.
Classification model: Training a classification model that incorporates both H&E and IHC stained image information. The main challenge is that not all IHC stainings are performed on all lesions.
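One simple way to cope with missing stainings in the last step is to average the feature vectors of the stains that are available and to append an availability mask, so that a downstream classifier can tell which stains contributed. The sketch below shows this idea in plain Python. It is one possible baseline and not the method chosen for the project; the stain names and the assumption that every stain yields a feature vector of the same length are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stain names; the actual set of IHC stainings is not specified here.
STAINS: Tuple[str, ...] = ("HE", "IHC_A", "IHC_B")


def fuse_features(
    features: Dict[str, Optional[Sequence[float]]],
    stains: Sequence[str] = STAINS,
) -> List[float]:
    """Average the available per-stain feature vectors and append a 0/1 mask.

    A stain that was not performed is passed as None or left out of the dict.
    """
    available = [features[s] for s in stains if features.get(s) is not None]
    if not available:
        raise ValueError("at least one stain must be available")
    dim = len(available[0])
    if any(len(vector) != dim for vector in available):
        raise ValueError("all feature vectors must have the same length")
    # Element-wise mean over the stains that are present.
    mean = [sum(vector[i] for vector in available) / len(available)
            for i in range(dim)]
    # Availability mask, one entry per stain in the order given by `stains`.
    mask = [1.0 if features.get(s) is not None else 0.0 for s in stains]
    return mean + mask


fused = fuse_features({"HE": [1.0, 3.0], "IHC_A": [3.0, 5.0], "IHC_B": None})
print(fused)  # prints [2.0, 4.0, 1.0, 1.0, 0.0]
```

Alternatives such as learned attention over stains or training with randomly dropped stains address the same problem and may be better suited, depending on how often each staining is missing.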

Sibylle Hess
Secondary supervisor
Mitko Veta