Computer Vision for Medicine

Healthcare & Life Sciences
Artificial Intelligence & Machine Learning

Our client is a major international pharmaceutical company, which conducts research and development activities related to a wide range of human medical disorders, including mental illness, neurological disorders, anaesthesia and analgesia, gastrointestinal disorders, fungal infection, allergies, and cancer.


The goal of the pilot project was to assess the feasibility of automating the scoring of histology slides. Pathologists score histology slides to assess the activity of inflammatory bowel disease (IBD), and this project focused on Crohn's disease. Manual scoring requires a trained pathologist and is time-consuming.

As input, we had about 1,500 biopsies with accompanying metadata. The biopsies had been labelled by an expert pathologist, and the slides were stained with hematoxylin and eosin (H&E).

Our task was to build a system that automatically assigns class labels to new biopsies. The class labels correspond to abnormalities defined by the Global Histology Activity Score (GHAS). GHAS defines several scoring components, of which three were in scope for this project: epithelial damage, infiltration of mononuclear cells in the lamina propria (LP), and infiltration of polymorphonuclear cells in the LP.


Solving the task involved three subtasks:

  1. Semantic segmentation: identifying regions of interest in the images. For the two scores related to the lamina propria, we first had to locate the lamina propria in each image. To do this, we trained a U-Net convolutional neural network (CNN) on a selected subset of biopsies in which the lamina propria had been annotated manually, and then used it to identify the lamina propria in the remaining biopsies. For epithelial damage, individual cell nuclei were first located using a separate CNN.
  2. Feature extraction: representing the segmented regions as numeric vectors suitable for classification. We used a ResNet pre-trained on the ImageNet dataset as a feature extractor for all three scoring components; the components differed only in what was fed into the network. For the LP components, random square patches were extracted from the lamina propria. For epithelial damage, small square patches centred on epithelial nuclei were used. The resulting patch-level feature vectors were then pooled into a single feature vector per biopsy, which served as input to the final classifier.
  3. Classification: we trained three classifiers, one for each scoring component (epithelial damage, mononuclear cells in LP, and polymorphonuclear cells in LP). Each was a simple fully connected neural network.
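The patch sampling described in steps 1 and 2 can be sketched in NumPy. This is a minimal illustration rather than the project's code: it assumes the slide region and its lamina propria mask are already in memory as arrays (for example, a region read with openslide and a mask produced by the segmentation network), and the function names, the `min_coverage` threshold and the rule of skipping nuclei near the image border are hypothetical choices.

```python
import numpy as np


def sample_masked_patches(image, mask, patch_size, n_patches,
                          min_coverage=0.9, max_tries=10_000, seed=None):
    """Draw random square patches that lie mostly inside `mask`.

    image: (H, W, C) array; mask: (H, W) boolean array, e.g. a lamina
    propria segmentation. Returns (k, patch_size, patch_size, C) with
    k <= n_patches (fewer if the mask is too small to fill the quota).
    """
    rng = np.random.default_rng(seed)
    h, w = mask.shape
    empty = np.empty((0, patch_size, patch_size, image.shape[2]), image.dtype)
    if h < patch_size or w < patch_size:
        return empty
    patches = []
    for _ in range(max_tries):
        if len(patches) == n_patches:
            break
        # Random top-left corner such that the patch fits inside the image.
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        window = mask[y:y + patch_size, x:x + patch_size]
        # Keep the patch only if enough of its pixels are inside the mask.
        if window.mean() >= min_coverage:
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches) if patches else empty


def nucleus_centred_patches(image, centroids, patch_size):
    """Crop a square patch centred on each (row, col) nucleus centroid.

    Nuclei too close to the border for a full-size patch are skipped.
    """
    half = patch_size // 2
    h, w = image.shape[:2]
    patches = []
    for r, c in centroids:
        top, left = int(r) - half, int(c) - half
        if top >= 0 and left >= 0 and top + patch_size <= h and left + patch_size <= w:
            patches.append(image[top:top + patch_size, left:left + patch_size])
    if not patches:
        return np.empty((0, patch_size, patch_size, image.shape[2]), image.dtype)
    return np.stack(patches)
```

Requiring most of each patch to fall inside the mask keeps the LP patches from being dominated by epithelium or background, while sampling at random positions gives a varying view of each biopsy.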

Python libraries used: Keras, TensorFlow, openslide, scikit-learn.
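The ResNet feature extractor, per-biopsy pooling and fully connected classifier can be sketched in Keras. Several details are assumptions not stated in the write-up: the ResNet50 variant, global average pooling of the final feature map, mean pooling across patches, the hidden layer size, the dropout rate and the number of score classes are all hypothetical. With `weights="imagenet"`, Keras downloads the ImageNet-pretrained weights on first use.

```python
import numpy as np
from tensorflow import keras


def build_feature_extractor(patch_size=224, weights="imagenet"):
    # ResNet50 without its ImageNet classification head. With pooling="avg",
    # global average pooling turns each patch into a 2048-dim feature vector.
    return keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=(patch_size, patch_size, 3),
        pooling="avg",
    )


def biopsy_feature_vector(extractor, patches):
    # patches: (N, H, W, 3) RGB array with values in 0..255.
    x = keras.applications.resnet50.preprocess_input(patches.astype("float32"))
    patch_features = extractor.predict(x, verbose=0)  # (N, 2048)
    # Mean pooling (an assumption) collapses all patches into one vector.
    return patch_features.mean(axis=0)  # (2048,)


def build_classifier(n_classes, n_features=2048, hidden_units=128):
    # Small fully connected network: pooled feature vector -> class probabilities.
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(hidden_units, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Using the pre-trained network as a fixed feature extractor means only the small dense network has to be trained, which is one plausible reason this approach suits a dataset of around 1,500 biopsies. In this sketch, one classifier would be built per scoring component, each fed the pooled vectors from that component's patches.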


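We achieved weighted F1 scores of 0.76–0.81 across the three scoring components. The F1 score is the harmonic mean of precision and recall and ranges from 0 to 1; the weighted variant averages the per-class F1 scores, weighting each class by its number of true instances (its support).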
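To make the metric concrete, the snippet below computes a weighted F1 score by hand on toy labels (not project data) and checks it against scikit-learn's `f1_score` with `average="weighted"`. Per-class F1 is written as 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean of precision and recall.

```python
import numpy as np
from sklearn.metrics import f1_score


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class's F1 by how many true samples it has.
        total += f1 * np.sum(y_true == c)
    return total / len(y_true)


# Toy example: three classes with supports 3, 2 and 1.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(round(weighted_f1(y_true, y_pred), 4))                     # 0.8333
print(round(f1_score(y_true, y_pred, average="weighted"), 4))    # 0.8333
```

Here classes 0 and 1 each have F1 = 0.8 and class 2 has F1 = 1.0, so the weighted score is (3 × 0.8 + 2 × 0.8 + 1 × 1.0) / 6 ≈ 0.833.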

The pilot showed that automating the scoring of histology slides is feasible. Further work may improve scoring accuracy and bring the system closer to use as an automated second opinion.