Human Protein Atlas - Single Cell Classification
Find individual human cell differences in microscope images
Introduction:
Project Objective : The main objective of this project is to predict protein organelle localization labels for each cell in an image. In other words, we have to predict the type of protein present in a given image using a neural network.
About Dataset:
The Human Protein Atlas is an initiative based in Sweden aimed at mapping proteins in all human cells, tissues, and organs. The data in the Human Protein Atlas database is freely accessible to scientists around the world, allowing them to explore the cellular makeup of the human body. Solving the single-cell image classification challenge will help characterize single-cell heterogeneity in this large collection of images by generating more accurate annotations of the subcellular localizations of thousands of human proteins in individual cells. With the neural network architecture we construct, we will be able to model the spatial organization of the human cell more accurately and provide new open-access cellular data to the scientific community, which may accelerate our growing understanding of how human cells function and how diseases develop.
Training Dataset : https://www.kaggle.com/thedrcat/hpa-cell-tiles-sample-balanced-dataset
Dataset Source: https://www.kaggle.com/c/hpa-single-cell-image-classification/overview
Files Details:
train - training images (.tif format)
test - test images (.png format); we need to use the model to label the images in this folder
train.csv - image filenames and image-level labels for the training set
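As a minimal sketch of how the image-level labels in train.csv can be loaded, assuming the competition's format of an ID column and a pipe-separated Label column (the inline sample here is hypothetical, not real data):

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row sample mimicking train.csv's layout
sample = StringIO("ID,Label\nimg_001,0|2\nimg_002,16\n")
df = pd.read_csv(sample)

# Split the pipe-separated label string into a list of integer label IDs
df["labels"] = df["Label"].astype(str).str.split("|").apply(
    lambda xs: [int(x) for x in xs]
)
print(df[["ID", "labels"]])
```

Each image can carry several labels at once, which is why the task is multi-label rather than single-label classification.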
My Prediction:
We are predicting protein organelle localization labels for each cell in the image. The final output will also report the count of each cell type present in every image.
There are in total 19 different labels present in the dataset (18 labels for specific locations, and label 18 for negative and unspecific signal). The dataset is acquired in a highly standardized way using one imaging modality (confocal microscopy). However, the dataset comprises 17 different cell types of highly different morphology, which affect the protein patterns of the different organelles. All image samples are represented by four filters (stored as individual files), the protein of interest (green) plus three cellular landmarks: nucleus (blue), microtubules (red), endoplasmic reticulum (yellow). The green filter should hence be used to predict the label, and the other filters are used as references. The labels are represented as integers that map to the following:
0. Nucleoplasm
1. Nuclear membrane
2. Nucleoli
3. Nucleoli fibrillar center
4. Nuclear speckles
5. Nuclear bodies
6. Endoplasmic reticulum
7. Golgi apparatus
8. Intermediate filaments
9. Actin filaments
10. Microtubules
11. Mitotic spindle
12. Centrosome
13. Plasma membrane
14. Mitochondria
15. Aggresome
16. Cytosol
17. Vesicles and punctate cytosolic patterns
18. Negative
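The label mapping above can be captured as a small lookup table, together with a helper that encodes an image's label list as a multi-hot vector for multi-label training (the function name is an illustrative choice, not from the original code):

```python
# Integer label -> organelle name, as listed above
LABEL_NAMES = {
    0: "Nucleoplasm", 1: "Nuclear membrane", 2: "Nucleoli",
    3: "Nucleoli fibrillar center", 4: "Nuclear speckles", 5: "Nuclear bodies",
    6: "Endoplasmic reticulum", 7: "Golgi apparatus", 8: "Intermediate filaments",
    9: "Actin filaments", 10: "Microtubules", 11: "Mitotic spindle",
    12: "Centrosome", 13: "Plasma membrane", 14: "Mitochondria",
    15: "Aggresome", 16: "Cytosol",
    17: "Vesicles and punctate cytosolic patterns", 18: "Negative",
}

def to_multi_hot(labels, num_classes=19):
    """Encode a list of label IDs as a multi-hot target vector."""
    vec = [0.0] * num_classes
    for label in labels:
        vec[label] = 1.0
    return vec

print(to_multi_hot([0, 16]))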
Exploratory Data Analysis
This is how the actual data looks, but we will use a modified dataset, shown below, to train our models.
Balanced Dataset For Training:
The only modification to the dataset is that the images are split into individual cells, each paired with its own label, to train our models.
We consider only cells labeled 0-9 for training, in order to reduce the dataset size and obtain better training results.
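This filtering step can be sketched with pandas, assuming a hypothetical cell-level frame in which each segmented cell tile carries a single integer label (the column names here are illustrative):

```python
import pandas as pd

# Hypothetical cell-level frame: one row per segmented cell tile
cells = pd.DataFrame({
    "cell_id": ["c1", "c2", "c3", "c4"],
    "label":   [0, 14, 9, 18],
})

# Keep only labels 0-9 to shrink the dataset for training
subset = cells[cells["label"] < 10].reset_index(drop=True)
print(subset)
```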
DataFlow Diagram of the Project
Models Used for Training:
ResNet50:
The architecture of ResNet50 is shown below.
ResNet50 is a variant of the ResNet model with 48 convolution layers plus 1 max-pool and 1 average-pool layer.
The last layer of this ResNet50 is a fully connected layer whose output size equals the number of classes.
After training ResNet50 for 5 epochs, the results were as shown below.
The final accuracy of the model was 15%
Custom CNN model:
The architecture of the custom CNN model is shown below.
This custom CNN includes 3 convolution layers, 1 max-pool layer, and 1 linear layer.
Finally, a softmax layer is used for label prediction.
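A sketch of one plausible shape for this custom CNN (layer widths and input size are assumptions, since the report does not specify them); the model returns logits, and the softmax is applied at prediction time, since PyTorch's CrossEntropyLoss applies log-softmax internally during training:

```python
import torch
import torch.nn as nn

class CustomCNN(nn.Module):
    """3 convolution layers, 1 max-pool, 1 linear layer, as described above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Assumes 64x64 input images -> 32x32 after pooling
        self.classifier = nn.Linear(64 * 32 * 32, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # logits; softmax applied at inference

model = CustomCNN()
logits = model(torch.randn(2, 3, 64, 64))
probs = torch.softmax(logits, dim=1)  # the final softmax for label prediction
```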
This custom CNN model produced the following results when trained for 10 epochs.
The final accuracy of the model was 30%
EfficientNet:
As the graph shows, EfficientNet had better results compared to ResNet-50.
So, I trained my data using EfficientNet-B1.
The last layer of this EfficientNet is a fully connected layer whose output size equals the number of classes.
EfficientNet Results:
EfficientNet had the following results after training for 10 epochs.
The final loss was close to 0.13, the best of all the previous models.
The final accuracy of the EfficientNet model was close to 65%.
Instance Segmentation Model:
We use a pretrained HPA instance segmentation model: its function is to segment all the cells in an image into individual cells.
We can see here how the HPA model segments individual cells from an image containing multiple cells.
Prediction:
Here the HPA cell segmentation model segments each individual cell, which is then sent to the trained EfficientNet model for prediction of its cell type.
Below we can see the prediction for each cell using the trained EfficientNet model, along with the count of each cell type present in the provided images.
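The classify-and-count step can be sketched as follows. This is a sketch under stated assumptions: the cell crops would come from the HPA segmentation model and the classifier would be the trained EfficientNet-B1; here random tensors and a tiny stand-in network are used so the example is self-contained:

```python
from collections import Counter
import torch
import torch.nn as nn

# Names for labels 0-9, the subset used for training
LABELS_0_9 = [
    "Nucleoplasm", "Nuclear membrane", "Nucleoli", "Nucleoli fibrillar center",
    "Nuclear speckles", "Nuclear bodies", "Endoplasmic reticulum",
    "Golgi apparatus", "Intermediate filaments", "Actin filaments",
]

def count_cell_types(cell_crops, classifier):
    """Classify each segmented cell crop and tally label counts per image."""
    with torch.no_grad():
        preds = [
            classifier(crop.unsqueeze(0)).argmax(dim=1).item()
            for crop in cell_crops
        ]
    return Counter(LABELS_0_9[p] for p in preds)

# Stand-in classifier and crops (the trained EfficientNet-B1 and real
# segmented cells would be used in practice)
toy_classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
crops = [torch.randn(3, 8, 8) for _ in range(5)]
counts = count_cell_types(crops, toy_classifier)
print(counts)
```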
References:
Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation. Yun Liu, Yu-Huan Wu, Peisong Wen, Yujun Shi, Yu Qiu, and Ming-Ming Cheng. http://mftp.mmcheng.net/Papers/21PAMI_InsImgDatasetWSIS.pdf
Learning to Segment Object Candidates. Pedro O. Pinheiro, Ronan Collobert, and Piotr Dollar. https://papers.nips.cc/paper/2015/file/4e4e53aa080247bc31d0eb4e7aeb07a0-Paper.pdf
Weakly Supervised Instance Segmentation Using Class Peak Response. Yanzhao Zhou, Yi Zhu, Qixiang Ye, Qiang Qiu, and Jianbin Jiao. https://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Weakly_Supervised_Instance_CVPR_2018_paper.pdf
Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation. Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, and Shi-Min Hu. https://mftp.mmcheng.net/Papers/18ECCVGraphPartition.pdf