Data and Research Samples

To aid fellow researchers and engineers, here is a list of PUBLIC data sets for benchmarking and working on biomedical image processing and computer vision projects.

Project 1: Medical Image Processing for Diagnostic Medicine (Ophthalmology use case)

Data Sets:

New medical image-based data sets have been created for multi-class and binary classification tasks.

Data Set 1: This local data set was created from the 89 fundus images of the DIARETDB1 data set, which contains images manually marked with bright and red lesions corresponding to diabetic retinopathy. It holds regional lesion information for 15,945 samples, each described by 66 features and assigned to one of 6 classes (class labels 0 through 5).

Download data set at: http://www2.it.lut.fi/project/imageret/diaretdb1/

The class labels have the following meaning:

-Class 0: Bright non-lesion

-Class 1: Hard exudates

-Class 2: Cotton wool spots

-Class 3: Red non-lesion

-Class 4: Microaneurysms

-Class 5: Hemorrhages
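The class labels above can be kept in a small lookup table. The sketch below also shows one possible way to collapse the six classes into a binary task. The lesion versus non-lesion grouping and the names `CLASS_NAMES`, `NON_LESION` and `to_binary` are assumptions made for illustration; they are not taken from the data set or the related paper.

```python
# Class labels exactly as listed for Data Set 1.
CLASS_NAMES = {
    0: "Bright non-lesion",
    1: "Hard exudates",
    2: "Cotton wool spots",
    3: "Red non-lesion",
    4: "Microaneurysms",
    5: "Hemorrhages",
}

# Hypothetical grouping (an assumption): treat both non-lesion
# classes as negatives and every lesion class as a positive.
NON_LESION = {0, 3}


def to_binary(label: int) -> int:
    """Map a 6-class label to 0 (non-lesion) or 1 (lesion)."""
    if label not in CLASS_NAMES:
        raise ValueError(f"unknown class label: {label}")
    return 0 if label in NON_LESION else 1


if __name__ == "__main__":
    for label, name in CLASS_NAMES.items():
        print(label, name, "->", to_binary(label))
```

Other binary groupings, such as bright lesions versus red lesions, would only require changing the set used in `to_binary`.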

The 66 features are:

Area, bounding box lengths (Bb3, Bb4), convex area, eccentricity, equivalent diameter, Euler number, extent, filled area, major and minor axes lengths (Maj, Min), orientation, perimeter, solidity, 12 Gaussian coefficients corresponding to the mean and standard deviation from 6 different variance coefficients, 16 regional intensity coefficients corresponding to the max, min, mean and variance of pixels in the green plane, red plane, hue plane and intensity plane, and 24 gradient intensity coefficients corresponding to the maximum, minimum and mean pixel intensities in first-order and second-order gradient-filtered images from the green, red, hue and saturation planes.
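As a quick check on the description above, the following sketch builds one name per feature and confirms that the groups add up to 66 (14 shape features, 12 Gaussian, 16 regional intensity and 24 gradient intensity coefficients). The name strings and their ordering are hypothetical; the column order in the downloaded data may differ.

```python
# 14 region-shape features named in the description.
SHAPE = [
    "area", "bb3", "bb4", "convex_area", "eccentricity",
    "equivalent_diameter", "euler_number", "extent", "filled_area",
    "major_axis_length", "minor_axis_length", "orientation",
    "perimeter", "solidity",
]

# 12 Gaussian coefficients: mean and standard deviation
# for each of 6 variance settings.
GAUSSIAN = [
    f"gauss_var{i}_{stat}"
    for i in range(1, 7)
    for stat in ("mean", "std")
]

# 16 regional intensity coefficients: 4 statistics on 4 planes.
REGIONAL = [
    f"{plane}_{stat}"
    for plane in ("green", "red", "hue", "intensity")
    for stat in ("max", "min", "mean", "var")
]

# 24 gradient intensity coefficients: 3 statistics on first- and
# second-order gradient images of 4 planes.
GRADIENT = [
    f"{plane}_grad{order}_{stat}"
    for plane in ("green", "red", "hue", "saturation")
    for order in (1, 2)
    for stat in ("max", "min", "mean")
]

# Hypothetical ordering (an assumption): groups concatenated as listed.
FEATURE_NAMES = SHAPE + GAUSSIAN + REGIONAL + GRADIENT

if __name__ == "__main__":
    print(len(SHAPE), len(GAUSSIAN), len(REGIONAL), len(GRADIENT))
    print(len(FEATURE_NAMES))
```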

Other classification data sets for diabetic retinopathy:

  1. DIARETDB0 (https://www.it.lut.fi/project/imageret/diaretdb0/)

  2. MESSIDOR (https://www.adcis.net/en/third-party/messidor/)

  3. MESSIDOR2 (https://www.adcis.net/en/third-party/messidor2/)

Related paper:

Bihis, Matthew; Roychowdhury, Sohini, "A generalized flow for multi-class and binary classification tasks: An Azure ML approach," in 2015 IEEE International Conference on Big Data (Big Data), pp. 1728-1737, Oct. 29-Nov. 1, 2015. doi: 10.1109/BigData.2015.7363944


Data Set 2:

This data set supports pixel-based binary classification of fine blood vessels. Each sample has 98 features, which are explained in the following paper:

Roychowdhury, Sohini. "Classification of Large-Scale Fundus Image Data Sets: A Cloud-Computing Framework." arXiv preprint arXiv:1603.08071 (2016).

Download Data sets:

  1. DRIVE (https://drive.grand-challenge.org/)

  2. STARE (https://cecas.clemson.edu/~ahoover/stare/)

  3. CHASEDB1 (https://www.kaggle.com/khoongweihao/chasedb1)

Related Publications and Segmentation Results:

[J1] S. Roychowdhury, D.D. Koozekanani, K.K. Parhi. “Iterative Vessel Segmentation of Fundus Images”, IEEE Transactions on Biomedical Engineering. vol.62, no.7, pp.1738-1749, July 2015. [Supplementary Material].

[J2] S. Roychowdhury, D.D. Koozekanani, K.K. Parhi. “Blood Vessel Segmentation of Fundus Images by Major Vessel Extraction and Sub-image Classification”, IEEE Journal on Biomedical and Health Informatics, vol. 19, no. 3, pp. 1118–1128, May 2015. [Nominated for Journal Cover].

[J3] S. Roychowdhury, D. D. Koozekanani, S. N. Kuchinka and K. K. Parhi, "Optic Disc Boundary and Vessel Origin Segmentation of Fundus Images," in IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 6, pp. 1562-1574, Nov. 2016. doi: 10.1109/JBHI.2015.2473159


Project 2: Segmentation of Histopathology images

The goal of this project is semantic segmentation of pathology in medical images.

Data Set 1: Segmentation of Neural Structures (http://brainiac2.mit.edu/isbi_challenge/)

Data Set 2: Breast Histopathology Images (https://www.kaggle.com/paultimothymooney/breast-histopathology-images)

Other Recent Kaggle data sets:

  1. https://www.kaggle.com/mateuszbuda/lgg-mri-segmentation

  2. https://www.kaggle.com/bachrr/covid-chest-xray?select=images

Related Code:

  1. https://github.com/sohiniroych/Explainable_End2End_Aug

  2. https://github.com/sohiniroych/Unet-using-TF2

  3. https://github.com/sohiniroych/U-net-for-Multi-class-semantic-segmentation


Project 3: Camera-based Estimation of Road Friction

The goal of this project is to predict road conditions using front-view camera images.

Paper: S. Roychowdhury, M. Zhao, A. Wallin, N. Ohlsson and M. Jonasson, "Machine Learning Models for Road Surface and Friction Estimation using Front-Camera Images," 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8, doi: 10.1109/IJCNN.2018.8489188. https://ieeexplore.ieee.org/abstract/document/8489188/

YouTube data sets evaluated: List of YouTube videos for analysis

Project 4: Computer Vision for Autonomous DRIVE

The goal of this project is to detect cars and pedestrians in videos and to perform tasks such as day-to-night transformation, pedestrian intention prediction and SLAM.

Data Set 1: KITTI (http://www.cvlibs.net/datasets/kitti/)

Data Set 2: JAAD (http://data.nvision2.eecs.yorku.ca/JAAD_dataset/)

Related Papers:

  1. Piccoli, Francesco, et al. "FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network." arXiv preprint arXiv:2005.07796 (2020). (https://arxiv.org/abs/2005.07796)

  2. Chowdhury, Sohini Roy, et al. "Automated augmentation with reinforcement learning and GANs for robust identification of traffic signs using front camera images." 2019 53rd Asilomar Conference on Signals, Systems, and Computers. IEEE, 2019. (https://ieeexplore.ieee.org/abstract/document/9049005)