The dataset consists of 1010 data points. They come from the LIDC-IDRI dataset, which consists of chest fan-beam computed tomography (CT) scans focused on lung nodule analysis. In this work, we use those images as ground truth and simulate Cone Beam Computed Tomography (CBCT) sinograms [∗] from them using the ASTRA toolbox [1]. Because of the size of the required dataset, the simulations are simplified (i.e. they are not Monte Carlo physics-based simulations). To approximate real scanner noise, they include photon, electronic, and quantization noise models, together with a simplified flat-field correction and scatter simulator. We provide data for two dose levels: 100% of the clinical dose [∗∗] and 10% of the clinical dose.
The simulations are divided into 800 training samples, 100 validation samples, and 110 test samples. We provide only the training and validation datasets. For each sample, these include the ground truth, two sinograms, and two noisy reconstructions computed with the FDK algorithm (the cone-beam equivalent of FBP), one for each of the two noise levels described above. You are free to mix and match these datasets as you please, but we provide them separately for convenience. More information on this is given at the bottom of this page.
It is important to clarify that the “ground truth” refers to an over-sampled noiseless reconstruction.
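As a rough illustration of the kind of simplified noise model described above, the sketch below applies photon (Poisson) noise, additive Gaussian electronic noise, and detector quantization to a noiseless sinogram of line integrals, followed by a flat-field correction. This is not the code used to generate the challenge data: the function name `add_ct_noise`, the electronic noise level `sigma_e`, the clipping to one count, and the omission of scatter are all assumptions made for illustration. Only the photon counts (about 10^5 for clinical dose and 10% of that for low dose) come from the description on this page.

```python
import numpy as np


def add_ct_noise(sino, i0=1e5, sigma_e=10.0, seed=None):
    """Hypothetical simplified noise model for a sinogram of line integrals."""
    rng = np.random.default_rng(seed)
    # Beer-Lambert law: expected photon counts behind the object.
    expected = i0 * np.exp(-sino)
    # Photon noise: the detected counts are Poisson distributed.
    counts = rng.poisson(expected).astype(np.float64)
    # Electronic noise: additive zero-mean Gaussian (sigma_e is an assumed value).
    counts += rng.normal(0.0, sigma_e, size=counts.shape)
    # Quantization: integer detector readout, clipped to 1 so the log stays finite.
    counts = np.clip(np.round(counts), 1.0, None)
    # Flat-field correction: normalise by the unattenuated intensity and take the log.
    return -np.log(counts / i0)


# Toy sinogram with a constant line integral of 2 (shape chosen arbitrarily).
clean = np.full((4, 360, 4), 2.0)
clinical = add_ct_noise(clean, i0=1e5, seed=0)  # 100% dose
low = add_ct_noise(clean, i0=1e4, seed=0)       # 10% dose
print(np.std(clinical - clean), np.std(low - clean))
```

Under these assumptions, the low-dose sinogram shows a noticeably larger error than the clinical-dose one, because the relative Poisson noise grows as the photon count drops.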
How to get the data:
Please register for the challenge using this form. Hyperlinks to the training and validation data will be available to you upon successful registration.
We recommend using zenodo_get to make downloading the files easier.
Data Description:
Once extracted, the data is stored in individual .npy files, each containing a NumPy array, which can be loaded with:
import numpy as np
data = np.load(filename, allow_pickle=True)
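Given the size of the dataset, it can help to avoid reading whole files into memory when only part of an array is needed. The sketch below uses NumPy's `mmap_mode` argument for that. It assumes the files hold plain numeric arrays (memory mapping does not work for pickled object arrays), and it writes a small stand-in file with a hypothetical name so that the example is self-contained.

```python
import os
import tempfile

import numpy as np

# Stand-in file so the example runs on its own; replace with a real dataset file.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "example_volume.npy")  # hypothetical file name
np.save(filename, np.zeros((64, 64, 64), dtype=np.float32))

# mmap_mode="r" maps the file read-only instead of loading it all into RAM.
data = np.load(filename, mmap_mode="r")
print(data.shape, data.dtype)

# Only the requested slice is read from disk and copied into memory.
central_slice = np.array(data[32])
print(central_slice.shape)
```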
The filenames have the following naming convention:
{patient ID}_clean_fdk_256.npy → Ground truth/target
{patient ID}_fdk_clinical_dose_256.npy → FDK for clinical dose
{patient ID}_fdk_low_dose_256.npy → FDK for low dose
{patient ID}_sino_clinical_dose.npy → Sinogram for clinical dose
{patient ID}_sino_low_dose.npy → Sinogram for low dose
Note: The training data is 217 GB compressed and 291 GB once extracted. The validation data is 28 GB compressed and 37 GB once extracted.
Note: The dataset is missing 2 patients (237 and 584).
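The naming convention above can be wrapped in a small helper that builds the five file paths for one patient and lists the patients actually present in a folder. Scanning the folder avoids any assumption about which IDs exist, so the two missing patients are skipped automatically. The names `SUFFIXES`, `sample_paths`, and `available_patients` are hypothetical, and the patient ID is treated as an opaque string taken from the filename.

```python
import glob
import os

# File suffixes from the naming convention, keyed by short (hypothetical) labels.
SUFFIXES = {
    "target": "clean_fdk_256.npy",
    "fdk_clinical": "fdk_clinical_dose_256.npy",
    "fdk_low": "fdk_low_dose_256.npy",
    "sino_clinical": "sino_clinical_dose.npy",
    "sino_low": "sino_low_dose.npy",
}


def sample_paths(folder, patient_id):
    """Return the paths of all five files belonging to one patient."""
    return {
        key: os.path.join(folder, f"{patient_id}_{suffix}")
        for key, suffix in SUFFIXES.items()
    }


def available_patients(folder):
    """List the patient IDs that have a ground-truth file in `folder`."""
    tail = "_" + SUFFIXES["target"]
    matches = glob.glob(os.path.join(folder, "*" + tail))
    # Strip the suffix from each file name to recover the patient ID.
    return sorted(os.path.basename(path)[: -len(tail)] for path in matches)
```

For example, `sample_paths('/your_folder/train/', '0000')['sino_low']` gives `/your_folder/train/0000_sino_low_dose.npy`.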
Example of use:
You are free to use this data for the challenge as you please. The code snippet that follows gives an example of how to compute the FDK reconstruction from the sinograms using PyTorch, tomosipo [2], and ts_algorithms [3]. Tomosipo is a PyTorch-compatible ASTRA wrapper.
import torch
import numpy as np
import tomosipo as ts
from ts_algorithms import fdk
# optional, visualization
import matplotlib.pyplot as plt
# define geometry of CBCT scan
image_size = [300, 300, 300]
image_shape = [256, 256, 256]
voxel_size = [1.171875, 1.171875, 1.171875]
detector_shape = [256, 256]
detector_size = [600, 600]
pixel_size = [2.34375, 2.34375]
dso = 575
dsd = 1050
angles = np.linspace(0, 2*np.pi, 360, endpoint=False)
# Create a tomosipo operator
vg = ts.volume(shape=image_shape, size=image_size)
pg = ts.cone(angles=angles, shape=detector_shape, size=detector_size, src_orig_dist=dso, src_det_dist=dsd)
A = ts.operator(vg, pg)
# Load sinogram
folder = '/your_folder/train/'
sino = np.load(folder + '0000_sino_clinical_dose.npy', allow_pickle=True)
print(sino.shape)
# Plot sinogram (central detector row: angles x detector columns)
plt.figure()
plt.imshow(sino[128,:,:])
plt.savefig("sino.png")
# Move the sinogram to the GPU (requires a CUDA-capable device)
sino = torch.from_numpy(sino).cuda()
# Reconstruct image
recon = fdk(A, sino)  # Your ML algorithm should replace this to win the challenge.
recon = recon.detach().cpu().numpy()
print(recon.shape)
# Plot image
plt.figure()
plt.imshow(recon[128,:,:])
plt.clim([0,2])
plt.savefig("fdk.png")
Notes:
[∗] The exact geometry of the simulated CBCT is as follows:
image size : [300, 300, 300] mm
image shape : [256, 256, 256] voxels
voxel size : [1.171875, 1.171875, 1.171875] mm
detector size : [600, 600] mm
detector shape : [256, 256] pixels
pixel size : [2.34375, 2.34375] mm
distance source to origin (axis of rotation, center of image) : 575 mm
distance source to detector : 1050 mm
angular range : [0, 2π), 360 uniformly distributed angles
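The geometry values listed above are mutually consistent, which the short check below illustrates: voxel and pixel sizes follow from size divided by shape, and the cone-beam magnification follows from the two distances. The variable names `magnification`, `detector_at_isocenter`, and `angular_step_deg` are introduced here for illustration only.

```python
import numpy as np

image_size = np.array([300.0, 300.0, 300.0])   # mm
image_shape = np.array([256, 256, 256])        # voxels
detector_size = np.array([600.0, 600.0])       # mm
detector_shape = np.array([256, 256])          # pixels
dso = 575.0                                    # source to origin, mm
dsd = 1050.0                                   # source to detector, mm

voxel_size = image_size / image_shape          # 1.171875 mm per voxel
pixel_size = detector_size / detector_shape    # 2.34375 mm per pixel

# Magnification of an object at the axis of rotation onto the detector.
magnification = dsd / dso
# Detector extent scaled back to the plane through the axis of rotation.
detector_at_isocenter = detector_size / magnification

# 360 angles over a full turn, endpoint excluded so 0 and 2*pi are not both used.
angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
angular_step_deg = np.degrees(angles[1] - angles[0])

print(voxel_size, pixel_size, magnification, detector_at_isocenter, angular_step_deg)
```

With these numbers, the detector scaled back to the axis of rotation is slightly wider than the 300 mm image, and consecutive projections are one degree apart.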
[∗∗] We consider a clinical dose around 10^5 photons measured with no patient in the scanner.
[1] Van Aarle, Wim, et al. "The ASTRA Toolbox: A platform for advanced algorithm development in electron tomography." Ultramicroscopy 157 (2015): 35-47.
[2] https://github.com/ahendriksen/tomosipo
[3] https://github.com/ahendriksen/ts_algorithms