Datasets

Data sets are of utmost importance to the community for evaluating color constancy algorithms. Several good data sets are currently available, although more and larger data sets are always needed. The available data sets are summarized below. If you have any questions, comments, remarks, or additions, please send an e-mail to the contact person of this website.

Data sets:

—  INTEL-TUT Dataset for Camera Invariant Color Constancy Research (2017)
—  ColorChecker RECommended (2018)
—  Cube+ (2019)
—  Correcting Improperly White-Balanced Images (2019)

SFU Hyperspectral Set (2002)

At Simon Fraser University, an active group is working on the problem of color constancy. In 2002, they released a set of 1995 hyperspectral surface reflectance functions, together with various measured and synthesized illuminant spectra and a set of camera sensitivity functions. This full set of spectra and sensitivity functions allows for the generation of a vast number of RGB values, which can be used for a systematic evaluation of color constancy algorithms. However, because the data are “clean” (consider, for instance, the absence of sensor noise), the measured performance represents a best-case assessment. The set can be found here.
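As a sketch of how such spectral data are combined, an RGB response can be computed by integrating reflectance, illuminant, and sensor sensitivity over wavelength. The spectra below are synthetic stand-ins, not values from the actual set:

```python
import numpy as np

# Hypothetical spectra sampled at 10 nm steps from 400-700 nm;
# real data would come from the SFU files.
wavelengths = np.arange(400, 701, 10)           # 31 samples
reflectance = np.linspace(0.2, 0.8, 31)         # surface reflectance S(lambda)
illuminant = np.ones(31)                        # flat "equal-energy" light E(lambda)

# Toy Gaussian camera sensitivities C_k(lambda) for the R, G, B channels
def gaussian(mu, sigma):
    return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)

sensitivities = np.stack([gaussian(600, 30),    # R
                          gaussian(540, 30),    # G
                          gaussian(450, 30)])   # B

# Discrete approximation of rho_k = integral of S * E * C_k over lambda
rgb = (sensitivities * reflectance * illuminant).sum(axis=1)
print(rgb)
```

With real measured spectra in place of these toy arrays, the same sum yields the noise-free RGB values the text refers to.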

SFU Laboratory Images (2002)

Overview of 31 scenes, recorded under the same light source

The same group released a set of RGB images of various objects. Each object was recorded several times, under eleven different light sources. Four subsets are distinguished: objects with minimal specularities (22 scenes, 223 images in total), objects with at least one clear dielectric specularity (9 scenes, 98 images), objects with metallic specularities (14 scenes, 149 images), and objects with fluorescent surfaces (6 scenes, 59 images). A commonly used subset in the literature is the union of the first two subsets (minimal specularities and dielectric specularities; see the figure to the right for an overview of all objects recorded under one light source). Even though these images encompass several different illuminants and scenes, the variation of the images is limited. The set can be found here.

Nascimento et al. (2002)

One example image of this set

Eight hyperspectral images were released by Nascimento et al., used in their 2002 JOSA publication “Statistics of Spatial Cone-excitation Ratios in Natural Scenes”. The advantage of this set over the previous hyperspectral set is that these data consist of real-world scenes (four urban and four rural). However, because acquiring such data is costly, the set is limited in variation. The images can be found here.

SFU Grey-ball Set (2003)

Shortly after the release of the SFU Laboratory Images in 2002, SFU released another, larger data set. This time, over 11,000 images were captured with a video camera. The ground truth is captured using a grey sphere mounted on top of the camera (so it is visible in every image). Some care should be taken when using this set for learning algorithms, though, since consecutive frames can be highly correlated. The authors impose some restrictions on its use, so the procedure to obtain the set is outlined here.
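Extracting the ground truth from such a grey sphere amounts to averaging the sphere's pixels. A minimal sketch, where the sphere mask is hypothetical (the actual coordinates come with the set's annotations):

```python
import numpy as np

def estimate_illuminant(image, ball_mask):
    """Estimate the scene illuminant colour from the grey-ball pixels.

    image:     H x W x 3 float array (linear RGB)
    ball_mask: H x W boolean array marking the grey sphere
    Returns the mean RGB of the sphere, normalised to unit length.
    """
    ball_pixels = image[ball_mask]              # N x 3 array of sphere pixels
    mean_rgb = ball_pixels.mean(axis=0)
    return mean_rgb / np.linalg.norm(mean_rgb)

# Toy example: a 4x4 image lit by a reddish illuminant
image = np.tile(np.array([0.8, 0.5, 0.3]), (4, 4, 1))
mask = np.zeros((4, 4), dtype=bool)
mask[2:, 2:] = True                             # pretend the ball sits bottom-right
print(estimate_illuminant(image, mask))
```

Because the sphere appears in every frame, it must of course be masked out again before running any estimation algorithm on the image content itself.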

Foster et al. (2004)

One example image of this set

The same group as behind the Nascimento et al. hyperspectral images created a second set of eight hyperspectral images. The images were originally used in a 2004 Visual Neuroscience paper titled “Information limits on neural identification of coloured surfaces in natural scenes”. As with the previous set, the hyperspectral nature of the images allows them to be rendered under various light sources, but the variation of the images is limited. The images can be found here.

Color-checker (Original) (2008)

At the Vision Group at Microsoft Research Cambridge, Gehler et al. collected a set of real-world images with calibrated cameras. Every image contains a color checker, which can be used to capture the ground truth. The data set comes with manually annotated labels for indoor and outdoor images. The original RAW data and the automatically generated RGB images can be found here.

Barcelona Images (2009)

The Barcelona Calibrated Images Database was originally created by Parraga et al. (corresponding poster). The data set consists of numerous natural images with a grey sphere in the bottom-left corner. The set is expected to grow, but consists of at least the following sub-categories: Urban Scenery (83 images), Forest & Motorways (58 images), and Snow & Seaside (68 images). The website with full information on how the set was captured and how to obtain it can be found here.

SFU HDR Images (2010)

The latest addition from the Computational Vision Laboratory at SFU is a set of 105 high-dynamic-range images. Using a calibrated camera, up to 9 exposures per scene were captured to generate the HDR images. Four color checkers are positioned at different angles in each scene and are used to measure the scene illuminant. A full description of the set can be found here.

Color-checker (by Shi) (2010)

Because the original color checker set was generated from RAW data using automatic settings, some artifacts could have been introduced. To avoid such problems as much as possible, Shi reprocessed the original RAW data and generated 12-bit PNG images (with lossless compression). More information on the exact details and how to obtain the set can be found here.

Multiple Light Sources Dataset (2012)

Part of the contribution of the 2012 TIP paper “Color Constancy for Multiple Light Sources” was the introduction of a data set with images that are recorded under multiple light sources. The set consists of two parts, one with images captured in a laboratory setting, and one with images captured outdoors. The images can be found here.

Image Sequences (2013)

In 2013, Véronique Prinet, Dani Lischinski and Michael Werman presented their work on “Illuminant chromaticity from image sequences” at the International Conference on Computer Vision. The data set used in this paper can be found here.

Yet Another Color Constancy Database Updated (2014)

The Eidomatics Laboratory of the Department of Computer Science of the Università degli Studi di Milano (Italy) has created a database consisting of two sets of images: the images in the first set come from a low-dynamic-range (LDR) scene, while the second set was captured from a high-dynamic-range (HDR) scene. More information on the exact details and how to obtain the images can be found here.

Multi-Illuminant Multi-Object (MIMO) (2014)

The data set was introduced in the 2014 IEEE TIP paper titled “Multi-Illuminant Estimation with Conditional Random Fields” by Shida Beigpour et al. and is specifically aimed at evaluating multi-illuminant scenes. The images can be downloaded here, and more information can be found on their project website.

Time-Lapse Hyperspectral Radiance Images of Natural Scenes (2016)

These sequences of hyperspectral radiance images have been taken from a study by Foster, Amano, and Nascimento (2016) (“Time-lapse ratios of cone excitations in natural scenes”, in Vision Research, 120) of scenes undergoing natural illumination changes. In each scene, hyperspectral images were acquired at about 1-hour intervals. Thumbnail RGB images from the sequences are shown below with times and dates of acquisition. More information and the terms for use can be found here or here.

Hyperspectral images for spatial distributions of local illumination in natural scenes (2016)

This set of hyperspectral radiance images has been taken from a study of scenes by Nascimento, Amano & Foster (2016) (“Spatial distributions of local illumination color in natural scenes”, Vision Research, 120). The set consists of 30 hyperspectral radiance images of natural scenes in which small neutral probe spheres were embedded to provide estimates of local illumination spectra. The 30 natural scenes are from the Minho region of Portugal and were acquired during late spring and summer of 2002 and 2003. The sky in most of the scenes was clear but in five it was overcast with cloud. More information and the terms for use can be found here or here.

Cube data set (2017)

This data set was gathered by Nikola Banic and Sven Loncaric (see this arXiv article) and consists of 1365 exclusively outdoor images taken with a Canon EOS 550D camera in parts of Croatia, Slovenia, and Austria during various seasons. More information and the terms for use can be found here.

INTEL-TUT Dataset for Camera Invariant Color Constancy Research (2017)

The INTEL-TUT dataset is designed for camera-invariant color constancy research. Camera invariance refers to the robustness of an algorithm’s performance when run on images of the same scene taken by different cameras. Accordingly, the images in the database cover several lab and field scenes, each of which is captured by three different cameras with minimal registration errors. The lab scenes are also captured under five different illuminations. The spectral responses of the cameras and the spectral power distributions of the lab light sources are also provided, as they may prove beneficial for training future color constancy algorithms. As a side contribution, this dataset also includes images taken by a mobile camera, both with and without color-shading correction, which allows research on the effect of color shading as well. More information and the terms for use can be found here.

ColorChecker RECommended (2018)

Below we explain how we re-processed the Gehler data to address the problem raised in G. D. Finlayson, G. Hemrit, A. Gijsenij, and P. Gehler, “A Curious Problem with Using the Colour Checker Dataset for Illuminant Estimation,” in Color and Imaging Conference, 2017, pp. 64–69.

If you use the data described below then please cite G. Hemrit et al., “Rehabilitating the ColorChecker Dataset for Illuminant Estimation,” in Color and Imaging Conference, 2018.

The ColorChecker set was introduced by Gehler et al. in 2008. It contains 568 RGB images taken with Canon 1D and Canon 5D cameras. Hemrit et al. re-processed the original RAW images and re-calculated the ground truth according to the methodology described by Shi and Funt: the RAW images are converted with dcraw (a program for decoding raw image format files) and then demosaiced (with linear interpolation) to create TIFF images. The color chart areas and the saturated and clipped pixels are identified in the images, and users should mask them with the provided masks (see the link below) before using the images for illuminant estimation.
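Applying such masks before evaluation can be sketched as follows. The mask convention used here (nonzero marks pixels to exclude) and the saturation threshold are assumptions; check the format that actually ships with the set:

```python
import numpy as np

def masked_pixels(image, exclude_mask, saturation_level=0.98):
    """Return only the pixels usable for illuminant estimation.

    image:        H x W x 3 float array in [0, 1]
    exclude_mask: H x W array, nonzero where pixels (e.g. the colour
                  chart area) must be excluded -- an assumed convention.
    Saturated/clipped pixels are dropped as well.
    """
    keep = (exclude_mask == 0) & np.all(image < saturation_level, axis=2)
    return image[keep]

# Toy example with a random image and a pretend chart region (top-left)
image = np.random.default_rng(0).uniform(0.0, 1.0, (8, 8, 3))
chart_mask = np.zeros((8, 8), dtype=np.uint8)
chart_mask[:2, :2] = 1
pixels = masked_pixels(image, chart_mask)
print(pixels.shape)
```

Running an estimator only on the surviving pixels avoids the bias that the chart itself, and clipped highlights, would otherwise introduce.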

Follow these links to download: the re-processed images and the ground-truth and coordinates of the color charts.

Cube+ (2019)

The Cube+ dataset is an extension of the Cube dataset proposed earlier by Nikola Banic and Sven Loncaric. This data set contains images that are recorded in parts of Croatia, Slovenia, and Austria during various seasons. For more information, please refer to the corresponding project-site.

Correcting Improperly White-Balanced Images (2019)

At York University, in collaboration with Adobe Research, Afifi et al. created a dataset of 65,416 sRGB images rendered using different in-camera white-balance presets (e.g., Fluorescent, Incandescent, Daylight) and different camera picture styles (e.g., Vivid, Standard, Neutral, Landscape). For each rendered sRGB image, a target white-balanced image is provided. To produce the correct target image, the “ground-truth” white was manually selected from the middle grey patches of the color rendition chart, followed by applying a camera-independent rendering style (namely, Adobe Standard). The dataset is divided into two sets: an intrinsic set (Set 1) and an extrinsic set (Set 2). Check the project page to access the dataset and learn more about the work.
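Given such a ground-truth white, the simplest correction is a per-channel diagonal (von Kries) scaling. This is only a first-approximation sketch on linear RGB; it deliberately ignores the nonlinear camera-style rendering that motivates this dataset, and the pixel values are hypothetical:

```python
import numpy as np

def white_balance(image, white_rgb):
    """Diagonal (von Kries) correction: scale each channel so that the
    measured white maps to a neutral grey.

    image:     H x W x 3 float array, assumed linear RGB
    white_rgb: length-3 RGB of the measured "ground-truth" white
    """
    gains = white_rgb.mean() / white_rgb        # per-channel gains
    return np.clip(image * gains, 0.0, 1.0)

# Toy example: a mid-grey surface captured under a warm illuminant
white = np.array([0.9, 0.7, 0.5])               # RGB of the grey patch
image = np.full((2, 2, 3), white * 0.5)         # colour-cast grey surface
corrected = white_balance(image, white)
print(corrected[0, 0])                          # channels are now equal
```

A key finding of this line of work is that such a diagonal correction is insufficient once the image has already been rendered to sRGB with a nonlinear picture style, which is precisely why the dataset pairs each rendered image with a properly re-rendered target.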