Near-Duplicate Face Images (NDFI)

The dataset is available for download here. If you use our dataset, please cite our paper (listed below).

The Near-Duplicate Face Images (NDFI) dataset is created from images in the Labeled Faces in the Wild (LFW) dataset [1]. A total of 2,727 images of 468 disjoint subjects are cropped to a fixed size of 96x96 using a commercial face matcher. Each image is then subjected to one or a sequence of the 4 random photometric transformations, generating 27,270 near-duplicate face images. The images are stored as 8-bit files in .bmp format.

The four transformations used to create the dataset are: (i) brightness adjustment, (ii) median filtering, (iii) Gaussian smoothing, and (iv) gamma transformation. Refer to Table I in the paper for details of the photometric transformations and the range of parameters used.

The 2,727 root images are used to generate 2,727 image phylogeny trees (IPTs). Each IPT contains 10 images, for a total of 27,270 images. Refer to Figure 3 in the paper for the IPT configuration. We refer to this set as Set II (full set). We also make available a smaller set, Set I (a subset of Set II), consisting of 1,229 original images from 391 subjects and the resulting 12,290 near-duplicates.

MATLAB R2018a was used to create this dataset on a system with an Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz.
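The four transformations and the tree-building process described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' MATLAB code. The parameter ranges (brightness offset in [-40, 40], 3x3 median window, 5-tap Gaussian kernel with sigma in [0.5, 1.5], gamma in [0.7, 1.4]) are hypothetical placeholders rather than the values in Table I. The helper `build_ipt` is also hypothetical: it attaches each new node to a uniformly chosen existing node, which does not reproduce the IPT configuration in Figure 3 of the paper.

```python
import math
import random

# Images are 8-bit grayscale, represented as lists of rows of ints in [0, 255].

def clip8(v):
    """Round a value and clamp it to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))

def _pixel(img, r, c):
    """Read pixel (r, c), replicating border pixels for out-of-range indices."""
    h, w = len(img), len(img[0])
    return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

def brightness(img, delta):
    """(i) Brightness adjustment: add a constant offset to every pixel."""
    return [[clip8(p + delta) for p in row] for row in img]

def median_filter(img, k=3):
    """(ii) Median filtering with a k x k window."""
    rad = k // 2
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            win = sorted(_pixel(img, r + dr, c + dc)
                         for dr in range(-rad, rad + 1)
                         for dc in range(-rad, rad + 1))
            row.append(win[len(win) // 2])
        out.append(row)
    return out

def gaussian_smooth(img, sigma, k=5):
    """(iii) Gaussian smoothing with a separable k-tap kernel."""
    rad = k // 2
    taps = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-rad, rad + 1)]
    total = sum(taps)
    taps = [t / total for t in taps]  # normalize so the kernel sums to 1
    h, w = len(img), len(img[0])
    # Horizontal pass (kept as floats), then vertical pass with rounding.
    tmp = [[sum(taps[i] * _pixel(img, r, c + i - rad) for i in range(k))
            for c in range(w)] for r in range(h)]
    return [[clip8(sum(taps[i] * _pixel(tmp, r + i - rad, c) for i in range(k)))
             for c in range(w)] for r in range(h)]

def gamma(img, g):
    """(iv) Gamma transformation: out = 255 * (in / 255) ** g."""
    return [[clip8(255.0 * (p / 255.0) ** g) for p in row] for row in img]

def random_transform(img, rng):
    """Apply a random sequence of 1 to 4 distinct transformations.
    All parameter ranges here are hypothetical placeholders, not Table I."""
    for op in rng.sample(range(4), rng.randint(1, 4)):
        if op == 0:
            img = brightness(img, rng.randint(-40, 40))
        elif op == 1:
            img = median_filter(img, 3)
        elif op == 2:
            img = gaussian_smooth(img, rng.uniform(0.5, 1.5))
        else:
            img = gamma(img, rng.uniform(0.7, 1.4))
    return img

def build_ipt(root, n_nodes=10, seed=0):
    """Hypothetical IPT builder: each new node is a transformed copy of a
    parent chosen uniformly from the existing nodes (not the Figure 3 layout)."""
    rng = random.Random(seed)
    images, parents = [root], [None]
    for _ in range(n_nodes - 1):
        p = rng.randrange(len(images))
        images.append(random_transform(images[p], rng))
        parents.append(p)
    return images, parents
```

For example, calling `build_ipt(root)` on a 96x96 root image returns 10 images and a parent index for each non-root node, matching the per-tree size above. Repeating this for 2,727 roots gives 2,727 × 10 = 27,270 images, and for the 1,229 roots of Set I gives 12,290 images.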


[1] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments,” University of Massachusetts, Amherst, Technical Report 07-49, October 2007.

S. Banerjee and A. Ross, “Face Phylogeny Tree: Deducing Relationships Between Near-Duplicate Face Images Using Legendre Polynomials and Radial Basis Functions,” Proc. of 10th IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), (Tampa, USA), September 2019.