MSU-AVIS Dataset
The MSU-AVIS dataset consists of audio-visual data from 50 subjects speaking freely (text-independent) while walking around a semi-constrained indoor environment that mimics a real-world surveillance scenario. The face images exhibit variations due to large stand-off distance from the camera, occlusions, pose, indoor illumination, expressions, accessories, etc. The audio samples exhibit variations due to the distance of the subject from the microphone, indoor reverberation, background noise, etc.