The VCapAV dataset comprises 14,923 original audio-visual clips, each no longer than 10 seconds, with audio downsampled to 16 kHz. It includes 14,923 real videos and 242 fake videos (generated via Kling ...