Abstract: A global effective receptive field plays a crucial role in image style transfer (ST) for obtaining high-quality stylized results. However, existing ST backbones (e.g., CNNs and Transformers) ...
Abstract: The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature ...
Abstract: Text-driven human motion generation has attracted considerable attention in recent years. The task requires generating movements that are diverse, natural, and comfortable in ...