Abstract: A global effective receptive field plays a crucial role in image style transfer (ST) for obtaining high-quality stylized results. However, existing ST backbones (e.g., CNNs and Transformers) ...
Abstract: The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature ...
Abstract: Text-driven human motion generation has attracted considerable critical attention in recent years. The task requires generating movements that are diverse, natural, and comfortable in ...