Abstract: Text-based Visual Question Answering (TextVQA) aims to produce correct answers for given questions about the images with multiple scene texts. In most cases, the texts naturally attach to ...