LAL: linguistically aware learning for scene text recognition

Zheng, Yi; Qin, Wenda; Wijaya, Derry; Betke, Margrit

Files

10204780.pdf (3.84 MB)

Accepted manuscript

Date

2020-10-12

DOI

10.1145/3394171.3413913

Authors

Zheng, Yi

Qin, Wenda

Wijaya, Derry

Betke, Margrit

Version

Published version

URI

https://hdl.handle.net/2144/43694

Citation

Y. Zheng, W. Qin, D. Wijaya, M. Betke. 2020. "LAL: Linguistically Aware Learning for Scene Text Recognition." In Proceedings of the 28th ACM International Conference on Multimedia (MM '20). https://doi.org/10.1145/3394171.3413913

Abstract

Scene text recognition is the task of recognizing character sequences in images of natural scenes. The considerable diversity in the appearance of text in scene images and the potentially highly complex backgrounds make text recognition challenging. Previous approaches employ character sequence generators to analyze text regions and subsequently compare the candidate character sequences against a language model. In this work, we propose a bimodal framework that simultaneously utilizes visual and linguistic information to enhance recognition performance. Our linguistically aware learning (LAL) method learns visual embeddings using a rectifier, encoder, and attention decoder, and linguistic embeddings using a deep next-character prediction model. We present a novel way to combine these two embeddings effectively. Our experiments on eight standard benchmarks show that our method outperforms previous methods by large margins, particularly on rotated, foreshortened, and curved text. We show that the bimodal approach has a statistically significant impact. We also contribute a new dataset and show robust performance when LAL is combined with a text detector in a pipelined text spotting framework.
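The abstract does not specify how LAL combines its two embeddings. As a rough, hypothetical illustration of per-step bimodal decoding (not the authors' architecture), the Python sketch below assumes the simplest possible fusion: at each decoding step, a visual embedding (standing in for the attention decoder's output; the rectifier and encoder are omitted) is concatenated with a linguistic embedding computed from the characters decoded so far, and the result is projected to character probabilities. The names `fused_step`, `greedy_decode`, and `lm_embed`, the toy alphabet, and the random weights are illustrative assumptions only.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def fused_step(visual: Vector, linguistic: Vector,
               w_out: Sequence[Sequence[float]]) -> Vector:
    """ASSUMED fusion: concatenate both embeddings, then project to character probabilities."""
    return softmax(linear(w_out, visual + linguistic))


def greedy_decode(visual_steps: Sequence[Vector],
                  lm_embed: Callable[[List[int]], Vector],
                  w_out: Sequence[Sequence[float]],
                  alphabet: Sequence[str], eos: int = 0) -> str:
    """Emit one character per visual step; the language model sees the prefix decoded so far."""
    prefix: List[int] = []
    for visual in visual_steps:
        probs = fused_step(visual, lm_embed(prefix), w_out)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == eos:  # end-of-sequence symbol stops decoding
            break
        prefix.append(best)
    return "".join(alphabet[i] for i in prefix)


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = ["<eos>", "a", "b", "c"]  # hypothetical toy alphabet
    d_vis, d_ling = 4, 3
    char_emb = [[rng.uniform(-1, 1) for _ in range(d_ling)] for _ in alphabet]

    def lm_embed(prefix: List[int]) -> Vector:
        # HYPOTHETICAL stand-in for a deep next-character model:
        # the mean embedding of the characters decoded so far.
        if not prefix:
            return [0.0] * d_ling
        return [sum(char_emb[i][k] for i in prefix) / len(prefix) for k in range(d_ling)]

    w_out = [[rng.uniform(-1, 1) for _ in range(d_vis + d_ling)] for _ in alphabet]
    visual_steps = [[rng.uniform(-1, 1) for _ in range(d_vis)] for _ in range(5)]
    print(greedy_decode(visual_steps, lm_embed, w_out, alphabet))
```

Because the linguistic embedding enters every step, identical visual evidence can lead to different characters depending on what was decoded before. A pipeline that checks finished candidate strings against a language model afterwards cannot make that kind of per-step adjustment.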

License

© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Collections

BU Open Access Articles
CAS: Computer Science: Scholarly Papers
