Exploiting phonological constraints for handshape recognition in sign language video
The ability to recognize handshapes in signing video is essential in algorithms for sign recognition and retrieval. Handshape recognition from isolated images is, however, an insufficiently constrained problem. Many handshapes share similar 3D configurations and are indistinguishable for some hand orientations in 2D image projections. Additionally, significant differences in handshape appearance are induced by the articulated structure of the hand and by variants produced by different signers. Linguistic rules involved in the production of signs impose strong constraints on the articulations of the hands, yet little attention has been paid to exploiting these constraints in previous work on sign recognition. Among the different classes of signs in any signed language, lexical signs constitute the prevalent class. Morphemes (or meaningful units) for signs in this class involve a combination of particular handshapes, palm orientations, locations for articulation, and movement types. These are thus analyzed by many sign linguists as analogues of phonemes in spoken languages. Phonological constraints govern the ways in which phonemes combine in American Sign Language (ASL), as in other signed and spoken languages; utilizing these constraints for handshape recognition in ASL is the focus of the proposed thesis. Handshapes in monomorphemic lexical signs are specified at the start and end of the sign. The handshape transitions within a sign are constrained to involve either closing or opening of the hand (i.e., constrained to exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra-signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation that exploits handshape co-occurrence constraints, and also utilizes information about allophonic variations, to aid in handshape recognition.
We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of ASL video annotated with linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.
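As a rough illustration of how a start/end co-occurrence constraint can help, the sketch below compares independent per-image decisions with a joint decision that weights each (start, end) pair by a prior. It is a minimal toy example and not the Bayesian network of the thesis: the label set, the prior table, the likelihood values, and the function names `joint_map` and `independent_map` are all assumptions made up for illustration, and allophonic variation is not modeled.

```python
from itertools import product

# Hypothetical co-occurrence prior P(start, end) over handshape labels.
# Pairs not listed are treated as disallowed (probability 0).
# All values are invented for illustration only.
PRIOR = {
    ("S", "5"): 0.4,  # an opening transition
    ("5", "S"): 0.3,  # a closing transition
    ("A", "A"): 0.2,  # no handshape change
    ("B", "B"): 0.1,  # no handshape change
}


def independent_map(start_lik, end_lik):
    """Pick the most likely start and end labels separately, ignoring the prior."""
    return max(start_lik, key=start_lik.get), max(end_lik, key=end_lik.get)


def joint_map(start_lik, end_lik, prior=PRIOR):
    """Pick the (start, end) pair maximizing prior * start likelihood * end likelihood."""
    best_pair, best_score = None, 0.0
    for s, e in product(start_lik, end_lik):
        score = prior.get((s, e), 0.0) * start_lik[s] * end_lik[e]
        if score > best_score:
            best_pair, best_score = (s, e), score
    return best_pair


# Invented observation likelihoods P(image | handshape) for one sign.
# The start image is ambiguous between "A" and "S"; the end image is clearly "5".
start_lik = {"A": 0.5, "S": 0.4, "5": 0.05, "B": 0.05}
end_lik = {"A": 0.1, "S": 0.1, "5": 0.7, "B": 0.1}

print(independent_map(start_lik, end_lik))  # ('A', '5')
print(joint_map(start_lik, end_lik))        # ('S', '5')
```

In this toy setting the independent decision selects a pair that the assumed prior disallows, while the joint decision uses the confident end handshape to resolve the ambiguous start handshape.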