Show simple item record

dc.contributor.advisor: Konrad, Janusz L. [en_US]
dc.contributor.author: Zeng, Qili [en_US]
dc.date.accessioned: 2021-01-26T13:51:20Z
dc.date.available: 2021-01-26T13:51:20Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/2144/41909
dc.description.abstract: As a core problem in video analysis, action recognition is of great significance for many higher-level tasks, both in research and industrial applications. With more and more video data being produced and shared daily, effective automatic action recognition methods are needed. Although many deep-learning methods have been proposed to solve the problem, recent research reveals that single-stream, RGB-based networks are always outperformed by two-stream networks using both RGB and optical flow as inputs. This dependence on optical flow, which indicates a deficiency in learning motion, is present not only in 2D networks but also in 3D networks. This is somewhat surprising since 3D networks are explicitly designed for spatio-temporal learning. In this thesis, we assume that this deficiency is caused by difficulties associated with learning from videos exhibiting strong temporal variations, such as sudden motion, occlusions, acceleration, or deceleration. Temporal variations occur commonly in real-world videos and force a neural network to account for them, but are often not useful for recognizing actions at coarse granularity. We propose a Dynamic Equilibrium Module (DEM) for spatio-temporal learning through adaptive Eulerian motion manipulation. The proposed module can be inserted into existing networks with separate spatial and temporal convolutions, like the R(2+1)D model, to effectively handle temporal video variations and learn more robust spatio-temporal features. We demonstrate performance gains due to the use of DEM in the R(2+1)D model on the miniKinetics, UCF-101, and HMDB-51 datasets. [en_US]
dc.language.iso: en_US
dc.subject: Computer science [en_US]
dc.subject: Action recognition [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Spatio-temporal modeling [en_US]
dc.subject: Video analysis [en_US]
dc.title: Learning temporal variations for action recognition [en_US]
dc.type: Thesis/Dissertation [en_US]
dc.date.updated: 2021-01-20T20:04:50Z
etd.degree.name: Master of Science [en_US]
etd.degree.level: masters [en_US]
etd.degree.discipline: Computer Science [en_US]
etd.degree.grantor: Boston University [en_US]
dc.identifier.orcid: 0000-0002-1663-9461


