IEEE Winter Conference on Applications of Computer Vision - WACV 2018, Date: 2018/03/12 - 2018/03/14, Location: Lake Tahoe, Nevada, USA

Publication date: 2018-01-01
Volume: 2018-January Pages: 1549 - 1557
ISBN: 9781538648865
Publisher: IEEE

Authors:

De Geest, Roeland
Tuytelaars, Tinne

Keywords:

PSI_VISICS, Science & Technology, Technology, Computer Science, Artificial Intelligence, Engineering, Electrical & Electronic, Computer Science, Engineering, PSI_4277

Abstract:

© 2018 IEEE. Online action detection is a challenging problem: a system needs to decide what action is happening at the current frame, based on previous frames only. Fortunately, in real life, human actions are not independent of one another: there are strong (long-term) dependencies between them. An online action detection method should be able to capture these dependencies to enable more accurate early detection. At first sight, an LSTM seems very suitable for this problem. It can model both short-term and long-term patterns. It takes its input one frame at a time, updates its internal state, and outputs the current class probabilities. In practice, however, the detection accuracy obtained with LSTMs is still quite low. In this work, we start from the hypothesis that it may be too difficult for an LSTM to learn both the interpretation of the input and the temporal patterns at the same time. We propose a two-stream feedback network, where one stream processes the input and the other models the temporal relations. We show improved detection accuracy on an artificial toy dataset and on the Breakfast Dataset [21] and the TVSeries Dataset [7], real-life datasets with inherent temporal dependencies between the actions.