Background reading:
J. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, 79(3): 299-318, 2008. Available: http://dx.doi.org/10.1007/s11263-007-0122-4
Contemporary readings:
A. Yao, J. Gall, L. Van Gool, "A Hough Transform-Based Voting Framework for Action Recognition", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539883 [Bharath]
J.C. Niebles, C. Chen, and L. Fei-Fei, "Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15552-9_29 [Bharath]
P. Matikainen, M. Hebert and R. Sukthankar, "Representing Pairwise Spatial and Temporal Relations for Action Recognition", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15549-9_37 [Bharath]
T. Lan, Y. Wang, W. Yang and G. Mori, "Beyond Actions: Discriminative Models for Contextual Group Activities", NIPS 2010, http://books.nips.cc/papers/files/nips23/NIPS2010_0115.pdf [Oriol]
Additional readings (not covered, but relevant):
K. Prabhakar, S. Oh, P. Wang, G. D. Abowd, J. Rehg, "Temporal Causality for the Analysis of Visual Events", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539871
D. Weinland, M. Ozuysal and P. Fua, "Making Action Recognition Robust to Occlusions and Viewpoint Changes", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15558-1_46