Background readings:
[1] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, November 2004. http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94
[2] T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79-116, November 1998. http://dx.doi.org/10.1023/A:1008045108935
[3] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in Proceedings of the British Machine Vision Conference, vol. 1, London, 2002, pp. 384-393. http://cmp.felk.cvut.cz/~matas/papers/matas-bmvc02.pdf
[4] K. Mikolajczyk and C. Schmid, "Scale & affine invariant interest point detectors," International Journal of Computer Vision, vol. 60, no. 1, pp. 63-86, October 2004. http://dx.doi.org/10.1023/B:VISI.0000027790.02288.f2
[5] I. Laptev, "On space-time interest points," International Journal of Computer Vision, vol. 64, no. 2-3, pp. 107-123, September 2005. http://dx.doi.org/10.1007/s11263-005-1838-7
Contemporary readings:
[6] L. Bo, X. Ren, and D. Fox, "Kernel Descriptors for Visual Recognition," NIPS 2010, [Yangqing] http://books.nips.cc/papers/files/nips23/NIPS2010_0821.pdf
[7] L. Bourdev, S. Maji, T. Brox, and J. Malik, "Detecting People Using Mutually Consistent Poselet Activations," ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15567-3_13