Topics
This list was originally compiled by Dr. Boqing Gong and updated by me. (See here for a better formatted and updated version).
Convolutional Neural Networks (CNNs)
- CNN basics
- CNN for object recognition
- CNN for object localization
- CNN for semantic segmentation
- CNN and transfer learning
- CNN for saliency detection
- Misc
Vision and Language
- Image captioning
- Visual question answering
- Misc
Generative Adversarial Networks (GANs)
- Conditional GANs
- Image and video generation
Learning Representations and Attributes
- CNN representations
- Middle-level representations: attributes
- Middle-level representations: parts
- Low-level representations
- Zero-shot learning
Video: Actions, Surveillance and Tracking
- Action recognition
- Tracking
- Surveillance
- Summarization
Statistical Methods and Learning
Vision and People
3D Computer Vision
Segmentation, Edges, and Saliency
Registration, alignment, and stereo
3D Representations for Recognition and Localization
Computational Photography and Image Enhancement
Motion and Correspondence
Misc
CNN Basics
[LeNet] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.
Fukushima, Kunihiko. “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.” Biological cybernetics 36, no. 4 (1980): 193-202.
[Dropout] Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580 (2012).
[Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In Computer Vision–ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples.” arXiv preprint arXiv:1412.6572 (2014).
Coates, Adam, Andrej Karpathy, and Andrew Y. Ng. “Emergence of object-selective features in unsupervised feature learning.” In Advances in Neural Information Processing Systems, pp. 2681-2689. 2012.
Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” arXiv preprint arXiv:1412.6856 (2014).
Deng, Jia, Nan Ding, Yangqing Jia, Andrea Frome, Kevin Murphy, Samy Bengio, Yuan Li, Hartmut Neven, and Hartwig Adam. “Large-scale object classification using label relation graphs.” In Computer Vision–ECCV 2014, pp. 48-64. Springer International Publishing, 2014.
Henry W. Lin and Max Tegmark, Why does deep and cheap learning work so well? arXiv, 2016.
G Larsson, M Maire, G Shakhnarovich, FractalNet: Ultra-Deep Neural Networks without Residuals, arXiv, 2016.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus, Intriguing properties of neural networks, arXiv, 2013.
CNN for object recognition
[ILSVRC] Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision (2014): 1-42.
[AlexNet] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” In Advances in neural information processing systems, pp. 1097-1105. 2012.
[VGGNet] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
[GoogLeNet] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” arXiv preprint arXiv:1409.4842 (2014).
[152 layers] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” arXiv preprint arXiv:1512.03385 (2015).
M Elhoseiny, T El-Gaaly, A Bakry, A Elgammal, A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation, ICML, 2016.
CNN for object localization
Oquab, Maxime, Léon Bottou, Ivan Laptev, and Josef Sivic. “Is object localization for free?–Weakly-supervised learning with convolutional neural networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 685-694. 2015.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Wu, Jiajun, Yinan Yu, Chang Huang, and Kai Yu. “Deep Multiple Instance Learning for Image Classification and Auto-Annotation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3460-3469. 2015.
J Redmon, S Divvala, R Girshick, A Farhadi, You only look once: Unified, real-time object detection, CVPR, 2016.
J Redmon, A Farhadi, YOLO9000: Better, Faster, Stronger, arXiv, 2016.
CNN for semantic segmentation
Xie, Saining, and Zhuowen Tu. “Holistically-Nested Edge Detection.” In Proceedings of the IEEE International Conference on Computer Vision, 2015.
Pinheiro, Pedro O., and Ronan Collobert. “From Image-level to Pixel-level Labeling with Convolutional Networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1713-1721. 2015.
J Long, E Shelhamer, T Darrell, Fully convolutional networks for semantic segmentation, CVPR 2015.
Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, ICCV 2015.
LC Chen, Y Yang, J Wang, W Xu, AL Yuille, Attention to scale: Scale-aware semantic image segmentation, arXiv, 2015.
LC Chen, G Papandreou, I Kokkinos, K Murphy, Semantic image segmentation with deep convolutional nets and fully connected CRFs, arXiv, 2014.
M Mostajabi, P Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015.
CNN and Transfer learning
Razavian, Ali S., Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. “CNN features off-the-shelf: an astounding baseline for recognition.” In Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, pp. 512-519. IEEE, 2014.
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. “Decaf: A deep convolutional activation feature for generic visual recognition.” arXiv preprint arXiv:1310.1531 (2013).
[Transferability] Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. “How transferable are features in deep neural networks?.” In Advances in Neural Information Processing Systems, pp. 3320-3328. 2014.
Y Aytar, L Castrejon, C Vondrick, H Pirsiavash, A Torralba, Cross-Modal Scene Networks, arXiv, 2016.
LA Gatys, AS Ecker, M Bethge, A neural algorithm of artistic style, arXiv, 2015.
CNN - Misc
[Pose] Oberweger, Markus, Paul Wohlhart, and Vincent Lepetit. “Training a Feedback Loop for Hand Pose Estimation.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3316-3324. 2015.
[Optical flow] Dosovitskiy, Alexey, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. “FlowNet: Learning Optical Flow With Convolutional Networks.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2758-2766. 2015.
[Pose in videos] Pfister, Tomas, James Charles, and Andrew Zisserman. “Flowing convnets for human pose estimation in videos.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1913-1921. 2015.
[Super-resolution] Dong, Chao, Chen Change Loy, Kaiming He, and Xiaoou Tang. “Learning a deep convolutional network for image super-resolution.” In Computer Vision–ECCV 2014, pp. 184-199. Springer International Publishing, 2014.
[Fine-grained object recognition] Krause, Jonathan, Hailin Jin, Jianchao Yang, and Li Fei-Fei. “Fine-Grained Recognition without Part Annotations.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5546-5555. 2015.
[Fine-grained object recognition] Lin, Tsung-Yu, Aruni RoyChowdhury, and Subhransu Maji. “Bilinear CNN Models for Fine-grained Visual Recognition.” arXiv preprint arXiv:1504.07889 (2015).
ZongYuan Ge, Alex Bewley, Christopher McCool, Ben Upcroft, Peter Corke, Conrad Sanderson, Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks, arXiv, 2016.
AR Zamir, TL Wu, L Sun, W Shen, J Malik, S Savarese, Feedback Networks, arXiv, 2016.
M Rastegari, V Ordonez, J Redmon, A Farhadi, XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, arXiv, 2016.
Ullman et al., Atoms of recognition in human and computer vision, PNAS, 2016.
Vision and Language
Image and video captioning
Karpathy, Andrej, and Li Fei-Fei. “Deep visual-semantic alignments for generating image descriptions.” arXiv preprint arXiv:1412.2306 (2014).
Mao, Junhua, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. “Explain images with multimodal recurrent neural networks.” arXiv preprint arXiv:1410.1090 (2014).
Donahue, Jeff, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. “Long-term recurrent convolutional networks for visual recognition and description.” arXiv preprint arXiv:1411.4389 (2014).
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. “Show and tell: A neural image caption generator.” arXiv preprint arXiv:1411.4555 (2014).
Kiros, R., Salakhutdinov, R. and Zemel, R.S., 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539.
Xu, Kelvin, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. “Show, attend and tell: Neural image caption generation with visual attention.” arXiv preprint arXiv:1502.03044 (2015).
Lebret, Rémi, Pedro O. Pinheiro, and Ronan Collobert. “Phrase-based image captioning.” arXiv preprint arXiv:1502.03671 (2015).
Chen, Xinlei, and C. Lawrence Zitnick. “Mind’s eye: A recurrent visual representation for image caption generation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
TH Huang, F Ferraro, N Mostafazadeh, I Misra, A Agrawal, J Devlin, R Girshick, X He, et al., Visual Storytelling, NAACL, 2016.
CL Zitnick, R Vedantam, D Parikh, Adopting abstract images for semantic scene understanding, TPAMI, 2016.
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler, Skip-Thought Vectors, arXiv, 2015.
Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, ICCV 2015.
C Fan, DJ Crandall, DeepDiary: Automatically Captioning Lifelogging Image Streams, ECCV, 2016.
Visual question answering
Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. “VQA: Visual Question Answering.” arXiv preprint arXiv:1505.00468 (2015).
Zhou, Bolei, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. “Simple Baseline for Visual Question Answering.” arXiv preprint arXiv:1512.02167 (2015).
Sadeghi, Fereshteh, Santosh K. Divvala, and Ali Farhadi. “VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1456-1464. 2015.
Malinowski, Mateusz, Marcus Rohrbach, and Mario Fritz. “Ask Your Neurons: A Neural-based Approach to Answering Questions about Images.” arXiv preprint arXiv:1505.01121 (2015).
Malinowski, Mateusz, and Mario Fritz. “A multi-world approach to question answering about real-world scenes based on uncertain input.” In Advances in Neural Information Processing Systems, pp. 1682-1690. 2014.
Ren, Mengye, Ryan Kiros, and Richard Zemel. “Exploring models and data for image question answering.” In Advances in Neural Information Processing Systems, pp. 2935-2943. 2015.
A Agrawal, D Batra, D Parikh, Analyzing the Behavior of Visual Question Answering Models, arXiv, 2016.
A Das, H Agrawal, CL Zitnick, D Parikh, D Batra, Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?, arXiv, 2016.
Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler, MovieQA: Understanding Stories in Movies through Question-Answering, CVPR, 2016.
Vision and Language - Misc
Izadinia, Hamid, Fereshteh Sadeghi, Santosh K. Divvala, Hannaneh Hajishirzi, Yejin Choi, and Ali Farhadi. “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 10-18. 2015.
Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. “Aligning books and movies: Towards story-like visual explanations by watching movies and reading books.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 19-27. 2015.
Yao, Ting, Tao Mei, and Chong-Wah Ngo. “Learning Query and Image Similarities with Ranking Canonical Correlation Analysis.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 28-36. 2015.
QV Le, T Mikolov, Distributed Representations of Sentences and Documents, ICML, 2014.
T Mikolov, K Chen, G Corrado, J Dean, Efficient estimation of word representations in vector space, arXiv, 2013.
Generative Adversarial Networks (GANs) & Generative models in general
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio "Generative Adversarial Networks", in NIPS 2014.
P Isola, JY Zhu, T Zhou, AA Efros, Image-to-image translation with conditional adversarial networks, arXiv, 2016.
J Zhao, M Mathieu, Y LeCun, Energy-based generative adversarial network, ICLR, 2017.
EL Denton, S Chintala, R Fergus, Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS, 2015.
C Ledig, L Theis, F Huszár, J Caballero, Photo-realistic single image super-resolution using a generative adversarial network, arXiv, 2016.
C Vondrick, H Pirsiavash, A Torralba, Generating videos with scene dynamics, Advances in Neural Information Processing Systems, 2016.
C Vondrick, H Pirsiavash, A Torralba, Anticipating Visual Representations from Unlabeled Video, CVPR, 2016.
Alec Radford, Luke Metz and Soumith Chintala "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", in ICLR 2016.
Jun-Yan Zhu, Yong Jae Lee and Alexei A. Efros. "AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections", in SIGGRAPH 2014.
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman and Alexei A. Efros. "Learning a Discriminative Model for the Perception of Realism in Composite Images", in ICCV 2015.
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros, Generative Visual Manipulation on the Natural Image Manifold, ECCV 2016.
K Gregor, I Danihelka, A Graves, DJ Rezende, DRAW: A recurrent neural network for image generation, arXiv, 2015.
T Salimans, I Goodfellow, W Zaremba, Improved techniques for training gans, NIPS, 2016.
Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Spatial Transformer Networks, NIPS, 2015.
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell, Adversarial Feature Learning, ICLR, 2017.
Mickaël Chen, Ludovic Denoyer, Multi-view Generative Adversarial Networks, arXiv, 2016.
Alexey Dosovitskiy, J. Springenberg, Thomas Brox, Learning to Generate Chairs with Convolutional Neural Networks, CVPR, 2015.
SMA Eslami, N Heess, T Weber, Y Tassa, K Kavukcuoglu, GE Hinton, Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS, 2016.
SMA Eslami, N Heess, CKI Williams, J Winn, The Shape Boltzmann Machine: a Strong Model of Object Shape, International Journal of Computer Vision 107 (2), 155-176, 2014.
Learning Representations and Attributes
CNN representations
Jayaraman, Dinesh, and Kristen Grauman. “Learning image representations tied to ego-motion.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1413-1421. 2015.
D. Jayaraman and K. Grauman, Look-Ahead Before You Leap: End-to-End Active Recognition by Forecasting the Effect of Motion. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, October 2016.
Chen, Xinlei, and Abhinav Gupta. “Webly Supervised Learning of Convolutional Networks.” arXiv preprint arXiv:1505.01554 (2015).
Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. “Unsupervised Visual Representation Learning by Context Prediction.” arXiv preprint arXiv:1505.05192 (2015).
Wu, Zhirong, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. “3D ShapeNets: A Deep Representation for Volumetric Shapes.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912-1920. 2015.
Xu, Zhongwen, Yi Yang, and Alexander G. Hauptmann. “A discriminative CNN video representation for event detection.” arXiv preprint arXiv:1411.4006 (2014).
Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy Hospedales, Chen Change Loy, Sketch Me That Shoe, Computer Vision and Pattern Recognition, 2016.
Middle-level representations: attributes
Xiao, Fanyi, and Yong Jae Lee. “Discovering the Spatial Extent of Relative Attributes.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1458-1466. 2015.
Kumar, Neeraj, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar. “Attribute and simile classifiers for face verification.” In Computer Vision, 2009 IEEE 12th International Conference on, pp. 365-372. IEEE, 2009.
Farhadi, Alireza, Ian Endres, Derek Hoiem, and David Forsyth. “Describing objects by their attributes.” In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 1778-1785. IEEE, 2009.
Lampert, Christoph H., Hannes Nickisch, and Stefan Harmeling. “Learning to detect unseen object classes by between-class attribute transfer.” In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 951-958. IEEE, 2009.
Parikh, Devi, and Kristen Grauman. “Relative attributes.” In Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 503-510. IEEE, 2011.
Nils Murrugarra-Llerena and Adriana Kovashka, Learning Attributes from Human Gaze. In Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), March 2017.
Middle-level representations: parts
Guang Shu, Afshin Dehghan, Mubarak Shah, Improving an Object Detector and Extracting Regions using Superpixels, Computer Vision and Pattern Recognition 2013, Portland, Oregon, June 23-28, 2013.
Low-level representations
Yang Yang, Guang Shu, Mubarak Shah, Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video, Computer Vision and Pattern Recognition 2013, Portland, Oregon, June 23-28, 2013.
Omar Oreifej and Zicheng Liu, HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences, Computer Vision and Pattern Recognition 2013, Portland, Oregon, June 23-28, 2013.
Lenc, Karel, and Andrea Vedaldi. “Understanding image representations by measuring their equivariance and equivalence.” arXiv preprint arXiv:1411.5908 (2014).
Georgiadis, Georgios, Alessandro Chiuso, and Stefano Soatto. “Texture Representations for Image and Video Synthesis.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2058-2066. 2015.
Zero-shot learning
M Palatucci, D Pomerleau, GE Hinton, Zero-shot learning with semantic output codes, NIPS, 2009.
M Rohrbach, M Stark, B Schiele, Evaluating knowledge transfer and zero-shot learning in a large-scale setting, CVPR, 2011.
M Norouzi, T Mikolov, S Bengio, Y Singer, Zero-shot learning by convex combination of semantic embeddings, arXiv, 2013.
L Fei-Fei, R Fergus, P Perona, One-shot learning of object categories, PAMI, 2006.
S Antol, CL Zitnick, D Parikh, Zero-shot learning via visual abstraction, ECCV, 2014.
Ziming Zhang and Venkatesh Saligrama, Zero-Shot Learning via Semantic Similarity Embedding, ICCV, 2015.
O Vinyals, C Blundell, T Lillicrap, Matching networks for one shot learning, NIPS, 2016.
K Saenko, B Kulis, M Fritz, T Darrell, Adapting visual category models to new domains, ECCV, 2010.
Video: Actions, Surveillance and Tracking
Action recognition
Dong Zhang and Mubarak Shah, Human Pose Estimation in Videos, International Conference on Computer Vision (ICCV), 2015, Santiago, Chile, December 13-16, 2015.
Khurram Soomro, Haroon Idrees, and Mubarak Shah, Action Localization in Videos through Context Walk, International Conference on Computer Vision (ICCV), 2015, Santiago, Chile, December 13-16, 2015.
Yicong Tian, Rahul Sukthankar, Mubarak Shah, Spatiotemporal Deformable Part Models for Action Detection, Computer Vision and Pattern Recognition 2013, Portland, Oregon, June 23-28, 2013.
Soran, Bilge, Ali Farhadi, and Linda Shapiro. “Generating Notifications for Missing Actions: Don’t Forget to Turn the Lights Off!.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4669-4677. 2015.
Rui Hou, Amir Roshan Zamir, Rahul Sukthankar, Mubarak Shah, DaMN - Discriminative and Mutually Nearest: Exploiting Pairwise Category Proximity for Video Action Recognition, ECCV 2014, Zurich, Switzerland, September 6-12, 2014.
Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah, Recognition of Complex Events exploiting Temporal Dynamics between Underlying Concepts, CVPR 2014, Columbus, Ohio, June 23-28, 2014.
Waqas Sultani, Imran Saleemi, Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition, CVPR 2014, Columbus, Ohio, June 23-28, 2014.
Y Poleg, A Ephrat, S Peleg, C Arora, Compact CNN for Indexing Egocentric Videos, WACV, 2016.
H Pirsiavash, D Ramanan, Detecting activities of daily living in first-person camera views, CVPR, 2012.
D Tran, L Bourdev, R Fergus, L Torresani, M Paluri, Learning spatiotemporal features with 3d convolutional networks, ICCV, 2015.
Eunbyung Park, Xufeng Han, Tamara L. Berg, Alexander C. Berg, Combining Multiple Sources of Knowledge in Deep CNNs for Action Recognition, WACV, 2016.
Francisco Javier Ordóñez and Daniel Roggen, Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition, Sensors, 2015.
Tracking
Afshin Dehghan, Yicong Tian, Philip H. S. Torr and Mubarak Shah, Target Identity-aware Network Flow for Online Multiple Target Tracking, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, Massachusetts, June 8-12, 2015.
Afshin Dehghan, Shayan Modiri Assari and Mubarak Shah, GMMCP-Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, Massachusetts, June 8-12, 2015.
Kim, Chanho, Fuxin Li, Arridhana Ciptadi, and James M. Rehg. “Multiple Hypothesis Tracking Revisited.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4696-4704. 2015.
Xiang, Yu, Alexandre Alahi, and Silvio Savarese. “Learning to Track: Online Multi-Object Tracking by Decision Making.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4705-4713. 2015.
Surveillance
Ricci, Elisa, Jagannadan Varadarajan, Ramanathan Subramanian, Samuel Rota Bulo, Narendra Ahuja, and Oswald Lanz. “Uncovering Interactions and Interactors: Joint Estimation of Head, Body Orientation and F-formations from Surveillance Videos.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4660-4668. 2015.
Zheng, Wei-Shi, Xiang Li, Tao Xiang, Shengcai Liao, Jianhuang Lai, and Shaogang Gong. “Partial Person Re-identification.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4678-4686. 2015.
Video Summarization
YJ Lee, J Ghosh, K Grauman, Discovering important people and objects for egocentric video summarization, CVPR, 2012.
Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman, Video Summarization with Long Short-term Memory, ECCV, 2016.
Z Lu, K Grauman, Story-driven summarization for egocentric video, CVPR, 2013.
YF Ma, L Lu, HJ Zhang, M Li, A user attention model for video summarization, ACM international conference on Multimedia, 2002.
A Sharghi, B Gong, M Shah, Query-Focused Extractive Video Summarization, 2016.
Yair Poleg, Tavi Halperin, Chetan Arora, Shmuel Peleg, Egosampling: Fast-forward and stereo for egocentric videos, CVPR, 2015.
Statistical methods and learning
Ji, Pan, Mathieu Salzmann, and Hongdong Li. “Shape Interaction Matrix Revisited and Robustified: Efficient Subspace Clustering with Corrupted and Incomplete Data.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4687-4695. 2015.
Kontschieder, Peter, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. “Deep Neural Decision Forests.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1467-1475. 2015.
Yang, Zichao, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, and Ziyu Wang. “Deep Fried Convnets.” arXiv preprint arXiv:1412.7149 (2014).
Murdock, Calvin, and Fernando De la Torre. “Semantic Component Analysis.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1484-1492. 2015.
Cao, Xiangyong, Yang Chen, Qian Zhao, Deyu Meng, Yao Wang, Dong Wang, and Zongben Xu. “Low-rank matrix factorization under general mixture noise distributions.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1493-1501. 2015.
Avrithis, Yannis, Yannis Kalantidis, Evangelos Anagnostopoulos, and Ioannis Z. Emiris. “Web-scale image clustering revisited.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1502-1510. 2015.
Xia, Yan, Xudong Cao, Fang Wen, Gang Hua, and Jian Sun. “Learning Discriminative Reconstructions for Unsupervised Outlier Removal.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1511-1519. 2015.
Vision and People
Oberweger, Markus, Paul Wohlhart, and Vincent Lepetit. “Training a Feedback Loop for Hand Pose Estimation.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3316-3324. 2015.
Tang, Danhang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-Kyun Kim, and Jamie Shotton. “Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3325-3333. 2015.
Joo, Hanbyul, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh. “Panoptic Studio: A Massively Multiview System for Social Motion Capture.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3334-3342. 2015.
Hadi Kiapour, M., Xufeng Han, Svetlana Lazebnik, Alexander C. Berg, and Tamara L. Berg. “Where to Buy It: Matching Street Clothing Photos in Online Shops.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3343-3351. 2015.
Chu, Xiao, Wanli Ouyang, Wei Yang, and Xiaogang Wang. “Multi-Task Recurrent Neural Network for Immediacy Prediction.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3352-3360. 2015.
Cai, Zhaowei, Mohammad Saberian, and Nuno Vasconcelos. “Learning complexity-aware cascades for deep pedestrian detection.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3361-3369. 2015.
3D Computer Vision
Ikehata, Satoshi, Hang Yang, and Yasutaka Furukawa. “Structured Indoor Modeling.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1323-1331. 2015.
Martin-Brualla, Ricardo, David Gallup, and Steven M. Seitz. “3D Time-Lapse Reconstruction from Internet Photos.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1332-1340. 2015.
Ummenhofer, Benjamin, and Thomas Brox. “Global, Dense Multiscale Reconstruction for a Billion Points.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1341-1349. 2015.
Katz, Sagi, and Ayellet Tal. “On the Visibility of Point Clouds.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1350-1358. 2015.
Newcombe, Richard A., Dieter Fox, and Steven M. Seitz. “DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 343-352. 2015.
DJ Rezende, SMA Eslami, S Mohamed, P Battaglia, M Jaderberg, Unsupervised learning of 3d structure from images, NIPS, 2016.
S Wang, M Bai, G Mattyus, H Chu, W Luo, B Yang, J Liang, J Cheverie, TorontoCity: Seeing the World with a Million Eyes, arXiv, 2016.
W Luo, AG Schwing, R Urtasun, Efficient deep learning for stereo matching, CVPR, 2016.
Y Xiang, W Kim, W Chen, J Ji, C Choy, H Su, R Mottaghi, L Guibas, ObjectNet3D: A Large Scale Database for 3D Object Recognition, ECCV, 2016.
Segmentation, Edges, and Saliency
Pourian, Niloufar, S. Karthikeyan, and B. S. Manjunath. “Weakly Supervised Graph Based Semantic Segmentation by Learning Communities of Image-Parts.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1359-1367. 2015.
Yu, Yizhou, Chaowei Fang, and Zicheng Liao. “Piecewise Flat Embedding for Image Segmentation.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1368-1376. 2015.
Liu, Ziwei, Xiaoxiao Li, Ping Luo, Chen-Change Loy, and Xiaoou Tang. “Semantic image segmentation via deep parsing network.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1377-1385. 2015.
Liang, Xiaodan, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, and Shuicheng Yan. “Human Parsing with Contextualized Convolutional Neural Network.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1386-1394. 2015.
Zhang, Jianming, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, and Radomir Mech. “Minimum Barrier Salient Object Detection at 80 FPS.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1404-1412. 2015.
Shervin Ardeshir, Kofi Malcolm Collins-Sibley, Mubarak Shah, Geo-semantic Segmentation, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, Massachusetts, June 8-12, 2015.
Dong Zhang, Omar Javed, Mubarak Shah, Video Object Co-Segmentation by Regulated Maximum Weight Cliques, ECCV 2014, Zurich, Switzerland, September 6-12, 2014.
Dong Zhang, Omar Javed, Mubarak Shah, Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions, Computer Vision and Pattern Recognition 2013, Portland, Oregon, June 23-28, 2013. (Oral)
A Recasens, C Vondrick, A Khosla, A Torralba, Following Gaze Across Views, arXiv, 2016.
MM Cheng, Z Zhang, WY Lin, P Torr, BING: Binarized Normed Gradients for Objectness Estimation at 300fps, IEEE CVPR, 2014.
B Alexe, T Deselaers, V Ferrari, What is an object? Computer Vision and Pattern Recognition, 2010.
JRR Uijlings, KEA van de Sande, T Gevers, Selective search for object recognition, International Journal of Computer Vision, 2013.
W Kuo, B Hariharan, J Malik, Deepbox: Learning objectness with convolutional networks, CVPR 2015.
X Huang, C Shen, X Boix, Q Zhao, Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks, CVPR 2015.
S Xie, Z Tu, Holistically-nested edge detection, ICCV, 2015.
Registration, alignment, and stereo
Plotz, Tobias, and Stefan Roth. “Registering Images to Untextured Geometry using Average Shading Gradients.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2030-2038. 2015.
Chen, Qifeng, and Vladlen Koltun. “Robust Nonrigid Registration by Convex Optimization.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2039-2047. 2015.
Pani Paudel, Danda, Adlane Habed, Cedric Demonceaux, and Pascal Vasseur. “Robust and Optimal Sum-of-Squares-Based Point-to-Plane Registration of Image Sets and Structured Scenes.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2048-2056. 2015.
Zhang, Chi, Zhiwei Li, Yanhua Cheng, Rui Cai, Hongyang Chao, and Yong Rui. “MeshStereo: A global stereo model with mesh alignment regularization for view interpolation.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2057-2065. 2015.
Zendel, Oliver, Markus Murschitz, Martin Humenberger, and Wolfgang Herzner. “CV-HAZOP: Introducing Test Data Validation for Computer Vision.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2066-2074. 2015.
3D Representations for Recognition and Localization
Su, Hao, Fan Wang, Eric Yi, and Leonidas J. Guibas. “3D-Assisted Feature Synthesis for Novel Views of an Object.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2677-2685. 2015.
Su, Hao, Charles R. Qi, Yangyan Li, and Leonidas Guibas. “Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views.” arXiv preprint arXiv:1505.05641 (2015).
Wang, Shenlong, Sanja Fidler, and Raquel Urtasun. “Lost Shopping! Monocular Localization in Large Indoor Spaces.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2695-2703. 2015.
Zeisl, Bernhard, Torsten Sattler, and Marc Pollefeys. “Camera Pose Voting for Large-Scale Image-Based Localization.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2704-2712. 2015.
Computational Photography and Image Enhancement
Kadambi, Achuta, Vage Taamazyan, Boxin Shi, and Ramesh Raskar. “Polarized 3D: High-Quality Depth Sensing with Polarization Cues.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3370-3378. 2015.
Levis, Aviad, Yoav Y. Schechner, Amit Aides, and Anthony B. Davis. “Airborne Three-Dimensional Cloud Tomography.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3379-3387. 2015.
Yago Vicente, Tomás F., Minh Hoai, and Dimitris Samaras. “Leave-One-Out Kernel Optimization for Shadow Detection.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3388-3396. 2015.
Luo, Yu, Yong Xu, and Hui Ji. “Removing Rain From a Single Image via Discriminative Sparse Coding.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3397-3405. 2015.
Shen, Xiaoyong, Chao Zhou, Li Xu, and Jiaya Jia. “Mutual-Structure for Joint Filtering.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3406-3414. 2015.
Motion and Correspondence
Li, Yu, Dongbo Min, Michael S. Brown, Minh N. Do, and Jiangbo Lu. “SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4006-4014. 2015.
Bailer, Christian, Bertram Taetz, and Didier Stricker. “Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4015-4023. 2015.
Bristow, Hilton, Jack Valmadre, and Simon Lucey. “Dense Semantic Correspondence where Every Pixel is a Classifier.” arXiv preprint arXiv:1505.04143 (2015).
Zhou, Xiaowei, Menglong Zhu, and Kostas Daniilidis. “Multi-Image Matching via Fast Alternating Minimization.” arXiv preprint arXiv:1505.04845 (2015).
Misc
Hariharan, Bharath, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. “Hypercolumns for object segmentation and fine-grained localization.” arXiv preprint arXiv:1411.5752 (2014).
Papandreou, George, Iasonas Kokkinos, and Pierre-André Savalle. “Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 390-399. 2015.
Zhang, Yuting, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee. “Improving object detection with deep convolutional networks via bayesian optimization and structured prediction.” arXiv preprint arXiv:1504.03293 (2015).
Nguyen, Anh, Jason Yosinski, and Jeff Clune. “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images.” arXiv preprint arXiv:1412.1897 (2014).
Bendale, Abhijit, and Terrance Boult. “Towards Open World Recognition.” arXiv preprint arXiv:1412.5687 (2014).
Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah, Multi-Source Multi-Scale Counting in Extremely Dense Crowd Images, Computer Vision and Pattern Recognition 2013, Portland, Oregon, June 23-28, 2013.
P Isola, J Xiao, A Torralba, A Oliva, What makes an image memorable? Computer Vision and Pattern Recognition (CVPR), 2011.
Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid, Good Practice in Large-Scale Learning for Image Classification, PAMI, 2014.
J Weston, S Chopra, A Bordes, Memory networks, arXiv:1410.3916, 2014.
A Graves, G Wayne, I Danihelka, Neural Turing Machines, arXiv, 2014.
V Mnih, K Kavukcuoglu, D Silver, AA Rusu, J Veness, MG Bellemare, Human-level control through deep reinforcement learning, Nature 518 (7540), 529-533, 2015.
D Silver, A Huang, CJ Maddison, A Guez, L Sifre, G Van Den Driessche, Mastering the game of Go with deep neural networks and tree search, Nature 529 (7587), 484-489, 2016.
...