ECE/CS 5582 Computer Vision @UMKC
Fall 2022, Tu/Th 7-8:15pm@NMLC 451, on Zoom: 533-999-8759
Course Info:
Images are an essential form of information representation and communication in modern society. Billions of images are now generated every minute in applications ranging from photography, entertainment, and education to defense and medicine. A good understanding of this vast amount of image content at the signal, object, syntactic, and semantic levels is essential for creating new capabilities and enabling new applications. This course is project based; its objective is to teach practical algorithms and system solutions for image formation/representation, analysis/understanding, and retrieval, as well as recent topics such as mobile visual search, augmented reality, large-scale image tagging, and remote sensing and vision problems.
Upon completion of the course, students should understand the current state of the art in image analysis and understanding, have practical programming and system skills to address these problems, and have a basis for future research and an industry career in this area.
Instructor: Zhu Li
Textbooks:
None required. Key references:
R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2015.
J. E. Solem, Programming Computer Vision with Python, O'Reilly, 2014.
Logistics:
Class will be held in NMLC 451 and also via Zoom: Tue and Thu, 7-8:15pm, Meeting ID: 533 999 8759; the password is announced on Canvas. Instructor: Zhu Li. Office hours: Thu 5:30-7pm via Zoom, or by appointment. TA: TBD.
Topics:
1. Camera & Image Formation
1.1. Camera and Image Pipeline
1.2. Color Space, Color Histogram for Image Retrieval
1.3. Image Formation: Geometry, Homography
2. Image Analysis by Filtering
2.1. Spatial Domain Filtering: separable filters, scale space filtering, edge detection
2.2. Interest Point Detection: SIFT, SURF
3. Object Identification and Retrieval from Images
3.1. Interest Point Detection
3.2. Interest Point Aggregation and Deep Feature Aggregation
3.3. Visual Object Hashing and Retrieval
3.4. MPEG Mobile Visual Search: Compact Descriptor for Visual Search (CDVS) System
4. Holistic Image Modeling – Subspace Learning and Deep Neural Networks
4.1. Linear Appearance Modeling
4.2. Eigen Face, Fisher Face and Laplacian Face
4.3. Deep Learning by Convolutional Neural Network Modeling
4.4. Compact Deep Learning System
4.5. Remote Sensing, EO+SAR based image classification
imaging pipeline: human visual system, color perception, color space and color features, digital retina, color demosaicing
[4] P. Maharjan, L. Li, Z. Li, N. Xu, C. Ma, and Y. Li, "Improving Extreme Low-Light Image Denoising via Residual Learning", IEEE Int'l Conf on Multimedia & Expo (ICME), Shanghai, 2019.
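Topic 1.2 uses color histograms as features for image retrieval. The following is a minimal sketch in Python with NumPy, assuming 8-bit RGB images stored as H x W x 3 arrays; the helper names `color_histogram` and `histogram_intersection` and the choice of 8 bins per channel are illustrative assumptions, not the course's reference code.

```python
import numpy as np

def color_histogram(img, bins_per_channel=8):
    """Joint RGB histogram of an H x W x 3 uint8 image, normalized to sum to 1."""
    # Quantize each channel from [0, 255] into bins_per_channel levels.
    q = (img.astype(np.int64) * bins_per_channel) // 256
    # Combine the three quantized channels into a single joint bin index.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms; 1 means identical."""
    return float(np.minimum(h1, h2).sum())

# Usage: rank a tiny "database" by color similarity to a query image.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
database = [query.copy(), 255 - query, np.zeros((32, 32, 3), dtype=np.uint8)]
hq = color_histogram(query)
scores = [histogram_intersection(hq, color_histogram(d)) for d in database]
ranking = np.argsort(scores)[::-1]  # best match first
```

Histogram intersection is only one possible similarity; chi-square or L1 distances between the same histograms work as drop-in alternatives.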
camera model, projection matrix, homography and applications
[5] J. Janai, F. Guney, A. Behl, A. Geiger, "Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art", arXiv/CS/CVPR, 2017.
[6] D. Eigen et al., "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network", arXiv/CS/CVPR, 2014.
[7] Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, Richard Szeliski, "Building Rome in a Day", Proc. of ICCV, 2009.
[8] Y. Zheng and M. Salman Asif, "Joint Image and Depth Estimation with Mask-Based Lensless Cameras", IEEE CVPR, 2020.
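A homography can be estimated from four or more point correspondences by stacking two linear equations per pair and solving the homogeneous system with SVD, in the spirit of the SVD tutorial listed under Resources. The sketch below is a plain direct linear transform (DLT) in Python with NumPy; `homography_dlt` and `apply_homography` are hypothetical helper names, and it deliberately omits coordinate normalization and RANSAC, which a practical pipeline on noisy SIFT matches would need.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 H with dst ~ H @ src from >= 4 point pairs (N x 2 arrays)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and similarly for v,
        # rearranged into two equations that are linear in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    # The least-squares solution of A h = 0 with ||h|| = 1 is the right
    # singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map N x 2 points through H using homogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Usage: recover a known homography from exact correspondences.
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-3, 2e-3, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 40]], dtype=float)
dst = apply_homography(H_true, src)
H_est = homography_dlt(src, dst)
```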
2D image filters, Gaussian blur filtering, Difference of Gaussian (DoG) filters, edge detection, HOG edge/texture features
[9] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", IEEE Computer Vision & Pattern Recognition, vol. 1, pp. 886-893, June 2005. cited >36750 times
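Because the 2D Gaussian kernel is separable, a blur can be done as a 1D pass over rows followed by a 1D pass over columns, and subtracting two blurs gives a DoG response. A minimal NumPy sketch, assuming grayscale float images and edge-replicated borders; the function names and the ratio k = 1.6 are illustrative choices, not values prescribed by the course.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1D Gaussian truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # 'valid' convolution of each padded row/column returns the original length.
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)

def difference_of_gaussians(img, sigma, k=1.6):
    """DoG: approximates a scaled Laplacian of Gaussian, used for blob/extrema detection."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)

# Usage: a flat image stays flat; a unit impulse spreads into a kernel that sums to 1.
flat = np.full((20, 20), 5.0)
delta = np.zeros((31, 31))
delta[15, 15] = 1.0
blurred = gaussian_blur(delta, 2.0)
dog = difference_of_gaussians(flat, 1.0)
```

Separability reduces the cost per pixel from O(w^2) to O(2w) for a w-tap kernel.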
Resources:
1) VLFeat MATLAB toolbox download
2) Compute homography from SIFT matching, MATLAB example
3) Solving linear equations using SVD, tutorial slides
Harris detector, second-moment modeling of shift distortion, fast computation of the Hessian trace and determinant from DoG filtering.
[10] R. Szeliski, "Feature Detection", Chapter 4 in Computer Vision: Algorithms and Applications
[11] C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Alvey Vision Conference, pp. 1-6, Alvey Vision Club, 1988. cited >19147 times
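The Harris detector summarizes how the image changes under small shifts with the second-moment matrix M built from products of image gradients, and scores each pixel with R = det(M) - kappa * trace(M)^2. A minimal NumPy sketch, assuming a grayscale float image; the box window (instead of a Gaussian window), `kappa = 0.04`, and the helper names are illustrative assumptions.

```python
import numpy as np

def box_sum(a, r=1):
    """Sum of a over a (2r+1) x (2r+1) window, with edge-replicated borders."""
    p = np.pad(a, r, mode="edge")
    H, W = a.shape
    out = np.zeros_like(a, dtype=np.float64)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + H, dx:dx + W]
    return out

def harris_response(img, kappa=0.04, r=1):
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)  # derivatives along rows (y) and columns (x)
    # Entries of the second-moment matrix M, accumulated over a local window.
    Sxx = box_sum(Ix * Ix, r)
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    # Corners: R > 0 and large; edges: R < 0; flat regions: R near 0.
    return det - kappa * trace ** 2

# Usage: a bright square on a dark background.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
```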
SIFT detection: DoG filtering to find spatial and scale-space extrema; pruning the initial extrema by checking peak strength and edgeness; affine-transform matching-distance criteria for modeling SIFT repeatability; the SIFT descriptor and how it achieves rotation invariance; SIFT matching distance: the "true love" (nearest-neighbor distance ratio) test. Image retrieval system performance metrics: true positives, false positives, true negatives, false negatives, precision, recall.
[12] E. Rosten and T. Drummond, "Machine Learning for High Speed Corner detection", Proc. of ECCV (1) 2006: 430-443
[13] T. Lindeberg, "Automatic scale selection as a pre-processing stage for interpreting the visual world", Proc. Fundamental Structural Properties in Image and Pattern Analysis (FSPIPA'99), Budapest, Hungary, September 1999.
[14] D. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV, vol. 60(2), pp. 91-110, 2004. (cited >65665 times!)
[15] V. Fragoso, G. Srivastava, A. Nagar, Z. Li, K. Park, and M. Turk, "Cascade of Box (CABOX) Filters for Optimal Scale Space Approximation", Proc. of the 4th IEEE Int'l Workshop on Mobile Vision, Columbus, USA, 2014.
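The ratio ("true love") test accepts a match only when the nearest neighbor is clearly closer than the second nearest, and matching quality can then be summarized with precision and recall. A minimal NumPy sketch on synthetic 128-D descriptors; the ratio 0.8 follows Lowe's paper [14], while the helper names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match rows of desc1 to rows of desc2 using the nearest-neighbor ratio test."""
    # Pairwise Euclidean distances, shape (n1, n2).
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    nn1, nn2 = order[:, 0], order[:, 1]
    rows = np.arange(len(desc1))
    keep = d[rows, nn1] < ratio * d[rows, nn2]
    return [(int(i), int(nn1[i])) for i in rows[keep]]

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Usage: desc2 is a shuffled, slightly noisy copy of desc1, so ground truth is known.
rng = np.random.default_rng(1)
desc1 = rng.normal(size=(20, 128))
perm = rng.permutation(20)
desc2 = desc1[perm] + 0.01 * rng.normal(size=(20, 128))
inv = np.argsort(perm)  # desc1[i] truly corresponds to desc2[inv[i]]
matches = ratio_test_matches(desc1, desc2)
tp = sum(1 for i, j in matches if j == inv[i])
fp = len(matches) - tp
fn = len(desc1) - tp
precision, recall = precision_recall(tp, fp, fn)
```

Lowering the ratio trades recall for precision; sweeping it traces out a precision-recall curve.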
Bag of Words (BoW), which is essentially histogram aggregation, and VLAD aggregation, which accumulates vectors instead of histogram counts and therefore carries more information from the aggregation. This is analogous to a nonlinear fully connected layer in deep learning.
[16] David Nister, Henrik Stewenius: "Scalable Recognition with a Vocabulary Tree", IEEE CVPR (2) 2006: pp.2161-2168.
[17] Herve Jegou, Matthijs Douze, Cordelia Schmid, "Improving Bag-of-Features for Large Scale Image Search", International Journal of Computer Vision, vol. 87(3): pp. 316-336 (2010)
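The difference between BoW and VLAD is easy to see side by side: BoW counts how many descriptors land on each visual word, while VLAD sums the residuals (descriptor minus word center) per word and concatenates them. A minimal NumPy sketch, assuming the codebook would normally come from k-means on training descriptors (random centers stand in for it here); the signed square-root plus L2 normalization is a commonly used post-processing step, and all names are illustrative.

```python
import numpy as np

def assign(descs, centers):
    """Index of the nearest visual word for each descriptor."""
    d = np.linalg.norm(descs[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def bow_histogram(descs, centers):
    """Bag of Words: normalized count of descriptors per visual word (K values)."""
    labels = assign(descs, centers)
    h = np.bincount(labels, minlength=len(centers)).astype(np.float64)
    return h / max(h.sum(), 1.0)

def vlad(descs, centers):
    """VLAD: per visual word, sum of residuals (descriptor - center); K * D values."""
    labels = assign(descs, centers)
    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        members = descs[labels == k]
        if len(members):
            v[k] = (members - centers[k]).sum(axis=0)
    v = v.ravel()
    # Signed square-root (power) normalization, then L2 normalization.
    v = np.sign(v) * np.sqrt(np.abs(v))
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Usage with a 4-word codebook and 8-D descriptors.
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 8))
descs = rng.normal(size=(50, 8))
h = bow_histogram(descs, centers)
v = vlad(descs, centers)
```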
Fisher Vector aggregation (essentially VLAD with a local Mahalanobis distance metric), Supervector aggregation, FV vs. SV, and AKULA aggregation, which is not indexable but is more effective in re-identification
[18] Herve Jegou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Perez, Cordelia Schmid, "Aggregating local image descriptors into compact codes", IEEE Trans. on PAMI, 2011.12
[19] Z. Wang, L.-Y. Duan, J. Lin, S. J. Chen, T. Huang, W. Gao, "Optimizing Binary Fisher Codes for Visual Search", IEEE Data Compression Conference (DCC), 2015.
[20] Tomi Kinnunen, Haizhou Li, "An overview of text-independent speaker recognition: From features to supervectors", Speech Communication 52(1): 12-40 (2010)
[21] A. Nagar, Z. Li, G. Srivastava, K. Park: AKULA - Adaptive Cluster Aggregation for Visual Search. IEEE DCC 2014: 13-22.
Distance metrics, kernel functions, Mahalanobis distance, kNN classifier, GMM Bayesian classifier
[22] P. C. Mahalanobis (1936), "On the generalised distance in statistics", Proceedings of the National Institute of Sciences of India, 2(1):49-55.
[23] Andrew Ng, Stanford CS 229 notes on Logistic Regression and Support Vector Machines.
[24] Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, vol. 2(2): 121-167 (1998). cited >24086 times
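The Mahalanobis distance rescales feature differences by the inverse covariance, d^2 = (x - x_i)^T Sigma^-1 (x - x_i), so that directions with large spread count less; plugging it into a kNN classifier is a simple way to use it. A minimal NumPy sketch, assuming one global covariance estimated from the training set and a small ridge term for numerical stability; `mahalanobis_knn` is a hypothetical helper.

```python
import numpy as np

def mahalanobis_knn(X_train, y_train, X_test, k=3):
    """k-NN classification under a Mahalanobis distance with a global covariance."""
    cov = np.cov(X_train, rowvar=False)
    P = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # small ridge for stability
    preds = []
    for x in X_test:
        diff = X_train - x
        # d^2 = (x - x_i)^T P (x - x_i) for every training sample i.
        d2 = np.einsum("ij,jk,ik->i", diff, P, diff)
        nearest = y_train[np.argsort(d2)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority vote
    return np.array(preds)

# Usage: two anisotropic Gaussian classes separated along x.
rng = np.random.default_rng(2)
scale = np.array([2.0, 0.5])
X0 = rng.normal(size=(50, 2)) * scale
X1 = rng.normal(size=(50, 2)) * scale + np.array([12.0, 0.0])
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)
pred = mahalanobis_knn(X_train, y_train, np.array([[0.0, 0.0], [12.0, 0.0]]))
```

A per-class covariance instead of a global one moves this toward the GMM Bayesian classifier listed in the same lecture.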
Compute SIFT, Dense SIFT and HOG features for images with FV and VLAD aggregation to test classification performance on the Aerial Image data set.
Example code is shared here.
Visual object modeling via key point aggregation, aggregation indexing/hashing, object re-identification and retrieval systems, performance metrics, and the MPEG CDVS (Compact Descriptors for Visual Search) work. Fisher Vector aggregation and re-ranking. Deep learning solutions and the Google Landmark Challenge.
[25] Ling-Yu Duan, Vijay Chandrasekhar, Jie Chen, Jie Lin, Zhe Wang, Tiejun Huang, Bernd Girod, Wen Gao: Overview of the MPEG-CDVS Standard, IEEE Trans. Image Processing 25(1): 179-194 (2016)
[26] MPEG N15372, Test Model 14: Compact Descriptors for Visual Search
[27] F Radenović, G Tolias, O Chum, "Fine-tuning CNN image retrieval with no human annotation", IEEE Trans on PAMI, 2018.
[28] Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han, "Large-Scale Image Retrieval with Attentive Deep Local Features (DELF)", IEEE CVPR 2018.
Review of the handcrafted computer vision pipeline.
Implement a LoG-pyramid-based direct scale-space extrema detector and compare it with SIFT. This will be a bonus assignment.
subspace methods in image analysis: a holistic approach that takes pixels as input and computes projections that achieve dimension reduction. Examples are PCA, which computes an explicit projection, and SNE and Laplacian eigenmaps, which directly compute a lower-dimensional embedding (Laplacian eigenmaps by solving an eigenproblem).
[29] Linear Algebra Short Overview notes
[30] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by Locally Linear Embedding", Science, 290(5500):2323–2326, 2000. cited >16000 times
[31] L Van Der Maaten, E Postma, J Van den Herik, "Dimensionality reduction: a comparative review", Jnl of Machine Learning Research, 2009. Dimension reduction toolbox download, short manual download
[32] Laurens van der Maaten, Geoffrey Hinton, "Visualizing Data using t-SNE", Journal of Machine Learning Research, vol. 9(Nov): 2579-2605, 2008. cited >7200 times
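PCA is the explicit-projection case mentioned above: center the data, take the top right singular vectors of the centered data matrix (the covariance eigenvectors), and project onto them. A minimal NumPy sketch, assuming each row of X is one sample (for eigenfaces, one vectorized face image); the helper names are hypothetical.

```python
import numpy as np

def pca_fit(X, d):
    """Mean, top-d principal directions (as columns), and singular values of X (n x D)."""
    mu = X.mean(axis=0)
    # Right singular vectors of the centered data are the covariance eigenvectors.
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d].T, s

def pca_project(X, mu, A):
    """Low-dimensional coordinates of the rows of X."""
    return (X - mu) @ A

# Usage: 5-D data that actually lies on a 2-D affine subspace.
rng = np.random.default_rng(0)
B = rng.normal(size=(2, 5))
X = rng.normal(size=(100, 2)) @ B + 3.0
mu, A, s = pca_fit(X, 2)
Y = pca_project(X, mu, A)
X_rec = Y @ A.T + mu  # reconstruction from 2 coordinates
```

Using the SVD of the n x D data matrix avoids forming the D x D covariance, which matters when D (number of pixels) is much larger than n (number of images).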
face recognition with principal component analysis (eigenfaces) and linear discriminant analysis (Fisherfaces).
[33] M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience 3(1), 71-86, 1991. cited > 20000 times!
[34] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Trans. on PAMI, vol. 19(7), 711-720, 1997. cited > 15000 times!
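Fisher's linear discriminant picks directions w that maximize between-class scatter relative to within-class scatter, which leads to the generalized eigenproblem S_b w = lambda S_w w. A minimal NumPy sketch: in the Fisherface method [34], PCA is applied first so that S_w becomes invertible, but this sketch skips that step and simply assumes S_w is already invertible; `lda_fit` is a hypothetical helper.

```python
import numpy as np

def lda_fit(X, y, d):
    """Top-d Fisher LDA directions (columns) for samples X (n x D) with labels y."""
    mu = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))  # within-class scatter
    Sb = np.zeros((D, D))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * diff @ diff.T
    # Solve Sb w = lambda Sw w through the (non-symmetric) matrix Sw^-1 Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:d]].real

# Usage: two classes separated along the second axis.
rng = np.random.default_rng(3)
X0 = rng.normal(size=(100, 2))
X1 = rng.normal(size=(100, 2)) + np.array([0.0, 4.0])
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
W = lda_fit(X, y, 1)
Y = X @ W
```

With c classes, S_b has rank at most c - 1, so at most c - 1 directions are informative.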
graph embedding, affinity graph preserving embedding, graph Laplacian, Laplacian Embedding, Laplacian face, Graph Fourier transforms (GFT: code) and applications in compression and classification.
[35] Xiaofei He, Shuicheng Yan, Yuxiao Hu, Partha Niyogi (R.I.P.), Hong-Jiang Zhang, "Face Recognition Using Laplacianfaces", IEEE Trans. PAMI vol.27(3): 328-340 (2005). Cited by 4255
[36] X. He, and P. Niyogi, "Locality Preserving Projections", Advances in Neural Information Processing Systems, Vancouver, Canada, 2003.
[37] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, Pierre Vandergheynst, "The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains", IEEE Signal Processing Magazine, vol. 30(3): 83-98 (2013)
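Laplacian embedding builds an affinity graph W, forms the graph Laplacian L = D - W, and takes the generalized eigenvectors of L y = lambda D y with the smallest nonzero eigenvalues as coordinates; LPP [36] is the linear-projection counterpart. A minimal NumPy sketch, assuming a kNN graph with heat-kernel weights and a connected graph; the parameters `n_neighbors` and `t` and the function name are illustrative.

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors=5, t=1.0, d=2):
    """Embed the rows of X into d dimensions by solving L y = lambda D y."""
    n = len(X)
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        # Connect each point to its nearest neighbors (index 0 is the point itself).
        nbrs = np.argsort(dist2[i])[1:n_neighbors + 1]
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / t)  # heat-kernel affinities
    W = np.maximum(W, W.T)  # symmetrize the affinity graph
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    # Generalized problem via the symmetric matrix D^-1/2 L D^-1/2, then y = D^-1/2 v.
    Dm = np.diag(1.0 / np.sqrt(deg))
    evals, evecs = np.linalg.eigh(Dm @ L @ Dm)
    Y = Dm @ evecs
    # Drop the trivial constant solution (eigenvalue 0).
    return Y[:, 1:d + 1], evals

# Usage: points on a line should be embedded in an order-preserving way.
x = np.linspace(0.0, 1.0, 30)
Y, evals = laplacian_eigenmap(x[:, None], n_neighbors=5, t=1.0, d=2)
```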
Piecewise linear subspace models, divide and conquer. Subspace structure on the Grassmann manifold. Subspace optimization for classification and compression. CNN acceleration with piecewise linear projection.
[38] X. Wang, Z. Li, and D. Tao, "Subspace Indexing on Grassmann Manifold for Large Scale Multimedia Retrieval", IEEE Trans. on Image Processing, vol. 20(9): 2627-2635 (2011).
[39] Alan Edelman, Tomas Arias, and Steven T. Smith, "The Geometry of Algorithms with Orthogonality Constraints", SIAM Journal on Matrix Analysis and Applications, vol. 20(2), 1999. Subspace optimization on Stiefel/Grassmann manifolds MATLAB toolbox download
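Points on the Grassmann manifold are subspaces, so comparing two local linear models means comparing subspaces independently of their bases. One common choice is the arc-length distance, the 2-norm of the principal angles, whose cosines are the singular values of Qa^T Qb for orthonormal bases Qa and Qb. A minimal NumPy sketch; the helper names are hypothetical.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_distance(A, B):
    """Arc-length (geodesic) distance: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(A, B)))

# Usage: same subspace with a different basis vs. two orthogonal planes in R^4.
rng = np.random.default_rng(4)
A = rng.normal(size=(5, 2))
B = A @ np.array([[2.0, 1.0], [0.5, -1.0]])  # same span, different basis
E = np.eye(4)
d_same = grassmann_distance(A, B)
d_orth = grassmann_distance(E[:, :2], E[:, 2:])
```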
Eigenface, Fisherface and Laplacian face for face recognition, extra-credit: piece-wise linear solution via KNN.
Example code and data set download.
CNN-based solutions for image feature extraction and classification, an overview of the computing architecture involved, and examples of classification networks
[40] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, Lawrence D Jackel, "Backpropagation Applied to Handwritten Zip Code Recognition", Neural Computation, 1989.
[41] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: 1106-1114
[42] A. Vedaldi et al., "MatConvNet - Convolutional Neural Networks for MATLAB", 2016.
CNN training with the back-propagation algorithm: loss functions and their derivatives. Softmax loss for the classification problem, and triplet loss for the identification problem
[43] Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015. Source code at download
[44] W. Hu, Math 6001 Non-Linear Optimization in Machine Learning.
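The two losses named above can be written in a few lines. For softmax cross-entropy the gradient with respect to the logits z is simply p - onehot(label), and a triplet loss on embeddings penalizes an anchor-positive distance that is not smaller than the anchor-negative distance by a margin. A minimal NumPy sketch with a finite-difference gradient check; the margin 0.2 and the function names are illustrative assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Loss and gradient w.r.t. the logits for a single sample."""
    z = logits - logits.max()           # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()     # softmax probabilities
    loss = -np.log(p[label])
    grad = p.copy()
    grad[label] -= 1.0                  # dL/dz = p - onehot(label)
    return loss, grad

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on squared distances: want ||a - p||^2 + margin <= ||a - n||^2."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))

# Usage: compare the analytic gradient with central finite differences.
logits = np.array([1.0, 2.0, 0.5])
label = 1
loss, grad = softmax_cross_entropy(logits, label)
eps = 1e-6
num_grad = np.array([
    (softmax_cross_entropy(logits + eps * e, label)[0]
     - softmax_cross_entropy(logits - eps * e, label)[0]) / (2 * eps)
    for e in np.eye(3)
])
a, p = np.array([0.0, 0.0]), np.array([0.1, 0.0])
easy = triplet_loss(a, p, np.array([1.0, 0.0]))   # negative far away
hard = triplet_loss(a, p, np.array([0.05, 0.0]))  # negative closer than positive
```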
100 pts: pretrained VGG16 embeddings of the images, followed by supervised subspace learning for the 45-class aerial images.
50 pts extra: FV aggregation of the conv features, followed by MLP learning.
Biren's tutorial on deep learning tools in matlab and python.
Example code of deep learning in matlab: download.
Object detection in images: AdaBoost, CNN-based detectors, YOLO and SSD. Data sets: DOTA, aerial image object detection.
[45] P. Viola, M. Jones, "Rapid object detection using a boosted cascade of simple features", Proc of IEEE CVPR 2001.
[46] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation", Proc. of IEEE CVPR, 2014.
[47] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection", Proc. of IEEE CVPR, 2016.
[48] W. Liu et al., "SSD: Single Shot MultiBox Detector", Proc. of European Conf. on Computer Vision (ECCV), 2016.
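Detectors such as those in [46]-[48] produce many overlapping candidate boxes, and a greedy non-maximum suppression (NMS) step based on intersection-over-union (IoU) keeps only the best-scoring box in each cluster. A minimal NumPy sketch, assuming boxes in [x1, y1, x2, y2] format and an IoU threshold of 0.5; the threshold and the helper names are illustrative, not taken from any specific detector.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

# Usage: two heavily overlapping boxes and one separate box.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep = nms(boxes, scores)
```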
Final exam review: subspace methods, PCA, LDA, LPP, deep learning methods.