UMKC CS/ECE 5582 Computer Vision

Course Info:

Images are an essential form of information representation and communication in modern society. Billions of images are generated every minute in applications ranging from photography, entertainment, and education to defense and medicine. A good understanding of this vast amount of image content at the signal, object, syntactic, and semantic levels is essential to create new capabilities and enable new applications. This course is project based. Its objective is to teach practical algorithms and system solutions for image formation/representation, analysis/understanding, and retrieval, as well as recent topics such as mobile visual search, augmented reality, large-scale image tagging, and remote sensing and vision problems.

Upon completion of the course, students should understand the current state of the art in image analysis and understanding, have the practical programming and system skills to address these problems, and have a basis for future research and industry careers in this area.

Instructor: Zhu Li

Textbooks:

None required. Key references:

Topics:

1. Camera & Image Formation 

1.1. Camera and Image Pipeline

1.2. Color Space, Color Histogram for Image Retrieval

1.3. Image Formation: Geometry, Homography

2. Image Analysis by Filtering

2.1. Spatial Domain Filtering: separable filters, scale space filtering, edge detection

2.2. Interesting Points Detection: SIFT, SURF

3. Object Identification and Retrieval from Images

3.1. Interesting Points Detection

3.2. Interesting Points Aggregation and Deep Feature Aggregation

3.3. Visual Object Hashing and Retrieval

3.4. MPEG Mobile Visual Search: Compact Descriptor for Visual Search (CDVS) System

4. Holistic Image Modeling – Subspace Learning and Deep Neural Networks

4.1. Linear Appearance Modeling

4.2. Eigen Face, Fisher Face and Laplacian Face

4.3. Deep Learning by Convolutional Neural Network Modeling

4.4. Compact Deep Learning System

4.5. Remote Sensing, EO+SAR based image classification

imaging pipeline: human vision system, color perception, color space and color features, digital retina, color demosaicing

[2] R. Szeliski, "Image Formation - Color Space", Chapter 2 in Computer Vision: Algorithms and Applications

[3] B. S. Manjunath, J.-R. Ohm, V. Vasudevan, and A. Yamada, "Color and Texture Descriptors", IEEE Trans. on Circuits & Systems for Video Tech., vol. 11(6), 2001.

[4] P. Maharjan, L. Li, Z. Li, N. Xu, C. Ma, and Y. Li, "Improving Extreme Low-Light Image Denoising via Residual Learning", IEEE Int'l Conf on Multimedia & Expo (ICME), Shanghai, 2019.


dominant color histogram; 
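As a minimal sketch of color-histogram image retrieval, the block below quantises RGB into a joint histogram and compares two images by histogram intersection. The function names, the choice of 8 bins per channel, and the intersection measure are assumptions made for illustration; this is a plain joint histogram, not the dominant color descriptor from the lecture.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB histogram, L1-normalised, for an HxWx3 uint8 image."""
    # Quantise each channel into `bins` levels (0 .. bins-1).
    q = (image.astype(np.int64) * bins) // 256
    # Fold the three channel indices into one bin index.
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return float(np.minimum(h1, h2).sum())

# Synthetic images (assumed data): random noise and a pure red image.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
red = np.zeros((32, 32, 3), dtype=np.uint8)
red[..., 0] = 255

h_query = color_histogram(query)
h_red = color_histogram(red)
similarity = histogram_intersection(h_query, h_red)
```

In a retrieval setting, the database images would be ranked by this similarity against the query histogram.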

camera model, projection matrix, homography from keypoints matching and applications 

[5] J. Janai, F. Guney, A. Behl, A. Geiger, "Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art", arXiv/CS/CVPR, 2017.

[6] D. Eigen, et al., "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network", arXiv/CS/CVPR, 2014.

[7] Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, Richard Szeliski, "Building Rome in a Day", Proc. of ICCV, 2009.

[8] Y. Zheng and M. Salman Asif, "Joint Image and Depth Estimation with Mask-Based Lensless Cameras", IEEE CVPR, 2020.

[9] C. Henry, B. Kathariya, M. S. Asif, Z. Li and G. York, "Aerial Image Classification through Thin Lensless Camera," 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), CA, USA, 2022, pp. 27-30, doi: 10.1109/MIPR54900.2022.00012.

Oxford VL_FEAT image/vision package, [download] 

SIFT-based homography computation by SVD and image warping, [matlab code].
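The SVD-based homography computation can be sketched with the direct linear transform (DLT): each point match contributes two rows to a homogeneous system, and the solution is the right singular vector with the smallest singular value. This is an illustrative sketch, not the linked Matlab code; the function names and the synthetic point matches are assumptions, and coordinate normalisation and RANSAC are omitted.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of H such that dst ~ H * src, from N >= 4 matches.

    src, dst: (N, 2) arrays of matched (x, y) points.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints per match on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    # The null vector of A is the last row of V^T.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is not close to zero

def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Synthetic check: recover a known homography from exact matches.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 20]], dtype=np.float64)
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
```

With real SIFT matches, outliers would normally be rejected first, since a single wrong match can corrupt the least-squares solution.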

image filtering, smoothing and sharpening; image gradient, edge detection. 
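A separable filter can be applied as two 1-D passes instead of one 2-D convolution. The sketch below uses this to compute Sobel gradients on a synthetic step edge and a Gaussian blur; the helper names, the zero padding at the borders, and the test image are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled 1-D Gaussian, truncated at about 3 sigma, normalised to sum 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def filter_separable(image, k_row, k_col):
    """Convolve every row with k_row, then every column with k_col (zero padding)."""
    tmp = np.apply_along_axis(np.convolve, 1, image, k_row, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, k_col, mode="same")

smooth = np.array([1.0, 2.0, 1.0])
deriv = np.array([1.0, 0.0, -1.0])  # convolution gives f[n+1] - f[n-1]

image = np.zeros((16, 16))
image[:, 8:] = 1.0  # vertical step edge between columns 7 and 8

gx = filter_separable(image, deriv, smooth)  # Sobel: derivative along x
gy = filter_separable(image, smooth, deriv)  # Sobel: derivative along y
magnitude = np.hypot(gx, gy)                 # edge strength
g = gaussian_kernel_1d(1.0)
blurred = filter_separable(image, g, g)      # Gaussian smoothing
```

Away from the image border, `gx` responds only at the two columns next to the step and `gy` is zero, which is the expected behaviour for a vertical edge.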

[10] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", IEEE Computer Vision & Pattern Recognition, Vol. 1, pp. 886-893, June 2005. cited >36750 times

Harris detector, shift distortion 2nd moment modeling, fast computation of the Hessian trace and determinant from DoG filtering.

[11] R. Szeliski, "Feature Detection", Chapter 4 in Computer Vision: Algorithms and Applications

[12] C. Harris, and M. Stephens, "A Combined Corner and Edge Detector", Alvey Vision Conference, page 1-6. Alvey Vision Club, (1988) cited >20426 times 

Homework-1 Homography and Color Histogram

Solving linear equations using SVD, tutorial slides 

Harris Detector example: download
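For orientation, here is a small sketch of the Harris corner response computed from the second moment matrix of image gradients. It is not the downloadable example; the box window (instead of a Gaussian), the constant `kappa=0.05`, and the function names are assumptions.

```python
import numpy as np

def box_sum(a, r):
    """Sum over a (2r+1) x (2r+1) window, with zero padding at the borders."""
    k = np.ones(2 * r + 1)
    a = np.apply_along_axis(np.convolve, 0, a, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, a, k, mode="same")

def harris_response(image, r=2, kappa=0.05):
    """R = det(M) - kappa * trace(M)^2, M being the windowed second moment matrix."""
    iy, ix = np.gradient(image.astype(np.float64))  # derivatives along rows, columns
    sxx = box_sum(ix * ix, r)
    syy = box_sum(iy * iy, r)
    sxy = box_sum(ix * iy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - kappa * trace ** 2

# Synthetic image: a bright square on a dark background.
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
response = harris_response(image)
```

On this image the response is positive at the square's corners, negative along its straight edges, and zero in flat regions, which matches the usual reading of the two eigenvalues of the second moment matrix.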

Lec-06  SIFT

SIFT detection: DoG filtering to find spatial and scale-space extrema, pruning the initial extrema by checking peak strength and edgeness, affine transform matching distance criteria for SIFT repeatability modeling, SIFT description, how to achieve rotation invariance, SIFT distance: true love test. Image retrieval system performance metrics: true positive, false positive, true negative, false negative, precision, recall.
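The retrieval metrics listed above can be computed from a retrieved set and a ground-truth relevant set. The sketch below is illustrative only; the function name and the toy image IDs are assumptions.

```python
def retrieval_metrics(retrieved, relevant, database_size):
    """Confusion counts plus precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and returned
    fp = len(retrieved - relevant)   # returned but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    tn = database_size - tp - fp - fn
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Toy query over a database of 100 images (IDs are made up).
metrics = retrieval_metrics(retrieved=[3, 7, 9, 12],
                            relevant=[7, 12, 20],
                            database_size=100)
```

Here two of the four returned images are relevant and one relevant image is missed, so precision is 2/4 and recall is 2/3.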

[13] E. Rosten and T. Drummond, "Machine Learning for High Speed Corner detection", Proc. of ECCV (1) 2006: 430-443

[14] T. Lindeberg (1999), "Automatic scale selection as a pre-processing stage for interpreting the visual world", Proc. Fundamental Structural Properties in Image and Pattern Analysis FSPIPA'99, (Budapest, Hungary), September 1999.

[15] D. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV vol. 60(2), pp. 91-110, 2004. (cited >69111 times!)

[16] V. Fragoso, G. Srivastava, A. Nagar, Z. Li, K. Park, and M. Turk, "Cascade of Box (CABOX) Filters for Optimal Scale Space Approximation", Proc. of the 4th IEEE Int'l Workshop on Mobile Vision, Columbus, USA, 2014.

Python SIFT implementation download.

Matlab SIFT Implementation


Bag of Words (BoW), which is essentially a histogram aggregation, and VLAD aggregation, which computes vectors instead of histograms and so carries more information from the aggregation. This is similar to a non-linear fully connected layer in deep learning.
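The difference between the two aggregations can be sketched in a few lines: BoW counts how many descriptors fall into each codeword's cell, while VLAD sums the residuals between descriptors and their nearest codeword. The codebook and descriptors below are toy 2-D values chosen for illustration (real SIFT descriptors are 128-D and the codebook would come from k-means), and the function names are assumptions.

```python
import numpy as np

def assign(descriptors, codebook):
    """Index of the nearest codeword (squared Euclidean) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow(descriptors, codebook):
    """Bag of Words: normalised histogram of codeword assignments."""
    labels = assign(descriptors, codebook)
    hist = np.bincount(labels, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()

def vlad(descriptors, codebook):
    """VLAD: per-codeword sum of residuals, flattened and L2-normalised."""
    labels = assign(descriptors, codebook)
    v = np.zeros(codebook.shape, dtype=np.float64)
    for k in range(len(codebook)):
        members = descriptors[labels == k]
        if len(members):
            v[k] = (members - codebook[k]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [11.0, 12.0]])
bow_vec = bow(descriptors, codebook)    # length K
vlad_vec = vlad(descriptors, codebook)  # length K * D
```

Both codewords receive two descriptors, so the BoW histogram cannot tell the clusters apart, whereas the VLAD vector still records where the descriptors sit relative to each codeword.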

[17] David Nister, Henrik Stewenius: "Scalable Recognition with a Vocabulary Tree", IEEE CVPR (2) 2006: pp.2161-2168.

[18] Herve Jegou, Matthijs Douze, Cordelia Schmid: "Improving Bag-of-Features for Large Scale Image Search", International Journal of Computer Vision vol. 87(3): pp. 316-336 (2010)

Fisher Vector aggregation, which is essentially VLAD with a local Mahalanobis distance metric; Supervector aggregation; FV vs. SV; and AKULA aggregation, which is non-indexable but more effective in re-identification.

[19] Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek, "Image Classification with the Fisher Vector: Theory and Practice", Intl Jnl of Computer Vision, 2013, 105 (3).

[20] Z. Wang, L.-Y. Duan, J. Lin, S. J. Chen, T. Huang, W. Gao, "Optimizing Binary Fisher Codes for Visual Search", IEEE Data Compression Conference (DCC), 2015.

[21] Tomi Kinnunen, Haizhou Li: "An overview of text-independent speaker recognition: From features to supervectors". Speech Communication 52(1): 12-40 (2010)

[22] A. Nagar, Z. Li, G. Srivastava, K. Park: AKULA - Adaptive Cluster Aggregation for Visual Search. IEEE DCC 2014: 13-22.

Homework-2: Gradient Features and Aggregations

Fisher Vector Python Implementation

Compute PCA and GMM models for SIFT: matlab code, training data

Distance metrics, kernel function, Mahalanobis distance, k-NN classifier, GMM Bayesian classifier
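As a short illustration of two items from this list, the sketch below computes a Mahalanobis distance and a majority-vote k-NN prediction. The function names and the toy data are assumptions, and the k-NN uses plain Euclidean distance for simplicity.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), via a linear solve."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k nearest training points (Euclidean)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[counts.argmax()]

# Variance 4 along x and 1 along y: the same Euclidean offset counts
# for less along the high-variance axis.
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
d_x = mahalanobis([2.0, 0.0], [0.0, 0.0], cov)
d_y = mahalanobis([0.0, 2.0], [0.0, 0.0], cov)

train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [6, 5], [5, 6]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
label_a = knn_predict(train_x, train_y, np.array([0.2, 0.3]))
label_b = knn_predict(train_x, train_y, np.array([5.5, 5.2]))
```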

[23] Mahalanobis, P. C. (1936). "On the generalised distance in statistics" (PDF). Proceedings of the National Institute of Sciences of India, 2(1):49-55. 

Optimization-based classifier design, including logistic regression, which uses gradient search, and the Support Vector Machine (SVM), which solves a constrained optimization problem with Lagrangian relaxation.

[24] Andrew Ng, Stanford CS 229 notes on Logistic Regression, and Support Vector Machine.

[25] Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, vol. 2(2): 121-167 (1998) cited >24086 times
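The gradient search for logistic regression can be sketched as batch gradient descent on the mean cross-entropy loss. The learning rate, the number of steps, the function names, and the 1-D toy data are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(x):
    """Append a constant 1 column so the last weight acts as the bias."""
    return np.hstack([x, np.ones((len(x), 1))])

def train_logistic(x, y, lr=0.1, steps=5000):
    """Batch gradient descent on the mean cross-entropy loss."""
    xb = add_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = sigmoid(xb @ w)
        grad = xb.T @ (p - y) / len(y)  # gradient of the mean loss
        w -= lr * grad
    return w

def predict(w, x):
    return (sigmoid(add_bias(x) @ w) >= 0.5).astype(int)

# Two classes separated by a gap between x = 2 and x = 4.
x = np.array([[0.0], [1.0], [2.0], [4.0], [5.0], [6.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w = train_logistic(x, y)
pred = predict(w, x)
```

An SVM would instead maximise the margin subject to constraints; on separable data like this, both place the decision boundary inside the gap.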

Fast LoG filter via 2-SVD approximation: fastLoG.m, svd_approx.m
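The idea behind a separable approximation of the LoG kernel can be sketched as follows: take the SVD of the sampled 2-D kernel and keep the two largest singular components, each of which is an outer product of two 1-D filters. This is not the linked `fastLoG.m` or `svd_approx.m`; the function names, kernel size, and unnormalised LoG formula (correct up to a constant scale) are assumptions.

```python
import numpy as np

def log_kernel(sigma, radius):
    """Sampled Laplacian of Gaussian, up to a constant scale factor."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    return (r2 - 2.0 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2.0 * sigma ** 2))

def separable_approx(kernel, rank=2):
    """Return 1-D column/row filter pairs whose outer products sum to ~kernel."""
    u, s, vt = np.linalg.svd(kernel)
    cols = [u[:, i] * np.sqrt(s[i]) for i in range(rank)]
    rows = [vt[i] * np.sqrt(s[i]) for i in range(rank)]
    approx = sum(np.outer(c, r) for c, r in zip(cols, rows))
    return cols, rows, approx

kernel = log_kernel(sigma=2.0, radius=8)
cols, rows, approx = separable_approx(kernel, rank=2)
rel_error = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
```

Because the LoG is a sum of two separable terms, (x^2 - sigma^2) g(x) g(y) + g(x) (y^2 - sigma^2) g(y), the rank-2 reconstruction is essentially exact here. Filtering with two pairs of 1-D kernels of length n costs about 4n multiplications per pixel instead of n^2.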

SVM tutorial matlab code

Visual object modeling via key points aggregation, aggregation indexing/hashing, object re-identification and retrieval system, performance metric, the MPEG CDVS (Compact Descriptor for Visual Search) work. Fisher Vector Aggregation, Re-ranking. Deep Learning solutions, Google Landmark Challenge 

[26] Ling-Yu Duan, Vijay Chandrasekhar, Jie Chen, Jie Lin, Zhe Wang, Tiejun Huang, Bernd Girod, Wen Gao: Overview of the MPEG-CDVS Standard, IEEE Trans. Image Processing 25(1): 179-194 (2016)

[27] MPEG N15372, Test Model 14: Compact Descriptors for Visual Search

[28] Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han, "Large-Scale Image Retrieval with Attentive Deep Local Features (DELF)", IEEE CVPR 2018 

[29] P. Maharjan, L. T. Vanfossan, Z. Li, and J. Shen, “Fast LoG SIFT Keypoint Detector”, IEEE Multimedia Signal Processing (MMSP) Workshop, Poitiers, France, 2023.

Lec-12 Mid-Term Review

Subspace methods in image analysis: a holistic approach that takes pixels as input and computes projections that meet the need for dimension reduction. Examples are PCA, which computes an explicit projection, and embedding methods such as SNE and Laplacian eigenmaps, which compute a lower-dimensional embedding directly (Laplacian eigenmaps by solving an eigenproblem).
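PCA as an explicit projection can be sketched with an SVD of the centred data matrix. The function names and the synthetic data (points scattered along one direction in 3-D) are assumptions for illustration.

```python
import numpy as np

def pca_fit(x, n_components):
    """Mean, principal directions (rows), and explained variances."""
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:n_components]
    explained_var = s[:n_components] ** 2 / (len(x) - 1)
    return mean, components, explained_var

def pca_project(x, mean, components):
    """Explicit linear projection onto the principal subspace."""
    return (x - mean) @ components.T

# Synthetic data: samples along one direction plus small isotropic noise.
rng = np.random.default_rng(0)
direction = np.array([3.0, 4.0, 0.0]) / 5.0
t = rng.normal(size=200)
x = np.outer(t, direction) + 0.01 * rng.normal(size=(200, 3))

mean, components, explained_var = pca_fit(x, n_components=1)
z = pca_project(x, mean, components)
```

The same projection can be applied to unseen samples, which is what distinguishes PCA from embedding methods that only produce coordinates for the training points.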

[29] Linear Algebra Short Overview notes

[30] S.T. Roweis and L.K. Saul. "Nonlinear dimensionality reduction by Locally Linear Embedding". Science, 290(5500):2323–2326, 2000. cited >16000 times

[31] L Van Der Maaten, E Postma, J Van den Herik, "Dimensionality reduction: a comparative review", Jnl of Machine Learning Research, 2009. 

[32] Laurens van der Maaten, Geoffrey Hinton, "Visualizing data using t-SNE", Jnl. Machine Learning Research, Vol.9(Nov):2579--2605, 2008. cited >7200 times

Dimension reduction toolbox download, short manual download


Subspace methods in face recognition, from principal component analysis (eigenface) to linear discriminant analysis (fisherface).

[33] M Turk, A Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience 3 (1), 71-86, 1991. cited >20000 times!

[34] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection", IEEE Trans on PAMI, vol.19(7), 711-720, 1997. cited >15000 times!

subspace methods in face recognition, data set and sample code

graph embedding, affinity graph preserving embedding, graph Laplacian, Laplacian Embedding, Laplacian face, Graph Fourier transforms and applications in compression and classification.

[35] Xiaofei He, Shuicheng Yan, Yuxiao Hu, Partha Niyogi (R.I.P.), Hong-Jiang Zhang, "Face Recognition Using Laplacianfaces", IEEE Trans. PAMI vol.27(3): 328-340 (2005). Cited by 4255

[36] X. He, and P. Niyogi, "Locality Preserving Projections", Advances in Neural Information Processing Systems, Vancouver, Canada, 2003.

[37] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, Pierre Vandergheynst, "The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains", IEEE Signal Process. Mag. vol. 30(3): 83-98 (2013), cited 3664 times.

[38] Qi Yang, Zhan Ma, Yiling Xu, Zhu Li, Jun Sun, "Inferring Point Cloud Quality via Graph Similarity", IEEE Trans. Pattern Anal. Mach. Intell. 44(6): 3015-3029 (2022)

Graph Fourier transforms (GFT: code)
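A graph Fourier transform can be sketched as projection onto the eigenvectors of the graph Laplacian L = D - W. This is not the linked code; the function names and the 4-node path graph are assumptions chosen to keep the example small.

```python
import numpy as np

def graph_laplacian(w):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(w.sum(axis=1)) - w

def gft_basis(w):
    """Eigenvalues (graph frequencies, ascending) and eigenvectors of L."""
    return np.linalg.eigh(graph_laplacian(w))

# Path graph with 4 nodes and unit edge weights.
w = np.zeros((4, 4))
for i in range(3):
    w[i, i + 1] = w[i + 1, i] = 1.0

eigvals, u = gft_basis(w)
signal = np.array([1.0, 1.0, 1.0, 1.0])  # constant signal on the graph
coeffs = u.T @ signal                    # forward GFT
recon = u @ coeffs                       # inverse GFT
```

A constant signal on a connected graph has all of its energy in the zero-frequency coefficient, which is why smooth graph signals compress well in this basis.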

Matlab getConfMap.m

Homework-4 Subspace Methods

Piece-wise linear subspace models, divide and conquer. Subspace structure on the Grassmann manifold. Subspace optimization for classification and compression. CNN acceleration with piece-wise linear projection.

[39] X. Wang, Z. Li, and D. Tao, "Subspace Indexing on Grassmann Manifold for Large Scale Multimedia Retrieval", IEEE Trans on Image Processing, vol. 20(9): 2627-2635 (2011).

[40] Alan Edelman, Tomas Arias, and Steven T. Smith, "The Geometry of Algorithms with Orthogonality Constraints", SIAM Journal on Matrix Analysis and Applications, vol. 20(2), 1999.

Subspace optimization on Stiefel/Grassmann manifolds Matlab toolbox download

Deep CNN-based solutions for image feature extraction and classification, an overview of the computing architecture involved, and examples of classification networks.


[41] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, Lawrence D Jackel, "Backpropagation applied to handwritten zip code recognition", Neural Computation, 1989.

[42] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: 1106-1114

[43] A. Vedaldi et al., "MatConvNet - Conv Neural Network for Matlab", 2016.


Training deep CNNs with the backpropagation algorithm, loss functions and their derivatives. SoftMax for the classification problem, and triplet loss for the identification problem.
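The two losses named here, and the gradient that backpropagation starts from, can be sketched directly. The function names, the margin value of 0.2, and the toy inputs are assumptions for illustration; squared Euclidean distance is used in the triplet loss.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true classes."""
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels]).mean())

def cross_entropy_grad(logits, labels):
    """d(loss)/d(logits) = (softmax - one_hot) / batch_size."""
    g = softmax(logits)
    g[np.arange(len(labels)), labels] -= 1.0
    return g / len(labels)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on squared distances: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = ((anchor - positive) ** 2).sum(axis=-1)
    d_neg = ((anchor - negative) ** 2).sum(axis=-1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

logits = np.zeros((2, 4))   # uniform predictions over 4 classes
labels = np.array([1, 3])
loss = cross_entropy(logits, labels)
grad = cross_entropy_grad(logits, labels)

a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 1.0]])
n = np.array([[3.0, 0.0]])
easy = triplet_loss(a, p, n)  # negative is already far away
hard = triplet_loss(a, n, p)  # roles swapped: positive farther than negative
```

The softmax loss pushes each sample toward its class, while the triplet loss only constrains relative distances in the embedding, which suits identification problems where the set of identities is open.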


[44] Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015. Source code at download

[45] W. Hu, Math 6001 Non-Linear Optimization in Machine Learning.

[46] M. Tan, Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks", ICML 2019, cited 12000 times.

Notes on LeNet backpropagation derivation [pdf]

EfficientNets implementation.

Object detection in images: AdaBoost, CNN-based, YOLO and SSD. Dataset: DOTA, aerial image object detection.


[47] P. Viola, M. Jones, "Rapid object detection using a boosted cascade of simple features", Proc of IEEE CVPR 2001.

[48] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection", Proc of IEEE CVPR, 2015. cited 34005 times.

[49] W. Liu et al, "SSD: Single Shot MultiBox Detector", Proc of European Conf on Computer Vision (ECCV), 2016, cited 28740 times.

[50] C. Marrs, B. Kathariya, Z. Li, and G. York, “Plug-and-Play Joint Image Deblurring and Detection”, IEEE Multimedia Signal Processing (MMSP) Workshop, Poitiers, France, 2023.

Homework-5/Project: Transfer Learning