ECE/CS 5582 Computer Vision @UMKC

Fall 2022, Tu/Th 7-8:15pm@NMLC 451, on Zoom: 533-999-8759

Course Info:

Images are an essential form of information representation and communication in modern society. Billions of images are generated every minute in applications ranging from photography, entertainment, and education to defense and medicine. A good understanding of this vast amount of image content at the signal, object, syntactic, and semantic levels is essential to create new capabilities and enable new applications. This course is project based. Its objective is to teach practical algorithms and system solutions for image formation/representation, analysis/understanding, and retrieval, as well as recent topics such as mobile visual search, augmented reality, large-scale image tagging, and remote sensing and vision problems.

Upon completion of the course, students should understand the current state of the art in image analysis and understanding, have practical programming and system skills to address these problems, and have a basis for future research and an industry career in this area.

Instructor: Zhu Li

Textbooks:

None required. Key references:

R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2015.
J. E. Solem, Programming Computer Vision with Python, O'Reilly, 2014.

Logistics:

Class will be held at NMLC 451 and also via Zoom: Tue and Thu, 7pm-8:15pm, Meeting ID: 533 999 8759; the password is announced on Canvas. Instructor: Zhu Li. Office hours: Thu 5:30-7pm via Zoom, or by appointment. TA: TBD.

Topics:

1. Camera & Image Formation

1.1. Camera and Image Pipeline

1.2. Color Space, Color Histogram for Image Retrieval

1.3. Image Formation: Geometry, Homography

2. Image Analysis by Filtering

2.1. Spatial Domain Filtering: separable filters, scale space filtering, edge detection

2.2. Interest Point Detection: SIFT, SURF

3. Object Identification and Retrieval from Images

3.1. Interest Point Detection

3.2. Interest Point Aggregation and Deep Feature Aggregation

3.3. Visual Object Hashing and Retrieval

3.4. MPEG Mobile Visual Search: Compact Descriptor for Visual Search (CDVS) System

4. Holistic Image Modeling – Subspace Learning and Deep Neural Networks

4.1. Linear Appearance Modeling

4.2. Eigen Face, Fisher Face and Laplacian Face

4.3. Deep Learning by Convolutional Neural Network Modeling

4.4. Compact Deep Learning System

4.5. Remote Sensing, EO+SAR based image classification

8/23 Lec 01: Introduction to Computer Vision

[1] ETHZ Tutorial: Image Processing Using Matlab

8/25 Lec 02: Image Formation - Color

imaging pipeline: human vision system, color perception, color space and color features, digital retina, color demosaicing

[2] R. Szeliski, "Image Formation - Color Space", Chapter 2 in Computer Vision: Algorithms and Applications

[3] B. S. Manjunath, J.-R. Ohm, V. Vasudevan, and A. Yamada, "Color and Texture Descriptors", IEEE Trans. on Circuits & Systems for Video Tech., vol. 11(6), 2001.

[4] P. Maharjan, L. Li, Z. Li, N. Xu, C. Ma, and Y. Li, "Improving Extreme Low-Light Image Denoising via Residual Learning", IEEE Int'l Conf on Multimedia & Expo (ICME), Shanghai, 2019.

8/30 Lec 03: Image Formation - Geometry

camera model, projection matrix, homography and applications

[5] J. Janai, F. Guney, A. Behl, A. Geiger, "Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art", arXiv/CS/CVPR, 2017.

[6] D. Eigen, et al., "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network", arXiv/CS/CVPR/2014.

[7] Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, Richard Szeliski, "Building Rome in a Day", Proc. of ICCV, 2009.

[8] Y. Zheng and M. Salman Asif, "Joint Image and Depth Estimation with Mask-Based Lensless Cameras", IEEE CVPR, 2020.

9/01 Lec 04: Image Filters and Edge Detection

2D image filters, Gaussian blur filtering, Difference of Gaussian filters, edge detection, HoG edge/texture features

[9] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", IEEE Computer Vision & Pattern Recognition, Vol.1, pp.886-893, June 2005. cited >36750 times
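
The Gaussian blur and Difference of Gaussian (DoG) filters from this lecture can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's Matlab code: the function names, the 3-sigma kernel radius, the edge-replicated borders, and the scale ratio of 1.6 are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma (an assumed cutoff)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(image.astype(float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def difference_of_gaussians(image, sigma, ratio=1.6):
    """DoG response: blur at ratio * sigma minus blur at sigma."""
    return gaussian_blur(image, ratio * sigma) - gaussian_blur(image, sigma)

# A vertical step edge: the DoG response is zero in flat regions
# and non-zero around the edge.
step = np.zeros((32, 32))
step[:, 16:] = 1.0
dog = difference_of_gaussians(step, sigma=1.0)
```

Filtering with two 1-D kernels instead of one 2-D kernel is what makes the Gaussian a separable filter: the cost per pixel grows linearly with the kernel width rather than quadratically.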

Homework-1: Color histogram and homography

Resources:

1) VL_FEAT matlab tool download

2) Compute homography from SIFT matching, matlab example

3) Solving linear equations using SVD, tutorial slides
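
As a rough Python counterpart to resources 2) and 3), the sketch below estimates a homography from point correspondences with the direct linear transform (DLT), solving the homogeneous system by SVD. It is not the linked Matlab example. The function names are illustrative, the correspondences are assumed to be outlier free (SIFT matches would normally need a robust step such as RANSAC first), and coordinate normalization is omitted for brevity.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 matrix H with dst ~ H @ src from >= 4 point pairs (DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution of A h = 0 (up to scale) is the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the scale; this assumes H[2, 2] is not zero.
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Synthetic check: generate matches from a known homography and recover it.
H_true = np.array([[1.2, 0.1, 0.5],
                   [-0.05, 0.9, -0.3],
                   [0.01, 0.02, 1.0]])
src = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [1, 2]], dtype=float)
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
```

With exact correspondences the null space of A is one dimensional, so the estimate matches the true matrix up to numerical precision; with noisy matches the same SVD gives the least-squares solution.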

09/06: Lec05: Interest Points Detection

Harris detector, second-moment modeling of shift distortion, fast computation of the Hessian trace and determinant from DoG filtering.

[10] R. Szeliski, "Feature Detection", Chapter 4 in Computer Vision: Algorithms and Applications

[11] C. Harris, and M. Stephens, "A Combined Corner and Edge Detector", Alvey Vision Conference, page 1-6. Alvey Vision Club, (1988) cited >19147 times
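
A minimal NumPy sketch of the Harris corner response described in [11]: the second-moment matrix is accumulated over a local window, and the response is its determinant minus k times the squared trace. The box window (instead of Gaussian weighting), the window radius, and k = 0.05 are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window around each pixel (zero padded)."""
    p = np.pad(a, r, mode="constant")
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_response(image, k=0.05, r=2):
    """Harris response R = det(M) - k * trace(M)^2 per pixel."""
    img = image.astype(float)
    iy, ix = np.gradient(img)          # derivatives along rows (y) and columns (x)
    sxx = box_sum(ix * ix, r)          # entries of the second-moment matrix M
    syy = box_sum(iy * iy, r)
    sxy = box_sum(ix * iy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# A bright square on a dark background: corners give R > 0,
# straight edges give R < 0, flat regions give R = 0.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
```

The sign pattern follows from the eigenvalues of M: two large eigenvalues (corner) make the determinant dominate, while one large eigenvalue (edge) leaves only the negative trace term.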

09/08: Lec06: SIFT

SIFT detection: DoG filtering to find spatial and scale-space extrema, pruning initial extrema by checking peak strength and edgeness, affine transform matching distance criteria for SIFT repeatability modeling, SIFT description, how to achieve rotation invariance, SIFT distance: true love test. Image retrieval system performance metrics: true positive, false positive, true negative, false negative, precision, recall.

[12] E. Rosten and T. Drummond, "Machine Learning for High Speed Corner detection", Proc. of ECCV (1) 2006: 430-443

[13] Lindeberg (1999) "Automatic scale selection as a pre-processing stage for interpreting the visual world", Proc. Fundamental Structural Properties in Image and Pattern Analysis FSPIPA'99, (Budapest, Hungary), September 1999.

[14] D. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV vol. 60(2), pp. 91-110, 2004. (cited >65665 times!)

[15] V. Fragoso, G. Srivastava, A. Nagar, Z. Li, K. Park, and M. Turk, "Cascade of Box (CABOX) Filters for Optimal Scale Space Approximation", Proc. of the 4th IEEE Int'l Workshop on Mobile Vision, Columbus, USA, 2014.
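
The retrieval metrics listed for this lecture can be made concrete with a small example. The sketch below counts true/false positives and negatives for one query and derives precision and recall; the function name and the toy image IDs are illustrative assumptions.

```python
def retrieval_metrics(retrieved, relevant, database_size):
    """Confusion counts plus precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)        # relevant images that were returned
    fp = len(retrieved - relevant)        # returned images that are not relevant
    fn = len(relevant - retrieved)        # relevant images that were missed
    tn = database_size - tp - fp - fn     # irrelevant images correctly left out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Toy query: 4 images returned from a 10-image database, 3 of which are relevant.
m = retrieval_metrics(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6], database_size=10)
```

Here two of the four returned images are relevant (precision 0.5) and two of the three relevant images were found (recall 2/3).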

09/13: Lec07: Feature Aggregation I (BoW & VLAD)

Bag of Words (BoW) is basically a histogram aggregation. VLAD aggregation computes vectors instead of histograms, which carry more information from the aggregation. This is like a non-linear fully connected layer in the deep learning case.

[16] David Nister, Henrik Stewenius: "Scalable Recognition with a Vocabulary Tree", IEEE CVPR (2) 2006: pp.2161-2168.

[17] Herve Jegou, Matthijs Douze, Cordelia Schmid: "Improving Bag-of-Features for Large Scale Image Search", International Journal of Computer Vision vol. 87(3): pp. 316-336 (2010)
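
To illustrate the difference between the two aggregations, the sketch below computes a BoW histogram and a VLAD vector for the same set of local descriptors. It assumes the codebook (centroids) has already been learned, for example by k-means, and uses plain L2 normalization of the VLAD vector; the function names and the toy data are hypothetical.

```python
import numpy as np

def assign_to_centroids(descriptors, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per descriptor."""
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, centroids):
    """BoW: normalized count of descriptors per visual word."""
    assign = assign_to_centroids(descriptors, centroids)
    hist = np.bincount(assign, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

def vlad(descriptors, centroids):
    """VLAD: per-word sum of residuals (descriptor minus centroid), L2 normalized."""
    assign = assign_to_centroids(descriptors, centroids)
    k, d = centroids.shape
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            v[i] = (members - centroids[i]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Toy 2-D "descriptors" and a 2-word codebook.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
hist = bow_histogram(descriptors, centroids)   # length k
code = vlad(descriptors, centroids)            # length k * d
```

BoW keeps only how many descriptors fall in each cell, while VLAD also keeps where they fall relative to the centroid, which is why its output has k * d dimensions instead of k.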

09/15: Lec08: Feature Aggregation II (FV, SV & AKULA)

Fisher Vector (FV) aggregation, basically VLAD with a local Mahalanobis distance metric; Supervector (SV) aggregation; FV vs. SV; and AKULA aggregation, which is non-indexable but more effective in re-identification.

[18] Herve Jegou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Perez, Cordelia Schmid "Aggregating local image descriptors into compact codes", IEEE Trans on PAMI , 2011.12

[19] Z. Wang, L.-Y. Duan, J. Lin, S. J. Chen, T. Huang, W. Gao, "Optimizing Binary Fisher Codes for Visual Search", IEEE Data Compression Conference (DCC), 2015.

[20] Tomi Kinnunen, Haizhou Li: "An overview of text-independent speaker recognition: From features to supervectors". Speech Communication 52(1): 12-40 (2010)

[21] A. Nagar, Z. Li, G. Srivastava, K. Park: AKULA - Adaptive Cluster Aggregation for Visual Search. IEEE DCC 2014: 13-22.

09/20: Lec09: Classification I

Distance metrics, kernel function, Mahalanobis distance, kNN classifier, GMM Bayesian classifier

[22] Mahalanobis, P. C. (1936). "On the generalised distance in statistics" (PDF). Proceedings of the National Institute of Sciences of India, 2(1):49-55.
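
A short sketch of two items from this lecture: the Mahalanobis distance of [22] and a k-nearest-neighbor classifier. The kNN below uses plain Euclidean distance and majority voting for simplicity; the function names and toy data are illustrative assumptions.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean))."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points closest to the query."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[counts.argmax()]

# With variance 4 along x and 1 along y, a step of 2 along x counts as
# one standard deviation, while the same step along y counts as two.
cov = np.diag([4.0, 1.0])
d_x = mahalanobis([2.0, 0.0], [0.0, 0.0], cov)
d_y = mahalanobis([0.0, 2.0], [0.0, 0.0], cov)

train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
label = knn_predict(train_x, train_y, [0.2, 0.2], k=3)
```

Replacing the Euclidean norm in `knn_predict` with `mahalanobis` under a class covariance is one way to combine the two ideas.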

09/22: Lec10: Classification II

Distance metrics, kernel function, Mahalanobis distance, kNN classifier, GMM Bayesian classifier

[23] Andrew Ng, Stanford CS 229 notes on Logistic Regression, and Support Vector Machine.

[24] Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, vol. 2(2): 121-167 (1998) cited >24086 times

Homework 2: Gradient Feature Aggregation

Compute SIFT, Dense SIFT and HoG features for images with FV and VLAD aggregation to test classification performance on the Aerial Image data set.

Example code is shared here.

09/27 Lec 11: Object Re-Identification

Visual object modeling via key points aggregation, aggregation indexing/hashing, object re-identification and retrieval system, performance metric, the MPEG CDVS (Compact Descriptor for Visual Search) work. Fisher Vector Aggregation, Re-ranking. Deep Learning solutions, Google Landmark Challenge

[25] Ling-Yu Duan, Vijay Chandrasekhar, Jie Chen, Jie Lin, Zhe Wang, Tiejun Huang, Bernd Girod, Wen Gao: Overview of the MPEG-CDVS Standard, IEEE Trans. Image Processing 25(1): 179-194 (2016)

[26] MPEG N15372, Test Model 14: Compact Descriptors for Visual Search

[27] F Radenović, G Tolias, O Chum, "Fine-tuning CNN image retrieval with no human annotation", IEEE Trans on PAMI, 2018.

[28] Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han, "Large-Scale Image Retrieval with Attentive Deep Local Features (DELF)", IEEE CVPR 2018.

09/29 Lec 12: Mid-term Review

Review of the handcrafted computer vision pipeline.

HW-3: LoG Key Points

Implement a LoG-pyramid-based direct scale-space extrema detector and compare it with SIFT. This will be a bonus assignment.

10/04 Lec 13: Low Dimension Embedding

Subspace methods in image analysis take a holistic approach: they take pixels as input and compute projections that meet the need for dimension reduction. Examples are PCA, which computes an explicit projection, and SNE and Laplacian eigenmaps, which directly compute a lower-dimensional embedding, for example by solving an eigenproblem.

[29] Linear Algebra Short Overview notes

[30] S.T. Roweis and L.K. Saul. "Nonlinear dimensionality reduction by Locally Linear Embedding". Science, 290(5500):2323–2326, 2000. cited >16000 times

[31] L Van Der Maaten, E Postma, J Van den Herik, "Dimensionality reduction: a comparative review", Jnl of Machine Learning Research, 2009. Dimension reduction toolbox download, short manual download

[32] Laurens van der Maaten, Geoffrey Hinton, "Visualizing data using t-SNE", Jnl. Machine Learning Research, Vol.9(Nov):2579-2605, 2008. cited >7200 times
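
As an example of an explicit projection, the sketch below fits PCA with an SVD of the centered data matrix and projects the samples onto the leading components. It is a minimal illustration under the assumption that rows are samples (for images, flattened pixel vectors); the function names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean, the top principal directions (rows), and their variances."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the eigenvectors of the sample covariance matrix,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    explained_var = s[:n_components] ** 2 / (len(X) - 1)
    return mean, components, explained_var

def pca_transform(X, mean, components):
    """Project samples onto the principal directions."""
    return (X - mean) @ components.T

# Toy data lying exactly on a line in 2-D: one component captures everything.
t = np.linspace(-1.0, 1.0, 9)
X = np.outer(t, [1.0, 2.0]) + np.array([3.0, -1.0])
mean, components, var = pca_fit(X, n_components=1)
Z = pca_transform(X, mean, components)
X_rec = Z @ components + mean        # reconstruction from the 1-D embedding
```

Because the projection matrix is explicit, new samples can be embedded with the same `pca_transform` call, which is not the case for methods that only compute an embedding of the training points.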

10/06 Lec 14: Eigenface and Fisherface

face recognition from principal component analysis (eigenface) and linear discriminant analysis (fisherface).

[33] M Turk, A Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience 3 (1), 71-86, 1991. cited > 20000 times!

[34] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection", IEEE Trans on PAMI, vol.19(7), 711-720, 1997. cited > 15000 times!

10/11 Lec 15: Graph Embedding and Laplacianface

graph embedding, affinity graph preserving embedding, graph Laplacian, Laplacian Embedding, Laplacian face, Graph Fourier transforms (GFT: code) and applications in compression and classification.

[35] Xiaofei He, Shuicheng Yan, Yuxiao Hu, Partha Niyogi (R.I.P.), Hong-Jiang Zhang, "Face Recognition Using Laplacianfaces", IEEE Trans. PAMI vol.27(3): 328-340 (2005). Cited by 4255

[36] X. He, and P. Niyogi, "Locality Preserving Projections", Advances in Neural Information Processing Systems, Vancouver, Canada, 2003.

[37] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, Pierre Vandergheynst, "The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains", IEEE Signal Process. Mag.vol.30(3): 83-98 (2013)
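
A minimal sketch of a Laplacian embedding for a small affinity graph: build the graph Laplacian L = D - W and take the eigenvectors of the generalized problem L y = lambda D y with the smallest non-zero eigenvalues. The sketch assumes a symmetric, connected graph with positive degrees and solves the generalized problem through the symmetric normalized Laplacian; the function name and the toy graph are illustrative.

```python
import numpy as np

def laplacian_embedding(W, dim):
    """Embed graph nodes using the first `dim` non-trivial generalized eigenvectors."""
    d = W.sum(axis=1)                    # node degrees
    L = np.diag(d) - W                   # graph Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # D^{-1/2} L D^{-1/2} is symmetric, so eigh applies; its eigenvectors v
    # map to generalized eigenvectors y = D^{-1/2} v of L y = lambda D y.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]   # skip the trivial constant solution
    return vals[1:dim + 1], Y

# Two triangles joined by one weak edge: the 1-D embedding separates them by sign.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1
vals, Y = laplacian_embedding(W, dim=1)
```

The same eigenvectors, used as a basis, give the graph Fourier transform mentioned above; the small eigenvalues correspond to signals that vary slowly over the graph.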

11/01 Lec 16: Subspace Models on Grassmann Manifold

Piece-wise linear subspace models, divide and conquer. Subspace structure on Grassmann manifold. Subspace optimization for classification and compression. CNN acceleration with piece-wise linear projection.

[38] X. Wang, Z. Li, and D. Tao, "Subspace Indexing on Grassmann Manifold for Large Scale Multimedia Retrieval", IEEE Trans on Image Processing, vol.20(9): 2627-2635 (2011).

[39] Alan Edelman, Tomas Arias, and Steven T. Smith, "The Geometry of Algorithms with Orthogonality Constraints", SIAM Journal on Matrix Analysis and Applications, vol. 20(2), 1999. Alan Edelman, Tomas Arias, and Steven T. Smith, Subspace optimization on Stiefel/Grassmann manifolds Matlab toolbox download

Homework 4: Subspace method in face recognition

Eigenface, Fisherface and Laplacian face for face recognition, extra-credit: piece-wise linear solution via KNN.

Example code and data set download.

11/03: Lec-17 Deep Learning with CNN I

CNN-based solutions for image feature extraction and classification, overview of the computing architecture involved, and examples of classification networks

[40] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, Lawrence D Jackel, "Backpropagation applied to handwritten zip code recognition", Neural Computation, 1989.

[41] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton: ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: 1106-1114

[42] A. Vedaldi et al., "MatConvNet - Conv Neural Network for Matlab", 2016.

11/08: Lec-18 Deep Learning with CNN II

CNN training with the back-propagation algorithm, loss functions and their derivatives. SoftMax for the classification problem, and triplet loss for the identification problem

[43] Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015. Source code at download

[44] W. Hu, Math 6001 Non-Linear Optimization in Machine Learning.
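
The two losses named in this lecture can be sketched directly in NumPy. The softmax cross-entropy function also returns the gradient with respect to the logits, which is the quantity back propagation starts from. The triplet loss uses squared Euclidean distances and a margin of 0.2, following the form in [43]. Function names and toy inputs are illustrative assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy loss of softmax(logits) against an integer class label.

    Returns the loss and its gradient with respect to the logits,
    which is softmax(logits) minus the one-hot label vector.
    """
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    loss = -log_probs[label]
    grad = np.exp(log_probs)
    grad[label] -= 1.0
    return loss, grad

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on squared distances: pull the positive closer than the negative by a margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

# Uniform logits over 4 classes: every class has probability 1/4.
loss, grad = softmax_cross_entropy(np.zeros(4), label=1)

anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])
easy = triplet_loss(anchor, positive, np.array([3.0, 0.0]))   # negative already far away
hard = triplet_loss(anchor, positive, np.array([1.0, 0.0]))   # negative as close as the positive
```

A triplet whose negative is already farther than the positive by more than the margin contributes zero loss, so only the harder triplets drive the embedding during training.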

Project: Transfer Learning with Fisher Vector and MLP

100 pts - compute pretrained VGG16 embeddings of the images, then apply subspace methods in supervised learning for the 45-class aerial images

50-pts extra: FV aggregation of the conv features and then MLP learning.

Biren's tutorial on deep learning tools in matlab and python.

Example code of deep learning in matlab: download.

11/15: Lec-19 Object Detection in Images

Object detection in images: AdaBoost, CNN-based methods, YOLO and SSD. Data set: DOTA, aerial image object detection.

[45] P. Viola, M. Jones, "Rapid object detection using a boosted cascade of simple features", Proc of IEEE CVPR 2001.

[46] R. Girshick, et al., "Rich feature hierarchies for accurate object detection and semantic segmentation", Proc of IEEE CVPR, 2014.

[47] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection", Proc of IEEE CVPR, 2015.

[48] W. Liu et al, "SSD: Single Shot MultiBox Detector", Proc of European Conf on Computer Vision (ECCV), 2016.

11/17: Lec-20: Review of Subspace and Deep Learning Methods

Final exam review: subspace methods, PCA, LDA, LPP, deep learning methods.
