Course Info:
Video is becoming the dominant traffic over the Internet and mobile networks, and ever-improving video resolution and quality put even more demand on already stressed networks. In this class, we will cover the important topics in video signal processing, modeling, compression, and communication that are part of ongoing efforts in both academia and industry. Topics will include information theory for source coding; lossless coding schemes (Huffman, Golomb, arithmetic); transform, quantization, and super-resolution techniques; motion compensation in video signal modeling; scalability; rate-distortion optimization in video coding; and the current status of video coding standards, especially HEVC, Screen Content Coding, Point Cloud and Light Field Coding, and Immersive Video Coding. Recent results in deep learning based compression, an area in which my group is very active, will also be discussed in detail. On the communication side, we will focus on QoE metrics, Joint Source-Channel Coding and Error Resilience, media transport such as RTP, RTSP, HTTP/WebSocket, and WebRTC, and the related congestion measurement and control. We will also cover issues and solutions for Over the Top (OTT) video delivery, CDN architecture, and video cache management and de-duplication.
Upon completion of this course, students should have both the theoretical background and algorithmic experience in video signal modeling, compression, and streaming, and practical experience in setting up video networking sessions with the latest tools from various industrial standardization efforts.
Textbooks:
Key References:
Y. Wang, Y.-Q. Zhang, and J. Ostermann, Video Processing and Communication
K. Sayood, Introduction to Data Compression, Elsevier, 3rd Ed, 2005.
Logistics:
Instructor: Zhu Li, Office Hour: Tue: 4-5:30pm @ FH 560E, or by appointment
TA: Paras Maharjan, Office: FH 262. Office Hour: Mon-Thur: 2-4pm, Fri: 11am-1pm.
Topics:
1. Info Theory Foundation for Entropy Coding
2. Lossless coding: Huffman, Golomb codes, Arithmetic Code, Dictionary Methods
3. Image Coding: Transforms, Wavelets, Vector Quantization, JPEG and JPEG2000
4. Video Signal Processing: Color Space, Sampling, Motion Compensation
5. Video Coding and Standards
6. Scalable Coding and Super Resolution
7. Deep Learning in Compression
8. Rate-Distortion Optimization in Coding
9. QoE metrics, Joint Source Channel Coding, Error Resilience in Video Communication
10. Modern Media Transport: RTP, HTTP/WebSocket, WebRTC
11. Congestion Measure and Control in Media Streaming
12. CDN Caching and Cache De-Duplication
13. MPEG DASH/MMT standards
[1] Thomas Wiegand; Heiko Schwarz, Video Coding: Part II of Fundamentals of Source and Video Coding , now, 2016.
Lec-02 Info Theory & Entropy Coding
Information-theoretic foundation for lossless coding: entropy, conditional entropy, relative entropy, mutual information, prefix coding, Kraft-McMillan inequality. Text: Sayood, Chpt 2, Math Preliminary.
[2] Z. Liu, L. Karam, "Mutual information-based analysis of JPEG2000 contexts", IEEE Trans on Image Processing, vol. 14(4), 2005.
[3] M. J. Weinberger, G. Seroussi, G. Sapiro: "The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS". IEEE Tran. on Image Processing 9(8): 1309-1324 (2000)
[4] Matt Mahoney, PAQ Compression.
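As a small illustration of the quantities listed for this lecture, the sketch below computes the empirical entropy of a symbol sequence and the Kraft sum of a set of codeword lengths. The function names and the toy string are illustrative assumptions, not part of the course code.

```python
import math
from collections import Counter

def entropy(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def kraft_sum(code_lengths):
    """Sum of 2^-l over codeword lengths; at most 1 for any prefix code."""
    return sum(2.0 ** -l for l in code_lengths)

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; p and q are dicts over the same symbols."""
    return sum(pv * math.log2(pv / q[s]) for s, pv in p.items() if pv > 0)

# Toy source with probabilities 1/4, 1/2, 1/8, 1/8 for a, b, c, d.
h = entropy("aabbbbcd")
# Codeword lengths 1, 2, 3, 3 satisfy the Kraft inequality with equality.
k = kraft_sum([1, 2, 3, 3])
```

For this dyadic source the entropy equals the average length of a code with lengths 1, 2, 3, 3, which is the best any prefix code can do.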
Lec-03 Huffman and Golomb Coding
Practical lossless coding schemes: Huffman coding and Golomb coding, and their real-world applications. Text: Sayood, Chpt. 3.2, 3.5.
[5] Chpt 3: Huffman Coding, Sayood, Intro to Data Compression
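The two schemes in this lecture can be sketched in a few lines. The code below is a minimal illustration, assuming a frequency dictionary as input for Huffman coding and the power-of-two special case of the Golomb code (often called the Rice code); the names `huffman_code` and `rice_encode` are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Return {symbol: bitstring} for a dict of symbol frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

def rice_encode(n, k):
    """Golomb code with m = 2**k for a non-negative integer n."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, "b").zfill(k) if k > 0 else ""
    return "1" * q + "0" + remainder      # unary quotient, then k-bit remainder

code = huffman_code(Counter("aabbbbcd"))
word = rice_encode(9, 2)
```

The Huffman table depends on how ties are broken, so only the codeword lengths, not the exact bit patterns, should be compared across implementations.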
Lec-04 Entropy Coding in JPEG and MPEG
Variable Length Coding in JPEG/MPEG. Text: Sayood, Chpt. 13.6. For your amusement: entropy vs. Kolmogorov complexity.
[6] Gisle Bjontegaard and Karl Lillevold, Context-adaptive VLC (CVLC) coding of coefficients, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6 JVT-C028 , May, 2002
HW-1: Prediction and Entropy Coding.
sample code: lossless_code.m, getEntropy.m, pixelPrediction.m
Lec-05 Arithmetic Coding I
arithmetic coding, and binary arithmetic coding. text book: Sayood, Chpt. 4. Arithmetic Coding.
[7] Arithmetic Coding, Chpt 4 of Sayood, Intro to Data Compression
Python implementation of AC: torchac
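The interval-narrowing idea behind arithmetic coding can be sketched with exact rational arithmetic. This is an idealized illustration under a fixed, known symbol model; it does not use torchac, and it omits the finite-precision renormalization and bit output that practical coders need. All function names are assumptions.

```python
from fractions import Fraction

def build_intervals(probs):
    """Map each symbol to its [low, high) slice of the unit interval."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def ac_encode(message, probs):
    """Narrow [low, high) once per symbol; return a value inside the final interval."""
    intervals = build_intervals(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        s_low, s_high = intervals[sym]
        low, high = low + span * s_low, low + span * s_high
    return (low + high) / 2

def ac_decode(value, length, probs):
    """Invert the narrowing: find the slice holding value, emit it, rescale."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
tag = ac_encode("abacab", probs)
decoded = ac_decode(tag, 6, probs)
```

The decoder needs the message length (or an end-of-message symbol) because the tag alone does not say where to stop.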
Lec-06 Arithmetic Coding II
Context adaptive AC, Lossless PAQ, CABAC in H.264, and Deep Learning image coding with hyper-prior.
[8] D. Marpe, H. Schwarz, and T. Wiegand, Context Based Adaptive Binary Arithmetic Coding in the H.264/AVC Standard, (CABAC), IEEE Trans. on Circuits & System for Video Tech , vol. 13(7), 2003.
[9] Christopher Olah, Understanding LSTM
[10] B. Knoll and N. de Freitas, A Machine Learning Perspective on Predictive Coding with PAQ, 2008.
[11] Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, and Yao Wang, Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling, in review, IEEE Trans on Image Processing.
PAQ compression, code
Lec-07 Transforms & Quantization I
PCA/KLT the optimal decorrelating/min MSE reconstruction transform, DCT a good approximation of PCA, SVD: another signal dependent local basis expansion representation. Auto Encoder: non-linear latent representation.
[12] T. Wiegand and H. Schwarz, Chpt 7, Transforms, in Source Coding: Part I of Fundamentals of Source and Video Coding , 2011.
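To make the energy-compaction property of the DCT concrete, the sketch below builds an orthonormal DCT-II basis in plain Python and applies it to a smooth 8-sample ramp. The helper names and the test signal are illustrative assumptions; a signal-dependent KLT/PCA would instead require estimating a covariance matrix from data.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

C = dct_matrix(8)
x = [10, 11, 12, 13, 14, 15, 16, 17]        # smooth, highly correlated samples
y = apply(C, x)                              # forward transform
x_rec = apply(transpose(C), y)               # inverse is the transpose
energy_low = sum(v * v for v in y[:2]) / sum(v * v for v in y)
```

For this ramp almost all of the signal energy lands in the first two coefficients, which is why coarse quantization of the remaining ones costs little distortion.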
Lec-08 Transforms & Quantization II
Compression of signals on a graph, Graph Fourier Transforms (GFT), Autoencoder, scalar quantization, vector quantization, distortion metrics.
[13] Z.Li, and A. Katsaggelos, A Color Vector Quantization Based Video Coder , IEEE Int'l Conf on Image Processing Rochester, NY, 2002.
[14] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura and P. Vandergheynst, "Graph Signal Processing: Overview, Challenges, and Applications," in Proceedings of the IEEE, vol. 106, no. 5, pp. 808-828, May 2018, doi: 10.1109/JPROC.2018.2820126.
[15] Y. Shao, Z. Zhang, Z. Li, and G. Li, “Attribute Compression of 3D Point Clouds Using Laplacian Sparsity Optimized Graph Transform”, IEEE Visual Communication & Image Processing (VCIP) Conf, St. Petersburg, FL, 2017.
HW-2: Arithmetic Coding
transform coding and VQ example code [download]
Graph Fourier Transform (GFT) implementation [download]
EPFL's GSP toolbox
Lec-09 Video Signal Processing I
Color space manipulation, YCbCr sampling, block based motion estimation and compensation, sub-pixel resolution motion, fast algorithms for motion estimation.
[16] Renxiang Li, Bing Zeng, Ming L. Liou, A new three-step search algorithm for block motion estimation , IEEE Trans. Circuits Syst. Video Tech vol.4(4): 438-442 (1994). [top 10 cited T-CSVT paper]
[17] Shan Zhu, Kai-Kuang Ma, A new diamond search algorithm for fast block-matching motion estimation, IEEE Transactions on Image Processing vol.9(2): 287-290 (2000).
[18] H. Zhang, L. Song, L. Li, Z. Li, and X.K. Yang, Compression Priors Assisted Convolutional Neural Network for Fractional Interpolation, accepted, IEEE Transactions on Circuits and Systems for Video Tech. (T-CSVT), 2020.
[19] Li Li, Houqiang Li, Dong Liu, Zhu Li, Haitao Yang, Sixin Lin, Huanbang Chen, Feng Wu: An Efficient Four-Parameter Affine Motion Model for Video Coding. IEEE Trans on CSVT, vol.28(8): 1934-1948 (2018)
block motion estimation sample code: motion_estimation.m, [download]
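Block-based motion estimation from this lecture can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is not the course's motion_estimation.m and not one of the fast algorithms in [16] or [17]; it is a plain full search on a synthetic frame, and all names and the test pattern are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(bs) for c in range(bs))

def full_search(cur, ref, bx, by, bs, search):
    """Try every integer displacement in [-search, search]^2; return ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best = (None, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + bs <= h and
                      0 <= bx + dx and bx + dx + bs <= w)
            if not inside:
                continue                       # candidate block leaves the frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Synthetic 16x16 reference frame; the current frame is the reference
# shifted so that cur[y][x] == ref[y + 1][x + 2] away from the border.
H = W = 16
ref = [[x * x + 100 * y * y for x in range(W)] for y in range(H)]
cur = [[ref[min(y + 1, H - 1)][min(x + 2, W - 1)] for x in range(W)] for y in range(H)]
motion_vector, cost = full_search(cur, ref, bx=4, by=4, bs=4, search=3)
```

Fast methods such as three-step or diamond search visit only a subset of these candidates, trading a small risk of a local minimum for far fewer SAD evaluations.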
Lec-10 Video Signal Processing II
Motion Vector Prediction, Intra Prediction, Deblocking filter, Deep Learning Deblocking, SAO and Scalability.
[20] Jani Lainema, Frank Bossen, Woojin Han, Junghye Min, Kemal Ugur, Intra Coding of the HEVC Standard. IEEE Trans. Circuits Syst. Video Tech . 22(12): 1792-1801 (2012)
[21] Chih-Ming Fu, Elena Alshina, Alexander Alshin, Yu-Wen Huang, Ching-Yeh Chen, Chia-Yang Tsai, Chih-Wei Hsu, Shawmin Lei, Jeong-Hoon Park, Woojin Han, Sample Adaptive Offset in the HEVC Standard IEEE Trans. Circuits Syst. Video Tech , vol.22(12): 1755-1764 (2012).
[22] Jill M. Boyce, Yan Ye, Jianle Chen, Adarsh K. Ramasubramonian, Overview of SHVC: Scalable Extensions of the High Efficiency Video Coding Standard. IEEE Trans. Circuits Syst. Video Tech. 26(1): 20-34 (2016)
[23] Y. Li, L. Li, Z. Li, and H. Li, “Hierarchical Piece-Wise Canonical Correlation Analysis Projections for Efficient Intra-Prediction Coding”, IEEE Visual Communication & Image Processing (VCIP) Conf, St. Petersburg, FL, 2017.
[24] Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, IEEE ICCV 2015.
[25] B. Kathriya, Z. Li and G. van der Auwera, "Joint Pixel and Frequency Feature Learning and Fusion via Channel-wise Transformer for High-Efficiency Learned In-Loop Filter in VVC", accepted, IEEE Trans. on Circuits & Sys. for Video Tech., 2023
Lec-11 Video Coding Standards
Affine motion model (4-parameter), HEVC/H.265 Standard, Light Field and Mesh Coding.
[26] L. Li, Z. Li, B. Li, D. Liu, and H.-Q. Li, "Pseudo Sequence based 2-D hierarchical reference structure for Light-Field Image Compression", IEEE Data Compression Conference (DCC), Snow Bird, 2017.
[27] Li Li, Houqiang Li, Dong Liu, Zhu Li, Haitao Yang, Sixin Lin, Huanbang Chen, Feng Wu: An Efficient Four-Parameter Affine Motion Model for Video Coding. IEEE Trans. Circuits Syst. Video Technol. 28(8): 1934-1948 (2018).
[28] Joao Ascenso, AG4 Workshop: JPEG AI Based Image Compression, 2022.
[29] Marius Preda, AG4 Workshop: MESH Compression, 2022.
HW-3: Motion Compensation in Video Coding
ref: Python implementation of 3-step fast search, github.
Lec -12 Mid-term Review
Lec -13 Learning-based Image and Video Compression
Recent advances in image and video coding with deep learning, Variational Autoencoder (VAE), VAE with a simulated quantization loss, motion compensation in latent space for video coding.
[30] D. P. Kingma, M. Welling, Auto-Encoding Variational Bayes, ICLR, 2014.
[31] Johannes Balle, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. International Conference on Learning Representations ICLR 2018. URL: https://openreview.net/forum?id=rkcQFMZRb
[32] T. Chen, H. Liu, Z. Ma, Q. Shen, X. Cao and Y. Wang, End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling, in IEEE Transactions on Image Processing, vol. 30, pp. 3179-3191, 2021, doi: 10.1109/TIP.2021.3058615.
[33] Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao: DVC: An End-To-End Deep Video Compression Framework. IEEE CVPR 2019: 11006-11015
Variational Autoencoder (VAE) Python Tutorial.
NJU's end to end learning based image coding github
DVC [33] and FVC implementation
Lec -14 Rate-Distortion Optimization I
Math foundations for R-D optimization, Lagrangian relaxation, KKT conditions, and convex hull approximation. Applications in DASH bottleneck coordination.
[34] S. Boyd and L Vandenberghe, Convex Optimization, http://stanford.edu/~boyd/cvxbook
[35] Z. Li, G. Schuster, and A. Katsaggelos, "MINMAX Optimal Video Summarization", IEEE Trans on CSVT, vol 15(10), 2005.
[36] Z. Li et al., US Patent 20160373500, Method and apparatus for distributed bottleneck coordination in dash with resource pricing. Issued, 2020.
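The Lagrangian relaxation covered in this lecture can be sketched on a toy bit-allocation problem: each coding unit offers a few (rate, distortion) operating points, and a single multiplier lambda is bisected until the total rate fits the budget. The data, the function names, and the bisection bounds are illustrative assumptions.

```python
def pick_points(units, lam):
    """For each unit, choose the (rate, distortion) point minimizing D + lam * R."""
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in units]

def allocate(units, budget, iters=50):
    """Bisect on lambda; assumes the lowest-rate points already fit the budget."""
    lo, hi = 0.0, 1e6
    for _ in range(iters):
        lam = (lo + hi) / 2
        total_rate = sum(r for r, _ in pick_points(units, lam))
        if total_rate > budget:
            lo = lam          # too many bits: penalize rate more
        else:
            hi = lam          # feasible: try a smaller penalty
    return pick_points(units, hi)

# Two units, each with three hypothetical (rate, distortion) operating points.
units = [
    [(1, 100), (2, 40), (4, 10)],
    [(1, 50), (2, 30), (4, 25)],
]
choice = allocate(units, budget=6)
total_rate = sum(r for r, _ in choice)
total_dist = sum(d for _, d in choice)
```

Because each unit is minimized independently for a given lambda, this approach only reaches operating points on the convex hull of the R-D curve, which is the convex hull approximation named above.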
Lec -15 Rate-Distortion Optimization II
RD Optimization in video coding, convex hull approximation by the Lagrangian method, rate control in H.263, H.265, and the rho-domain method. The deep learning approaches in R-D modeling.
[37] Z. He and D. O. Wu, "Linear Rate Control and Optimum Statistical Multiplexing for H.264 Video Broadcast," in IEEE Transactions on Multimedia, vol. 10, no. 7, pp. 1237-1249, Nov. 2008, doi: 10.1109/TMM.2008.2004903.
[38] Y. Sun, L. Li, Z. Li, and S. Liu, "Referenceless Rate-Distortion Modeling with Learning from Bitstream and Pixel Features", ACM Multimedia (MM), Seattle, 2020.
Lec -16 VAE talk at PCS 2018
Johannes Ballé's keynote speech at the Picture Coding Symposium (PCS) 2018, which gives details on the VAE learning based compression scheme: additive noise simulates quantization at training time, while real quantization is used at inference. Also covered is a newer work, TinyLIC [39], which beats VVC intra with a much smaller network and is open sourced.
[39] M. Lu and Z. Ma, "High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation", and code is here at github.
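The training versus inference treatment of quantization described for this lecture can be sketched without any deep learning framework. The function names are hypothetical, and a real codec would apply these steps to latent tensors inside an autodiff library rather than to Python lists.

```python
import random

def quantize_train(latents, rng):
    """Training-time proxy: add uniform noise of width 1 instead of rounding."""
    return [y + rng.uniform(-0.5, 0.5) for y in latents]

def quantize_infer(latents):
    """Inference-time: hard rounding (Python rounds exact .5 ties to even)."""
    return [float(round(y)) for y in latents]

rng = random.Random(0)                 # fixed seed only for repeatability
latents = [0.2, 1.7, -3.4, 2.49]
noisy = quantize_train(latents, rng)   # stays differentiable w.r.t. latents
hard = quantize_infer(latents)         # what the entropy coder actually sees
```

Both versions perturb each latent by at most half a quantization step, which is why the noisy proxy is a reasonable stand-in for rounding while keeping gradients usable.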
Lec -17 Quality of Experience (QoE)
Quality of Service (QoS), and Quality of Experience (QoE), MOS scores, PSNR, MSE, SSIM and VMAF
[36] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, Eero P. Simoncelli: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Processing vol.13(4): 600-612 (2004) cited > 39000 times
[37] Sheikh, Hamid R., and Alan C. Bovik. "Image information and visual quality". IEEE Transactions on image processing vol.15.2 (2006): 430-444.
[38] S. Li, F. Zhang, L. Ma, and K. Ngan, "Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments", IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct. 2011.
[39] Q. Yang, Z. Ma, Y. Xu, Z. Li and J. Sun, "Inferring Point Cloud Quality via Graph Similarity," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 6, pp. 3015-3029, 1 June 2022.
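Of the metrics listed for this lecture, MSE and PSNR are simple enough to sketch directly. The code below assumes 8-bit images stored as lists of rows and uses hypothetical function names; SSIM and VMAF need considerably more machinery and are not shown.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized images (lists of rows)."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * math.log10(peak ** 2 / err)

original = [[100, 110], [120, 130]]
decoded = [[101, 109], [121, 129]]     # every pixel off by one
quality_db = psnr(original, decoded)
```

With an MSE of 1 on 8-bit data the PSNR is 20*log10(255), roughly 48.13 dB.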
Lec -18 MPEG System - ISOBMFF and DASH
MP4 file format, a.k.a. the ISO Base Media File Format (ISOBMFF), an abstraction layer that interfaces application APIs with the underlying compression technology. DASH (Dynamic Adaptive Streaming over HTTP) utilizes the HTTP CDN infrastructure and allows a client-centric congestion measurement and streaming control solution; it is highly successful and widely used.
[39] ISO/IEC 14496-12: Information technology - Coding of audio-visual objects - Part 12: ISO base media file format
[40] Thomas Stockhammer, Dynamic adaptive streaming over HTTP: standards and design principles. ACM MMSys 2011: 133-144
[41] Ingo Kofler, Robert Kuschnig, Hermann Hellwagner, Implications of the ISO base media file format on adaptive HTTP streaming of H.264/SVC. IEEE CCNC 2012: 549-553.
HW-4: QoE.
Lec -19 Media Transport I: Protocols
IP network and over the top (OTT) based transport and control schemes: RTP, RTCP, RTSP, WebRTC, WebSocket, etc.
[42] HTTP 1.1, RFC 2616
[43] QUIC - A Multiplexed Stream Transport over UDP, The Chromium Project on QUIC
[44] SPDY Protocol, Inet Draft
[45] WebRTC W3C draft
Lec -20 Media Transport II: Congestion Models
TCP window based congestion control, congestion modeling, delay vs loss based, WebRTC/RMCAT congestion models, packet arrival jitter based congestion models. GCC - Google Congestion Control.
[46] M. Chiang, S.-H Low, A. Robert Calderbank, J.C. Doyle,"Layering as optimization decomposition: A mathematical theory of network architectures", Proceedings of the IEEE Vol.95 (1), 255-312
[47] Luca De Cicco, Gaetano Carlucci, Saverio Mascolo: Understanding the Dynamic Behaviour of the Google Congestion Control for RTCWeb. PacketVideo 2013: 1-8
[48] Gaetano Carlucci, Luca De Cicco, Saverio Mascolo: HTTP over UDP: an experimental investigation of QUIC. SAC 2015: 609-61
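The window-based, loss-driven control mentioned for this lecture can be illustrated with a toy additive-increase/multiplicative-decrease (AIMD) model of two flows sharing one bottleneck. This is a deliberately simplified sketch with synchronized losses and assumed parameters; it is not GCC or any delay-based scheme.

```python
def aimd(windows, capacity, rounds, alpha=1.0, beta=0.5):
    """Simulate synchronized AIMD flows sharing a link of the given capacity."""
    windows = list(windows)
    for _ in range(rounds):
        if sum(windows) > capacity:
            # Congestion (loss) signal: every flow cuts its window.
            windows = [w * beta for w in windows]
        else:
            # No loss: every flow probes for bandwidth.
            windows = [w + alpha for w in windows]
    return windows

start = [1.0, 40.0]                    # very unequal initial windows
final = aimd(start, capacity=50.0, rounds=200)
gap_before = abs(start[0] - start[1])
gap_after = abs(final[0] - final[1])
```

Additive increase leaves the gap between the two windows unchanged while each multiplicative decrease shrinks it by the factor beta, so the flows drift toward an equal share.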
Course Project: SAR image coding with Learned Image Coder (LIC)
Lec -21 Forward Error Correction (FEC)
ARQ vs FEC, erasure correction, BCH/Reed-Solomon Coding, Digital Fountain Coding for rateless erasure correction over a broadcast channel
[49] A. Shokrollahi, Raptor Codes, IEEE Trans on Info Theory, vol. 52(6), June 2006.
[50] Reed, Irving S.; Solomon, Gustave (1960), "Polynomial Codes over Certain Finite Fields", Journal of the Society for Industrial and Applied Mathematics, 8 (2): 300–304, doi:10.1137/0108018, example matlab implementation.
[51] Wen Ji, Zhu Li: Joint layered video and digital fountain coding for multi-channel video broadcasting. ACM Multimedia 2010: 1223-1226
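Erasure correction can be illustrated with the simplest possible FEC: one XOR parity packet protecting k source packets. This is a stand-in for the idea only, far weaker than the Reed-Solomon, Raptor, or fountain codes in the references, and the function names are assumptions.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    """Append one XOR parity packet to k equal-length source packets."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = xor_bytes(parity, p)
    return packets + [parity]

def recover(received):
    """Rebuild at most one erased packet (marked None); return the k source packets."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one erasure")
    if missing:
        size = len(next(p for p in received if p is not None))
        rebuilt = bytes(size)
        for p in received:
            if p is not None:
                rebuilt = xor_bytes(rebuilt, p)   # XOR of survivors equals the lost packet
        i = missing[0]
        received = received[:i] + [rebuilt] + received[i + 1:]
    return received[:-1]

packets = [b"abcd", b"efgh", b"ijkl"]
coded = add_parity(packets)            # 3 source packets + 1 parity packet
coded[1] = None                        # simulate one erasure on the channel
restored = recover(coded)
```

Unlike ARQ, the receiver repairs the loss without a retransmission request, at the cost of sending the parity packet whether or not a loss occurs.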
Lec -22 Final Exam Review
Covering the 2nd half of the semester: MPEG systems, QoE metrics, Transport, Congestion Modeling, FEC.