ECE/CS 5578 Multimedia Communication
Fall 2022, Tue/Thr, 2:30-3:45pm, FH310, also on Zoom: 533-999-8759 (pwd in CANVAS)
Course Info:
Video is becoming the dominant traffic over the Internet and mobile networks, and ever-improving video resolution and quality put even more demand on already stressed networks. In this class, we will cover important topics in video signal processing, modeling, compression, and communication that are part of ongoing efforts in both academia and industry. Topics will include information theory for source coding; lossless coding schemes (Huffman, Golomb, arithmetic); transform, quantization, and super resolution techniques; motion compensation in video signal modeling; scalability; rate-distortion optimization issues in video coding; and the current status of video coding standards, especially HEVC, Screen Content Coding, Point Cloud and Light Field Coding, and Immersive video coding. Recent results in deep learning based compression, an area in which my group is very active, will also be discussed in detail. On the communication side, we will focus on QoE metrics, Joint Source-Channel Coding and Error Resilience, media transport such as RTP, RTSP, HTTP/WebSocket, and WebRTC, and related congestion measurement and control. We will also cover issues and solutions for Over the Top (OTT) video delivery, CDN architecture, and video cache management and de-duplication.
Upon completion of this course, students should have both the theoretical background and the algorithmic experience needed to work with video signal modeling, compression, and streaming, as well as practical experience in setting up video networking sessions with the latest tools from various industry standardization efforts.
Textbooks:
Key References:
Y. Wang, Y.-Q. Zhang, and J. Ostermann, Video Processing and Communication
K. Sayood, Introduction to Data Compression, Elsevier, 3rd Ed, 2005.
Logistics:
Instructor: Zhu Li, Office Hour: Tue: 4-5:30pm @ FH 560E, or by appointment
TA: Paras Maharjan, Office: FH 262. Office Hour: Mon-Thur: 2-4pm, Fri: 11am-1pm.
Topics:
1. Info Theory Foundation for Entropy Coding
2. Lossless coding: Huffman, Golomb codes, Arithmetic Code, Dictionary Methods
3. Image Coding: Transforms, Wavelets, Vector Quantization, JPEG and JPEG2000
4. Video Signal Processing: Color Space, Sampling, Motion Compensation
5. Video Coding and Standards
6. Scalable Coding and Super Resolution
7. Deep Learning in Compression
8. Rate-Distortion Optimization in Coding
9. QoE metrics, Joint Source Channel Coding, Error Resilience in Video Communication
10. Modern Media Transport: RTP, HTTP/WebSocket, WebRTC
11. Congestion Measure and Control in Media Streaming
12. CDN Caching and Cache De-Duplication
13. MPEG DASH/MMT standards
[1] Thomas Wiegand; Heiko Schwarz, Video Coding: Part II of Fundamentals of Source and Video Coding , now, 2016.
Information-theoretic foundation for lossless coding: entropy, conditional entropy, relative entropy, mutual information, prefix coding, Kraft-McMillan inequality. Text: Sayood, Chpt 2, Math Preliminary.
[2] Z. Liu, L. Karam, "Mutual information-based analysis of JPEG2000 contexts", IEEE Trans on Image Processing, vol. 14(4), 2005.
[3] M. J. Weinberger, G. Seroussi, G. Sapiro: "The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS". IEEE Tran. on Image Processing 9(8): 1309-1324 (2000)
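As a small illustration of the entropy and Kraft-McMillan inequality topics listed for the information theory lecture, the sketch below computes the entropy of a toy source and checks a candidate set of binary codeword lengths. The distribution and the lengths are made-up examples, not course data.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def kraft_sum(code_lengths):
    """Left-hand side of the Kraft-McMillan inequality for a binary code.

    A uniquely decodable binary code with these lengths exists
    if and only if the sum is <= 1.
    """
    return sum(2.0 ** -length for length in code_lengths)

# Hypothetical 4-symbol source and candidate codeword lengths.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

H = entropy(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
K = kraft_sum(lengths)

print(f"entropy = {H} bits, average length = {avg_len} bits, Kraft sum = {K}")
```

For this dyadic distribution the average codeword length equals the entropy, which is the best any prefix code can do.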
Practical lossless coding schemes: Huffman coding and Golomb coding, with real-world applications. Text: Sayood, Chpt. 3.2, 3.5.
[4] Chpt 3: Huffman Coding, Sayood, Intro to Data Compression
[5] Peter Grunwald and Paul Vitanyi, Shannon Information and Kolmogorov Complexity
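The Huffman and Golomb coding schemes covered in the lossless coding lecture can be sketched in a few lines. This is a minimal illustration, not the course's reference code: the symbol weights are made up, and the Golomb code shown is the Rice special case (divisor m = 2**k) with the assumed convention that the unary part is a run of 1s ended by a 0.

```python
import heapq

def huffman_code(freqs):
    """Build a binary Huffman code; freqs maps symbol -> weight."""
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing 0 and 1.
        w0, _, t0 = heapq.heappop(heap)
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

def rice_encode(n, k):
    """Golomb-Rice code of non-negative integer n with divisor 2**k."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder_bits = format(r, "b").zfill(k) if k > 0 else ""
    return "1" * q + "0" + remainder_bits

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(code)
print(rice_encode(9, 2))
```

Shorter codewords go to the more probable symbols, and no codeword is a prefix of another, so the bitstream can be decoded without separators.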
Variable Length Coding in JPEG/MPEG. Text: Sayood, Chpt. 13.6. For your amusement: entropy vs. Kolmogorov complexity.
[6] Gisle Bjontegaard and Karl Lillevold, Context-adaptive VLC (CVLC) coding of coefficients, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6 JVT-C028 , May, 2002
Arithmetic coding and binary arithmetic coding. Text: Sayood, Chpt. 4, Arithmetic Coding.
[7] Arithmetic Coding, Chpt 4 of Sayood, Intro to Data Compression
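The interval-narrowing idea behind arithmetic coding can be shown with exact rational arithmetic. The sketch below is an idealized illustration only: practical coders (including binary arithmetic coders such as CABAC) use finite-precision integer arithmetic with renormalization, which is not modeled here, and the three-symbol model is a made-up example.

```python
from fractions import Fraction

def build_intervals(probs):
    """Map each symbol to its [low, high) slice of the unit interval."""
    low = Fraction(0)
    table = {}
    for symbol, p in probs.items():
        table[symbol] = (low, low + p)
        low += p
    return table

def encode(message, probs):
    """Return the final [low, high) interval that identifies the message."""
    table = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = table[symbol]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def decode(value, length, probs):
    """Recover `length` symbols from any value inside the final interval."""
    table = build_intervals(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in table.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Hypothetical source model.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = encode("abac", probs)
print(low, high, decode(low, 4, probs))
```

The width of the final interval equals the product of the symbol probabilities, so roughly -log2(width) bits are enough to identify the message.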
Context-adaptive arithmetic coding, PAQ compression, H.264 CABAC, and the hyperprior context model in deep learning VAE (Variational Autoencoder) based compression.
[8] D. Marpe, H. Schwarz, and T. Wiegand, Context Based Adaptive Binary Arithmetic Coding in the H.264/AVC Standard, (CABAC), IEEE Trans. on Circuits & System for Video Tech , vol. 13(7), 2003.
[9] Christopher Olah, Understanding LSTM
[10] B. Knoll and N. de Freitas, A Machine Learning Perspective on Predictive Coding with PAQ, 2008.
[11] Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, and Yao Wang, Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling, in review, IEEE Trans on Image Processing.
PCA/KLT: the optimal decorrelating, minimum-MSE reconstruction transform. DCT: a good approximation of PCA. SVD: another signal-dependent local basis expansion representation. Autoencoder: non-linear latent representation.
[12] T. Wiegand and H. Schwarz, Chpt 7, Transforms, in Source Coding: Part I of Fundamentals of Source and Video Coding , 2011.
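One way to see why the DCT is a good approximation of the KLT is to compare transform coding gains on a first-order autoregressive (AR(1)) source, a common toy model for image rows. The sketch below assumes a correlation of 0.95 and a block size of 8 (illustrative values, not from the course notes). The KLT gain uses the closed-form determinant of the AR(1) covariance matrix, (1 - rho^2)^(N-1), so no eigen-decomposition is needed.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def ar1_covariance(n, rho):
    """Covariance of a unit-variance AR(1) source: rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def transform_variances(basis, cov):
    """Diagonal of basis * cov * basis^T: variance of each coefficient."""
    n = len(cov)
    out = []
    for row in basis:
        tmp = [sum(row[i] * cov[i][j] for i in range(n)) for j in range(n)]
        out.append(sum(tmp[j] * row[j] for j in range(n)))
    return out

def coding_gain(variances):
    """Arithmetic mean over geometric mean of the coefficient variances."""
    n = len(variances)
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return arith / geo

N, RHO = 8, 0.95                      # assumed block size and correlation
cov = ar1_covariance(N, RHO)
dct_vars = transform_variances(dct_matrix(N), cov)
gain_dct = coding_gain(dct_vars)
# KLT variances are the eigenvalues: mean = trace / N = 1,
# geometric mean = det(cov) ** (1 / N) = (1 - rho^2) ** ((N - 1) / N).
gain_klt = 1.0 / (1.0 - RHO ** 2) ** ((N - 1) / N)

print(f"DCT gain: {10 * math.log10(gain_dct):.3f} dB, "
      f"KLT gain: {10 * math.log10(gain_klt):.3f} dB")
```

On this source model the fixed, signal-independent DCT comes very close to the signal-dependent KLT.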
Compression of signals on a graph, Graph Fourier Transforms (GFT), Autoencoder, scalar quantization, vector quantization, distortion metrics.
Transform coding and vector quantization example code [download]
[13] Z.Li, and A. Katsaggelos, A Color Vector Quantization Based Video Coder , IEEE Int'l Conf on Image Processing Rochester, NY, 2002.
[14] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura and P. Vandergheynst, "Graph Signal Processing: Overview, Challenges, and Applications," in Proceedings of the IEEE, vol. 106, no. 5, pp. 808-828, May 2018, doi: 10.1109/JPROC.2018.2820126.
[15] Y. Shao, Z. Zhang, Z. Li, and G. Li, “Attribute Compression of 3D Point Clouds Using Laplacian Sparsity Optimized Graph Transform”, IEEE Visual Communication & Image Processing (VCIP) Conf, St. Petersburg, FL, 2017.
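For the scalar quantization and distortion metric topics in this lecture, a quick numerical check of the high-resolution rule of thumb (MSE of about step^2 / 12 for a uniform quantizer) may help. The sketch covers only a uniform scalar quantizer, not vector quantization; the step size and the test samples are arbitrary choices.

```python
import math

def uniform_quantize(x, step):
    """Mid-tread uniform scalar quantizer: snap x to the nearest multiple of step."""
    return math.floor(x / step + 0.5) * step

STEP = 0.125                                   # assumed quantizer step size
samples = [i / 10000 for i in range(10000)]    # dense grid on [0, 1)
recon = [uniform_quantize(x, STEP) for x in samples]

# Mean squared error between the input and its reconstruction.
mse = sum((x - q) ** 2 for x, q in zip(samples, recon)) / len(samples)
predicted = STEP ** 2 / 12

print(f"measured MSE = {mse:.6e}, step^2/12 = {predicted:.6e}")
```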
YCbCr color space sampling, block based motion model, sub-pixel resolution motion estimation, fast algorithms in motion estimation.
block motion estimation sample code: motion_estimation.m, [download]
[16] Renxiang Li, Bing Zeng, Ming L. Liou, A new three-step search algorithm for block motion estimation , IEEE Trans. Circuits Syst. Video Tech vol.4(4): 438-442 (1994). [top 10 cited T-CSVT paper]
[17] Shan Zhu, Kai-Kuang Ma, A new diamond search algorithm for fast block-matching motion estimation, IEEE Transactions on Image Processing vol.9(2): 287-290 (2000).
[18] H. Zhang, L. Song, L. Li, Z. Li, and X.K. Yang, Compression Priors Assisted Convolutional Neural Network for Fractional Interpolation, accepted, IEEE Transactions on Circuits and Systems for Video Tech. (T-CSVT), 2020.
[19] Li Li, Houqiang Li, Dong Liu, Zhu Li, Haitao Yang, Sixin Lin, Huanbang Chen, Feng Wu: An Efficient Four-Parameter Affine Motion Model for Video Coding. IEEE Trans on CSVT, vol.28(8): 1934-1948 (2018)
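The block based motion model from this lecture can be illustrated with an exhaustive (full) integer-pel search using the sum of absolute differences (SAD). This is a pure-Python sketch on a tiny synthetic frame, unrelated to the course's motion_estimation.m; the frame content, block size, and search range are made-up values. Fast methods such as three-step or diamond search visit only a subset of these candidates.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the current block at (bx, by) and the reference block
    displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, bs, search_range):
    """Test every integer displacement in the window; return (mv, cost)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + bs > h or bx + dx < 0 or bx + dx + bs > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic 16x16 reference frame in which every pixel value is unique.
SIZE = 16
ref = [[16 * y + x for x in range(SIZE)] for y in range(SIZE)]

def clamp(v):
    return max(0, min(SIZE - 1, v))

# Current frame: content at (x, y) comes from ref at (x + 2, y - 1).
cur = [[ref[clamp(y - 1)][clamp(x + 2)] for x in range(SIZE)] for y in range(SIZE)]

mv, cost = full_search(cur, ref, bx=4, by=4, bs=4, search_range=3)
print(mv, cost)
```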
You will use different images according to your student number. Check out the HW-2 description and data set here: [download]
Motion Vector Prediction, Intra Prediction, Deblocking, SAO and Scalability.
[20] Jani Lainema, Frank Bossen, Woojin Han, Junghye Min, Kemal Ugur, Intra Coding of the HEVC Standard. IEEE Trans. Circuits Syst. Video Tech . 22(12): 1792-1801 (2012)
[21] Chih-Ming Fu, Elena Alshina, Alexander Alshin, Yu-Wen Huang, Ching-Yeh Chen, Chia-Yang Tsai, Chih-Wei Hsu, Shawmin Lei, Jeong-Hoon Park, Woojin Han, Sample Adaptive Offset in the HEVC Standard IEEE Trans. Circuits Syst. Video Tech , vol.22(12): 1755-1764 (2012).
[22] Jill M. Boyce, Yan Ye, Jianle Chen, Adarsh K. Ramasubramonian, Overview of SHVC: Scalable Extensions of the High Efficiency Video Coding Standard. IEEE Trans. Circuits Syst. Video Tech. 26(1): 20-34 (2016)
[23] Y. Li, L. Li, Z. Li, and H. Li, “Hierarchical Piece-Wise Canonical Correlation Analysis Projections for Efficient Intra-Prediction Coding”, IEEE Visual Communication & Image Processing (VCIP) Conf, St. Petersburg, FL, 2017.
[24] Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, IEEE ICCV 2015.
[25] Dominic Springer, Wolfgang Schnurrer, Andreas Weinlich, Andreas Heindel, Jurgen Seiler, Andre Kaup: Open source HEVC analyzer for rapid prototyping (HARP). IEEE ICIP 2014: 2189-2191. [download]
Affine motion compensation (a key tool in VVC), 4-parameter complex motion estimation, HEVC/H.265 Video Coding Standard, Light Field compression.
[26] L. Li, Z. Li, B. Li, D. Liu, and H.-Q. Li, "Pseudo Sequence based 2-D hierarchical reference structure for Light-Field Image Compression", IEEE Data Compression Conference (DCC), Snow Bird, 2017.
[27] Li Li, Houqiang Li, Dong Liu, Zhu Li, Haitao Yang, Sixin Lin, Huanbang Chen, Feng Wu: An Efficient Four-Parameter Affine Motion Model for Video Coding. IEEE Trans. Circuits Syst. Video Technol. 28(8): 1934-1948 (2018).
[28] Joao Ascenso, AG4 Workshop: JPEG AI Based Image Compression, 2022.
[29] Marius Preda, AG4 Workshop: MESH Compression, 2022.
Implement exhaustive integer-pel and fractional-pel motion estimation and compensation algorithms, and test with the Bosphorus and Jockey sequences. Implement the fast diamond search and compare the complexity.
Mid-term review.
Autoencoder as a non-linear PCA, variational autoencoder, differentiable rate-distortion loss based training of variational autoencoder image coder, current SOTA in learning based compression, DVC and FVC for video coding.
[30] D. P. Kingma, M. Welling, Auto-Encoding Variational Bayes, ICLR, 2014.
[31] Johannes Balle, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. International Conference on Learning Representations ICLR 2018. URL: https://openreview.net/forum?id=rkcQFMZRb
[32] T. Chen, H. Liu, Z. Ma, Q. Shen, X. Cao and Y. Wang, End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling, in IEEE Transactions on Image Processing, vol. 30, pp. 3179-3191, 2021, doi: 10.1109/TIP.2021.3058615.
[33] Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao: DVC: An End-To-End Deep Video Compression Framework. IEEE CVPR 2019: 11006-11015
[34] Zhihao Hu, Guo Lu, Dong Xu: FVC: A New Framework Towards Deep Video Compression in Feature Space. CVPR 2021: 1502-1511
Johannes Ballé's keynote speech at the Picture Coding Symposium (PCS) 2018 gives details on the VAE learning based compression scheme, including how to use additive noise to simulate quantization at training time while using real quantization at inference. Also, a new work, TinyLIC [35], which beats VVC intra with a much smaller network, has been open sourced.
[35] M. Lu and Z. Ma, "High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation"; code is available on GitHub.
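The training trick mentioned for learned compression (additive noise in place of quantization at training time, real quantization at inference) can be sketched without any deep learning framework. The function names below are hypothetical and the latents are plain Python floats; a real codec would apply the same idea to tensors inside an autodiff library and estimate the rate with a learned entropy model.

```python
import random

def quantize_proxy(latents, training, rng):
    """Training: add uniform noise in [-0.5, 0.5] as a differentiable
    stand-in for rounding. Inference: round to the nearest integer."""
    if training:
        return [y + rng.uniform(-0.5, 0.5) for y in latents]
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return [float(round(y)) for y in latents]

def rd_loss(rate_bits, distortion, lam):
    """Rate-distortion Lagrangian R + lambda * D used as the training loss."""
    return rate_bits + lam * distortion

rng = random.Random(0)            # fixed seed so the sketch is repeatable
latents = [-1.3, 0.2, 2.7]        # made-up latent values
print(quantize_proxy(latents, training=True, rng=rng))
print(quantize_proxy(latents, training=False, rng=rng))
print(rd_loss(rate_bits=2.0, distortion=0.5, lam=4.0))
```

Uniform noise of width one has the same range as the rounding error, which is why it serves as a reasonable proxy while keeping the loss differentiable.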
Quality of Service (QoS), and Quality of Experience (QoE), MOS scores, PSNR, MSE, SSIM and VMAF
[36] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, Eero P. Simoncelli: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Processing vol.13(4): 600-612 (2004) cited > 39000 times
[37] Sheikh, Hamid R., and Alan C. Bovik. "Image information and visual quality". IEEE Transactions on image processing vol.15.2 (2006): 430-444.
[38] S. Li, F. Zhang, L. Ma, and K. Ngan, "Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments", IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct. 2011.
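Of the quality metrics listed for this lecture, MSE and PSNR are simple enough to compute by hand. The sketch below assumes 8-bit samples (peak value 255) stored as nested lists, with tiny made-up images; SSIM and VMAF need considerably more machinery and are not shown.

```python
import math

def mse(ref, dist):
    """Mean squared error between two equally sized 2-D sample arrays."""
    total, count = 0.0, 0
    for ref_row, dist_row in zip(ref, dist):
        for r, d in zip(ref_row, dist_row):
            total += (r - d) ** 2
            count += 1
    return total / count

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(ref, dist)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / err)

reference = [[100, 100], [100, 100]]
distorted = [[101, 99], [100, 100]]
print(mse(reference, distorted), psnr(reference, distorted))
```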
MP4 file format, aka ISO Base Media File Format (ISOBMFF): an abstraction layer that interfaces application APIs with the underlying compression technology. DASH (Dynamic Adaptive Streaming over HTTP): utilizes the HTTP CDN infrastructure and allows a client-centric congestion measurement and streaming control solution; highly successful and widely used.
[39] ISO/IEC 14496-12: Information technology - Coding of audio-visual objects - Part 12: ISO base media file format
[40] Thomas Stockhammer, Dynamic adaptive streaming over HTTP: standards and design principles. ACM MMSys 2011: 133-144
[41] Ingo Kofler, Robert Kuschnig, Hermann Hellwagner, Implications of the ISO base media file format on adaptive HTTP streaming of H.264/SVC. IEEE CCNC 2012: 549-553.
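An ISOBMFF file is a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. A size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. The sketch below walks the top-level boxes of a hand-built byte string; the sample bytes are synthetic and do not form a playable file, and the function name is hypothetical.

```python
import struct

def iter_boxes(data):
    """Yield (type, offset, size) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        yield box_type.decode("ascii"), offset, size
        offset += size

# Synthetic example: ftyp (16 bytes), free (8 bytes), mdat (12 bytes).
sample = (
    struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0)
    + struct.pack(">I4s", 8, b"free")
    + struct.pack(">I4s", 12, b"mdat") + b"\x01\x02\x03\x04"
)

boxes = list(iter_boxes(sample))
print(boxes)
```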
IP network and over the top (OTT) based transport and control schemes: RTP, RTCP, RTSP, WebRTC, WebSocket, etc.
[42] HTTP 1.1, RFC 2616
[43] QUIC - A Multiplexed Stream Transport over UDP, The Chromium Project on QUIC
[44] SPDY Protocol, Inet Draft
[45] WebRTC W3C draft
TCP window based congestion control, congestion modeling, delay vs loss based, WebRTC/RMCAT congestion models, packet arrival jitter based congestion models
[46] M. Chiang, S.-H Low, A. Robert Calderbank, J.C. Doyle,"Layering as optimization decomposition: A mathematical theory of network architectures", Proceedings of the IEEE Vol.95 (1), 255-312
[47] Luca De Cicco, Gaetano Carlucci, Saverio Mascolo: Understanding the Dynamic Behaviour of the Google Congestion Control for RTCWeb. PacketVideo 2013: 1-8
[48] Gaetano Carlucci, Luca De Cicco, Saverio Mascolo: HTTP over UDP: an experimental investigation of QUIC. SAC 2015: 609-61
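As a toy illustration of loss-based, window-based congestion control, the sketch below applies an additive-increase, multiplicative-decrease (AIMD) rule of the kind used in TCP congestion avoidance. It is a deliberately simplified model: there is no slow start, no timeouts, and the rounds in which a loss occurs are supplied by hand. Delay-based and jitter-based schemes such as those studied for WebRTC react to different signals and are not modeled here.

```python
def aimd(rounds, loss_rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Simulate the congestion window over a number of round trips.

    loss_rounds: set of round indices in which a loss is detected.
    Returns the window size after each round.
    """
    trace = []
    for t in range(rounds):
        if t in loss_rounds:
            # Multiplicative decrease, never below one segment.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # Additive increase of one segment per round trip.
            cwnd += increase
        trace.append(cwnd)
    return trace

trace = aimd(rounds=6, loss_rounds={3})
print(trace)
```

The resulting sawtooth (grow linearly, halve on loss) is the characteristic shape of loss-based window control.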
Use the ffmpeg tool to manipulate video sequences and compute PSNR, MSE, and SSIM metrics for different coding configurations.
ARQ vs FEC, erasure correction, BCH/Reed-Solomon Coding, Digital Fountain Coding for rateless erasure correction over a broadcast channel
[49] A. Shokrollahi, Raptor Codes, IEEE Trans on Info Theory, vol. 52(6), June 2006.
[50] Reed, Irving S.; Solomon, Gustave (1960), "Polynomial Codes over Certain Finite Fields", Journal of the Society for Industrial and Applied Mathematics, 8 (2): 300–304, doi:10.1137/0108018, example matlab implementation.
[51] Wen Ji, Zhu Li: Joint layered video and digital fountain coding for multi-channel video broadcasting. ACM Multimedia 2010: 1223-1226
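The simplest instance of erasure-correcting FEC is a single XOR parity packet, which lets the receiver rebuild any one lost packet in a group without a retransmission (the ARQ alternative). The sketch below shows only this single-parity scheme; it is not Reed-Solomon or a fountain code, both of which tolerate more losses. The packet contents are made up.

```python
def xor_parity(packets):
    """Parity packet: byte-wise XOR of equally sized source packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)

def recover(received, parity):
    """Rebuild the single missing packet (marked None) from the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one erasure")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    repaired = list(received)
    repaired[missing[0]] = bytes(rebuilt)
    return repaired

source = [b"VID0", b"VID1", b"VID2"]
parity = xor_parity(source)

# Simulate an erasure channel that drops the second packet.
received = [source[0], None, source[2]]
print(recover(received, parity))
```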
2nd half coverage review.
Announcement: Final Exam, in person, 11/29 Tuesday, in class.