Hao Zhang (张皓)

Hao Zhang (Chinese: 张皓; born Feb. 1994)

LAMDA Group

National Key Laboratory for Novel Software Technology

Department of Computer Science and Technology

Nanjing University, China

Email: zhangh0214#gmail.com or zhangh#lamda.nju.edu.cn

[知乎] [GitHub] [Google Scholar] [LinkedIn] [Quora]

Biography

Hao is currently pursuing his M.Sc. degree in the LAMDA Group, led by Professor Zhi-Hua Zhou. His adviser is Professor Jianxin Wu. He received his B.Sc. degree from Nanjing University, China, in 2016, and in the same year was admitted to the M.Sc. program at Nanjing University without taking the entrance examination. He was a member of the Excellent Engineer Training Program of the Ministry of Education of China. He is a columnist for AI Era (新智元).

He has also been a member of the Communist Party of China since 2013.

Research Interests

Hao's current research interests are mainly machine learning and computer vision, especially deep learning and visual recognition. He is working on exploiting convolutional features in both supervised and unsupervised ways to improve the efficiency of convolutional neural networks.

He is also particularly interested in linear algebra and its applications. Linear algebra is closely tied to real-world applications such as linear systems (Ax=b), dynamical systems (u_{k+1} = Au_k or du(t)/dt = Au(t)), optimization problems (arg min_u f(u)), and linear transformations (T(u)=Au). Please refer to this page for further discussion.
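As a small illustration of two of these forms, the Python sketch below solves a 2x2 linear system Ax=b with Cramer's rule and iterates a discrete dynamical system u_{k+1} = Au_k. The helper names (`matvec`, `solve_2x2`, `iterate`) and the example matrices are assumptions chosen for illustration; they do not come from the linked page.

```python
def matvec(A, u):
    """Apply the linear transformation T(u) = Au (A is a list of rows)."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def solve_2x2(A, b):
    """Solve the linear system Ax = b for a 2x2 matrix A using Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("A is singular; the system has no unique solution")
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def iterate(A, u0, steps):
    """Run the dynamical system u_{k+1} = A u_k for the given number of steps."""
    u = list(u0)
    for _ in range(steps):
        u = matvec(A, u)
    return u


# Linear system: 2x + y = 3, x + 3y = 5 (an arbitrary example).
x = solve_2x2([[2, 1], [1, 3]], [3, 5])
print(x)

# Dynamical system: with this A and u_0 = (1, 0), u_k holds consecutive
# Fibonacci numbers (F_{k+1}, F_k).
print(iterate([[1, 1], [1, 0]], [1, 0], 10))  # [89, 55]
```

Cramer's rule is used here only because it keeps the 2x2 case short; for larger systems a library routine based on matrix factorization would be the usual choice.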

Awards and Honors

  • First Place in Apparent Personality Analysis Contest. ECCV, 2016.
  • National Scholarship. Ministry of Education of China, 2013.
  • Member of Excellent Engineer Training Program, Ministry of Education of China, 2013--2016.
  • Excellent All-round Student. Jiangsu Provincial Department of Education, 2015.
  • Top-grade in Jiangsu Provincial Undergraduate Electronics Design Contest. Jiangsu Committee of the National Undergraduate Electronics Design Contest, 2014.
  • Top-grade "Red Sun" Scholarship. Nanjing Red Sun CO., LTD, 2014. (awarded each year to only 20 of the 12,000 undergraduate students at Nanjing University)
  • Top-grade Graduate Research Scholarship. Nanjing University, 2016, 2017, 2018.
  • Excellent Student. Nanjing University, 2014.
  • Excellent Cadre of Students. Nanjing University, 2013.
  • Excellent Undergraduate Student. Nanjing University, 2016.
  • People's Scholarship Speciality Specialization. Nanjing University, 2015.
  • People's Scholarship Social Work Specialization. Nanjing University, 2014.
  • Second Class Xingquan Responsibility Scholarship. Nanjing University, 2015.
  • Excellent League Member (twice). Youth League Committee of Nanjing University, 2014, 2015.
  • Excellent Student in Summer Social Practice. Youth League Committee of Nanjing University, 2013.

Publications

J.-H. Luo, H. Zhang, H.-Y. Zhou, C.-W. Xie, J. Wu, and W. Lin. ThiNet: Pruning CNN filters for a thinner net. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018, in press. [pdf]

C.-L. Zhang, H. Zhang, X.-S. Wei, and J. Wu. Deep bimodal regression for apparent personality analysis. In Proceedings of the 14th European Conference on Computer Vision (ECCV'16) Workshops, Amsterdam, The Netherlands, Oct. 2016, LNCS 9915, pp. 311-324. [pdf] [slides] [project page] [code]

X.-S. Wei, C.-L. Zhang, H. Zhang, and J. Wu. Deep bimodal regression of personality traits from short video sequences. IEEE Transactions on Affective Computing (TAC), 2018, 9(3): 303-315. [pdf] [project page] [code]

H. Zhang and J. Wu. A survey on unsupervised image retrieval using deep features. Journal of Computer Research and Development (CRAD), 2018, 55(9): 1829-1842. [pdf] (in Chinese)

H. Zhang and J. Wu. Ensemble max-pooling: Is only the maximum activation useful when pooling. In Proceedings of CCF Conference on Artificial Intelligence (CCFAI'17). Kunming, China. Aug. 2017. [pdf] [poster] [spotlight] (acceptance rate: 34.0%) (in Chinese)

H. Zhang and J. Wu. Ensemble max-pooling: Is only the maximum activation useful when pooling. Journal of University of Science and Technology of China (JUST), 2017, 47(10): 799-807. [pdf] (in Chinese)

Articles

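Deriving Support Vector Machines (SVM) from Scratch (从零推导支持向量机) [pdf]

  • The support vector machine (SVM) is a classic and efficient classification model. However, SVMs involve many complicated mathematical derivations and require a fairly strong background in convex optimization, so some beginners remain confused even after investing a great deal of time and effort, and eventually give up. This article builds the SVM from scratch, covering the complete process from the idea to its formalization, then simplification, and finally implementation, and presents the full line of reasoning and every detail of the derivations. It strives to be logically clear and concise, avoiding unnecessary concepts and notation. To lighten the reader's burden, it does not assume any background in convex optimization; every optimization technique used is introduced in the text. Although deep learning is very popular today, understanding the principles of SVMs, how the idea is formalized, simplified, and generalized step by step, and how it is implemented still has research value. Moreover, SVMs still have their place: compared with deep neural networks, they are particularly good at problems where the feature dimension exceeds the number of samples, while learning from small samples remains a major challenge for deep learning.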

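Three Simplifications and One Diagram: Understanding LSTM/GRU Gating Mechanisms in One Move (三次简化一张图:一招理解LSTM/GRU门控机制) [link] (repost on AI100, 搜狐, 网易)

  • The RNN is a key deep learning technique for processing sequential data and has achieved important breakthroughs in natural language processing, speech recognition, video recognition, and other fields; however, the vanishing gradient problem limits its practical use. LSTM and GRU are two widely used RNN variants whose gating mechanisms largely alleviate the vanishing gradient problem, but their internal structures look so complicated that beginners find the underlying principles hard to grasp. This article introduces the "three simplifications and one diagram" method for analyzing the internal structures of LSTM and GRU. The method is general and applies to the analysis of any gating mechanism.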

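An Intuitive Overview of Deep Learning (直观梳理深度学习)

  • Deep learning has become one of the fastest-growing and most exciting areas of machine learning: many influential papers have been published, and many high-quality open-source deep learning frameworks are available. However, papers are usually terse and assume that the reader already understands deep learning well, so beginners often get stuck on certain concepts and find papers hard to follow. On the other hand, even with easy-to-use frameworks, someone who does not understand the common concepts and basic ideas of deep learning will not know how to design, diagnose, and debug a network for a real task, and will end up at a loss. This series aims to organize the common concepts and basic ideas in each area of deep learning intuitively and systematically, giving readers an intuitive understanding of both what works and why, and thereby making it easier to read papers and apply them in practice. The series uses concise language and avoids mathematical formulas and tedious details.
  • (1) Deep learning basics (basic concepts, optimization algorithms, initialization, regularization, etc.) [link] (repost on 新智元, 搜狐)
  • (2) The four basic tasks of computer vision (classification, localization, detection, segmentation) [link] (repost on 新智元, 搜狐)
  • (3) Other applications in computer vision (network compression, visual question answering, visualization, style transfer, etc.) [link] (repost on 新智元, 人工智能头条, 前言技术研究)
  • Recent research progress in video understanding [link] (repost on 新智元, 搜狐)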

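A Quick Reference Handbook of the Math You Need for Machine Learning (你需要的机器学习数学基础速查手册)

  • Linear algebra and probability are essential foundations of machine learning, which is one of the two areas of computer science with the highest mathematical demands (the other is theoretical computer science). Linear algebra is quite abstract, and many people finish a course believing that it is only about solving systems of linear equations; in fact, it has very important uses in computer science and can solve a great many real-world problems. Probability places heavy demands on counting, where mistakes are easy to make. This article provides a brief review of linear algebra and probability.
  • Linear algebra [pdf]
  • Probability [pdf]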

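What You Should Think About When Applying Machine Learning (当你在应用机器学习时你应该想什么) [link] (repost on AI100, 开发者头条, 搜狐)

  • Machine learning has become very attractive and now plays an important role in web search, product recommendation, spam detection, speech recognition, image recognition, natural language processing, and many other fields. Unlike the traditional approach of explicitly programming a computer with how to compute, machine learning is a data-driven approach. Yet machine learning can sometimes seem like "magic": even with the same data, an expert and a novice may obtain very different results. This article briefly discusses nine important aspects to keep in mind when applying machine learning in practice.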

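Deep Learning Foundations and Mathematical Principles (深度学习基础及数学原理) [pdf] [slides]

  • We live in the information and digital age, surrounded by huge numbers of digital images. Many practical applications require computers to understand images correctly and efficiently, and the semantic gap makes image recognition a highly challenging task. Convolutional neural networks (CNNs), a deep learning technique, are now the mainstream method for image recognition. Beyond image classification, CNNs are widely used in many areas, such as object recognition, image segmentation, video classification, scene classification, face recognition, depth estimation, and generating language descriptions from images. This article briefly introduces the basics of deep learning and the mathematical principles behind it; it is 54 pages long.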

Are You Really Formatting Your Papers Correctly? Common Formats and How to Write Them in LaTeX (论文格式排版你真的做对了吗? 常用格式及其LaTeX书写方法介绍) [link] (repost on 机器之心, 搜狐, 凤凰网科技, 新浪, 极客头条)

  • Formatting is the first impression your paper makes on reviewers. A poorly typeset paper will hardly convince reviewers that it presents an influential idea. When a template is provided, we can follow it; for anything the template does not cover, or when no template is available, we need to know the conventional formats. This article briefly introduces common formatting conventions for writing papers and how to write them in LaTeX.

A Simple and Efficient Implementation of im2col in Convolution Neural Networks [pdf]

  • In convolutional neural networks (CNNs), the most time-consuming part is the convolution layer. Convolution is usually done with im2col, which converts the 3-D input tensor and the weight tensor into 2-D matrices so that the complicated convolution operation can be carried out as a matrix multiplication. The efficiency of im2col therefore determines the overall speed. This article proposes a simple and efficient implementation of im2col that can replace Caffe's implementation. When training LeNet on MNIST, it is 20.6% faster than Caffe's implementation.
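The idea of turning convolution into a matrix multiplication can be sketched in a few lines of Python. This is a minimal, unoptimized illustration written with nested lists for clarity; it is neither the implementation proposed in the article nor Caffe's. The function names `im2col` and `conv2d_im2col`, the C x H x W layout, and the absence of padding are all assumptions made for the example, and the operation computed is the cross-correlation that CNN frameworks conventionally call convolution.

```python
def im2col(x, kh, kw, stride=1):
    """Unfold a C x H x W input (nested lists) into a 2-D matrix.

    Each column is one receptive field flattened in (channel, row, col)
    order, so the result has C*kh*kw rows and out_h*out_w columns.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    patches = []
    for i in range(out_h):
        for j in range(out_w):
            patches.append([x[ch][i * stride + di][j * stride + dj]
                            for ch in range(c)
                            for di in range(kh)
                            for dj in range(kw)])
    # patches holds one patch per row; transpose so that patches are columns.
    return [list(row) for row in zip(*patches)], out_h, out_w


def conv2d_im2col(x, weights, stride=1):
    """Convolve x (C x H x W) with weights (K x C x kh x kw), no padding.

    Returns a K x out_h x out_w nested list.
    """
    kh, kw = len(weights[0][0]), len(weights[0][0][0])
    col, out_h, out_w = im2col(x, kh, kw, stride)
    # Flatten each filter into one row: a K x (C*kh*kw) weight matrix.
    w_mat = [[v for ch in f for row in ch for v in row] for f in weights]
    # Plain matrix multiplication: (K x D) times (D x P).
    flat = [[sum(w_row[d] * col[d][p] for d in range(len(w_row)))
             for p in range(out_h * out_w)]
            for w_row in w_mat]
    # Reshape every output row back into an out_h x out_w feature map.
    return [[r[i * out_w:(i + 1) * out_w] for i in range(out_h)] for r in flat]


image = [[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]            # one channel, 3 x 3
kernel = [[[[1, 0],
            [0, 1]]]]            # one filter, one channel, 2 x 2
print(conv2d_im2col(image, kernel))  # [[[6, 8], [12, 14]]]
```

In practice the unfolded matrix is handed to an optimized matrix-multiplication routine, which is why the cost of building it matters for overall speed.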

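The Halting Problem (C Language Version) (停机问题(C语言版)) [link]

  • Uses the C language together with the barber paradox to explain in plain terms what the halting problem is and why studying it is useful.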
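The self-referential argument behind the halting problem can be sketched as follows. This is a Python illustration rather than the C code from the linked article, and the names `make_paradox` and `claims_everything_loops` are hypothetical. Like the barber who shaves exactly those who do not shave themselves, the constructed program does the opposite of whatever a candidate decider predicts for it, so the decider is wrong on at least that one program.

```python
def make_paradox(halts):
    """Build a program that contradicts the prediction `halts` makes about it."""
    def paradox():
        if halts(paradox):
            while True:      # predicted to halt, so loop forever
                pass
        return "halted"      # predicted to loop, so halt immediately
    return paradox


def claims_everything_loops(program):
    """A candidate decider that predicts no program ever halts."""
    return False


paradox = make_paradox(claims_everything_loops)
prediction = claims_everything_loops(paradox)   # the decider says "does not halt"
actually_halted = paradox() == "halted"         # yet the program halts
print(prediction, actually_halted)              # False True
```

A decider that answers True for `paradox` fails in the other direction, because the program would then loop forever; that case is not executed here for the obvious reason.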

What Skill Points Will You Gain in Electronics: An Analysis of the Core Courses of the School of Electronic Science and Engineering (在电子,你会被加哪些技能点--电子学院专业课程分析) [pdf]

  • This is an introductory article, written in Chinese, for freshmen in the School of Electronic Science and Engineering, Nanjing University. What will you learn in EE? Why do you have to take these courses? How interesting and useful will they be? This article answers these questions. The whole article is 10 pages long.

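How to Write a Good English Email (如何写好一封英文邮件) [pdf]

  • When communicating by email, we cannot explain or correct ourselves in response to the other person's expression or reaction, so an inappropriate word or a subtle difference in tone, even if unintended, can cause serious misunderstandings and can even shape the image of a person, a company, or an organization. This article therefore briefly introduces some points to note and common patterns in writing English emails.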

MOOC and Online Courses

Hao is determined to become a lifelong learner. The following are the courses he has completed.

Miscellaneous

Reimplementations

Hao has reimplemented several papers, some of which have been open-sourced on GitHub.

  • Y. Huang, X. Sun, M. Lu, and M. Xu. Channel-max, channel-drop and stochastic max-pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 9--17, 2015.
  • Y. Kalantidis, C. Mellina, and S. Osindero. Cross-dimensional weighting for aggregated deep convolutional features. In Proceedings of the European Conference on Computer Vision Workshops, pages 685--701, 2016.
  • T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for fine-grained visual recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 1449--1457, 2015.
  • X.-S. Wei, C.-L. Zhang, Y. Li, C.-W. Xie, J. Wu, C. Shen, and Z.-H. Zhou. Deep descriptor transforming for image co-localization. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 3048--3054, 2017.
  • H. Wu and X. Gu. Max-pooling dropout for regularization of convolutional neural networks. In Proceedings of the International Conference on Neural Information Processing, pages 46--54, 2015.
  • J. Xu, C. Shi, C. Qi, C. Wang, and B. Xiao. Part-based weighting aggregation of deep convolutional features for image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1--1, 2018.

Book List

The following are the books Hao has read.

  • Sheldon Axler. Linear algebra done right. Springer, 1997.
  • Stephen Boyd and Lieven Vandenberghe. Introduction to applied linear algebra: Vectors, matrices, and least squares. Cambridge University Press, 2018.
  • Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein. Introduction to algorithms (3rd edition). MIT Press, 2009. [Solutions]
  • Allen Downey, Jeffrey Elkner, and Chris Meyers. How to think like a computer scientist: Learning with Python. Green Tea Press, 2002.
  • Allen Downey. Think Python: how to think like a computer scientist. Green Tea Press, 2012. [Online Book]
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning (Adaptive computation and machine learning series). MIT Press, 2016.
  • Rafael Gonzalez and Richard Woods. Digital image processing. Pearson, 2007.
  • Rafael Gonzalez, Richard Woods, and Steven Eddins. Digital image processing using MATLAB. Gatesmark Publishing, 2009.
  • David C. Lay. Linear Algebra and Its Applications (Fifth Edition). Pearson, 2014.
  • Eric Lehman, Thomson Leighton, and Albert Meyer. Mathematics for computer science (2010 version). MIT, 2010.
  • Eric Lehman, Thomson Leighton, and Albert Meyer. Mathematics for computer science (2017 version). MIT, 2017.
  • Stanley Lippman. Essential C++. Addison-Wesley Longman Publishing Co., Inc., 1999.
  • Stanley Lippman, Josée Lajoie, and Barbara Moo. C++ primer (5th edition). Addison-Wesley Professional, 2012. [Errata] [Solution Code]
  • Bradley Miller and David Ranum. Problem solving with algorithms and data structures using Python. Franklin, Beedle & Associates Inc., 2006. [Online Book]
  • William Shotts. The Linux command line: A complete introduction. No Starch Press, 2012.
  • Gilbert Strang. Introduction to linear algebra (Fourth Edition). Wellesley Cambridge Press, 2009.
  • Gilbert Strang. Linear algebra and its applications (Fourth Edition). Academic Press, 2006.
  • 刘金鹏. Linux入门很简单. 清华大学出版社, 2012.
  • 吴军. 数学之美. 第二版. 人民邮电出版社, 2014.
  • 王世江 and 鸟哥. 鸟哥的Linux私房菜:基础学习篇. 第3版. 人民邮电出版社, 2010.
  • 周志华. 机器学习. 清华大学出版社, 2016. [勘误修订]

Notes

Convolutional Neuron Networks and its Applications [pdf]

  • The convolutional neural network (CNN) is the state-of-the-art approach to object recognition and has greatly advanced performance on many computer vision tasks. To understand CNNs deeply and to inspire ideas for cutting-edge research, I think the most fundamental and effective way is to read recent CNN publications from top-tier vision conferences and journals. I therefore decided to write a note recording the basic ideas of those publications and my understanding of them. At present, this note covers around 60 papers from ICCV, ECCV, CVPR, NIPS, ICML, ICLR, and other venues. It covers basic topics in computer vision, including image classification, object localization, object detection, object segmentation, image and language, video classification, and GANs.

Notes on Machine Learning [pdf]

  • This note was written when I started studying machine learning. The first part covers the mathematical background, such as linear algebra, probability, statistics, information theory, and numerical computation. The second part covers learning theory, the foundations of machine learning, and some commonly used learning algorithms. The third part presents some basic ideas of deep learning. The whole note is 431 pages long.

Mathematics for Computer Science [pdf]

  • This note explains how to use mathematical models and methods to analyze problems that arise in computer science.

C++: Concepts and Practices [pdf]

  • This note was written when I was studying C++.

Python Reference Note [pdf]

  • This note was written when I was studying Python.

Useful Resources

Machine Learning

Courses

Books

Deep Learning Courses and Books

Math

Linear Algebra

Probability and Statistics

C++

Python

  • Alex Martelli, Anna Ravenscroft, and David Ascher. Python cookbook. O'Reilly Media, 2005.
  • Brett Slatkin. Effective Python: 59 specific ways to write better Python (Effective Software Development Series). Addison-Wesley Professional, 2015.
  • Micha Gorelick and Ian Ozsvald. High performance Python: Practical performant programming for humans. O'Reilly Media, 2014.

Linux Operating System

Miscellaneous

Hobbies

Reading

Reading a good book is like talking to a wise person; it leaves a lasting aftertaste. The following are the ones Hao has read.

大众哲学(艾思奇), 推拿(毕飞宇), 雷雨(曹禺), 白鹿原(陈忠实), 明朝那些事儿(当年明月), 高兴(贾平凹), 47楼207: 北大醉侠的浪漫宣言(孔庆东), 茶馆(老舍), 北平无战事(刘和平), 我不是潘金莲(刘震云), 我叫刘跃进(刘震云), 阿Q正传(鲁迅), 朝花夕拾(鲁迅), 平凡的世界(路遥), 三国演义(罗贯中), 围城(钱钟书), 活着(余华), 傲慢与偏见(奥斯汀), 理智与情感(奥斯汀), 动物庄园(奥威尔), 第二性(波伏娃), 简·爱(勃朗特), 一封陌生女子的来信(茨威格), 我的职业是小说家(村上春树), 包法利夫人(福楼拜), 南方与北方(盖斯凯尔), 霍乱时期的爱情(马尔克斯), 飘(米歇尔), 罗密欧与朱丽叶(莎士比亚), 堂吉诃德(塞万提斯), 小王子(圣埃克苏佩里), 红与黑(司汤达), 安娜·卡列尼娜(托尔斯泰), 悲惨世界(雨果).

Fitness, Boxing, and Nunchaku

Research can sometimes be exhausting; fitness keeps one in good shape.

Photography

A good photograph shows the emotion of the photographer. Photography is a kind of artistic creation --- painting is doing addition, while photography is doing subtraction.

BBC Documentary

Astonishing discoveries about the lives of wild animals.

生命故事(Life Story), 荒野间谍(Spy in the Wild), 猫的秘密生活(The Secret Life of the Cat).

Competitive Games

These include traditional games such as chess, Chinese chess, and Go, and electronic sports such as Warcraft and StarCraft.

Correspondence

Mail

National Key Laboratory for Novel Software Technology

Nanjing University, Xianlin Campus

163 Xianlin Avenue, Qixia District

Nanjing, Jiangsu Province

210023, China

Laboratory

328, Computer Science and Technology Building, Xianlin Campus, Nanjing University

Homepage

http://lamda.nju.edu.cn/zhangh/

https://sites.google.com/view/haozhang/

(These two pages are identical)