Deep Learning Implementations and Frameworks (DLIF)
Tutorial in Thirty-first Conference on Artificial Intelligence (AAAI-17)
February 5 (9:00 AM – 1:00 PM), 2017, San Francisco, California, USA
- Part1: Basics of neural networks
- (Feb. 3, 2017 Updated) Part2: Common design of neural network implementations
- (Mar. 30, 2017 Updated¹) Part3: Differences of deep learning frameworks
¹ Fixed the backprop design choice of MXNet (Mar. 30, 2017)
Coding Examples (Feb. 3, 2017 New)
See the AAAI-17 website for details of the conference.
About this tutorial
This tutorial explains general knowledge of design principles for deep learning frameworks, with the goal of providing a guideline for choosing a suitable framework to researchers and practitioners of AI who want to utilize deep learning for their own tasks.
Today, software frameworks for deep learning, such as TensorFlow and Caffe, are widely employed in deep learning systems to accelerate research and development. Deep learning plays a fundamental role in core AI technologies, including image/audio recognition, planning, and natural language processing, and serves as a building block for AI systems such as robots, games, question answering, and medical diagnosis. Frameworks hide low-level implementation details and provide a systematic way to implement deep learning models.
At a higher level, designing a deep learning model is, in essence, a matter of combining components and heuristics. The technical elements of deep learning are often common and reusable. For example, a typical deep learning architecture for image recognition is a layered stack of convolution and pooling operations. Many techniques, including dropout and batch normalization, are commonly employed to improve generalization performance. A great deal of the implementation work thus reduces to finding good combinations of components (like convolution and pooling) and heuristics (like dropout and batch normalization). This is why we use deep learning frameworks for efficient coding.
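The component-combination idea can be sketched with plain NumPy. The helper functions below are hypothetical illustrations, not the API of any real framework; they only show how a model reduces to composing a convolution, a pooling operation, and a dropout heuristic:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w):
    """Naive "valid" 2-D convolution (really cross-correlation, as in most frameworks)."""
    h = x.shape[0] - w.shape[0] + 1
    v = x.shape[1] - w.shape[1] + 1
    out = np.empty((h, v))
    for i in range(h):
        for j in range(v):
            out[i, j] = np.sum(x[i:i + w.shape[0], j:j + w.shape[1]] * w)
    return out

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling."""
    h, v = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :v * k].reshape(h, k, v, k).max(axis=(1, 3))

def dropout(x, p=0.5, train=True):
    """Inverted dropout: scale at train time so inference needs no change."""
    if not train:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# The "model" is just a composition of reusable components.
x = rng.standard_normal((8, 8))   # toy input "image"
w = rng.standard_normal((3, 3))   # toy filter
y = dropout(max_pool(conv2d(x, w)))
print(y.shape)  # (3, 3): 8x8 input -> 6x6 after conv -> 3x3 after pooling
```

A framework supplies optimized, GPU-capable versions of exactly these building blocks, so the user's code stays at this compositional level.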
Choosing an appropriate deep learning framework requires knowledge of the fundamental design principles behind frameworks. The sheer variety of available frameworks makes it difficult for users to select the most suitable one. In addition to reusability, competing demands for speed, scalability, code simplicity, ease of debugging, and community size pose further difficulty. Choosing a suboptimal framework may reduce efficiency in research and development, damage the utility of the work, and lead to diminished popularity.
Given recent developments in deep learning beyond simple pattern recognition tasks, this tutorial provides technical information useful for general AI applications.
This tutorial is intended for researchers and practitioners who want to use deep learning to develop AI systems, or systems making use of AI, for their own tasks. It is tailored to help them choose software frameworks suitable for their applications from the many available candidates.
This tutorial offers basic knowledge of the design principles of, and the tradeoffs among, the frameworks, providing guidelines for selecting an appropriate one. The audience will learn why some frameworks are faster than others, why some are difficult to debug, and why some handle dynamic model changes inefficiently. Coding examples for TensorFlow, Keras, and Chainer are provided not just to show their usage but to give a deep understanding of their internal mechanisms. Our point of view is general, and an understanding of the topics discussed will also help in evaluating frameworks that have yet to be released.
Seiya Tokui is a researcher at Preferred Networks, Inc. and a Ph.D. student at the University of Tokyo, where he received his master's degree in mathematical informatics in 2012. He is the lead developer of the deep learning framework Chainer. His research interests include deep learning and generative models. [Webpage]
Kenta Oono is an engineer at Preferred Networks, Inc., Japan. He received his master's degree in mathematics from the University of Tokyo in 2011. He is a core developer of the deep learning framework Chainer. His research interests include deep learning, bioinformatics, and the theoretical analysis of machine learning. [GitHub]
Atsunori Kanemura is a Research Scientist in the Mathematical Neuroinformatics Group and Machine Learning Team at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. He received his Ph.D. from Kyoto University in 2009. His research interests include machine learning, statistical signal processing, and the analysis of human data. [Webpage]
- The 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (Auckland, New Zealand, April 19-22, 2016), [Webpage]
- oono at preferred dot jp