Scalable Deep Learning: How far is one billion neurons?
Organizers
Decebal Constantin Mocanu (University of Twente, Eindhoven University of Technology)
Elena Mocanu (University of Twente)
Tiago Pinto (Polytechnic Institute of Porto)
Zita Vale (Polytechnic Institute of Porto)
Abstract
A fundamental task for artificial intelligence is learning. Deep Neural Networks have proven to cope successfully with all learning paradigms, i.e. supervised, unsupervised, and reinforcement learning. Nevertheless, traditional deep learning approaches rely on cloud computing facilities and do not scale well to autonomous agents with low computational resources. Even in the cloud, they suffer from computational and memory limitations and cannot properly model large physical worlds for agents, which would require networks with billions of neurons. In the last few years, these issues have been addressed by the emerging topic of scalable deep learning, which exploits static and adaptive sparse connectivity in neural networks both before and throughout training. The tutorial covers these research directions in two parts, focusing on theoretical advancements, practical applications, and hands-on experience.
Detailed Program, Location, and Slides
Location: IJCAI 2020
Date: 7 January 2021, 18:40-22:00 (GMT+9:00, Osaka, Sapporo, Tokyo)
Slides:
Code: https://github.com/SelimaC/Tutorial-SCADS-Summer-School-2020-Scalable-Deep-Learning
Program Outline
Description
Part I - Scalable Deep Learning: from pruning to evolution.
The first part of the tutorial focuses on theory. We first briefly discuss the basic science paradigms in the context of complex networks and systems, and review how many agents make use of deep neural networks nowadays. We then introduce the basic concepts of neural networks and draw a parallel between artificial and biological neural networks from a functional and topological perspective. We continue by introducing the first papers on efficient neural networks, dating from the early 1990s, which make use either of sparsity-enforcing penalties or of pruning the weights of fully connected networks based on various saliency criteria. Afterwards, we review some recent works which start from fully connected networks and use prune-retrain cycles to compress deep neural networks and make them more efficient in the inference phase (a minimal sketch of one such cycle is given below). We then discuss an alternative approach, NeuroEvolution of Augmenting Topologies (NEAT) and its follow-ups, which grows efficient deep neural networks using evolutionary computing.
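To make the prune-retrain idea concrete, here is a minimal NumPy sketch (not the code released with this tutorial) of iterative magnitude-based pruning on a single weight matrix; the layer size, sparsity schedule, and the indicated retraining step are illustrative assumptions only.

    import numpy as np

    def magnitude_prune(weights, sparsity):
        """Zero out the fraction `sparsity` of entries with the smallest magnitude.

        Returns the pruned weights and a boolean mask marking the kept connections.
        """
        k = int(sparsity * weights.size)              # number of weights to remove
        if k == 0:
            return weights.copy(), np.ones(weights.shape, dtype=bool)
        threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
        mask = np.abs(weights) > threshold
        return weights * mask, mask

    # Illustrative prune-retrain loop over increasing (cumulative) sparsity levels.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(256, 128))        # stands in for a trained dense layer
    for sparsity in (0.5, 0.75, 0.9):
        W, mask = magnitude_prune(W, sparsity)
        # ... here one would retrain for a few epochs with masked gradients,
        #     e.g. W -= learning_rate * gradient * mask, before pruning further.
        print(f"target sparsity {sparsity:.2f}: kept {int(mask.sum())} of {W.size} weights")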
Scalable Deep Learning. Further on, we introduce the topic of Scalable Deep Learning (SDL), which builds on efficient deep learning and brings together all of the above. Herein, we discuss intrinsically sparse Deep Neural Networks (DNNs) with static and adaptive sparse connectivity. We first illustrate this approach using the recently proposed Sparse Evolutionary Training (SET) algorithm. SET-DNNs start from random sparse networks and use an evolutionary process to adapt their sparse connectivity to the data while learning (a simplified sketch of one connectivity-evolution step follows below). SET-DNNs offer benefits in both the training and inference phases, having quadratically lower memory footprints and much faster running times than their fully connected counterparts. After that, we present state-of-the-art alternatives to SET, such as DeepR, Dynamic Sparse Reparameterization, Sparse Networks from Scratch, Rigging the Lottery (RigL), and Dynamic Sparse Training. We end the first part by discussing why sparse neural networks with adaptive sparse connectivity have the potential to push the boundaries of deep learning well beyond its current capabilities, highlighting at the same time the main challenges (e.g. theory, hardware, software) which still have to be solved to reach, for instance, DNNs with billions of neurons.
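As an illustration of the kind of connectivity evolution SET performs, below is a simplified NumPy sketch of a single evolution step on one sparsely connected layer. It is not the authors' released implementation (linked in the references); for instance, it prunes purely by absolute magnitude rather than treating positive and negative weights separately, and the layer size, zeta, and weight initialisation are illustrative choices.

    import numpy as np

    def set_evolve_layer(weights, zeta=0.3, init_scale=0.01, rng=None):
        """One SET-style step on a sparse weight matrix: drop the fraction `zeta`
        of existing connections closest to zero, then regrow the same number of
        connections at random, currently empty, positions."""
        rng = np.random.default_rng() if rng is None else rng
        rows, cols = np.nonzero(weights)                    # existing connections
        n_drop = int(zeta * rows.size)

        # 1) Remove the n_drop weakest (smallest-magnitude) connections.
        order = np.argsort(np.abs(weights[rows, cols]))
        weights[rows[order[:n_drop]], cols[order[:n_drop]]] = 0.0

        # 2) Regrow n_drop new connections at random empty positions.
        empty_rows, empty_cols = np.nonzero(weights == 0.0)
        chosen = rng.choice(empty_rows.size, size=n_drop, replace=False)
        weights[empty_rows[chosen], empty_cols[chosen]] = rng.normal(
            scale=init_scale, size=n_drop)
        return weights

    # Start from a random sparse layer (~10% density) and evolve its topology.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(256, 128)) * (rng.random((256, 128)) < 0.1)
    for epoch in range(3):
        # ... one epoch of standard SGD on the sparse weights would go here ...
        W = set_evolve_layer(W, zeta=0.3, rng=rng)
        print(f"epoch {epoch}: {int(np.count_nonzero(W))} active connections")

Note that the number of active connections stays constant across epochs; only their positions adapt to the data as training progresses.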
Part II - Scalable Deep Learning: deep reinforcement learning.
Up to this point, everything has been discussed in the context of supervised and unsupervised learning. Further on, we introduce deep reinforcement learning and pave the way for scalable deep reinforcement learning. We describe some very recent progress in the field of deep reinforcement learning that can be used to improve the performance of reinforcement learning agents when confronted with delayed reinforcement signals and with environments that can exhibit sudden changes in their dynamics, as is often the case with energy systems.
Applications - SDL agents in smart grids. The last part of the tutorial focuses on practical applications. Distributed generation, demand response, distributed storage, and electric vehicles are bringing new challenges to the power and energy sector. The tutorial addresses the current and envisioned solutions for the management of these distributed energy resources in the context of smart grids. Artificial intelligence based approaches open up important new possibilities, enabling efficient individual and aggregated energy management. Such approaches can provide the different players, each aiming to accomplish individual and common goals in a market-driven environment, with advanced decision support and automated solutions. Towards the end, we argue that the reinforcement learning paradigm can be very powerful for solving many decision-making problems in the energy sector, for example investment problems, the design of bidding strategies for the intraday electricity market, or problems related to the control of microgrids. This last part frames resource allocation as a sequential stochastic decision-making process addressed by scalable and efficient deep reinforcement learning agents (a toy example of this framing is sketched below). We investigate how multiple learning agents interact with and influence each other in the smart grid context, what kind of global system dynamics arise, and how the desired electrical behaviour can be obtained by modifying the learning algorithms used. The settings considered range from one-to-one interactions (e.g. games) to small groups (e.g. multi-agent coordination) and large communities (e.g. interactions in social networks).
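To make the sequential decision-making framing concrete, here is a deliberately tiny, self-contained sketch that trains a tabular Q-learning agent to charge or discharge a battery against a randomly fluctuating two-level price. Every detail of the environment (battery size, prices, reward) is invented for illustration and is far simpler than the deep reinforcement learning settings covered in the tutorial.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy environment: a battery with a discrete state of charge (0..4) facing a
    # price that is either LOW (index 0) or HIGH (index 1). Charging costs money,
    # discharging earns money at the current price. All numbers are invented.
    N_SOC, N_PRICE = 5, 2
    ACTIONS = (-1, 0, 1)                       # discharge, idle, charge
    PRICES = np.array([0.2, 1.0])              # price per unit of energy

    def step(soc, price, action):
        new_soc = int(np.clip(soc + action, 0, N_SOC - 1))
        energy = new_soc - soc                 # >0 bought, <0 sold
        reward = -energy * PRICES[price]       # pay to charge, earn to discharge
        new_price = int(rng.integers(N_PRICE)) # price changes at random each step
        return new_soc, new_price, reward

    # Tabular Q-learning over (state of charge, price level, action).
    Q = np.zeros((N_SOC, N_PRICE, len(ACTIONS)))
    alpha, gamma, eps = 0.1, 0.95, 0.1
    soc, price = 2, 0
    for _ in range(50_000):
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[soc, price]))
        new_soc, new_price, r = step(soc, price, ACTIONS[a])
        Q[soc, price, a] += alpha * (r + gamma * Q[new_soc, new_price].max() - Q[soc, price, a])
        soc, price = new_soc, new_price

    # The greedy policy should (roughly) charge at LOW prices and discharge at HIGH prices.
    print("greedy action per (state of charge, price level):")
    print(np.array(ACTIONS)[np.argmax(Q, axis=2)])

The same framing carries over to the realistic settings discussed in the tutorial by replacing the table with a (sparse) deep neural network and the toy environment with an actual energy system model.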
After the tutorial, participants will have: a basic understanding of scalable deep neural networks for learning agents, and of agents in the smart grid context; basic hands-on experience with using these concepts in various practical applications; and some good ideas for future research directions.
Selected References
Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu, and Antonio Liotta, 2018. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nature Communications 2018, https://www.nature.com/articles/s41467-018-04316-3; Code: https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks
Decebal Constantin Mocanu and Elena Mocanu. 2018. One-Shot Learning Using Mixture of Variational Autoencoders: A Generalization Learning Approach. In Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS ’18), https://arxiv.org/abs/1804.07645
Elena Mocanu, Decebal Constantin Mocanu, Phuong. H. Nguyen, Antonio Liotta, Michael E. Webber, Madeleine Gibescu, and J. G. Slootweg. 2018. On-line Building Energy Optimization using Deep Reinforcement Learning. IEEE Transactions on Smart Grid 2018, https://ieeexplore.ieee.org/document/8356086
Elena Mocanu, Phuong H Nguyen, Madeleine Gibescu and Wil L Kling, 2016. Deep learning for estimating building energy consumption, Sustainable Energy, Grids and Networks 2016
Anil Yaman, Giovanni Iacca, Decebal Constantin Mocanu, George Fletcher and Mykola Pechenizkiy, 2019. Learning with Delayed Synaptic Plasticity, The Genetic and Evolutionary Computation Conference (GECCO 2019), Prague, Czech Republic. https://arxiv.org/abs/1903.09393; Video
Shiwei Liu, Decebal Constantin Mocanu, Amarsagar Reddy Ramapuram Matavalam, Yulong Pei, Mykola Pechenizkiy, 2019, Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware, https://arxiv.org/abs/1901.09181 ; Code
Decebal Constantin Mocanu, Elena Mocanu, Phuong Nguyen, Madeleine Gibescu, Antonio Liotta, 2016. A topological insight into restricted Boltzmann machines, Machine Learning (ECMLPKDD 2016), https://link.springer.com/article/10.1007%2Fs10994-016-5570-z
Decebal Constantin Mocanu, 2016, On the synergy of network science and artificial intelligence, In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2016)
Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta, 2016, Online contrastive divergence with generative replay: Experience replay without storing data, https://arxiv.org/abs/1610.05555.
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, 2017. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proceedings of the IEEE 2017.
Yann LeCun, John S. Denker, Sara A. Solla, 1990. Optimal Brain Damage. NIPS 1990.
Song Han, Jeff Pool, John Tran, William J. Dally, 2015. Learning both Weights and Connections for Efficient Neural Networks. NIPS 2015.
Hangyu Zhu and Yaochu Jin, 2019. Multi-objective Evolutionary Federated Learning. IEEE Transactions on Neural Networks and Learning Systems 2019.
Hesham Mostafa, Xin Wang, 2019. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization. ICML 2019.
Tim Dettmers, Luke Zettlemoyer, 2019, Sparse Networks from Scratch: Faster Training without Losing Performance, https://arxiv.org/abs/1907.04840
Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen, 2019. Rigging the Lottery: Making All Tickets Winners. https://arxiv.org/abs/1911.11134
Tiago Pinto and Zita Vale, 2019. AiD-EM: Adaptive Decision Support for Electricity Markets Negotiations. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019).
Organizers Short Bios
Decebal Constantin Mocanu is an Assistant Professor in Artificial Intelligence and Machine Learning within the DMB group, Faculty of Electrical Engineering, Mathematics, and Computer Science at the University of Twente. From September 2017 until February 2020, he was an Assistant Professor in Machine Learning at Eindhoven University of Technology (TU/e), and a member of the TU/e Young Academy of Engineering. His short-term research interest is to conceive scalable deep artificial neural network models and their corresponding learning algorithms using principles from network science, evolutionary computing, optimization, and neuroscience. Such models shall have sparse and evolutionary connectivity, make use of previous knowledge, and have strong generalization capabilities, to be able to learn and reason from few examples in a continuous and adaptive manner. Most science carried out throughout human history uses the traditional reductionism paradigm, which, although very successful, still has some limitations. Aristotle wrote in Metaphysics, "The whole is more than the sum of its parts". Inspired by this quote, in the long term, Decebal would like to follow the alternative complex systems paradigm and study the synergy between artificial intelligence, neuroscience, and network science for the benefit of science and society. In 2017, Decebal received his PhD in Artificial Intelligence and Network Science from TU/e. During his doctoral studies, Decebal undertook three research visits: to the University of Pennsylvania (2014), the Julius Maximilian University of Würzburg (2015), and the University of Texas at Austin (2016). Prior to this, in 2013, he obtained his MSc in Artificial Intelligence from Maastricht University. During his master studies, Decebal also worked as a part-time software developer at We Focus BV in Maastricht. In the last year of his master studies, he also worked as an intern at Philips Research in Eindhoven, where he prepared his internship and master thesis projects. Decebal obtained his Licensed Engineer degree from the University Politehnica of Bucharest. While in Bucharest, between 2001 and 2010, Decebal started MDC Artdesign SRL (a software house specialized in web development), worked as a computer laboratory assistant at the University Nicolae Titulescu, and as a software engineer at Namedrive LLC.
Elena Mocanu is an Assistant Professor in Machine Learning and Autonomous Agents at the University of Twente, the Netherlands. She received her B.Sc. degree in Mathematics and Physics from Transilvania University of Brasov, Romania, in 2004. After four years as a mathematics and physics teacher at high-school level, Elena moved to the university. She was an Assistant Lecturer within the Department of Information Technology, University of Bucharest, Romania, from September 2008 to January 2011. In parallel, in 2009 she started a master program in Theoretical Physics, and in 2011 she obtained the M.Sc. degree with a specialization in Quantum Transport from the University of Bucharest, Romania. In January 2011, Elena moved from Romania to the Netherlands. She obtained the M.Sc. degree in Operations Research from Maastricht University, the Netherlands, in 2013. In the last year of her master program, she did a six-month internship at Maastricht University on bioinformatics data analytics research and a six-month graduation project at NXP Semiconductors, Eindhoven. In her master thesis, she investigated deep learning methods for "People detection for building automation". In October 2013, Elena started her PhD research in Machine Learning and Smart Grids at TU/e. In January 2015 she performed a short research visit at the Technical University of Denmark and, from January to April 2016, she was a visiting researcher at the University of Texas at Austin, USA. In 2017, Elena received her Doctor of Philosophy degree in Machine Learning and Smart Grids from TU/e.
Tiago Pinto works in the area of Artificial Intelligence (AI), with special interest in the fields of adaptive machine learning and automated negotiation. He has been applying AI techniques to the study of electricity markets since 2008, specifically in the decision support of negotiating agents. Tiago Pinto has been involved in the organization of multiple international congresses and conferences, namely: the 18th EPIA Conference on Artificial Intelligence, Porto, Portugal, September 2017; PAAMS 2017 (15th International Conference on Practical Applications of Agents and Multi-Agent Systems), Porto, Portugal, June 2017; PACBB 2017 (11th International Conference on Practical Applications of Computational Biology & Bioinformatics), Porto, Portugal, June 2017; GIIS 2016 (Global Information Infrastructure and Networking Symposium), October 2016, Porto, Portugal; the 27th International Conference on Database and Expert Systems Applications (DEXA 2016), September 5-8, 2016, Porto, Portugal; SUComS 2015 (6th International Conference on Security-enriched Urban Computing and Smart Grids), June 21-23, 2015, Porto, Portugal; IEA/AIE 2011 (Twenty-fourth International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems), 29 June – 1 July 2011, Syracuse, USA; the Summer School on Neural Networks in Classification, Regression and Data Mining, July 12-16, 2010, Porto, Portugal; and ISCIES 2009. He has also been involved in the organization of several workshops and special sessions.
Zita Vale is a full professor at the Polytechnic Institute of Porto and the director of the Research Group on Intelligent Engineering and Computing for Advanced Innovation and Development (GECAD). She received her diploma in Electrical Engineering in 1986 and her PhD in 1993, both from the University of Porto. Zita Vale works in the area of Power and Energy Systems, with special interest in the application of Artificial Intelligence techniques. She has been involved in more than 50 funded projects related to the development and use of knowledge-based systems, multi-agent systems, genetic algorithms, neural networks, particle swarm intelligence, constraint logic programming, and data mining. Energy resources management, distributed generation, demand response, and electric vehicles are important topics of her research in the current projects. The main application fields of these projects comprise: (1) Smart Grids, accommodating an intensive use of Renewable Energy Sources, Distributed Energy Resources (DER), and Distributed Generation (DG); she addresses the management of energy resources, the impact of DER on electrical networks, the negotiation of DER in electricity markets, demand response, storage, energy management in buildings, and electric vehicles, including the ones with gridable capability (V2G); (2) Electricity markets, addressing contracts, prices and tariffs, decision support for market participants, aggregation, ancillary services, and wholesale and local market simulation; and (3) Control Center applications, namely intelligent alarm processing, intelligent interfaces, and intelligent tutors. Zita has published over 800 works, including more than 100 papers in international scientific journals and more than 500 papers in international scientific conferences. She has supervised 17 concluded PhD theses and is currently supervising 8 PhD students. She has also supervised 45 concluded MSc theses and is currently supervising 10 MSc theses.