Speaker: Uri Weiser (Technion)
Abstract: For the last 40 years Process Technology and Computer Architecture have been orchestrating a magnificent growth in computing performance. It seems that we have reached a major turning point; Moore’s law is reaching its end and Dennard scaling has already ended, while performance requirements continue to soar for countless new AI applications. In this short talk I’ll try to cover the tip of the new AI era’s chip implications.
Bio: Uri Weiser is a Professor Emeritus at the Technion's Electrical Engineering department and serves on the advisory boards of numerous startups. Professor Weiser worked at Intel from 1988 to 2007. At Intel, he initiated the definition of the first Pentium® processor, drove the definition of Intel's MMX™ technology, invented the Trace Cache, co-managed the formation of the Intel Microprocessor Design Center in Austin, Texas, and formed an advanced media applications research activity.
Prior to his career at Intel, Professor Weiser worked at the Israeli Department of Defense (RAFAEL) as a research and system engineer, and later at the National Semiconductor Design Center in Israel, where he led the design of the NS32532 microprocessor. Professor Weiser was appointed an Intel Fellow in 1996, became an IEEE Fellow in 2002, and an ACM Fellow in 2005. He received the Eckert-Mauchly Award in 2016. He has published more than 50 papers and holds 15 patents.
Speaker: Adi Fuchs (Speedata)
Abstract: Artificial Intelligence has sparked humanity's imagination over the past decades as a dominant motif in science fiction novels, films, position papers, and futuristic visions. However, it is only in recent years that we have begun to see a rise in successful artificial intelligence applications, due to the ubiquity of big data and recent advancements in accelerated hardware.
This talk will focus on the pivotal role that accelerated hardware played in the AI renaissance, the current state of the AI accelerators landscape, and the need for an architectural paradigm shift to tackle the challenges of future AI.
Bio: Adi is a core architect and project lead at Speedata. He earned his Ph.D. in Electrical Engineering from Princeton University, and his M.Sc. and B.Sc. degrees (with honors) in Electrical Engineering from the Technion. Adi led multiple award-winning research projects that were published in top-tier architecture venues and covered in the global media. He has more than ten years of hardware/software engineering experience at startups (SambaNova, Mellanox, Concertio, and Dune) and corporations (Apple and Philips), and is a co-inventor of multiple US patents.
Speaker: Michael Behar (Nvidia)
Abstract: The development of Large Language Models (LLMs) has led to a breakthrough in Natural Language Processing (NLP), resulting in the emergence of popular products such as ChatGPT. However, as the name implies, the sheer size of these models, such as GPT-3 with 175B parameters, exceeds the memory capacity of a single GPU. Additionally, latency constraints during inference require parallelization and restrict batch size. Furthermore, the attention mechanism on which these models are based does not scale efficiently with sequence length, causing additional complexity for long-sequence inference. This presentation examines the fundamental challenges associated with LLM inference and investigates possible solutions in novel hardware and software.
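As a rough back-of-the-envelope illustration of the scale involved (not taken from the talk; the parameter count and hidden size below are assumed GPT-3-like values), the following sketch estimates the weight memory of a 175B-parameter model in FP16 and the quadratic growth of the attention score computation with sequence length:

```python
# Back-of-the-envelope illustration: why a 175B-parameter model cannot fit on a
# single GPU, and why attention cost grows quadratically with sequence length.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just for the weights, in GB (FP16/BF16 -> 2 bytes each)."""
    return num_params * bytes_per_param / 1e9

def attention_score_flops(seq_len: int, hidden: int) -> float:
    """Rough FLOPs for the QK^T score matrix of one attention layer: O(n^2 * d)."""
    return 2 * seq_len * seq_len * hidden

if __name__ == "__main__":
    print(f"GPT-3-scale weights (175e9 params, FP16): ~{weight_memory_gb(175e9):.0f} GB")
    for n in (1_000, 4_000, 16_000):
        print(f"seq_len={n:>6}: ~{attention_score_flops(n, 12288):.2e} FLOPs per layer for QK^T")
```

At FP16, the weights alone occupy roughly 350 GB, well beyond any single GPU's memory, and the quadratic score matrix is one reason long-sequence inference is costly.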
Bio: Michael Behar is a hardware deep learning architect at Nvidia with expertise in end-to-end inference, specializing in GPUs and datacenters. He leads a team responsible for developing cutting-edge hardware and software solutions in this field. Michael holds an M.Sc. in Electrical Engineering from the Technion-Israel Institute of Technology. He has over 14 years of experience as an architect at Intel working on CPUs, GPUs, computer vision, and AI accelerators, and 3 years of experience at Nvidia as a deep learning end-to-end inference architect.
Challenges in Acceleration of Transformers on Dataflow Architecture
Speaker: Mark Grobman (Hailo)
Abstract: The past few years have seen the rapid adoption of the Transformer architecture for computer vision tasks. At the heart of the Transformer architecture is the Multi-Head Attention (MHA) operation. The very nature of what makes this operator attractive – it is global and dynamic – also makes it challenging to accelerate efficiently on dataflow architectures. Furthermore, MHA suffers from numeric issues that make it harder to quantize than CNNs. The heart of this talk will concern these challenges and discuss solutions. Briefly, through that prism we will glimpse two interesting challenges faced by the industry: the HW architect's balancing act between efficient acceleration of CNNs (local) and Transformers (global), and the product designer's struggle to navigate the increasingly diverging landscape of NN architectures and HW accelerators.
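For readers unfamiliar with the operator, the minimal NumPy sketch below (illustrative only, not Hailo's implementation) makes the "global and dynamic" nature concrete: the (n, n) score matrix couples every token with every other token, and the mixing weights are computed from the input itself.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    n, d = x.shape
    dh = d // num_heads
    q, k, v = x @ wq, x @ wk, x @ wv                     # input-dependent projections
    q = q.reshape(n, num_heads, dh).transpose(1, 0, 2)   # (heads, n, dh)
    k = k.reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = v.reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # (heads, n, n): global, quadratic in n
    out = softmax(scores) @ v                            # mixing weights depend on the input (dynamic)
    return out.transpose(1, 0, 2).reshape(n, d) @ wo

rng = np.random.default_rng(0)
n, d, heads = 16, 64, 4
x = rng.standard_normal((n, d))
wq, wk, wv, wo = (0.1 * rng.standard_normal((d, d)) for _ in range(4))
print(multi_head_attention(x, wq, wk, wv, wo, num_heads=heads).shape)  # (16, 64)
```

Unlike a convolution, whose weights are fixed after training and whose receptive field is local, here both the (n, n) coupling pattern and the softmax values change with every input, which is what complicates static dataflow mapping.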
Bio: Mark is the ML CTO at Hailo and has been with Hailo since its founding in 2017. Mark holds an M.Sc. in Neuroscience from Bar-Ilan University and a double B.Sc. degree in Electrical Engineering and Physics ('Psagot' program) from the Technion.
Speaker: Ayelet Hen (Samsung Israel R&D Center)
Abstract: Deep learning technology for the edge has been developing rapidly in recent years. Training frameworks, quantization tools and schemes, network architecture solutions, and new operators are introduced and changed on a yearly basis. An automated exploration and productization methodology will be presented. This methodology keeps the technology up to date with market trends while developing several IPs a year and meeting project deadlines without compromising on quality.
Bio: Ayelet is a Group Manager for Deep Learning Hardware at the Samsung Israel R&D Center. The group develops DL inference HW accelerator IPs with optimized PPA (Power, Performance, and Area) for pre-defined applications in the computer vision and sensor domains. Previously, Ayelet took part in the development of multiple chips as a chip design engineer at Mellanox (acquired by Nvidia). Ayelet holds an MBA from Tel Aviv University and a B.Sc. in Electrical Engineering from Ben-Gurion University.
Speaker: Udi Kra (Bar Ilan University)
Abstract: Massively parallel arithmetic logic, which dominates on-chip AI computation, is commonly accelerated by sequential pipelining. However, pipeline sampling stages incur significant area and power overheads due to the added registers and clock network. WavePro is an innovative, scalable method offering an alternative to traditional clocked pipeline sequential logic, reaching high throughput without intermediate sampling registers and thereby significantly reducing area and power. The WavePro technology is compatible with commercial design flows and includes a post-silicon process-variation tolerance mechanism. The technology has been silicon-proven on two recent test chips, demonstrating on the order of 50% energy savings and 12% area savings for common AI compute logic.
Bio: Udi Kra holds a B.Sc. degree from the Technion and an M.Sc. degree from Bar-Ilan University. With 40 years of engineering and entrepreneurship experience in the chip-design industry, he is currently the chief architect of the SoC Lab within the EnICS Labs research center at Bar-Ilan University. In his current position, he led the development of HAMSA-DI, the flagship RISC-V core of the Israeli GenPro Consortium. Since 2021, he has been pursuing his Ph.D. under the supervision of Prof. Adam Teman, focusing on Edge-AI System-on-Chip acceleration.
Speaker: Elad Hoffer (Habana Labs)
Abstract: In recent years, there has been growing interest in training deep learning models using low-precision arithmetic, which can significantly reduce the computational cost and energy consumption of training. Moving from full precision (FP32), many current deep learning models are trained using 16-bit floating-point arithmetic (FP16/BF16). Recently, advancements in AI-targeted hardware have allowed the transition to even lower-precision formats such as 8-bit floating point (FP8). In this talk, we will discuss our experience training deep learning models in low precision, focusing on the use of FP8 and below. We will cover the challenges and benefits of using this low-precision format and compare it to common practices. We will present our results and insights on how to optimize the training process for low precision, to achieve faster and more energy-efficient training of deep learning models.
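As a minimal sketch of the kind of per-tensor scaling involved when mapping FP32 tensors into a narrow FP8-like range (the E4M3 constants and rounding scheme below are illustrative assumptions, not Habana's actual training recipe):

```python
import numpy as np

E4M3_MAX = 448.0     # largest normal value of the FP8 E4M3 format
MANTISSA_BITS = 3    # explicit mantissa bits in E4M3

def fake_fp8_e4m3(x: np.ndarray) -> np.ndarray:
    """Simulate per-tensor FP8 (E4M3-like) quantization in FP32.

    Scale the tensor into the FP8 dynamic range, round the mantissa to 3 explicit
    bits, then rescale back. Sub-normals and special values are ignored for brevity.
    """
    scale = np.max(np.abs(x)) / E4M3_MAX + 1e-30
    y = x / scale
    m, e = np.frexp(y)                                        # y = m * 2**e, 0.5 <= |m| < 1
    m = np.round(m * 2 ** (MANTISSA_BITS + 1)) / 2 ** (MANTISSA_BITS + 1)
    y = np.clip(np.ldexp(m, e), -E4M3_MAX, E4M3_MAX)
    return y * scale

w = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
err = np.abs(w - fake_fp8_e4m3(w)).max() / np.abs(w).max()
print(f"max relative quantization error: {err:.3%}")
```

The per-tensor scale is what keeps values inside the narrow FP8 dynamic range; choosing and updating such scales during training is one of the practical challenges the talk refers to.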
Bio: Elad Hoffer is a principal engineer on the research team at Habana Labs (an Intel company), specializing in efficient training of deep learning models. Elad earned his Ph.D. in the Department of Electrical Engineering at the Technion, Israel Institute of Technology.
Speaker: Niv Nayman (Amazon and Technion)
Abstract: Practical use of neural networks often involves requirements on latency, energy, and memory, among others. A popular approach to finding networks under such requirements is constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters that must be tuned; hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), which is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator, together with the intuitive way it is constructed, brings interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates architectures comparable to or better than those of other state-of-the-art NAS methods at a reduced marginal search cost, while strictly satisfying the resource constraints.
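The toy sketch below only illustrates the general idea of a bilinear accuracy estimator combined with an additive latency estimate under a budget; the sizes, random coefficients, and brute-force search are hypothetical simplifications and do not reproduce the actual BINAS formulation or its scalable solver.

```python
# Toy illustration: pick one operation per layer so that a bilinear accuracy
# estimate is maximized while an additive latency estimate stays under a budget.
# Brute force is used only because this toy space is tiny.
import itertools
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_ops = 4, 3
a = rng.uniform(0.0, 1.0, (num_layers, num_ops))                        # per-choice accuracy gains
B = rng.uniform(-0.1, 0.1, (num_layers, num_ops, num_layers, num_ops))  # pairwise (bilinear) terms
lat = rng.uniform(1.0, 3.0, (num_layers, num_ops))                      # per-choice latency cost
budget = lat.min(axis=1).sum() + 1.0                                     # guarantee a feasible choice

def accuracy_estimate(choice):
    x = np.zeros((num_layers, num_ops))
    x[np.arange(num_layers), choice] = 1.0              # one-hot selection per layer
    return (a * x).sum() + np.einsum('io,iojp,jp->', x, B, x)

best = max(
    (c for c in itertools.product(range(num_ops), repeat=num_layers)
     if lat[np.arange(num_layers), list(c)].sum() <= budget),
    key=accuracy_estimate,
)
print("selected ops per layer:", best, "estimated accuracy:", round(accuracy_estimate(best), 3))
```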
Bio: Niv Nayman is an applied scientist at Amazon Web Services and a Ph.D. candidate at the Technion, working primarily on AutoML research and applications in computer vision and document analysis. His work has been integrated into products at scale and published at top venues (NeurIPS, ICLR, ICML, etc.). Before joining Amazon, Niv served as a senior research scientist at Alibaba DAMO Academy, after a long service as an officer in the technological intelligence Unit 81, where he performed a variety of roles over the years, from hardware design to algorithmic research and cyber security. Niv holds B.Sc. degrees in both Electrical Engineering and Physics (Cum Laude, 'Psagot' program) and an M.Sc. in Optimization and Machine Learning, all from the Technion.
Speaker: Shady Abu-Hussein (Tel Aviv University)
Abstract: In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have attracted significant attention. By composing a Markovian process that starts in the data domain and gradually adds noise until reaching pure white noise, they achieve superior performance in learning data distributions. Yet, these models require a large number of diffusion steps to produce aesthetically pleasing samples, which is inefficient. In addition, unlike common generative adversarial networks, the latent space of diffusion models is not interpretable. In this talk, we will present a novel diffusion model process that can generate images in only a few steps, while also providing a latent space that can be interpolated.
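For context, the standard DDPM forward (noising) process referred to above has the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps; the short NumPy sketch below implements it with a linear noise schedule. The talk's contribution, few-step generation with an interpolatable latent space, is not reproduced here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))         # stand-in for a normalized image
for t in (0, T // 2, T - 1):
    xt = q_sample(x0, t, rng)
    print(f"t={t:4d}  remaining signal fraction sqrt(abar_t) = {np.sqrt(alpha_bar[t]):.3f}")
```

Sampling runs this chain in reverse, one learned denoising step per diffusion step, which is why reducing the number of steps directly reduces generation cost.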
Bio: Shady is a Ph.D. student at Tel Aviv University, where his primary research focuses on solving imaging inverse problems using deep learning methods. He received his B.Sc. in Electrical and Computer Engineering from Ben-Gurion University of the Negev. Throughout his Ph.D., Shady has explored multiple directions to overcome the challenges faced in solving ill-posed inverse problems. His research has included leveraging the latent prior captured by deep generative models, utilizing the implicit prior defined through deep neural network architecture, and adapting off-the-shelf deep solvers using well-known tools from signal processing theory.
Speaker: Nimrod Kruger (Rafael)
Abstract: While the "machine learning" revolution is getting most of the attention, a quieter revolution is taking place in sensing technologies. New electro-optical frame-based sensors with better performance, new features, and lower prices are emerging daily. Moreover, a technology coined the Neuromorphic Sensor (a.k.a. Event Camera) offers a different paradigm to frame-based imaging. Like neuromorphic computing, neuromorphic sensing leans heavily on biology as inspiration, forcing engineers and scientists who wish to use it to constantly ask "how does biology do it?". We will discuss the thought process and building blocks required when designing neuromorphic sensing technology, and consider the implications for defence.
Bio: Nimrod spent his academic years studying biomedical engineering, solar energy, and electro-optics, conducting his research under the supervision of Prof. Carmel Rotschild. For the last 4.5 years he has worked at Rafael, developing active electro-optic components and researching novel sensing technologies for challenging problems in the defence domain.
Speaker: Elishai Ezra Tsur (Open University of Israel)
Abstract: There is an apparent discrepancy between visual perception, which is colorful, complete, and in high resolution, and the saccadic, spatially heterogeneous retinal input data. In this talk, I will present our latest work, in which we emulated foveated color maps and intensity channels, as well as intra-saccadic motion data, using a neuromorphic event-based camera. We used a convolutional neural network, U-Net, and adversarial optimization to demonstrate how retinal inputs can be used to reconstruct colorful, high-resolution images. I will further show how cognitive models can be used for colorful image reconstruction from sparse event-based data.
Bio: Elishai is the principal investigator of the Neuro & Biomorphic Engineering Lab (NBEL-lab.com) and an Assistant Professor at the Open University of Israel. In his research, Elishai studies the realm of brain-inspired machines, utilizing artificial brains to develop new frameworks for robotics and vision processing. Elishai holds degrees in Life Sciences (B.Sc.), Philosophy and History (B.A.), Computer Science (M.Sc.), Neuropsychology (M.A.), Bioengineering (M.Sc., Ph.D.), and Computational Neuroscience (postdoc).
Speaker: Loai Danial (Mobileye)
Abstract: Data converters are ubiquitous in integrated mixed-signal systems, and they have become the computational bottleneck both in traditional data acquisition systems and in emerging brain-inspired neuromorphic systems, which aim to embed low-power artificial intelligence algorithms close to the sensor using analog in-memory computing. Unfortunately, conventional Nyquist data converters trade off speed, power, and accuracy, and are therefore exhaustively customized for special-purpose applications. Furthermore, intrinsic real-time and post-silicon variations dramatically degrade their performance as CMOS technology scales down.
This talk will present novel neuromorphic analog-to-digital (ADC) and digital-to-analog (DAC) converters that are trained using the stochastic gradient descent algorithm to autonomously adapt to varying design specifications, including multiple full-scale voltages, numbers of resolution bits, and input frequencies. The feasibility of the neuromorphic paradigm is experimentally demonstrated using commercial integrated memory arrays comprising two-terminal floating-gate memristive devices, fabricated in a standard 180 nm CMOS process and operated in the sub-threshold regime. This memristive array efficiently implements, at the analog level, the multiply, accumulate, and update operations essential for trainable artificial synapses in neuromorphic systems. Theoretical analysis, as well as simulation results, shows the collective resilient properties of the proposed converters in application reconfiguration, mismatch calibration, noise tolerance, and power optimization. We believe that these findings will spur the development of very-large-scale integrated memristive neuromorphic systems.
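To give a flavor of what "trained using stochastic gradient descent" can mean for a data converter, here is a toy software sketch of a 4-bit DAC whose bit weights are learned online with an LMS/SGD rule; the full-scale voltage, learning rate, and training loop are illustrative assumptions, and the sketch does not model the memristive circuit presented in the talk.

```python
# Toy sketch: a 4-bit "DAC" whose bit weights are learned online with SGD, so the
# same structure can adapt to a different full-scale voltage simply by retraining.
import numpy as np

rng = np.random.default_rng(0)
bits, v_full_scale, lr = 4, 1.8, 0.05          # assumed example values

w = rng.uniform(0.0, 0.5, bits)                # learned bit weights (ideal: binary-scaled)
for step in range(5000):
    code = rng.integers(0, 2, bits)            # random digital input word
    target = v_full_scale * code @ (2.0 ** np.arange(bits - 1, -1, -1)) / (2**bits - 1)
    out = w @ code                              # analog output of the learned DAC
    w += lr * (target - out) * code             # LMS / SGD update on the squared error

ideal = v_full_scale * 2.0 ** np.arange(bits - 1, -1, -1) / (2**bits - 1)
print("learned weights:", np.round(w, 3))
print("ideal weights:  ", np.round(ideal, 3))
```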
Bio: I received the B.Sc. degree in electrical engineering from the Technion–Israel Institute of Technology in 2014. From 2013 to 2016 I was with IBM labs as a hardware research student. From 2016 to 2021 I pursued the Ph.D. degree at the Andrew and Erna Viterbi Faculty of Electrical Engineering, Technion–Israel Institute of Technology. My research combines interdisciplinary interests in data converters, machine learning, and neuromorphic mixed-signal systems using emerging memory devices. I received the 2017 Hershel Rich Technion Innovation Award, the Israeli Planning and Budgeting Committee Fellowship, the Andrew and Erna Finci Viterbi Graduate Fellowship award, the Jacobs Award for the best engineering paper at the Technion in 2020 (published in Nature Electronics), and a best paper award at a Nature Conference on Neuromorphic Computing. In 2021, I joined Intel's emerging growth incubation group, recently acquired by Mobileye, as an analog engineer and researcher working on a next-generation high-performance analog-to-digital converter for imaging radar.