CAREER

I have 25 years of industrial R&D and technology leadership experience, have worked across three geographies (US East Coast, Europe and US West Coast), and contributed to company-wide strategic directions. I have been a people manager for over 15 years, leading small and focused teams. I have experience in setting strategic directions and starting new programs, including M&A, and interacting at the CTO and CEO level of a $25B global company.

I made significant technical contributions in four major areas: (1) instruction-level parallel architectures (specifically, VLIW processors and IP cores) for embedded system-on-chip (SoC) processor cores; (2) novel methodologies and tools for simulation and modeling of large-scale computing systems; (3) SoC-based system architectures for low-power hyperscale servers; and (4) high-performance technologies for exascale computing.

I am a technologist at heart, and even while progressing in my executive career, I have constantly kept in touch with leading-edge technology. I am an inventor on 55 granted US patents and over 70 published pending patent applications, and co-author of the book “Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools”, which has been adopted by several Master's and PhD courses around the world. I co-authored over 100 scientific publications in IEEE, ACM, and other peer-reviewed journals and conferences. In 2019 I was the program co-chair of the Rebooting Computing conference, looking at post-Moore technologies, and the chairman of the IEEE/ACM Eckert-Mauchly Award committee in computer architecture. In 2012 I was the Guest Editor (and co-chair of the selection committee) for the prestigious IEEE Micro issue on Top Picks from the 2011 Computer Architecture Conferences. According to Google Scholar (http://bit.ly/gscholar-faraboschi), my work has received over 4,300 citations, 79 of my publications have at least 10 citations (i10-index), and my h-index is 34.

Among the top-referenced publications, the ISCA 2000 paper “Lx: A Technology Platform for Customizable VLIW Embedded Processing” was cited 459 times and pioneered the area of customized embedded processors. The 2004 book “Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools” was cited 445 times and has been adopted as a course textbook by several graduate programs around the world, including CS514/ECE514 at Rice University (US), ECE751 at University of Wisconsin (US), COMP5228 at Hong Kong Polytechnic University (HK), EPD (embedded processor design) at NTU - National Taiwan University (TW), COEN259 at Santa Clara University, CP631 at UA Huntsville, CSE 524 at VIT Vellore (India), and many more.

The intersection of computing architectures, tools and applications has been the driving theme throughout my research career, with a special emphasis on specialization and customization, starting from my PhD thesis (1989-1992) that explored specialized processors for logic programming (Prolog).

CUSTOMIZED VLIW EMBEDDED PROCESSORS

From 1994 to 2003, I was the technical lead of the “Custom-Fit Processors” Project at HP Labs Cambridge and the principal architect of the instruction set architecture (ISA) of the Lx/ST200 family of VLIW embedded processor cores. I led the design of the prototype compiler and simulator, worked with the microarchitecture team and the application teams, and was an integral part of the technology transfer. I also developed the IP strategy to partner with STMicroelectronics to mature the IP so that it could be licensed back to HP and embedded in a variety of consumer products (printers and scanners). The ST200 family of cores has been used for almost two decades in several audio, video and imaging consumer products, TV set-top boxes, and HP's printers and scanners. As a measure of the penetration of the technology that I pioneered, from 2005 to 2009 STMicroelectronics shipped over 40 million systems-on-chip for digital video with ST231 VLIW processor cores.

The "Custom-Fit Processors" (CFP) project at HPL Cambridge (1994-2003) pioneered the area of customized embedded processor cores, specifically looking at VLIW architectures. Processors born from that effort, in which I had a key role for almost a decade, have been widely used (on the order of a hundred million parts) in the consumer industry for digital video, audio, printing and imaging devices. See the EETimes article "HP helps put VLIW into STMicro's system-chip" for an overview. The ST200/Lx family of embedded VLIW microprocessors, developed in partnership by STMicroelectronics and HPL, reached its fourth generation of processor cores (ST210, ST220, ST230, and ST231) in 2005. These cores are deeply embedded (often in multiple instances) in many consumer electronics systems-on-chip (SoCs). For example, there are three of these processors in each STM8000 (see stm8000.pdf) DVD recorder chip, and they also appear in digital video recorder chips (stx5524, stx5525, stm8010), set-top box decoders (sti5300, sti5301), and a series of HD H264 set-top box chips (stb7100, stb7109). In December 2006, SAGEM Communication announced that it was the first manufacturer to reach 1 million MPEG-4 single-chip set-top boxes based on an ST231-powered chip. The latest generations of the ST200 are fully capable of running a standalone operating system, such as Linux (the STLinux distribution). Launched in 2008, the STi7105 contains two ST231s and addresses the low-cost, high-performance market of HD set-top boxes.

NOVEL SIMULATION AND MODELING TOOLS

Around 2005, the rapid rise of multicore processors, the emergence of new complex workloads, and the relentless pressure to reduce cost (and energy) disrupted the tools and methodology used to analyze modern computing systems and called for a different approach to modeling and simulation. The COTSon project developed a system-level simulation tool in which we abandoned the notion of always-on cycle-detailed simulation in favor of statistical sampling approaches. In COTSon we could trade off accuracy and speed, and we developed a modular infrastructure that enabled us to leverage the abundant work of the community by aggregating several simulation "modules". We took advantage of modern dynamic binary translation technology present in fast emulators and virtual machine monitors to reach simulation speeds that can cover the execution of the full software stack of complex multi-tier applications. In a world of commodity components, system-level knowledge and tools are going to be a fundamental differentiator for a system company like HP. In the research realm, these tools advance our understanding of the parallel, scale-up and scale-out systems that will dominate information technology in the next decade.

From 2004 to 2008, I led an HP Labs team (in Barcelona, Spain) to develop the COTSon simulation platform, in collaboration with AMD. COTSon models large-scale computing systems (hundreds of multi-core processors) and their interconnection network. It was released as open source in January 2010 and is available at http://cotson.sourceforge.net. The tool uses novel technology combining fast emulation/virtualization concepts with traditional architectural simulation. It has been adopted by several research groups, such as the Teraflux European project (http://www.teraflux.eu).

LOW-POWER SERVERS

In 2008 I returned to my origins in SoC (system-on-chip) research. While my previous work led to the adoption of high-performance cores (like VLIW architectures) in embedded processors, I started looking at the opposite trend: using devices inspired by the embedded and mobile markets for ultra-energy-efficient scalable servers.

From 2008 to 2014, I worked on research and advanced development of SoC integration technologies dedicated to low-energy servers and scale-out architectures. This was a core contribution that inspired and led to the launch of HP’s “Moonshot Project” of high-density low-power servers in Nov 2011 and the product launch in Apr 2013. The Moonshot project pioneered the use of customized processors and accelerators in servers, and was the first in the industry to adopt a diverse portfolio of computing engines, including x86, ARM processors, FPGAs and DSPs. The Moonshot line eventually evolved into HPE’s Edgeline products, which currently represent HPE’s portfolio to address the converged IT/OT edge, including AI/ML accelerators.

EXASCALE COMPUTING

Since 2016, I have been the principal investigator of the Exascale Computing Program “PathForward” Project at HPE, developing technology and research to advance Exascale computing systems (https://bit.ly/ecp-pf-hpe). The HPE PathForward project accelerates technologies to improve application performance and developer productivity, while maximizing the energy efficiency and reliability of an Exascale system. It improves the competitiveness of Exascale system proposals, where application performance figures of merit, rather than theoretical peak, will be the most important criteria. It identifies the most promising technology options that are expected to reach maturity in time (2021-2023) for the design of the next generation of Exascale systems. The renewed interest in HPC at HPE led to the Cray acquisition in 2018, to which I was a key contributor. HPE+Cray is now the undisputed leader in high-performance computing, with all three US DoE Exascale machines awarded to HPE.

ARTIFICIAL INTELLIGENCE

In late 2019, I took on a new role leading the "AI Lab" group at Hewlett Packard Labs. The charter of the group is to advance technology and solutions for machine learning, advance research on AI for science, develop AI data foundations, and explore novel AI/ML acceleration techniques.