https://www.nextplatform.com/2018/10/25/broader-reduced-precision-hpc-on-horizon/
BROADER REDUCED PRECISION HPC ON HORIZON
The rise of deep learning chips for training and inference has reignited interest in how reduced-precision compute can cut down on the energy, bandwidth, and other constraints inherent to double precision.
In some areas like high performance computing, however, where double precision is the standard for nearly all applications, making the shift from double to single (and lower) has been the subject of debate due to performance and accuracy concerns. In addition to the standard cadre of HPC-oriented processors, there are hardware options to slice precision in half when applications allow. We have already spent quite a bit of time on this topic as it relates to deep learning-oriented HPC applications, on prototype systems (including this one) that offer opportunities to explore mixed precision, and on forthcoming architectures from Intel and others that provide new precision opportunities.
With the application performance issue in mind, a team from Tokyo Tech and RIKEN shot holes in the idea that most HPC codes require double precision in an extensive benchmarking effort that surveys different types of supercomputing codes with differing requirements. For that matter, the team also reiterates that the major metrics used for gauging the performance and efficiency of HPC applications are all based on double-precision floating point, which might not be the most reliable measure of real efficiencies going forward.
The full report compares two architectures with quite different floating-point emphasis, Intel’s Knights Landing and the newer Knights Mill chips, the latter of which provides the ability to reduce precision and therefore offers a basis for comparison on real-world HPC applications. In the series of applications tested, the team found that it is quite possible to reduce precision from double to single without significant performance loss.
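To make that kind of comparison concrete, here is a minimal sketch (not the team’s benchmark suite; the stencil-style kernel, problem size, and iteration count are arbitrary assumptions) that runs the same NumPy computation in double and single precision and reports the runtime alongside the relative error of the FP32 result against the FP64 reference:

    # Minimal sketch: compare double vs. single precision on a simple kernel.
    # The kernel and problem size are illustrative assumptions, not the paper's benchmarks.
    import time
    import numpy as np

    def jacobi_sweeps(a, iters=200):
        """Run a few Jacobi-style averaging sweeps on a 2-D grid."""
        for _ in range(iters):
            a[1:-1, 1:-1] = 0.25 * (a[:-2, 1:-1] + a[2:, 1:-1] +
                                    a[1:-1, :-2] + a[1:-1, 2:])
        return a

    n = 2048
    rng = np.random.default_rng(0)
    grid64 = rng.random((n, n))            # FP64 reference data
    grid32 = grid64.astype(np.float32)     # the same data stored in FP32

    t0 = time.perf_counter(); ref = jacobi_sweeps(grid64.copy()); t64 = time.perf_counter() - t0
    t0 = time.perf_counter(); low = jacobi_sweeps(grid32.copy()); t32 = time.perf_counter() - t0

    rel_err = np.abs(ref - low).max() / np.abs(ref).max()
    print(f"FP64: {t64:.2f}s  FP32: {t32:.2f}s  max relative error: {rel_err:.2e}")

On a bandwidth-limited kernel like this, the FP32 run moves half the data and typically finishes noticeably faster while the error stays within single-precision limits; whether that error is acceptable is exactly the per-application question the study raises.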
“Lower precision units occupy less area (up to 3X going from double to single precision fused-multiply-accumulate) leading to more on-chip resources (more instruction-level parallelism), potentially lowered energy consumption, and a definitive decrease in external memory bandwidth pressure (i.e., more values per unit of bandwidth). The gains—up to four times over their double precision variants with little loss in accuracy—are attractive and clear.”
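The bandwidth half of that argument is easy to see in miniature: halving the width of each value halves the bytes that must move for the same number of elements. A quick sketch (the array length is an arbitrary assumption):

    # Minimal sketch: the same number of values costs half the memory traffic in FP32.
    import numpy as np

    n = 10_000_000
    x64 = np.ones(n, dtype=np.float64)
    x32 = np.ones(n, dtype=np.float32)

    print(f"FP64 array: {x64.nbytes / 1e6:.0f} MB")   # 80 MB
    print(f"FP32 array: {x32.nbytes / 1e6:.0f} MB")   # 40 MB
    # For a memory-bound kernel, the FP32 version streams half the bytes,
    # so roughly twice as many values fit in each unit of bandwidth.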
Ultimately, this work means there might be greater demand for mixed precision capabilities in future HPC-oriented processors, something most of the industry is already working toward. With the current Volta generation of GPUs, Knights Mill, and Fujitsu’s ARM-based processors, among others, providing this potential, the real footwork will have to be done by centers as they re-evaluate their codes and weigh how reduced precision might soften the impact of Moore’s Law declines.
[Figure: Relative floating-point performance (FP32 and FP64 Gflop/s accumulated) of KNL/KNM in comparison to a dual-socket Broadwell-EP (left y-axis), and absolute achieved Gflop/s for the dominant FP operations in comparison to theoretical peak performance.]
“Given that these applications are presumably optimized, and still achieve this low FP efficiency, implies a limited relevance of FP unit’s availability. The figure shows that the majority of codes have comparable performance on KNM versus KNL. Notable mentions are: a) CANDLE which benefits from VNNI units in mixed precision, b) MiFE, NekB, and XSBn which improve probably due to increased core count and KNM’s higher CPU frequency, and c) some memory-bound applications (i.e., AMG, HPCG, and MTri) which get slower supposedly due to the difference in peak throughput in addition to the increased core count causing higher competition for bandwidth.”
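The reasoning behind “limited relevance of FP unit’s availability” is essentially the roofline argument: when a code’s arithmetic intensity is low, attainable throughput is capped by memory bandwidth rather than by how many floating-point units the chip offers. A back-of-the-envelope sketch, using illustrative peak numbers rather than measured KNL/KNM figures:

    # Roofline-style back-of-the-envelope: illustrative peak numbers, not measured values.
    def attainable_gflops(intensity_flops_per_byte, peak_gflops, peak_bandwidth_gbs):
        """Attainable performance = min(compute roof, bandwidth roof)."""
        return min(peak_gflops, intensity_flops_per_byte * peak_bandwidth_gbs)

    peak_fp64_gflops = 3000.0      # assumed compute peak
    peak_bandwidth_gbs = 400.0     # assumed memory bandwidth

    # A memory-bound kernel (e.g. a sparse solver) might do ~0.25 flops per byte moved.
    print(attainable_gflops(0.25, peak_fp64_gflops, peak_bandwidth_gbs))   # 100 Gflop/s
    # Adding more FP64 units raises the 3000 Gflop/s roof, but this kernel never reaches it;
    # moving to FP32 halves the bytes per value instead, which is what actually helps.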
The authors say the study points toward a growing need to re-iterate and re-think architecture design decisions in high-performance computing, especially with respect to precision. “Do we really need the amount of double-precision compute that modern processors offer? Our results on the Intel Xeon Phi twins points towards a ’No’, and we hope that this work inspires other researchers to also challenge the floating-point to silicon distribution for the available and future general-purpose processors, graphical processors, or accelerators in HPC systems.”
The full results, benchmark methodology, and other details can be found here.
AUTHOR: CADE METZ.
ORIGINAL STORY FROM WIRED: https://www.wired.com/2016/10/ai-changing-market-computer-chips/
DATE OF PUBLICATION: 10.28.16 TIME OF PUBLICATION: 7:00 AM
IN LESS THAN 12 hours, three different people offered to pay me if I’d spend an hour talking to a stranger on the phone.
All three said they’d enjoyed reading an article I’d written about Google building a new computer chip for artificial intelligence, and all three urged me to discuss the story with one of their clients. Each described this client as the manager of a major hedge fund, but wouldn’t say who it was.
The requests came from what are called expert networks—research firms that connect investors with people who can help them understand particular markets and provide a competitive edge (sometimes, it seems, through insider information). These expert networks wanted me to explain how Google’s AI processor would affect the chip market. But first, they wanted me to sign a non-disclosure agreement. I declined.
These unsolicited, extremely specific, high-pressure requests—which arrived about three weeks ago—underscore the radical changes underway in the enormously lucrative computer chip market, changes driven by the rise of artificial intelligence. Those hedge fund managers see these changes coming, but aren’t quite sure how they’ll play out.
Of course, no one is quite sure how they’ll play out.
Today, Internet giants like Google, Facebook, Microsoft, Amazon, and China’s Baidu are exploring a wide range of chip technologies that can drive AI forward, and the choices they make will shift the fortunes of chipmakers like Intel and nVidia. But at this point, even the computer scientists within those online giants don’t know what the future holds.
These companies run their online services from data centers packed with thousands of servers, each driven by a chip called a central processing unit, or CPU. But as they embrace a form of AI called deep neural networks, these companies are supplementing CPUs with other processors. Neural networks can learn tasks by analyzing vast amounts of data, including everything from identifying faces and objects in photos to translating between languages, and they require more than just CPU power.
And so Google built the Tensor Processing Unit, or TPU. Microsoft is using a processor called a field programmable gate array, or FPGA. Myriad companies employ machines equipped with vast numbers of graphics processing units, or GPUs. And they’re all looking at a new breed of chip that could accelerate AI from inside smartphones and other devices.
Any choice these companies make matters, because their online operations are so vast. They buy and operate far more computer hardware than anyone else on Earth, a gap that will only widen with the continued importance of cloud computing. If Google chooses one processor over another, it can fundamentally shift the chip industry.
The TPU poses a threat to companies like Intel and nVidia because Google makes this chip itself. But GPUs also play an enormous role within Google and its ilk, and nVidia is the primary manufacturer of these specialized chips. Meanwhile, Intel has inserted itself into the mix by acquiring Altera, the company that sells all those FPGAs to Microsoft. At $16.7 billion, it was Intel’s largest acquisition ever, which underscores just how much the chip market is changing.
But sorting all this out is difficult—in part because neural networks operate in two stages. The first is the training stage, where a company like Google trains the neural network to perform a given task, like recognizing faces in photos or translating from one language to another. The second is the execution stage, where people like you and me actually use the neural net—where we, say, post a photo of our high school reunion to Facebook and it automatically tags everyone in it. These two stages are quite different, and each requires a different style of processing.
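In code, the distinction amounts to a loop that repeatedly updates weights versus a single forward pass over frozen weights. A toy sketch (the one-layer “network” and synthetic data here are invented purely for illustration and bear no relation to Google’s or Facebook’s systems):

    # Toy sketch of the two stages: training updates weights repeatedly,
    # inference is a single forward pass with the weights frozen.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((256, 8))                              # made-up input data
    y = (X.sum(axis=1, keepdims=True) > 4).astype(float)  # made-up labels
    W = rng.standard_normal((8, 1)) * 0.1                 # weights of a one-layer "network"

    def forward(X, W):
        return 1.0 / (1.0 + np.exp(-X @ W))  # sigmoid output

    # Training stage: many passes over the data, adjusting W each time (compute heavy).
    for _ in range(1000):
        pred = forward(X, W)
        grad = X.T @ (pred - y) / len(X)
        W -= 0.5 * grad

    # Execution (inference) stage: one cheap forward pass per new input, W fixed.
    new_photo_features = rng.random((1, 8))
    print(forward(new_photo_features, W))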
Today, GPUs are the best option for training. Chipmakers designed GPUs to render images for games and other highly graphical applications, but in recent years, companies like Google discovered these chips can also provide an energy-efficient means of juggling the mind-boggling array of calculations required to train a neural network. This means they can train more neural nets with less hardware. Microsoft AI researcher XD Huang calls GPUs “the real weapon.” Recently, his team completed a system that can recognize certain conversational speech as well as humans, and it took them about a year. Without GPUs, he says, it would have taken five. After Microsoft published a research paper on this system, he opened a bottle of champagne at the home of Jen-Hsun Huang, the CEO of nVidia.
But companies also need chips that can rapidly execute neural networks, a process called inference. Google built the TPU specifically for this. Microsoft uses FPGAs. And Baidu is using GPUs, which aren’t as well suited to inference as they are to training, but can do the job with the right software in place.
At the same time, others are building chips to help execute neural networks on smartphones and other devices. IBM is building such a chip, though some wonder how effective it might be. And Intel has agreed to acquire Movidius, a company that is already pushing chips into devices.
Intel understands that the market is changing. Four years ago, the chip maker told us it sells more server processors to Google than it sells to all but four other companies—so it sees firsthand how Google and its ilk can shift the chip market. As a result, it’s now placing bets everywhere. Beyond snapping up Altera and Movidius, it has agreed to buy a third AI chip company called Nervana.
That makes sense, because the market is only starting to develop. “We’re now at the precipice of the next big wave of growth,” Intel vice president Jason Waxman recently told me, “and that’s going to be driven by artificial intelligence.” The question is where the wave will take us.