Home

GPU acceleration for Static Timing

Analysis

Final project for CS240A

(Earlier Compiled by Vinay Shivakumar for HW1)

This web page provides a high level overview of an emerging application of GPU computing to the field of integrated circuit design. A lot of EDA (Electronic Design Automation) analysis problems can be formulated as SPARSE MATRIX PROBLEMS. Though lot of research is underway to exploit the power of the GPU for EDA , no commercial tool is available at the moment which implements this technique. A few research papers have been published in this regard and the results seem to be very encouraging.

We plan to do this as our final project for CS240A

What is Static Timing Analysis ?

Static Timing analysis is a technique used in digital circuit design to analyse if the circuit will satisfy timing constraints. Timing closure refers to a set of optimization techniques which modify the circuit layout to meet the defined performance goals.

Below is a picture of a portion of a digital circuit.

UFF1 , UFF2 and UFF3 are flip flops which are the sequential elements. These flip flops help to store a circuit state. The input at D is sampled and stored whenever the signal at the input CLK rises from logic 0 to logic 1(positive edge triggered). The remaining gates forms the combinational logic which implement a certain logic function. The aim of static timing analysis is to verify that the input arrives at the D input before the clock CLK rises (setup time constraint) and that it stays steady for a certain duration (hold constraint). As observed from the diagram above , there could be several paths ending at a flip flop.

Why is it important to speed up timing analysis ?

Due to shrinking transistor sizes, the size of integrated circuits keep increasing. Also due to variations and other higer order effects , it becomes necessary to verify the timing at more and more "timing corners" (Process (fast slow) , Voltage , temperature , wire parasitics). Hence several timing analysis runs need to be done to make sure that all possible combinations of these are covered.

Moreover , since it is hard to accurately predict the result of a specific optimization step , timing closure is an iterative process , which involves running timing analysis several times after each optimization step. It is one of the most frequently performed task in physical design of integrated circuits.

It will be shown later on that by intelligent problem formulation , a speedup of upto 50x can be obtained by using a GPU for STA.

Sparse Matrix based STA

One algorithm for timing analysis which has been proposed is the sparse matrix based STA. In this method , the circuit timing is represented as a matrix operation as shown below. A is a 2 dimensional matrix , each element being 1 if the path involves the gate. This other matrix dpath is a one dimensional matrix comprising of the gate delays. The path delay matrix is equal to the product of these 2 matrices.

The path with maximum delay can be obtained as the max(dpath)

The matrix A is a sparse matrix , which means most of the elements in the matrix are same constant (0 in this case).

Sparse Matrix Vector Multiplication (SMVP)

Sparse matrix vector multiplication has been a hard problem to parallelise. There is a lot of existing work in this regard. In a sparse matrix , direct multiplication is inefficient as most of the operations result in a zero. So sparse matrices are usually represented by a pair of matrices , one specifying a column number and the other the element specifying the value.

There has been a lot of prior work for implementing sparse matrix multiplication on the GPU. By reordering the operations as described below good speed up has been reported (Yangdong Steve Deng's presentation)

Results

The ablove described algorithm has been implemented on a NVIDIA GTX 280 GPU containing 240 CUDA cores operating with a graphics clock of 602Mhz Below are the results published (Yangdong Steve Deng')

1Million -gate ASIC: 60 min@CPU vs. 1 min@GPU !!