ProperCAD Environment for Parallel VLSI CAD Applications (P. Banerjee)

Next: Funding Up: RESEARCH ACTIVITIES Previous: Funding

ProperCAD Environment for Parallel VLSI CAD Applications (P. Banerjee)

As VLSI circuits become more complex, the computational requirements for performing various CAD tasks increase almost exponentially. It has become clear that new multiprocessor computers are required to solve the problems ahead.
The objectives of this research are to develop efficient parallel algorithms for VLSI CAD tasks that can utilize the computing power of a wide range of parallel platforms that are becoming available to the community, with the eventual goal of reducing the design turnaround time of complex chips of the future.
In this research we are investigating parallel algorithms for placement, routing, layout verification, logic synthesis, test generation, and fault simulation, and behavioral simulation. The parallel algorithms are designed such that they are portable across a range of parallel machines, including multiprocessor workstations, shared memory multiprocessors, message passing multiprocessors, and networks of workstations.
We are achieving portability of our parallel CAD algorithms by developing our applications on top of the ProperCAD library and the Message-Passing Interface.
In this research we are investigating scalable, data partitioned parallel algorithms for placement, routing, layout verification, logic synthesis, test generation, and fault simulation, and behavioral simulation.
Specifically, we have already developed the following packages. Work is ongoing in many of them: (1) ProperEXT: VLSI circuit extraction for flattened layouts, (2) ProperDRC: Design rule checking for flattened layouts, (3) ProperPLACE: Standard cell placement using simulated annealing, (4) ProperROUTE: Global and detailed routing, (5) ProperSYN: Logic synthesis using transduction method, (6) ProperMIS: Logic synthesis using the MIS method, (7) ProperSIS: Sequential logic synthesis, (8) ProperSTATE: State assignment of finite-state machines based on MUSTANG, (9) ProperJEDI: State assignment of finite-state machines based on JEDI, (10) ProperHLS: High level synthesis, (11) ProperTEST: Test generation for sequential circuits based on the TGEN algorithm, (12) ProperHITEC: Test generation using the HITEC algorithm, (13) ProperGATEST: Test generation for sequential circuits based on a genetic algorithm, (14) ProperPROOFS: Fault simulaton using PROOFS algorithm, (15) ProperVHDL: Behavioral simulation.
We are developing our parallel applications on distributed memory multicomputers such as the IBM SP-2 and the Intel Paragon, shared memory multiprocessors such as the SUN SPARCserver 1000, the IBM J40, and the SGI Origin 2000, and networks of SUN and HP workstations.
More information on the ProperCAD project can be obtained from the following URL on the World Wide Web: http:/www.ece.nwu.edu/cpdc/ProperCAD/pcad.html.
In the past year, we have obtained the following significant results.
Accomplishment 1
We have developed various parallel algorithms for global routing of standard cell designs based on the Timberwolf 6.0 global router. We have developed three different approaches for parallelizing the global routing problem. The first approach partitions the nets across the processors, and each processor performs a global routing in parallel. The second approach partitions the chip area among the processors by rows, and each processor performs routing of all the nets belong to its region. A third approach uses a hybrid approach where part of the algorithm is performed using net decomposition, and part of the algorithm is performed using an area decomposition. We have experimentally evaluated the performance of all three parallel algorithms. The hybrid algorithm has been found to obtain the best speedups (about 6.5 on 8 processors on a SUN SparcCenter 1000) and has minimized the quality degradation of the routing to less than 2% of the serial algorithm for various benchmark circuits.
Accomplishment 2
Three different algorithms for the algebraic factorization procedures in combinational logic synthesis within the MIS system have been developed. The first algorithm uses circuit replication and uses a divide-and-conquer strategy to follow the same search path as the sequential algorithm. A second algorithm uses totally independent factorization on different circuit partitions, A third algorithm uses a novel L-shaped partitioning strategy which allows for some interaction among the kernels in various partitions. All the algorithms have been implemented on a SUN SPARCSERVER 1000. For a large circuit having 14,000 literals, the third algorithm runs 11.5 times faster than the sequential algorithm with less than 0.2% degradation in the quality of the results.
Accomplishment 3
Three new parallel test generation algorithms for sequential circuit test generation based on a genetic algorithm called GATEST have been developed. The first algorithm called ProperGATEST1 performs parallelization using data decomposition by partitioning the populations in the genetic algorithm across the processors, and obtains speedups of about 6.8 on 8 processors of a SPARCSERVER-1000. These results have been reported without any degradation in the quality of the solutions from the sequential algorithm. The second algorithm, ProperGATEST2, uses a parallel search strategy where each processor executes the sequential genetic algorithm with a different seed, and uses migration to share information between processors. Speedups of about 5.3 on 8 processors have been obtained on a SUN SPARCSERVER 1000 with qualities that are comparable to the serial algorithm. The third algorithm, ProperGATEST3, is a subpopulation based version of ProperGATEST2, where subpopulations are distributed across processors and information is migrated from one processor to another. Speedups of about 7.2 on 8 processors have been obtained on a SUN SPARCSERVER 1000 with qualities that are comparable to the serial algorithm. These algorithms will enable the generation of tests for very complex circuits of the future.
Accomplishment 4
We have also developed various parallel algorithms for fault simulation. While previous approaches to parallel fault simulation have used circuit parallelism or fault parallelism approaches, in the past year, we have developed scalable parallel test-set partitioned algorithms for fault simulation in a series of implementations called SPITFIRE. The basic idea in this approach is to partition the test sets among the processors so that each processor performs fault simulation on its own set of inputs but on the entire circuit and on the entire list of faults. While this approach is easily applicable to combinational circuits, the approach is not directly applicable in sequential circuits where there are state dependencies across time frames, i.e. the state of the circuit at the present time frame may depend on the state of the circuit in all previous time frames. We proposed a technique of allowing for some overlaps of test vectors among the different test partitions in the various processors. By experimenting with the degree of overlap, it was possible to control the quality of the results (fault coverage) and the speedups obtained. We developed six variants of this algorithm and one of those actually combined fault parallelism and test set parallelism very effectively. The most efficient algorithms produced average speedups of about 6.5 on 8 processors of a SPARCServer 1000 multiprocessor on several large sequential benchmarks.
Accomplishment 5
We have developed an efficient compiled event-driven simulation algorithm for VHDL simulations. Two approaches to parallelization on shared memory multiprocessors were developed. The first one was based on fine grained task scheduling where each task corresponded to a straightline sequence of VHDL code without any wait statements. The second appproach was based on a coarse grained partitioning of the program by identifying fan-in cones of the circuits being simulated. Both approaches were evaluated on a set of benchmark examples. Speedups of about 3 to 4 were meaasured on 8 processors of a SUN SPARCServer 1000 multiprocessor.

Funding

Next: Funding Up: RESEARCH ACTIVITIES Previous: Funding
CPDC Webmasters
Wed Dec 10 16:19:42 CST 1997